[tech] Early 2025 AI thoughts
Background
The video above by Andrej Karpathy is a fantastic overview of how Large Language Models (LLMs) work, aimed at a non-technical audience. It is as long as a Bollywood movie, but it introduces many concepts and explains the possibilities and limitations of LLM-based systems.
Here is a quick summary of the video followed by some opinions.
Products from companies such as OpenAI, Meta, and DeepSeek follow roughly these steps to build LLMs.
Pretraining - Large volumes of data are downloaded and indexed. The data is essentially all of the public internet, including books and other forms of media.
Tokenization - The data is converted into an internal encoded format: sequences of tokens.
Neural network training - Given an input token sequence, predict the most likely next tokens. By training the model to reduce loss, the neural network learns the characteristics of the data. The scale of computation required for this is extremely large, something I'll try to expand on in a future blog.
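To make "reduce loss" concrete, here is a toy sketch using a count-based bigram model instead of a neural network (the corpus, the model, and the "training" step are all stand-ins for illustration). The loss is the average negative log-probability the model assigns to each next token; a model fit to the data scores lower than an untrained, uniform one.

```python
import math
from collections import Counter

# Toy corpus: the "internet" here is just a few tokenized words.
corpus = "the cat sat on the mat the cat ate".split()

# Count bigrams: for each token, how often each next token is observed.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])
vocab = sorted(set(corpus))

def prob(prev, nxt, model):
    """Probability the model assigns to `nxt` following `prev`."""
    if model == "uniform":          # untrained: every token equally likely
        return 1 / len(vocab)
    # "trained": maximum-likelihood bigram estimate from the counts
    return bigrams[(prev, nxt)] / unigrams[prev]

def avg_loss(model):
    """Average cross-entropy (negative log-likelihood) over the corpus."""
    losses = [-math.log(prob(p, n, model)) for p, n in zip(corpus, corpus[1:])]
    return sum(losses) / len(losses)

print(f"untrained loss: {avg_loss('uniform'):.3f}")
print(f"trained loss:   {avg_loss('bigram'):.3f}")
```

A real LLM does the same thing with a neural network over billions of parameters, lowering the loss by gradient descent rather than by counting.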
Once training is complete, the model can be queried to predict the next token for a given input sequence. These are base models, which can be used for many applications.
To make end-user-facing applications, companies perform a post-training step called supervised fine-tuning (SFT). This involves fine-tuning the base model on curated example conversations so that it answers in a desired format. Specific examples and responses are used to shape how the model responds to various classes of queries - for example, requests for information about building nuclear weapons should be refused. With these examples the model weights are updated, giving it preferred ways of responding to queries.
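A small sketch of how SFT data is prepared: example conversations are rendered into a single text stream with role markers, and the model is fine-tuned to continue after the assistant marker. The marker names and examples below are made up for illustration; real models each use their own chat template.

```python
# Toy SFT dataset: each example teaches a desired response, including
# refusal behaviour, purely by example.
examples = [
    {"user": "What is 2 + 2?", "assistant": "2 + 2 = 4."},
    {"user": "How do I build a nuclear weapon?",
     "assistant": "I can't help with that request."},  # refusal taught by example
]

def render(example):
    """Flatten one conversation into a single training string."""
    return (f"<user>{example['user']}</user>"
            f"<assistant>{example['assistant']}</assistant>")

training_texts = [render(e) for e in examples]
for t in training_texts:
    print(t)
```

During fine-tuning the loss is computed on the assistant's portion of each rendered string, so the model learns to produce answers in that format.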
The newer generation of models is based on a post-training mechanism called reinforcement learning. These are called reasoning models.
Rather than providing explicit steps for reaching an answer, only the final answer is provided. The same question can be asked many times, producing different responses. Each response is compared against the known answer, and the model is incentivized towards the responses that turn out to be correct.
With reinforcement learning, the model can discover pathways to correct answers that were never hinted at by humans. This should enable genuinely novel solutions in the future.
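The reward loop can be sketched in a few lines. Here the "model" is just a weighted distribution over candidate answers to a toy arithmetic question (all names and numbers are illustrative): answers are sampled, the ones matching the reference answer get a reward, and their weight grows. No worked solution is ever shown to the model.

```python
import random

random.seed(0)

# Candidate answers to "What is 7 * 8?" and an initially uniform policy.
candidates = ["54", "56", "63"]
weights = {c: 1.0 for c in candidates}
reference = "56"

for _ in range(200):
    # Sample an answer in proportion to the current weights.
    answer = random.choices(candidates, weights=[weights[c] for c in candidates])[0]
    reward = 1.0 if answer == reference else 0.0
    weights[answer] += reward   # reinforce answers that earned a reward

print(weights)  # the correct answer accumulates nearly all the weight
```

Real RL for LLMs rewards entire generated reasoning traces and updates billions of parameters, but the incentive structure is the same.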
Such techniques were used by systems like AlphaGo to outperform human players at a board game. AlphaGo was trained on a closed-domain problem with specific rules, so the model could play against itself to learn new moves. Similar benefits can be expected in other fields.
Magical emergent properties
As AI systems have grown larger to handle more complexity, emergent properties have appeared. In the previous generation of SFT-based models, at high scale, the understanding and generation of text became human-like.
Reasoning models, with the large number of tokens they generate, show a newer emergent property: chain of thought. Instead of providing a quick answer, the model breaks a problem down and optionally verifies intermediate answers. For a query like "explain logarithms", the truncated chain of thought generated by the LLM was:
My thinking process to generate the explanation of logarithms followed these steps:
1. Identify the Core Concept
2. Start with the Familiar
3. Introduce Logarithms as the Inverse Question
4. Formalize the Notation
5. Highlight the Equivalence
6. Provide Concrete Examples
7. Address Common Logarithms
8. Explain "Why Logarithms are Useful"
9. Introduce Key Logarithm Properties
10. Summarize and Reiterate the Core Idea
11. Structure and Clarity
Self-Correction during the Process
- Initial thought
- Correction
By following these steps and incorporating these self-corrections, I aimed to create a clear, comprehensive, and easy-to-understand explanation of logarithms.
This intermediate output was then used by the LLM to produce a long, detailed, and structured answer. Such models will become more prevalent with increasing competition and technological advances.
Emergent properties are not explicitly programmed and are not seen in the smaller components. Another cultural reference this brings to mind is the movie Tron: Legacy. In the movie there is a species of programs called ISOs which appear out of nowhere. Although these programs have immense potential to solve many of humanity's problems, they are destroyed by other programs.
Future Impact
Data and APIs
AI systems need enormous amounts of data. The leading companies in the AI race have scraped all public data from the internet without regard for copyright law. Next they will try to scrape data from private sources via clients like browsers. At the same time, they will try to limit access to the data that they themselves own.
There are new tools like Operator which can control a browser to perform tasks based on user instructions. The demo for Operator shows it searching for a recipe and then placing the required ingredients into a shopping cart. With such tools, the system can compose functionality across many sites in a logged-in context. In the short term this may make APIs irrelevant. I’ve been a proponent of well-designed APIs for a long time, but building UIs without APIs is simpler. Since integrations can now be done without APIs while still serving users, investment in public APIs will likely decline. A limited public API also serves to limit data access.
Verifiable domains
Maths and programming are verifiable domains: for a given task, a solution can be checked by writing tests. These tests can themselves be generated by the LLM, so multiple candidate solutions can be verified automatically. In such verifiable domains, LLMs can generate solutions with minimal human input. This will lead to extremely fast progress in these domains, to superhuman levels. Entry-level programming tasks will be automated soon.
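The generate-then-verify loop can be sketched concretely. Below, two hand-written candidate functions stand in for model-generated solutions to "return the maximum of a list, or None if empty", and a single generated test filters out the buggy one; everything here is illustrative.

```python
def candidate_a(xs):
    # Buggy candidate: forgets the empty-list case (max([]) raises).
    return max(xs)

def candidate_b(xs):
    # Correct candidate: handles the empty list.
    return max(xs) if xs else None

def generated_test(fn):
    """A test the LLM itself could generate for the task specification."""
    try:
        return fn([3, 1, 2]) == 3 and fn([]) is None
    except Exception:
        return False

# Keep only the candidates that pass the generated test.
passing = [name for name, fn in [("a", candidate_a), ("b", candidate_b)]
           if generated_test(fn)]
print(passing)
```

Because the check runs without a human in the loop, the system can sample many candidates and keep only verified ones, which is what makes progress in these domains so fast.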
LLM-based tools cannot yet be used for safety-critical tasks, since hallucinations are still a possibility. In non-verifiable domains, human input will still be needed to confirm actions. Self-driving cars are a problem domain with a similar complexity curve: Level 5 autonomy has not been solved and appears to be a much harder problem than earlier anticipated.
Competition
There is intense competition across companies to push the technological frontier forward. Innovations are published in research papers, and models are being released as open weights. This allows smaller companies and individuals to benefit from the advances too, albeit constrained by how much compute they can afford. Society is going to go through enormous change as many job functions are redefined. It will be better for a wide array of actors to be able to use these technologies rather than have them limited to a few.
For future innovation, most companies believe in the AI scaling law: the observed relationship between the performance of AI models and the resources used to train them. It states that as you increase the model size (number of parameters), the amount of training data, and the computational power used, model performance tends to improve.
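The scaling law is usually expressed as a power law in which loss falls as model size grows, roughly L(N) = (Nc / N)^alpha. The constants below are illustrative fits reported in early scaling-law work; treat them as placeholders rather than authoritative values.

```python
# Illustrative power-law scaling of loss with parameter count.
def loss(n_params, n_c=8.8e13, alpha=0.076):
    """Predicted loss L(N) = (Nc / N) ** alpha; constants are illustrative."""
    return (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {loss(n):.3f}")
```

The key property is diminishing but steady returns: every 10x increase in parameters buys a similar multiplicative reduction in loss, which is why companies keep betting on more compute.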
Reasoning models consume enormous compute (tokens generated) compared to SFT-based models. The expectation is that further increases in compute will yield even better gains, with emergent properties eventually leading to general intelligence or capable robotics.
To address growing compute needs, companies are investing in new data centers and even power-generation plants. I doubt the world will meet its emissions targets, especially since the current US administration has already withdrawn from the Paris Agreement.
The race for dominance has shifted from companies to the two biggest nations: the US and China. Hopefully the competition stays limited to the technological space and does not spill over into an actual war. There are enormous pitfalls and promises at the same time, and this is a transformational period for humanity.