The Shifting Sands of Software: From Code to AI-Powered Autonomy

The landscape of software is undergoing a fundamental transformation, marking a pivotal moment for those entering the industry. Andrej Karpathy, former Director of AI at Tesla, highlights this shift, arguing that software is changing “again” at a foundational level for the first time in roughly 70 years. This evolution presents immense opportunities for writing and rewriting vast amounts of software.

Karpathy introduces a new classification of software, building upon his earlier concept of “Software 2.0.”

  • Software 1.0: This refers to the traditional code written directly by humans for computers. An example within Tesla’s autopilot was the extensive use of C++ code.
  • Software 2.0: This paradigm is characterized by neural networks, where the “code” is essentially the weights of the network, not directly written but rather tuned through data sets and optimizers. Karpathy notes that as Tesla’s autopilot improved, much of the functionality originally written in Software 1.0 migrated to Software 2.0, effectively “eating through the software stack”. Platforms like Hugging Face and Model Atlas serve as the “GitHub” for Software 2.0, allowing visualization and sharing of these neural network parameters.
  • Software 3.0: The latest and most fundamental shift, according to Karpathy, is the advent of large language models (LLMs) becoming programmable. In this new paradigm, prompts written in natural language, specifically English, act as programs for the LLM. This marks a new kind of computer and programming language.
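The three paradigms can be contrasted in a small, self-contained sketch (illustrative only: the Software 2.0 part tunes a single weight with toy gradient descent, and the Software 3.0 part is just a prompt template, since the actual “program” would execute inside an LLM):

```python
# Software 1.0: logic written explicitly by a human.
def double_v1(x: float) -> float:
    return 2.0 * x

# Software 2.0: the "code" is a weight, tuned by an optimizer on data
# rather than written directly.
def train_double_v2(data, lr=0.1, steps=100):
    w = 0.0  # the program starts as an untrained parameter
    for _ in range(steps):
        for x, y in data:
            pred = w * x
            grad = 2 * (pred - y) * x  # d/dw of squared error
            w -= lr * grad
    return w

w = train_double_v2([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
double_v2 = lambda x: w * x  # behavior emerges from the tuned weight

# Software 3.0: the "program" is a natural-language prompt; the
# interpreter is an LLM (not invoked here).
double_v3_prompt = "You are a calculator. Return exactly {x} multiplied by 2."
```

The optimizer recovers w ≈ 2.0 from examples alone, which is the sense in which Software 2.0 is “tuned through datasets” rather than written.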

LLMs: Utilities, Fabs, and Operating Systems

Karpathy explores various analogies to understand the nature of LLMs:

  • Utilities: LLMs share characteristics with utilities like electricity. Companies like OpenAI and Google invest heavily in “capex” to train LLMs (akin to building a power grid) and “opex” to serve that intelligence via APIs, with users paying per token (metered access). They are expected to provide low latency, high uptime, and consistent quality. Just as one might switch electricity sources, platforms like OpenRouter allow switching between different LLMs. The recent outages of LLMs, which led to an “intelligence brownout,” further highlight their utility-like nature and the world’s growing reliance on them.
  • Fabs: The substantial capital expenditure required to build LLMs also draws parallels to semiconductor fabrication plants (fabs). This suggests a concentration of deep tech, research, and development secrets within LLM labs.
  • Operating Systems: Karpathy believes the strongest analogy for LLMs is operating systems. They are not simple commodities but increasingly complex software ecosystems. Similar to how operating systems have closed-source (Windows, macOS) and open-source (Linux) alternatives, the LLM ecosystem is seeing competing closed-source providers and open-source alternatives like the LLaMA ecosystem. LLMs function like a new kind of computer, with the LLM itself acting as the CPU and context windows as memory, orchestrating compute for problem-solving. The ability to run LLM applications on different LLM providers (e.g., Cursor on GPT or Gemini) further strengthens this analogy.
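The metered, utility-style billing can be sketched as a small cost calculator. The per-million-token prices and model names below are placeholders, not real quotes from any provider:

```python
# Hypothetical per-million-token prices in USD (real prices vary by provider).
PRICES = {
    "model-small": {"input": 0.15, "output": 0.60},
    "model-large": {"input": 3.00, "output": 15.00},
}

def metered_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Utility-style billing: pay per token consumed, like kWh on a meter."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Switching "electricity providers" is just changing the model key,
# the way routers like OpenRouter let apps swap backends.
```

The asymmetric input/output pricing mirrors how serving (generation) typically costs more than ingesting a prompt.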

Currently, LLM compute is expensive and centralized in the cloud, forcing a “time-sharing” model reminiscent of 1960s computing where users are “thin clients”. The “personal computing revolution” for LLMs has yet to happen, though initial signs like running LLMs on Mac minis offer a glimpse into this future. Karpathy also notes that interacting with an LLM directly via text feels like using a terminal, and a general graphical user interface (GUI) for LLMs is still largely uninvented.

A Shift in Technology Diffusion

A unique characteristic of LLMs, unlike past transformative technologies such as electricity or the internet, is their flipped diffusion pattern. Historically, governments and corporations were the first adopters due to cost and novelty, with diffusion to consumers occurring later. However, LLMs have seen widespread consumer adoption first (e.g., asking how to boil an egg), with corporations and governments lagging behind. This unprecedented accessibility means LLMs are “in the hands of all of us”.

The Psychology and Deficits of LLMs

Karpathy describes LLMs as “stochastic simulations of people,” trained on vast amounts of text, resulting in an emergent human-like psychology. They possess encyclopedic knowledge and memory, far exceeding that of any single human, akin to the savant in the movie Rain Man.

However, LLMs also exhibit “cognitive deficits”:

  • Hallucinations: They frequently “make up stuff” and lack a robust internal model of self-knowledge.
  • Jagged Intelligence: They can be superhuman in some problem-solving domains while making basic errors that no human would.
  • Anterograde Amnesia: Unlike humans who consolidate knowledge over time, LLMs do not natively “get smarter by default.” Their context windows act as “working memory” that needs explicit programming. This is illustrated by analogies to the movies Memento and 50 First Dates, where protagonists’ “weights are fixed and their context windows get wiped every single morning”.
  • Gullibility/Security Risks: LLMs are susceptible to prompt injection and data leakage.
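Because the weights are fixed between sessions, any persistence has to be programmed into the context window explicitly. A minimal sketch of such “working memory” management, as a rolling message buffer trimmed to a token budget (the 4-characters-per-token estimate is a rough heuristic, not a real tokenizer):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

class ContextWindow:
    """Working memory for an LLM session: old messages fall out once the
    token budget is exceeded, because the model itself retains nothing."""

    def __init__(self, budget_tokens: int):
        self.budget = budget_tokens
        self.messages = []

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Evict oldest messages until the buffer fits the budget again.
        while (sum(estimate_tokens(m) for m in self.messages) > self.budget
               and len(self.messages) > 1):
            self.messages.pop(0)
```

This is the Memento situation in code: everything outside the buffer is forgotten each “morning,” so deciding what to keep in context is itself a programming problem.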

Despite these deficits, LLMs are “extremely useful,” and the challenge lies in programming them to leverage their superhuman abilities while working around their limitations.

Opportunities: Partial Autonomy Apps and “Vibe Coding”

A significant opportunity lies in developing “partial autonomy apps”. Instead of directly interacting with an LLM like ChatGPT for tasks such as coding, dedicated applications like Cursor offer a more efficient experience. Key properties of effective LLM apps include:

  • Context Management: LLMs handle a significant amount of context.
  • Orchestration of Multiple LLMs: Apps orchestrate various LLM calls (e.g., embedding models, chat models, diff application models).
  • Application-Specific GUI: GUIs are crucial for auditing the work of fallible systems, allowing humans to go faster by visualizing changes (e.g., red/green diffs) and taking actions easily (e.g., command Y to accept).
  • Autonomy Slider: Users can adjust the level of autonomy given to the LLM, from tap completion to full agentic control, depending on task complexity. Perplexity, a successful LLM app, also demonstrates these features, including source citation and varying levels of research depth.
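The red/green diff view that makes auditing fast can be approximated in plain Python with the standard library’s difflib; a GUI like Cursor’s would render the +/- lines in green/red for quick visual verification:

```python
import difflib

def review_diff(old: str, new: str) -> str:
    """Produce a unified diff for human audit before accepting an LLM edit."""
    return "\n".join(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="before", tofile="after", lineterm="",
    ))

old_code = "def greet():\n    print('hello')\n"
new_code = "def greet(name):\n    print(f'hello {name}')\n"
print(review_diff(old_code, new_code))
```

The point of the GUI is that scanning colored +/- lines is far faster than re-reading both versions, which is exactly the “speed up human verification” property.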

The goal is for software to become “partially autonomous,” with LLMs being able to see and act in ways humans can, while humans supervise and remain in the loop. This requires changes to traditional software interfaces designed for humans to become accessible to LLMs.

Karpathy emphasizes the importance of a fast “generation-verification loop” when cooperating with AIs. This means speeding up human verification, largely through effective GUIs, and “keeping the AI on the leash”. Large, unmanageable diffs generated by overly autonomous agents are counterproductive, as the human remains the bottleneck for verification and ensuring correctness and security. The analogy of an “overreactive agent” highlights the need for AI assistance that works in small, incremental, and auditable chunks. This often translates to being more concrete in prompts to increase the probability of successful verification.
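One way to “keep the AI on the leash” is to structure the agent’s output as small chunks that each pass through a human gate. A toy sketch of this loop, where the `approve` and `apply` callables stand in for the human reviewer and the editor (both names are illustrative, not from any real framework):

```python
def supervised_apply(chunks, approve, apply):
    """Generation-verification loop: each small chunk must be approved
    before it is applied; one rejection stops the run for human review,
    instead of letting an unaudited 1,000-line diff pile up."""
    applied = []
    for chunk in chunks:
        if not approve(chunk):   # human verification step
            break                # halt early and hand control back
        apply(chunk)             # incremental, auditable change
        applied.append(chunk)
    return applied
```

Keeping chunks small raises the odds that each verification succeeds, which is why more concrete prompts (smaller, better-specified asks) make the loop run faster overall.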

Another critical shift is the rise of “vibe coding,” where natural language programming makes everyone a potential programmer. Karpathy shares his own experience of building a basic iOS app without knowing Swift, using an LLM to generate the code in a single day. He also built “MenuGen,” an app that generates images for restaurant menu items, primarily through “vibe coding”. However, he notes that while the code was easy, the “devops stuff”—authentication, payments, and deployment, which involved clicking through browser interfaces designed for humans—was the truly difficult and time-consuming part. This highlights a crucial area for future development: building for agents.

Building for Agents

Karpathy advocates for designing software infrastructure that agents can directly interact with. This new “consumer and manipulator of digital information” is human-like but still a computer.

  • llm.txt: Similar to robots.txt for web crawlers, an llm.txt file could provide LLMs with a readable markdown summary of a domain, avoiding the error-prone process of parsing HTML.
  • LLM-Friendly Documentation: Documentation, currently written for humans with lists, bolding, and pictures, needs to be adapted for LLMs. Services like Vercel and Stripe are moving towards offering documentation in markdown, which is easily understood by LLMs. Crucially, documentation should replace “click” instructions with equivalent API calls (e.g., curl commands) that an LLM agent can directly execute.
  • LLM-Friendly Data Ingestion Tools: Tools that transform human-oriented data (e.g., GitHub repos) into LLM-friendly formats (e.g., concatenating files, providing directory structures) are highly valuable. Deep Wiki, for instance, generates documentation pages for GitHub repos, making them even more useful for LLMs.
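A minimal version of such an ingestion tool flattens a directory tree into one prompt-friendly text blob, with an explicit file map up front. This is a sketch in the spirit of repo-concatenation tools, not any specific tool’s actual format:

```python
import os

def flatten_repo(root: str, exts=(".py", ".md", ".txt")) -> str:
    """Concatenate a directory tree into one LLM-friendly string:
    a directory listing first, then each file labeled with its path."""
    paths = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(exts):
                paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    parts = ["# Directory structure", *paths, ""]
    for rel in paths:
        with open(os.path.join(root, rel), encoding="utf-8") as f:
            parts += [f"## FILE: {rel}", f.read(), ""]
    return "\n".join(parts)
```

Giving the model the directory structure before the file contents is the same idea as llm.txt: structured, readable context up front instead of raw HTML or a repo browser to click through.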

While LLMs may eventually be able to navigate GUIs and click elements, meeting them halfway by providing accessible, structured information remains valuable due to current costs and difficulties.

The Iron Man Suit Analogy

Karpathy draws an analogy to the Iron Man suit, which functions as both an augmentation and an autonomous agent. The “autonomy slider” represents the spectrum between building augmentations (where humans are in control) and building fully autonomous agents. At this stage, with fallible LLMs, the focus should be on building “Iron Man suits” – partial autonomy products with custom GUIs and UX that facilitate a rapid human generation-verification loop. While full automation remains a long-term possibility, the immediate future involves gradually sliding the autonomy slider to the right over the next decade.

In conclusion, the industry is entering an “amazing time” where a massive amount of code needs to be rewritten, both by professionals and through new paradigms like “vibe coding”. LLMs, akin to early operating systems, are fallible “people spirits” with whom we must learn to collaborate effectively by adjusting our infrastructure and developing partial autonomy products.

Download the presentation of the talk from here

