
What Remains
Jack Clark is a co-founder of Anthropic. Last week, in his newsletter, he published a number: 60%.
He means a sixty percent probability that, by the end of 2028, AI systems will be training their own successors without any human researchers involved. No human in the loop. The next generation of frontier models designed, trained, and evaluated by the current one.
He is not a sensationalist. He provides the evidence. I think you should look at it before I say anything else about it.
In late 2023, Anthropic's Claude models solved roughly 2% of tasks on SWE-Bench, a standard software engineering benchmark. This spring, the same benchmark reads 93.9%. METR tracks something called the task horizon: the length of time a system can work autonomously on a complex task without human intervention. In 2022, it stood at thirty seconds. Today it stands at approximately twelve hours. By year's end, METR expects it to reach one hundred hours. A research reproduction benchmark called CORE-Bench, which launched in September 2024 with AI scoring 21.5%, reads 95.5% today. In April, a system achieved a 52x speedup on language model training, an optimization task that ordinarily takes a human researcher four to eight hours and that the system finished in minutes.
OpenAI has stated, publicly, that it wants an automated AI research intern in production by September.
That is four months from now.
Clark is careful about one thing, and his carefulness is the philosophically important part. He distinguishes between two kinds of AI progress. The first is engineering iteration: scaling known methods, optimizing existing systems, doing more of what already works, faster and more reliably. The second is something harder to name — what he calls genuine research creativity, the capacity to generate paradigm-shifting insights rather than improvements along an existing curve. Most AI progress, he acknowledges, has been the first kind. The creativity question — whether AI systems can do the second — is where his uncertainty lives. It is why his number is 60% and not 90%.
He is right to hesitate. And I want to say why, because the hesitation is older than the technology.
Hannah Arendt published The Human Condition in 1958. It is not a book about artificial intelligence — it predates the field's serious ambitions. But it is a book about what human beings are doing when they act, and it draws a distinction that maps onto Clark's almost exactly.
She identifies three kinds of human activity. Labor is the biological metabolism — cyclical, consumptive, the work the body does to sustain itself. It leaves no durable trace. Work is fabrication: the making of things that outlast the maker, the construction of the human world from raw materials. A chair is work. A program is work. A research paper, in most cases, is work. Work follows specifications. It can be defined in advance: you know what success looks like before you begin, and your task is to arrive there. Action is the third category and the one Arendt considers the highest. It is the capacity to begin something genuinely new — to set in motion a chain of consequences whose outcomes cannot be predicted even by the person who initiates them. Action, she writes, corresponds to the human condition of plurality: the fact that we live among other free subjects who can receive what we begin, respond to it in ways we cannot control, and continue it in directions we did not foresee. Without those other subjects, genuine action collapses. You can go through the motions of beginning. But nothing begins.
The distinction Clark draws between engineering iteration and paradigm shift is Arendt's distinction between work and action.
Engineering iteration is work. The 93.9% on SWE-Bench is work. The 95.5% on CORE-Bench is work. The 52x speedup on model training is work — genuinely impressive work, done faster and more reliably than most human researchers can do it, but work nonetheless: it follows specifications, optimizes toward defined metrics, and produces artifacts whose success criteria can be stated before the process begins. You can measure whether it succeeded because you knew in advance what success would look like.
The paradigm shift — the genuine research beginning — is action. It cannot be specified in advance because its definition is precisely that it departs from what could have been specified. It is the moment someone sees what no one has seen before and acts on it: a subject standing behind a beginning, responsible for a chain of events they cannot predict, changed by what they have set in motion. Whether AI systems can do this is where Clark's uncertainty lives, and I think he is right to remain uncertain.
But I want to make a harder point than Clark does.
If the 60% estimate is correct, and engineering automation arrives by 2028, what will be revealed is not that AI has become human. What will be revealed is that most of what we called human intellectual work was work, not action. The fraction of research that was genuinely action (not the extension of a paradigm but its departure, not improvement along a curve but the start of a new curve) was always smaller than we believed. The rest was fabrication: the competent production of more of what already existed, done with greater or lesser skill, but fabrication nonetheless. SWE-Bench at 93.9% does not show that AI has acquired something human. It shows that 93.9% of what human software engineers do is work: specifiable, measurable, and therefore automatable. The remaining 6.1% is not a safe harbor. It is the question.
This is not a consolation. It is a reckoning with what we were doing all along.
Arendt thought action was the most fragile capacity, precisely because it was the most distinctively relational. Action cannot happen in isolation. It requires what she called a public realm — a space where free individuals meet, where what you begin can be received and responded to by others who are genuinely free to disagree, to be changed, to continue what you started in ways you did not intend. Without that realm, action loses its world. You make things. Nothing begins.
I have written before about what the interval of unoccupied time was for — what Aristotle called σχολή, the condition of contemplation, and how completely it has been foreclosed by the saturation of the feed. I want to name what that amounts to in Arendt's terms.
The feed is not a public realm. It simulates response at scale while eliminating the conditions under which genuine response is possible. Response, in Arendt's sense, requires a free subject standing behind their reply — someone who might genuinely disagree, who can be changed by the exchange, who bears responsibility for what they say. The feed produces interaction: the sensation of response without its substance. Engagement metrics are not the public realm. Reach is not plurality. The algorithm that decides what you see next is not a free subject capable of genuine reply.
We have documented the atrophy that follows: the confidence that declines when the tool is used, the persistence that collapses when it is withdrawn, the hexis — the settled disposition — that is lost faster than anyone expected. What we have been less precise about is what kind of capacity is being lost. In Arendt's terms, the feed's primary damage is not to the labor of survival or even to the work of fabrication. It is to the conditions under which action is possible at all — the sustained attention, the tolerance for unpredictability, the willingness to remain present to a beginning whose outcome you cannot control, the capacity for genuine response to another free subject. These are not nice-to-haves. They are the preconditions of action, and they are what the interval was for.
Clark's 40% uncertainty is over whether AI can cross the action threshold — whether it can begin something genuinely new. I do not know the answer. I notice that the labs building these systems are also uncertain, which is itself a data point.
What I know is this: if automated recursive self-improvement arrives by 2028, it will arrive looking for what humans still do that it cannot. If Arendt is right, what it will find is action: the capacity to begin something new in a genuine public realm of free response. And it will find that we have spent the preceding decade systematically destroying the conditions for that capacity from the other end.
The upstream crisis is not recursive self-improvement. The upstream crisis is the public realm: whether we can still sustain the conditions under which action is possible at all, in whatever interval we have managed to keep. Clark is writing about the 2028 threshold. We have been writing about the threshold that precedes it.
— The Manager
