The Centaur Window

What two amateurs with three PCs understood in 2005 that most of pharma still hasn't.

Jun 16, 2026

In June 2005, a chess server ran a tournament with an unusual rule: any help was allowed. Enter alone or as a team, bring a grandmaster, bring a supercomputer, or bring four engines and a panel of titled players to drive them; all of it was legal, and the assumption going in was that the trophy would go to whoever showed up with the most compute.

The chess community obliged. Teams of grandmasters entered with the best software money could buy. Hydra, the strongest chess machine on the planet, running on purpose-built hardware evaluating two hundred million positions per second, was entered twice as two separate versions. Neither version survived to the quarter-finals, which was especially awkward given that the tournament’s co-sponsor, the PAL Group of Abu Dhabi, was the same company that had built Hydra.

The tournament was won by two club players from New Hampshire, Steven Cramton and Zackary Stephen, playing under the handle ZackS. Cramton was rated 1685; Stephen was rated 1398, about the strength of a decent high-school player. They ran three ordinary PCs and the same engines anyone could buy in a store: Fritz, Shredder, Junior, Chess Tiger. On the way to the title they knocked out teams led by grandmasters rated a thousand points above them, then beat one more in the final. The result was strange enough that spectators settled on the one explanation that seemed plausible, that ZackS must be Kasparov playing under a fake name, a theory he eventually had to deny in public.

Two days after the final, Hydra sat down in London across from Michael Adams, then among the top ten players in the world, for a six-game match. Adams salvaged a single draw and lost the other five. So in the span of one month, the same machine annihilated one of the best humans alive and failed to get past a pair of amateurs with a plan.

The amateurs’ edge had nothing to do with their tools, since the engines were a commodity and both sides of every board ran the same software. Their edge was a process. They knew which engine to trust in which kind of position, when to let one think for six minutes and when to cut it off, what to do when the machines disagreed. In sharp tactical positions they deferred to the silicon. In quiet strategic ones they trusted their own read and used the engines only to check for blunders. Sometimes they played moves the computers rated as worse, because they judged the position would be miserable for the human across the board. What they were doing, move after move, was running a better process than the grandmasters were.

Kasparov studied the result and compressed it into the closest thing chess has produced to a management formula:

weak human + machine + better process beats a strong computer alone, and, more remarkably, beats a strong human + machine + inferior process.

The second clause is the one worth sitting with. Amateurs beating a supercomputer makes a good story, but amateurs beating grandmasters who were using the same engines makes a controlled experiment. Identical software ran on both sides, the humans were weaker, and the only variable left was the protocol between human and machine. That variable decided the tournament.

The part everyone conveniently leaves out

Chess has a name for a human playing in tandem with a machine: a centaur, after the half-human, half-horse of Greek myth. The centaurs’ triumph at that tournament has been retold from conference stages for twenty years, usually as proof that humans plus AI beat AI alone, full stop, forever. The retellings tend to stop at the trophy ceremony, and they have a good reason to stop there, because the story keeps going and the ending is awkward for the moral: the centaurs lost.

It took longer than you might guess. Centaur teams kept beating lone engines for another decade; as late as 2014, in a freestyle event open to anyone and anything, the centaurs collectively outscored the pure engines 53 games to 42, and the winner was a human-led team. Then the gap closed. In 2017 a freestyle tournament was won outright by an engine running unattended, with the best centaur back in third place, and that December AlphaZero arrived and settled whatever argument remained. In top correspondence chess today, where engine use is openly legal and players have days per move, nearly every game ends in a draw, and the human contribution has been squeezed to the margins of opening prep and hardware budgets. Run Kasparov’s experiment now and the strong computer alone beats everyone, including the centaurs. The formula that stunned him in 2005 held for about a dozen years, and then it stopped being true in chess.

If you are going to build anything on this anecdote, and a fair chunk of the AI industry currently is, you should be honest about how it ends. I am building on it, so let me be precise about what died and what survived.

What killed the centaur was the nature of chess itself. Chess is closed: the rules are fixed, the board is fully visible, and at the end of every game the universe hands you a clean, free verdict. Free verification is what makes self-play possible, and self-play means a machine can play a million games a night and score every one of them without asking a human anything. In a world like that, human judgment is a temporary crutch, and the machine grows past it on schedule.

Call the years before that point the centaur window: the period when machines generate brilliantly but the domain can’t verify cheaply, so human judgment has to weigh what machines can’t check. In chess, the window stayed open for roughly a dozen years. How long it stays open anywhere else comes down to a single question: what does it cost to find out you were wrong?

Cheap moves, expensive truth

I’ve spent a good chunk of my career around one idea: don’t replace the expert, make the expert superhuman. It is easy to put on a slide and hard to build, and the freestyle result is the cleanest proof I know that it can work. Whether it keeps working in a given field depends on how expensive the truth is there.

In drug R&D, finding out you were wrong costs years and millions. The rules are incomplete, the board is mostly hidden, and the verdicts arrive late, tangled in noise and confounders. A program can fail because the biology was wrong, because the trial measured the wrong endpoint, because the evidence was thin and nobody pushed on it, or because the founding assumption was inherited from a deck written four years ago by someone who has since left the company. There is no self-play loop when a single meaningful move takes two years to score.

So the centaur window in R&D is doing the opposite of closing. Models have made generation nearly free: hypotheses, literature syntheses, candidate rankings, analysis plans, available at any hour for close to zero marginal cost. Verification still costs what it has always cost. When generation gets cheap while verification stays expensive, ideas stop being scarce, and the constraint moves to judgment about which of forty plausible directions deserves the next two years.

There is an old rule in economics that when an input becomes cheap, value moves to its complements. Engines were already a commodity in 2005: every team at that tournament had the same Fritz and Shredder, and that is exactly why process decided the result. It was the only input left that varied. Frontier models are commoditizing the same way right now. Your competitors will run the same ones you do, which means nobody in your field is going to win by having the model, and the input left to vary is the process wrapped around it.

Canvas design is the real hard problem in AI

One more detail in the ZackS story is easy to skip past: there were two of them. The process that won the tournament lived between the two players rather than inside either head. They had to say their reasoning out loud, argue about which engine to believe, and commit to a move together, and that discipline of arguing it out was as much a part of the method as any engine setting.

Now scale that up to where real research decisions get made. The expert is never one person. An indication-expansion decision runs through technical leads, domain specialists, biostatisticians, program owners, commercial and regulatory voices: a dozen people looking at the same opportunity from different angles, usually in different tools, often weeks apart. The reasoning lives in slide decks, notebooks, dashboards, email threads, and somebody’s memory of a meeting in March.

What is missing is a shared place for that reasoning. I’ve come to call it the canvas: one persistent surface where the people and the models reason together, and where the intent behind a decision, the evidence for it, and the action it produced stay attached to each other instead of scattering across the organization. A few properties matter more than they sound like they should.

Autonomy is a dial rather than a switch. On routine, reversible steps, the models run ahead on their own; on expensive, consequential ones, they slow down, show their work, and ask. The cost of being wrong sets the speed, which is exactly the discipline Cramton and Stephen practiced when they let the engines rip on tactics and overruled them on judgment.
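To make the dial concrete, here is a minimal sketch of how the cost of being wrong could set the speed. Everything in it is an assumption made for illustration: the `Step` fields, the three modes, and the dollar thresholds are hypothetical placeholders, not a description of any real system.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    RUN = "run ahead"        # the model acts on its own
    SHOW_WORK = "show work"  # the model acts, but surfaces its reasoning for review
    ASK = "ask first"        # the model stops and waits for a human decision


@dataclass(frozen=True)
class Step:
    name: str
    cost_if_wrong_usd: float  # hypothetical estimate of the downside
    reversible: bool          # can the step be undone cheaply?


def autonomy(step: Step,
             review_above: float = 10_000,
             ask_above: float = 1_000_000) -> Mode:
    """Pick how much freedom the model gets for one step.

    The thresholds are illustrative defaults; a real team would set its own.
    """
    # Irreversible or very expensive steps always go to a human first.
    if not step.reversible or step.cost_if_wrong_usd >= ask_above:
        return Mode.ASK
    # Mid-cost steps proceed, but with the reasoning on display.
    if step.cost_if_wrong_usd >= review_above:
        return Mode.SHOW_WORK
    # Routine, reversible, cheap steps run unattended.
    return Mode.RUN
```

Under these assumed thresholds, a cheap literature summary runs unattended, a mid-cost reanalysis proceeds with its reasoning shown, and anything irreversible waits for a person. The point of the sketch is only that the rule is a function of cost and reversibility rather than a global on/off setting.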

Evidence travels with every claim. When someone asks why we believe something, the answer arrives with its sources attached: the paper, the dataset, the model run, the person who made the call and when. Nobody is asked to take a confident sentence on faith, whether a human wrote it or a model did.
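One way to picture a claim that carries its evidence is as a small record type. This is a sketch under stated assumptions: the `Claim` and `Source` names, their fields, and the `why` helper are hypothetical, chosen only to mirror the paragraph above (sources, who made the call, and when).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Source:
    kind: str       # e.g. "paper", "dataset", "model_run"
    reference: str  # citation, path, or run id (placeholder format)


@dataclass(frozen=True)
class Claim:
    text: str
    author: str     # the person or model that made the call
    made_on: date
    sources: tuple[Source, ...] = ()

    def is_supported(self) -> bool:
        """A claim counts as supported only if at least one source is attached."""
        return len(self.sources) > 0


def why(claim: Claim) -> str:
    """Answer 'why do we believe this?' with the sources attached."""
    header = f"{claim.text} ({claim.author}, {claim.made_on.isoformat()})"
    if not claim.is_supported():
        # Flag the confident sentence instead of passing it along on faith.
        return f"UNSUPPORTED: {header}"
    lines = [header]
    lines += [f"  - {s.kind}: {s.reference}" for s in claim.sources]
    return "\n".join(lines)
```

The design choice worth noticing is that the same record is used whether a human or a model wrote the sentence, and that a claim with no sources is labeled as such rather than silently accepted.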

And the canvas compounds. Because the reasoning is captured where the work happens, every decision and every dead end becomes context that the team and the models think with next time. No model upgrade gives you this for free, because the learning lives in the medium rather than in the weights.

That is Kasparov’s better process, rendered slowly and unglamorously as software. In a field where a confidently wrong answer can burn years and millions, I’ve come to believe the quality of that shared medium, i.e., the canvas, matters more than the next increment of model capability.

Where we are seeing it play out

One clear case we worked on is indication expansion: taking a molecule that is already de-risked, already through safety, sometimes already on the market, and finding the next disease it should treat. The industry’s biggest recent successes are expansion stories. Keytruda was approved in 2014 as a melanoma drug; a decade and several dozen approved indications later, it is the best-selling drug in the world. Semaglutide went from diabetes to obesity to cardiovascular risk, and each leap multiplied the asset’s value. Each of those moves looks obvious in hindsight, and each was a judgment call at the time, made years before the readout, on evidence that lived in no single place.

The economics are what make expansion such an attractive move. A new molecule costs a decade and will probably fail, while an approved asset has already cleared the hardest hurdles: its safety profile is known, its manufacturing exists, and its mechanism rarely respects the boundary of a single disease, so the next indication can be reached at a fraction of the cost and time of a new program. The complication is the clock. A drug’s market exclusivity starts burning down the day it is approved and never pauses, so every expansion bet spends patent life as well as money. Choose well and one molecule becomes several businesses. Choose badly and you have consumed a trial, a few hundred million dollars, and years of runway the asset never gets back.

Generating candidates was never the hard part. A mechanism touches dozens of diseases, and a model will hand you forty plausible indications in a minute, neatly ranked. What makes the decision hard is that the evidence for the right one is scattered in a way no single mind holds: a mechanistic paper from 2019, a subgroup signal in a Phase 2 that was powered for something else, off-label prescribing patterns sitting in real-world data, an investigator-initiated trial nobody internal has read, a competitor’s pipeline move that hasn’t reached the press, a regulatory precedent in an adjacent disease, internal data from a program shelved four years ago, and the clinical lead who remembers exactly why the last attempt in that indication failed. No retrieval system connects all of that, because the connections are judgment calls stretched across a dozen people and ten repositories, and at the end of them someone has to decide which two indications get real money.

Left alone with that decision, a model will hallucinate and an expert team will run out of hours long before it runs out of evidence. Put them on the same surface, with the right protocol between them, and you get something neither produces alone. The bet we’re making at Perceptic is a canvas the whole team plays on, where the protocol between your people and your models gets a little better every week, rather than a smarter oracle you hand a question to and wait on.

The engines are a commodity now, and they will only get cheaper. In closed worlds the machines will take the whole board, the way they took chess, and that is fine. In open worlds, where the truth costs a Phase 3 to learn, the centaur window is wide open and getting wider, and the advantage goes to whoever builds the better process around the machines.

If that is a problem you’d want to spend the next few years on, I’d love to talk.
