Why an AI Gold Medal in Math Signals the End of Work as We Know It


Last week OpenAI quietly disclosed that an experimental reasoning model scored gold-medal marks on the 2025 International Mathematical Olympiad (IMO), matching the best human contestants within the same 4.5-hour window and without translating problems into a formal proof language.

The part most outlets missed: the team didn't rely on a Lean‑style formal verifier. Instead, they used another LLM as a "fuzzy" verifier capable of checking natural‑language proofs (see: this interview with Noam Brown). Generation and verification now take comparable compute, but the scope of tasks machines can grade has exploded beyond the narrow domains of compilers, type‑checkers, or theorem provers.


Verification Cost Was the Bottleneck (Until Now)

Management thinkers from Drucker onward (and Coase before them, in "The Nature of the Firm") framed the firm as a tool for coordinating humans when information is costly and uncertainty is high: transaction costs make it sensible to centralize certain functions and bring them in-house. In practice, humans have often been the slow, expensive arbiters of "Did this meet spec?"

If an LLM swarm can cheaply certify outputs in plain English, an entire class of bespoke, outcome-based contracts (what we might expect in a sort of "post-Coasean firm" economy) that were once unscalable suddenly becomes programmable, and therefore feasible and, in some important new cases, viable:

  • Marketing copy that passes a brand‑voice rubric.

  • A UX prototype that clears a heuristic acceptance suite.

  • A sales‑ops playbook that satisfies a revenue attribution test.

Verification sinks from days to minutes as human experts are taken out of the loop; marginal cost approaches cloud inference fees.
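What might such an acceptance suite look like? A minimal sketch, with each fuzzy LLM judgment stubbed as a plain predicate; every rubric item and check below is invented for illustration:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RubricItem:
    name: str
    passes: Callable[[str], bool]  # stand-in for a fuzzy LLM judgment

def run_acceptance_suite(deliverable: str,
                         rubric: list[RubricItem]) -> dict[str, bool]:
    """Grade one artifact against every rubric item; the outcome
    contract pays out only when all items pass."""
    return {item.name: item.passes(deliverable) for item in rubric}

# Hypothetical brand-voice rubric for the marketing-copy bullet above.
brand_voice_rubric = [
    RubricItem("mentions product name", lambda copy: "Acme" in copy),
    RubricItem("under 50 words", lambda copy: len(copy.split()) < 50),
    RubricItem("no superlatives", lambda copy: "best" not in copy.lower()),
]

results = run_acceptance_suite("Acme ships your ideas faster.", brand_voice_rubric)
payable = all(results.values())
```

Swap each lambda for a call to an LLM judge and the same structure grades the UX prototype or the sales-ops playbook.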


Markets Eat Platforms

In a morning tweet-storm (X-storm?), as I was trying to think through the broader implications of this breakthrough, I joked that Kalshi and Polymarket are Upwork's real competitors and that GitHub is creeping into LinkedIn's lane. That bit of provocative posting hints at a deeper shift:

Legacy Model

  • Closed labor marketplaces (Upwork, Fiverr)

  • Salaries & hourly rates

  • Résumés & endorsements

  • Middle‑manager oversight

Fuzzy‑Verified Alternative

  • Public bounty boards with escrow & on‑chain reputation

  • Hyper‑financialized micro‑options on discrete outcomes

  • Immutable proof‑of‑work → Git commits + bounty receipts

  • Generator-Verifier agent loops gate‑keeping quality

When any outsider can see the escrow, attach an options contract to it, and trust an autonomous verifier to release funds, sourcing talent starts to look like trading liquid derivatives, not posting job reqs.
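One way to picture a "micro-option on a discrete outcome": a binary contract whose settlement oracle is the automated verifier. The stakes and settle rule below are a toy illustration, not any real Kalshi or Polymarket mechanism.

```python
from dataclasses import dataclass, field

@dataclass
class OutcomeOption:
    """Binary option on 'will this bounty's acceptance test pass?'
    The long side stakes `premium`, the short side stakes the rest of
    `payout`; the verifier's verdict settles the pot winner-take-all."""
    outcome: str
    payout: float   # total pot paid to the winning side
    premium: float  # long side's stake (implies a market probability)

    pot: float = field(init=False)

    def __post_init__(self):
        self.pot = self.payout  # premium + short collateral

    def implied_probability(self) -> float:
        # Price as a probability: what the market thinks of the outcome.
        return self.premium / self.payout

    def settle(self, verifier_passed: bool) -> dict[str, float]:
        # The automated verifier is the oracle; no human arbiter needed.
        if verifier_passed:
            return {"long": self.pot, "short": 0.0}
        return {"long": 0.0, "short": self.pot}

opt = OutcomeOption("homepage redesign merged", payout=100.0, premium=30.0)
```

A premium of 30 on a 100 payout is the market saying "70% chance this contractor misses spec": sourcing talent priced as a derivative.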


What a "Company" Still Needs (and No Longer Needs)

In this new context, you (an executive or capital provider) don't actually need:

  • Wages or fixed salaries

  • Standing 1:1s

  • Slack, ClickUp, or 90% of coordination SaaS built for the pre-AI era

You do (now) need:

  • Best‑of‑breed AI (generator + verifier)

  • Ambient logging of all work artifacts

  • Verifiable, pre‑negotiated outcomes

The corporation's traditional risk‑pooling function persists, but much of its coordination overhead melts away. Expect lean "capital‑formation vehicles" that spin up around a mission, stake bounties, clear them, and dissolve: DAO mechanics without the hype.


New Roles for Humans

Human Advantage (and Why It Matters)

  • Scope Architects: Translating fuzzy business goals into machine‑testable acceptance criteria.

  • Liability Underwriters: Holding the legal or financial risk when automated verification still carries false‑positive odds (follow Soren Larson for deeper thinking on this).

  • Exception Handlers: Tackling the non‑deterministic 5‑10 % where the model says "¯\_(ツ)_/¯".

  • Moral Governors: Deciding whether we should do X, even when the verifier says we can.

Scarcity migrates from rote expertise to meta-expertise: judgment, narrative framing, and risk capital (or "taste," to use the Twitter-verse catch-all).


A Playbook You Can Ship Today

  1. Start with a single GitHub issue. Define an outcome ("Homepage redesign merged to main") and escrow payment with a tool like Boss.dev. Implement a CI check via a GitHub Action with an LLM driving "fuzzy verification" (you can do this now, but per the aforementioned breakthrough, it will get a whole lot better over the next year or so) so the contractor can iterate before submitting the PR. For non-programming tasks, you could analogously wire up a custom GPT to act as the acceptance-test verifier.

  2. Publish the bounty publicly. Watch unknown experts bid: signal is broadcast via public proof of escrowed funds.

  3. Record everything. Transcripts, commits, and AI critiques all become verifiable provenance.

  4. Layer prediction‑market hedges. Let third parties stake on success/failure; their trades surface hidden information and align incentives.

  5. Rinse & repeat. Each closed loop becomes portfolio proof, reputation credit, and raw data to fine‑tune your next verifier.
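Steps 1-3 can be wired together in one settlement loop: escrow, submission, fuzzy verification, provenance record, release. A hypothetical sketch; the `verify` callable stands in for an LLM judge, and every class and field name here is invented, not Boss.dev's or GitHub's actual API.

```python
import hashlib
import time
from typing import Callable

class Bounty:
    """Escrowed outcome contract settled by an automated verifier."""

    def __init__(self, outcome: str, escrow_usd: float,
                 verify: Callable[[str], bool]):
        self.outcome = outcome
        self.escrow_usd = escrow_usd
        self.verify = verify          # stand-in for an LLM acceptance test
        self.ledger: list[dict] = []  # append-only provenance record
        self.open = True

    def submit(self, worker: str, artifact: str) -> bool:
        """Run the acceptance test; log every attempt, pass or fail."""
        passed = self.open and self.verify(artifact)
        self.ledger.append({
            "worker": worker,
            "artifact_sha": hashlib.sha256(artifact.encode()).hexdigest(),
            "passed": passed,
            "ts": time.time(),
        })
        if passed:
            self.open = False  # escrow releases to this worker
        return passed

bounty = Bounty(
    "Homepage redesign merged to main",
    escrow_usd=500.0,
    verify=lambda artifact: "merged" in artifact,  # toy acceptance check
)
```

Failed submissions still land in the ledger; that rejected-attempt history is exactly the raw data step 5 feeds back into the next verifier.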


Why Does Any of This Matter?

The meeting culture and Slack pings of the early 2020s will seem like lead paint in hindsight.

The IMO result is bigger than a math headline. As far as I know, it's the first public evidence that general-purpose, machine-graded verification of fuzzy knowledge work is viable. Once the cost of trusting outcomes collapses, markets overtake hierarchies, and (some) 20th-century expert career paths fragment into liquid micro-options traded in real time.

Peter Drucker called management "the organ of society charged with making knowledge productive." Fuzzy verification lets us price knowledge productivity directly. The next revolutionary management framework will be shaped more like an order book than an org chart.


Now what?

If you're building the future of work (and/or dismantling the old), test a bounty this quarter. Tag me with what you learn. Let's replace project status meetings with settlement tickers. The market is open.