Why an AI Gold Medal in Math Signals the End of Work as We Know It


Last week OpenAI quietly disclosed that an experimental reasoning model scored gold-medal marks on the 2025 International Mathematical Olympiad (IMO), matching the best human contestants within the same 4.5-hour window and without translating problems into a formal proof language.

The part most outlets missed: the team didn't rely on a Lean‑style formal verifier. Instead, they used another LLM as a "fuzzy" verifier capable of checking natural‑language proofs (see: this interview with Noam Brown). Generation and verification now take comparable compute, but the scope of tasks machines can grade has exploded beyond the narrow domains of compilers, type‑checkers, or theorem provers.


Verification Cost Was the Bottleneck (Until Now)

Management thinkers from Drucker onward (and Coase before them, in "The Nature of the Firm") framed the firm as a tool for coordinating humans when information is costly and uncertainty is high: transaction costs make it sensible to centralize certain functions and bring them in-house. In practice, humans have often been the slow, expensive arbiters of "Did this meet spec?"

If an LLM swarm can cheaply certify outputs in plain English, an entire class of bespoke, outcome-based contracts (what we might expect in a sort of "post-Coasean firm" economy) that were once unscalable suddenly becomes programmable, and therefore feasible and, in some important new cases, viable:

  • Marketing copy that passes a brand‑voice rubric.

  • A UX prototype that clears a heuristic acceptance suite.

  • A sales‑ops playbook that satisfies a revenue attribution test.

Verification sinks from days to minutes as human experts are taken out of the loop; marginal cost approaches cloud inference fees.
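What might such an acceptance suite look like? A minimal sketch, with each fuzzy LLM judgment stubbed as a plain predicate; every rubric item and check below is invented for illustration:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RubricItem:
    name: str
    passes: Callable[[str], bool]  # stand-in for a fuzzy LLM judgment

def run_acceptance_suite(deliverable: str,
                         rubric: list[RubricItem]) -> dict[str, bool]:
    """Grade one artifact against every rubric item; the outcome
    contract pays out only when all items pass."""
    return {item.name: item.passes(deliverable) for item in rubric}

# Hypothetical brand-voice rubric for the marketing-copy bullet above.
brand_voice_rubric = [
    RubricItem("mentions product name", lambda copy: "Acme" in copy),
    RubricItem("under 50 words", lambda copy: len(copy.split()) < 50),
    RubricItem("no superlatives", lambda copy: "best" not in copy.lower()),
]

results = run_acceptance_suite("Acme ships your ideas faster.", brand_voice_rubric)
payable = all(results.values())
```

Swap each lambda for a call to an LLM judge and the same structure grades the UX prototype or the sales-ops playbook.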


Markets Eat Platforms

In a morning tweet-storm (X-storm?), as I was trying to think through the broader implications of this breakthrough, I joked that Kalshi and Polymarket are Upwork's real competitors and that GitHub is creeping into LinkedIn's lane. That bit of provocative posting hints at a deeper shift:

Legacy Model

  • Closed labor marketplaces (Upwork, Fiverr)

  • Salaries & hourly rates

  • Résumés & endorsements

  • Middle‑manager oversight

Fuzzy‑Verified Alternative

  • Public bounty boards with escrow & on‑chain reputation

  • Hyper‑financialized micro‑options on discrete outcomes

  • Immutable proof‑of‑work → Git commits + bounty receipts

  • Generator-Verifier agent loops gate‑keeping quality

When any outsider can see the escrow, attach an options contract to it, and trust an autonomous verifier to release funds, sourcing talent starts to look like trading liquid derivatives, not posting job reqs.
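One way to picture a "micro-option on a discrete outcome": a binary contract whose settlement oracle is the automated verifier. The stakes and settle rule below are a toy illustration, not any real Kalshi or Polymarket mechanism.

```python
from dataclasses import dataclass, field

@dataclass
class OutcomeOption:
    """Binary option on 'will this bounty's acceptance test pass?'
    The long side stakes `premium`, the short side stakes the rest of
    `payout`; the verifier's verdict settles the pot winner-take-all."""
    outcome: str
    payout: float   # total pot paid to the winning side
    premium: float  # long side's stake (implies a market probability)

    pot: float = field(init=False)

    def __post_init__(self):
        self.pot = self.payout  # premium + short collateral

    def implied_probability(self) -> float:
        # Price as a probability: what the market thinks of the outcome.
        return self.premium / self.payout

    def settle(self, verifier_passed: bool) -> dict[str, float]:
        # The automated verifier is the oracle; no human arbiter needed.
        if verifier_passed:
            return {"long": self.pot, "short": 0.0}
        return {"long": 0.0, "short": self.pot}

opt = OutcomeOption("homepage redesign merged", payout=100.0, premium=30.0)
```

A premium of 30 on a 100 payout is the market saying "70% chance this contractor misses spec": sourcing talent priced as a derivative.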


What a "Company" Still Needs (and No Longer Needs)

In this new context, you (an executive or capital provider) don't actually need:

  • Wages or fixed salaries

  • Standing 1:1s

  • Slack, ClickUp, or 90% of coordination SaaS built for the pre-AI era

You do (now) need:

  • Best‑of‑breed AI (generator + verifier)

  • Ambient logging of all work artifacts

  • Verifiable, pre‑negotiated outcomes

The corporation's traditional risk‑pooling function persists, but much of its coordination overhead melts away. Expect lean "capital‑formation vehicles" that spin up around a mission, stake bounties, clear them, and dissolve: DAO mechanics without the hype.


New Roles for Humans

Human Advantage (and Why It Matters)

  • Scope Architects: Translating fuzzy business goals into machine‑testable acceptance criteria.

  • Liability Underwriters: Holding the legal or financial risk when automated verification still carries false‑positive odds (follow Soren Larson for deeper thinking on this).

  • Exception Handlers: Tackling the non‑deterministic 5‑10 % where the model says "¯\_(ツ)_/¯".

  • Moral Governors: Deciding whether we should do X, even when the verifier says we can.

Scarcity migrates from rote expertise to meta-expertise: judgment, narrative framing, and risk capital (or "taste," to use the Twitter-verse catch-all).


A Playbook You Can Ship Today

  1. Start with a single GitHub issue. Define an outcome ("Homepage redesign merged to main") and escrow payment with a tool like Boss.dev. Implement a CI check via a GitHub Action with an LLM driving "fuzzy verification" (you can do this now, but per the aforementioned breakthrough, it will get a whole lot better over the next year or so) so the contractor can iterate before submitting the PR. For non-programming tasks, you could analogously wire up a custom GPT to act as the acceptance-test verifier.

  2. Publish the bounty publicly. Watch unknown experts bid: signal is broadcast via public proof of escrowed funds.

  3. Record everything. Transcripts, commits, and AI critiques all become verifiable provenance.

  4. Layer prediction‑market hedges. Let third parties stake on success/failure; their trades surface hidden information and align incentives.

  5. Rinse & repeat. Each closed loop becomes portfolio proof, reputation credit, and raw data to fine‑tune your next verifier.
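Steps 1-3 can be wired together in one settlement loop: escrow, submission, fuzzy verification, provenance record, release. A hypothetical sketch; the `verify` callable stands in for an LLM judge, and every class and field name here is invented, not Boss.dev's or GitHub's actual API.

```python
import hashlib
import time
from typing import Callable

class Bounty:
    """Escrowed outcome contract settled by an automated verifier."""

    def __init__(self, outcome: str, escrow_usd: float,
                 verify: Callable[[str], bool]):
        self.outcome = outcome
        self.escrow_usd = escrow_usd
        self.verify = verify          # stand-in for an LLM acceptance test
        self.ledger: list[dict] = []  # append-only provenance record
        self.open = True

    def submit(self, worker: str, artifact: str) -> bool:
        """Run the acceptance test; log every attempt, pass or fail."""
        passed = self.open and self.verify(artifact)
        self.ledger.append({
            "worker": worker,
            "artifact_sha": hashlib.sha256(artifact.encode()).hexdigest(),
            "passed": passed,
            "ts": time.time(),
        })
        if passed:
            self.open = False  # escrow releases to this worker
        return passed

bounty = Bounty(
    "Homepage redesign merged to main",
    escrow_usd=500.0,
    verify=lambda artifact: "merged" in artifact,  # toy acceptance check
)
```

Failed submissions still land in the ledger; that rejected-attempt history is exactly the raw data step 5 feeds back into the next verifier.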


Why Does Any of This Matter?

The meeting culture and Slack pings of the early 2020s will seem like lead paint in hindsight.

The IMO result is bigger than a math headline. As far as I know, it's the first public evidence that general-purpose, machine-graded verification of fuzzy knowledge work is viable. Once the cost of trusting outcomes collapses, markets overtake hierarchies, and (some) 20th-century expert career paths fragment into liquid micro-options traded in real time.

Peter Drucker called management "the organ of society charged with making knowledge productive." Fuzzy verification lets us price knowledge productivity directly. The next revolutionary management framework will be shaped more like an order book than an org chart.


Now what?

If you're building the future of work (and/or dismantling the old), test a bounty this quarter. Tag me with what you learn. Let's replace project status meetings with settlement tickers. The market is open.