Extinction Bounties

Policy-based deterrence for the 21st century.

Policy-Research Disclaimer

Extinction Bounties publishes theoretical economic and legal mechanisms intended to stimulate scholarly and public debate on catastrophic-risk governance. The site offers policy analysis and advocacy only in the sense of outlining possible legislative or contractual frameworks.

  • No Legal or Financial Advice. Nothing here should be treated as a substitute for qualified legal counsel, financial due-diligence, or regulatory guidance. Stakeholders remain responsible for ensuring their actions comply with the laws and professional standards of their own jurisdictions.
  • Exploratory & Personal Views. All scenarios, numerical examples and opinions are research hypotheses presented by the author in an academic capacity. They do not represent the views of the author’s employer, funding bodies, or any governmental authority.
  • Implementation Caveats. Any real-world adoption of these ideas would require democratic deliberation, statutory authority, and robust safeguards to prevent misuse. References to enforcement, penalties, or “bounties” are illustrative models, not instructions or invitations to engage in private policing or unlawful conduct.
  • No Warranty & Limited Liability. Content is provided “as is” without warranty of completeness or accuracy; the author disclaims liability for losses arising from reliance on this material.

By continuing beyond this notice you acknowledge that you have read, understood, and accepted these conditions.

03. [under construction] The Goal - Make dangerous researchers switch jobs

If you’re sold on the basic mechanism after the candy-wrapper example, great! We’re glad it was an accessible illustration of how private incentives, paired with smart legislation, can quickly and cheaply reshape behavior at scale.

As you can probably guess from our title, our real focus at Extinction Bounties is applying this same fine-insured bounty (FIB) model, with some small modifications, to something much more serious: existential risks arising from novel technological research, the so-called “black balls” of philosopher Nick Bostrom’s Vulnerable World Hypothesis, like misaligned AI, engineered pandemics, and “grey goo” nanotechnology.

We’re not here to convince you that these dangers exist; other people have already made that case far better than we could.

Our goal here is merely to discuss how to deter these kinds of risks using the mechanism we propose, and why we think it has some unique properties that make it a powerful deterrent for a class of risks that has so far proven very challenging to deal with at scale.

(One last thing: Unsafe AI is our “default” x-risk for discussion on this site, not because it is very different in kind from, e.g., gain-of-function research from the point of view of our mechanism, but because experts we respect agree it poses the greatest threat of any novel technological development in the near future. If you disagree with us on that, feel free to mentally replace “unsafe AI” with the Torment Nexus while reading. Okay, onward!)

Now, it’s theoretically possible that one person, working in total isolation from society, could develop a superhuman AI from scratch. But this seems unlikely in a world so interconnected that no one person can make a commercial pencil by themselves. In practice, one would need capital to purchase equipment (GPUs, etc.) as well as labor (coworkers) across many specializations to help run the project. Both of these create very reliable “heat signatures” of what someone might be trying to do.

If you agree with us that risking humanity for a science project, even slightly, should be considered a criminal act, then the organizations pursuing such projects should be considered more akin to organized crime, like the mafia or drug cartels, than to, e.g., Walmart or Tencent. Our standard policing regime does not deal with organized crime well; organized criminals get very good at hiding their tracks and enforcing a code of silence among their members. In a sense, you could say they have specialized their labor in the production of crime.

Why AI Developers Would Think Twice

Right now, building cutting-edge artificial intelligence feels like a high-stakes race. Labs and companies are under enormous pressure to innovate fast, publish breakthroughs, and capture market share—even at the risk of dangerous outcomes. But imagine if the rules suddenly changed. Imagine if even trying to develop an unsafe AI could personally bankrupt you.

That’s precisely the promise of the kind of Fine-Insured Bounty (FIB) system we are proposing here at Extinction Bounties. Here’s how:

1. “Pause or Perish”

Suppose we established an FIB such that any development of unsafe or prohibited AI technology incurs massive fines. (The exact details of how this is determined can, and probably should, be left a little fuzzy: Generally, currently existing AI models appear to be fine, as would wrappers or even multi-agent workflows built around them. But training an even more powerful foundation model is probably fine-worthy. Etc.) These fines aren’t just symbolic—they’re calibrated to match the potential societal harm of a rogue or catastrophic AI. When it comes to the probable end of the human race if we screw up, a fine equal to all of the money in the world might feel downright lenient.
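
To get a feel for the arithmetic, here’s a toy back-of-the-envelope calculation in Python. Every number in it (world GDP, the value at stake, the risk added per violation) is our own illustrative assumption, not a concrete proposal:

```python
# Toy fine-calibration sketch. All numbers are illustrative assumptions,
# chosen only to show the order of magnitude involved.

GLOBAL_GDP_USD = 100e12               # world GDP, roughly $100 trillion/year
VALUE_AT_STAKE = 50 * GLOBAL_GDP_USD  # crude stand-in for "everything we'd lose"
P_CATASTROPHE = 1e-4                  # assumed extra risk from one prohibited run

expected_harm = P_CATASTROPHE * VALUE_AT_STAKE
print(f"Expected harm per violation: ${expected_harm:,.0f}")
# -> $500,000,000,000 for a one-in-ten-thousand gamble, which is why
#    "all the money in the world" can start to look lenient.
```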

Whoever exposes the violation first claims the entire bounty, paid out of the fines collected. This transforms the risk calculus dramatically: instead of “publish or perish,” software developers everywhere now face “pause or perish.”

The message becomes clear: reckless development isn’t brave or ambitious—it’s financial suicide.

2. No One to Trust

With large bounties at stake, AI labs become environments of extreme scrutiny and suspicion. Can you trust your co-worker, your intern, your boss? Anyone could secretly document wrongdoing and report the lab, collecting a reward large enough to change not just their own life, but the lives of everyone they know.
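
To make that suspicion concrete, here’s a minimal expected-value sketch of the insider’s dilemma. Every parameter (the bounty size, the headcount, the odds a given colleague defects) is a hypothetical we picked for illustration:

```python
# Sketch of an insider's choice at a non-compliant lab: stay silent and
# hope nobody defects, or report first and claim the bounty.
# All parameters are illustrative assumptions.

BOUNTY = 5e11            # assume the entire fine goes to the first reporter
SALARY_NPV = 5e6         # assumed lifetime value of keeping the risky job
N_INSIDERS = 20          # coworkers who could also report
P_EACH_REPORTS = 0.05    # assumed chance any one colleague defects first

# Chance at least one of the other N-1 insiders reports before you do.
p_scooped = 1 - (1 - P_EACH_REPORTS) ** (N_INSIDERS - 1)

ev_silent = (1 - p_scooped) * SALARY_NPV  # if scooped, you're fined/ruined
ev_report = BOUNTY

print(f"P(a colleague reports first): {p_scooped:.0%}")  # ~62%
print(f"EV of staying silent:  ${ev_silent:,.0f}")       # ~$1.9M
print(f"EV of reporting first: ${ev_report:,.0f}")       # $500B
```

Under these assumed numbers, reporting dominates by five orders of magnitude—and crucially, everyone in the lab can run this same calculation about everyone else.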

Your own insurers, monitoring your compliance, have very good incentives to keep you honest—they pay the fines if you slip up. They will demand to audit you heavily if you do anything within the orbit of AI, or you can kiss your premium rates goodbye. And from whom the insurers pull away, to them the bounty hunters come to play.

Suddenly, secrecy becomes incredibly costly and difficult to maintain. Transparency and cautious compliance are now inarguably the safest, most prudent paths forward.

3. Insurance as the New Regulator

Every AI lab that doesn’t want its entire staff imprisoned for life must carry insurance against potential FIB fines in case it gets caught. Insurers aren’t passive—they’re motivated to avoid payouts at all costs. They’ll audit AI labs closely, enforce strict safety measures, and rapidly raise premiums if they suspect risky behavior.

This creates a powerful, dynamic, and responsive private regulation layer: insurers effectively police their clients, ensuring safe practices aren’t just recommended—they’re financially mandatory. You can bet the insurers will react to the dizzying rate of new tech news as fast as you do or faster, whatever that rate ends up being.
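
For intuition on how premiums become de facto regulation, here’s the simplest possible pricing rule: expected payout plus a loading factor. The formula is standard actuarial fare; the specific parameters are, again, our own assumptions:

```python
# Sketch of how an insurer might price a lab's FIB coverage.
# All parameters are illustrative assumptions.

FINE = 5e11  # from the toy calibration above

def annual_premium(p_violation: float, p_detection: float,
                   loading: float = 0.3) -> float:
    """Expected annual payout, marked up by the insurer's loading factor."""
    return FINE * p_violation * p_detection * (1 + loading)

# A lab that tightens compliance sees its premium stay tame; one that
# dabbles near the line watches it explode.
print(f"Cautious lab: ${annual_premium(1e-6, 0.9):,.0f}")  # ~$585K/year
print(f"Reckless lab: ${annual_premium(1e-2, 0.9):,.0f}")  # ~$5.9B/year
```

Every audit the insurer runs is an attempt to pin down that violation probability more precisely—which is exactly the regulatory work we want done.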

4. Internalizing Catastrophic Risk

Current-day AI developers rarely bear the full societal costs of dangerous mistakes. A misaligned superintelligent AI would devastate humanity—but plenty of its would-be creators might be willing to gamble on this, or even see it as the morally imperative option (we’re serious!).

FIBs change that. By matching fines to expected harms, developers internalize the risks they impose on society. Now, taking safety shortcuts no longer saves money or accelerates careers—it destroys them.
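
One step worth spelling out: if only a fraction of violations are ever detected, a fine equal to the raw expected harm under-deters. The standard economics-of-crime fix is to divide the fine by the detection probability, as in this sketch (numbers again our own assumptions):

```python
# Internalization condition: expected penalty must equal expected harm.
#   fine * p_detection = expected_harm  =>  fine = expected_harm / p_detection
# Numbers are illustrative assumptions.

expected_harm = 5e11   # from the toy calibration above
p_detection = 0.5      # assume only half of violations are ever exposed

required_fine = expected_harm / p_detection
print(f"Fine needed for full internalization: ${required_fine:,.0f}")
# -> $1,000,000,000,000: halve the detection odds and the fine must double.
```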

Conclusion: A Culture of Safety

Fine-Insured Bounties don’t just punish wrongdoing—they transform incentives completely. In a FIB world, AI developers would think twice, three times, and then pause altogether before engaging in unsafe practices.

The result? A culture shift toward prudence, transparency, and responsibility—exactly what we need as AI capabilities advance at an unprecedented pace.

Next, let’s talk about the international context, secretly the hardest part: Making It Global - Treaties, Coalitions, Extraditions.