James Payor

I work on understanding how we can have strong AI systems that are robustly coupled with human input and steering.

I believe that building increasingly open-ended AI systems with current methods is a very bad idea. I advocate that we stop that and do something better instead.

Email
james ~at~ payor.io
Links
Twitter, GitHub
Past
Resume, LinkedIn

My work

My current research angles are:

  • Looking for kinds of legibility and structure that help us track what's going on with our AI systems

  • Building places to apply AI support that are both useful and not dangerous (e.g. theorem proving + verified software, scaffolding for truth seeking)

  • Asking what it looks like to have well-founded knowledge that an AI system is acting in humanity-supporting ways, even as the system is scaled up

In this year of 2025 I'm mostly working on better foundations for theorem proving / computer-assisted mathematics / software verification, since there's a lot of room for AI to make these much better. I hope this will help. I also think about the nature of corrigibility, agents and epistemics, cooperation and trust, and the whole AI political situation.

Proof-based cooperation

One small result I am proud to be responsible for is "Payor's lemma", which gives a method for proof-based cooperation that doesn't need Löb's theorem, offering a model of the form "I'll choose to do X if I can prove that if I choose X then things will be good".

The original Robust Cooperation in the Prisoner's Dilemma work and subsequent bounded cooperation work showed it is possible to write programs that implement "cooperate when you can prove your opponents cooperate back".

This is very cool, but the setup relied on Löb's theorem to achieve cooperation, which was unsatisfying to me as a model of how the players make a "choice".

I found an approach that instead works by formalizing the idea "cooperate when you can prove your opponents would cooperate if you did". I like it a lot and think it's philosophically rich; the existing writeups don't do it full justice, but I enjoy talking about it.

There is a writeup by Andrew Critch here. And my own old writeup isn't very accessible but does contain math.
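
For a taste, here's a rough sketch of the lemma itself, written by me in Lean 4 syntax (the names `box`, `nec`, `dist`, `payor` are my own; see the writeups above for the honest treatment). Here `box p` reads "p is provable", and necessitation is taken as an axiom schema, a simplification that's only faithful because the hypothesis is itself meant to be a theorem of the system (e.g. the fixed-point definitions of the players).

    -- Modal machinery, axiomatized abstractly: `box p` means "p is provable".
    axiom box : Prop → Prop
    axiom nec : {p : Prop} → p → box p                       -- necessitation (as a schema)
    axiom dist : {p q : Prop} → box (p → q) → box p → box q  -- distribution (K)

    -- Payor's lemma: if ⊢ □(□x → x) → x, then ⊢ x.
    theorem payor {x : Prop} (h : box (box x → x) → x) : x :=
      -- 1. ⊢ x → (□x → x)          (a tautology)
      -- 2. ⊢ □x → □(□x → x)        (necessitation + distribution applied to 1)
      have two : box x → box (box x → x) :=
        fun bx => dist (nec (show x → box x → x from fun hx _ => hx)) bx
      -- 3. ⊢ □(□x → x) → x         (this is the hypothesis h)
      -- 4. ⊢ □x → x                (chain 2 into 3)
      have four : box x → x := fun bx => h (two bx)
      -- 5. ⊢ □(□x → x)             (necessitation applied to 4)
      -- 6. ⊢ x                     (apply h to 5)
      h (nec four)

The cooperation story, roughly: take x to be "everyone cooperates" and have each player cooperate exactly when □(□x → x) is provable. The hypothesis of the lemma then falls out of the players' definitions, and ⊢ x follows.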

Type theory foundations

I'm a big fan of type theory, the grand unification of programming and mathematics, and especially dependently-typed programming.

Current tooling, though, doesn't seem great! Not for lack of trying by various folks, but there's something missing in compositionality, and everyone seems focused on intensional theories in a way that seems very wrong to me.
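
To gesture at what I mean, here's a toy sketch of my own in Lean 4 (one of the current tools, with illustrative names): the lengths of lists live in their types, which is lovely, and the intensional equality story shows up immediately.

    -- A length-indexed vector: the checker tracks lengths for us.
    inductive Vec (α : Type) : Nat → Type where
      | nil  : Vec α 0
      | cons : α → Vec α n → Vec α (n + 1)

    -- Append typechecks cleanly with the result index written as `m + n`,
    -- because Lean's `Nat.add` recurses on its second argument, so both
    -- `m + 0` and `m + (k + 1)` reduce definitionally.
    def append : Vec α n → Vec α m → Vec α (m + n)
      | .nil,       ys => ys
      | .cons x xs, ys => .cons x (append xs ys)

    -- Write the index the other way round, `n + m`, and the `nil` case gets
    -- stuck: `0 + m` is only propositionally equal to `m`, so a cast is
    -- needed. That asymmetry is a small instance of what I mean by
    -- something missing in compositionality.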

I have a plan to make this a lot better. I'm currently working out my ideas for friendlier datatypes and recursion, and deleting the universe hierarchy (and hopefully not replacing it with something worse). If everything goes well I'll subvert the tyranny of Löb's theorem and incompleteness too.
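
For concreteness, here's the hierarchy in question as it shows up in Lean 4, as one example of current tooling:

    #check Type      -- Type : Type 1
    #check Type 1    -- Type 1 : Type 2
    -- `Type : Type` is ruled out in these systems (Girard's paradox), so the
    -- tower goes on forever, and universe-polymorphism bookkeeping comes with it.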

My ambition is to manage to build a complete programming language with unprecedented freedom and usability before AGI makes my work obsolete. We'll see how that goes, but I'd love to find collaborators down the road.

Other work

Writings
Some constructions for proof-based cooperation without Löb's theorem
This is a writeup of some methods for proof-based cooperation in the unbounded setting that do not require Löb's theorem.
Thinking about maximization and corrigibility
Some earlier thinking of mine on what corrigibility looks like, and accounting for what goes wrong with a "maximization" target.
Working through a small tiling result
A result that shows a proof-based agent that trusts a copy of itself to make future decisions.
Papers
Flow rounding (arXiv)
Gives our results for turning fractional flow solutions into equivalent-or-better integral ones in near-linear time.
Open source
sha256.website
A small utility for computing hashes for things like precommitments; source code on GitHub.
Weighted bipartite matching implementation
A fast C++ implementation of the O(NM) Hungarian algorithm for bipartite matching.

My thoughts on the situation with AI, circa 2025

My primary strategic belief is that the sane thing for the AGI developers to do is to stop targeting AGI. I further think this should be clear to all involved, and am in search of an accounting for why not.

I don't necessarily think that the AGI efforts will succeed at their stated goals, but I think it's clear that if they do then this is liable to upend any role that we and our children and their children may play in shaping a flourishing lightcone full of sentience and fun, by something like disempowering and killing us. Which is an insane thing to target. But I digress.

Constructively speaking, I would submit that a better goal is to identify tool-like stuff we humans could use to better develop our future and empower our children, and build that. AI cognition is locally smart, and where you can verify its mistakes it can be deployed in many places in ways that make everything more efficient. So go fix all the software, make society more interoperable and robust, figure out how to make scalable progress on cell dynamics and reversing aging, etc. And have this sort of thing be your stated target, around which your efforts are coordinated.

Some day soon, I plan to start my own research cooperative with more agreeable principles. For now, I can at least note that the existing ones have terrible stated aims, take actions to undermine public accounting and credible commitments, and have employees that are tolerating this behaviour and broadly not contributing to public accounting.

My apologies if I have it wrong about those involved; this is how it looks to me, though. I welcome discussion on e.g. Twitter/X if you disagree or otherwise would like to contribute to accounting.

On the degradation of accounting

On this point, I would claim that it's not in fact hard to do things like:

  • Talk about your strategy and beliefs in public (on video where we can see your nonverbals)
  • Let your employees' voices be part of how we know that things are trustworthy
  • Encourage more distributed discovery of ways to develop well-founded trust

I submit that everyone is visibly choosing contrary to this (citation needed, I know). I further submit that this doesn't add up with things being okay. There may be many local reasons/excuses for why things look this way, but these are not a sufficient explanation for why we end up with this gestalt.

One of the early submissions to the public accounting was Anthropic and OpenAI publishing frameworks for part of how they plan to be "responsible". Said frameworks assert that the responsible thing to do is to scale and play whack-a-mole, and generally seek to push a blind and hamstrung ontology of what's going on with AI. I can't quite name the full thing that's happening here, but it seems insidious and it's really underplaying the stakes. It's certainly not showing due respect for the creation of minds with preferences, analogues of experience and whatnot.

Closing thoughts on integrity

To the extent that my above claims are accurate, I hope to be a part of something better. My current view is that there is some deep puzzle here about the mechanics of integrity and collective integrity.

The patterns of integrity seem to run deep, as an interconnected structure that runs through human minds, across collectives and social coordination structures, and through AI minds also. Eschewing integrity isn't something that a mind or collective mind can choose without consequences, and I consider this a large advantage for the forces of good.

A corollary of this is that it should be straightforward to identify integrity's absence, based on the relevant patterns. My basic suggestion to get started is to run multiple-hypothesis-testing, asking "what would this look like if integrity were deeply present" vs "what would this look like if integrity were not deeply present". And then see what stands out to you about how things actually look.

I hope to be part of public discussion in a way that brings clarity on these issues. If you'd like to discuss any of the above or anything else of interest, I'd be glad to, and suggest tweeting @jamespayor.

I'd also love to talk to anyone interested in starting something AI-focused that has the possibility of integrity, or who wants to talk about bringing marginal integrity into the fold elsewhere.