I work on understanding how we can have strong AI systems that are robustly coupled with human input and steering.
I believe that building increasingly open-ended AI systems with current methods is a very bad idea. I advocate that we stop that and do something better instead.
My current research angles are:
Looking for kinds of legibility and structure that help us track what's going on with our AI systems
Building places to apply AI support that are both useful and not dangerous (e.g. theorem proving + verified software, scaffolding for truth seeking)
Understanding what it would look like to have well-founded knowledge that an AI system is acting in humanity-supporting ways, even as the system is scaled up
In this year of 2025 I'm mostly working on better foundations for theorem proving / computer-assisted mathematics / software verification, since there's a lot of room for AI to make these much better. I hope this will help. I also think about the nature of corrigibility, agents and epistemics, cooperation and trust, and the whole AI political situation.
One small result I am proud to be responsible for is "Payor's lemma", which gives a method for proof-based cooperation that doesn't need Löb's theorem, offering a model of the form "I'll choose to do X if I can prove that if I choose X then things will be good".
The original Robust Cooperation in the Prisoner's Dilemma work and subsequent bounded cooperation work showed it is possible to write programs that implement "cooperate when you can prove your opponents cooperate back".
This is very cool, but the setup relied on Löb's theorem to achieve cooperation, which was unsatisfying to me as a model of how the players make a "choice".
I found an approach that instead works by formalizing the idea "cooperate when you can prove your opponents would cooperate if you did". I like it a lot and think it's philosophically rich; the existing writeups don't do it full justice, but I enjoy talking about it.
There is a writeup by Andrew Critch here. And my own old writeup isn't very accessible but does contain math.
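For a taste, here's the core statement as a quick sketch, in standard provability-logic notation (□ read as "it is provable that"):

```latex
% Payor's lemma: if  ⊢ □(□x → x) → x,  then  ⊢ x.
\begin{align*}
1.\quad & \vdash x \to (\Box x \to x)           && \text{tautology} \\
2.\quad & \vdash \Box x \to \Box(\Box x \to x)  && \text{necessitation and distribution applied to 1} \\
3.\quad & \vdash \Box(\Box x \to x) \to x       && \text{assumption} \\
4.\quad & \vdash \Box x \to x                   && \text{from 2 and 3} \\
5.\quad & \vdash \Box(\Box x \to x)             && \text{necessitation applied to 4} \\
6.\quad & \vdash x                              && \text{from 3 and 5}
\end{align*}
```

The cooperation application, roughly: let E stand for "everyone cooperates", and have each player cooperate exactly when □(□E → E) holds. Then E is equivalent to □(□E → E), so in particular ⊢ □(□E → E) → E, and the lemma yields ⊢ E.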
I'm a big fan of type theory, the grand unification of programming and mathematics, and especially dependently-typed programming.
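As a tiny illustration of the appeal, here's a sketch in Lean 4 (just a convenient stand-in here, not the system I have in mind): once lengths live in types, the checker tracks them for you.

```lean
-- Length-indexed vectors: the number of elements is part of the type.
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons : {n : Nat} → α → Vec α n → Vec α (n + 1)

-- Appending is checked to produce exactly m + n elements.
-- (Writing the index as m + n rather than n + m lets both equations
--  hold by definitional unfolding of Nat addition.)
def Vec.append {α : Type} {n m : Nat} : Vec α n → Vec α m → Vec α (m + n)
  | .nil,       ys => ys
  | .cons x xs, ys => .cons x (Vec.append xs ys)

-- Taking the head only typechecks for provably non-empty vectors,
-- so there is no "empty list" failure case at runtime.
def Vec.head {α : Type} {n : Nat} : Vec α (n + 1) → α
  | .cons x _ => x
```

Even this small example shows some friction, though: the index bookkeeping (m + n versus n + m) is a taste of the kind of thing I'd like to smooth over.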
Current tooling, though, doesn't seem great! Not for lack of trying by various folks, but there's something missing in compositionality, and everyone seems focused on intensional theories in a way that seems very wrong to me.
I have a plan to make this a lot better. I'm currently working out my ideas for friendlier datatypes and recursion, and deleting the universe hierarchy (and hopefully not replacing it with something worse). If everything goes well I'll subvert the tyranny of Löb's theorem and incompleteness too.
My ambition is to manage to build a complete programming language with unprecedented freedom and usability before AGI makes my work obsolete. We'll see how that goes, but I'd love to find collaborators down the road.
My primary strategic belief is that the sane thing for the AGI developers to do is to stop targeting AGI. I further think this should be clear to all involved, and am in search of an accounting for why not.
I don't necessarily think that the AGI efforts will succeed at their stated goals, but I think it's clear that if they do then this is liable to upend any role that we and our children and their children may play in shaping a flourishing lightcone full of sentience and fun, by something like disempowering and killing us. Which is an insane thing to target. But I digress.
Constructively speaking, I would submit that a better goal is to identify tool-like stuff we humans could use to better develop our future and empower our children, and build that. AI cognition is locally smart, and where you can verify its mistakes it can be deployed in many places in ways that make everything more efficient. So go fix all the software, make society more interoperable and robust, figure out how to make scalable progress on cell dynamics and reversing aging, etc. And have this sort of thing be your stated target, around which your efforts are coordinated.
Some day soon, I plan to start my own research cooperative with more agreeable principles. For now, I can at least note that the existing ones have terrible stated aims, take actions to undermine public accounting and credible commitments, and have employees that are tolerating this behaviour and broadly not contributing to public accounting.
My apologies if I have it wrong about those involved; this is how it looks to me, though. I welcome discussion on e.g. Twitter/X if you disagree or otherwise would like to contribute to accounting.
On this point, I would claim that it's not in fact hard to do things like:
One of the early submissions to the public accounting was Anthropic and OpenAI publishing frameworks for part of how they plan to be "responsible". Said frameworks assert that the responsible thing to do is to scale and play whack-a-mole, and generally seek to push a blind and hamstrung ontology of what's going on with AI. I can't quite name the full thing that's happening here, but it seems insidious and it's really underplaying the stakes. It's certainly not showing due respect for the creation of minds with preferences, analogues of experience and whatnot.
To the extent that my above claims are accurate, I hope to be a part of something better. My current view is that there is some deep puzzle here about the mechanics of integrity and collective integrity.
The patterns of integrity seem to run deep, as an interconnected structure that runs through human minds, across collectives and social coordination structures, and AI minds also. Eschewing integrity isn't something that a mind or collective mind can choose without consequences, and I consider this a large advantage for the forces of good.
A corollary of this is that it should be straightforward to identify integrity's absence, based on the relevant patterns. My basic suggestion to get started is to run multiple-hypothesis-testing, asking "what would this look like if integrity were deeply present" vs "what would this look like if integrity were not deeply present". And then see what stands out to you about how things actually look.
I hope to be part of public discussion in a way that brings clarity on these issues. If you'd like to discuss any of the above or anything else of interest, I'd be glad to, and suggest tweeting @jamespayor.
I'd also love to talk to anyone interested in starting something AI-focused that has the possibility of integrity, or who wants to talk about bringing marginal integrity into the fold elsewhere.