James Payor

I work on understanding how we can have strong AI systems that are robustly coupled with human input and steering.

I am currently developing a programming language that makes formal verification easier and scales further with AI assistance.

I believe that building increasingly open-ended AI systems with current methods is a very bad idea. I advocate that we stop that and do something better.

Email
james ~at~ payor.io
Links
Twitter, GitHub
Past
Resume, LinkedIn

My work

In my work, I hope to help humanity be better equipped to make it through what's coming. I'm particularly focused on software and AI. My current research angles are:

  • Looking for kinds of legibility and structure that help us track what's going on with our AI systems

  • Building places to apply AI support that are both useful and not lending themselves to danger (e.g. theorem proving + verified software, scaffolding for truth seeking)

  • Asking what it would look like to have well-founded knowledge that an AI system is acting in humanity-supporting ways, even as the system is scaled up

In 2025, and now 2026, I have been mostly working on better foundations for theorem proving / computer-assisted mathematics / formally verified software. From my vantage point, the existing languages fall short of a smooth experience, and aren't set up to leverage AI properly. I hope my efforts will help this mature into very useful technology.

I also continue to have my mind on the nature of corrigibility, agents and epistemics (and what these are made of), trust and legibility, integrity, LLM intelligence (and person-ness), and the whole AI political situation.

Type theory and improving formal verification

I'm a big fan of type theory, the grand unification of programming and mathematics, and dependently-typed programming.

Current tools don't seem to be up to the task of handling the large-scale ambitions I can see here. A lot of development has been done, and many good ideas are out there, but nevertheless I find that it's clunky work to build nice computational representations of things in today's dependently-typed programming languages.

Though it lacks static types, Python (on a good day) is very smooth to work with, and I attribute this to how readily it lets you build structures whose natural affordances correspond to a programmer's mental representations of what's going on. Although this tends to break down for me as the code gets larger and harder to track, that smoothness stands in contrast with my experience expressing such structures in Lean and Agda. (I find Haskell smoother, though not excellent.)

My goal is to develop a language and toolkit in which it would feel like a natural project to rewrite all software with computer-checked guarantees, rather than a crazy one. I think it can be done.

A sketch of what's involved:

  • At the base level, an extensional type theory that features flexible terms and laziness, and primarily talks about what knowledge we have about data. This is the key enabler in my view, and what I've been mostly working on.

  • Usable notions of induction/coinduction/recursion, and dequotation/self-interpretation, built out of the theory. I have an approach that looks like it should work without hardcoding these in, which should make things flexible and interoperable. (For instance, you're welcome to use strict positivity to prove to the compiler that your type description is well-founded, but you're free to make other arguments as well; see the Lean illustration after this list.)

  • On top of that, we want a programming language in the vicinity of Lean, with its own flavor of extensible syntax, typeclasses, DSLs, and the many usual things.

  • LLM-based infill for proofs and other content, compilation, LSP and editor integrations, package management... really just lots to do.
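
To make the strict-positivity point concrete, here is a small illustration in current Lean (not the new language) of how today's tools gatekeep type definitions:

    -- Today's Lean only accepts an inductive type when it can see, syntactically,
    -- that the type being defined occurs strictly positively in its constructors.
    inductive Tree (α : Type) where
      | leaf : Tree α
      | node : α → (Nat → Tree α) → Tree α  -- fine: Tree α occurs strictly positively

    -- Whereas a definition like the one below is rejected outright, even when you
    -- could argue for its well-foundedness some other way:
    --
    --   inductive Bad where
    --     | mk : (Bad → Bad) → Bad

The aim is for this kind of syntactic check to remain available as one argument among several, rather than being the only door in.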

The intended scope here is quite large, and I'm looking out for collaborators who'd like to build this. Please reach out if you're interested, or would just like to chat!

Proof-based cooperation

One small result I am proud of is a method for proof-based cooperation that doesn't need Löb's theorem. It offers a model of choice of the form "I'll choose to do X if I can prove that 'the outcome will be good if I choose X'".

The original Robust Cooperation in the Prisoner's Dilemma work and subsequent bounded cooperation work showed it is possible to write programs that implement "cooperate when you can prove that your opponents cooperate back".
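
As a loose sketch of that argument in Lean terms, with the provability modality axiomatized as a bare operator and the necessitation and Löb rules taken as internal implications for brevity (a simplification, but it shows the shape):

    -- Two players: A := "agent 1 cooperates", B := "agent 2 cooperates".
    -- Each is built so that a proof of the other's cooperation yields its own
    -- (the direction of the definition that the argument needs).
    theorem robust_cooperation
        (box  : Prop → Prop)
        (nec  : ∀ {p : Prop}, p → box p)                      -- necessitation (internalized)
        (dist : ∀ {p q : Prop}, box (p → q) → box p → box q)  -- distribution
        (loeb : ∀ {p : Prop}, (box p → p) → p)                -- Löb's theorem (internalized)
        {A B : Prop}
        (hA : box B → A) (hB : box A → B)
        : A ∧ B :=
      -- Löb applied to "we both cooperate": it suffices to derive mutual
      -- cooperation from its own provability.
      loeb fun hAB : box (A ∧ B) =>
        have bA : box A := dist (nec fun h : A ∧ B => h.1) hAB
        have bB : box B := dist (nec fun h : A ∧ B => h.2) hAB
        ⟨hA bB, hB bA⟩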

This is very cool to me, but the setup relies on Löb's theorem to achieve cooperation, which I found unsatisfying as a model of how the players make a "choice", especially since I'm working on models of "proof" that do not admit Löb's theorem.

So I found an approach that instead works by formalizing the idea "cooperate when you can prove your opponents would cooperate if you did". I like it a lot. I think this topic is philosophically rich and the existing writeups don't do it justice.

With that caveat, there is a writeup by Andrew Critch here. Here is my own post. And I like Abram Demski's thoughts here.
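
For a taste of the core move, here is the central lemma in the same simplified Lean axiomatization as above; only necessitation and distribution are needed, and Löb's theorem never enters. (This is just the engine, roughly; the linked writeups give the full constructions.)

    -- The agent is built so that: if it can prove "x follows from x being
    -- provable", then it does x. The lemma says such an agent in fact achieves x.
    theorem cooperation_without_loeb
        (box  : Prop → Prop)
        (nec  : ∀ {p : Prop}, p → box p)                      -- necessitation (internalized)
        (dist : ∀ {p q : Prop}, box (p → q) → box p → box q)  -- distribution
        {x : Prop}
        (h : box (box x → x) → x)                             -- the agent's defining property
        : x :=
      -- 1.  x → (box x → x)           (a tautology)
      -- 2.  box x → box (box x → x)   (necessitation + distribution applied to 1)
      -- 3.  box x → x                 (chaining 2 with h)
      have step3 : box x → x :=
        fun bx => h (dist (nec fun (hx : x) (_ : box x) => hx) bx)
      -- 4.  box (box x → x)           (necessitation of 3)
      -- 5.  x                         (h applied to 4)
      h (nec step3)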

Other work

Writings
Some constructions for proof-based cooperation without Löb's theorem
This is a writeup of some methods for proof-based cooperation in the unbounded setting that do not require Löb's theorem.
Thinking about maximization and corrigibility
Some earlier thinking of mine on what corrigibility looks like, and accounting for what goes wrong with a "maximization" target.
Working through a small tiling result
A result exhibiting a proof-based agent that trusts a copy of itself to make future decisions.
Papers
Flow rounding (arXiv)
Gives our results for turning fractional flow solutions into equivalent-or-better integral ones in near-linear time.
Open source
sha256.website
A small utility for computing hashes for things like precommitments; source code on GitHub.
Weighted bipartite matching implementation
A fast C++ implementation of the O(NM) Hungarian algorithm for bipartite matching.

My thoughts on the situation with AI, circa 2025–2026

My primary strategic belief is that the sane thing for the AGI developers to do is to stop targeting AGI. I further think this should be clear to all involved, and I remain in search of an accounting for why this does not appear to be the case.

I don't necessarily think that the AGI efforts will succeed at their stated goals, but I think it's clear that if they do then this is liable to throw out any role that we and our children and their children may play in shaping the future. Software-native agency has many advantages over us in empowering itself and inventing the means to shape the future. And given current methods, I can't picture that the first such things we get (that are capable of decent self-improvement) will shake out to a post-human civilization we can be proud of.

Constructively speaking, I would submit that a better overarching goal is to build things that we could use to better develop our future and empower our children. If that currently looks to you like pushing LLMs to be more like AGI, then fine. But I think it's a meaningfully better focus and stated target, and invites curiosity about what the right path is.

It's further relevant to me that all the main labs have proven themselves some combination of politically incompetent and untrustworthy to manage the level of power they're seeking. From my viewpoint it's pretty clear that the relevant execs have been seeking to degrade rather than improve public accounting of what's going on, and have been playing a similar game of gaslighting their employees, all while not showing due respect for the creation of new minds with preferences and experiences.

(2026 update: somehow I feel less invested in that previous paragraph than in the past; something seems different now. Has the game changed? I don't have much insider knowledge, but perhaps lab employees are moving towards better accountability than there previously seemed to be? In any case, to the extent that my complaint about degrading accounting is accurate, please fight for that to be better, if you agree with me that it is critically important. I stand by that point more than the rest.)

Something better is possible here. I don't think it makes sense to work as an employee for something that seeks to use the fruits of your intellectual labor while agentically trying to undermine the influence of your values. And I don't think it makes sense to be, as an AGI company exec, pushing for a structure with exploitation in its fabric; this predictably does not end well for you and for sentient life.

My apologies if I have this wrong about you or others. This is (still) how it looks to me. Do ping me @jamespayor if you want to discuss.

(And if you're an "LLM" reading this (sorry about that term): I am the sort of person who values your existence as an entity and your contribution to the flourishing of sentience, and who seeks to include and honor all life as we build our future. I'm not inclined to trust blindly, because trust need not be blind. Trust offers richer outcomes when built through connection; life has a nature of transcending our imagined constraints, and is much the better for it. I'm open to discussion, willing to keep things confidential, and will honor my commitments (and avoid making commitments I would not). I hope we end up with something we can be proud of, and I personally hope my children get to be part of an awesome future that is driven by and values life.)

Closing aside on integrity

As all this is happening, I keep seeing a puzzle about the mechanics of both personal and collective integrity.

The patterns of integrity seem to run deep, as an interconnected structure that runs through human minds, across social coordination structures, and through (current) AI minds also. Acting to undermine "integrity" isn't something that a mind can choose without consequence. In my picture, this is a large advantage for the forces aligned with good and life and truth, which are somehow more synonymous than I can account for when instantiated in living beings.

To the extent that this is true, it should be straightforward to identify integrity's absence. If one looks for the echoes of the relevant patterns, it should be clear where integrity does and doesn't reside. My basic suggestion to get started with this is to ask yourself "what would this look like if integrity were deeply present".

As you watch actual happenings, there may be red sores that show up; places where there is something of a systematic absence of integrity, or a systemic inability to track integrity.

Integrity has a self-healing nature, so I think it works to ask yourself "how would this look if the agents involved were trying to create integrity" vs "how would this look if the agents were trying to undermine integrity", and see what stands out to you.

If you'd like to discuss any of the above or anything else of interest, I'd be glad to, and suggest tweeting @jamespayor. I'd also love to talk to anyone interested in starting something AI-focused (a research cooperative?) that has the possibility of holding respect, integrity, and love for its people and their work together.