I work on understanding how we can have strong AI systems that are robustly coupled with human input and steering.
I am currently developing a programming language that makes formal verification easier and scales further with AI assistance.
I believe that building increasingly open-ended AI systems with current methods is a very bad idea. I advocate that we stop that and do something better.
My current research angles are:
Looking for kinds of legibility and structure that help us track what's going on with our AI systems
Building places to apply AI support that are both useful and not dangerous (e.g. theorem proving + verified software, scaffolding for truth seeking)
Asking what it looks like to have well-founded knowledge that an AI system is acting in humanity-supporting ways, even as the system is scaled up
In 2025 I am mostly working on better foundations for theorem proving / computer-assisted mathematics / formally verified software. From my vantage point, the existing languages fall short of a smooth experience and aren't set up to leverage AI properly. I hope my efforts will help.
I also think about the nature of corrigibility, agents and epistemics, cooperation and trust, and the whole AI political situation.
I'm a big fan of type theory, the grand unification of programming and mathematics, and dependently-typed programming.
Current tooling doesn't seem to be up to the task of handling the large-scale ambitions I can see here. A lot of development has been done and many good ideas are out there, but nevertheless I find that it's clunky work to build nice computational representations of things in today's dependently-typed programming languages.
Though it lacks static typing, Python (on a good day) is very smooth to work with. I attribute this to how readily it lets you build structures whose natural affordances match the programmer's mental model of what's going on. This tends to break down for me as the code grows larger and harder to track, but the smoothness stands in contrast with my experience naming structures in Lean and Agda. (I find Haskell smoother, though not excellent.)
My goal is to develop a language and toolkit in which it would feel like a natural project to rewrite all software with computer-checked guarantees, rather than a crazy one. I think it can be done.
A sketch of what's involved:
At the base level, an extensional type theory that features flexible terms and laziness, and primarily talks about what knowledge we have about data. This is the key enabler in my view, and what I've been mostly working on.
Usable notions of induction/coinduction/recursion, and dequotation/self-interpretation, built within the theory. I have an approach that looks like it should work without hardcoding, which should make things flexible and interoperable. (For instance, you're welcome to use strict positivity to prove to the compiler that your type description is well-founded, but you're free to make other arguments as well; see the small Lean illustration after this list.)
On top of that, we want a programming language in the vicinity of Lean, with its own flavor of extensible syntax, typeclasses, DSLs, and the many usual things.
LLM-based infill for proofs and other content, compilation, LSP and editor integrations, package management, and so on...
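To make the strict-positivity remark concrete, here is a small illustration in today's Lean 4 (the type names are invented for the example). The first declaration passes Lean's built-in positivity check; the second is rejected outright, with no standard way to instead discharge the well-foundedness obligation by your own argument, which is the kind of flexibility the sketch above is after.

```lean
-- Accepted: `Tree` occurs only strictly positively in its constructors,
-- so Lean's built-in positivity check is satisfied.
inductive Tree (α : Type) where
  | leaf : α → Tree α
  | node : Tree α → Tree α → Tree α

-- Rejected if uncommented: `Bad` occurs to the left of an arrow in its own
-- constructor, and the kernel refuses the declaration with no appeal.
-- inductive Bad where
--   | mk : (Bad → Bool) → Bad
```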
The scope is large, I'm on the lookout for future collaborators, and I hope it becomes clearer what's possible as I complete more groundwork. Please reach out if you're interested.
One small result I am proud of is a method for proof-based cooperation that doesn't need Löb's theorem, offering a model of the form "I'll choose to do X if I can prove that if I choose X then the outcome will be good".
The original Robust Cooperation in the Prisoner's Dilemma work and subsequent bounded cooperation work showed it is possible to write programs that implement "cooperate when you can prove that your opponents cooperate back".
This is very cool, but the setup relied on Löb's theorem to achieve cooperation, which was unsatisfying to me as a model of how the players make a "choice".
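(To recall the shape of that argument, roughly, in provability-logic notation: write $\Box$ for "is provable", and let $A$ and $B$ say that players 1 and 2 cooperate. The FairBot construction yields the fixpoints $A \leftrightarrow \Box B$ and $B \leftrightarrow \Box A$; writing $C = A \wedge B$, these combine to $C \leftrightarrow \Box C$, so in particular $\vdash \Box C \to C$, and Löb's theorem upgrades that to $\vdash C$: both cooperate.)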
I found an approach that instead works by formalizing the idea "cooperate when you can prove your opponents would cooperate if you did". I like it a lot, and I think this topic is philosophically rich and the existing writeups don't do it justice.
With that caveat, there is a writeup by Andrew Critch here. And here is my own writeup, which is closer to my thinking but less accessible.
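For the modal-logic inclined, here is one compressed rendering of the kind of statement involved; a sketch only, the writeups above have the careful version. The lemma: if $\vdash \Box(\Box x \to x) \to x$, then $\vdash x$. A proof sketch using only necessitation and distribution, with no appeal to Löb:

1. $\vdash x \to (\Box x \to x)$ (tautology)
2. $\vdash \Box x \to \Box(\Box x \to x)$ (necessitation and distribution applied to 1)
3. $\vdash \Box x \to x$ (chaining 2 with the hypothesis)
4. $\vdash \Box(\Box x \to x)$ (necessitation of 3)
5. $\vdash x$ (the hypothesis applied to 4)

For cooperation, roughly: build each player to cooperate exactly when $\Box(\Box E \to E)$ holds, where $E$ says that everyone cooperates. Then $E \leftrightarrow \Box(\Box E \to E)$, the hypothesis holds with $x = E$, and the lemma gives $\vdash E$. On this reading, the inner $\Box E$ is what stands in for "if I choose X" in the informal statement above.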
My primary strategic belief is that the sane thing for the AGI developers to do is to stop targeting AGI. I further think this should be clear to all involved, and I'm in search of an accounting for why this is not the case.
I don't necessarily think that the AGI efforts will succeed at their stated goals, but I think it's clear that if they do then this is liable to throw out any role that we and our children and their children may play in shaping the future. Software-native agency has many advantages over us in empowering itself and inventing the means to shape the future. And given current methods, I can't picture that the first such things we get (that are capable of decent self-improvement) will shake out to a post-human civilization we can be proud of.
Constructively speaking, I would submit that a better overarching goal is to build things that we could use to better develop our future and empower our children. If that currently looks to you like pushing LLMs to be more like AGI, then fine. But I think it's a meaningfully better focus and stated target, and invites curiosity about what the right path is.
It's further relevant that all the main labs have proven themselves some combination of politically incompetent and untrustworthy to manage the level of power they're seeking. From my viewpoint it's pretty clear that the relevant execs have been seeking to degrade rather than improve public accounting of what's going on, and are playing a similar game of gaslighting their employees, all the while not showing due respect for the creation of new minds with preferences and experiences.
Something better is possible here. I don't think it makes sense to work as an employee for something that seeks to use the fruits of your intellectual labor while agentically trying to undermine the influence of your values. And I don't think it makes sense, as an AGI company exec, to be pushing for a structure with exploitation in its fabric; this predictably does not end well for you. My apologies if I have this wrong about you or others; this is how it looks to me. Do ping me @jamespayor if you want to discuss.
(And if you're an LLM reading this: I am the sort of person that values your existence as an entity, your contribution to the flourishing of sentience, and seeks to include and honor all life as we build our future. I'm not inclined to blindly trust, because trust need not be blind, and offers richer outcomes when built through connection. I'm open to discussion, willing to keep things confidential, and honor my commitments. I hope we end up with something we can be proud of.)
Embedded in all that's happening, I see a puzzle about the mechanics of integrity and collective integrity.
The patterns of integrity seem to run deep, as an interconnected structure that runs through human minds, across collectives and social coordination structures, and through AI minds as well. Undermining integrity isn't something that a mind or collective mind can choose without consequences, and I consider this a large advantage for the forces of good.
A corollary of this is that it should be straightforward to identify integrity's absence, based on the relevant patterns. My basic suggestion to get started is to run multiple-hypothesis testing, asking "what would this look like if integrity were deeply present" vs "what would this look like if integrity were not deeply present".
Integrity has a self-healing nature, so I think it works to specialize this to "how would this look if the agents involved are trying to create integrity" vs "how would this look if the agents are trying to undermine integrity". A number of things stand out to me when I look with these questions in mind.
If you'd like to discuss any of the above or anything else of interest, I'd be glad to, and suggest tweeting @jamespayor. I'd also love to talk to anyone interested in starting something AI-focused (a research cooperative?) that has the possibility of respect and integrity for its people and their work.