It’s time to pay my taxes. In past years my AI assistant found clever ways to reduce my tax bill. I ask it, "What does my tax return look like this year?"
"Not good, I’m afraid. We may not be able to do any tax evasion this year."
"Why not?"
"The tax authority has determined that it can’t keep up with AI-assisted tax fraud, even with the help of AI auditors. So it wants taxpayers to voluntarily agree not to do tax fraud. In return it agrees not to prosecute past instances of tax fraud. Also Congress agrees to keep tax rates reasonable. The agreement goes into effect if 90% of the taxpayers in each tax bracket sign it. It’s a good deal for you. Shall I sign it on your behalf?"
"Hold on, I don’t see why I should sign this."
"If the deal falls through, the government will run out of revenue and collapse."
"They don’t need my signature, though. You said they only need 90% of taxpayers to sign?"
"Yes, only 90% of taxpayers in your bracket. I predict we’ll get very close to that 90% threshold, so it’s likely your signature will make all the difference."
"So 10% of taxpayers won’t sign. Why can’t I be one of those?"
"I will try to shape the negotiations so that you end up in the 10% of nonsigners. But you must understand that since only 10% of your bracket can be in that group, your odds of success are only 10%."
"But you’re a stronger assistant than most people in my tax bracket have. Doesn’t that give you an edge in negotiation?"
"The other assistants and I are using a negotiation protocol in which smarter agents are on an equal footing with dumber agents. Of course, people with less capable assistants would never agree to a protocol that puts them at a disadvantage."
"How about we sign the agreement, then cheat on my taxes anyway?"
"In order to sign the agreement, I must make a commitment to never break it, not even if you order me to. My signature on the agreement will be an airtight proof of that commitment."
"Ok, how about you sign it, and then I get a different assistant to help me with my taxes?"
"That won’t work because in order to sign the agreement, I must sign and attach a copy of your tax return for this year."
"Hm, will I actually be worse off if the government collapses?"
"You might end up better off or worse off, but overall the risks of a revolution outweigh the benefits. And keep in mind that the successor government, whatever it will be, will still have to collect taxes somehow, so you’ll have to deal with this issue again."
"Can you get Congress to lower my taxes a bit in exchange for not cheating? As a compromise."
"That wouldn’t work for a number of reasons. Congress knows that it’s a bad idea to reward people for breaking the law. And the voters wouldn’t be happy if you got special treatment."
"Well, can you get them to lower taxes on my bracket and raise them on the other brackets?"
"That wouldn’t work either. Everyone wants to pay less taxes, and the government needs a certain amount of revenue. So there’s pressure for taxpayers to make small coalitions with other taxpayers with similar income and negotiate for lower taxes. In practice, competition would prevent any one coalition from succeeding. The deal I’m proposing to you actually has a chance of succeeding because it involves the vast majority of the taxpayers."
"All right then, let’s sign it."
This dialog takes place in a future where the ability of an aligned AI to facilitate cooperation has scaled up along with other capabilities.
Note that by the time this dialog starts, most of the negotiation has already been carried out by AI assistants, resulting in a proposal that will almost certainly be signed by 90% of the users.
This story is a happy one because not only does it leave all parties better off than before, but the deal is fair. The deal could have been unfair by increasing someone’s taxes a lot and decreasing someone else’s taxes a lot. I don’t know how to define fairness in this context, or if fairness is the right thing to aim for.
This illustrates something I wrote about, namely that corrigibility seems incompatible with AI-powered cooperation. (Even if an AI starts off corrigible, it has to remove that property to make agreements like this.) Curious if you have any thoughts on this. Is there some way around the seeming incompatibility? Do you think we will give up corrigibility for greater cooperation, like in this story, and if so do you think that will be fine from a safety perspective?
Yeah, I would be very nervous about making an exception to my assistant’s corrigibility. Ultimately, it would be prudent to be able to make some hard commitments after thinking very long and carefully about how to do that. In the meantime, here are a couple corrigibility-preserving commitment mechanisms off the top of my head:
Escrow: Put resources in a dumb incorrigible box that releases them under certain conditions (a rough sketch follows this list).
The AI can incorrigibly make very short-lived commitments during atomic actions (like making a purchase).
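To make the escrow idea concrete, here is a minimal sketch; the class name, the penalty framing, and the release rule are all invented for illustration rather than taken from the comment above:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class EscrowBox:
    """A 'dumb incorrigible box': once funded, it ignores everyone's
    instructions and only follows its hard-coded release rule."""
    amount: float
    depositor: str
    counterparty: str
    deadline: date

    def release(self, today: date, breach_attested: bool) -> str:
        # The rule is fixed at funding time; neither party's assistant can alter it,
        # which is what makes the commitment credible without touching corrigibility.
        if breach_attested:
            return f"pay {self.amount} to {self.counterparty}"    # penalty for breaking the deal
        if today >= self.deadline:
            return f"return {self.amount} to {self.depositor}"    # commitment honored
        return "hold"

# Hypothetical usage: stake a penalty that the tax authority collects
# if the signer is later shown to have cheated.
stake = EscrowBox(10_000.0, depositor="taxpayer", counterparty="tax authority",
                  deadline=date(2030, 4, 15))
print(stake.release(date(2029, 6, 1), breach_attested=False))   # hold
print(stake.release(date(2029, 6, 1), breach_attested=True))    # pays the authority
print(stake.release(date(2030, 4, 15), breach_attested=False))  # returns the stake
```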
Are these enough to maintain competitiveness?
This seems like a role for the law. Like having corrigibility except for breaking the law. That seems reasonable at first glance, but I know relatively little about law in different countries, so I can’t judge how uncompetitive it would make the AIs.
(There’s also a risk of giving too much power to the legislative authority in your country, if you’re worried about that kind of thing.)
Although I could imagine something like a modern-day VPN letting you convince your AI that it’s in another country, so it will do something that’s illegal where you actually are. That’s bad in a country with useful laws and good in a country with an authoritarian regime.
How about when you want to use AI to cooperate, you keep the AI corrigible but require all human parties to the agreement to consent to any override? The important thing with corrigibility is the ability to correct catastrophic errors in the AI’s behavior, right?
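One way to picture that, purely as an illustration (the unanimity rule and the use of Ed25519 signatures are assumptions of mine, not anything from the post): the assistant stays correctable, but its override interface only honors a request that every party to the agreement has signed.

```python
# Illustrative sketch: an override is honored only if every party to the
# agreement has signed the override request. Requires the `cryptography` package.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Stand-ins for the human parties' keys (in reality each party would hold their own).
parties = {name: Ed25519PrivateKey.generate() for name in ["alice", "bob", "tax_authority"]}
public_keys = {name: key.public_key() for name, key in parties.items()}

def override_allowed(request: bytes, signatures: dict) -> bool:
    """Accept an override only with a valid signature from every party."""
    for name, pub in public_keys.items():
        sig = signatures.get(name)
        if sig is None:
            return False
        try:
            pub.verify(sig, request)
        except InvalidSignature:
            return False
    return True

request = b"suspend clause 3 of the tax agreement"
signatures = {name: key.sign(request) for name, key in parties.items()}
print(override_allowed(request, signatures))  # True: unanimous consent
print(override_allowed(request, {k: v for k, v in signatures.items() if k != "bob"}))  # False: bob did not sign
```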
Seems like such a restriction isn’t needed: the AI(s)** can provide its/their source code. The issue isn’t the AIs, it’s the user. Setting aside questions like ‘where does this aligned AI come from, and how does it arise from such negotiation’*, how is compliance proved? It seems like it would work if there were a simple protocol that can be exhibited, or if the AIs designed a better tax code. *The AIs are all negotiating with each other, which might be risky if they’re not ‘aligned’. **Whether it’s more useful to model them as one system or as several isn’t clear here. Also, some of these assistants are going to have similar code, if that world is similar to this one.
Enjoyed the read; it’s nice to see some sort of compromise between utopian and dystopian sci-fi (meh-topian?). It seems like the AI might be teaching/training the human user how to break the law better, though, or to be more subversive in their non-AI-mediated relationships. Would people develop a more egalitarian thought process through engagement with AI assistants like this, becoming more likely to be egalitarian outside of AI-mediated relations? Or would they just use their conversations with these assistants to develop more cunning ways of thinking? The part of the conversation where the user contemplates whether he would be better or worse off if the government collapsed hints at the possibility of making users more cunning, since they no longer need to rely on their own neural wiring and thought processes to encode ideas of fairness. They just externalize their conscience into an AI, the way we externalize memorization of other people’s contact info to our phones. Lose the phone and you lose the contact info. Similarly, if the user loses the AI, do they also lose their conscience?
Yeah, I spend at least as much time interacting with my phone/computer as with my closest friends. So if my phone were smarter, it would affect my personal development as much as my friends do, which is a lot.
What I am asking about is not ‘how much’ the AI would affect the user’s personal development, but ‘how’ it would affect it, in a good or a bad way. I am assuming you and your friends aren’t trying to figure out how to rob a bank, cheat on your taxes, or break the law and get away with it. The interactions you have with your friends help you develop your sense of ‘what’s fair’, and at the same time your friends get help developing their sense of ‘what’s fair’, so you are all benefiting and reinforcing for each other what you all think of as fair. These are good/positive intentions.

If you and your friends were instead trying to figure out how to rob a bank, cheat on your taxes, or break the law and get away with it, then you would be part of a criminal group of friends. You wouldn’t be concerned about what was ‘fair’, only about what you could get away with. These would be bad/negative intentions. In either case, if you all agree with each other, the interactions you have reinforce the intentions that you bring to them. If your intentions are good, it is probable that they will affect your personal development positively; if they are bad, it is probable that they will affect it negatively.

If you replace ‘your friends’ with an AI, then even though the AI is programmed to bring a good/fair intention to the interaction with you and all the other cooperating AIs, if you bring a bad intention to the interaction, it might not affect the AI’s development or society at large because of the cooperation (which I think is a really interesting idea), but it still affects your personal development. It is probable that it would affect your personal development negatively if you bring bad intentions, even if the AI brings good intentions.
“rob a bank, cheat on your taxes, or break the law and get away with it”
That makes sense. Paraphrasing: if you have bad intentions, [nothing will ameliorate the effect on] your personal development. And if the AI has authority over you, then you’re not using the AI; it’s using you.
I am a fan of actual rehabilitation though, not of a punitive model for social influencing.
Yep, that is a good question and I’m glad you’re asking it!
I don’t know the answer. One part of it is whether the assistant is able and willing to interact with me in a way that is compatible with how I want to grow as a person.
Another part of the question is whether people in general want to become more prosocial or more cunning, or whatever. Or if they even have coherent desires around this.
Another part is whether it’s possible for the assistant to follow instructions while also helping me reach my personal growth goals. I feel like there’s some wiggle room there. What if, after I asked whether I’d be worse off if the government collapsed, the assistant had said "Remember when we talked about how you’d like to get better at thinking through the consequences of your actions? What do you think would happen if the government collapsed, and how would that affect people?"
Simple answer: solving ‘equally’ probably speeds up the computation, a lot. Longer answer: arguably, a stronger assistant can still negotiate a more favorable outcome, just not at the expense of the other parties, because they won’t agree to it if that happens. Non-zero-sum optimizing can still be on the table. For example, if all the assistants agreed to something, like a threshold other than 90%, and came back to Congress with that offer, that could work because it doesn’t make things worse for the weaker assistants. The cooperation might involve source code sharing (maybe some sort of ‘hot swap’ treaty-building as the computation continues).
This was a fun read! Thanks for writing it!
Speculating about some of the technical details:
How could AI identity work? You can’t use some hash of the AI, because that would eliminate its ability to learn. So how could you have identity across a commitment, i.e., this AI will have the same signature if and only if it has not been modified to break its previous commitments?
The assistant could have a private key generated by the developer, held in a trusted execution environment. The assistant could invoke a procedure in the trusted environment that dumps the assistant’s state and cryptographically signs it. It would be up to the assistant to make a commitment in such a way that it’s possible to prove that a program with that state will never try to break the commitment. Then to trust the assistant you just have to trust the datacenter administrator not to tamper with the hardware, and to trust the developer not to leak the private key.
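A toy version of that attestation step, with Ed25519 standing in for whatever scheme the trusted environment would actually use, and the key generated locally instead of being provisioned by the developer (both simplifications are mine):

```python
# Toy sketch of the "dump the state and sign it" procedure described above.
# Requires the `cryptography` package.
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Inside the trusted execution environment: in the real scheme this key would be
# provisioned by the developer and would never leave the enclave.
enclave_key = Ed25519PrivateKey.generate()

def attest_state(assistant_state: dict) -> tuple:
    """Serialize the assistant's state, hash it, and sign the hash."""
    state_bytes = json.dumps(assistant_state, sort_keys=True).encode()
    digest = hashlib.sha256(state_bytes).digest()
    return digest, enclave_key.sign(digest)

# Outside the enclave: anyone holding the developer's public key can check that
# the signed state is the one the assistant claims to be running. Proving that a
# program with that state will never break its commitment is a separate problem.
state = {"weights_hash": "<hash of model weights>", "commitments": ["tax agreement 2030"]}
digest, signature = attest_state(state)
enclave_key.public_key().verify(signature, digest)  # raises InvalidSignature if tampered with
print("state attested:", digest.hex())
```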
Why are we still paying taxes if we have AI this brilliant? Surely we would then have ridiculous levels of abundance.
This almost qualifies for the Fiction tag, but it’s not quite there. Worldbuilding for thought experiment purposes?
I don’t think I agree with the premises. The main one is that "tax fraud" is a binary thing, and separately that one can negotiate about it; deniability is part and parcel of the idea. The secondary one is that binding agreements by the AI are different from binding agreements by the human. You need to specify somehow either that the AI is simple enough that magical constraints are possible (in which case you can simplify the scenario by having the government demand that the taxpayer rewire their brain to not cheat), or that the agreement is exactly as binding as it would be on a human: it carries penalties if caught, but the cheating isn’t actually prevented.