Reducing Agents: When abstractions break

https://www.lesswrong.com/posts/fgNtucxwqa7Sa9kKG/reducing-agents-when-abstractions-break

*Epistemic Effort: A month of dwelling on this idea, 12-16 hours of writing to explore the idea, and 2-5 hours rereading old LW stuff.*

In the past few months, I’ve been noticing more things that lead me to believe there’s something incomplete about how I think about beliefs, motives, and agents. There have been one too many instances of me wondering, "Yeah, but what do you really believe?" or "Is that what you really want?" This post is the first in a series where I’m going to apply More Dakka to a lot of LessWrong ideas I was already familiar with, but hadn’t quite connected the dots on. Here are the main points:

Imagine a robot with a turret-mounted camera and laser. Each moment, it is programmed to move forward a certain distance and perform a sweep with its camera. As it sweeps, the robot continuously analyzes the average RGB value of the pixels in the camera image; if the blue component passes a certain threshold, the robot stops, fires its laser at the part of the world corresponding to the blue area in the camera image, and then continues on its way.

It’s tempting to look at that robot and go, "Aha! It’s a blue-minimizing robot." Now you can model the robot as an agent with goals and go about making predictions. Yet time and time again, the robot fails to achieve the goal of minimizing blue. In fact, there are many ways to subvert this robot. What if we put a lens over its camera which inverts the image, so that white appears as black, red as green, blue as yellow, and so on? The robot will not shoot us with its laser to prevent such a violation (unless we happen to be wearing blue clothes when we approach) - its entire program was detailed in the first paragraph, and there’s nothing about resisting lens alterations. Nor will the robot correct itself and shoot only at objects that appear yellow - its entire program was detailed in the first paragraph, and there’s nothing about correcting its program for new lenses. The robot will continue to zap objects that register a blue RGB value; but now it’ll be shooting at anything that is yellow.

Maybe you conclude that the robot is just a Dumb Agent™. It wants to minimize blue, but it just isn’t clever enough to figure out how. But as Scott points out, the key error in such an analysis is modeling the robot as an agent in the first place. The robot’s code is all that’s needed to fully predict how the robot will operate in all future scenarios. If you were in the business of anticipating the actions of such robots, you’d best forget about trying to model them as agents and just use the source code.

The Connect 4 VNM Robot

I’ve got a Connect 4-playing robot that beats you 37 times in a row. You conclude it’s a robot whose goal is to win at Connect 4. I even let you peek at the source code, and aha! It’s explicitly encoded as a VNM agent using a minimax algorithm. Clearly this can safely be modeled as an expected utility maximizer with the goal of whooping you at Connect 4, right?

Well, that depends on what counts as safely. If the ICC (International Connect 4 Committee) declares that winning at Connect 4 is actually defined by getting 5 in a row, my robot is going to start losing games to you. Wait, but isn’t it cheating to just say we are redefining what winning is? Okay, maybe. Instead of redefining winning, let’s run interference. Every time my robot is about to place a piece, you block the top of the board (but only for a few seconds). My robot will let go of its piece, not realizing it never made a move. Argh! If only the robot were smart enough to wait until you stopped blocking the board, then it could have achieved its true goal of winning at Connect 4!

Except this robot doesn’t have any such goal. The robot is only code, and even though it’s doing a faithful recreation of a VNM agent, it’s still not a Connect-4-winning robot. Until you make an Agent model that is at least as complex as the source code, I can put the robot in a context where your Agent model will make an incorrect prediction.
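To make "the robot is only code" concrete, here is a toy sketch of what such source code might look like. Nothing below comes from the original post; every name is mine, the search is far too shallow to actually beat anyone 37 times in a row, and a real robot would also need vision and piece-dropping routines. The point is just to show where the "goal" actually lives: in a hard-coded four-in-a-row check and in whatever hardware routine eventually releases the piece.

```python
# Toy minimax Connect 4 player (illustrative sketch, not the post's robot).
# Board: 6x7 grid of ints, 0 = empty, 1 = this robot, 2 = the opponent.

ROWS, COLS = 6, 7


def legal_moves(board):
    """Columns that still have room at the top."""
    return [c for c in range(COLS) if board[0][c] == 0]


def drop(board, col, player):
    """Return a new board with `player`'s piece dropped into `col`."""
    new = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):
        if new[r][col] == 0:
            new[r][col] = player
            return new
    raise ValueError("column is full")


def four_in_a_row(board, player):
    """The hard-coded definition of 'winning'. If the committee redefines
    winning as five in a row, this function is simply wrong, and nothing
    else in the program will notice."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS
                       and board[rr][cc] == player for rr, cc in cells):
                    return True
    return False


def minimax(board, depth, player):
    """Value of `board` for player 1: +1 win, -1 loss, 0 unknown or draw."""
    if four_in_a_row(board, 1):
        return 1
    if four_in_a_row(board, 2):
        return -1
    moves = legal_moves(board)
    if depth == 0 or not moves:
        return 0
    values = [minimax(drop(board, c, player), depth - 1, 3 - player)
              for c in moves]
    return max(values) if player == 1 else min(values)


def choose_move(board):
    """Pick the column whose resulting position has the best minimax value."""
    return max(legal_moves(board),
               key=lambda c: minimax(drop(board, c, 1), 3, 2))
```

Note what isn’t here: choose_move can’t notice a redefinition of "winning", and nothing models the possibility that a released piece never actually lands in a column. Those assumptions are frozen into four_in_a_row and into whatever actuator code sits around this search, which is exactly where the Agent model and the source code come apart.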
"So what?" you might ask. What if we don’t care about every possible context? Why can’t we use an Agent model and only put the robot in contexts where we know the abstraction works? We absolutely can do that. We just want to make sure we never forget that this model breaks down in certain places, and we’d also like to know exactly where and how it will break down.

Adaptation Executors, Not Fitness Maximizers

Things get harder when we talk about humans. We can’t yet "use the source code" to make predictions. At first glance, using Agents might seem like a perfect fit. We want things, we believe things, and we have intelligence. You can even look at evolution and go, "Aha! People are fitness maximizers!" But then you notice weird things like the fact that humans eat cookies. Eliezer has already tackled that idea:

> No human being with the deliberate goal of maximizing their alleles’ inclusive genetic fitness, would ever eat a cookie unless they were starving. But individual organisms are best thought of as adaptation-executors, not fitness-maximizers.

Adaptation executors, not fitness-maximizers. Adaptation executors, not fitness-maximizers. Adaptation executors, not fitness-maximizers. Repeat that 5 more times every morning upon waking, and then thrice more at night before going to bed. I’ve certainly been muttering it to myself for the last month that I’ve been dwelling on this post. Even if you’ve already read the Sequences, give that chunk another read-through.

Rebuttal: Maybe fitness isn’t the goal. Maybe we should model humans as Agents who want cookies.

We could, but that doesn’t work either. More from Scott:

> If there is a cookie in front of me and I am on a diet, I may feel an ego dystonic temptation to eat the cookie—one someone might attribute to the "unconscious". But this isn’t a preference—there’s not some lobe of my brain trying to steer the universe into a state where cookies get eaten. If there were no cookie in front of me, but a red button that teleported one cookie from the store to my stomach, I would have no urge whatsoever to press the button; if there were a green button that removed the urge to eat cookies, I would feel no hesitation in pressing it, even though that would steer away from the state in which cookies get eaten. If you took the cookie away, and then distracted me so I forgot all about it, when I remembered it later I wouldn’t get upset that your action had decreased the number of cookies eaten by me. The urge to eat cookies is not stable across changes of context, so it’s just an urge, not a preference.

Like with the blue-minimizing robot, it’s tempting to resort to using a Dumb Agent™ model. Maybe you really do have a preference for cookies, but there is a counter-preference for staying on your diet. Maybe proximity to cookies increases how much you value the cookie world-state. There are all sorts of weird ways you could specify your Dumb Agent™ to produce human cookie-eating behavior. But please, don’t. I can’t appeal to "Just use the source code" anymore, but hopefully I’m getting across the point that it’s at least a little bit suspicious that we (I) want to conform all human behavior to the Agent abstraction.

So if we aren’t agents, what are we? Hopefully that last sentence triggered a strong reflex. Remember, it’s not a question of whether or not we are agents. We are quarks/whatever-is-below, all hail reductionism. We are trying to get a better understanding of when the Agent abstraction breaks down, and what alternative models to use when things do break down.
This post’s main intent was to motivate this exploration, and put to rest any fears that I am naively trying to explain away agents, beliefs, and motives.

Next Post: What are difficult parts of intelligence that the Agent abstraction glosses over?

Comment

https://www.lesswrong.com/posts/fgNtucxwqa7Sa9kKG/reducing-agents-when-abstractions-break?commentId=tYsGDzTL8SiyN4jXk

This was great, and I found it to be a clearer way of pointing at how typical "straw rationalist-y" ways of looking at decision-making might not work well in certain situations. I’m looking forward to the rest of the sequence!

https://www.lesswrong.com/posts/fgNtucxwqa7Sa9kKG/reducing-agents-when-abstractions-break?commentId=vpDZTbNGSDBdQxY7o

Well now you’ve got me looking forward to more :)

https://www.lesswrong.com/posts/fgNtucxwqa7Sa9kKG/reducing-agents-when-abstractions-break?commentId=xGMRh89YsuNQzoKvw

Something I struggle with in the adaptation-executor-vs-fitness-maximiser framing is not dismissing too much as just ‘misfiring adaptation’, and allowing that some of my intuitive behaviour is sensible. I’m particularly prone to doing this with productivity: "ah, a part of me that is fairly well described as being agent-y is being stymied by a non-agenty part that just interacts with shiny things! I should make sure that insofar as I have shininess in my vicinity, it’s mostly attached to things the agenty part of me wants to interact with!"

I don’t know if that is something this sequence will cover, but I hope so!

Comment

https://www.lesswrong.com/posts/fgNtucxwqa7Sa9kKG/reducing-agents-when-abstractions-break?commentId=RqrWSvvWSMSEeYTMe

The word "just" in "just interacts with shiny things" is giving your parts too little credit, I think. There are a lot of flavors of being attracted to shiny things and some of them look like genuinely poor impulse control but some of them look like trying to avoid feeling pain, or perhaps trying to avoid existing in some sense. Your description isn’t detailed enough for me to say more, though.

Comment

Clearly this wasn’t the best description, because the comment wasn’t supposed to give the part in quotes much credit. So as you say, the word ‘just’ is needlessly dismissive, but I am dismissing the part of me that is so dismissive!

(I added some more detail in another comment)

https://www.lesswrong.com/posts/fgNtucxwqa7Sa9kKG/reducing-agents-when-abstractions-break?commentId=5QLx4DkQusQQCysqQ

Some of the topics I was planning on will be related, though there might not be direct commentary on this one. What are some specific things your paragraph in quotes refers to? I don’t see that problem description as fitting nicely into "not dismissing too much as just ‘misfiring adaptation’ and allowing that some of my intuitive behaviour is sensible", so an example would help.

Comment

You’re right, that isn’t very clear. By the way, the thought in this example is not that I definitely dismiss unfairly in this case; the idea is that I’m doing it in an unnuanced way that doesn’t take into account possible reasons the impulse can’t be ignored. The ‘silly shiny things’ attitude isn’t able to tease apart when my behaviour is sensible from when it’s not.

As a hopefully clearer example, say I’m trying to get some work done, but I’m just sending stupid messages and gifs to WhatsApp groups instead. This could be a case where I should be getting the work done and should turn off notifications on my phone, but it could be an unresolved fear of the consequences of failure. It could also be that I’m exhausted and there’s no point working. It could even be that I feel isolated and need some interaction!

Does that make the case I’m talking about a bit clearer?

Comment

> As a hopefully clearer example, say I’m trying to get some work done, but I’m just sending stupid messages and gifs to WhatsApp groups instead. This could be a case where I should be getting the work done and should turn off notifications on my phone, but it could be an unresolved fear of the consequences of failure. It could also be that I’m exhausted and there’s no point working. It could even be that I feel isolated and need some interaction!

You might also think your work is stupid and pointless on top of all the other stuff, cf. the Procrastination Equation; that part’s important too, because you might also have narratives around the kind of work that you "should" be doing, underneath which are fears that if you don’t do the work you "should" be doing then something terrible will happen.
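(For anyone who hasn’t run into it: as I recall, Steel’s Procrastination Equation is roughly

Motivation = (Expectancy × Value) / (Impulsiveness × Delay)

so believing the work is stupid and pointless drags the Value term, and with it motivation, toward zero, quite apart from any fear-of-failure story.)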

Comment

I think I have a bug of this form, and it’s been an issue for me for a long time.

When it’s school break, or a huge weight has been lifted off my shoulders, I find it peaceful to study mathematics because I feel like I’m not being forced to study certain things. But as soon as school starts and I take a math class where the material becomes unfamiliar, and no motivation is given for why we’re doing what we’re doing, it feels forced. I end up stymied: listening to music and browsing articles on psychology/cognitive science, Reddit, or LessWrong to figure out this burnout, and then bouncing back to wondering whether I should suppress my curiosity to do well in the class, or let my curiosity run free at the cost of not doing so great in the class.

Because if I don’t suppress my curiosity and instead flow with the mathematics, I feel like I’ll run the risk of diverging from the course material, and in turn I’ll do badly on exams because I didn’t focus enough on the required topics.

I’ve taken some proof-based courses like real analysis, so it’s not that I don’t know how to prove things in a typical undergraduate math course. It’s just that I feel guilty not focusing on my schoolwork, so I retreat to listening to music or reading psychology or browsing LessWrong articles to escape these negative feelings about whether I should focus exclusively on the math in the class or play with the math I find interesting, at the risk of performing poorly in my class. I know these two activities don’t have to be mutually exclusive: you can play with math you find interesting that’s been assigned by the professor. However, the math assigned by the professor isn’t always interesting at first, so I burn out from being bombarded by "should" statements. Any input/advice/guidance from anyone here would be greatly appreciated. I’ve been having trouble fixing this bug alone.

Comment

My suggestion is that if it feels effortful to do well enough in the class to get the grade you want, then drop it. School is terrible. If you want to optimize for credentials then you can do it by taking the easiest classes that allow you to graduate in the major you want, and you can optimize for credentials completely separately from your actual learning.