Preserving and continuing alignment research through a severe global catastrophe

https://www.lesswrong.com/posts/xrxh3usuoYMckkKom/preserving-and-continuing-alignment-research-through-a

Introduction

There is some chance that civilization will cease to function before we hit an intelligence explosion. If it does, it would be good to preserve existing alignment research for future generations who might rebuild advanced technology, and ideally to have safe havens ready where current and future researchers can spend their lives adding to that pool of knowledge. Such a collapse might delay capabilities research by many decades, centuries, or longer while allowing basic theoretical alignment research to continue, and so be a potential Yudkowskian positive model violation for which we should prepare. Setting this infrastructure up is a massively scalable intervention, and one that should likely be tackled by people who are not already on the researcher career path. It would have been good to get started some years ago given recent events, but now is the second-best time to plant a tree.[1]

Preserving alignment knowledge through a global catastrophe

What data do we want to store?

Thankfully, the EleutherAI people are working on a dataset of all alignment research[2]. It’s still a WIP[3], and contributions to the scripts that collect it are welcome, so if you’re a programmer looking for a shovel-ready way to help with this, consider submitting a PR[4].
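To give a flavour of what such collection scripts involve, here is a minimal sketch of the general pattern (fetch documents from a public API, then append them to a dataset file). The GraphQL query shape and field names are my assumptions for illustration, not EleutherAI’s actual pipeline:

```python
# Minimal sketch of a collection script: fetch posts from a public forum
# API and append them to a JSONL dataset file. The query shape and field
# names are illustrative assumptions; see the EleutherAI repo for the
# real pipeline.
import json

import requests

API_URL = "https://www.lesswrong.com/graphql"  # LessWrong's public GraphQL endpoint


def fetch_posts(limit: int = 10) -> list[dict]:
    """Fetch some recent posts (the query below is a guess at the schema)."""
    query = (
        "{ posts(input: {terms: {limit: %d}}) "
        "{ results { title pageUrl htmlBody } } }" % limit
    )
    resp = requests.post(API_URL, json={"query": query}, timeout=30)
    resp.raise_for_status()
    return resp.json()["data"]["posts"]["results"]


def append_to_dataset(posts: list[dict], path: str = "alignment_dataset.jsonl") -> None:
    """Append each post as one JSON object per line (JSONL)."""
    with open(path, "a", encoding="utf-8") as f:
        for post in posts:
            f.write(json.dumps(post, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    append_to_dataset(fetch_posts())
```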

How do we want to store it?

My shallow dive into this uncovered these options:

Each has advantages, so some combination of them might be ideal.

Where do we store it?

Having many redundant backups seems advisable, preferably protected by communities which can last centuries or placed in locations which will not be disturbed for a very long time. Producing "alignment backup kits" to send out, and offering microgrants to people all around the world to place them in secure locations, would achieve this. We’d likely want both basic kits (just pre-collapse work) and advanced ones (capable of adding to the archive for a long time post-collapse). If you’d like to take on the challenge of preparing these kits, storing an archive, or coordinating things, please join the Alignment After A GCR Discord (AAAG). I’m happy to collaborate and give some seed funding. If you want to help collect and improve the archive files, #accelerating-alignment on EAI is the place to go.
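One detail worth building into any kit: a checksum manifest, so that whoever stores an archive (or recovers it decades later) can verify it has not silently degraded. Here is a minimal sketch in Python; the file names and layout are assumptions for illustration, not an existing kit spec:

```python
# Minimal sketch: build and verify a SHA-256 manifest for an archive
# directory, so a kit's contents can be checked for bit-rot or tampering.
# The directory and manifest names below are illustrative assumptions.
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks to keep memory use flat."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(archive_dir: Path, manifest: Path) -> None:
    """Record 'digest  relative/path' for every file in the archive."""
    with manifest.open("w", encoding="utf-8") as out:
        for p in sorted(archive_dir.rglob("*")):
            if p.is_file():
                out.write(f"{sha256_file(p)}  {p.relative_to(archive_dir)}\n")


def verify_manifest(archive_dir: Path, manifest: Path) -> bool:
    """Return True only if every listed file still matches its digest."""
    ok = True
    for line in manifest.read_text(encoding="utf-8").splitlines():
        digest, rel = line.split("  ", 1)
        if sha256_file(archive_dir / rel) != digest:
            print(f"MISMATCH: {rel}")
            ok = False
    return ok


if __name__ == "__main__":
    # Keep the manifest next to (not inside) the archive directory.
    write_manifest(Path("alignment_archive"), Path("MANIFEST.sha256"))
```

The two-space "digest  path" format matches the standard sha256sum convention, so a plain `sha256sum -c ../MANIFEST.sha256` run from inside the archive directory would also verify the files with no custom tooling.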

Continuing alignment research after a global catastrophe

It is obviously best if as many people as possible survive a global catastrophic risk (GCR), and supporting the work of organizations like the Alliance to Feed the Earth in Disasters (ALLFED) seems extremely valuable. However, a targeted intervention focused on allowing alignment researchers to continue their work in the wake of a disaster might be an especially cost-effective way to improve the long-term future of humanity.

Evacuation plans

A list of which researchers to prioritize would need to be drawn up.[7] They would need instructions on how to get to the haven, and ideally someone with reliable transport to take them there. In moments of extreme risk, they would be encouraged to move to the haven preemptively (and hopefully temporarily).

Designing havens

The locations would need to be bought, funded, and partially populated before the GCR.[8] I have some ideas about which other subcultures might be good to draw from, with the Authentic Relating community top of the list.[9] The havens would need to be well-stocked enough to weather the initial crisis and recover afterwards. They should be located in places where farming or fishing could produce a surplus in the long term, so that some of the people living there can spend much of their time making research progress. Being relatively far from centers of population seems beneficial, but being close enough to major hubs that transport is practical also matters. There are many considerations, and talking to ALLFED to get their models of how to survive GCRs seems like an obvious first step in planning this. Avoiding the failure mode of admitting so many people that the whole group goes under would be both challenging and necessary, so clear rules would have to be agreed on for who could join. The culture would need to be conducive to supporting research in the long term while remaining mostly self-sufficient; designing such a community would be an interesting challenge. People with the skills to produce food and other necessities would need to be part of the team.

Call to action

Even more than archiving, this needs some people to make it their primary project in order for it to happen. That could include you! I would be happy to provide advice, mentorship, connections, and some seed funding to a founder or team who wants to take this project on.[10] Message me here or @A_donor on the Discord. This project could also benefit from volunteers for various roles; if you or someone you know would like to help, please get in touch.

https://www.lesswrong.com/posts/xrxh3usuoYMckkKom/preserving-and-continuing-alignment-research-through-a?commentId=mJzP3EKAYi7E3n4LB

I wrote about this on the EA Forum a few days ago. I’m glad others are starting to think about this. I do think archiving all existing alignment work is very important, perhaps as important as efforts to keep alive the people who represent the field’s existing expertise and talent. It would be much better for them to be able to continue their work than for new people to attempt to pick up where they left off, especially since many things, like intuitions honed over time, may not be readily learnable.

I’m increasingly inclined to think that a massive "shock" in the near future (like a nuclear war or a severe pandemic) which effectively halts economic progress, perhaps for a few decades or more, then restarts it at a lower baseline, may be one of the few remaining scenarios in which we can reasonably expect to survive AGI, taking into account the grim present strategic situation as Eliezer outlined in the recent sequence. Such a world might especially favour alignment, since AI work (prosaic AI especially) seems to be much more capital-intensive than alignment work, so in a post-shock world with less capital available it would be disadvantaged, or impossible to carry out at all. Morbid as it is, there are a few other reasons such a catastrophic shock may actually increase our collective odds of success regarding AI risk, such as a greatly reduced population implying fewer AGI projects and weaker race pressures.

Given this, the OP’s project is doubly important.

https://www.lesswrong.com/posts/xrxh3usuoYMckkKom/preserving-and-continuing-alignment-research-through-a?commentId=ctd3CRyjz32JBZ7ii

Interesting! One potential downside my mind immediately goes to is public perception, in the (hopefully probable) case that such a contingency plan isn’t needed. In popular culture, the idea of a privileged (usually very wealthy) class of people escaping to an "ark" as the world ends for everyone else is generally considered a classic evil-villain trope. For instance, in Don’t Look Up (a recent Hollywood blockbuster involving a GCR), the good-guy scientists are offered refuge in the evil president’s secret escape spaceship, but refuse. This is presented as the heroic and correct thing to do, even though refusing was effectively an act of suicide (within the context of the movie).

Not that your idea is actually in any way a bad one, but I would wager that the similarities between your proposition and what evil Hollywood villains stereotypically do are likely to increase the public perception of EA folks as cult-like (if your plan captures any press attention), which could drive talent away and discourage outsiders from cooperating with the community.

All that being said, this is ultimately a rather minor concern compared to, say, the possibility of human extinction, so take the above with a grain of salt. If you do plan on going ahead with this on a large scale, I would definitely talk to some people outside the community with PR experience, so as to minimize any possible negative social effects. Good luck!!!

https://www.lesswrong.com/posts/xrxh3usuoYMckkKom/preserving-and-continuing-alignment-research-through-a?commentId=frv5AcSm33jQiRgga

I’m hopeful that most people would see the difference between "rich people trying to save their own skin" and "allowing researchers who are trying to make sure humanity has a long-term future at all to continue their work", but I would be very happy to have leads on who to talk to about presenting this well.

https://www.lesswrong.com/posts/xrxh3usuoYMckkKom/preserving-and-continuing-alignment-research-through-a?commentId=tcar62kFEFFwFDLCf

EA is already committed to caring about AI alignment, and to spending large sums of money on what would otherwise be perceived as unusual things. EA will also inevitably need to compete against others for finite sources of power* (including other charities and non-EA causes) if it is to fulfill its aims. I doubt it’s possible to hide this forever, even if that were desirable (which I’m not sure about).

*The civilizational ark isn’t actually that finite: it can carry all sorts of research and researchers, not just AI alignment work. But other sources of power are even more finite, such as the ability to influence government policy.

https://www.lesswrong.com/posts/xrxh3usuoYMckkKom/preserving-and-continuing-alignment-research-through-a?commentId=vjPtqaFzk7e7zqBJW

(x-post from EA forum)

Nice post, but if I may add: It’s not just alignment research that needs to be preserved. For instance here’s Linch Zhang’s comment on civilisational restart manuals. Would be cool to have one coordinated megaproject on all aspects of this.

https://www.lesswrong.com/posts/xrxh3usuoYMckkKom/preserving-and-continuing-alignment-research-through-a?commentId=mBNGhFQs7iX4rZkMB

Various forms of embossing etc. on metal sheeting can also be decent, although beware the tradeoff: cheap metals corrode, while expensive metals have a tendency to get melted down precisely because they are expensive.

https://www.lesswrong.com/posts/xrxh3usuoYMckkKom/preserving-and-continuing-alignment-research-through-a?commentId=oESRxWgndLi9WbKuf

Stainless steel is not that expensive, and it is pretty corrosion-resistant, although laser-etched glass may be a better option.

Comment

Stainless steel is an option. It does still corrode long-term[1]. It works fine over decade-to-century timescales for structural applications[2]; I don’t know if we can trust it to retain fine details[3] over long timescales[4]. Laser-etched glass is interesting, though brittle.

[1] Said corrosion is slow, especially in proper conditions (still, dry air; no other metals around to cause galvanic corrosion; etc.); it is not, and cannot be, non-existent.

[2] ...most of the time. Salt water destroys everything.

[3] 1mm of corrosion in a 2cm-deep structural member is far less of a problem than 1mm of corrosion on 0.1mm-deep lettering.

[4] The longest study I found on atmospheric exposure of stainless steel was 10 years. That is somewhat surprising considering that stainless steel has been around for roughly a century at this point (it dates to the early 1910s).

https://www.lesswrong.com/posts/xrxh3usuoYMckkKom/preserving-and-continuing-alignment-research-through-a?commentId=fhCLvwPpufpeE4JfL

Footnote #5 seems as if it cuts off too soon.

https://www.lesswrong.com/posts/xrxh3usuoYMckkKom/preserving-and-continuing-alignment-research-through-a?commentId=rbmDhmPhoBzbCxyCG

Fixed, thanks.