Musings on Cumulative Cultural Evolution and AI

https://www.lesswrong.com/posts/K686EFdXysfRBdob2/musings-on-cumulative-cultural-evolution-and-ai

Contents

Cumulative Cultural Evolution

Humans have altered more than one-third of the earth's land surface. We cycle more nitrogen than all other terrestrial life forms combined and have now altered the flow of two-thirds of the earth's rivers. Our species uses 100 times more biomass than any large species that has ever lived. If you include our vast herds of domesticated animals, we account for more than 98% of terrestrial vertebrate biomass. — Joseph Henrich

Cumulative cultural evolution is often framed as an answer to the general question: what makes humans so successful relative to other apes? Other apes do not occupy as many continents, are not as populous, and do not shape or use their environment to the extent that humans do. There is a smorgasbord of different accounts of this, referencing:

Assumptions

Muthukrishna and co make the following assumptions:

Lifecycle

To tease out the impact of these different pressures, Muthukrishna and co run agents through a lifecycle of BIRTH, LEARNING, MIGRATION, and SELECTION. These steps are straightforward: agents are born, spend time learning asocially or socially, migrate between groups, and are then selected according to the amount of adaptive knowledge they acquired, the costliness of their brain size, and the environmental payoff.
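The lifecycle above can be sketched as a toy simulation. Everything in this snippet (the agent fields, the payoff formula, the inheritance rule) is invented for illustration and is not the actual Muthukrishna et al. model:

```python
import random

def run_generation(agents, env_payoff=1.0, brain_cost=0.1):
    """One pass through BIRTH, LEARNING, MIGRATION, SELECTION.

    Toy sketch only: the field names and formulas here are
    hypothetical, not taken from the paper.
    """
    # LEARNING: each agent gains knowledge asocially (trial and
    # error) and socially (copying the group's current best,
    # degraded by transmission fidelity).
    best = max(a["knowledge"] for a in agents)
    for a in agents:
        asocial_gain = a["asocial_effort"] * random.random()
        social_gain = a["social_effort"] * best * a["fidelity"]
        a["knowledge"] += asocial_gain + social_gain

    # MIGRATION: stand-in for movement between groups (with a
    # single group here, this just reorders agents).
    random.shuffle(agents)

    # SELECTION: fitness rewards adaptive knowledge scaled by the
    # environmental payoff, minus the cost of a larger brain.
    for a in agents:
        a["fitness"] = env_payoff * a["knowledge"] - brain_cost * a["brain_size"]

    # BIRTH: the fitter half reproduces; offspring copy parental traits.
    agents.sort(key=lambda a: a["fitness"], reverse=True)
    survivors = agents[: len(agents) // 2]
    return survivors + [dict(a) for a in survivors]
```

Iterating this over many generations, while letting effort allocations and brain size mutate, is what lets the paper's full model explore which learning strategies win under different parameter settings.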

Parameters

The model includes the following parameters:

These parameters are modified for different agents and groups.

Results

What is the result of the simulation? The simulation outputs the following causal relationships:

See 19-22 for relationship strengths and caveats. The general model provides a rather neat picture. Crucially, there are positive feedback loops between larger brains and social learning in the right environment. This in turn pushes towards longer juvenile periods and larger groups.

Cumulative Cultural Evolution

Under the right parameter values, Muthukrishna and co saw that a species which undergoes something like cumulative cultural evolution can be generated. Recall the parameters of the model:

These parameter values offer an explanation of human brain size, social learning capabilities, and general success. Moreover, they may also explain why species like humans are very low in number. In order for there to be a species that invests in social learning at a very high rate:

AI Forecasting and Development

Now, what, if anything, is the import of this for AI forecasting and development? There's an argument that has been floating around for some time that goes something like this:

Humans are vastly more successful in certain ways than other hominids, yet in evolutionary time, the distance between them is small. This suggests that evolution induced discontinuous progress in returns, if not in intelligence itself, somewhere approaching human-level intelligence. If evolution experienced this, this suggests that artificial intelligence research may do also. — AI Impacts

One can push back on the argument above with the claims that:

moderate asocial learning ⇒ social learning

One can imagine machine learning work developing sufficient asocial learning techniques and then ratcheting capabilities forward by combining previous work through social learning techniques (potentially via imitation learning or model transfer, but likely much more sophisticated techniques). On this model, asocial learning (probably made up of a number of different modules and techniques) enables social learning to become a winning strategy. However, social learning is an independent thing; it is not built on top of asocial learning modules.

Related to this, more work needs to be done determining which capacities primarily drive human social learning. Henrich suggests that it is both mindreading and imitation. Tomasello's work stresses mindreading, in particular the ability for humans to develop joint attention. Answers to this issue would provide at least some evidence about which machine learning algorithms are likely to be more successful.

Another issue brought up by this work is whether social learning is an instinct, that is (roughly) whether it is encoded by genes and not brought into existence via culture, or whether it is a gadget. A gadget is not encoded by genetic information, but is instead developed by cultural means. Suggestive work by Heyes argues that social learning capabilities could be developed from humans' temperament, high working memory, and attention capabilities. If this is so, then the development of artificial intelligence sketched above is likely flawed. It may instead look like:

temperament + computation power + working memory + attention ⇒ social learning

On this model, not only does asocial learning enable social learning to be a winning strategy, asocial learning capabilities compose social learning abilities. Social learning is really not that different from asocial learning; it is just a layer built on top of lower-level intelligence systems.
Both of these models suggest that AI development is currently bottlenecked on the asocial learning step. However, once a threshold for asocial learning is reached, intelligence will increase at a vastly quicker rate. There's a lot more to do here. I hope to have persuasively motivated the cumulative cultural evolution story and the idea that it has important upshots for AI development and forecasting.

Comment

https://www.lesswrong.com/posts/K686EFdXysfRBdob2/musings-on-cumulative-cultural-evolution-and-ai?commentId=q93pfgjFx3PQM5jjs

Planned summary: A recent paper develops a conceptual model that retrodicts human social learning. They assume that asocial learning allows you to adapt to the current environment, while social learning allows you to copy the adaptations that other agents have learned. Both can be increased by making larger brains, at the cost of increased resource requirements. What conditions lead to very good social learning? First, we need high transmission fidelity, so that social learning is effective. Second, we need some asocial learning, in order to bootstrap—mimicking doesn't help if the people you're mimicking haven't learned anything in the first place. Third, to incentivize larger brains, the environment needs to be rich enough that additional knowledge is actually useful. Finally, we need low reproductive skew, that is, individuals that are more adapted to the environment should have only a slight advantage over those who are less adapted. (High reproductive skew would select too strongly for high asocial learning.) This predicts pair bonding rather than a polygynous mating structure.

This story cuts against the arguments in Will AI See Sudden Progress? and Takeoff speeds: it seems like evolution "stumbled upon" high asocial and social learning and got a discontinuity in reproductive fitness of species. We should potentially also expect discontinuities in AI development. We can also forecast the future of AI based on this story. Perhaps we need to be watching for the perfect combination of asocial and social learning techniques for AI, and once these components are in place, AI intelligence will develop very quickly and autonomously.

Planned opinion: As the post notes, it is important to remember that this is one of many plausible accounts for human success, but I find it reasonably compelling. It moves me closer to the camp of "there will likely be discontinuities in AI development", but not by much.
I’m more interested in what predictions about AI development we can make based on this model. I actually don’t think that this suggests that AI development will need both social and asocial learning: it seems to me that in this model, the need for social learning arises because of the constraints on brain size and the limited lifetimes. Neither of these constraints applies to AI—costs grow linearly with "brain size" (model capacity, maybe also training time) as opposed to superlinearly for human brains, and the AI need not age and die. So, with AI I expect that it would be better to optimize just for asocial learning, since you don’t need to mimic the transmission across lifetimes that was needed for humans.

Comment

https://www.lesswrong.com/posts/K686EFdXysfRBdob2/musings-on-cumulative-cultural-evolution-and-ai?commentId=gdPmCKqX2ppLznjaM

Awesome, thanks for the super clean summary. I agree that the model doesn’t show that AI will need both asocial and social learning. Moreover, there is a core difference between the growth of the cost of brain size between humans and AI (sublinear [EDIT: super] vs linear). But in the world where AI dev faces hardware constraints, social learning will be much more useful. So AI dev could involve significant social learning as described in the post.

Comment

> Moreover, there is a core difference between the growth of the cost of brain size between humans and AI (sublinear vs linear).

Actually, I was imagining that for humans the cost of brain size grows superlinearly. The paper you linked uses a quadratic function, and also tried an exponential and found similar results.

> But in the world where AI dev faces hardware constraints, social learning will be much more useful.

Agreed if the AI uses social learning to learn from humans, but that only gets you to human-level AI. If you want to argue for something like fast takeoff to superintelligence, you need to talk about how the AI learns independently of humans, and in that setting social learning won't be useful given linear costs.

E.g. suppose that each unit of adaptive knowledge requires one unit of asocial learning. Every unit of learning costs $K, regardless of brain size, so that everything is linear. No matter how much social learning you have, the discovery of N units of knowledge is going to cost $KN, so the best thing you can do is put N units of asocial learning in a single brain/model so that you don't have to pay any cost for social learning.

In contrast, if N units of asocial learning in a single brain costs $KN^2, then having N units of asocial learning in a single brain/model is very expensive. You can instead have N separate brains each with 1 unit of asocial learning, for a total cost of $KN, and that is enough to discover the N units of knowledge. You can then invest a unit or two of social learning for each brain/model so that they can all accumulate the N units of knowledge, giving a total cost that is still linear in N.

I'm claiming that AI is more like the former while this paper's model is more like the latter. Higher hardware constraints only change the value of K, which doesn't affect this analysis.
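The cost comparison in this thread is easy to check numerically. The snippet below just writes out the comment's arithmetic (linear vs quadratic per-brain costs); `total_cost` is a hypothetical helper for illustration, not anything from the paper:

```python
def total_cost(n_units, brains, k=1.0, exponent=1):
    """Cost of acquiring n_units of asocial learning split evenly
    across `brains` brains, where a single brain holding u units
    costs k * u**exponent (exponent=1: linear, 2: quadratic)."""
    per_brain = n_units / brains
    return brains * k * per_brain ** exponent

N = 100

# Linear costs (the claimed AI case): splitting knowledge across
# many brains buys nothing, so one big model is optimal and social
# learning only adds overhead.
assert total_cost(N, brains=1) == total_cost(N, brains=N) == N

# Quadratic costs (the paper's biological case): one brain holding
# all N units costs K*N^2, while N brains with 1 unit each cost
# only K*N, so distributing knowledge and sharing it via social
# learning pays off.
assert total_cost(N, brains=1, exponent=2) == N ** 2
assert total_cost(N, brains=N, exponent=2) == N
```

The same comparison holds for any K, which is why the comment notes that hardware constraints (which only rescale K) don't change the analysis.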

https://www.lesswrong.com/posts/K686EFdXysfRBdob2/musings-on-cumulative-cultural-evolution-and-ai?commentId=kbHg8Aj9bX2TwK5iF

Tomasello’s work stresses mindreading, in particular the ability for humans to carry joint attention [link].

Link seems to be missing.

Comment

https://www.lesswrong.com/posts/K686EFdXysfRBdob2/musings-on-cumulative-cultural-evolution-and-ai?commentId=izgCq9zvwniEZihd4

Thanks, fixed.