Causality and determinism in social science—An investigation using Pearl’s causal ladder

https://www.lesswrong.com/posts/mgA5JQnNeJGJoRaEJ/causality-and-determinism-in-social-science-an-investigation

Contents

Rung 3 - Individual-level causality

The defining feature of rung 3 causality is determinism. Some initial variables go in, and an outcome comes out. The determinism makes it possible to evaluate what outcome you would have had if some of the variables had been different.

The basic building blocks of rung 3 causal models are structural equations, which are equations of the form:

z := f(x, y)

That is, structural equations describe a variable z as a function f of some set of causes x, y, …. When you want to figure out the effect of one of the causes, you do so by computing a counterfactual, where you replace the variable of the cause with a new value. For example, f(1, y) - f(0, y) tells you the effect on z of changing x from 0 to 1, for a given level of y.

Why should we care about rung 3 causality? Here are three reasons:

The inherent nondeterminism of social science

First problem: heterogeneity. Consider, for instance, the case of education improving one’s income. Maybe it improved your income, but at the same time there are some tech startup founders who dropped out of education and got rich creating highly valued tech products. It seems like if they had stayed in the education system for longer, they could plausibly have missed the chance of creating the startup and earning the money. So for them, more education would have reduced their income.

Heterogeneity is not necessarily a problem, if you can predict the heterogeneity. But social scientists can’t do that, due to the second problem.

Second problem: chaos. Causality is about what would have happened if some variable had been different. But most complex systems are highly sensitive to initial conditions, such that even an infinitesimal change to the initial variables will lead to entirely different outputs. (This, by the way, is the reason that weather forecasts only work for a short timespan.) For an example of how chaos appears, you can see this video of many nearly identical double pendulums, showing how even tiny changes can lead them to completely different trajectories:

[Video: many nearly identical double pendulums diverging onto completely different trajectories]

Social scientists can’t predict people’s lives or society’s trajectory, and a large part of this difficulty is probably due to chaos. But if you can’t predict what actually happens, how could you ever hope to predict how things would be different if some variable had been different? Since rung 3 causality requires deterministic relations, this makes rung 3 impossible.
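As a rough illustration of the sensitivity to initial conditions described above, here is a minimal sketch using the logistic map, a standard toy chaotic system. The choice of system, the parameter r = 4, the starting value 0.2, and the perturbation of 1e-10 are all assumptions made for illustration; none of them come from the post. The sketch runs two trajectories whose starting points are almost identical and tracks how far apart they drift.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) and return every state."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


steps = 60
# Two starting points that differ by one part in ten billion (illustrative values).
a = logistic_trajectory(0.2, steps)
b = logistic_trajectory(0.2 + 1e-10, steps)

# Absolute distance between the two trajectories at every step.
gaps = [abs(p - q) for p, q in zip(a, b)]

for t in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {t:2d}: gap = {gaps[t]:.3e}")
```

The gap stays tiny for the first few steps and then grows until the two trajectories are unrelated, which is the same qualitative behaviour as the double pendulums, just in a one-line system.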

Defining rung-2 causality

Let’s consider a concrete example: the genetic heritability of intelligence. I’ll ignore the question of how to measure intelligence (a topic for another blog post) and just assume we have some reasonable measure. As mentioned at the start of the post, almost any other social science topic would work equally well here.

Social scientists claim that intelligence has a substantial genetic component. But this doesn’t make sense in terms of rung 3 causality, because we cannot separate the genetic component from the environmental component, and because intelligence is probably too chaotic to be predictable.

But imagine that we did have a rung 3 causal model of intelligence. We might imagine that it looks like this:

intelligence := f(genes, environment)

Here f is some unknown, presumably very complicated function, which describes the results of the development of human intelligence. In this hypothetical, I’ve split the inputs to f into genes and environment, which you might think of as referring to the DNA passed on to you by your parents at conception, versus the state of the rest of the universe at that time. The "inputs" to f could be split in many ways other than this, and a complete theory would probably split them in a very different way than I describe here, but it’s just a hypothetical example.

If f were linear, it would in principle be straightforward to attribute causation: just use the coefficients associated with each input as your quantities for the strength of causation. (In practice, figuring out the linear coefficients for f would be a difficult undertaking in itself.)

But in reality, f is probably nonlinear and chaotic; you can’t just separate genetics and environment, and you can’t predict someone’s intelligence before they’re born, decades in advance of testing it. This not only makes it hard to figure out what f is, it also makes it hard to define the effect of someone’s genes on their intelligence. Suppose for instance you have two people:

Estimating rung-2 effects

Of course, these sorts of definitions are not immediately very useful. After all, it’s not clear that the previously defined average effect can be estimated through any method other than figuring out the rung 3 deterministic causal model f, and computing the appropriate averages. But this is where statistics and the field of causal inference come in.

For instance, suppose hypothetically your system is determined as an unknown function f of exactly two variables:

z := f(x, y)

In this case, if you have a distribution of data (x, y, z) where x and y are statistically independent, you can estimate the average effect of x and y using linear regression, without knowing the details of f. This allows you to bypass the problem of a potentially highly complex and unpredictable function, and get an idea of some rough pattern of effects.

Of course, often things won’t be as straightforward as this; often the different input variables won’t be statistically independent, and often we won’t even know what all of the different input variables are. The causal inference literature has come up with a toolbox of different methods that can be used under different conditions, though often they cannot quite estimate the average causal effects, but instead only some modified version (which may still tell you about causality, but in a way which is only guaranteed to be accurate for a certain subgroup).

In the specific case of untangling the aggregate effects of genes, a rich set of methods has been invented, including twin studies, adoption studies, within-family polygenic score regression, relatedness disequilibrium regression, and more. All of these have different advantages and disadvantages, and they all rely on different assumptions and provide different kinds of bounds.

The complexity in the previous paragraph gives another important lesson: There is an incredible amount of detail to learn about each topic where you are trying to investigate rung 2 causal effects. You need to learn the assumptions that each method makes, and then learn how plausible those assumptions are (both a priori and according to studies which directly investigate the assumptions), and how big of a skew it would introduce if the assumptions were broken.
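The earlier claim, that regression can recover an average effect when x and y are independent even though f is unknown and nonlinear, can be sketched in a small simulation. Everything specific here is an assumption made for illustration: the function f below is a made-up stand-in for the unknown structural equation, x is taken to be binary, and y is drawn from a standard normal distribution independently of x. In this setup each individual's effect of x is f(1, y) - f(0, y) = y², which differs from person to person, yet the regression slope of z on x lands close to the average of those individual effects.

```python
import math
import random


def f(x, y):
    # Hypothetical nonlinear structural equation, standing in for the unknown f.
    return x * y**2 + math.sin(3 * y)


def ols_slope(xs, zs):
    """Slope of the least-squares line of z on x (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    cov = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 50_000
xs = [rng.randint(0, 1) for _ in range(n)]    # binary cause, independent of y
ys = [rng.gauss(0.0, 1.0) for _ in range(n)]  # second cause, never used in the regression
zs = [f(x, y) for x, y in zip(xs, ys)]

# Average of the individual-level effects; this needs f and is normally unavailable.
true_average_effect = sum(f(1, y) - f(0, y) for y in ys) / n

# Estimate from the observed (x, z) pairs alone, without knowing f or y.
estimated_effect = ols_slope(xs, zs)

print(f"true average effect: {true_average_effect:.3f}")
print(f"regression estimate: {estimated_effect:.3f}")
```

The agreement depends on the independence of x and y. If x were instead generated in a way that depends on y, the same regression would generally give a biased answer, which is where the more specialized methods mentioned above come in.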

In the limit, rung 2 becomes rung 3

Couldn’t we just skip past rung 2 and go directly to rung 3? Not really; as mentioned earlier, chaos might mean that we can’t ever reach rung 3 in social science. But also, even if it wasn’t for chaos, the main way it seems like we could start approaching rung 3 would be to investigate each factor with rung 2 methods, until we find a set of factors that are sufficiently powerful so as to allow us to predict the outcome in a rung 3 manner. After all, we don’t just magically know what variables are relevant until we start investigating, and we can’t look at all variables in the universe at once. As such, if we want to do iterative science, we need to start with rung 2 causality and build upwards, maybe eventually reaching rung 3 under certain circumstances.

Thanks to Justis Mills for proofreading and feedback about the coherence of the post.

Comment

https://www.lesswrong.com/posts/mgA5JQnNeJGJoRaEJ/causality-and-determinism-in-social-science-an-investigation?commentId=wTu8r5jfzDyYXrJqj

> Let’s start with some philosophy of causality, specifically a quick review of Judea Pearl’s Ladder of Causality.

Is this from a particular book by Judea Pearl?

Comment

https://www.lesswrong.com/posts/mgA5JQnNeJGJoRaEJ/causality-and-determinism-in-social-science-an-investigation?commentId=QX8qRBgN7bJhcBfQo

I’m pretty sure that picture is from The Book of Why.

https://www.lesswrong.com/posts/mgA5JQnNeJGJoRaEJ/causality-and-determinism-in-social-science-an-investigation?commentId=oThSmKszyPv6fkFeB

I haven’t read any of Judea Pearl’s books, only various papers he and others have written on causality, some encyclopedia stuff, his twitter and blog. I assume the Ladder of Causality is discussed in the Book of Why, but I don’t know for sure, and I don’t know whether this is the only book in which it has been discussed.

Comment

I asked, partly because I’ve seen some of his lecture notes posted online, that sort of thing, and was wondering if you had a link. (I figured other, more mundane examples might make things a little more clear.)

https://www.lesswrong.com/posts/mgA5JQnNeJGJoRaEJ/causality-and-determinism-in-social-science-an-investigation?commentId=g6Ki9zqKX46A8EGYt

Determinism is not a defining feature of counterfactuals; you can make a stochastic theory of counterfactuals that is a strict generalisation of SEM-style deterministic counterfactuals. See Pearl, Causality (2009), p. 220 "counterfactuals with intrinsic nondeterminism" for the basic idea. It’s a brief discussion and doesn’t really develop the theory but, trust me, such a theory is possible. The basic idea is contained in "the mechanism equations v_i = f_i(pa_i, u_i) lose their deterministic character and hence should be made stochastic."

Comment

https://www.lesswrong.com/posts/mgA5JQnNeJGJoRaEJ/causality-and-determinism-in-social-science-an-investigation?commentId=w8JaoJQyJKnzmmjrr

This doesn’t work in social science. Quoting the book:

> This evaluation can, of course, be implemented in ordinary causal Bayesian networks (i.e., not only in ones that represent intrinsic nondeterminism), but in that case the results computed would not represent the probability of the counterfactual Y_x = y. Such evaluation amounts to assuming that units are homogeneous, with each possessing the stochastic properties of the population—namely, P(v_i | pa_i, u) = P(v_i | pa_i). Such an assumption may be adequate in quantum-level phenomena, where units stands for specific experimental conditions, but it will not be adequate in macroscopic phenomena, where units may differ appreciably from each other.

Emphasis added.

Comment

Pearl is distinguishing "intrinsically nondeterministic" from "ordinary" Bayesian networks, and he is saying that we shouldn’t mix up the two (though I think it would be easier to avoid this with a clearer explanation of the difference). Three questions:

  • Do we need determinism to define counterfactuals? No

  • Is uncertainty represented in causal Bayesian networks typically used in social science limited to "intrinsic nondeterminism"? No, and so we should be careful not to mix them up with "intrinsically nondeterministic" Bayesian networks

  • Is there no intrinsic nondeterminism in any causal Bayesian network relevant to social science? I doubt it

More importantly, the thing you can do with counterfactual models that you can’t do with "ordinary" causal Bayesian networks is you can condition on the results of an action and then change the action (called "abduction"; this is why it would be helpful to have a better explanation of the difference between the two!). This will usually leave a bunch of uncertainty about stuff, which may or may not be intrinsic. You definitely don’t need determinism to do abduction, and I submit that our attitude towards the question of whether the leftover uncertainty is "intrinsic" should often be the same as our usual attitude toward this question: who cares?

Comment

> Is there no intrinsic nondeterminism in any causal Bayesian network relevant to social science? I doubt it

The intrinsic quantum nondeterminism probably mostly gets washed away due to enormous averages. Of course chaos theory means that it eventually gets relevant, but by the time it gets to relevant magnitudes, the standard epistemic uncertainty has already overwhelmed the picture. So I think in comparison to standard epistemic uncertainty, any intrinsic nondeterminism will be negligible in social science.

> More importantly, the thing you can do with counterfactual models that you can’t do with "ordinary" causal Bayesian networks is you can condition on the results of an action and then change the action (called "abduction"; this is why it would be helpful to have a better explanation of the difference between the two!). This will usually leave a bunch of uncertainty about stuff, which may or may not be intrinsic.

I know and agree.

> Do we need determinism to define counterfactuals? No [...] You definitely don’t need determinism to do abduction, and I submit that our attitude towards the question of whether the leftover uncertainty is "intrinsic" should often be the same as our usual attitude toward this question: who cares?

I sort of have some problems with, or objections to, counterfactuals in the presence of intrinsic nondeterminism. E.g. Y_{X=X} might not be equal to Y (and per chaos theory, would under many circumstances never be equal or even particularly close). But since intrinsic nondeterminism isn’t relevant for social science anyway, I just skipped past them.

Comment

Does chaos theory apply at the micro-scale (quantum phenomena) or at the macro-scale?

Comment

Chaos theory applies at all scales. It turns micro-scale uncertainty into macro-scale uncertainty at an exponential rate.