Science aims to come up with good theories about the world—but what makes a theory good? The standard view is that the key traits are predictive accuracy and simplicity. Deutsch focuses instead on the concepts of explanation and understanding: a good theory is an explanation which enhances our understanding of the world. This is already a substantive claim, because various schools of instrumentalism have been fairly influential in the philosophy of science. I do think that this perspective has a lot of potential, and later in this essay I explore some ways to extend it. First, though, I discuss a few of Deutsch’s arguments which I don’t think succeed, in particular when compared to the Bayesian rationalist position defended by Yudkowsky.

To start, Deutsch says that good explanations are "hard to vary", because every part of the explanation is playing a role. But this seems very similar to the standard criterion of simplicity. Deutsch rejects simplicity as a criterion because he claims that theories like "The gods did it" are simple. Yet I’m persuaded by Yudkowsky’s argument that a version of the "The gods did it" theory which could actually predict a given set of data would essentially need to encode all that data, making it very complex. I’m not sold on Yudkowsky’s definition of simplicity in terms of Kolmogorov complexity (for reasons I’ll explain later on), but re-encoding a lot of data should give rise to a complex hypothesis by any reasonable definition. So it seems most parsimonious to interpret the "hard to vary" criterion as an implication of the simplicity criterion.

Secondly, Deutsch says that good explanations aren’t just predictive, but rather tell us about the underlying mechanisms which generate those predictions. As an illustration, he argues that even if we can predict the outcome of a magic trick, what we really want to know is how the trick works. But this argument doesn’t help very much in adjudicating between scientific theories—in practice, it’s often valuable to accept purely predictive theories as stepping-stones to more complete theories. For example, Newton’s inverse square law of gravity was a great theory despite not attempting to explain why gravity worked that way; instead it paved the way for future theories which did so (and which also made better predictions). If Deutsch is just arguing that eventually science should aim to identify all the relevant underlying mechanisms, then I think that most scientific realists would agree with him. The main exception would be in the context of foundational physics. Yet that’s a domain in which it’s very unclear what it means for an underlying mechanism to "really exist"; it’s so far removed from our everyday intuitions that Deutsch’s magician analogy doesn’t seem very applicable.

Thirdly, Deutsch says that we can understand the importance of testability in terms of the difference between good and bad explanations:
"The best explanations are the ones that are most constrained by existing knowledge – including other good explanations as well as other knowledge of the phenomena to be explained. That is why testable explanations that have passed stringent tests become extremely good explanations." But this doesn’t help us distinguish between explanations which have themselves been tested, versus explanations which were formulated afterwards to match the data from those same tests. Both are equally constrained by existing knowledge—why should we be more confident in the former? Without filling in this step of the argument, it’s hard to understand the central role of testability in science. I think, again, that Yudkowsky provides the best explanation: that the human tendency towards hindsight bias means we dramatically overestimate how well our theories explain observed data, unless we’re forced to make predictions in advance.Having said all this, I do think that Deutsch’s perspective is valuable in other ways. I was particularly struck by his argument that the "theory of everything" which fundamental physicists search for would be less interesting than a high-level "theory of everything" which forges deep links between ideas from many disciplines (although I wish he’d say a bit more about what it means for a theory to be "deep"). This argument (along with the rest of Deutsch’s framework) pushes back against the longstanding bias in philosophy of science towards treating physics as the central example of science. In particular, thinking of theories as sets of equations is often appropriate for physics, but much less so for fields which are less formalism-based—i.e. almost all of them.[0] For example, the theory of evolution is one of the greatest scientific breakthroughs, and yet its key insights can’t be captured by a formal model. In Chapman’s terminology, evolution and most other theories are somewhat nebulous. This fits well with Deutsch’s focus on science as a means of understanding the world—because even though formalisms don’t deal well with nebulosity, our minds do.Another implication of the nebulosity of scientific theories is that we should move beyond the true-false dichotomy when discussing them. Bayesian philosophy of science is based on our credences about how likely theories are to be true. But it’s almost never the case that high-level theories are totally true or totally false; they can explain our observations pretty well even if they don’t account for everything, or are built on somewhat leaky abstractions. And so assigning probabilities only to the two outcomes "true" and "false" seems simplistic. I still consider probabilistic thinking about science to be valuable, but I expect that thinking in terms of degrees of truth is just as valuable. And the latter comes naturally from thinking of theories as explanations, because we intuitively understand that the quality of explanations should be evaluated in a continuous rather than binary way.[1]Lastly, Deutsch provides a good critique of philosophical positions which emphasise prediction over explanation. He asks us to imagine an "experiment oracle" which is able to tell us exactly what the outcome of any specified experiment would be: "If we gave it the design of a spaceship, and the details of a proposed test flight, it could tell us how the spaceship would perform on such a flight. But it could not design the spaceship for us in the first place. 
And even if it predicted that the spaceship we had designed would explode on take-off, it could not tell us how to prevent such an explosion. That would still be for us to work out. And before we could work it out, before we could even begin to improve the design in any way, we should have to understand, among other things, how the spaceship was supposed to work. Only then would we have any chance of discovering what might cause an explosion on take-off. Prediction – even perfect, universal prediction – is simply no substitute for explanation." Although I assume it isn’t intended as such, this is a strong critique of Solomonoff induction, a framework which Yudkowsky defends as an idealised model for how to reason. The problem is that the types of hypotheses considered by Solomonoff induction are not explanations, but rather computer programs which output predictions. This means that even a hypothesis which is assigned very high credence by Solomonoff induction might be nearly as incomprehensible as the world itself, or more so—for example, if it merely consists of a simulation of our world. So I agree with Deutsch: even idealised Solomonoff induction (with infinite compute) would lack some crucial properties of explanatory science.[2]
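To make that contrast concrete, here is a minimal toy sketch of Solomonoff-style weighting (my own illustration, not anything from Deutsch or Yudkowsky: four hand-written predictors with stipulated "lengths" stand in for the space of all programs, which is what real, uncomputable Solomonoff induction ranges over). The thing to notice is that the surviving hypothesis is a program that predicts, not an explanation of why the sequence behaves as it does.

```python
# Toy Solomonoff-style induction: hypotheses are programs that predict the
# next bit, weighted by 2^(-length) and filtered by consistency with data.

# Each "program" maps an observation history (a list of bits) to a predicted next bit.
programs = {
    "always_0":    lambda history: 0,
    "always_1":    lambda history: 1,
    "repeat_last": lambda history: history[-1] if history else 0,
    "alternate":   lambda history: 1 - history[-1] if history else 0,
}

# Stipulated description lengths (in bits); Solomonoff's prior gives a
# program of length L the weight 2^(-L).
lengths = {"always_0": 2, "always_1": 2, "repeat_last": 3, "alternate": 3}
prior = {name: 2.0 ** -lengths[name] for name in programs}

def posterior(observations):
    """Keep programs consistent with the data and renormalise their weights."""
    surviving = {}
    for name, program in programs.items():
        history = []
        consistent = True
        for bit in observations:
            if program(history) != bit:
                consistent = False
                break
            history.append(bit)
        if consistent:
            surviving[name] = prior[name]
    total = sum(surviving.values())
    return {name: weight / total for name, weight in surviving.items()}

# Only "alternate" survives the data below — but it is a program, not an
# explanation of *why* the bits alternate.
print(posterior([0, 1, 0, 1]))  # {'alternate': 1.0}
```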
Extending the view of science as explanation
How could Deutsch’s identification of the role of science as producing human-comprehensible explanations actually improve science in practice? One way is by making use of the social science literature on explanations. Miller identifies four overarching lessons:
- Explanations are contrastive — they are sought in response to particular counterfactual cases.
- Explanations are selected (in a biased manner) — humans are adept at selecting one or two causes from a sometimes infinite number of causes to be the explanation.
- Referring to probabilities or statistical relationships in explanation is not as effective as referring to causes.
- Explanations are social — they are a transfer of knowledge, presented as part of a conversation or interaction, and are thus presented relative to the explainer’s beliefs about the explainee’s beliefs.

We can apply some of these lessons to improve scientific explanations. Consider that scientific theories are usually formulated in terms of existing phenomena. But to formulate properly contrastive explanations, science will need to refer to counterfactuals. For example, in order to fully explain the anatomy of an animal species, we’ll need to understand other possible anatomical structures, and the reasons why those didn’t evolve instead. Geoffrey West’s work on scaling laws in biology provides a good example of this type of explanation. Similarly, we shouldn’t think of fundamental physics as complete until we understand not only how our universe works, but also which counterfactual laws of physics could have generated other universes as interesting as ours.

A second way we can try to use Deutsch’s framework to improve science: what does it mean for a human to understand an explanation? Can we use findings from cognitive science, psychology or neuroscience to make suggestions for the types of theories scientists work towards? This seems rather difficult, but I’m optimistic that there’s some progress to be made. For example, analogies and metaphors play an extensive role in everyday human cognition, as highlighted by Lakoff’s Metaphors We Live By. So instead of thinking about analogies as useful ways to communicate a scientific theory, perhaps we should consider them (in some cases) to be a core part of the theory itself. Focusing on analogies may slightly reduce those theories’ predictive power (because it’s hard to cash out analogies in terms of predictions) while nevertheless increasing the extent to which they allow us to actually understand the world. I’m reminded of the elaborate comparison between self-reference in mathematics and self-replication in biology drawn by Hofstadter in Gödel, Escher, Bach—if we prioritise a vision of science as understanding, then this sort of work should be much more common. However, the human tendency towards hindsight bias is a formidable opponent, and so we should always demand that such theories also provide novel predictions, in order to prevent ourselves from generating an illusion of understanding.

[0] As an example of this bias, see the first two perspectives on scientific theories discussed here; my position is closest to the third, the pragmatic view.

[1] Work on logical induction and embedded agency may partly address this issue; I’m not sure.

[2] I was originally planning to go on to discuss Deutsch’s broader critiques of empiricism and induction. But Deutsch makes it hard to do this, because he doesn’t refer very much to the philosophical literature, or specific people whose views he disagrees with. It seems to me that this leads to a lot of linguistic disagreements. For example, when he critiques the idea of knowledge being "derived" from experience, or scientific theories being "justified" by empirical experience, I feel like he’s using definitions of these terms which diverge both from what most people take them to mean, and also from what most philosophers take them to mean. Nor do I think that his characterisation of observation as theory-laden is inconsistent with standard inductivism; he seems to think it is, but doesn’t provide evidence for that. So I’ve decided not to go deeper on these issues, except to note my skepticism about his position.
Indeed. Solomonoff Inductors contain computer programmes, not explanations, not hypotheses and not beliefs. That makes it quite hard to understand the sense in which they are dealing with probability.
Probability of what? Hypotheses and beliefs have a probability of being true, of succeeding in corresponding. What does it mean to say that one programme is more probable than another? That it is short? A shorter bitstring is more likely to be found in a random sequence, but what has that to do with constructing a true model of the universe?
If you are dealing with propositions instead of programmes, it is easy to explain the relationship between simplicity and probability-of-corresponding-to-reality: the probability of a small conjunction of propositions is generally higher than that of a large one.
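Spelled out as a worked step (a standard probability fact, added here purely as illustration, not part of the original exchange): adding a conjunct can never raise the probability of a conjunction, so a hypothesis built from fewer propositions starts out at least as probable as one built from more.

```latex
% Adding a conjunct can never raise the probability of a conjunction:
P(A_1 \wedge \cdots \wedge A_n \wedge B) \;\le\; P(A_1 \wedge \cdots \wedge A_n)
```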
This old post seems relevant. (No need to reply, just putting it here for posterity.)
So... someone pointed out the shortcomings of a core LW belief 10 years ago... and nothing much happened. In accordance with the usual pattern. As I keep saying. But nothing much happens when I do that, either.
I’m sympathetic, as this has happened to me too. Have you considered writing a post laying out your beliefs? I expect that this will be easier than comments for people to substantively engage with.
Ok, but that isn’t answering the question. I know that shortness is the criterion for saying that a programme is probable. The question is about the upshot, what that means... other than shortness. If the upshot is that a short programme is more likely to correspond to reality, then SI is indeed formalised epistemology. But why should an SI have the ability to correspond to reality, when the only thing it is designed to do is predict observations? And how can a programme correspond when it is not semantically interpretable?
Maybe it’s a category error to say of programmes that they have some level of probability.
To summarise, I interpret TAG as saying something like "when SI assigns a probability of x to a program P, what does that mean; how can we cash that out in terms of reality?" And Vaniver is saying "It means that, if you sum up the probabilities assigned to all programs which implement roughly the same function, then you get the probability that this function is ‘the underlying program of reality’" (see the sketch after this list for a toy version of that aggregation). I think there are three key issues with this response (if I’ve understood it correctly):
1. It is skipping all the hard work of figuring out which functions are roughly the same. This is a difficult unsolved (and maybe unsolvable?) problem, which is, for example, holding back progress on FDT.
2. It doesn’t actually address the key problem of epistemology. We’re in a world, and we’d like to know lots of things about it. Solomonoff induction, instead of giving us lots of knowledge about the world, gives us a massive Turing machine which computes the quantum wavefunction, or something, and then outputs predictions for future outputs. For example, let’s say that previous inputs were the things I’ve seen in the past, and the predictions are of what I’ll see in the future. But those predictions might tell us very few interesting things about the world! For example, they probably won’t help me derive general relativity. In some sense the massive Turing machine contains the fact that the world runs on general relativity, but accessing that fact from the Turing machine might be even harder than accessing it by studying the world directly. (Relatedly, see Deutsch’s argument (which I quote above) that even having a predictive oracle doesn’t "solve" science.)
3. There’s no general way to apply SI to answer a bounded question with a sensible bounded answer. Hence, when you say "you can make your stable of hypotheses infinitely large", this is misleading: programs aren’t hypotheses, or explanations, in the normal sense of the word, for almost all of the questions we’d like to understand.
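For concreteness, here is a minimal sketch of the aggregation Vaniver seems to be proposing, assuming away issue 1 by simply stipulating which programs implement "roughly the same" function (the program names, lengths and groupings below are invented for illustration):

```python
from collections import defaultdict

# Toy "programs": (name, length_in_bits, function_implemented).
# Deciding which programs implement "roughly the same" function is exactly
# the hard, unsolved part; here the grouping is stipulated by hand.
programs = [
    ("p1", 3, "parity"),
    ("p2", 5, "parity"),     # a longer program computing the same function
    ("p3", 4, "constant_0"),
    ("p4", 6, "majority"),
]

# Weight each program by 2^(-length), then normalise.
weights = {name: 2.0 ** -length for name, length, _ in programs}
total = sum(weights.values())

# The probability of a *function* is the summed, normalised weight of all
# programs implementing it.
function_prob = defaultdict(float)
for name, _, function in programs:
    function_prob[function] += weights[name] / total

print(dict(function_prob))
```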
I agree with those issues. I think the way you expressed issue 3 makes it too much of a clone of issue 1; if I tell you the bounds for the question in terms of programs, then I think there is a general way to apply SI to get a sensible bounded answer. If I tell you the bounds in terms of functions, then there would be a general way to incorporate that info into SI, *if* you knew how to move between functions and programs. The way I think about those issues that (I think?) separates them more cleanly is that we both have to figure out the ‘compression’ problem of how to consider ‘models’ as families of programs (at some level of abstraction, at least) and the ‘elaboration’ problem of how to repopulate our stable of candidates when we rule out too many of the existing ones. SI bypasses the first and gives a trivial answer to the second, but a realistic intelligence will have interesting answers to both.
And it’s also unclear, to say the least, that the criterion that an SI uses to prefer and discard hypotheses/programmes actually is a probability, despite being labelled as such.
You still haven’t answered the question "probability of what?".
You have a process that assigns a quantity to a thing. The details of how the quantity gets assigned are not the issue. The issues are whether the quantity, which you have called a probability, actually is a probability, and whether the thing you are treating as a model of reality is actually such a model, in the sense of scientific realism, or merely something that churns out predictions, in the sense of instrumentalism.
Labelling isn’t enough.
You haven’t shown that it has any such ability. Prediction is not correspondence.
...and casually equating programmes and hypotheses and casually equating prediction and correspondence...
The fact that Bayesians don’t have a hypothesis space containing every possible hypothesis, combined with the fact that they also don’t have a method of hypothesis formation, is a problem... but it’s not the problem I am talking about today.
"why should an SI have the ability to correspond to reality, when the only thing it is designed to do is predict observations?"
You still haven’t told me. It’s possible for a predictive theory to fail to correspond, so there is no link of necessity between prediction and correspondence.
Maybe it’s a category error to say of programmes that they have some level of probability.
What I care about is finding a correct ontological model of reality. Caring about which programmes predict upcoming tokens is a means to that end. There is a well defined and conventional sense in which upcoming tokens have a certain probability, because the arrival of a token is an event, and conventional probability theory deals with events.
But the question is about the probability of the programme itself.
Even if programmes are actually making distinct claims about reality, which has not been shown, some "integration" of different programmes is not going to be a clear model!
No. In general it’s possible for completely different algorithms to produce equivalent results.
But are you saying that the "deep structure" is the ontological content?
What are the hard and easy problems? Realism and instrumentalism? I haven’t said that SI is incapable of instrumentalism (prediction). Indeed, that might be the only thing it can do.
I think the mathematical constraints are clearly insufficient to show that something is a probability, even if they are necessary. If I have a cake of 1m^2 and I cut it up, then the pieces sum to 1. But pieces of cake aren’t probabilities.
So every hypothesis has the same probability of "not impossible". Well, no, several times over. You haven’t shown that programmes are hypotheses, and what an SI is doing is assigning different non-zero prior probabilities, not a uniform one, and it is doing so based on programme length, although we don’t know that reality is a programme, and so on.
Do you think scientists are equally troubled?
Even if I no longer have an instrumental need for something, I can terminally value it.
But it isn’t about me.
The rationalist sphere in general values realism, and makes realistic claims. Yudkowsky has made claims about God not existing, and MWI being true, that are explicitly based on SI-style reasoning. So the cat is out of the bag… SI cannot be defended as something that was only ever intended as an instrumentalist predictor without walking back those claims.
You’re saying realism is an illusion? Maybe that’s your philosophy, but it’s not the Less Wrong philosophy.
It’s obvious that it could, but so what?
SI cannot generate realistic hypotheses about uncomputable universes, but it doesn’t follow that it can generate realistic hypotheses about computable universes.
The fact that an SI must sort and filter candidate functions does not mean it is doing so according to probability.
Given the assumptions that you have an infinite number of programmes, and that you need to come to a determinate result in finite time, you need to favour shorter programmes. That’s a reasonable justification for the operation of an SI which happens to have nothing to do with truth or probability or reference or realism. (You lapsed into describing the quantity an SI sorts programmes by as "probability"... that has not, of course, been established.)
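For what it’s worth, the textbook way to make this "must favour shorter programmes" constraint precise (a standard fact about prefix-free codes, not something established in this thread) is the Kraft inequality:

```latex
% For a prefix-free set of programs P, weighting each program p by
% 2^{-\ell(p)} gives a convergent total:
\sum_{p \in P} 2^{-\ell(p)} \;\le\; 1
% whereas any uniform positive weight over infinitely many programs would
% sum to infinity, so a normalisable weighting must shrink with length.
```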
You haven’t shown that an SI is capable of anything deep and territorial. After all, it’s only trying to predict observations.
I currently don’t see all that much value in responding to "You haven’t shown / established" claims; like, SI is what it is, you seem to have strong opinions about how it should label particular things, and I don’t think those opinions are about the part of SI that’s interesting, or about why it’s only useful as a hypothetical model (I think attacks from this angle are more compelling on that front). If you’re getting value out of this exchange, I can give responding to your comments another go, but I’m not sure I have new things to say about the association between observations and underlying reality or aggregation of possibilities through the use of probabilities. (Maybe I have elaborations that would either more clearly convey my point, or expose the mistakes I’m making?)
It isn’t a tool for anybody, because it’s uncomputable. Whatever interest it has must be theoretical.
I’m responding to claims that SI can solve long-standing philosophical puzzles such as the existence of God or the correct interpretation of quantum mechanics. The claims have been made, and they have been made here, but they may not have been made by you.
The claim has been made, even if you don’t believe it.
Rationalists don’t consistently believe that, because if they did, they would be indifferent about MW versus Copenhagen, since all interpretations make the same predictions. LessWrongian epistemology isn’t even consistent.
If you can have a non-empirical reason to believe in non-interacting branches of the universal wave function, your theist opponents can have a non-empirical reason to believe in non-interacting gods.
Of course not. SI can’t tell you why simplicity matters, epistemologically. At the same time, it is clear that simplicity is no additional help in making predictions. Once you have filtered out the non-predictive programmes, the remaining ones are all equally predictive… so whatever simplicity is supplying, it isn’t extra predictiveness. The obvious answer is that it’s some ability to show that, out of N equally predictive theories, one corresponds to reality.
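In symbols (my own gloss, using the usual 2^{-length} prior rather than anything stated in the thread): if two programs assign the same likelihood to every observation, no amount of data moves their posterior ratio, so only the length prior separates them.

```latex
\frac{P(p_1 \mid D)}{P(p_2 \mid D)}
  = \frac{P(D \mid p_1)\, 2^{-\ell(p_1)}}{P(D \mid p_2)\, 2^{-\ell(p_2)}}
  = 2^{\,\ell(p_2) - \ell(p_1)}
\quad\text{whenever } P(D \mid p_1) = P(D \mid p_2).
```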
That’s a standard defence of Occam’s razor. It isn’t given by SI, as we have seen. SI just needs the simplicity criterion in order to be able to spit something out.
But there are other defenses of Occam’s razor.
And the traditional versions don’t settle everything in favour of MWI and against (sophisticated versions of) God... those are open questions.
And SI isn’t a new, improved version of Occam’s razor. In fact, it is unable to relate simplicity to truth.
These old problems are open problems because we can’t agree on which kind of simplicity is relevant. SI doesn’t help, because it introduces yet another simplicity measure. Or maybe two: the speed prior and the space prior.
Wrongly conflates Copenhagen with Objective Reduction.
Wrongly assumes MW is the only alternative to "Copenhagen".