How much work can Mill’s method of difference do?

I have a new paper coming out in the European Journal for Philosophy of Science, and a preprint is available on the PhilSci Archive.

One of the basic ideas in scientific methodology is that in experiments you should “vary one thing at a time while keeping everything else constant”. This is often called Mill’s method of difference, after John Stuart Mill’s influential formulation of the principle in his System of Logic of 1843. Like many great ideas (think of natural selection), the method of difference can be explained to a second grader in two minutes, and yet the more one thinks about it, the more interesting it becomes.

In his 1991 book on inference to the best explanation (IBE), the late Peter Lipton made the descriptive claim that the method of difference is used widely across much of science, and this seems correct to me. But he also argued that the method is actually much less powerful than we think. In principle, we would like to vary one factor (and one factor only), observe a difference in some outcome, and then conclude that the factor we varied is the cause of the difference. But of course this depends on some rather steep assumptions.

First, we need to be sure that only one factor has changed; otherwise the inference fails, and such failures do happen. But how do we ever know that there is only one difference? This is what Lipton called the problem of multiple differences.

Second, we may sometimes wish to conduct experiments where the factor which varies is unobserved or unobservable. For instance, John Snow inferred in the 19th century that local differences in cholera outbreaks in London were caused by a difference in the water supplied by two different companies. However, Snow could not actually observe this difference in the water supply (what we now know was a difference in the presence of the bacterium Vibrio cholerae). So Snow inferred causality even though the relevant initial difference was itself only inferred. This is what Lipton called the problem of inferred differences.

Lipton proposed elegant and clever solutions to both problems. He argued that the method of difference is to some extent mere surface action. Beneath the surface, scientists actually judge the explanatory power of various hypotheses, and this is crucial to inferences based on the method of difference. So Snow may not have known that an invisible agent in part of the water supply caused cholera, or that this was the only relevant difference between the water supplies. But he could judge that if such an agent existed, it would provide a powerful explanation of many known facts. To make it easier to discuss such judgments about the “explaininess” of hypotheses, Lipton introduced the “loveliness” of explanations as a technical term. On his account, loveliness comprises many familiar explanatory virtues, for instance unification and the provision of mechanisms. Snow’s explanation is lovely because it would unify multiple known facts: that cholera rates correlate with water supply, that those who got the bad water at their houses but didn’t drink it didn’t get sick, that the problematic water supply underwent less filtration, and so on. An invisible agent would moreover provide a mechanism for how a difference in water supply could cause a difference in disease outcomes, which would further increase the loveliness of Snow’s explanation. Ultimately, Lipton would argue, Snow’s causal inference relied on these explanatory judgments and not on the method of difference “taken neat” (to use Lipton’s phrase).

I have great sympathy for Lipton’s overall project. But I am also convinced that in many experimental studies there are ways to handle Lipton’s two problems that do not rely on an IBE framework. In my paper, therefore, I take a closer look at his main case study, Semmelweis on childbed fever, to find out how the problems of multiple and inferred differences were actually addressed. The result is that multiple differences can be dealt with to some extent by understanding control experiments correctly, and inferred differences become less of an issue once we understand how unobservables are often made detectable. The motto, if there is one, is that we always use true causes (once found) to explain, but that explanatory power is not our guide to whether causes are true. The causal inference crowd will find none of this particularly deep, but within the small debate about the relationship between the method of difference and IBE, these points seemed worth making.