The spirit of HPS (a love letter)

Last June I was in Vienna for the fifth conference on Integrated History and Philosophy of Science (&HPS5). It was an immensely enjoyable event. Towards the end of the conference, during the very last talk that I saw before I had to leave for the airport, I rediscovered my love for HPS. Here’s how it happened.

The beginning was inauspicious. The speaker had made slides with LaTeX, so they were heavy on text.1 What is more, she recited those slides word for word, which is usually considered bad presentation technique. But here’s the surprising thing: it worked brilliantly. Because of the exact parallelism between the slides and the spoken word, it was easy to follow the speaker’s arguments and evidence. Many presentations go off the rails because the audience doesn’t know whether to focus on the slides or the spoken word. That wasn’t a problem in this case.

The story started simply enough. There’s a famous biomedical discovery from the 1980s that led to a Nobel prize: the fact that gastric ulcers are caused by infection with Helicobacter pylori. The episode is reasonably well researched in HPS, so we know something about who discovered what, when, and where, and how additional research established the finding beyond reasonable doubt. But the speaker asked an interesting counterfactual question: Why was the discovery not made before the 1980s? The conditions should have been right earlier. On the face of it, there was no good reason for the delay. In terms of concepts and methods, the discovery could have been made in the 1950s. So why wasn’t it?

Here’s where things became interesting. A big part of the problem was a mistaken assumption: that the stomach is sterile because of its high acid content. The speaker began by asking the most obvious questions. Perhaps there was good empirical warrant for believing in a sterile stomach? Perhaps the techniques for detecting certain types of bacteria did not exist prior to the 1980s? Or if they existed, perhaps they were not routinely used? Perhaps an earlier study had made other causes of gastric ulcers very likely? These are good, solid epistemological questions that, I think, must always be asked first. In general, scientists are good at science.

But when none of these explanations seemed right, she opened up the list of possibilities. Could it be that we have an instance here of a sociological rather than an epistemological process? Maybe epidemiologists in the 1950s felt that the search for infectious etiologies belonged to an “old paradigm” and was no longer worth pursuing? Or perhaps some gastroenterologists who rejected the infectious etiology of gastric ulcers had undue influence? Could it be that a study claiming that the stomach is sterile was cited more and more but questioned less and less? Or maybe the treatment of gastric ulcers only became big business in the 1980s, which made it more attractive to do research on the disease? Clearly, there are many non-epistemic considerations that may have been in play.

I like this plurality of questions. Historians of science remain (on the whole) captivated by the social conditions of science, while philosophers are (on the whole) enraptured by highly abstract formal problems. It is up to HPS to ask the whole range of pertinent questions about the scientific process: to produce an adequate understanding of how science actually works, from the epistemology of experiments to the social organization of inquiry. To me, this is what HPS is all about. I left Vienna at peace with my discipline.2

If you are interested, the talk was based on a paper by Dunja Šešelja and Christian Straßer, which is now published in Acta Biotheoretica. Note: The paper’s focus differs from the talk; it is mostly about whether the bacterial hypothesis of ulcer causation was “worthy of pursuit” from the 1950s to the 1980s, with much less focus on the broader questions discussed above.


  1. I think LaTeX is great for writing essays, papers and books — I even force my students to learn the system as a kind of tough love measure. But I don’t think it’s a good tool for presentations: it’s not sufficiently visual to produce interesting results, and it encourages a number of bad presentation habits.
  2. Of course, I never knew the old Vienna before the war with its Strauss music, its glamour and easy charm – and Popper (not yet Sir Karl) telling you how science is really done.

How much work can Mill’s method of difference do?

I have a new paper coming out in the European Journal for Philosophy of Science, and here’s a link to a preprint on the PhilSci archive.

One of the basic ideas in scientific methodology is that in experiments you should “vary one thing at a time while keeping everything else constant”. This is often called Mill’s method of difference due to John Stuart Mill’s influential formulation of the principle in his System of Logic of 1843. Like many great ideas (think of natural selection), the method of difference can be explained to a second grader in two minutes – and yet the more one thinks about it, the more interesting it becomes.

The late Peter Lipton in his 1991 book on inference to the best explanation (IBE) made the descriptive claim that the method of difference is used widely in much of science, and this seems correct to me. But he also argued that the method is actually much less powerful than we think. In principle, we would like to vary one factor (and one factor only), observe a difference in some outcome, and then conclude that the factor we varied is the cause of the difference. But of course this depends on some rather steep assumptions.

First, we need to be sure that only one factor has changed; otherwise the inference does not succeed. But how do we ever know that there is only one difference? This is what Lipton called the problem of multiple differences.
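As a rough illustration, the bare logic of the method, and the way multiple differences block it, can be sketched in a few lines of Python. This is only a toy under stated assumptions: the function names, the factor names and the three experimental runs are hypothetical, invented for this example and not taken from Mill, Lipton or any real study.

```python
def differing_factors(case_a, case_b):
    """Return the sorted names of factors whose values differ between two cases."""
    all_factors = set(case_a) | set(case_b)
    return sorted(f for f in all_factors if case_a.get(f) != case_b.get(f))


def method_of_difference(case_a, case_b, outcome):
    """Return the single differing factor as candidate cause, or None.

    None means the inference is not licensed: either the outcomes do not
    differ, or the number of recorded differences is not exactly one.
    """
    if case_a[outcome] == case_b[outcome]:
        return None  # no difference in outcome to explain
    factors_a = {k: v for k, v in case_a.items() if k != outcome}
    factors_b = {k: v for k, v in case_b.items() if k != outcome}
    diffs = differing_factors(factors_a, factors_b)
    return diffs[0] if len(diffs) == 1 else None


# Hypothetical experimental runs (illustrative values only).
run_1 = {"nutrient": "added", "temperature": 37, "growth": True}
run_2 = {"nutrient": "absent", "temperature": 37, "growth": False}
run_3 = {"nutrient": "absent", "temperature": 25, "growth": False}

print(method_of_difference(run_1, run_2, "growth"))  # nutrient
print(method_of_difference(run_1, run_3, "growth"))  # None
```

Note that the sketch can only compare the factors that were written down. A difference nobody recorded is invisible to it, which is exactly why the problem of multiple differences cannot be solved by bookkeeping alone.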

Second, we may sometimes wish to conduct experiments where the factor which varies is unobserved or unobservable. For instance, John Snow inferred in the 19th century that local differences in cholera outbreaks in London were caused by a difference in the water supplied by two different companies. However, Snow could not actually observe this difference in the water supply (what we now know was a difference in the presence of the bacterium Vibrio cholerae). So Snow inferred causality even though the relevant initial difference was itself only inferred. This is what Lipton called the problem of inferred differences.

Lipton proposed elegant and clever solutions to both problems. He argued that the method of difference is to some extent mere surface action. Beneath the surface, scientists actually judge the explanatory power of various hypotheses, and this is crucial to inferences based on the method of difference. So Snow may not have known that an invisible agent in part of the water supply caused cholera, or that this was the only relevant difference between the water supplies. But he could judge that if such an agent existed, it would provide a powerful explanation of many known facts. In order to make it easier to discuss such judgments about the “explaininess” of hypotheses, Lipton introduced the “loveliness” of explanations as a technical term. Loveliness on his account comprises many common notions about explanatory virtues: for instance, unification and mechanisms. Snow’s explanation is lovely because it would unify multiple known facts: that cholera rates correlate with water supply, that those who got the bad water at their houses but didn’t drink it didn’t get sick, that the problematic water supply underwent less filtration, and so on. An invisible agent would moreover provide a mechanism for how a difference in water supply could cause a difference in disease outcomes, which would again increase the loveliness of Snow’s explanation. Ultimately, Lipton would argue, Snow’s causal inference relied on these explanatory judgments and not on the method of difference “taken neat” (to use Lipton’s phrase).

I have great sympathy for Lipton’s overall project. But I am also convinced that in many experimental studies there are ways to handle Lipton’s two problems that do not rely on an IBE framework. In my paper, therefore, I take a closer look at his main case study — Semmelweis on childbed fever — to find out how the problems of multiple and inferred differences were actually addressed. The result is that multiple differences can be dealt with to some extent by understanding control experiments correctly; and inferred differences become less of an issue if we understand how unobservables are often made detectable. The motto, if there is one, is that we always use true causes (once found) to explain, but that explanatory power is not our guide to whether causes are true. The causal inference crowd will find none of this particularly deep: but within the small debate about the relationship between the method of difference and IBE, these points seemed worth making.