The limits of my language and the limits of my world

The Language Hoax by John H. McWhorter is a book about the Sapir-Whorf hypothesis: the notion that languages deeply affect the way in which their speakers conceptualize the world. To give a few examples, Russian makes a distinction between lighter blues and darker blues that English (like German and French) doesn’t make. Do Russian speakers therefore have a richer perception of blue than English speakers? Similarly, German and French assign genders to objects, which may lead German speakers to assign male qualities to tables (sturdy?) while French speakers assign feminine qualities to them (supportive?). More interestingly perhaps, the male gender is somewhat dominant in European languages: for instance, the third person plural in French is “ils” even if there are women in a group. So does this lay the conceptual basis for a kind of sexism? As a final example, it is certainly plausible (and fun) to speculate that speakers of languages without a future tense might conceptualize the future, and plan for it, in a way that is completely different from our own. Thus, the potential reach of the Sapir-Whorf hypothesis is vast: it ranges from the relatively innocuous (color perception) to the socially charged (gender roles) to the conceptually profound (our very notion of time).

In this brief but rich book, McWhorter argues that the available empirical evidence speaks against any strong version of the Sapir-Whorf hypothesis. True, Russian speakers can distinguish certain shades of blue more quickly than speakers of other languages, but the differences are small in absolute terms.1 Yes, we can spin tales about the impact of linguistic peculiarities on cultural traits in some subpopulations: for instance, McWhorter discusses attempts to link obligatory “evidential markers” (I see/I hear/they say) with particularly skeptical attitudes towards knowledge. However, he shows that this correlation breaks down (like many similar ones) when we extend our data set to a larger sample of languages and cultures: we then find cultural skepticism in languages without evidential markers and evidential markers in cultures with little skepticism.2 Now, such counterexamples are not conclusive: they could be explained if evidential markers sometimes cause a particularly skeptical attitude and skeptical attitudes also have other, independent causes. But the counterexamples certainly show that any strong assumptions about language “structuring” thought are doubtful. The main point is that it is easy to come up with “just so stories” that link linguistic habits and cultural traits,3 but we need demonstrations of actual causality and deep cognitive effects. According to McWhorter, the consensus among professional linguists is that such demonstrations have not succeeded; language does have an impact on cognition, but these effects are relatively weak.

In addition to the empirical and methodological points, McWhorter argues that many in the humanities are drawn to the Sapir-Whorf hypothesis for the wrong reasons. The inclination is to think that demonstrating the richness of foreign linguistic concepts is to counteract a kind of Western cultural hegemony. But of course this can backfire if we find that there are some quite nifty things for which Western languages seem to be better equipped than others. For instance, the English language marks the hypothetical and counterfactual more explicitly than Mandarin Chinese. So do the Chinese have an impoverished sense of the hypothetical? If you are worried about Western cultural hegemony, you won’t find this thesis attractive. McWhorter takes the view that the Whorfian approach is the wrong way to argue for human equality. We should instead recognize the essential similarity of all human thought — which just happens to be expressed in different linguistic forms:

We are told that what languages teach us about being human is how different we are. Actually, languages’ lesson for us is more truly progressive – that our differences are variations on being the same. Many would consider that something to celebrate. (p. 168)

I certainly do.

Aside from its linguistic interest, the Sapir-Whorf hypothesis has a relationship to a topic in the history and philosophy of science (and this is why I, as a tradesman, was initially interested). Namely, it touches on the question of scientific realism: should we trust scientific results about unobservables such as “electron” or “gene”? McWhorter does not discuss this aspect of the story, but I believe it is worth some thought. I suspect that to many people, especially in the humanities, the question of scientific realism seems almost beside the point. This is because they “know” that even our ordinary perceptions — such as the color “blue” or activities like “eat and drink”4 — are deeply structured by our language. So how could the much more distant objects of scientific investigation not be similarly affected by our linguistic and conceptual apparatus? But of course, if McWhorter is correct that the Sapir-Whorf hypothesis fails for ordinary perception, then its extension to scientific results cannot even get off the ground.


  1. I also suspect that if we were to look at this data, we would find that the differences between populations are not only small in absolute terms, but small relative to the variation within populations.
  2. One of my favorite examples in the book is of a culture (the Amazonian Jarawara and related societies) where the feminine, rather than the masculine, is the default form for most words and plurals. However, the culture is nevertheless quite misogynistic.
  3. In Our Time just did an episode on Rudyard Kipling.
  4. Some languages do not distinguish between ingesting solids and liquids and have one word to cover both activities; others make fine-grained distinctions between ingesting different kinds of solids (hard, soft, stringy, round, …). The wealth of examples in this book is worth the price of entry.

There is no cow on the ice

Here at the Center for Philosophy of Science we are gently encouraged to express what we are thinking about on glassboards outside our offices. I think this is 1) a terrific idea and 2) not entirely unlike an accidentally acquired Tumblr that you have to keep feeding. My glassboard has been a bit stale for the past month, and so others have risen to the challenge of updating it:

[Photo: the glassboard, as updated by others]

I still don’t get the hammer joke (I’m sorry, it just doesn’t hit the nail on the head). But ingen ko på isen — this is good to keep in mind.

John Norton has blogged about some more glassboard art.

The spirit of HPS (a love letter)

Last June I was in Vienna for the fifth conference on Integrated History and Philosophy of Science (&HPS5). It was an immensely enjoyable event. Towards the end of the conference, during the very last talk that I saw before I had to leave for the airport, I rediscovered my love for HPS. Here’s how it happened.

The beginning was inauspicious. The speaker had made slides with LaTeX, so they were heavy on text.1 What is more, she recited those slides word for word, which is usually considered bad presentation technique. But here’s the surprising thing: it worked brilliantly. Because of the exact parallelism between the slides and the spoken word, it was easy to follow the speaker’s arguments and evidence. Many presentations go off the rails because the audience doesn’t know whether to focus on the slides or the spoken word. That wasn’t a problem in this case.

The story started simply enough. There’s a famous biomedical discovery from the 1980s that led to a Nobel prize: the fact that gastric ulcers are caused by infection with Helicobacter pylori. The episode is reasonably well researched in HPS, so we know something about who discovered what, when, and where, and how additional research established the finding beyond reasonable doubt. But the speaker asked an interesting counterfactual question: Why was the discovery not made before the 1980s? The conditions should have been right earlier. On the face of it, there was no good reason for the delay. In terms of concepts and methods, the discovery could have been made in the 1950s. So why wasn’t it?

Here’s where things became interesting. A big part of the problem was a mistaken assumption: that the stomach is sterile because of its high acid content. The speaker began by asking the most obvious questions. Perhaps there was good empirical warrant for believing in a sterile stomach? Perhaps the techniques for detecting certain types of bacteria did not exist prior to the 1980s? Or if they existed, perhaps they were not routinely used? Perhaps an earlier study had made other causes of gastric ulcers very likely? These are good, solid epistemological questions that, I think, must always be asked first. In general, scientists are good at science.

But when none of these explanations seemed right, she opened up the list of possibilities. Could it be that we have an instance here of a sociological rather than an epistemological process? Maybe epidemiologists in the 1950s felt that the search for infectious etiologies belonged to an “old paradigm” and was no longer worth pursuing? Or perhaps some gastroenterologists who rejected the infectious etiology of gastric ulcers had undue influence? Could it be that a study claiming that the stomach is sterile was cited more and more but questioned less and less? Or maybe the treatment of gastric ulcers only became big business in the 1980s, which made it more attractive to do research on the disease? Clearly, there are many non-epistemic considerations that may have been in play.

I like this plurality of questions. Historians of science remain (on the whole) captivated by the social conditions of science, while philosophers are (on the whole) enraptured by highly abstract formal problems. It is up to HPS to ask the whole range of pertinent questions about the scientific process: to produce an adequate understanding of how science actually works, from the epistemology of experiments to the social organization of inquiry. To me, this is what HPS is all about. I left Vienna at peace with my discipline.2

If you are interested, the talk was based on a paper by Dunja Šešelja and Christian Straßer which is now published in Acta Biotheoretica. Note: The paper’s focus differs from the talk; it is mostly about whether the bacterial hypothesis of ulcer causation was “worthy of pursuit” from the 1950s to the 1980s, with much less focus on the broader questions discussed above.


  1. I think LaTeX is great for writing essays, papers and books — I even force my students to learn the system as a kind of tough love measure. But I don’t think it’s a good tool for presentations: it’s not sufficiently visual to produce interesting results, and it encourages a number of bad presentation habits.
  2. Of course, I never knew the old Vienna before the war with its Strauss music, its glamour and easy charm – and Popper (not yet Sir Karl) telling you how science is really done.

How much work can Mill’s method of difference do?

I have a new paper coming out in the European Journal for Philosophy of Science, and here’s a link to a preprint on the PhilSci archive.

One of the basic ideas in scientific methodology is that in experiments you should “vary one thing at a time while keeping everything else constant”. This is often called Mill’s method of difference due to John Stuart Mill’s influential formulation of the principle in his System of Logic of 1843. Like many great ideas (think of natural selection), the method of difference can be explained to a second grader in two minutes – and yet the more one thinks about it, the more interesting it becomes.

The late Peter Lipton in his 1991 book on inference to the best explanation (IBE) made the descriptive claim that the method of difference is used widely in much of science, and this seems correct to me. But he also argued that the method is actually much less powerful than we think. In principle, we would like to vary one factor (and one factor only), observe a difference in some outcome, and then conclude that the factor we varied is the cause of the difference. But of course this depends on some rather steep assumptions.

First, we need to be sure that only one factor has changed; otherwise the inference does not succeed. But how do we ever know that there is only one difference? This is what Lipton called the problem of multiple differences.
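To make the logic concrete, here is a minimal sketch of the method of difference "taken neat", written in Python. The function names and the plant-growth factors are hypothetical and chosen only for illustration. Note that the check can only inspect the factors that were actually recorded, which is exactly where the problem of multiple differences bites.

```python
def differing_factors(case_a, case_b):
    """Return the sorted names of recorded factors whose values differ."""
    keys = set(case_a) | set(case_b)
    return sorted(k for k in keys if case_a.get(k) != case_b.get(k))


def method_of_difference(case_a, case_b, outcome_a, outcome_b):
    """Naive method of difference (illustrative sketch only).

    Returns the name of the single differing factor if the outcomes differ
    and exactly one recorded factor differs; returns None otherwise.
    """
    if outcome_a == outcome_b:
        return None  # no difference in outcome to explain
    diffs = differing_factors(case_a, case_b)
    if len(diffs) == 1:
        return diffs[0]
    return None  # zero or several differences: the inference does not go through


# Hypothetical experiment: two plots that differ only in fertilizer.
control = {"light": "full sun", "water_ml": 200, "fertilizer": False}
treated = {"light": "full sun", "water_ml": 200, "fertilizer": True}
print(method_of_difference(control, treated, 10, 14))  # -> fertilizer

# A second, unintended difference blocks the inference.
confounded = dict(treated, water_ml=400)
print(method_of_difference(control, confounded, 10, 14))  # -> None
```

The sketch returns a verdict only relative to the dictionary it is given. An unrecorded difference between the two plots (soil, pests, the gardener's mood) would go unnoticed, and nothing in the code can rule that out.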

Second, we may sometimes wish to conduct experiments where the factor which varies is unobserved or unobservable. For instance, John Snow inferred in the 19th century that local differences in cholera outbreaks in London were caused by a difference in the water supplied by two different companies. However, Snow could not actually observe this difference in the water supply (what we now know was a difference in the presence of the bacterium Vibrio cholerae). So Snow inferred causality even though the relevant initial difference was itself only inferred. This is what Lipton called the problem of inferred differences.

Lipton proposed elegant and clever solutions to both problems. He argued that the method of difference is to some extent mere surface action. Beneath the surface, scientists actually judge the explanatory power of various hypotheses, and this is crucial to inferences based on the method of difference. So Snow may not have known that an invisible agent in part of the water supply caused cholera, or that this was the only relevant difference between the water supplies. But he could judge that if such an agent existed, it would provide a powerful explanation of many known facts. In order to make it easier to discuss such judgments about the “explaininess” of hypotheses, Lipton introduced the “loveliness” of explanations as a technical term. Loveliness on his account comprises many common notions about explanatory virtues: for instance, unification and mechanisms. Snow’s explanation is lovely because it would unify multiple known facts: that cholera rates correlate with water supply, that those who got the bad water at their houses but didn’t drink it didn’t get sick, that the problematic water supply underwent less filtration, and so on. An invisible agent would moreover provide a mechanism for how a difference in water supply could cause a difference in disease outcomes, which would again increase the loveliness of Snow’s explanation. Ultimately, Lipton would argue, Snow’s causal inference relied on these explanatory judgments and not on the method of difference “taken neat” (to use Lipton’s phrase).

I have great sympathy for Lipton’s overall project. But I am also convinced that in many experimental studies there are ways to handle Lipton’s two problems that do not rely on an IBE framework. In my paper, therefore, I take a closer look at his main case study — Semmelweis on childbed fever — to find out how the problems of multiple and inferred differences were actually addressed. The result is that multiple differences can be dealt with to some extent by understanding control experiments correctly; and inferred differences become less of an issue if we understand how unobservables are often made detectable. The motto, if there is one, is that we always use true causes (once found) to explain, but that explanatory power is not our guide to whether causes are true. The causal inference crowd will find none of this particularly deep: but within the small debate about the relationship between the method of difference and IBE, these points seemed worth making.

Grue-some confusion

Having concluded at the end of my previous post that the study of statistics has helped me to appreciate the value of the philosophy of science, it is only fitting to point to an instance of the reverse: Statistics can be a resource for solving philosophical problems.

Nelson Goodman’s new riddle of induction is supposed to show that no purely syntactical theory of confirmation is possible (i.e. one that does not depend on the meaning of the terms that appear in the argument, like “raven” or “black”). Here’s an outline of the argument. Goodman introduced the term “grue” and defined it thus: an object is grue if it is green before a certain date D or if it is blue after D. So an emerald is grue before D, while the sky is grue after D. (Note well that nothing changes color: This is all about the terminology we use to describe things.) Obviously, grue-like terms make it difficult to generalize from empirical observations. Even if we have examined a vast number of emeralds under all kinds of conditions and have found all of them to be grue, this fact does not generalize. After the future date D, we will encounter emeralds that are not grue. Thus, it is entirely hopeless to attempt to specify how many observations or how many variations of circumstances are needed before we can arrive at the general claim that “all emeralds are grue”.

It is a sound intuition to think that something must be fishy about grue-like terms. However, it has been difficult to show why precisely grue-like terms are inadmissible in science. Many attempts to solve the problem failed: Most famously, attempts to show that time-relative terms in general are inadmissible didn’t succeed, despite their intuitive plausibility. It was also proposed that the relevant distinction might be between terms that are “projectible” and those that are not, and this led to a search for criteria of projectibility. Others suggested that true confirmation is only possible where so-called “natural kinds” are concerned. In general, many philosophers concluded that the grue-problem may be intractable and may represent a deep problem for all theories of confirmation.

However, I think that a robust understanding of the problem (or much of the problem) was eventually found — an understanding based on statistical thinking. It is an excellent instance of progress in philosophy of science. Here’s a brief review of the proposed solution. Goodman was thinking about a sampling operation: You look at n members of a population in order to form an opinion about the properties of the entire population. To use his example, you conclude that all emeralds are green from sampling n emeralds. To use a more realistic example, you might want to predict how a country is going to vote based on a sample of 2000 likely voters. Now, it is well known that sampling fails if certain assumptions aren’t met. One of these assumptions is that the act of sampling must not alter the property being sampled. If my asking people “who will you vote for next Tuesday?” causes them to feel more established and therefore to vote for the incumbent party, my survey will overestimate the incumbents’ share of the vote. It turns out that something similar holds for the term grue: The fact that I sample an emerald before date D makes it “grue”, while otherwise it would be “not grue”. Thus, my sampling of an emerald can change its “grueness”. Clearly, then, this violates the rules of sampling. It is this statistical reasoning, and not some elaborate philosophical theory, that explains why “grue” is an inadmissible predicate.
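The polling example can be sketched as a small simulation. This is an illustration under made-up assumptions, not an analysis of real data: the function name, the population size, the true incumbent share, and the probability that being asked nudges a voter toward the incumbent are all hypothetical.

```python
import random


def simulate_poll(population_size=100_000, true_share=0.45,
                  sample_size=2000, nudge_prob=0.10, seed=0):
    """Simulate a poll in which the act of asking can change the answer.

    Returns (poll_estimate, election_share) for the incumbent.
    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    # True means the voter intends to vote for the incumbent.
    voters = [rng.random() < true_share for _ in range(population_size)]
    sampled = rng.sample(range(population_size), sample_size)
    answers = []
    for i in sampled:
        # Being asked nudges some non-incumbent voters toward the incumbent.
        if not voters[i] and rng.random() < nudge_prob:
            voters[i] = True
        answers.append(voters[i])
    poll_estimate = sum(answers) / sample_size
    # Only the sampled voters were nudged, so the election result barely moves.
    election_share = sum(voters) / population_size
    return poll_estimate, election_share


poll, election = simulate_poll()
print(f"poll estimate:  {poll:.3f}")
print(f"election share: {election:.3f}")
```

Because only the sampled voters are affected, the poll systematically overstates the incumbent's share relative to the population it is meant to describe. The sample is no longer representative of the unsampled cases, which is the same defect that the statistical reading attributes to "grue".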

The matter is, of course, more complex than I make it out to be; and there may indeed exist instances of grue-like problems that cannot be treated like this. For those who wish to go beyond my three paragraphs, I recommend this paper by Peter Godfrey-Smith.

What I learned by discovering statistics using R

I would summarize many of my driving interests under the heading of “scientific epistemology”. However, for a long time I had an egregious blind spot: statistics. Although I read my way through Sokal and Rohlf’s classic text “Biometry” six years ago, it left me with something less than a working understanding of statistics as a research scientist would use it. Whether this was my fault or the text’s, or simply a matter of incompatibility, is hard to say.

To ameliorate the situation, I spent much of my spare time last April plowing through each and every chapter of “Discovering Statistics Using R” by Andy Field (and co-authors). On the whole, it was an immensely enjoyable experience. Here are a few of my meta-insights.

  1. You can grasp the statistical concepts without becoming a mathematician. I sometimes have difficulty assimilating knowledge if I fail to understand its foundations — e.g. to learn how a drug is used without understanding its molecular mode of action. This difficulty persisted even after I had identified it as a hindrance. (This is part of why I wandered from medicine into the history and philosophy of science, where an obsession with foundations is generally a natural advantage.) Analogously, I was worried that I might get stuck with my statistics text as soon as I encountered some mathematical theorem that I had to accept but couldn’t understand with reasonable effort and within reasonable time. Happily, I found it easy to deal with mathematical black boxes in statistics. I think two things helped. First, DSUR introduces the black boxes efficiently and often labels them explicitly, which makes it easier to accept them. Second, many statistical black boxes can be grasped intuitively. For instance, there is the “variance sum law”, which states that the variance of the differences or sums of two independent variables is equal to the sum of the variances of the two variables (this matters, for example, if you are testing whether the means of two populations differ in a t-test). I don’t know how you prove this (although it is not difficult to imagine the outlines of a proof), but I nevertheless find it highly plausible that the variance sum law holds. Other questions are more difficult — e.g., why do correlation coefficients range from -1 to 1? Mathematician friends tell me that the answer to this is nontrivial. Nevertheless, I did not have any difficulty accepting it, and so my education in statistics could proceed. I found that there were many similar instances of very tolerable black boxes.
  2. Statistics should be seen in relation to concrete study designs. When I read “Biometry”, I think I lost the forest for the trees: I learned about the theory of statistics but failed to see how it applied to concrete research situations. One of the strengths of DSUR is that it is pretty clear about how each statistical method relates to familiar types of study designs.
  3. The importance of the computer is hard to exaggerate. “Biometry” was originally written in the 1970s, and its primary tool was the pencil: It taught me how to do statistics by hand, if necessary. I get that this can be useful for teaching concepts. But in practice (and in 2014) I found it vastly more enjoyable to study statistics in close contact with R, where I learned how to actually work on more or less realistic data sets. I like to joke that I love computers and will take any excuse to spend more time with them. More seriously, I think that doing statistics is pretty similar to programming: Understanding the concepts is one thing, but you also need to learn which functions take which values, where to put the semicola, and what the error messages mean. There is a craft to statistics, and I think that familiarity with the craft makes it easier to assimilate the theory.
  4. Emotions matter. It is well known that learning without positive emotion is difficult for us humans. Importantly, therefore, DSUR helped me to get excited about statistical methods. I get that you should have a good conceptual grasp of the assumptions that a data set must meet if you want to do an ANOVA. But studying those assumptions before you have ever done an ANOVA and thus before you have discovered the potential power of the method is, frankly, boring. DSUR helps you to see (and, more importantly, to feel) that statistical methods are really cool and powerful, and this helps you through more tedious things like checking whether your data are homoscedastic.
  5. Philosophy of science is useful. During the first half of the book, in the throes of young romance, I felt that statistics is the key to understanding scientific epistemology and in some sense removes the need for a philosophy of science. But I quickly recuperated: I now think with renewed conviction that the philosophy of science is tremendously important. It should be taught alongside statistics to students. One cannot make sense of scientific methodology by understanding only statistics but none of the concepts that traditionally live in the philosophy of science. Statistics texts hardly touch many of these questions: What is a cause? What is the logic of causal inference, and what are its prerequisites? (Which is the basis for asking: And how does statistics help in inferring causes?) What is the epistemological role of scientific models? What are mechanisms, and what does it take to ascertain them? How do causal processes at different levels of organization relate to each other? What is an explanation, and what role does explanatory power play in the confirmation of scientific hypotheses? Many of these questions do not (currently) have definitive answers. But I do think (based on experience) that most working scientists have strong intuitions about them that help them in their epistemological work — and if nothing else, the philosophy of science can prime these intuitions and help to produce better scientists. To my surprise, then, an immersion in statistics has helped me to better appreciate one of my parent disciplines (which are, in this order: biomedicine, history of science, and philosophy of science).
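As an aside on the variance sum law from the first point: the outline of a proof is short. In general, Var(X ± Y) = Var(X) + Var(Y) ± 2 Cov(X, Y), and for independent variables the covariance is zero, so the variances simply add. The following Python sketch checks this numerically on simulated data. The function name, the two normal distributions, and the sample size are arbitrary choices made for illustration.

```python
import random
import statistics


def variance_sum_check(n=20_000, seed=0):
    """Compare Var(X) + Var(Y) with Var(X + Y) and Var(X - Y) on simulated data.

    X and Y are drawn independently; the distributions are arbitrary.
    Returns (var_x_plus_var_y, var_of_sum, var_of_difference).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 2) for _ in range(n)]  # standard deviation 2
    y = [rng.gauss(5, 3) for _ in range(n)]  # standard deviation 3, independent of x
    var_x = statistics.pvariance(x)
    var_y = statistics.pvariance(y)
    var_sum = statistics.pvariance([a + b for a, b in zip(x, y)])
    var_diff = statistics.pvariance([a - b for a, b in zip(x, y)])
    return var_x + var_y, var_sum, var_diff


expected, var_sum, var_diff = variance_sum_check()
print(f"Var(X) + Var(Y): {expected:.2f}")
print(f"Var(X + Y):      {var_sum:.2f}")
print(f"Var(X - Y):      {var_diff:.2f}")
```

The three numbers should agree closely but not exactly, because the sample covariance of two independently drawn samples is small without being precisely zero. If X and Y were correlated, the covariance term would no longer vanish and the simple sum would fail.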