Why trust science?

Naomi Oreskes’s recent book, Why Trust Science?, occupies an underpopulated space: It is interesting to an audience of professional historians or philosophers of science, but it also aims to engage a much larger part of the public. The book is motivated by Oreskes’s work on climate change denialism, and it asks: Why should we trust the pronouncements of scientists?

Oreskes begins by rejecting the view that scientists employ special methods by which they are able to tell what is true. We might think that scientists only hold theories that have been appropriately verified. But no: Philosophers have long argued, for a variety of reasons, that we can never verify theories in the sense of conclusively demonstrating their truth. Or perhaps proper scientific theories must be falsifiable, and must have survived many rigorous attempts at falsification? Again, this famous proposal by Karl Popper did not hold up to scrutiny. Oreskes goes through additional suggestions, and many influential philosophers make an appearance: Comte, Duhem, Hempel, Quine, Kuhn — and of course Feyerabend, who wrote “against method”.

Oreskes argues that scientific method ultimately cannot be what guarantees the trustworthiness of scientific findings. Instead, she suggests that we should look to social aspects of science. For example, we should ask whether the relevant scientific community is diverse, so that many viewpoints are represented. And we should ask whether the community allows critical discussion, so that superior viewpoints have a chance to prevail. In brief, our trust in science ought to rest not on truth-conducive methods, but on a truth-conducive type of social organisation. If an appropriately open, diverse, and critical scientific community reaches a consensus on a question, that consensus can probably be trusted. Since we cannot know for sure what is true, we use consensus as a proxy.

This is an appealing solution to the problem, and I have no doubt that it gets at something important. However, I am not convinced that it can successfully bracket the question of method. Here’s why. For group consensus to be a meaningful proxy for truth, we have to assume that the group can somehow assess whether its methods are truth-tropic, that is, whether they are sensitive to how the world actually is (which is not to say that these methods are required to be infallible). We must also assume that such an assessment of truth-tropism is relevant to the emergence of consensus. Otherwise the consensus might indicate nothing more than that the entire community has convinced itself to use the same, although unreliable, methods. If a community lacks methods that are truth-tropic, or lacks the ability to tell which of its methods are truth-tropic, or does not care, then its consensus doesn’t count for much, even if the community is wonderfully diverse, open, and critical.

Oreskes rightly fears that strong methodological norms could become a straitjacket (she speaks of a “fetish”), such as when we demand randomised, controlled trials in cases where they are neither feasible nor necessary. She discusses, for example, the debates about the effectiveness of dental flossing. Even in the absence of RCTs, she argues, we should not dismiss the abundant mechanistic evidence that flossing is effective against periodontal disease. I agree! However, this does not show that methodological norms are a bad thing to have. It only shows that we should not adopt artificially narrow accounts. The goal of scientific epistemology must be to study the full range of methods that empirical scientists employ, and to learn about their track records as epistemic tools.

To be clear, I think that we should in most cases trust science. I also think that the social organisation of scientific inquiry is important and deserves much more attention from philosophers of science than it has so far received. And, finally, I agree with Oreskes that the existence of a consensus within a scientific community is an important indicator of whether any particular scientific result ought to be trusted. Consensus is an especially important indicator for a general, non-expert public. But as historians and philosophers of science, we have an additional explanatory aim. It is to understand why consensus in a community can be taken as an indication that particular claims are likely to be true. And this requires a debate about the methods by which such claims are assessed, even if we believe that this assessment is a social process.

(P.S.: I often dabble in the debates about what, if anything, the history of science and the philosophy of science have to learn from each other. Oreskes’s book is interesting in this respect. She clearly self-identifies as a historian of science. But in this book she begins by reviewing philosophical accounts of method, and she returns to this philosophical framing throughout as she discusses episodes from the history of science. I think that this is not at all surprising: If the epistemology of science is our interest, then we are in an area where the history and the philosophy of science naturally intersect.)

Review: Jutta Schickore, “About Method” (University of Chicago Press, 2017)

(This is a significantly expanded version of my review for NTM Journal of the History of Science, Technology and Medicine, which will appear later this year.)

Jutta Schickore’s About Method focuses on a neglected genre of scientific texts: what she calls the “methods discourse” of experimentalists. She argues that even though the philosophical and historical literature on experimental science is vast, little attention has been paid to what scientists themselves have to say about how to design, evaluate, and report the results of experiments. She acknowledges, of course, that methods discourse may be an unreliable guide to what scientists actually do in experimental practice, but she argues that it nevertheless offers a unique window into scientists’ conceptions of proper experimental procedure.

The book is structured around an extended case study of research on snake venom. This is a felicitous choice. It allows Schickore to connect significant episodes in the history of experimentation ranging from seventeenth-century Europe to the late nineteenth-century United States. We meet Francesco Redi (1626–1697), the Italian physician and naturalist who is now best known for his experiments questioning spontaneous generation. In his treatise on vipers, he insisted on repeating experiments many times in order to show that the “yellow liquor” in vipers’ teeth caused poisoning if it was inserted into wounds, but not if it was swallowed. Two chapters focus on Felice Fontana (1730–1805), who has been lauded as one of the first experimentalists to use adequate controls. Working a century after Redi, Fontana relied on the variation of experimental circumstances to determine which tissues snake venom acts upon. Another century later, the American physician Silas Weir Mitchell (1829–1914), among others, sought to isolate the active components in snake venom. Schickore finally links the research on snake venom to broader nineteenth-century reflections on the nature and limits of experimental inquiry, especially through the widely felt influence of the French physiologist and methodologist Claude Bernard (1813–1878). As we travel along this historical arc, the stability of snake venom as a research problem serves as a backdrop against which we recognize the evolution of experimental methods.

Schickore distinguishes multiple layers of methods discourse. At an extreme level of generality, we find scientists making broad commitments: for example, that experience is the only path to certain knowledge. At the other extreme, we find detailed discussions of particular laboratory procedures: of how precisely an incision was made and poison was deposited. Schickore, however, focuses on an intermediate level: that of methodological strategies and principles, such as the notions that we should conduct comparative experiments, repeat them, and vary their circumstances. Intriguingly, this intermediate level of analysis reveals continuities between researchers and time periods that remain hidden at the level of broad commitments or concrete research practices.

To illustrate, consider the contrast between two nineteenth-century scientists and methodologists, John Herschel and Claude Bernard. Herschel’s broad commitment was to a form of Baconianism, and his experience with experimental practice was in the physical sciences. Bernard, by contrast, could never quite stop kicking Bacon in the shins, as it were, and his practical experience was in physiology. Nevertheless, at the intermediate level of methodological strategies there was much that united the two. Both, for example, were interested in the theory and epistemic force of comparative experimentation. Thus, the middle-level analysis permits us to recognize significant continuities that might otherwise remain hidden.

Scholars have often reinterpreted methodological statements as efforts at positioning and persuasion, but Schickore believes a literal reading is often appropriate. For example, Redi’s emphasis on his many resource-consuming repetitions has been seen as a “political gambit to display the power of the Tuscan court” (32). However, organisms are highly variable, and repeating experiments helps to reveal variations and to discover which findings are reliable. Schickore shows that Redi defended his methodological views on such epistemic grounds. We have read much about the ways in which detailed descriptions of experimental procedures served to establish an experimenter’s credibility. Schickore, by contrast, emphasizes the role of methodological statements in establishing the credibility of the experimental procedures themselves.

About Method highlights the power of Schickore’s approach to integrated history and philosophy of science. She has previously argued that in order to understand current concepts and practices, we must often study how these concepts and practices have come into existence. The case of experimental method is an excellent example. Present-day methodological statements are all about the particulars of experimental processes and rarely discuss underlying methodological principles. There is much that we have simply come to take for granted. To understand the epistemic aims of such practices as comparisons, repetitions, and variations, we must look to a time when these practices were still controversial and needed explicit defense.

In some instances, it would have been welcome to drive the analysis further. For example, I would have liked to see how precisely Redi or Fontana deployed strategies like repetitions or variations of circumstances, and what inferences these strategies enabled. It is one thing to say that repetitions or variations reveal sources of error, but how did this work in concrete cases? Perhaps such an analysis would have seemed to Schickore too much like an attempt to “measure the distance” (9) between historical actors and modern-day conceptions of experimental reasoning. Or perhaps the shift from methods discourse to practice would have meant a loss of focus. In any case, I would have liked to read more about the nitty-gritty details of experimental reasoning.

The title of Schickore’s book is a play on Paul Feyerabend’s seminal Against Method, which was published in 1975. It is telling that the method that Feyerabend argued against has very little in common with the methods that Schickore’s historical actors write about. The goal of the snake venom researchers was to isolate suspected causes, and to demonstrate their competence to produce the effects ascribed to them. These researchers worried about targeted interventions and the control of confounding variables, not about the power of falsification and the problems of ad hoc hypotheses. Schickore’s title is thus fitting. Her book is in many ways orthogonal to older debates. It is not a counterweight to Feyerabend’s book: for method, a defense of a philosophical project. It is about method: a description, an analysis, an invitation to a new dialogue.

To conclude, About Method is an excellent book. It not only gives us insights into the long-term development of experimental methods, but also serves as a model of how to conduct studies at the interface of history and philosophy of science.

The limits of my language and the limits of my world

The Language Hoax by John H. McWhorter is a book about the Sapir-Whorf hypothesis: the notion that languages deeply affect the way in which their speakers conceptualize the world. To give a few examples, Russian makes a distinction between lighter blues and darker blues that English (like German and French) doesn’t make. Do Russian speakers therefore have a richer perception of blue than English speakers? Similarly, German and French assign genders to objects, which may lead German speakers to assign male qualities to tables (sturdy?) while French speakers assign feminine qualities to them (supportive?). More interestingly perhaps, the male gender is somewhat dominant in European languages: for instance, the third person plural in French is “ils” even if there are women in a group. So does this lay the conceptual basis for a kind of sexism? As a final example, it is certainly plausible (and fun) to speculate that speakers of languages without a future tense might conceptualize the future, and plan for it, in a way that is completely different from our own. Thus, the potential reach of the Sapir-Whorf hypothesis is vast: it ranges from the relatively innocuous (color perception) to the socially charged (gender roles) to the conceptually profound (our very notion of time).

In this brief but rich book, McWhorter argues that the available empirical evidence speaks against any strong version of the Sapir-Whorf hypothesis. True, Russian speakers can distinguish certain shades of blue more quickly than speakers of other languages, but the differences are small in absolute terms. [1] Yes, we can spin tales about the impact of linguistic peculiarities on cultural traits in some subpopulations: for instance, McWhorter discusses attempts to link obligatory “evidential markers” (I see/I hear/they say) with particularly skeptical attitudes towards knowledge. However, he shows that this correlation breaks down (like many similar ones) when we extend our data set to a larger sample of languages and cultures: we then find cultural skepticism in languages without evidential markers and evidential markers in cultures with little skepticism. [2] Now, such counterexamples are not conclusive: consider that the counterexamples could be explained by the fact that evidential markers sometimes cause a particularly skeptical attitude, and that there could be alternative causes of skeptical attitudes. But the counterexamples certainly show that any strong assumptions about language “structuring” thought are doubtful. The main point is that it is easy to come up with “just so stories” that link linguistic habits and cultural traits, [3] but we need demonstrations of actual causality and deep cognitive effects. According to McWhorter, the consensus among professional linguists is that such demonstrations have not succeeded; language does have an impact on cognition, but these effects are relatively weak.

In addition to the empirical and methodological points, McWhorter argues that many in the humanities are drawn to the Sapir-Whorf hypothesis for the wrong reasons. The inclination is to think that demonstrating the richness of foreign linguistic concepts is to counteract a kind of Western cultural hegemony. But of course this can backfire if we find that there are some quite nifty things for which Western languages seem to be better equipped than others. For instance, the English language marks the hypothetical and counterfactual more explicitly than Mandarin Chinese. So do the Chinese have an impoverished sense of the hypothetical? If you are worried about Western cultural hegemony, you won’t find this thesis attractive. McWhorter takes the view that the Whorfian approach is the wrong way to argue for human equality. We should instead recognize the essential similarity of all human thought — which just happens to be expressed in different linguistic forms:

We are told that what languages teach us about being human is how different we are. Actually, languages’ lesson for us is more truly progressive – that our differences are variations on being the same. Many would consider that something to celebrate. (p. 168)

I certainly do.

Aside from its linguistic interest, the Sapir-Whorf hypothesis has a relationship to a topic in the history and philosophy of science (and this is why I, as a tradesman, was initially interested). Namely, it touches on the question of scientific realism: should we trust scientific results about unobservables such as “electron” or “gene”? McWhorter does not discuss this aspect of the story, but I believe it is worth some thought. I suspect that to many people, especially in the humanities, the question of scientific realism seems almost beside the point. This is because they “know” that even our ordinary perceptions, such as the color “blue” or activities like “eat and drink” [4], are deeply structured by our language. So how could the much more distant objects of scientific investigation not be similarly affected by our linguistic and conceptual apparatus? But of course, if McWhorter is correct that the Sapir-Whorf hypothesis fails for ordinary perception, then its extension to scientific results cannot even get off the ground.


  1. I also suspect that if we were to look at this data, we would find that the differences between populations are not only small in absolute terms, but small relative to the variation within populations.
  2. One of my favorite examples in the book is of a culture (the Amazonian Jarawara and related societies) where the feminine, rather than the masculine, is the default form for most words and plurals. However, the culture is nevertheless quite misogynistic.
  3. In Our Time just did an episode on Rudyard Kipling.
  4. Some languages do not distinguish between ingesting solids and liquids and have one word to cover both activities; others make fine-grained distinctions between ingesting different kinds of solids (hard, soft, stringy, round, …). The wealth of examples in this book is worth the price of entry.

What I learned by discovering statistics using R

I would summarize many of my driving interests under the heading of “scientific epistemology”. However, for a long time I had an egregious blind spot: statistics. Although I read my way through Sokal and Rohlf’s classic text “Biometry” six years ago, it left me with something less than a working understanding of statistics as a research scientist would use it. Whether this was my fault or the text’s, or simply a matter of incompatibility, is hard to say.

To ameliorate the situation, I spent much of my spare time last April plowing through each and every chapter of “Discovering Statistics Using R” by Andy Field (and co-authors). On the whole, it was an immensely enjoyable experience. Here are a few of my meta-insights.

  1. You can grasp the statistical concepts without becoming a mathematician. I sometimes have difficulty assimilating knowledge if I fail to understand its foundations — e.g. to learn how a drug is used without understanding its molecular mode of action. This difficulty persisted even after I had identified it as a hindrance. (This is part of why I wandered from medicine into the history and philosophy of science, where an obsession with foundations is generally a natural advantage.) Analogously, I was worried that I might get stuck with my statistics text as soon as I encountered some mathematical theorem that I had to accept but couldn’t understand with reasonable effort and within reasonable time. Happily, I found it easy to deal with mathematical black boxes in statistics. I think two things helped. First, DSUR introduces the black boxes efficiently and often labels them explicitly, which makes it easier to accept them. Second, many statistical black boxes can be grasped intuitively. For instance, there is the “variance sum law”, which states that the variance of the differences or sums of two independent variables is equal to the sum of the variances of the two variables (this matters, for example, if you are testing whether the means of two populations differ in a t-test). I don’t know how you prove this (although it is not difficult to imagine the outlines of a proof), but I nevertheless find it highly plausible that the variance sum law holds. Other questions are more difficult — e.g., why do correlation coefficients range from -1 to 1? Mathematician friends tell me that the answer to this is nontrivial. Nevertheless, I did not have any difficulty accepting it, and so my education in statistics could proceed. I found that there were many similar instances of very tolerable black boxes.
  2. Statistics should be seen in relation to concrete study designs. When I read “Biometry”, I think I lost the forest for the trees: I learned about the theory of statistics but failed to see how it applied to concrete research situations. One of the strengths of DSUR is that it is pretty clear about how each statistical method relates to familiar types of study designs.
  3. The importance of the computer is hard to exaggerate. “Biometry” was originally written in the 1970s, and its primary tool was the pencil: It taught me how to do statistics by hand, if necessary. I get that this can be useful for teaching concepts. But in practice (and in 2014) I found it vastly more enjoyable to study statistics in close contact with R, where I learned how to actually work on more or less realistic data sets. I like to joke that I love computers and will take any excuse to spend more time with them. More seriously, I think that doing statistics is pretty similar to programming: Understanding the concepts is one thing, but you also need to learn which functions take which values, where to put the semicola, and what the error messages mean. There is a craft to statistics, and I think that familiarity with the craft makes it easier to assimilate the theory.
  4. Emotions matter. It is well known that learning without positive emotion is difficult for us humans. Importantly, therefore, DSUR helped me to get excited about statistical methods. I get that you should have a good conceptual grasp of the assumptions that a data set must meet if you want to do an ANOVA. But studying those assumptions before you have ever done an ANOVA and thus before you have discovered the potential power of the method is, frankly, boring. DSUR helps you to see, and more importantly to feel, that statistical methods are really cool and powerful, and this helps you through more tedious things like checking whether your data are homoscedastic.
  5. Philosophy of science is useful. During the first half of the book, in the throes of young romance, I felt that statistics is the key to understanding scientific epistemology and in some sense removes the need for a philosophy of science. But I quickly recuperated: I now think with renewed conviction that the philosophy of science is tremendously important. It should be taught alongside statistics to students. One cannot make sense of scientific methodology by understanding only statistics but none of the concepts that traditionally live in the philosophy of science. Statistics texts hardly touch many of these questions: What is a cause? What is the logic of causal inference, and what are its prerequisites? (Which is the basis for asking: And how does statistics help in inferring causes?) What is the epistemological role of scientific models? What are mechanisms, and what does it take to ascertain them? How do causal processes at different levels of organization relate to each other? What is an explanation, and what role does explanatory power play in the confirmation of scientific hypotheses? Many of these questions do not (currently) have definitive answers. But I do think (based on experience) that most working scientists have strong intuitions about them that help them in their epistemological work — and if nothing else, the philosophy of science can prime these intuitions and help to produce better scientists. To my surprise, then, an immersion in statistics has helped me to better appreciate one of my parent disciplines (which are, in this order: biomedicine, history of science, and philosophy of science).
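
The “variance sum law” from point 1 is a good example of a black box that can be made plausible without a proof, simply by simulating it. What follows is a minimal sketch in Python rather than R, and it is not taken from DSUR: the function name, the sample size, and the two normal distributions are illustrative assumptions. It draws two independent variables and compares the variance of their sum and of their difference with the sum of their individual variances.

```python
import random
import statistics


def variance_sum_check(n=20_000, seed=0):
    """Simulate two independent variables and return their variances,
    plus the variances of their sum and of their difference.

    Hypothetical helper for illustration only; n and seed are arbitrary.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Two independently drawn normal variables (standard deviations 2 and 3).
    x = [rng.gauss(0.0, 2.0) for _ in range(n)]
    y = [rng.gauss(5.0, 3.0) for _ in range(n)]

    var_x = statistics.pvariance(x)
    var_y = statistics.pvariance(y)
    var_sum = statistics.pvariance([a + b for a, b in zip(x, y)])
    var_diff = statistics.pvariance([a - b for a, b in zip(x, y)])
    return var_x, var_y, var_sum, var_diff


if __name__ == "__main__":
    var_x, var_y, var_sum, var_diff = variance_sum_check()
    print("var(x) + var(y):", var_x + var_y)
    print("var(x + y):     ", var_sum)
    print("var(x - y):     ", var_diff)
```

The three printed numbers should agree closely but not exactly, because in any finite sample the two variables are never perfectly uncorrelated. If the variables were not independent, the sum and the difference would drift apart in opposite directions, which is why the law is stated for independent variables.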

Primates and Philosophers: Read monkeys for preexistence

Darwin opened his “M” notebook in the summer of 1838, when he had already formulated the thesis of common descent but not yet the mechanism of natural selection. The “M” notebook was dedicated to “metaphysical” considerations (note well: the modern usage of “metaphysical” differs). Speaking very broadly, it explored the evolutionary history of human psychological traits. On page 128, we find Darwin at his most quotable:

Plato says in Phaedo that our “necessary ideas” arise from the preexistence of the soul, are not derivable from experience — read monkeys for preexistence.

Ever since Darwin it has been evident that much of our emotional and cognitive furniture must be explained by our evolutionary history. This is one of the most philosophically significant aspects of evolutionary biology, but also one of the hardest to explore empirically.

Frans de Waal’s Primates and Philosophers: How Morality Evolved is a modern continuation of Darwin’s “M” project: a reflection by a leading primate researcher on the evolutionary origins of our moral sentiments. It is tremendously enjoyable. I can give it no higher recommendation than to note that I have already ordered more books by the author. De Waal argues that the “building blocks” of human morality can be found in our primate relatives and are the results of evolutionary processes such as reciprocal altruism, kin selection and perhaps (de Waal is skeptical) group selection. This is of course not unique: The value of the book lies in its copious and lucidly presented empirical material on the morally significant behavior of primates. De Waal argues forcefully and intelligently that moral behavior is not a thin “veneer” of good behavior plastered onto a brutish, selfish psychological core. Instead, acting morally is as human (or primately) as anything.

The book also includes comments by a number of philosophers. These are not as engaging as the science (although this may reflect only my interests), but they are useful. I particularly liked Philip Kitcher’s contribution. He argues that the “veneer theory” of human morality, which de Waal attacks at length, may be a straw man: Who really believes that morality is a purely cultural layer on top of a selfish underlying biology? I suspect that de Waal needed some sort of dialectic to get going with his argument, but “veneer theory” may not be a good choice, and I rather doubt that he is being fair to those he names as exponents of the view (such as Thomas Henry Huxley and Richard Dawkins). A more productive sparring partner might be one of the following positions: (1) The assumption (which may be prevalent among philosophers) that morality is a matter of reason and not of evolved emotions; or (2) the charge that de Waal’s primate research is merely “descriptive” and so has no bearing on morality, which is understood to be “normative”.

As a separate point, Kitcher argues that it is insufficient to speak of the foundation of morality in “altruism”: different dimensions of altruism must be distinguished in order for the “building blocks” notion to be made precise. I think this is a useful conceptual clarification. It lays groundwork for the continuation of the exciting and difficult empirical project.

The elephant in the room does not get much discussion, perhaps wisely: It is the conclusion (which I find nearly inescapable, and de Waal might agree) that an understanding of our moral sentiments is all there is, or almost all there is, to understanding the foundations of ethics and morality. For now, I leave it to Hume and Ruse and Wilson to argue for this – but read also Peter Singer’s contribution to the book in hand.

I like the way it raises its family, partly birdly, partly mammaly

Ann Moyal’s book on the history of the platypus is a good read. It gives an overview of the difficulties the platypus posed for zoology and of the way it gradually came to be understood in light of evolution. Along the way, we meet many of the great figures of the history of 19th-century biology – Georges Cuvier, Geoffroy Saint-Hilaire, Richard Owen, Charles Darwin, Thomas Huxley – and learn something about their scientific context. Much of this material is familiar, but it works. There are also some very nice platypus anecdotes spread throughout the book, such as Churchill’s attempts to import a platypus to Britain in the middle of the Second World War (it came to be known as “Winston”).

However, something about the book irked me, and I think it relates to a broader issue in the history of science. As the etiquette for serious historians of science dictates, Moyal discusses the past entirely on its own terms. This means that we do not get much of a primer on platypus biology early on in the book, and as past scientists formulate theories about the platypus, we are rarely told whether their findings were true or not. This approach makes for a cognitively challenging read. Sometimes it would be nice just for orientation to know which early findings were true or false, why past scientists were mistaken, and how exactly they squared their (false) theories with empirical findings. Far from resulting in de-contextualized history of science, I believe that this would make it easier to appreciate the social context of scientific discovery – to understand in some detail how empirical, social and personal forces interacted. As it stands, the history is often just one thing after another, and in some sense we wind up as ignorant of the overall process as the historical actors themselves. Surely that’s not the goal of historiography.

More generally, I felt that a more distanced view would have improved the book. Much of the second half is structured around a “race” (there are shades of The Double Helix here) to determine the platypus’s mode of reproduction – oviparous, ovoviviparous, or viviparous. This ends with what is perhaps the most famous telegram in the history of science: “monotremes oviparous, ovum meroblastic”. (The platypus lays eggs, and their development is more like that of reptiles than that of mammals.) However, it seems to me that much of the real intellectual action of the case was in the struggle to use different kinds of information about the platypus – including, but not limited to, its mode of reproduction – to see where it belongs in the overall scheme of biological classification. I would have loved to read more about that side of the story. But I guess it can’t be told unless we relax and make good use of our privileged present-day view of the case.

This is not to say that Moyal stays strictly in the past. In the final chapters, she reports on present-day findings about the platypus. These are among the most fascinating chapters. For instance, the platypus’s snout (famously “duck-like” in dead specimens, hence its scientific designation Ornithorhynchus anatinus, or “duck-like bird snout”) is in fact a unique organ for electrolocation. The platypus uses it to locate its prey as it dives with eyes and ears closed. In this respect the platypus is a highly specialized modern species rather than a relic of our evolutionary past. When I read about this, I thought it would have made for a great essay by Stephen Jay Gould. Of course, SJG was way ahead of me: you will find his highly enjoyable take on the story in Bully for Brontosaurus.