Data never tell a story on their own

As a follow-up on last week’s post, here’s Paul Krugman on Nate Silver’s new FiveThirtyEight:

I’d argue that many of the critics are getting the problem wrong. It’s not the reliance on data; numbers can be good, and can even be revelatory. But data never tell a story on their own. They need to be viewed through the lens of some kind of model, and it’s very important to do your best to get a good model. And that usually means turning to experts in whatever field you’re addressing.

A tentative suggestion: It seems to me that when Krugman says “model”, philosophers of science might prefer to say “mechanism”. I don’t think Krugman wants you to use a particular way of representing reality (a model); he wants you to analyze data with reference to the actual entities and interactions in the system under study (its mechanism).

Silver’s Mining Playbook (data mining, that is)

Having left the New York Times last year, Nate Silver has now relaunched FiveThirtyEight. I love Silver’s work, and I think his contribution to the American political discourse is invaluable. His push for “data journalism” is timely and necessary. But then, I would advocate a data-driven approach to most areas of life.

When it comes to the philosophy of science, however, Silver could and should be more sophisticated. This bothered me about his intriguing but flawed book “The Signal and the Noise” (look out for a brief review on this blog in the future), and it became apparent again in this piece introducing the new FiveThirtyEight:

Suppose you did have a credible explanation of why the 2012 election, or the 2014 Super Bowl, or the War of 1812, unfolded as it did. How much does this tell you about how elections or football games or wars play out in general, under circumstances that are similar in some ways but different in other ways?

These are hard questions. No matter how well you understand a discrete event, it can be difficult to tell how much of it was unique to the circumstances, and how many of its lessons are generalizable into principles. But data journalism at least has some coherent methods of generalization. They are borrowed from the scientific method. Generalization is a fundamental concern of science, and it’s achieved by verifying hypotheses through predictions or repeated experiments.

The passage as published contains two hyperlinks. The first is to the Stanford Encyclopedia of Philosophy’s entry on scientific progress, and the second is to the entry on Karl Popper. Both are problematic – let me explain.

Popper is famous for advocating the view that there is no such thing as verification. Contrary to what generations of scientists and philosophers of science thought, Popper argued, there is in fact no way to show that some generalization such as “all ravens are black” is true, or even likely to be true. On Popper’s account, all you can do is refute false generalizations. A hundred black ravens do not show that all ravens are black. Nor do a thousand, or a million. However, a single white raven shows conclusively that “all ravens are black” is false. It is debatable whether this works as a general method of science: The philosophical consequences of such a view are severe, and it is pretty clear that scientists do not actually think like this. But one thing is certain: “verifying hypotheses through predictions or repeated experiments” is not a good characterization of Popper’s position.

Modern opinion on the role of generalizations in science is more divided. Certainly they play a role. But it is doubtful that science’s main goal is to search for generalizations, since these are often not too interesting. Take the above example of “all ravens are black”. This is known not to be true, but suppose it were: Would it be very interesting? We would still not know whether it just happens to be true, or whether there is something lawful about it. I would argue that the goal of science is different: it is to understand causal mechanisms. To return to the ravens, we want to know in detail how the causal mechanisms of raven pigmentation work. This will give us an understanding of why most ravens are black, and also of how the mechanisms of pigmentation can change to produce differently colored ravens. Knowledge of causal mechanisms is far more insightful and useful than knowledge about generalizations. But the transition from generalizations to causal mechanisms is one of the great challenges for data mining approaches such as Silver’s.

(Thanks to my friend Fabio Molo for drawing my attention to Silver’s piece, and to Tim Räz for paronomastic help with the title.)

Mechanisms in 19th-century evolutionary thought (or: How Darwin developed natural selection out of Lamarckian inheritance)

The episode of March 21 of the radio program In Our Time with Melvyn Bragg is on Alfred Russel Wallace, the co-discoverer of the principle of natural selection. It is on the whole very good. However, the episode may leave the listener with the wrong impression on one issue – and I think it is wrong in an interesting way.

It is claimed repeatedly in the episode that evolutionists other than Darwin and Wallace did not have a mechanism of evolution. This is true in the somewhat trivial sense that other evolutionists did not have the principle that Darwin and Wallace discovered, and that we still accept: natural selection. It is also true in the less trivial sense that other evolutionists did not have a mechanism that could explain adaptation without presupposing adaptation – that is, as the result of undirected processes. And it is certainly appropriate in a program for a general audience to draw a stark contrast between Darwin’s revolutionary mechanism and everything else.

However, it would be wrong to think that other evolutionists left the question of the mechanism of transformation entirely unanswered. Robert Chambers, the author of the influential Vestiges of the Natural History of Creation of 1844, thought that God had created biological laws which predetermined the gradual unfolding of increasingly advanced forms of life (in parallel with the equally lawful unfolding of geological changes). We would today reject this sort of mechanism, and we would perhaps even deny that it is a mechanism because (as it turned out) it could not be reduced to more basic interactions. But it was nevertheless an attempt to explain biological structure and diversity by appeal to secondary causes. These secondary causes were in principle amenable to empirical investigation.

The same is true for the numerous evolutionists after Darwin and Wallace who accepted common descent but rejected natural selection as the mechanism of transformation. For instance, “Lamarckists” would have claimed that the main mechanism of transformation is the inheritance of acquired (adaptive) characters – think of the blacksmith’s son starting out with a particularly strong biceps. This was called Lamarckism after Jean-Baptiste Lamarck, whose early theory of evolution included, among other things, a then-commonplace belief in the inheritance of acquired characters. Proponents of “orthogenesis” would have claimed that certain biological laws of development dictated the gradual changing of species (this is related to Chambers’ views). And “saltationists” would have argued that new biological forms come to be in variational leaps from earlier forms – caused by genetic laws yet to be determined.

Now, once we are aware of these alternative notions, some historical questions become much easier to approach and answer. My favorite example at the moment is the question of how Darwin came to formulate the principle of natural selection. Without the context of the alternative views, Darwin and Wallace both appear to have managed an almost unimaginable leap of the intellect. Placed within that context, however, their achievement looks different: we can discern a gradual development of correct ideas out of incorrect ones.

In its most abstract formulation, the principle of natural selection says that in a population with variation within the population, differential survival of some variants, and inheritance of variations, the better adapted forms will increase in frequency over the course of generations. Before Darwin’s notebooks of the years between 1836 and 1839 had been fully evaluated, authors such as Ernst Mayr largely had to speculate about Darwin’s path to the principle of natural selection. It is easy enough to find influences that may have prepared his mind for parts of the principle: For example, variation within populations and what Darwin called the “strong principle of inheritance” were well known to breeders, in whose work Darwin was deeply interested; and Robert Malthus’s Essay on the Principle of Population could have made Darwin aware of competition and differential survival within populations (Malthus famously argued that human populations grow exponentially while their means of subsistence only grow arithmetically). But these facts were widely available before Darwin, and so it remained somewhat mysterious how he (and, independently, Wallace) suddenly managed to put them all together in the principle of natural selection.
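The abstract formulation at the start of this paragraph can be made concrete with a toy calculation. What follows is only a minimal sketch under assumptions of my own: a population with just two heritable variants, A and B, fixed survival probabilities, and perfect inheritance. The function names and the numbers are illustrative and are not taken from Darwin, Mayr, or Hodge.

```python
def next_frequency(p, survival_a, survival_b):
    """Frequency of variant A after one generation of differential survival.

    p is the current frequency of A; the rest of the population carries B.
    Survivors pass their variant on unchanged (assumed perfect inheritance),
    so the new frequency is simply A's share among the survivors.
    """
    survivors_a = p * survival_a
    survivors_b = (1.0 - p) * survival_b
    return survivors_a / (survivors_a + survivors_b)


def simulate(p0, survival_a, survival_b, generations):
    """Return the frequency of variant A in every generation, starting at p0."""
    frequencies = [p0]
    for _ in range(generations):
        frequencies.append(next_frequency(frequencies[-1], survival_a, survival_b))
    return frequencies


# Illustrative numbers: A starts rare but survives slightly more often than B.
trajectory = simulate(p0=0.1, survival_a=0.6, survival_b=0.5, generations=50)

# Control: with equal survival there is nothing to select, so nothing changes.
no_selection = simulate(p0=0.1, survival_a=0.5, survival_b=0.5, generations=50)
```

In this sketch the better-surviving variant rises in frequency every generation, while the control run stays where it started. Setting p0 to 0 or 1 removes the variation, and the frequency then cannot move at all, which is why all three ingredients are needed.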

In the past decades, historical scholarship has clarified the question of Darwin’s path to natural selection considerably. In the following, I rely largely on Jonathan Hodge’s “The Notebook Programmes and Projects of Darwin’s London Years” in the Cambridge Companion to Darwin, although the original publications on these questions date back to the 1980s.

When Darwin was already assuming transformation and common descent, but before he discovered natural selection, he was apparently thinking about the process in terms of Lamarckism. So organisms acquired new, useful variations through the intensified use of certain organs, and these variations were then transmitted to their descendants (again: think of the blacksmith’s son). This was probably the best candidate for a mechanism of transformation before natural selection came along.

The crucial point is that the Lamarckian inheritance of acquired characters has surprisingly many similarities to natural selection! It is a process where variation within a population occurs, is adaptive, and is heritable. So it is not at all surprising that Darwin would have developed and pursued an interest in the nature of variation and inheritance while thinking about Lamarckism. Of the three main pillars of natural selection (variation, differential survival, and inheritance), two were important within the Lamarckian framework as well.

What seems to have happened when Darwin read Malthus in September of 1838 is that he began to think in earnest about the fate of advantageous (but use-acquired) variations within a population. He reasoned that the usefulness of certain (again: use-acquired) variations would be increased by the fact that there was competition for resources within the population. In essence, he came to regard population pressure as a reinforcement of the transformation of species by the inheritance of acquired characters.

This first step then allowed Darwin – several weeks later – to ask whether it mattered if useful variations came about in a directed (use-acquired) or an undirected (random) way. And the answer was, of course, no: even random variations could offer an advantage to an individual in a within-population struggle for existence.

And now Darwin was ready to formulate two versions of the process of transformation. In version one, variation came about in a directed way (through use and disuse), offered an advantage to the individual, was preserved in the struggle for existence, and was then inherited by the organism’s descendants. In version two, variation came about in an undirected, random way – and the rest was exactly the same, except that now the struggle for existence played a more crucial role in sorting out the favorable from the unfavorable variations.
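The difference between the two versions can be sketched in the same toy style. This is a hedged illustration, not a reconstruction of Darwin’s notebooks: I assume two variants (a favourable A and an alternative B), a hypothetical per-generation rate at which new variation arises, and, for version two, that blind variation runs in both directions at the same rate. All names and numbers are my own.

```python
def generation(p, variation_rate, survival_a, survival_b, directed):
    """One generation: new variation arises, then differential survival acts.

    p is the frequency of the favourable variant A before this generation.
    """
    if directed:
        # Version one: new variation arises only in the useful direction (B -> A).
        p_varied = p + variation_rate * (1.0 - p)
    else:
        # Version two (assumed symmetric): blind variation runs both ways,
        # B -> A and A -> B, at the same rate.
        p_varied = p + variation_rate * (1.0 - p) - variation_rate * p
    # The struggle for existence: the same sorting step in both versions.
    survivors_a = p_varied * survival_a
    survivors_b = (1.0 - p_varied) * survival_b
    return survivors_a / (survivors_a + survivors_b)


def run(directed, survival_a, survival_b, p0=0.01, variation_rate=0.01, generations=500):
    """Return the frequency of A after the given number of generations."""
    p = p0
    for _ in range(generations):
        p = generation(p, variation_rate, survival_a, survival_b, directed)
    return p


# Without a struggle for existence (equal survival for A and B):
directed_no_struggle = run(True, 0.5, 0.5)
undirected_no_struggle = run(False, 0.5, 0.5)

# With a struggle for existence (A survives slightly more often):
directed_with_struggle = run(True, 0.6, 0.5)
undirected_with_struggle = run(False, 0.6, 0.5)
```

Under these assumptions, directed variation carries A to high frequency even without any struggle, whereas undirected variation on its own only drifts toward an even mix. Once the struggle for existence is switched on, A comes to dominate in both versions. That is one way to read the remark that the struggle plays a more crucial role in version two.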

Darwin later drew an analogy between natural selection and artificial selection by breeders. Artificial selection is “variation” + “selection by breeders” + “inheritance”. In natural selection, “selection by breeders” is replaced by “differential survival in the struggle for life”. For a long time we had to assume that this analogy played an important role in Darwin’s path to natural selection (just as it played an important part as a didactic tool in the first chapters of Darwin’s Origin of Species). This would have made a lot of sense! But as it turns out, the path actually led from the inheritance of acquired characters to natural selection – and Darwin only later saw the analogy between natural and artificial selection. This is a little ironic since Ernst Mayr, for example (in the paper linked above), saw Lamarckian inheritance purely as something that Darwin had to overcome in order to find natural selection. In truth, however, Lamarckian inheritance was not so much a hindrance on Darwin’s path to natural selection as it was a stepping stone.

Thus, Darwin’s correct mechanism grew out of his earlier belief in the incorrect mechanism of the inheritance of acquired characters, and so the discovery becomes somewhat less mysterious (although no less of an accomplishment). To see this, however, we have to be aware that evolutionists in the 19th century did have mechanisms other than natural selection. Without his earlier, false beliefs, Darwin might never have found natural selection at all. What I do not know (but will try to find out) is whether Wallace’s discovery followed a similar path.