The Decline Effect and the Scientific Method

By Jonah Lehrer

On September 18, 2007, a few dozen neuroscientists, psychiatrists, and drug-company executives gathered in a hotel conference room in Brussels to hear some startling news. It had to do with a class of drugs known as atypical or second-generation antipsychotics, which came on the market in the early nineties. The drugs, sold under brand names such as Abilify, Seroquel, and Zyprexa, had been tested on schizophrenics in several large clinical trials, all of which had demonstrated a dramatic decrease in the subjects’ psychiatric symptoms. As a result, second-generation antipsychotics had become one of the fastest-growing and most profitable pharmaceutical classes. By 2001, Eli Lilly’s Zyprexa was generating more revenue than Prozac. It remains the company’s top-selling drug.

But the data presented at the Brussels meeting made it clear that something strange was happening: the therapeutic power of the drugs appeared to be steadily waning. A recent study showed an effect that was less than half of that documented in the first trials, in the early nineteen-nineties. Many researchers began to argue that the expensive pharmaceuticals weren’t any better than first-generation antipsychotics, which have been in use since the fifties. “In fact, sometimes they now look even worse,” John Davis, a professor of psychiatry at the University of Illinois at Chicago, told me.

Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard against the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.

But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.

For many scientists, the effect is especially troubling because of what it exposes about the scientific process. If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved? Which results should we believe? Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to “put nature to the question.” But it appears that nature often gives us different answers.

Jonathan Schooler was a young graduate student at the University of Washington in the nineteen-eighties when he discovered a surprising new fact about language and memory. At the time, it was widely believed that the act of describing our memories improved them. But, in a series of clever experiments, Schooler demonstrated that subjects shown a face and asked to describe it were much less likely to recognize the face when shown it later than those who had simply looked at it. Schooler called the phenomenon “verbal overshadowing.”

The study turned him into an academic star. Since its initial publication, in 1990, it has been cited more than four hundred times. Before long, Schooler had extended the model to a variety of other tasks, such as remembering the taste of a wine, identifying the best strawberry jam, and solving difficult creative puzzles. In each instance, asking people to put their perceptions into words led to dramatic decreases in performance.

But while Schooler was publishing these results in highly reputable journals, a secret worry gnawed at him: it was proving difficult to replicate his earlier findings. “I’d often still see an effect, but the effect just wouldn’t be as strong,” he told me. “It was as if verbal overshadowing, my big new idea, was getting weaker.” At first, he assumed that he’d made an error in experimental design or a statistical miscalculation. But he couldn’t find anything wrong with his research. He then concluded that his initial batch of research subjects must have been unusually susceptible to verbal overshadowing. (John Davis, similarly, has speculated that part of the drop-off in the effectiveness of antipsychotics can be attributed to using subjects who suffer from milder forms of psychosis which are less likely to show dramatic improvement.) “It wasn’t a very satisfying explanation,” Schooler says. “One of my mentors told me that my real mistake was trying to replicate my work. He told me doing that was just setting myself up for disappointment.”

Schooler tried to put the problem out of his mind; his colleagues assured him that such things happened all the time. Over the next few years, he found new research questions, got married and had kids. But his replication problem kept on getting worse. His first attempt at replicating the 1990 study, in 1995, resulted in an effect that was thirty per cent smaller. The next year, the size of the effect shrank another thirty per cent. When other labs repeated Schooler’s experiments, they got a similar spread of data, with a distinct downward trend. “This was profoundly frustrating,” he says. “It was as if nature gave me this great result and then tried to take it back.” In private, Schooler began referring to the problem as “cosmic habituation,” by analogy to the decrease in response that occurs when individuals habituate to particular stimuli. “Habituation is why you don’t notice the stuff that’s always there,” Schooler says. “It’s an inevitable process of adjustment, a ratcheting down of excitement. I started joking that it was like the cosmos was habituating to my ideas. I took it very personally.”

Schooler is now a tenured professor at the University of California at Santa Barbara. He has curly black hair, pale-green eyes, and the relaxed demeanor of someone who lives five minutes away from his favorite beach. When he speaks, he tends to get distracted by his own digressions. He might begin with a point about memory, which reminds him of a favorite William James quote, which inspires a long soliloquy on the importance of introspection. Before long, we’re looking at pictures from Burning Man on his iPhone, which leads us back to the fragile nature of memory.

Although verbal overshadowing remains a widely accepted theory—it’s often invoked in the context of eyewitness testimony, for instance—Schooler is still a little peeved at the cosmos. “I know I should just move on already,” he says. “I really should stop talking about this. But I can’t.” That’s because he is convinced that he has stumbled on a serious problem, one that afflicts many of the most exciting new ideas in psychology.

One of the first demonstrations of this mysterious phenomenon came in the early nineteen-thirties. Joseph Banks Rhine, a psychologist at Duke, had developed an interest in the possibility of extrasensory perception, or E.S.P. Rhine devised an experiment featuring Zener cards, a special deck of twenty-five cards printed with one of five different symbols: a card was drawn from the deck and the subject was asked to guess the symbol. Most of Rhine’s subjects guessed about twenty per cent of the cards correctly, as you’d expect, but an undergraduate named Adam Linzmayer averaged nearly fifty per cent during his initial sessions, and pulled off several uncanny streaks, such as guessing nine cards in a row. The odds of this happening by chance are about one in two million. Linzmayer did it three times.
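The "one in two million" figure can be checked with a little arithmetic. What follows is a minimal sketch, assuming each guess is independent and has a one-in-five chance of being right; a real twenty-five-card deck drawn without replacement would shift the odds somewhat. The names `P_CORRECT`, `STREAK_LENGTH`, and `streak_probability` are illustrative, not taken from Rhine's work.

```python
from fractions import Fraction

# Assumption: every guess is independent, with a 1-in-5 chance of success.
P_CORRECT = Fraction(1, 5)
STREAK_LENGTH = 9


def streak_probability(p: Fraction, n: int) -> Fraction:
    """Probability of n correct guesses in a row, each with probability p."""
    return p ** n


prob = streak_probability(P_CORRECT, STREAK_LENGTH)
print(f"1 in {prob.denominator:,}")  # 1 in 1,953,125
```

Under that assumption the chance of a single nine-card streak is exactly one in 5^9, which rounds to the article's one in two million.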

Rhine documented these stunning results in his notebook and prepared several papers for publication. But then, just as he began to believe in the possibility of extrasensory perception, the student lost his spooky talent. Between 1931 and 1933, Linzmayer guessed at the identity of another several thousand cards, but his success rate was now barely above chance. Rhine was forced to conclude that the student’s “extra-sensory perception ability has gone through a marked decline.” And Linzmayer wasn’t the only subject to experience such a drop-off: in nearly every case in which Rhine and others documented E.S.P. the effect dramatically diminished over time. Rhine called this trend the “decline effect.”

Schooler was fascinated by Rhine’s experimental struggles. Here was a scientist who had repeatedly documented the decline of his data; he seemed to have a talent for finding results that fell apart. In 2004, Schooler embarked on an ironic imitation of Rhine’s research: he tried to replicate this failure to replicate. In homage to Rhine’s interests, he decided to test for a parapsychological phenomenon known as precognition. The experiment itself was straightforward: he flashed a set of images to a subject and asked him or her to identify each one. Most of the time, the response was negative—the images were displayed too quickly to register. Then Schooler randomly selected half of the images to be shown again. What he wanted to know was whether the images that got a second showing were more likely to have been identified the first time around. Could subsequent exposure have somehow influenced the initial results? Could the effect become the cause?

The craziness of the hypothesis was the point: Schooler knows that precognition lacks a scientific explanation. But he wasn’t testing extrasensory powers; he was testing the decline effect. “At first, the data looked amazing, just as we’d expected,” Schooler says. “I couldn’t believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size”—a standard statistical measure—“kept on getting smaller and smaller.” The scientists eventually tested more than two thousand undergraduates. “In the end, our results looked just like Rhine’s,” Schooler said. “We found this strong paranormal effect, but it disappeared on us.”

The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets cancelled out. The extrasensory powers of Schooler’s subjects didn’t decline—they were simply an illusion that vanished over time. And yet Schooler has noticed that many of the data sets that end up declining seem statistically solid—that is, they contain enough data that any regression to the mean shouldn’t be dramatic. “These are the results that pass all the tests,” he says. “The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time! Hell, it’s happened to me multiple times.” And this is why Schooler believes that the decline effect deserves more attention: its ubiquity seems to violate the laws of statistics. “Whenever I start talking about this, scientists get very nervous,” he says. “But I still want to know what happened to my results. Like most scientists, I assumed that it would get easier to document my effect over time. I’d get better at doing the experiments, at zeroing in on the conditions that produce verbal overshadowing. So why did the opposite happen? I’m convinced that we can use the tools of science to figure this out. First, though, we have to admit that we’ve got a problem.”
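Regression to the mean can be illustrated with a small simulation. The sketch below assumes there is no real effect at all, keeps only the first-run studies that happened to look striking, and then reruns each of them once. Every parameter (subjects per study, number of studies, the cutoff for "striking") is a hypothetical value chosen for illustration, not a figure from Schooler's experiments.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_EFFECT = 0.0   # assumption: nothing real is being measured
NOISE_SD = 1.0      # assumption: spread of individual measurements
N_SUBJECTS = 20     # hypothetical subjects per study
N_STUDIES = 2000    # hypothetical number of first-run studies
THRESHOLD = 0.4     # hypothetical cutoff for a "striking" first result


def run_study() -> float:
    """Return the average measured effect across one study's subjects."""
    sample = [random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(N_SUBJECTS)]
    return statistics.fmean(sample)


# Run every study once and keep only the ones that looked impressive.
first_runs = [run_study() for _ in range(N_STUDIES)]
striking = [effect for effect in first_runs if effect > THRESHOLD]

# Repeat each striking study once, with fresh subjects.
replications = [run_study() for _ in striking]

initial_mean = statistics.fmean(striking)
replication_mean = statistics.fmean(replications)

print(f"striking first results: {len(striking)} of {N_STUDIES}")
print(f"mean effect, first run:   {initial_mean:.2f}")
print(f"mean effect, replication: {replication_mean:.2f}")
```

Because the studies were selected for their high first result, the replications fall back toward the true value of zero. This is the "obvious" explanation only; it does not address Schooler's point that large data sets should make such flukes rare.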

In 1991, the Danish zoologist Anders Møller, at Uppsala University, in Sweden, made a remarkable discovery about sex, barn swallows, and symmetry. It had long been known that the asymmetrical appearance of a creature was directly linked to the amount of mutation in its genome, so that more mutations led to more “fluctuating asymmetry.” (An easy way to measure asymmetry in humans is to compare the length of the fingers on each hand.) What Møller discovered is that female barn swallows were far more likely to mate with male birds that had long, symmetrical feathers. This suggested that the picky females were using symmetry as a proxy for the quality of male genes. Møller’s paper, which was published in Nature, set off a frenzy of research. Here was an easily measured, widely applicable indicator of genetic quality, and females could be shown to gravitate toward it. Aesthetics was really about genetics.

In the three years following, there were ten independent tests of the role of fluctuating asymmetry in sexual selection, and nine of them found a relationship between symmetry and male reproductive success. It didn’t matter if scientists were looking at the hairs on fruit flies or replicating the swallow studies—females seemed to prefer males with mirrored halves. Before long, the theory was applied to humans. Researchers found, for instance, that women preferred the smell of symmetrical men, but only during the fertile phase of the menstrual cycle. Other studies claimed that females had more orgasms when their partners were symmetrical, while a paper by anthropologists at Rutgers analyzed forty Jamaican dance routines and discovered that symmetrical men were consistently rated as better dancers.

Then the theory started to fall apart. In 1994, there were fourteen published tests of symmetry and sexual selection, and only eight found a correlation. In 1995, there were eight papers on the subject, and only four got a positive result. By 1998, when there were twelve additional investigations of fluctuating asymmetry, only a third of them confirmed the theory. Worse still, even the studies that yielded some positive result showed a steadily declining effect size. Between 1992 and 1997, the average effect size shrank by eighty per cent.

And it’s not just fluctuating asymmetry. In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze “temporal trends” across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance. In fact, even when numerous variables were controlled for—Jennions knew, for instance, that the same author might publish several critical papers, which could distort his analysis—there was still a significant decrease in the validity of the hypothesis, often within a year of publication. Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. “This is a very sensitive issue for scientists,” he says. “You know, we’re supposed to be dealing with hard facts, the stuff that’s supposed to stand the test of time. But when you see these trends you become a little more skeptical of things.”

What happened? Leigh Simmons, a biologist at the University of Western Australia, suggested one explanation when he told me about his initial enthusiasm for the theory: “I was really excited by fluctuating asymmetry. The early studies made the effect look very robust.” He decided to conduct a few experiments of his own, investigating symmetry in male horned beetles. “Unfortunately, I couldn’t find the effect,” he said. “But the worst part was that when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.” For Simmons, the steep rise and slow fall of fluctuating asymmetry is a clear example of a scientific paradigm, one of those intellectual fads that both guide and constrain research: after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.

Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found. The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that ninety-seven per cent of all published psychological studies with statistically significant data found the effect they were looking for. A “significant” result is defined as any data point that would be produced by chance less than five per cent of the time. This ubiquitous test was invented in 1922 by the English mathematician Ronald Fisher, who picked five per cent as the boundary line, somewhat arbitrarily, because it made pencil and slide-rule calculations easier. Sterling saw that if ninety-seven per cent of psychology studies were proving their hypotheses, either psychologists were extraordinarily lucky or they published only the outcomes of successful experiments. In recent years, publication bias has mostly been seen as a problem for clinical trials, since pharmaceutical companies are less interested in publishing results that aren’t favorable. But it’s becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology.
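The way publication bias inflates an effect can be sketched in a few lines. The simulation below assumes a small real effect, runs many hypothetical studies, and "publishes" only those that are positive and pass the five-per-cent significance test. The effect size, sample size, and the simplified z-test with a known noise level are all assumptions made for illustration; they are not drawn from Sterling's or Jennions's analyses.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_EFFECT = 0.2   # assumption: a small but real effect
NOISE_SD = 1.0      # assumption: known spread of individual measurements
N_SUBJECTS = 25     # hypothetical subjects per study
N_STUDIES = 1000    # hypothetical number of studies run
ALPHA = 0.05        # the conventional five-per-cent threshold


def study_result() -> tuple[float, float]:
    """Return (estimated effect, two-sided p-value) for one simulated study."""
    sample = [random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(N_SUBJECTS)]
    estimate = fmean(sample)
    # Simplified z-test: treats NOISE_SD as known rather than estimating it.
    z = estimate / (NOISE_SD / N_SUBJECTS ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return estimate, p_value


results = [study_result() for _ in range(N_STUDIES)]

# Journals in this sketch accept only positive, significant findings.
published = [est for est, p in results if p < ALPHA and est > 0]

all_mean = fmean([est for est, _ in results])
published_mean = fmean(published)

print(f"published: {len(published)} of {N_STUDIES} studies")
print(f"mean effect, all studies:       {all_mean:.2f}")
print(f"mean effect, published studies: {published_mean:.2f}")
```

In this setup the published literature overstates the effect because only the luckiest studies clear the bar. Later, less selective replications would then appear to show a decline even though the underlying effect never changed.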

Read the entire New Yorker article here.


20 Responses to The Decline Effect and the Scientific Method

  1. Science is imperfect. Still, to date there is no better means for treating schizophrenia than anti-psychotic meds, this coming from a man dedicated to the dialogic solutions offered by talk therapies.

    ESP trials also have data decline (see Intro to Parapsychology: Irwin & Watt. Great book!) but this is due in part to the weak design of some of the studies, not to the scientific method. I think that science bashing only goes so far… science is reductive… it is designed to be. Combine that with philosophical productive methods and you have a nice pairing.

    Spiritual traditions have not done better in either treating schizophrenia or proving/teaching ESP. I hardly think that publishing more null findings will help this. I mean, how many people here even read the positive findings of science? Who would spend hours reading published articles that conclude with “we found nothing”? In science it is assumed that there is no effect and the hypothesis is intended to suggest that there is an effect. If there is no effect we go back to assuming there isn’t. Null is given, no need to publish it. It is null to publish null.

    • I agree with what you said but at least in some places in the article the point was that after positive results are published, negative results should not be rejected. If someone is studying something for the first time and the results are null, I agree, why publish it. But if people are enthusiastic about a slew of positive results and reject negative ones on the basis of this enthusiasm, that does seem like a real problem, assuming the procedures leading to the negative results were indeed up to standard.

    • Referring to this article as “science bashing” reveals a very strong bias.

    • From what I have heard the new generation anti-psychotics have been very valuable for the treatment of schizophrenia because they have a lot less harmful and debilitating side-effects. From what I have also heard there is sometimes a lot of trial and error with the drugs in treating schizophrenia. They may work for some time and then the drug may have to be changed.

      I think that it is also interesting to see not only how scientific understanding may be subject to bias but beyond that how scientific understandings are applied. For instance, there is a popular scientific understanding that lower levels of serotonin in the brain cause depression. Based on this understanding, a huge number of SSRIs are prescribed, not by psychiatrists but by general practitioners, without the application of talk therapy, despite the evidence which suggests that an SSRI without talk therapy will probably not be very effective for treatment of depression.

      So scientific evidence may often be applied in a very narrow minded and non-holistic way. As if we can just take a pill and depression will be stopped.

      • These are good points, Atma. I agree that examining the application of such scientific evidence is also very telling.

        Geez, if second-generation drugs like Zyprexa have less harmful and debilitating side-effects, that is pretty darn scary.

  2. It is amazing to consider how much the human self-concept and worldview is invested in and shaped by the notion of scientifically verifiable objective truth. Perhaps that plays a role in the reluctance to examine the decline effect.

    Maybe there is a new type of Copernican Revolution on the horizon.

  3. Do you think the main point of this article is only publication bias? There’s nothing really surprising about that. As you say – who wants to read about nothing? But publication bias was the explanation of only one scientist for this decline effect. This same scientist also said the decline effect is troubling and he didn’t want to talk about it publicly.

    This is because this decline effect has worrying implications for the scientific method itself. It may be the best method we think we have, but its limitations and imperfections are not recognized (or advertised) by most scientists.

    Although spiritual traditions may not be good for treating schizophrenia etc., they aim at the heart of the problem and provide a solution to all forms of suffering. The recent rise in outspoken atheism can be attributed partly to the overreliance on the scientific method and a failure to recognize its inherent limitations, thereby leading to too much faith in its ability to discover all truths.

    From the spiritual perspective, this attitude is cause for real concern. Will we waste our human birth in only seeking solutions to temporary problems and spend all our time trying to improve the short life we have here on earth because we believe it’s all there is?

  4. It is human to be biased… Who is not? Isn’t the idea behind posting the article to undermine the overvaluing of science? That shows bias as well.

    By the way, I am not an empiricist or positivist… You know that, guru maharaja, I am a post-modern Perspectival Realist! I just like empiricism as a perspective. I am very into subjectivity rather than objectivity… so I say viva la bias!

    • I think the idea behind the article is to present an objective reality concerning science that scientists themselves are concerned with and we should be as well. Yes it’s human to be biased, but not scientific (theoretically).

  5. Maybe getting older is making me cynical. Sorry Gurumaharaja.

    Maybe this article attempts to be objective. However, I should say, even if there is such a thing as ‘objective data’ there is no such thing as objective reading of the data. It always requires a reader and thus a subjective interpretation. This is not a bad thing… it just is. There is no such thing as a text…just a text & reader dyad. (I think I proved that in my original post!!!)

    If a tree falls in the forest and no one is there, does it make a sound? Who cares? No experience without an experiencer. Post modernism!

    This just reminded me of our talk at my house last month, when I was singing the glories of Perspectival Realism, the philosophical position that states ‘reality can only be known as a subjective perspective, and multiple perspectives may approach closer to what we would call truth or reality.’ And you said something like: Don’t forget that Reality may also be revealing him/herself as well and Reality may offer different and fuller perspectives to the subjects with whom he/she communicates. This suggestion was very moving as it says Reality is a subject as well, a supersubject, Reality the beautiful. This is a lovely theological argument; don’t reduce Reality to an ‘it’ but allow it to be a ‘Thou’, a subject rather than an object.

  6. Perhaps human consciousness is changing, evolving, becoming increasingly difficult to manipulate using simple chemical methods? A lot of things seem to point in that direction.

    • What about altering the brain with targeted electromagnetic influence? Are those experiments still going on or is that fringe science?

      • Electro-convulsive therapy is still used to treat severe depression. It’s kind of the last line of treatment. But it has been refined quite a bit from days gone by. It is a very safe treatment from what I have heard.
        There is a treatment now available called deep brain stimulation. Some type of wires are actually surgically implanted in the brain and then a pacemaker type of device is used to stimulate the targeted area of the brain.

      • I’m not sure if that methodology is still commonly used. Chemical methods are known to sometimes produce spectacular and very tangible results, which unfortunately are often not sustainable, as depicted in a very interesting movie, “Awakenings,” based on true events.

        • Electro-convulsive therapy is still common as a last line of defense against depression. It sometimes causes damage to memory systems and some discomfort after the treatment. But it is still one of the most effective treatments for deep depression.

  7. This article made me think of how the Decline Effect almost always plays out (unfortunately) in devotees’ spiritual lives as well. First the results are amazing, chanting is blissful, service is enjoyable and the mind is clean of doubts. But then the decline starts kicking in and the original results are almost impossible to reproduce. And before we know it we’re habituated into a compromised devotional life in a “maintenance” mode.
    The mysterious effect is surely not problematic only for the sciences . . .

  8. I had sent an article to a practitioner of Chaitanya Vaisnavism and he responded like this:
    There really are no other sources of knowledge. Only pratyaksa. All other forms are based on pratyaksa. Inference is based on the prior experience of invariable concomitance between a cause and an effect (fire and smoke). Verbal authority (sabda or apta) is based on the prior pratyaksa of authorities which they later communicate to others through speech (that too is based on pratyaksa, the accuracy of the hearing abilities or reading abilities of the audience). It is all pratyaksa “before the eyes or senses.” All of those so-called other sources of knowledge that you listed boil down to mere pratyaksa.

    No one denies that the senses are unreliable. But that does not discredit science. Science has learned to work around that flaw by repeating experiments, refining measurements, and confirming or disconfirming the work of others. Besides, there is the simple fact that science works. Anomalies in the data are potential new discoveries or an indication that something is wrong with the instrumentation. Scientists do not ignore the anomalies, but try to discover their causes; in other words, they do more science. Have you ever heard of a religion doing that? Religions hide behind the veil of immunity to disproof. They are not based on any knowledge but instead a lack of it (which is called faith). Of course, as human beings we are filled with all kinds of ignorance and thus we are infested with all kinds of faith. The goal is not to strengthen those faiths but to replace them with knowledge or direct experience. Faith is at best a temporary support to get one started down the path of sadhana. Once one has advanced down that path to some degree one no longer needs faith.

    • Is there a kind of knowledge that pratyaksa alone or even pratyaksa and reason together cannot arrive at? There is, and it is known through revelation, in relation to which we have to engage our senses and reason. Faith in revelation affords one a different kind of knowledge. Placing one’s faith in revelation and reasoning about it is theology, as opposed to mere philosophy. The subsequent knowledge arrived at through faith in spiritual practice is ultimately prema. Does prema do away with faith or does prema in some way constitute the culmination of faith? There is a world of doubt. Why not a world of faith. Thinking like this makes faith living and tangible. Absence of doubt is faith. Faith is knowledge.

      • Thank you Maharaja. I will forward this reply. The idea above is that the revelation in the first place does come through the medium of the purified senses, mind and intellect of advanced rishis in the past. So the infinite meets the finite through the senses, mind and intellect that are purified and ready to receive it. So here the word pratyaksha is broadened to include revelation.

  9. Sorry I am not able to edit my post above. The idea in the Bhagavata is that Vyasa had direct experience of Isvara and his sakti through his purified senses and he kind of advances the revelation further in the Bhagavata. He starts with some element of doubt about previous work and then through deep experience goes further. That he could do based on direct experience, which certainly had contents that were different from previous sabda. So revelation in one sense is pratyaksha for the advanced soul. I think BVT distinguishes between faith and belief, and many people understand faith to be belief. Belief is just about getting things right, like belief in resurrection, and is tied to their religious belief system and religious conception. Faith is a deeper quality of the soul that continues even if the content of the faith moves according to the changes presented by the times. So I think it would be good to really clarify what faith means, like BVT did previously.
