A Problem with Scientific Truth

This article is a must-read for every scientist. The first take-home is that weak statistical signals are profoundly untrustworthy. It has been my inclination not to trust them anyway, but this tells me that we need to be rigorous about it. How many attempts at replication were made suddenly matters.

How much is observer bias driving the results, however tight the experiment? Remember hypnotic suggestion? Did you just see that bat at the window?

This is an extraordinary finding and must be kept in mind, however good the results. Far too much of what we do is borderline to start with, and this natural bias is the nasty in the woodshed. It really makes small-sample tests extremely suspect.

What needs to be investigated carefully in short-sample work is the effect of discarding one failed test or adding one viable test to the sample. Far too much work has been based on short samples because of outright cost and never checked against far larger samples. That has always given me disquiet.


Is there something wrong with the scientific method?

DECEMBER 13, 2010

Many results that are rigorously proved and accepted start shrinking in later studies.

On September 18, 2007, a few dozen neuroscientists, psychiatrists, and drug-company executives gathered in a hotel conference room in Brussels to hear some startling news. It had to do with a class of drugs known as atypical or second-generation antipsychotics, which came on the market in the early nineties. The drugs, sold under brand names such as Abilify, Seroquel, and Zyprexa, had been tested on schizophrenics in several large clinical trials, all of which had demonstrated a dramatic decrease in the subjects’ psychiatric symptoms. As a result, second-generation antipsychotics had become one of the fastest-growing and most profitable pharmaceutical classes. By 2001, Eli Lilly’s Zyprexa was generating more revenue than Prozac. It remains the company’s top-selling drug.
But the data presented at the Brussels meeting made it clear that something strange was happening: the therapeutic power of the drugs appeared to be steadily waning. A recent study showed an effect that was less than half of that documented in the first trials, in the early nineteen-nineties. Many researchers began to argue that the expensive pharmaceuticals weren’t any better than first-generation antipsychotics, which have been in use since the fifties. “In fact, sometimes they now look even worse,” John Davis, a professor of psychiatry at the University of Illinois at Chicago, told me.
Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard for the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.
But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.
For many scientists, the effect is especially troubling because of what it exposes about the scientific process. If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved? Which results should we believe? Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to “put nature to the question.” But it appears that nature often gives us different answers.
Jonathan Schooler was a young graduate student at the University of Washington in the nineteen-eighties when he discovered a surprising new fact about language and memory. At the time, it was widely believed that the act of describing our memories improved them. But, in a series of clever experiments, Schooler demonstrated that subjects shown a face and asked to describe it were much less likely to recognize the face when shown it later than those who had simply looked at it. Schooler called the phenomenon “verbal overshadowing.”
The study turned him into an academic star. Since its initial publication, in 1990, it has been cited more than four hundred times. Before long, Schooler had extended the model to a variety of other tasks, such as remembering the taste of a wine, identifying the best strawberry jam, and solving difficult creative puzzles. In each instance, asking people to put their perceptions into words led to dramatic decreases in performance.
But while Schooler was publishing these results in highly reputable journals, a secret worry gnawed at him: it was proving difficult to replicate his earlier findings. “I’d often still see an effect, but the effect just wouldn’t be as strong,” he told me. “It was as if verbal overshadowing, my big new idea, was getting weaker.” At first, he assumed that he’d made an error in experimental design or a statistical miscalculation. But he couldn’t find anything wrong with his research. He then concluded that his initial batch of research subjects must have been unusually susceptible to verbal overshadowing. (John Davis, similarly, has speculated that part of the drop-off in the effectiveness of antipsychotics can be attributed to using subjects who suffer from milder forms of psychosis which are less likely to show dramatic improvement.) “It wasn’t a very satisfying explanation,” Schooler says. “One of my mentors told me that my real mistake was trying to replicate my work. He told me doing that was just setting myself up for disappointment.”
Schooler tried to put the problem out of his mind; his colleagues assured him that such things happened all the time. Over the next few years, he found new research questions, got married and had kids. But his replication problem kept on getting worse. His first attempt at replicating the 1990 study, in 1995, resulted in an effect that was thirty per cent smaller. The next year, the size of the effect shrank another thirty per cent. When other labs repeated Schooler’s experiments, they got a similar spread of data, with a distinct downward trend. “This was profoundly frustrating,” he says. “It was as if nature gave me this great result and then tried to take it back.” In private, Schooler began referring to the problem as “cosmic habituation,” by analogy to the decrease in response that occurs when individuals habituate to particular stimuli. “Habituation is why you don’t notice the stuff that’s always there,” Schooler says. “It’s an inevitable process of adjustment, a ratcheting down of excitement. I started joking that it was like the cosmos was habituating to my ideas. I took it very personally.”
Schooler is now a tenured professor at the University of California at Santa Barbara. He has curly black hair, pale-green eyes, and the relaxed demeanor of someone who lives five minutes away from his favorite beach. When he speaks, he tends to get distracted by his own digressions. He might begin with a point about memory, which reminds him of a favorite William James quote, which inspires a long soliloquy on the importance of introspection. Before long, we’re looking at pictures from Burning Man on his iPhone, which leads us back to the fragile nature of memory.
Although verbal overshadowing remains a widely accepted theory—it’s often invoked in the context of eyewitness testimony, for instance—Schooler is still a little peeved at the cosmos. “I know I should just move on already,” he says. “I really should stop talking about this. But I can’t.” That’s because he is convinced that he has stumbled on a serious problem, one that afflicts many of the most exciting new ideas in psychology.
One of the first demonstrations of this mysterious phenomenon came in the early nineteen-thirties. Joseph Banks Rhine, a psychologist at Duke, had developed an interest in the possibility of extrasensory perception, or E.S.P. Rhine devised an experiment featuring Zener cards, a special deck of twenty-five cards printed with one of five different symbols: a card was drawn from the deck and the subject was asked to guess the symbol. Most of Rhine’s subjects guessed about twenty per cent of the cards correctly, as you’d expect, but an undergraduate named Adam Linzmayer averaged nearly fifty per cent during his initial sessions, and pulled off several uncanny streaks, such as guessing nine cards in a row. The odds of this happening by chance are about one in two million. Linzmayer did it three times.
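The "one in two million" figure can be checked with a little arithmetic. The sketch below assumes each guess is an independent one-in-five shot (the article does not say whether cards were replaced or whether Linzmayer got feedback, so this is a simplification), and the variable names are my own.

```python
from fractions import Fraction

SYMBOLS = 5   # five distinct Zener symbols
STREAK = 9    # nine correct guesses in a row

# Assumption: every guess is independent with a 1-in-5 chance of being right.
p_single = Fraction(1, SYMBOLS)
p_streak = p_single ** STREAK

odds_against = p_streak.denominator  # 5 ** 9
print(f"1 in {odds_against:,}")      # prints "1 in 1,953,125"
```

That is where "about one in two million" comes from: five multiplied by itself nine times.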
Rhine documented these stunning results in his notebook and prepared several papers for publication. But then, just as he began to believe in the possibility of extrasensory perception, the student lost his spooky talent. Between 1931 and 1933, Linzmayer guessed at the identity of another several thousand cards, but his success rate was now barely above chance. Rhine was forced to conclude that the student’s “extra-sensory perception ability has gone through a marked decline.” And Linzmayer wasn’t the only subject to experience such a drop-off: in nearly every case in which Rhine and others documented E.S.P. the effect dramatically diminished over time. Rhine called this trend the “decline effect.”
Schooler was fascinated by Rhine’s experimental struggles. Here was a scientist who had repeatedly documented the decline of his data; he seemed to have a talent for finding results that fell apart. In 2004, Schooler embarked on an ironic imitation of Rhine’s research: he tried to replicate this failure to replicate. In homage to Rhine’s interests, he decided to test for a parapsychological phenomenon known as precognition. The experiment itself was straightforward: he flashed a set of images to a subject and asked him or her to identify each one. Most of the time, the response was negative—the images were displayed too quickly to register. Then Schooler randomly selected half of the images to be shown again. What he wanted to know was whether the images that got a second showing were more likely to have been identified the first time around. Could subsequent exposure have somehow influenced the initial results? Could the effect become the cause?
The craziness of the hypothesis was the point: Schooler knows that precognition lacks a scientific explanation. But he wasn’t testing extrasensory powers; he was testing the decline effect. “At first, the data looked amazing, just as we’d expected,” Schooler says. “I couldn’t believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size”—a standard statistical measure—“kept on getting smaller and smaller.” The scientists eventually tested more than two thousand undergraduates. “In the end, our results looked just like Rhine’s,” Schooler said. “We found this strong paranormal effect, but it disappeared on us.”
The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets cancelled out. The extrasensory powers of Schooler’s subjects didn’t decline—they were simply an illusion that vanished over time. And yet Schooler has noticed that many of the data sets that end up declining seem statistically solid—that is, they contain enough data that any regression to the mean shouldn’t be dramatic. “These are the results that pass all the tests,” he says. “The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time! Hell, it’s happened to me multiple times.” And this is why Schooler believes that the decline effect deserves more attention: its ubiquity seems to violate the laws of statistics. “Whenever I start talking about this, scientists get very nervous,” he says. “But I still want to know what happened to my results. Like most scientists, I assumed that it would get easier to document my effect over time. I’d get better at doing the experiments, at zeroing in on the conditions that produce verbal overshadowing. So why did the opposite happen? I’m convinced that we can use the tools of science to figure this out. First, though, we have to admit that we’ve got a problem.”
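Regression to the mean is easy to see in a toy simulation. The sketch below is my own illustration, not anything from Schooler's work: it assumes two thousand hypothetical labs all measuring an effect whose true size is zero, keeps only the most striking five per cent of first results, and then re-runs just those. Every number in it (lab count, sample size, cut-off) is an arbitrary assumption.

```python
import random
import statistics


def study(rng, true_effect, n):
    """Mean of n noisy measurements around true_effect (unit-variance noise)."""
    return statistics.fmean(rng.gauss(true_effect, 1.0) for _ in range(n))


def decline_demo(seed=0, labs=2000, n=20, true_effect=0.0, top_fraction=0.05):
    rng = random.Random(seed)
    first = [study(rng, true_effect, n) for _ in range(labs)]
    # Keep only the most striking first results (the ones that get noticed).
    cutoff = sorted(first)[int(labs * (1 - top_fraction))]
    striking = [x for x in first if x >= cutoff]
    # Re-run each striking study once with fresh, independent subjects.
    reruns = [study(rng, true_effect, n) for _ in striking]
    return statistics.fmean(striking), statistics.fmean(reruns)


initial_mean, replication_mean = decline_demo()
print(f"striking first results: {initial_mean:.2f}")
print(f"their replications:     {replication_mean:.2f}")
```

The selected first results look like a solid effect, while the re-runs hover around zero. Nothing declined; the early numbers were picked because they were flukes.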
In 1991, the Danish zoologist Anders Møller, at Uppsala University, in Sweden, made a remarkable discovery about sex, barn swallows, and symmetry. It had long been known that the asymmetrical appearance of a creature was directly linked to the amount of mutation in its genome, so that more mutations led to more “fluctuating asymmetry.” (An easy way to measure asymmetry in humans is to compare the length of the fingers on each hand.) What Møller discovered is that female barn swallows were far more likely to mate with male birds that had long, symmetrical feathers. This suggested that the picky females were using symmetry as a proxy for the quality of male genes. Møller’s paper, which was published in Nature, set off a frenzy of research. Here was an easily measured, widely applicable indicator of genetic quality, and females could be shown to gravitate toward it. Aesthetics was really about genetics.
In the three years following, there were ten independent tests of the role of fluctuating asymmetry in sexual selection, and nine of them found a relationship between symmetry and male reproductive success. It didn’t matter if scientists were looking at the hairs on fruit flies or replicating the swallow studies—females seemed to prefer males with mirrored halves. Before long, the theory was applied to humans. Researchers found, for instance, that women preferred the smell of symmetrical men, but only during the fertile phase of the menstrual cycle. Other studies claimed that females had more orgasms when their partners were symmetrical, while a paper by anthropologists at Rutgers analyzed forty Jamaican dance routines and discovered that symmetrical men were consistently rated as better dancers.
Then the theory started to fall apart. In 1994, there were fourteen published tests of symmetry and sexual selection, and only eight found a correlation. In 1995, there were eight papers on the subject, and only four got a positive result. By 1998, when there were twelve additional investigations of fluctuating asymmetry, only a third of them confirmed the theory. Worse still, even the studies that yielded some positive result showed a steadily declining effect size. Between 1992 and 1997, the average effect size shrank by eighty per cent.
And it’s not just fluctuating asymmetry. In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze “temporal trends” across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance. In fact, even when numerous variables were controlled for—Jennions knew, for instance, that the same author might publish several critical papers, which could distort his analysis—there was still a significant decrease in the validity of the hypothesis, often within a year of publication. Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. “This is a very sensitive issue for scientists,” he says. “You know, we’re supposed to be dealing with hard facts, the stuff that’s supposed to stand the test of time. But when you see these trends you become a little more skeptical of things.”
What happened? Leigh Simmons, a biologist at the University of Western Australia, suggested one explanation when he told me about his initial enthusiasm for the theory: “I was really excited by fluctuating asymmetry. The early studies made the effect look very robust.” He decided to conduct a few experiments of his own, investigating symmetry in male horned beetles. “Unfortunately, I couldn’t find the effect,” he said. “But the worst part was that when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.” For Simmons, the steep rise and slow fall of fluctuating asymmetry is a clear example of a scientific paradigm, one of those intellectual fads that both guide and constrain research: after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.
Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found. The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that ninety-seven per cent of all published psychological studies with statistically significant data found the effect they were looking for. A “significant” result is defined as any data point that would be produced by chance less than five per cent of the time. This ubiquitous test was invented in 1922 by the English mathematician Ronald Fisher, who picked five per cent as the boundary line, somewhat arbitrarily, because it made pencil and slide-rule calculations easier. Sterling saw that if ninety-seven per cent of psychology studies were proving their hypotheses, either psychologists were extraordinarily lucky or they published only the outcomes of successful experiments. In recent years, publication bias has mostly been seen as a problem for clinical trials, since pharmaceutical companies are less interested in publishing results that aren’t favorable. But it’s becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology.
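A rough sketch of how publication bias alone can inflate an effect: suppose a small real effect exists, many small studies measure it, and only those that clear the significance bar in the expected direction reach print. The true effect of 0.1, the sample size of 25, and the filter itself are my assumptions for illustration, not figures from Sterling or Jennions.

```python
import math
import random
import statistics


def simulate_literature(seed=1, studies=5000, n=25, true_effect=0.1):
    rng = random.Random(seed)
    se = 1.0 / math.sqrt(n)  # standard error of a study mean, unit-variance noise
    all_estimates, published = [], []
    for _ in range(studies):
        estimate = rng.gauss(true_effect, se)  # one study's measured effect
        all_estimates.append(estimate)
        # Hypothetical journal filter: more than 1.96 standard errors above
        # zero, the conventional two-sided five per cent threshold.
        if estimate / se > 1.96:
            published.append(estimate)
    return (statistics.fmean(all_estimates),
            statistics.fmean(published),
            len(published) / studies)


all_mean, published_mean, published_share = simulate_literature()
print(f"mean of every study run:    {all_mean:.2f}")
print(f"mean of 'published' subset: {published_mean:.2f}")
print(f"share that got published:   {published_share:.1%}")
```

Under these assumptions the published subset reports an effect several times larger than the one actually built into the simulation, so any later unfiltered replication would appear to show a decline.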
While publication bias almost certainly plays a role in the decline effect, it remains an incomplete explanation. For one thing, it fails to account for the initial prevalence of positive results among studies that never even get submitted to journals. It also fails to explain the experience of people like Schooler, who have been unable to replicate their initial data despite their best efforts. Richard Palmer, a biologist at the University of Alberta, who has studied the problems surrounding fluctuating asymmetry, suspects that an equally significant issue is the selective reporting of results—the data that scientists choose to document in the first place. Palmer’s most convincing evidence relies on a statistical tool known as a funnel graph. When a large number of studies have been done on a single subject, the data should follow a pattern: studies with a large sample size should all cluster around a common value—the true result—whereas those with a smaller sample size should exhibit a random scattering, since they’re subject to greater sampling error. This pattern gives the graph its name, since the distribution resembles a funnel.
The funnel graph visually captures the distortions of selective reporting. For instance, after Palmer plotted every study of fluctuating asymmetry, he noticed that the distribution of results with smaller sample sizes wasn’t random at all but instead skewed heavily toward positive results. Palmer has since documented a similar problem in several other contested subject areas. “Once I realized that selective reporting is everywhere in science, I got quite depressed,” Palmer told me. “As a researcher, you’re always aware that there might be some nonrandom patterns, but I had no idea how widespread it is.” In a recent review article, Palmer summarized the impact of selective reporting on his field: “We cannot escape the troubling conclusion that some—perhaps many—cherished generalities are at best exaggerated in their biological significance and at worst a collective illusion nurtured by strong a-priori beliefs often repeated.”
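The funnel shape, and the way selective reporting bends it, can be sketched numerically without drawing anything. This is not Palmer's method or data: it assumes a true effect of zero, four arbitrary sample sizes, and a deliberately crude reporting rule under which studies with fewer than a hundred subjects are written up only when the result is positive.

```python
import random
import statistics


def funnel(seed=2, sizes=(10, 40, 160, 640), studies_per_size=400, true_effect=0.0):
    rng = random.Random(seed)
    rows = []
    for n in sizes:
        se = n ** -0.5  # sampling error shrinks with the square root of n
        estimates = [rng.gauss(true_effect, se) for _ in range(studies_per_size)]
        # Hypothetical reporting rule: small studies are reported only if positive.
        reported = [e for e in estimates if n >= 100 or e > true_effect]
        rows.append((n, statistics.stdev(estimates), statistics.fmean(reported)))
    return rows


rows = funnel()
for n, spread, reported_mean in rows:
    print(f"n={n:4d}  spread of all results={spread:.3f}  mean of reported={reported_mean:+.3f}")
```

The spread column narrows as the sample size grows, which is the funnel. The reported-mean column shows the distortion: the small studies average well above zero while the large ones sit on the true value, giving the lopsided funnel described above.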
Palmer emphasizes that selective reporting is not the same as scientific fraud. Rather, the problem seems to be one of subtle omissions and unconscious misperceptions, as researchers struggle to make sense of their results. Stephen Jay Gould referred to this as the “shoehorning” process. “A lot of scientific measurement is really hard,” Simmons told me. “If you’re talking about fluctuating asymmetry, then it’s a matter of minuscule differences between the right and left sides of an animal. It’s millimetres of a tail feather. And so maybe a researcher knows that he’s measuring a good male”—an animal that has successfully mated—“and he knows that it’s supposed to be symmetrical. Well, that act of measurement is going to be vulnerable to all sorts of perception biases. That’s not a cynical statement. That’s just the way human beings work.”
One of the classic examples of selective reporting concerns the testing of acupuncture in different countries. While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that acupuncture was an effective treatment. During the same period, there were ninety-four clinical trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six per cent of these studies found any therapeutic benefits. As Palmer notes, this wide discrepancy suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don’t want to see. Our beliefs are a form of blindness.
John Ioannidis, an epidemiologist at Stanford University, argues that such distortions are a serious issue in biomedical research. “These exaggerations are why the decline has become so common,” he says. “It’d be really great if the initial studies gave us an accurate summary of things. But they don’t. And so what happens is we waste a lot of money treating millions of patients and doing lots of follow-up studies on other themes based on results that are misleading.” In 2005, Ioannidis published an article in the Journal of the American Medical Association that looked at the forty-nine most cited clinical-research studies in three major medical journals. Forty-five of these studies reported positive results, suggesting that the intervention being tested was effective. Because most of these studies were randomized controlled trials—the “gold standard” of medical evidence—they tended to have a significant impact on clinical practice, and led to the spread of treatments such as hormone replacement therapy for menopausal women and daily low-dose aspirin to prevent heart attacks and strokes. Nevertheless, the data Ioannidis found were disturbing: of the thirty-four claims that had been subject to replication, forty-one per cent had either been directly contradicted or had their effect sizes significantly downgraded.
The situation is even worse when a subject is fashionable. In recent years, for instance, there have been hundreds of studies on the various genes that control the differences in disease risk between men and women. These findings have included everything from the mutations responsible for the increased risk of schizophrenia to the genes underlying hypertension. Ioannidis and his colleagues looked at four hundred and thirty-two of these claims. They quickly discovered that the vast majority had serious flaws. But the most troubling fact emerged when he looked at the test of replication: out of four hundred and thirty-two claims, only a single one was consistently replicable. “This doesn’t mean that none of these claims will turn out to be true,” he says. “But, given that most of them were done badly, I wouldn’t hold my breath.”
According to Ioannidis, the main problem is that too many researchers engage in what he calls “significance chasing,” or finding ways to interpret the data so that it passes the statistical test of significance—the ninety-five-per-cent boundary invented by Ronald Fisher. “The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy,” Ioannidis says. In recent years, Ioannidis has become increasingly blunt about the pervasiveness of the problem. One of his most cited papers has a deliberately provocative title: “Why Most Published Research Findings Are False.”
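One way to see why "playing around with the numbers" works so well: if a researcher tries several independent ways of slicing data that contain no real effect, the chance that at least one slice clears the five-per-cent bar grows quickly. The function below is a simple illustration of that arithmetic, assuming fully independent tests (real re-analyses of one data set are correlated, so treat the numbers as a rough guide); the function name is my own.

```python
def chance_of_false_positive(tests, alpha=0.05):
    """Probability that at least one of `tests` independent null tests is 'significant'."""
    return 1 - (1 - alpha) ** tests


for k in (1, 5, 20):
    print(f"{k:2d} analyses tried: {chance_of_false_positive(k):.0%} chance of a spurious hit")
```

With a single analysis the risk is the advertised five per cent; under this independence assumption, twenty attempts make a spurious "finding" more likely than not.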
The problem of selective reporting is rooted in a fundamental cognitive flaw, which is that we like proving ourselves right and hate being wrong. “It feels good to validate a hypothesis,” Ioannidis said. “It feels even better when you’ve got a financial interest in the idea or your career depends upon it. And that’s why, even after a claim has been systematically disproven”—he cites, for instance, the early work on hormone replacement therapy, or claims involving various vitamins—“you still see some stubborn researchers citing the first few studies that show a strong effect. They really want to believe that it’s true.”
That’s why Schooler argues that scientists need to become more rigorous about data collection before they publish. “We’re wasting too much time chasing after bad studies and underpowered experiments,” he says. The current “obsession” with replicability distracts from the real problem, which is faulty design. He notes that nobody even tries to replicate most science papers—there are simply too many. (According to Nature, a third of all studies never even get cited, let alone repeated.) “I’ve learned the hard way to be exceedingly careful,” Schooler says. “Every researcher should have to spell out, in advance, how many subjects they’re going to use, and what exactly they’re testing, and what constitutes a sufficient level of proof. We have the tools to be much more transparent about our experiments.”
In a forthcoming paper, Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations and document all their results. “I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment,” Schooler says. “It would help us finally deal with all these issues that the decline effect is exposing.”
Although such reforms would mitigate the dangers of publication bias and selective reporting, they still wouldn’t erase the decline effect. This is largely because scientific research will always be shadowed by a force that can’t be curbed, only contained: sheer randomness. Although little research has been done on the experimental dangers of chance and happenstance, the research that exists isn’t encouraging.
In the late nineteen-nineties, John Crabbe, a neuroscientist at the Oregon Health and Science University, conducted an experiment that showed how unknowable chance events can skew tests of replicability. He performed a series of experiments on mouse behavior in three different science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he conducted the experiments, he tried to standardize every variable he could think of. The same strains of mice were used in each lab, shipped on the same day from the same supplier. The animals were raised in the same kind of enclosure, with the same brand of sawdust bedding. They had been exposed to the same amount of incandescent light, were living with the same number of littermates, and were fed the exact same type of chow pellets. When the mice were handled, it was with the same kind of surgical glove, and when they were tested it was on the same equipment, at the same time in the morning.
The premise of this test of replicability, of course, is that each of the labs should have generated the same pattern of results. “If any set of experiments should have passed the test, it should have been ours,” Crabbe says. “But that’s not the way it turned out.” In one experiment, Crabbe injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on average, six hundred centimetres more than they normally did; in Albany they moved seven hundred and one additional centimetres. But in the Edmonton lab they moved more than five thousand additional centimetres. Similar deviations were observed in a test of anxiety. Furthermore, these inconsistencies didn’t follow any detectable pattern. In Portland one strain of mouse proved most anxious, while in Albany another strain won that distinction.
The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise. The hyperactivity of those coked-up Edmonton mice wasn’t an interesting new fact—it was a meaningless outlier, a by-product of invisible variables we don’t understand. The problem, of course, is that such dramatic findings are also the most likely to get published in prestigious journals, since the data are both statistically significant and entirely unexpected. Grants get written, follow-up studies are conducted. The end result is a scientific accident that can take years to unravel.
This suggests that the decline effect is actually a decline of illusion. While Karl Popper imagined falsification occurring with a single, definitive experiment—Galileo refuted Aristotelian mechanics in an afternoon—the process turns out to be much messier than that. Many scientific theories continue to be considered true even after failing numerous experimental tests. Verbal overshadowing might exhibit the decline effect, but it remains extensively relied upon within the field. The same holds for any number of phenomena, from the disappearing benefits of second-generation antipsychotics to the weak coupling ratio exhibited by decaying neutrons, which appears to have fallen by more than ten standard deviations between 1969 and 2001. Even the law of gravity hasn’t always been perfect at predicting real-world phenomena. (In one test, physicists measuring gravity by means of deep boreholes in the Nevada desert found a two-and-a-half-per-cent discrepancy between the theoretical predictions and the actual data.) Despite these findings, second-generation antipsychotics are still widely prescribed, and our model of the neutron hasn’t changed. The law of gravity remains the same.
Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions. (Such shortcomings aren’t surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. (That idea has been around since Thomas Kuhn.) The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe. ♦

Read more http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer#ixzz19RgdxRlE
