Words-of-wisdom Dep’t

It’s always nice when you learn, years later, that a student actually remembers something you said. It’s even nicer when it’s something you’d forgotten ever saying.

Had coffee this week with someone I had in class maybe 18 years ago, now doing a fairly senior job in Washington. She said her current job had convinced her that I was right to say (as she recalls): “Never believe a statistic you didn’t make up yourself.”

Author: Mark Kleiman

Professor of Public Policy at the NYU Marron Institute for Urban Management and editor of the Journal of Drug Policy Analysis. Teaches about the methods of policy analysis and about drug abuse control and crime control policy, working out the implications of two principles: that swift and certain sanctions don't have to be severe to be effective, and that well-designed threats usually don't have to be carried out. Books: Drugs and Drug Policy: What Everyone Needs to Know (with Jonathan Caulkins and Angela Hawken); When Brute Force Fails: How to Have Less Crime and Less Punishment (Princeton, 2009; named one of the "books of the year" by The Economist); Against Excess: Drug Policy for Results (Basic, 1993); Marijuana: Costs of Abuse, Costs of Control (Greenwood, 1989). UCLA Homepage. Curriculum Vitae. Contact: Markarkleiman-at-gmail.com

20 thoughts on “Words-of-wisdom Dep’t”

  1. Yes. My year-long acquaintance with stats convinced me that if you haven’t got a Ph.D. in it, it’s best not to get too excited. It is a Kool-Aid smorgasbord of error. Not that we shouldn’t have them, of course. I’m just agreeing with you.

  2. It depends on the meaning of “statistic”. I’m not inclined to disbelieve anyone’s data (unless I have taken against them for other reasons), but unless I have some good reason to trust someone’s statistical competence and scrupulous thoroughness, I’m very disinclined to trust claims of statistical significance made for data when such claims appear superficially dubious. Too few people really know their statistics and/or take sufficient care that the correct test is being properly applied; it is far more common for them to use a test that’s especially easy to apply, or that someone once told them is a good choice (in some context, possibly not the same one, if they were even right that time), or one that gives small error bars (Standard Error is often used instead of Standard Deviation for this reason, and only for this reason, not as a well-informed choice).
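
    A small sketch of that last point, with entirely made-up numbers (an assumption for illustration, not anyone’s real data): the Standard Error is just the Standard Deviation divided by the square root of the sample size, so quoting it as an “error bar” makes the spread look far tighter than the data actually are.

```python
# Toy illustration (hypothetical data): Standard Deviation vs. Standard Error.
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=100.0, scale=15.0, size=400)  # 400 made-up measurements

sd = sample.std(ddof=1)            # spread of the individual observations (~15)
se = sd / np.sqrt(len(sample))     # uncertainty of the mean: SD / sqrt(n) (~0.75)

print(f"standard deviation: {sd:.2f}")
print(f"standard error:     {se:.2f}")  # shrinks as n grows; the SD does not
```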

    1. Amen brother!

      Statistical tests are based on mathematical equations involving probability distributions and their interpretation depends critically upon the appropriateness of certain assumptions about the behavior of the underlying variables.

      Example of where this gets difficult: there is disagreement regarding the causal relationship between upper extremity activity and the development of carpal tunnel syndrome (CTS). Do workers with high-force and high-repetition jobs (like poultry workers) run an increased risk of developing CTS? Same problem is seen with other upper extremity conditions like tendonitis and epicondylitis, but there is more literature on CTS than for most other conditions, so this is the example to illustrate Warren’s point.

      It turns out that there are many studies that attempt to answer this question. If you are really in luck, you may have a robust and reliable measurement of exposure (cycles per second and pounds of force) that you can use to see whether higher levels of exposure are related to a higher risk of CTS. You may actually enjoy the favor of the gods and have valid data for both exposure and outcome.

      Now, many or most of the aforesaid studies use some form of regression (logistic regression being the most commonly used) to test the hypothesis of a relationship between exposure and outcome. And a lot of these studies turn out to be inconclusive or “negative,” meaning that the regression coefficient for exposure was not “statistically significant.” From this fact, many readers conclude that there is no relationship between force/repetition and CTS.

      BUT logistic regression, as usually applied with a single linear exposure term, assumes that the natural logarithm of the odds of developing CTS rises linearly with the level of exposure: the less exposure, the better; with increasing levels of exposure, the risk goes up, and the lowest level of exposure carries the lowest level of risk. Some exposure is bad, and more is worse.

      Well, for many exposure/disease relationships in occupational epidemiology, this makes good sense; this is a sound assumption. For exposures like plutonium and dimethylmercury, it is true that the optimal exposure is zero, and anything greater than that imposes risks of disease, with greater risks at increasing levels of exposure. It may not be exactly linear, but it is expected to be monotonically increasing.

      But what about an exposure like “using our arms and hands to do things”? Here, it is not at all clear that the lowest level of risk occurs at the lowest level of exposure. It is very likely that here is an exposure where some is good but too much is bad, where the lowest level of risk is at an exposure level greater than zero, with no exposure at all likely to be an unhealthy situation. For a relationship like this, the graph is likely to be shaped more like a U or a J. The lowest risk is associated with some exposure but not too much.

      Here is where the logistic regression analysis is likely to mislead the reader; even with impeccable measurement of both exposure and outcome, even with unimpeachable data and state-of-the-art software, the statistical analysis and the resultant conclusions may be misleading. The phenomenon of interest did not obey the model’s assumptions about the shape of the exposure/risk relationship; flawless data led to a flawed conclusion.
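
      To make that concrete, here is a small simulated sketch (the numbers are invented for illustration, not CTS data): when the true exposure/risk curve is U-shaped, a logistic regression with only a linear exposure term can report a near-zero, “non-significant” coefficient even though exposure matters a great deal, while adding a quadratic term recovers the effect.

```python
# Hypothetical illustration: a U-shaped dose-response hidden from a
# linear-in-exposure logistic regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
exposure = rng.uniform(0, 10, n)                   # arbitrary exposure scale

# True log-odds are U-shaped: lowest risk at moderate exposure,
# higher risk at both very low and very high exposure.
log_odds = -2.0 + 0.15 * (exposure - 5.0) ** 2
disease = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))

# Model 1: the usual linear term only.
linear = sm.Logit(disease, sm.add_constant(exposure)).fit(disp=False)

# Model 2: allow curvature with a quadratic term.
X2 = sm.add_constant(np.column_stack([exposure, exposure ** 2]))
quadratic = sm.Logit(disease, X2).fit(disp=False)

print("linear exposure coef:    %.3f (p = %.3f)"
      % (linear.params[1], linear.pvalues[1]))        # typically looks "null"
print("quadratic exposure coef: %.3f (p = %.3f)"
      % (quadratic.params[2], quadratic.pvalues[2]))  # the U shape is detected
```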

      Corollary: Never believe a statistic you didn’t make up yourself, and believe no more than half of the ones you did make up.

  3. The first SameFacts post with the tag “aphorisms.” Worth populating that field, I reckon.

  4. I like it. I’ll use that one next time my son-in-law tells me 87% of statistics are made up on the spot 95% of the time.

  5. If you have to use statistics you should have designed a better experiment.
    –Rutherford to Planck (maybe)

    1. Ah, KLJ, Rutherford and Planck were dealing with atomic particles where two conditions apply. First, the sample size is enormous; second, one atom is pretty much like another atom. Therefore, the amount of variation is infinitesimal in relation to the effect size. Statistical methods to detect signal from noise should not be required in a well-designed experiment.

      But if the particles of interest are human beings, and if the sample size is in the hundreds or in the dozens, the amount of variation is considerable in relation to any effect you want to measure. Statistical methods are required to find group differences amid the individual differences.

      Similarly, microbiologists can set up experiments in which you can culture a trillion organisms overnight, in which one E. coli bacterium is pretty much like its neighbor. Also, they do not need to get ten to the twelfth consent forms signed in order to proceed with their study. If an experiment is being done with human participants, this is a requirement.

      The soundest reason for statistical skepticism is the fact that the tests make assumptions about the data which are frequently violated by the phenomena being studied. Equally sound is the fact that there is generally no answer in the back of the book revealing the prior probability of a particular hypothesis being tested by the statistical calculation. You do not want to believe a statistic you did not make up yourself because the other guy does not share your assumptions about the nature of the data or the prior likelihood that the study hypothesis is true.
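
      A back-of-the-envelope sketch of that last point (the specific numbers are assumptions for illustration, not from any source): with the conventional 5% false-positive rate and 80% power, how much a “statistically significant” result should move you depends heavily on how plausible the hypothesis was before the study.

```python
# Toy calculation: P(hypothesis true | "significant" result) under assumed
# values for the false-positive rate, power, and prior probability.
alpha = 0.05   # chance of a "significant" result when the hypothesis is false
power = 0.80   # chance of a "significant" result when the hypothesis is true

for prior in (0.5, 0.1, 0.01):
    p_significant = prior * power + (1 - prior) * alpha
    p_true_given_sig = prior * power / p_significant
    print(f"prior {prior:4.2f}: P(true | significant) = {p_true_given_sig:.2f}")
```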

      1. I know, Ed, but that is still one of my favorite quotes. I’m a scientist myself who sees statistics used for the most spurious reasons. I actually heard a (former) bacterial geneticist in a seminar once say that he gave up genetics because yes/no answers could be a pain in the ass when the answer was “wrong”, while in biochemistry you can always add error bars to virtually any curve and get it published. I am perforce aware of the necessity for statistics in quantitative trait analysis, epidemiology, public health…OT, but I once heard a speaker answer a question about his 6 significant figures by saying that is as far as his calculator would go. Innumeracy appears in the strangest places.

        1. A bit peripheral, but there is now a Journal of Negative Results in Biomedicine (http://www.jnrbm.com/), open-access stuff. It is good to have a repository of statistically non-significant results, where you don’t have to play with the data to get the error bars you need in order to get something published.

          I agree that the Rutherford quote is a fine one. And that only good scientific judgment can detect a highly precise measurement of a totally irrelevant variable.

        2. I was told of a Job Talk at a prestigious university where the candidate was asked why they had two significant figures on their numbers. Rather than explain, or consider the question, they replied that the numbers were good enough for (an extremely prestigious journal). They didn’t get the offer, though they did get a job at a similarly prestigious university, where there presumably was no similar awkwardness at the Job Talk.

          1. Heh. The editors/reviewers of Cell/Nature/Science can be such pushovers. For some people. Physical Review Letters, not so much, probably. But they (PRL) do publish valid results to 6 significant figures.

      2. “Ah, KLJ, Rutherford and Planck were dealing with atomic particles where two conditions apply. First, the sample size is enormous; second, one atom is pretty much like another atom. Therefore, the amount of variation is infinitesimal in relation to the effect size. Statistical methods to detect signal from noise should not be required in a well-designed experiment.”

        And IIRC, physicists are on the cutting edge of methods for finding one important data point among billions of points of nothing/noise.

        There’s a lot of wisdom in the sayings of old physicists, and a lot of foolishness.

        1. If you believe in an omniscient, omnipotent God, is it plausible to believe He isn’t clever enough to fake the evidence?

          1. I thought this was the whole “the inconsistencies with Biblical Creation present in the data were deliberately introduced in order to test your faith; true believers will see past the evidence before their eyes” argument.

            Mind you, I’ve always wondered whether those who advance that argument realize what it would suggest about the character of their God … but then, I’m from a people whose faith tradition declares, in at least some versions, that the faithful are reminded that we’re stuck with our God, but not necessarily in love with our God.

  6. Maybe I am parsing this too closely, but I am not sure I understand either the intention of this or what makes it clever. First there is the general cynical air about statistics — never believe statistics from anyone else… and then this is amplified by the notion of statistics being “made up”, which surely has at least the connotation of an untruth rather than just neutral creation: “concocted; falsely fabricated or invented”, as in “he made up a story to explain his absence”, rather than “put together; finished”, as in “she made up an antenna from some old wire”. Seemingly the only statistics are those that are falsely fabricated by others, or similarly concocted yourself. An odd thing for someone who spends his life trying to convince others of arguments based on data.

    Also the fun may come from the question of whether one should “believe” false things that you yourself concoct, and the truth that sometimes we get taken in by shaky arguments that we ourselves create.

    Presumably an aphorism should point to a truth, and maybe the truth in public life as influenced as it is by moneyed interests and the spin cycle is that much of what is put out there is exaggerated or shorn of nuance and qualification at least to the edge of falsehood… but I don’t know that this is specific to statistics.

    Then there is the resonance with dim memories from philosophy of knowledge — “Humans can only understand what they make.” Not sure whether this was Michael Polanyi or Hannah Arendt but it can certainly be read backward as a caution not to believe that theoretical constructs are perfect substitutes for the actual world. In regard to statistics, I would say that no one should be guided by them without understanding the source and nature of the data and the methodological choices in producing the statistical information, and the biases introduced by what is presented and how it is presented as well as what is not presented. In other words one must almost replicate the creation of the statistic in order to be able to critically evaluate it. I do think this is a useful insight.

    But how is this different from any other sort of information? Each piece of information should be subjected to critical appraisal before it is even admitted to discourse, and then reviewed for appropriateness in the specific context if it is to move one to action. In this regard statistics are not different from witness testimony, arguments from authority, folk wisdom, or any of the other sorts of summary information that we rely on daily. Maybe the difference is that, compared to most forms of inference, statistical information is almost uniquely subject to this sort of replication of the context of creation, and so statistics “made up” by others or ourselves are actually more subject to equal belief (after the critical process) than most other forms of information.

    Finally this begs the question of the extent to which and the manner in which things that we generate — arguments, introspections, memories, perceptions, diagnoses — should in fact be privileged as having special validity compared to those generated by others, or whether they need to be subject to the same degree of critique.

    Or maybe that was the point all along…

Comments are closed.