Concerning replication

I’m in London for meetings on criminal justice organized by the Centre for Justice Innovation (the UK arm of the Center for Court Innovation) and Policy Exchange. A small meeting that ran for two hours today tried to figure out whether the coerced-sobriety approach of the South Dakota 24/7 project (the alcohol version of HOPE) could be used in Scotland, with social problems, customs, and institutional arrangements quite unlike South Dakota’s. (In particular, Scotland has no such thing as a two-day term of confinement.)

The answer seems to be that you could try to do something based on similar principles and with similar aims, but you couldn’t really do the program in its trademark style, and even if you’re convinced 24/7 works on drunk drivers in Sioux Falls you can’t be sure that the alternative version would work on drunken wife-beaters in Glasgow.

That suggests a more general reflection. Heraclitus said that no one can step twice into the same river. By the same token, it isn’t really possible to do the same program in two different places.

Author: Mark Kleiman

Professor of Public Policy at the NYU Marron Institute for Urban Management and editor of the Journal of Drug Policy Analysis. Teaches about the methods of policy analysis and about drug abuse control and crime control policy, working out the implications of two principles: that swift and certain sanctions don't have to be severe to be effective, and that well-designed threats usually don't have to be carried out. Books: Drugs and Drug Policy: What Everyone Needs to Know (with Jonathan Caulkins and Angela Hawken); When Brute Force Fails: How to Have Less Crime and Less Punishment (Princeton, 2009; named one of the "books of the year" by The Economist); Against Excess: Drug Policy for Results (Basic, 1993); Marijuana: Costs of Abuse, Costs of Control (Greenwood, 1989). Contact: Markarkleiman-at-gmail.com

12 thoughts on “Concerning replication”

  1. His antagonist Parmenides held the opposite one-line view: the world is one and doesn’t change. A moderate Parmenidean would say that the differences between people and between cultures are superficial compared to their commonalities. Tweak and run.

  2. With such differences, why not go with something completely different: coerced yoga for the repeat alcohol/spousal abuser, instead of confinement, which would only enrage the serial abuser!

  3. I’m part of these conversations a great deal, and my experience is that criminal justice practitioners tend to overwhelmingly overstate the significance of superficial differences and miss the importance of guiding principles and underlying commonalities. We’re seeing it right now in the debate about adopting proven US approaches to gang violence in the UK, with the overwhelming sentiment being “we’re different,” which misses both the deep similarities in gangs - which are all about pretty universal group dynamics - and the fact that the US approach has been very successfully mapped onto Glasgow.

    One interesting aspect of the issue is that what are taken as the highest-quality and most dispositive evaluations - randomized clinical trials - tell us nothing whatsoever about why an intervention works (or doesn’t), just that it does (or doesn’t). So social science is under tremendous pressure to get answers that give no insight at all into what aspects of the intervention, or of the problem environment, do or don’t matter. When either changes, the debate starts over pretty much from the beginning.

    1. David, I agree with your first point: One can’t overlook how much of this has to do with a need to feel ownership of a program, and to not feel that one is lesser than the place that started the program. Examples: Divorce mediation produces settlements that tend to be quite similar to those arrived at in court, but adherence is much higher because the parties feel they had a stake in the development. Clinical practice guideline committees at university medical centers often refuse to use “someone else’s guidelines” because their hospital is a different special place. So they form a committee, review the evidence and come out with their own, branded guidelines…which look very similar to what they initially rejected from someplace else.

      But I don’t agree with your second. Badly designed trials tell us nothing about mechanism. Well-designed trials measure mechanisms as well as outcomes and are thus very useful, including sometimes showing that there was an effect when the outcome says there wasn’t (e.g., women were helped by the intervention but men were hurt, a moderator effect that cancels out as a null effect if one only measures the end-point outcome).

      1. Keith, the trials in your field may be constructed differently. In my area they tell us nearly nothing about mechanism. Does HOPE work because of pure deterrence - certainty of sanction; because of the increased legitimacy of the rules being established with supervisees due to new attitudes displayed by judges; or because of something particular associated with the meth-using population in Hawaii that will not travel to other populations? I don’t believe the existing RCT tells us anything along those lines, despite establishing impact. And unless I’m missing something, your example doesn’t disprove the larger point. That women were helped and men were hurt is a more refined “what,” not any additional insight into “why.”

        1. Good point on the “what”: I gave an example of a measured moderator rather than a mediator. The HOPE trial we have says nothing about mechanism, but we could have one that did measure all the presumed mediators (speed of response, judicial attitude, probationer understanding of the rules, probationer reactance) as well as the outcome, and then do chained analyses to see if, for example, probationer reactance declined before the outcome changed, and how much of the change in the outcome was explained by the change in the mediator.

          1. Yes. Such evaluations are possible and extremely valuable. But the original point I was making was that the growing obsession with randomized clinical trials pushes research in a direction that does not foster such evaluations and can in fact make them more unlikely. It would not be possible, in any real research design, to RANDOMIZE across the dimensions you mention: speed of response, judicial attitude, etc. So randomized trials don’t generally include this kind of analysis or produce this kind of insight. The funding, credibility, and academic and career standing that come with RCTs do not attach to the kind of research that looks at things in this way. We’re therefore getting a lot of research, and possibly a greater proportion of research, that says this particular intervention did or didn’t work, with very little insight into why or what is most relevant for similar-but-different adaptations and replications. This can be really dire for the more complicated (especially community-based) interventions, which have a lot of elements and moving parts. We’d really like to know why they’re producing results, and even which elements matter the most and how they interact, and mostly we don’t.

    2. If the clinical trial is of a drug, there’s usually a pretty convincing prior narrative involving lengthy earlier experiments on cells and proteins. “This works at the cell level and in rats. Does it work in humans?” Drugs fail clinical trials not because there’s no story about how they work, but because they don’t work well enough to outweigh the side effects. The story behind social interventions like HOPE that succeed is surely straightforward Pavlov. When they fail, the explanation is Darwin or Freud.

      1. It’s interesting, this is exactly the kind of issue Keith and I have been discussing. One of the things that is most interesting about HOPE is the degree to which good outcomes are driven by different relationships between the authorities and supervisees (Freud, I think, on your schema) - this falls straight into the developing literature on the legitimacy of authorities - and how much by straight deterrence (Pavlov, I think). There’s good reason to suspect that much of it might be legitimacy, and if that’s true it opens up a very different set of ideas about how to do this kind of work. In the absence of better evaluations with a different kind of insight, there’s likely to be a presumption that it’s Pavlov. We don’t know, at all, and the gold-standard RCT that has produced the finding that HOPE works tells us nothing on the matter.

  4. Interestingly, replication is considered a regular phenomenon among businesses. Starting with Intel’s Copy Exactly philosophy for its semiconductor fabs and ranging through many successful franchising systems, replication appears to be a feasible and successful strategy for company expansion. I’ve argued that the philosophy of replication masks some underlying adaptation, but even in surveys I’ve done of service firms, aiming for replication appears to have value for companies trying to recreate successes in other locations.

    A citation: Replication as Strategy by Gabe Szulanski and Sid Winter,
    http://www.jstor.org/stable/3086044

    On the teaching front, I’ve “replicated” an approach mentioned by Michael on the blog here, having my students submit sentence outlines, and I’ve even distributed (with permission) Michael’s own sentence outline describing the approach and the reasons for it.

  5. Mark: “By the same token, it isn’t really possible to do the same program in two different places.”

    It’s not really possible to do the same lab experiment in two different places. The question is ‘how close?’.

  6. Anecdotal evidence from the entertainment industry suggests that replication of principles is much more effective than replication of precise details.

    But in the social sciences (or their application to the real world) you can’t even replicate an experiment exactly in the same place. Your populations of experimenters and subjects both change, and some of that change is due to the results of the previous experiment.
