If a thing’s worth winning …

… it’s worth cheating for. The D.C. school system provides another example. Management-by-measurement needs to build in cheating-prevention features, including punishment for cheaters.

… it’s worth cheating for. Turns out that some of the “miracles” Michelle Rhee created in the D.C. schools were artifacts of test-tampering. This is no surprise. It’s also not, by itself, a reason to abandon the process of measurement. But it does mean that the higher the stakes on the process, the more you have to invest in cheat-proofing it. It’s annoying that the educational reform movement, which puts so much stress on being able to fire bad teachers, puts so little effort into punishing cheaters. It appears that, in D.C., the Rhee administration preferred discretion to anything that might have rocked the boat of test-score improvements. The same was certainly true of Rod Paige in Houston, who parlayed a faked miracle on dropout rates into a cabinet position.

Update It was, barely, possible that Rhee was culpably negligent in the cheating and the cover-up, but no worse. However, her slime-and-defend reaction to the exposure of the cheating eliminates that possibility. She was, and is, complicit in the cover-up, if not the cheating itself. This should be a complete disqualification for her ever having any active role in educational reform. (I say that as someone sympathetic to the goal of improving public schools, even if that requires breaking some eggs.)

Alas, it won’t be.

“If a thing’s worth winning, it’s worth cheating for”

It’s as true now as it was more than sixty years ago, when the W.C. Fields character said it in You Can’t Cheat an Honest Man. But the advocates of high-stakes low-quality standardized testing keep ignoring it.

That observation, made by the W.C. Fields character in “You Can’t Cheat an Honest Man,” points to one of the problems with trying to substitute high-stakes low-quality standardized testing for actual educational reform. “Dukenfield’s Law,” as I’ve dubbed it after Fields’s birth name (hey, if your birth certificate read William Claude Dukenfield, you’d think about changing it, too), is still operating, with an estimated 1-3% of all teachers cheating to improve their students’ test scores, sometimes with the active encouragement of their principals. And that’s in a world where having students memorize questions and answers from previous tests doesn’t count as cheating.

Testing fails

Now lemmesee….

- Using high-stakes tests to reward and punish schools and their staffs encourages cheating.

- The relatively cheap (on a per-student basis) tests that have to be used if testing is to be done on a census, rather than a sample, mean that the tests measure only a subset of what we want the students to know and to be able to do, which is likely to distort curricular decisions.

- Even accepting what the tests test for as a valid reflection of educational performance, sheer measurement error makes it hard to distinguish signal from noise in year-to-year variations. (Doing sample rather than census testing may improve validity, but it increases sampling error.)

- And now the largest study of the actual results of high-stakes testing programs suggests that they boost scores on the tests used, but actually reduce performance on nearly every externally validated measure. [Study by Audrey Amrein et al. at Arizona State. Report in today’s New York Times.]
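The signal-versus-noise point is easy to see in a toy simulation. Every number below is an illustrative assumption, not an estimate from any real testing program: suppose every school genuinely improves its true mean score by a couple of points, while each year’s observed mean carries a few points of measurement and sampling noise.

```python
import random
import statistics

random.seed(0)

# Illustrative assumptions, not real testing-program estimates:
# every school's true mean score rises by TRUE_GAIN points year over
# year, but each year's observed school mean carries independent
# measurement/sampling noise with standard deviation NOISE_SD.
TRUE_GAIN = 2.0   # points of genuine improvement
NOISE_SD = 5.0    # per-year noise in the observed mean, in points
N_SCHOOLS = 10_000

# Observed change = true gain + this year's noise - last year's noise.
observed_gains = [
    TRUE_GAIN + random.gauss(0, NOISE_SD) - random.gauss(0, NOISE_SD)
    for _ in range(N_SCHOOLS)
]

# Even though every simulated school truly improved, many look worse.
share_declining = sum(g < 0 for g in observed_gains) / N_SCHOOLS
print(f"mean observed gain: {statistics.mean(observed_gains):+.2f}")
print(f"schools that appear to decline: {share_declining:.0%}")
```

With these assumed numbers, the noise in the year-to-year difference has a standard deviation of about 7 points (the two yearly errors add in quadrature), so roughly four in ten genuinely improving schools show an apparent decline. Testing a sample rather than a census would raise NOISE_SD, and that fraction, further.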

The critiques of the study by the proponents of testing, including Chester Finn, are pretty pitiful; that suggests that, despite its funding by a coalition of teachers’ unions, the study must be methodologically sound. (Finn is reduced to suggesting that the other educational “reforms” in the state-level packages that included high-stakes testing must be at fault.)

So what we have here is a policy that won’t work in theory and fails in practice. Why, exactly, are we supposed to be for it?

These results put the proponents of high-stakes testing in what ought to be an inescapable rhetorical box: a dilemma in the proper sense of that term. If trying something out, measuring its results, and acting accordingly is the right thing to do, then having tried out high-stakes testing, measured its results, and found them to be bad, we ought to dump it, or at least fundamentally redesign it. If trying, measuring, and responding isn’t the right thing to do, then what’s the argument for high-stakes testing in the first place?

I have a very strong prejudice for managing by the numbers, especially in an area such as education where the non-quantitative theorizing is so woolly and our knowledge of the underlying processes so inadequate. (How to produce high-quality research in a field where the relevant university units engage mostly in training for a poorly paid, low-status profession is a different problem.) So I’d be inclined to strengthen the testing regime by broadening the base of knowledge and skill tested for and by making aggressive use of sampling, rather than just dumping the whole thing and letting the education establishment vapor on about how every child is different, every teacher is a skilled professional, and therefore nothing can be measured.

But there is now no case whatever for continuing to combine high stakes with low measurement quality. Been there, done that, got the T-shirt. Stinks. Next case, please.