“Why Do Good Policy Makers Use Bad Indicators?”*

Test scores are the coin of the educational realm in the U.S. Under No Child Left Behind, scores on state tests are used to reward and punish districts, schools, and teachers. And the Race to the Top competition has shoved state after state, in pursuit of federal dollars, into legislating that student test scores count as part of judging teacher effectiveness.

Numbers glued to high-stakes consequences, however, corrupt performance. Since the mid-1970s, social scientists have documented the untoward results of attaching high stakes to quantitative indicators, not only in education but across numerous institutions. They have pointed out that those who implement policies using specific quantitative measures will change their practices to ensure better numbers.

The work of social scientist Donald T. Campbell and others on the perverse outcomes of incentives was available and known to many, but it went ignored. In Assessing the Impact of Planned Social Change, Campbell wrote:

“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor” (p. 49).

Campbell drew his instances of distorted behavior from police officials who used clearance rates to show success in solving crimes, Soviet planners who set numerical goals for farming and industry, and the U.S. military’s use of “body counts” in Vietnam as evidence of winning the war.

That was nearly forty years ago. In the past decade, medical researchers have found similar patterns when health insurers and Medicare have used quantitative indicators to measure physician performance. For example, Medicare requires, as a quality measure, that doctors administer antibiotics to a pneumonia patient within six hours of the patient’s arrival at the hospital. As one physician said: “The trouble is that doctors often cannot diagnose pneumonia that quickly. You have to talk to and examine the patient and wait for blood tests, chest X-rays and so on.” So what happens is that “more and more antibiotics are being used in emergency rooms today, despite all-too-evident dangers like antibiotic-resistant bacteria and antibiotic-associated infections.” He and other doctors also know that surgeons have been known to pick reasonably healthy patients for heart bypass operations and pass over elderly ones with three to five chronic ailments in order to make their results look good.

More examples.

TV stations charge for advertising on the basis of how many viewers they have during “sweeps” months (November, February, May, and July). The Nielsen company has boxes in two million homes (a sample representative of the nation’s viewership) that register whether the TV is on and what families are watching during those months; it also has viewers fill out diaries. Nielsen assumes that what a station shows in those months represents its programming for the entire year (see 2011-2012-Sweeps-Dates). Nope. During those “sweeps,” TV networks and cable companies program new shows, films, extravaganzas, and sports events that will draw viewers so they can charge higher advertising rates. They game the system and corrupt the measure (see p. 80).

And just this week, ripped from the headlines of the daily paper: online vendors secretly ask purchasers of their products to write reviews and rate them with five stars in exchange for a kickback on the price the customer paid. Another corrupted measure.

Of course, educational researchers also have documented the links between high-stakes standardized testing and narrowed instruction that prepares students for test items, state policymakers fiddling with cut-off scores on tests, increased dropouts, and outright cheating by a few administrators (see Dan Koretz, Measuring Up).

What Donald Campbell said in 1976 about “highly corruptible indicators” applies not only to education but to many other institutions as well.

So why do good policy makers use bad indicators? The answer is that numbers are highly prized in the culture because they are easy to grasp and easy to use in making decisions. The simpler the number (wins and losses, products sold, profits made, test scores), the easier it is to judge worth. When high stakes are attached to those numbers, they become incentives, carrots or sticks, to make the numbers look good. And that is where indicators turn bad, like milk whose expiration date has long passed.

The best policymakers, not merely good ones, know that multiple measures for a worthy goal reduce the possibility of reporting false performance.


*Steven Glazerman and Liz Potamites, “False Performance Gains: A Critique of Successive Cohort Indicators,” Working Paper, Mathematica Policy Research, December 2011, p. 13.



Filed under Reforming schools

8 responses to “Why Do Good Policy Makers Use Bad Indicators?”

  1. Pingback: The Best Resources Showing Why We Need To Be “Data-Informed” & Not “Data-Driven” | Larry Ferlazzo’s Websites of the Day…

  2. rohit2093

    Hi Larry, I have been following your blogs. I am sorry to say that with this one I had high hopes but was disappointed. I have heard what you have said many times: that we don’t use the right indicators. But I never get to know what those other “better/right” indicators/parameters are. I would have really loved it if you could discuss them and how they can be used along with quantitative indicators. I also understand that each school is different and will have different parameters, but some examples would help. Thanks!

    • larrycuban

      It is a fair comment to ask that I not just criticize but also offer “better” indicators. Keep in mind, however, that I am critiquing policymakers for harnessing tough consequences to a quantitative indicator. It is the high stakes that corrupt the indicator, especially if the indicator (a standardized test created for one purpose, say, differentiating among students’ achievement) is used for judging an individual’s grasp of math, science, or history. So the larger point is that the marriage of a quantitative indicator to significant consequences causes harm, and stripping away the high stakes from ambiguous or even shabby indicators would be the first step. Now, back to your request for indicators that could be used. In the last paragraph of the post, I suggested that multiple rather than single indicators would help get around the ease of distorting any single one. For evaluating teachers, for example, Denver’s ProComp, which uses a medley of quantitative and qualitative indicators, makes great sense to me. Determining a principal’s performance, for me, would entail a host of indicators, including teacher surveys appraising the principal’s actions, parents’ opinions, students’ opinions (at the secondary level), central office administrators’ appraisals, the principal’s self-appraisal, and, yes, students’ performance in school on multiple measures, from test scores to suspensions, vandalism, dropouts, etc. How am I doing?

  3. You’ve set yourself a really interesting challenge, Larry. Reading your list (which makes eminent sense), I realised that in the business world, where performance management is standard practice, such a list would be unthinkably lengthy and complex and would be likely to provoke disputes between employees and their managers. Precision and clarity between the two parties involved in any performance management process in business is vital.

    • larrycuban

      You are probably correct, Joe, that using multiple measures (if that is what you are referring to in my post) would be complicated in private, for-profit companies. My point, of course, is that grafting “performance management,” as you call it, onto public schools, not profit-seeking businesses, distorts the entire enterprise, because single, fairly primitive measures, often mismatched to the desired outcome of student learning, are married to serious consequences for underperforming. When such measures are adopted and hailed by those who govern schools without seeing that the loss in teacher morale, due to the perceived unfairness of top-down decisions, undercuts the entire effort, well, that becomes the pickle that U.S. schools are currently in.

  4. Pingback: tdaxp, Ph.D. » Blog Archive » Test Validity & Teacher Performance

  5. Not so coincidentally, have a look at the blog posting below about “personalised learning” in the UK, from Patrick Watson’s very well-informed blog.

    There appears to be a problem created largely by people working in the public sector or for government, whose work brings them into contact with business but only from the outside. They tend to adopt business speak, and often business ideas, without any real experience of the reality. I suspect the unworkable situation you outline so well is just one of many such instances.
    http://montrose42.wordpress.com/2012/01/26/personalised-learning-remember-that-will-it-make-a-comeback/

  6. Pingback: This Week’s “Round-Up” Of Good School Reform Posts & Articles | Larry Ferlazzo’s Websites of the Day…
