From an interview conducted in 2009 with Rick Hess, then Resident Scholar at The American Enterprise Institute. I have lightly abridged the interview. The original article upon which this interview is based is here.
Q: Rick, you recently published an article in Educational Leadership
arguing that the ways in which we rely on data to drive decisions in
schools have changed over time. Yet, you note that we have unfortunately only
succeeded in moving from the “old stupid” to the “new stupid.” What do
you mean by this?
A: A decade ago, it was only too easy to find education leaders who dismissed
student achievement data and systematic research as having only limited utility
when it came to improving schools. Today, we’ve come full circle. You can’t
spend a day at an education gathering without hearing excited claims about
“data-based decision making” and “research-based practice.” Yet these phrases
can too readily serve as convenient buzzwords that obscure more than they
clarify and that stand in for careful thought. There is too often an unfortunate
tendency to simply embrace glib solutions if they’re packaged as “data-driven.”
Today’s enthusiastic embrace of data has waltzed us directly from a petulant
resistance to performance measures to a reflexive reliance on a few simple
metrics–namely, graduation rates, expenditures, and grade three through eight
reading and math scores. The result has been a race from one troubling mindset
to another–from the “old stupid” to the “new stupid.”
Q: Can you give us an example of the “new stupid”?
A: Sure, here’s one. I was giving a presentation to a group of aspiring
superintendents. They were eager to make data-driven decisions and employ
research to serve kids. There wasn’t a shred of the old stupid in sight. I
started to grow concerned, however, when our conversation turned to value-added
assessment and teacher assignments. The group had recently read a research brief
highlighting the effect of teachers on achievement and the inequitable
distribution of teachers within districts. They were fired up and ready to put
this knowledge to use. One declared to me, to widespread agreement, “Day one,
we’re going to start identifying those high value-added teachers and moving them
to the schools that aren’t making AYP.” [AYP is an acronym from the No Child Left Behind law (2002-2015); it means “Adequate Yearly Progress” in test scores for different groups of students.]
Now, I sympathize with the premise, but the certainty worried me. I started
to ask questions: Can we be confident that teachers who are effective in their
current classrooms would be equally effective elsewhere? What effect would
shifting teachers to different schools have on the likelihood that teachers
would remain in the district? Are the measures in question good proxies for
teacher quality? My concern was not that they lacked firm answers to these
questions–that’s natural enough even for veteran superintendents–it was that
they seemingly regarded such questions as distractions.
Q: What’s a concrete example of where educators and advocates
overenthusiastically used data to tout a policy, but where the results didn’t
pan out? What went wrong?
A: Take the case of class-size reduction. For two decades, advocates of
smaller classes have referenced the findings from the Student Teacher
Achievement Ratio (STAR) project, a class-size experiment conducted in Tennessee
in the late 1980s. Researchers found significant achievement gains for students
in small kindergarten classes and additional gains in first grade. The results
were famously embraced in California, which in 1996 adopted a program to reduce
class sizes that cost nearly $800 million in its first year. But the dollars
ultimately yielded disappointing results, with the only major evaluation–by AIR
and RAND–finding no effect on achievement.
What happened? Policymakers ignored nuance and context. California encouraged
districts to place students in classes of no more than 20–but that class size
was substantially larger than those for which STAR found benefits. Moreover,
STAR was a pilot program serving a limited population, which minimized the need
for new teachers. California’s statewide effort created a voracious appetite for
new educators, diluting teacher quality and encouraging well-off districts to
strip-mine teachers from less affluent communities. The moral is that even
policies or practices informed by rigorous research can prove ineffective if the
translation is clumsy or ill considered….
Q: In your mind, what are some of the main limitations of research as
they apply to schooling?
A: First, let me be clear: Good research has an enormous contribution to
make–but, when it comes to policy, this contribution is more tentative than we
might prefer. Scholarship’s greatest value is not the ability to end policy
disputes, but to encourage more thoughtful and disciplined debate.
In particular, rigorous research can establish parameters as to how big an
effect a policy or program might have, even if it fails to conclusively answer
whether it “works.” For instance, quality research has quieted assertions that
national-board-certified teachers are likely to have heroic impacts on student
achievement or that Teach For America recruits might adversely affect their
students.
Especially when crafting policy, we should not expect research to dictate
outcomes but should instead ensure that decisions are informed by the facts and
insights that science can provide. Education leaders should not expect research
to ultimately resolve thorny policy disputes over school choice or teacher pay
any more than medical research has ended contentious debates over health
insurance or tort reform….
Q: What do you see as the main motivation behind the “new stupid”? Is
it simply an example of good intentions gone awry?
A: In a word: yes. It’s a strategy pursued with the best of intentions. But
the problem is threefold. First, as we’ve discussed, too many times those of us
in K-12 are unsophisticated about what a particular study or a particular data
set can tell us. Second, the very passion that infuses the K-12 sector creates a
sense of urgency. People want to fix problems now, using whatever tools are at
hand–and don’t always stop to realize when they’re trying to fix a Swiss watch
with a sledgehammer. Third, the reality is that we still don’t have the kinds of
data and research that we need. So, too often, the choice is to misapply extant
data or simply go data-free. Everyone involved means well; the trick is to
provide the right training and the right data, and for practitioners,
policymakers, and reformers to ensure that compassion doesn’t swamp common sense.