This fictitious exchange between two passionate educators concerns making educational policy and influencing classroom practice through careful scrutiny of evidence (as has occurred in medicine and the natural sciences) as opposed to relying on professional judgment anchored in expertise gathered in schools. It brings out a fundamental divide among educators and the public that has marked public debate over the past three decades. The center of gravity in making educational policy in the U.S. has shifted from counting the resources that go into schooling and relying on professional judgment to counting the outcomes students derive from their years in schools and what the numbers say.
That shift can be dated from the Elementary and Secondary Education Act of 1965 but gained sufficient traction after the A Nation at Risk report (1983) to dominate debate over innovation, policy, and practice. Although this is one of the longest guest posts I have published, I found it useful (and hope that readers will as well) in making sense of a central conflict that exists today within and among school reformers, researchers, teachers, policymakers, and parents.
Francis Schrag is professor emeritus in the philosophy of education at the University of Wisconsin, Madison. This article appeared in Teachers College Record, March 14, 2014.
A dialogue between a proponent and opponent of Evidence Based Education Policy (EBEP). Each position is stated forcefully, and each reader must decide who has the best of the argument.
Danielle, a professor of educational psychology, and Leo, a school board member and former elementary school teacher and principal, visit a middle-school classroom in Portland, Maine, where students are deeply engaged in building robots out of Lego materials, robots that will be pitted against other robots in contests of strength and agility. The project requires them to make use of concepts they’ve learned in math and physics. Everything suggests that the students are absorbed in what is surely a challenging activity, barely glancing around to see who has entered their classroom.
Leo: Now this is exciting education. This is what we should be moving towards. I wish all teachers could see this classroom in action.
Danielle: Not so fast. I’ll withhold judgment till I have some data. Let’s see how their math and science scores at the end of the year compare with those of the conventional classroom we visited this morning. Granted that one didn’t look too out of the ordinary, but the teacher was really working to get the kids to master the material.
Leo: I don’t see why you need to wait. Can’t you see the difference in level of engagement in the two classrooms? Don’t you think the students will remember this experience long after they’ve forgotten the formula for angular momentum? Your hesitation reminds me of a satirical article a friend showed me; I think it came from a British medical journal. As I recall the headline went: “Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomized controlled trials.”
Danielle: Very cute, but let’s get serious. Spontaneous reactions can be misleading; things aren’t always what they appear to be, as I’m sure you’ll agree. I grant you that it looks as if the kids in this room are engaged, but we don’t know whether they’re engaged in the prescribed tasks and we don’t know what they’re actually learning, do we? We’ll have a much better idea when we see the comparative scores on the test. The problem with educators is that they get taken in by what looks like it works; they go with hunches and what’s in fashion, but haven’t learned to consult data to see what actually does work. If physicians hadn’t learned to consult data before prescribing, bloodletting would still be a popular treatment.
Suppose you and I agreed on the need for students to study math and physics. And suppose that it turned out that the kids in the more conventional classroom learned a lot more math and physics, on average, as measured on tests, than the kids in the robotics classroom. Would you feel a need to change your mind about what we’ve just seen? And, if not, shouldn’t you? Physicians are now on board with Evidence Based Medicine (EBM) in general, and randomized controlled trials (RCTs) in particular, as the best sources of evidence. Why are teachers so allergic to the scientific method? It’s the best approach we have to determine educational policy.
Leo: Slow down, Danielle. You may recall that a sophisticated RCT convincingly showed the benefits of smaller class sizes in elementary schools in Tennessee, but these results were not replicated when California reduced its elementary school class size, because there was neither room in the schools for additional classrooms nor enough highly skilled teachers to staff them. This example is used by Nancy Cartwright and Jeremy Hardie in their book on evidence-based policy to show that the effectiveness of a policy depends not simply on the causal properties of the policy itself, but on what they call a “team” of support factors (2012, p. 25). If any one of these factors is present in the setting where the trial was conducted but lacking in the new setting, the beneficial results will not be reproduced. This lack of generalizability, by the way, afflicts RCTs in medicine too. For instance, the populations enrolled in teaching hospital RCTs are often different from those visiting their primary care physician.
Danielle: I have to agree that educators often extrapolate from RCTs in a way that’s unwarranted, but aren’t you, in effect, calling for the collection of more and better evidence, rather than urging the abandonment of the scientific approach? After all, the Cartwright and Hardie book wasn’t written to urge policy makers to throw out the scientific approach and go back to so-called expert or professional judgment, which may be no more than prejudice or illicit extrapolation based on anecdotal evidence.
Leo: You seem to be willing to trust the data more than the judgment of seasoned professionals. Don’t you think the many hours of observing and teaching in actual classrooms counts for anything?
Danielle: If your district has to decide which program to run, the robotics or the traditional, do you really want to base your decision on the judgment of individual teachers or principals, to say nothing of parents and interested citizens? In medicine and other fields, meta-analyses have repeatedly shown that individual clinical judgment is more prone to error than decisions based on statistical evidence (Howick, 2011, Chap. 11). And, as I already mentioned, many of the accepted therapies of earlier periods, from bloodletting to hormone replacement therapy, turned out to be worse for the patients than doing nothing at all.
Now why should education be different? How many teachers have “known” that the so-called whole-word method was the best approach to teaching reading, and years later found out from well-designed studies that this is simply untrue? How many have “known” that children learn more in smaller classes? No, even if RCTs aren’t always the way to go, I don’t think we can leave these things to individual educator judgment; it’s too fallible.
And you may not need to run a new study on the question at issue. There may already be relevant, rigorous studies out there, testing more exploratory classrooms against more traditional ones in the science and math area for middle-schoolers. I recommend you look at the federal government’s What Works Clearinghouse website, which keeps track of trial results you can rely on.
Leo: I’ve looked at many of these studies, and I have two problems with them. They typically use test score gains as their indicator of durable educational value, but these can be very misleading. Incidentally, there’s a parallel criticism of the use of “surrogate end points” like blood levels in medical trials. Moreover, according to Goodhart’s Law—he was a British economist—once a measure becomes a target, it ceases to be a good indicator. This is precisely what happens in education: the more intensely we focus on raising a test score by means of increasing test preparation, to say nothing of cheating—everything from making sure the weakest students don’t take the test to outright changing students’ answers—the less it tells us about what kids can do or will do outside the test situation.
Danielle: Of course we need to be careful about an exclusive reliance on test scores. But you can’t indict an entire approach because it has been misused on occasion.
Leo: I said there was a second problem, as well. You recall that what impressed us about the robotics classroom was the level of involvement of the kids. When you go into a traditional classroom, the kids will always look at the door to see who’s coming in. That’s because they’re bored and looking for a bit of distraction. Now ask yourself: what does that involvement betoken? It means that they’re learning that science is more than memorizing a bunch of facts, that math is more than solving problems that have no meaning or salience in the real world, that using knowledge and engaging in hard thinking in support of a goal you’ve invested in is one of life’s great satisfactions. Most kids hate math, and the American public is one of the most scientifically illiterate in the developed world. Why is that? Perhaps it’s because kids have rarely used the knowledge they are acquiring to do anything besides solve problems set by the teacher or textbook.
I’m sure you recall from your studies in philosophy of education the way John Dewey called our attention in Experience and Education to what he called the greatest pedagogical fallacy, “the notion that a person learns only the particular thing he is studying at the time” (Dewey, 1938, p. 48). Dewey went on to say that what he called “collateral learning,” the formation of “enduring attitudes,” was often much more important than the particular lesson, and he cited the desire to go on learning as the most important attitude of all. Now when I look at that robotics classroom, I can see that those students are not just learning a particular lesson; they’re experiencing the excitement that can lead to a lifetime of interest in science or engineering even if they don’t select a STEM field to specialize in.
Danielle: I understand what Dewey is saying about “collateral learning.” In medicine as you know, side effects are never ignored, and I don’t deny that we in education are well behind our medical colleagues in that respect. Still, I’m not sure I agree with you and Dewey about what’s most important, but suppose I do. Why are you so sure that the kids’ obvious involvement in the robotics activity will generate the continuing motivation to keep on learning? Isn’t it possible that a stronger mastery of subject matter will have the very impact you seek? How can we tell? We’d need to first find a way to measure that “collateral learning,” then preferably conduct a randomized, controlled trial, to determine which of us is right.
Leo: I just don’t see how you can measure something like the desire to go on learning; yet, and here I agree with Dewey, it may be the most important educational outcome of all.
Danielle: This is a measurement challenge to be sure, but not an insurmountable one. Here’s one idea: let’s track student choices subsequent to particular experiences. For example, in a clinical trial comparing our robotics class with a conventional middle school math and science curriculum, we could track student choices of math and science courses in high school. Examination of their high school transcripts could supply needed data. Or we could ask whether students taking the robotics class in middle school were more likely (than peers not selected for the program) to take math courses in high school, to major in math or science in college, etc. Randomized, longitudinal designs are the most valid, but I admit they are costly and take time.
Leo: I’d rather all that money went into the kids and classrooms.
Danielle: I’d agree with you if we knew how to spend it to improve education. But we don’t, and if you’re representative of people involved in making policy at the school district level, to say nothing of teachers brainwashed in the Deweyan approach by teacher educators, we never will.
Leo: That’s a low blow, Danielle, but I haven’t even articulated my most fundamental disagreement with your whole approach, your obsession with measurement and quantification, at the expense of children and education.
Danielle: I’m not sure I want to hear this, but I did promise to hear you out. Go ahead.
Leo: We’ve had about a dozen years since the passage of the No Child Left Behind Act to see what an obsessive focus on test scores looks like and it’s not pretty. More and more time is taken up with test-prep, especially strategies for selecting right answers to multiple-choice questions. Not a few teachers and principals succumb to the temptation to cheat, as I’m sure you’ve read. Teachers are getting more demoralized each year, and the most creative novice teachers are finding jobs in private schools or simply not entering the profession. Meanwhile administrators try to game the system and spin the results. But even they have lost power to the statisticians and other quantitatively oriented scholars, who are the only ones who can understand and interpret the test results. Have you seen the articles in measurement journals, the arcane vocabulary and esoteric formulas on nearly every page?
And do I have to add that greedy entrepreneurs with a constant eye on their bottom lines persuade the public schools to outsource more and more of their functions, including teaching itself? This weakens our democracy and our sense of community. And even after all those enormous social costs, the results on the National Assessment of Educational Progress are basically flat, and the gap between black and white academic achievement—the impetus for passing NCLB in the first place—is as great as it ever was.
Danielle: I agree that it’s a dismal spectacle. You talk as if educators had been adhering to Evidence Based Policy for the last dozen years, but I’m here to tell you they haven’t and that’s the main reason, I’d contend, that we’re in the hole that we are. If educators were less resistant to the scientific approach, we’d be in better shape today. Physicians have learned to deal with quantitative data, why can’t teachers, or are you telling me they’re not smart enough? Anyhow, I hope you feel better now that you’ve unloaded that tirade of criticisms.
Leo: Actually, I’m not through, because I don’t think we’ve gotten to the heart of the matter yet.
Danielle: I’m all ears.
Leo: No need to be sarcastic, Danielle. Does the name Michel Foucault mean anything to you? He was a French historian and philosopher.
Danielle: Sure, I’ve heard of him. A few of my colleagues in the school of education, though not in my department, are very enthusiastic about his work. I tried reading him, but I found it tough going. Looked like a lot of speculation with little data to back it up. How is his work relevant?
Leo: In Discipline and Punish, Foucault described the way knowledge and power are intertwined, especially in the human sciences, and he used the history of the school examination as a way of illustrating his thesis (1975/1995, pp. 184-194). Examinations provide a way of discovering “facts” about individual students, and a way of placing every student on the continuum of test-takers. At the same time, the examination provides the examiners, scorers, and those who make use of the scores ways to exercise power over kids’ futures. Think of the Scholastic Assessment Tests (SATs), for example. Every kid’s score can be represented by a number, and kids can be ranked from those scoring a low of 600 to those with perfect scores of 2400. Your score is a big determinant of which colleges will even consider you for admission. But that’s not all: Foucault argued that these attempts to quantify human attributes create new categories of young people and thereby determine how they view themselves. If you get a perfect SAT score, or earn “straight As” on your report card, that becomes a big part of the way others see you and how you see yourself. And likewise for the mediocre scorers, the “C” students, or the low scorers, who not only have many futures closed to them but may see themselves as “losers,” “failures,” “screw-ups.” A minority may, of course, resist and rebel against their placement on the scale and consider themselves “cool,” unlike the “nerds” who study, but that won’t change their position on the continuum or their opportunities. Indeed, it may limit them further as they come to be labeled “misfits,” “teens at-risk,” “gang-bangers,” and the like. But, and here’s my main point, this entire system is only possible because of our willingness to represent the capabilities and limitations of children and young people by numerical quantities.
It’s nothing but scientism, the delusive attempt to force the qualitative, quirky, amazingly variegated human world into a sterile quantitative straitjacket. You recall the statement that has been attributed to Einstein, don’t you: “Not everything that can be counted counts, and not everything that counts can be counted.” I just don’t understand your refusal to grasp that basic point; it drives me mad.
Danielle: Calm down, Leo. I don’t disagree that reducing individuals to numbers can be a problem; every technology has a dark side, I’ll grant you that, but think it through. Do you really want to go back to a time when college admissions folks used “qualitative” judgments to determine admissions? When interviewers could tell from meeting a candidate or receiving a letter of recommendation if he were a member of “our crowd,” would know how to conduct himself at a football game, cocktail party, or chapel service, spoke without an accent, wasn’t a grubby Jew or worse, a “primitive” black man or foreign-born anarchist or communist. You noticed I used the masculine pronoun: Women, remember, were known to be incapable of serious intellectual work, no data were needed, the evidence was right there in plain sight. Your Foucault is not much of a historian, I think.
Leo: We have some pretty basic disagreements here. I know we each believe we’re right. Is there any way to settle the disagreement?
Danielle: I can imagine a comprehensive, longitudinal experiment in a variety of communities, some of which would carry out EBEP, with control communities that would eschew all use of quantification. After a long enough time, maybe twenty years, we’d take a look at which communities were advancing and which were regressing. Of course, this is just an idea; no one would pay to actually have it done.
Leo: But even if we conducted such an experiment, how would we know which approach was successful?
Danielle: We shouldn’t depend on a single measure, of course. I suggest we use a variety of measures: high school graduation rate, college attendance, scores on the National Assessment of Educational Progress, SATs, state achievement tests, annual income in mid-career, and so on. And, of course, we could analyze the scores by subgroups within communities to see just what was going on.
Leo: Danielle, I can’t believe it. You haven’t listened to a word I’ve said.
Danielle: What do you mean?
Leo: If my favored policy is to eschew quantitative evidence altogether, wouldn’t I be inconsistent if I permitted the experiment to be decided by quantitative evidence, such as NAEP scores or worse, annual incomes? Don’t you recall that I reject your fundamental assumption—that durable, significant consequences of educational experiences can be represented as quantities?
Danielle: Now I’m the one that’s about to scream. Perhaps you could assess a single student’s progress by looking at her portfolio at the beginning and end of the school year. How, in the absence of quantification, though, can you evaluate an educational policy that affects many thousands of students? Even if you had a portfolio for each student, you’d still need some way to aggregate them in order to be in a position to make a judgment about the policy or program that generated those portfolios. You gave me that Einstein quote to clinch your argument. Well, let me rebut that with a quotation from another famous and original thinker, the Marquis de Condorcet, an eighteenth-century French philosopher and social theorist. Here’s what he said: “if this evidence cannot be weighted and measured, and if these effects cannot be subjected to precise measurement, then we cannot know exactly how much good or evil they contain” (Condorcet, 2012, p. 138). The point remains true, whether in education or medicine. If you can’t accept it, I regret to say, we’ve reached the end of the conversation.
Cartwright, N., & Hardie, J. (2012). Evidence-based policy: A practical guide to doing it better. Oxford and New York: Oxford University Press.
Condorcet, M. (2012). The sketch. In S. Lukes, and N. Urbinati (Eds.), Political Writings (pp. 1-147). Cambridge: Cambridge University Press.
Dewey, J. (1938/1973). Experience and education. New York: Collier Macmillan Publishers.
Foucault, M. (1995). Discipline and punish: The birth of the prison. (A. Sheridan, Trans.) New York: Vintage Books. (Original work published in 1975)
Howick, J. (2011). The philosophy of evidence-based medicine. Oxford: Blackwell Publishing.