Using Test Scores To Out Ineffective Teachers

When I served as superintendent for seven years in Arlington, Virginia, the school board evaluated me annually. Board members and I designed the evaluation. We agreed upon the criteria and the multiple measures to be used in reaching conclusions about my performance as school chief. While the discussions were private, the school board released to the public their judgment of my performance and my salary for the coming year. Because I was a highly visible public employee, taxpayers provided the funds to operate the schools, and I participated in the design of the evaluation, I had no reservations about the process or making the results public.

I do have strong reservations, however, about the recent Los Angeles Times article analyzing English and math test scores between 2002 and 2009 for 6,000 district elementary school teachers who taught third through fifth graders. For those partisans who endorse policies that use student test scores as part of an annual evaluation of a teacher’s performance, the article was nectar. For opponents of such policies, and I include myself among opponents, the article was flammable material.

Yet both partisans and opponents of value-added measures as a component of teacher evaluation (and bloggers who had not yet made up their minds) agreed that releasing the teachers’ names and publicly labeling them ineffective was teacher-bashing, and they felt twinges at outing “bad” teachers.

Rather than summarize the usual arguments pro-and-con for value-added measures, I will focus on one plus of such data and then my reservations about the Los Angeles Times piece using student scores on standardized tests over seven years to determine a teacher’s “goodness.”

One plus of the newspaper analysis is that the data question the conventional wisdom of what is a “good” teacher. Consider third grade teacher Karen Caruso and her colleague down the hall, Nancy Polacheck at the Third Street Elementary School. The principal considers 26-year veteran Caruso one of the best teachers in the school. Caruso also prepares future teachers at UCLA, is certified by the National Board for Professional Teaching Standards, and parents express satisfaction with her. But the newspaper’s analysis of test scores puts her in the lowest 10 percent of teachers in raising student scores.

Nancy Polacheck, a veteran of 38 years teaching in the same school, lacks Caruso’s credentials and uses different teaching approaches, according to the journalists who observed her classroom. Yet she ranks in the top 5 percent of the 6,000 teachers for raising student test scores.

What is happening in these two classrooms of veteran teachers? Is it teacher expectations of students? Pedagogy? Knowledge of reading and math? Some combination of these and other factors? No one knows. Yet finding out is crucial to explaining the differences in student test scores over seven years.

These comparisons of individual teachers in the article also raise serious issues for me.

My reservations begin with the common assumption often expressed explicitly by supporters of value-added measures that the Los Angeles Times hardly revealed anything since “just about everyone in any school can tell you who the really good teachers are in the building. Whether they will tell you is another story, perhaps, but everyone knows who’s good and who’s bad.”

If only that were the truth! The fact is that notions of “good” teachers vary among parents, other teachers, administrators, policymakers, researchers, and, of course, journalists–see above with Caruso and Polacheck. Traditional and progressive beliefs about how children should learn, what knowledge is of most worth, and how teachers should teach differ among the above groups and vary even more within each of those groups. Desired outcomes–high test scores, problem solving skills, independent thinking, creativity–will vary according to each version of “goodness.” Thus, relying on standardized test scores to evaluate effective–a synonym for “good”–teaching concentrates on only one version of being a “good” teacher since most of the other desired outcomes are missing from standardized test items.

A second reservation is that current value-added measures used to evaluate individual teachers are a beta version, subject to many technical glitches and unreliability, much as occurred in first-generation technologies from Microsoft Windows to cell phones. Ditto for standardized tests. When errors are made in evaluating individual teachers, however, false positives ruin careers and publicly shame the wrong teachers. Finally, imagine the chaos of using value-added measures to evaluate individual teachers after a state adopts a new standardized test. The literature on testing is clear that in such an instance, student scores dip.

In time, value-added measures may be used to evaluate individual teachers as the technology of this methodology eliminates the abundant errors that now exist. But without teacher participation in the design of the evaluation and added protections, this unreliable approach to evaluation (and to teacher pay) is damaging to teachers and, ultimately, to urban school reform.

15 responses to “Using Test Scores To Out Ineffective Teachers”

  1. Test scores can be such a powerful diagnostic tool in student learning. Educational leaders need to stop using this tool in a punitive fashion. Let your principals weed out the “bad teachers.” Let your superintendents weed out the “bad principals.”

  2. If I were a Los Angeles principal, I would be very nervous about the upcoming publication of data for each teacher. This could cause a line out the door of parents who all want the top rated teacher for their kid. If I had access to the data, I would try to determine what the highest rated teachers were doing that the lowest rated teachers were not doing. With this knowledge I could make an informed decision as to which practices I would promote or discourage. Comparing percentile scores from one year to the next would still work even if the test scores go down for any reason. It isn’t perfect but you have to start somewhere. Stay tuned for the release of the data by the LA Times, which they tell me will be “later this month.”

  3. José

    Larry,
    Inevitably, the debate about the purpose of public schooling creeps into this discussion and it further reveals our lack of consensus about what we value in education and, therefore, how we should best measure it.

    You write, “Desired outcomes–high test scores, problem solving skills, independent thinking, creativity–will vary according to each version of ‘goodness.’” It seems to me that underneath this push for linking teachers to student test scores is a desire to have a neat and tidy answer that seemingly eliminates ambiguity and is suitable for publishing in the newspaper. (I’m reminded of the opening anecdote in Daniel Koretz’s book “Measuring Up” in which he describes a neighbor’s desire for a concise and unambiguous answer to which schools are the “good” ones.)

    By this line of increasingly-accepted thinking, schools must be rated “good” or “bad”. Teachers must be rated “effective” or “ineffective”. It’s a seductive argument that both proponents of equity and those seeking to dismantle public education are equally enticed by. If only it were that easy.

  4. Daniel H. Pink in his new book “Drive: The Surprising Truth About What Motivates Us”, which summarizes several decades of research into extrinsic and intrinsic motivation, reports this surprising result:
    extrinsic motivation (rewards and punishment) actually has a deleterious effect on solving complex problems (like teaching or learning). This surprising result was so counter-intuitive that scientists felt the need to validate it several times over decades. This conclusion is now validated.
    We now know: shaming teachers (punishment) will be counter-productive to the outcome we desire.

  5. Witch-hunt comes to mind. If I read sci-fi, I’d have a more post-modern analogy. I tire of the moral rhetoric in our public discourse (thanks Larry for mentioning it). Among other things, it makes the public stupid(er). That is to say, it gives the false impression that “the individual” is at the center of what are essentially societal- and systems-level issues.

    Ach, I’m critiquing the foundations of our nation, aren’t I? I’ll surely go to a hell.

  6. john thompson

    What do you think the effect of this will be? I figured you’d wait before speculating too much, but I also figure that you, with your historical perspective and practical experience, have a good idea of what the district is going to do, and the conflict it will engender. I’m curious about the ground rules for giving the Times access to the classroom when the district must have known a lot about their plans. What did they know and when did they know it? And the same applies to Hechinger’s assistance for the VAMs.

    It sounds like the VAM analysis of the Times was even more primitive than normal. But think what will happen when secondary scores are included.

    One of the good things that will come out of this is we’ll have a chance to listen to your insights on it.

  7. Catherine Lugg

    One hundred and fifty years ago when I was a doc student we were required to read “The Technological Bluff” and “The Technological Society,” both by Jacques Ellul (as well as a whole bunch of stuff by some historian with the last name of Cuban–*GRIN!*). What strikes me about “value added assessment” is that it is today’s perfect technological fix. It’s supposedly disinterested and technocratic, though the weaknesses in the technology are manifest.

    But given US public education’s longstanding infatuation with technology as “savior,” this latest manifestation is culturally congruent if pedagogically woefully inadequate.

    And as my colleague Bruce Baker has repeatedly pointed out (see http://schoolfinance101.wordpress.com/2010/08/16/la-times-study-asian-math-teachers-better-than-black-ones/), the evaluation system is so flawed that it is an invitation to ENDLESS litigation. Perhaps “Value Added Assessment” should be renamed as “College Tuition for the children of Litigators” (and they will have earned their fees).

  8. Steve Davis

    Whether they orchestrated it or not, it seems as though the architects of reform in the Washington DC Public School System are essentially floating a trial balloon in California. For those that are familiar with the whole DCPS fiasco, I need say no more. If you don’t know what I am talking about I suggest you start Googling. Any wagers that LAUSD will have something rolled out next year that looks like IMPACT and ties test scores to evaluations to the tune of 55%?

  9. Jane

    Larry, I hope that you or someone like you will try to lay out the circumstances under which changes in individual student test scores could actually be a valid part of a teacher evaluation protocol. We have to accept the reality that it makes no sense to claim that test scores can never, under any circumstances, be used as a benchmark of anything in school-level or individual teacher-level performance. So, what’s your best guess as to how they could legitimately be used?

  10. tim-10-ber

    As a parent of high achieving students I want to know that they have a highly effective teacher for my type of child…how do I know this? I don’t…this has the potential to hurt the high achieving kids in a given teacher’s classroom…

    So…it would be nice to know how a teacher is able to help students in the below basic, basic, proficient and advanced categories. I would love to see the data published for these categories by teacher for the beginning of the year assessment and end of the year assessment and then to see it over time…

    Yes each class is different as the kids in a given grade are different, but if you break it down by quartiles… I think you will see a different story…

    In my opinion, parents should have this information…

    • Steve Davis

      tim-10-ber,

      I agree that parents should have access to this data. I also assert that parents and other decision makers accept the limitations of the data. I would be hesitant to classify one teacher as more effective than another with a certain group based on seemingly significant differences in their value-added scores. The same teacher may have very different results with two classes based on the inputs in the classroom. Take a mainstream and accelerated class for example. The circumstances in the accelerated class are going to be such that it’s likely that more students will gain higher levels of proficiency over the course of the year than their counterparts (in the same sub-groups) in the mainstream class. The School Finance 101 blog does a good job pointing out other problems with value-added measures. Yes, make the data public and realize its limitations.

  11. There is doubt that Tim-10-ber’s high achievers need a top performing teacher since the vast number of people are average and do what lawyers call a “workman-like” job. The fact is that we don’t know where the top performers are. All we can do is measure those in place and they are not necessarily what you are looking for.

    Yesterday I spoke with a political footsoldier at my door who measured the effectiveness of charter schools by the “overwhelming numbers” of applicants. I told him parents were not necessarily reacting to the quality of teaching on offer, but other perceived benefits such as safety, small classes, and uniforms.

  12. Julia I.

    I wonder who will be left teaching when one risks being called out as “bad” by a single measure. Teachers certainly should be accountable to the community in which they teach, but this doesn’t appear at all like that to me.

  13. ANash of NJ

    Greetings:

    Your article was great! As a teacher my greatest concern is being held responsible for anything that is beyond my control. The established school curriculum, lack of learning in the previous school year, and parenting after 3:00pm on weekdays, weekends and holidays are all items that are… beyond my control.

    Learning is a full-time, 24-hour, 7-day-a-week event. It works when everybody takes responsibility and does their part. This means that school districts understand and accept that some children are below or significantly below grade level and need modifications to the existing curriculum to move them forward. It means that parents do their part by ensuring homework is done and students come to school on time and prepared for the business of learning. It means that kids understand that learning is essential and they must try their best. Finally, it means that teachers need to continue to develop and apply professional standards in the classroom.

    However, there are many times when a parent will go on vacation for 1-2 weeks during the school year, or keep kids up and/or out late on school nights. School districts will promote kids even though they are 1-2 grade levels behind. Lastly, principals will force a grade change upward for a student based on who the kid’s parents are.

    You cannot hold the quarterback responsible for losing the game when the defense and offense are not doing their part. Losing the game was simply beyond the quarterback’s control!

  14. i am are u? Work it backwards. From the fifties to now our ed purpose has been tied to industrial expansion in the world and the reward for upward mobility/aggression by the chairmen of the boards. It was a rising tide. But what we are beginning to realize is that the echo is changing, and it is changing not just because our economic life is shifting and eroding but because we have been educated by book and private point of view and our students are literature free, making testing them using old print media unfair and misleading, rewarding soldiers who want to fight with the old weapons. All teachers know students play video games a lot more than they read or do homework. The resulting behavior of these media changes (from book to now, the all-at-onceness of electric tech, ibid. McLuhan) is to isolate boys in a world of myth while girls push for liberation from male-dominated culture. Of course this is a simplification, but it is clear that there are big disconnects between people who accept test scores for evaluating teachers and teachers and others who understand the lack of accuracy in the VAM. Perhaps it is the earnest parent seeking a settled simple numerical answer to teacher evaluation that is the problem. Would test scores be a good way to evaluate priests and ministers, who after all have a higher place in the community than those ‘labor union socialist teachers’? The public school disconnect works in both directions, but I think most would agree that teaching is not the answer to this media change unless we can teach where their (students’) predilection lies, or maybe we need to revisit progressive community education and ask again, where are we going? With no manufacturing, and robots and computers reducing work loads on humans, we have to have a community or we are headed to Medieval times.
    Tests are one-sixth of a student’s abilities overall, and we can’t be fooled into thinking we can teach the last weapons to a group that functions via electricity and a kind of reactivity and then test them on last weapons (literacy) and expect meaningful results. The solution requires that many see their culture and where it is headed, and this is why Fox News is spending so much of its capital on confusing the masses.
    We are in magical times that give brutal results. Everybody should read McLuhan. I myself was rated least efficient in the school but have had high test scores while teaching students to write, or getting them started on centering and goal setting and conversation and reading and research, all elements of modern scholarship and useful tools for personal communication.