When I served as superintendent for seven years in Arlington, Virginia, the school board evaluated me annually. Board members and I designed the evaluation. We agreed upon the criteria and the multiple measures to be used in reaching conclusions about my performance as school chief. While the discussions were private, the school board released to the public its judgment of my performance and my salary for the coming year. Because I was a highly visible public employee, taxpayers provided the funds to operate the schools, and I participated in the design of the evaluation, I had no reservations about the process or about making the results public.
I do have strong reservations, however, about the recent Los Angeles Times article analyzing English and math test scores from 2002 to 2009 for 6,000 district elementary school teachers who taught third through fifth graders. For those partisans who endorse policies that use student test scores as part of an annual evaluation of a teacher's performance, the article was nectar. For opponents of such policies, and I include myself among the opponents, the article was flammable material.
Yet both partisans and opponents of value-added measures as a component for evaluating teachers (and bloggers who had not yet made up their minds) agreed that publicly releasing the names of teachers and labeling them ineffective was teacher-bashing, a twinge-inducing outing of "bad" teachers. See here and here and here.
Rather than summarize the usual arguments pro and con for value-added measures, I will focus on one plus of such data and then on my reservations about the Los Angeles Times piece using student scores on standardized tests over seven years to determine a teacher's "goodness."
One plus of the newspaper analysis is that the data question the conventional wisdom about what a "good" teacher is. Consider third-grade teacher Karen Caruso and her colleague down the hall, Nancy Polacheck, at Third Street Elementary School. The principal considers 26-year veteran Caruso one of the best teachers in the school. Caruso also prepares future teachers at UCLA, is certified by the National Board for Professional Teaching Standards, and earns expressions of satisfaction from parents. But the newspaper's analysis of test scores puts her in the lowest 10 percent of teachers in raising student scores.
Nancy Polacheck, a veteran of 38 years teaching in the same school, lacks Caruso's credentials and uses different teaching approaches, according to the journalists who observed her classroom. Yet she ranks in the top 5 percent of the 6,000 teachers for raising student test scores.
What is happening in these two classrooms of veteran teachers? Is it teacher expectations of students? Pedagogy? Knowledge of reading and math? Some combination of these and other factors? No one knows. Yet finding out is crucial to figuring out why student test scores differed over seven years.
These comparisons of individual teachers in the article also raise serious issues for me.
My reservations begin with the common assumption, often expressed explicitly by supporters of value-added measures, that the Los Angeles Times hardly revealed anything since "just about everyone in any school can tell you who the really good teachers are in the building. Whether they will tell you is another story, perhaps, but everyone knows who's good and who's bad."
If only that were the truth! The fact is that notions of "good" teachers vary among parents, other teachers, administrators, policymakers, researchers, and, of course, journalists--see above with Caruso and Polacheck. Traditional and progressive beliefs about how children should learn, what knowledge is of most worth, and how teachers should teach differ among the above groups and vary even more within each of those groups. Desired outcomes--high test scores, problem-solving skills, independent thinking, creativity--will vary according to each version of "goodness." Thus, relying on standardized test scores to evaluate "effective"--a synonym for "good"--teaching concentrates on only one version of being a "good" teacher, since most of the other desired outcomes are missing from standardized test items.
A second reservation is that the current value-added measures used to evaluate individual teachers are a beta version, subject to many technical glitches and unreliability, much as occurred in first-generation technologies from Microsoft Windows to cell phones. Ditto for standardized tests. When errors are made in evaluating individual teachers, however, false positives ruin careers and publicly shame the wrong teachers. Finally, imagine the chaos of using value-added measures to evaluate individual teachers after a state adopts a new standardized test. The literature of testing is clear that in such an instance, student scores dip.
In time, value-added measures may become usable for evaluating individual teachers, as the technology of this methodology eliminates the abundant errors that now exist. But without teacher participation in the design of the evaluation and without added protections, this unreliable approach to evaluation (and to teacher pay) damages teachers and, ultimately, urban school reform.