Evaluating Teachers Using Student Test Scores: Value-added Measures (Part 1)

In most organizations, supervisors measure and evaluate employees’ performance. Consequences, both positive and negative, flow from the judgments they make. Not so in public schools, where supervisors have commonly judged over 95 percent of all teachers “satisfactory.” Such percentages clearly do not distinguish between effective and ineffective teaching. The reform-driven agenda of the past decade, which included testing, accountability, expanding parental choice through charter schools, and establishing a Common Core curriculum across the nation, now adds to its to-do list distinguishing between good and poor teaching.[i]

The current generation of reform-driven policymakers, donors, and educational entrepreneurs is determined to sort “good” from mediocre and poor teaching, if for no other reason than to identify those high performers who have had sustained effects on student learning and reward them with recognition, bonuses, and higher salaries. These reformers are equally determined to rid the teacher corps of persistently ineffective teachers.[ii]

How to identify the best and worst in the profession in ways that teachers perceive as fair, that improve the craft of teaching, and that retain teachers’ support for the process has, in most places, thwarted reformers. But that has not stopped policymakers and donors from launching a flurry of programs that seek to recognize high performers while firing time-servers.

Reform-minded policymakers, donors, and media have concentrated on annual test scores. Some big-city districts, such as Washington, D.C., Los Angeles, and New York, have not only used student scores to determine individual teacher effectiveness but have also permitted publication of each teacher’s “effectiveness” ranking (e.g., the Los Angeles Unified School District). Because teachers see serious flaws in using test scores to reward and punish teaching, they are far less enthusiastic about new systems to evaluate teaching and award bonuses.

Behind these new policies for judging teaching performance are models of teaching effectiveness built on complex algorithms, drawn from research studies done a quarter-century ago by William Sanders and others, called “value-added measures” (VAM).

How do value-added measures work? Using an end-of-year standardized achievement test in math and English, VAM predicts how well a student would do based on the student’s attendance, past performance on tests, and other characteristics. Student growth in learning (as measured by the standardized test) is then calculated: the difference between the actual and predicted scores is the “value” the teacher is said to add to each student’s learning in a year. Teachers of students who take these end-of-year tests are held responsible for getting their students to reach the predicted level. If a teacher’s students reach or exceed their predicted test scores, the teacher is rated effective or highly effective. Teachers whose students miss the mark are rated ineffective.
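The arithmetic at the heart of this process can be sketched in a few lines. The following is a toy illustration only: real VAM systems use far more elaborate statistical models, and the prediction formula, variable names, and numbers here are invented for the example, not drawn from any district’s actual model.

```python
# Toy value-added calculation: predict each student's end-of-year score
# from prior achievement, then average the gaps between actual and
# predicted scores across a teacher's roster.

def predict_score(prior_score, attendance_rate):
    """Hypothetical prediction rule: prior achievement plus a small
    attendance adjustment stands in for the many covariates a real
    VAM would include."""
    return prior_score + 10 * (attendance_rate - 0.9)

def value_added(students):
    """Average of (actual - predicted) over a teacher's roster."""
    gaps = [s["actual"] - predict_score(s["prior"], s["attendance"])
            for s in students]
    return sum(gaps) / len(gaps)

# Invented roster of three students (scores on an arbitrary scale).
roster = [
    {"prior": 250, "attendance": 0.95, "actual": 262},
    {"prior": 230, "attendance": 0.90, "actual": 228},
    {"prior": 270, "attendance": 0.85, "actual": 281},
]
print(round(value_added(roster), 1))  # positive: students beat predictions
```

A positive result would mark this hypothetical teacher as “adding value”; a negative one would count against her, regardless of why the students over- or under-shot their predictions.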

Most teachers perceive VAM as unfair. Fewer than half of all teachers (mostly in elementary schools, and an even smaller percentage in secondary schools) have usable data (e.g., multiple years of students’ math and reading scores) on which to be evaluated. For teachers lacking student test scores, new tests will have to be developed and other metrics used. Teachers also know that factors such as student effort and family background play a part in students’ academic performance. And they know that other evidence, drawn from peer and supervisor observations of lessons, the quality of instructional materials teachers use, and student and parent satisfaction with the teacher, is weighed much less or even ignored in judging teaching performance.

Moreover, student scores are unstable from year to year: different students are tested each year as cohorts move through the grades, yet teacher “effectiveness” ratings rest on these changing cohorts. What this means is that a substantial percentage of teachers ranked “highly effective” in one year may be ranked “ineffective” the next. False positives (e.g., tests that say you have cancer when you do not) are common in such situations. Furthermore, many teachers know that both measurement error and teaching experience (i.e., teachers improve over time, through bad years and good years) account for instability in ratings of teacher effectiveness. Finally, many teachers see the process of using student scores to judge effectiveness as pitting teacher against teacher, increasing competition among teachers rather than collaboration across grades and specialties within a school; such systems, they believe, are aimed not at helping teachers improve daily lessons but at naming, blaming, and defaming teachers.[iii]
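The year-to-year instability described above is easy to reproduce in a toy simulation. The figures below are invented, and the assumption that classroom-level noise is as large as the true spread of teacher effects is illustrative, not an empirical estimate; the point is only that with noisy single-year measures, many teachers change rating bands between years even when their underlying effectiveness never changes.

```python
# Minimal simulation of rating instability: each teacher has a fixed
# "true" effect, but the measured effect each year adds noise from the
# particular students tested that year.
import random

random.seed(1)
teachers = [random.gauss(0, 1) for _ in range(1000)]  # true effects

def yearly_measure(true_effect):
    """Measured effect = true effect + classroom-level noise
    (noise standard deviation assumed equal to the true spread)."""
    return true_effect + random.gauss(0, 1)

year1 = [yearly_measure(t) for t in teachers]
year2 = [yearly_measure(t) for t in teachers]

# Count teachers in the top quartile one year who fall below the
# median the next year, despite unchanged true effectiveness.
cut_top = sorted(year1)[750]   # top-quartile cutoff in year 1
median2 = sorted(year2)[500]   # median in year 2
top_count = sum(1 for a in year1 if a >= cut_top)
flipped = sum(1 for a, b in zip(year1, year2)
              if a >= cut_top and b < median2)
print(f"{flipped} of {top_count} top-quartile teachers fell below the median")
```

Even in this idealized setting, a noticeable share of one year’s “highly effective” teachers land below the median the next year on noise alone.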

Yet for all of these negatives, there are also many teachers, principals, policymakers, and parents who are convinced that something has to be done to improve evaluation and distinguish between effective and ineffective teaching. In Washington, D.C., a new system of evaluation and pay-for-performance, inaugurated by former Chancellor Michelle Rhee, reveals both the strengths and the flaws of VAM.


[i] Daniel Weisberg et al., “The Widget Effect” (Washington, D.C.: The New Teacher Project, 2009).

[ii] There is a crucial distinction between “good” and “successful” teaching and an equally important one between “successful” teaching and “successful” learning that avid reformers ignore. See Gary Fenstermacher and Virginia Richardson, “On Making Determinations of Quality in Teaching,” Teachers College Record, 2005, 107, pp. 186-213.

[iii] Linda Darling-Hammond and colleagues summarize the negatives of VAM in “Evaluating Teacher Evaluation,” Education Week, February 12, 2012, at: http://www.edweek.org/ew/articles/2012/03/01/kappan_hammond.html . For another view, arguing that on balance VAM is worthwhile in evaluating teachers, see Steven Glazerman et al., “Evaluating Teachers: The Important Role of Value-Added,” Brookings Institution, November 17, 2010, at: http://www.brookings.edu/reports/2010/1117_evaluating_teachers.aspx

For stability in teacher ratings over time, see Dan Goldhaber and Michael Hansen, “Is It Just a Bad Class? Assessing the Stability of Measured Teacher Performance,” Center for Education Data & Research, University of Washington, 2010, CEDR Working Paper #2010-3. On issues of reliability and validity in value-added measures, see Matthew Di Carlo’s posts of April 12 and 20, 2012, at: http://shankerblog.org/?p=5621 ; http://nepc.colorado.edu/blog/value-added-versus-observations-part-two-validity .



Filed under how teachers teach, school reform policies

15 responses to “Evaluating Teachers Using Student Test Scores: Value-added Measures (Part 1)”

  1. Reblogged this on David R. Taylor-Thoughts on Texas Education and commented:
    Value-added measures are like trying to compare apples to watermelons. All students are different, and yet we try to compare them to each other using a single test.


  3. John Thompson

    I’ll be looking forward to your other posts. I was struck by something in your balanced post. The evidence defending value-added for evaluations was dated from 2009 to 2010, while the 2012 studies offer devastating criticism of the method for high-stakes purposes. This is a pattern that holds up well. Even the recent Chetty et al. report says nothing about the validity of vams for evaluations in the toughest schools (for instance, they excluded classes where 25% of students are on IEPs, and they did not address qualitative work by Jennings and Pallas on how students were actually assigned to schools). If Gates had had any idea of the evidence that his work would find, I don’t think he would have invested in vams for evals. His scholars must be even more worried about the results they have found. I’m hopeful that those researchers will change their minds as the evidence accrues.

    Yes, “there are also many teachers, principals, policymakers, and parents who are convinced that something has to be done to improve evaluation and distinguish between effective and ineffective teaching.” But now, the evidence is becoming overwhelming that vams for high-stakes purposes aren’t a tool for improving education. Many reformers who are actually in schools would never use them for evaluating their own teachers. As Eric Hanushek implied, vams now are a weapon for destroying “the status quo,” not building better schools.

    And as for the teachers who support vams, in my experience vams are for them a weapon of self-defense. Either they are young teachers threatened by LIFO, who believe (correctly) that vams could be a weapon against seniority, or they are union leaders who worry that tougher evals, conducted by principals alone, could be an even more dangerous threat. So vams provide a weapon against abusive evaluators.

    Since all educational politics are local, many of those pro-vam teachers are probably factually correct in terms of the effects on their own careers. Similarly, principals (and parents), frustrated by the most incompetent teachers, as well as by the timidity of their central offices, are likely to use any weapon available to achieve their goals. I doubt that many pro-vam teachers, principals, and parents are thinking about the greater good for systems across the country, as opposed to grabbing at any weapon that they can use to defend themselves in the toxic era of “reform.”

    • larrycuban

      Thanks, John, for taking the time to make the points that you do. The recent sources you cite and points you make help clarify the issues that VAM raises for practitioners, researchers, and policymakers.

  4. This is not a defense of the methods used to evaluate, but our schools are too big, administrators are removed by time and distance from the classrooms, and too many special interests are at work. Using tests to evaluate teacher effectiveness does not even begin to measure teacher effectiveness. It might, and I say it might, give a starting point, but even that is debatable.


  7. Bob Calder

    “Such percentages clearly do not distinguish between effective and ineffective teaching.” I don’t think it is a good idea to base the overhaul of performance evaluation on this kind of statement. What percentage of licensed professionals in any occupation are deemed incompetent? For lawyers and doctors, it must be far less than one percent. What occupation are teachers being compared to?

  8. “In most organizations, supervisors measure and evaluate employees’ performance.” If you go right back to the origins of the current drive, the increasing influence of commerce on all aspects of our lives, I think there’s something all interested parties need to grasp.

    Because I’ve experienced this kind of measuring both as a teacher and in business, I know that business performance measures are hugely more precise and uncontroversial than anything comparable in teaching. Relatively simple sales or other data often underpin business performance measures. Even with employees who have no sales role at all, who work in HR for example, or whose activity most closely resembles the essentially communicative nature of teaching, good companies find ways of articulating that activity in terms both manager and managed agree to.

    That agreement is a crucial element of practice in successful companies. Put very simply, the manager works out with their team member what the measures should be by examining what the company has employed them to do, what they actually do, and even what they’d like to be doing.

  9. Bob, Bruce’s blog was a fascinating read. Reading all the various sources of information Larry’s accumulated here, it does look like an unholy mess.
    For me, the most pragmatic performance-management model at school level, and the one most likely to succeed, would be one that replicates good business practice.

    That would mean allowing a small chain of line managers to discuss and agree with their team members those three fundamental questions:
    What are you employed by the school to do? What do you actually do? And what would you like to do? From that discussion you can then draft something any individual can be measured against. I suspect many heads (principals) would find that exercise hugely beneficial.

    Some teachers might even learn, as I did, that it is actually quite a liberating experience to have hard numbers to meet, when they’re agreed. It does hand over to you a surprisingly high degree of control. But I can’t stress enough how crucial gaining that agreement is. Where the parties disagree, the end result is nearly always conflict.

    • Bob Calder

      I had a very productive experience where we had a department head (middle manager) that had precisely this function. The department was a magnet and worked fantastically well until the position was eliminated. So, I agree with you.

  10. Roque Burio, Jr.

    From Roque Burio Jr., the lemon who can dance but can also sing. Here is my song on the confusion over tying student test scores to teacher performance. As I have stated in my previous comments, teachers’ performance must be correlated with the progression and/or regression of students’ test scores before and after the students have been enrolled with their current teachers, thereby removing any biases arising from differing attributes or abilities of students. A student may remain below average, but with the correct intervention of his current teachers his performance can go up, remain the same, or go down.
    It is the comparative performance of the students before and after being assigned to a teacher that should be correlated with that teacher’s performance.
    There may be other biases that arise, such as the administrative favoritism of principals and vice principals, including the clustering of difficult students in the classes of targeted teachers; however, the performance of principals and vice principals must likewise be correlated with student test scores and teacher performance, thereby forcing those unfair and unjust administrators to distribute difficult students randomly among teachers. The correlation of student test scores with teaching and management performance will definitely lead to standardization of curricula, teaching techniques or pedagogies, and learning styles in all classrooms, for all teachers and students.
    The correlation of student test scores is the foundation of the most objective educational reform. You are very lucky now, teachers, compared to my situation before, when I and other teachers were subjectively evaluated according to the whims and caprices of many hot-headed menopausal administrators. Smile, teachers, you are now much luckier. Heh, heh, heh, and more heh, heh, heh. Besides, you cannot violate the Superior Court decision of Judge Chalfant, who must have loved Aristotle and Plato.
