In Part 1, I made the point that consumer-driven and educationally oriented algorithms, for all of their mathematical exactness and the appearance of objectivity in their regression equations, contain different values, some of which programmers judge to be more important than others. In making value choices (like everyone else, programmers are constrained by space, time, and resources), programmers make decisions that have consequences for both teachers and students. In this post, I look first at the algorithms used to judge teachers’ effectiveness (or lack of it), and then I turn to “personalized learning” algorithms customized for individual students.
Washington, D.C.’s IMPACT program of teacher evaluation
Much has been written about the program that Chancellor Michelle Rhee created during her short tenure (2007-2010) leading the District of Columbia public schools (see here and here). Under Rhee, IMPACT, a new system of teacher evaluation, was put into practice. The system is anchored in the “Teaching and Learning Framework,” which D.C. teachers call the “nine commandments” of good teaching:
1. Lead well-organized, objective-driven lessons.
2. Explain content clearly.
3. Engage students at all learning levels in rigorous work.
4. Provide students with multiple ways to engage with content.
5. Check for student understanding.
6. Respond to student misunderstandings.
7. Develop higher-level understanding through effective questioning.
8. Maximize instructional time.
9. Build a supportive, learning-focused classroom community.
IMPACT uses multiple measures to judge the quality of teaching. At first, 50 percent of an annual evaluation was based on student test scores, 35 percent on judgments of instructional expertise (see the “nine commandments”) drawn from five classroom observations by the principal and “master educators,” and 15 percent on other measures. Note that policymakers initially decided on these percentages out of thin air. Using these multiple measures, IMPACT awarded bonuses ranging from $3,000 to $25,000 to 600 teachers (out of 4,000) and fired nearly 300 teachers judged “ineffective” in its initial years of full operation. For teachers with insufficient student test data, different performance measures were used. Such a new system caused much controversy in and out of the city’s schools (see here and here).
Since then, changes have occurred. In 2012, the share of a teacher’s evaluation based on student test scores was lowered from 50 percent to 35 percent (why this number? No one says), and the number of classroom observations was reduced. More policy changes have followed (e.g., “master educator” observations have been abolished and principals now do all observations; student surveys of teachers have been added). All of these additions and subtractions to IMPACT mean that the algorithms used to judge teachers have had to be tweaked, that is, altered because some variables in the regression equation were deemed more (or less) important than others. These policy changes, of course, are value choices. For a technical report published in 2013 that reviewed IMPACT, see here.
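To make the point about weights concrete, here is a minimal sketch in Python. It is not the IMPACT algorithm, which has not been made public. The 50/35/15 split comes from the description above; everything else is an assumption for illustration: the 1-to-4 rating scale, the two invented teachers, and the second set of weights (the post says only that the test-score share dropped to 35 percent, not how the rest was redistributed).

```python
# Illustrative only: NOT the actual IMPACT formula, which is not public.

def composite(scores, weights):
    """Weighted average of component scores; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)

# Weights described in the post for IMPACT's first years.
ORIGINAL_WEIGHTS = {"test_scores": 0.50, "observations": 0.35, "other": 0.15}

# Hypothetical revision: test scores lowered to 35 percent (as in 2012);
# giving the freed 15 points to observations is an assumption.
REVISED_WEIGHTS = {"test_scores": 0.35, "observations": 0.50, "other": 0.15}

# Two invented teachers rated on an assumed 1-to-4 scale.
teacher_a = {"test_scores": 2.0, "observations": 3.6, "other": 3.0}
teacher_b = {"test_scores": 3.4, "observations": 2.4, "other": 3.0}

for label, weights in [("original", ORIGINAL_WEIGHTS), ("revised", REVISED_WEIGHTS)]:
    a = composite(teacher_a, weights)
    b = composite(teacher_b, weights)
    higher = "A" if a > b else "B"
    print(f"{label}: A={a:.2f}, B={b:.2f}, higher-rated teacher: {higher}")
```

With these made-up numbers, teacher B outranks teacher A under the original weights and the order reverses under the revised ones. Nothing about either teacher's classroom changed; only the value choice embedded in the weights did.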
And the content of the algorithms has remained secret. An email exchange in 2010-2011 between the overseer of the algorithm in the D.C. schools and a teacher (who gave her emails to a local blogger) reveals the secrecy surrounding the tinkering with such algorithms (see here). District officials have not yet explained the complex algorithms in plain language to teachers, journalists, or the general public. That value judgments are made time and again in these mathematical equations is clear. So are the judgments in the regression equations used to “personalize learning.”
Personalized Learning algorithms
“The consumerist path of least resistance in America takes you to Amazon for books, Uber for transportation, Starbucks for coffee, and Pandora for songs. Facebook’s ‘Trending’ list shows you the news, while Yelp ratings lead you to a nearby burger. The illusion of choice amid such plenty is easy to sustain, but it’s largely false; you’re being herded by algorithms from purchase to purchase.”
Maria Bustillos, This Brand Could Be Your Life, June 28, 2016
Bustillos had no reason to look at “personalized learning” in making her case that consumers are “herded by algorithms from purchase to purchase.” Had she inquired into it, however, she would have seen the quiet work of algorithms constructing “playlists” of lessons for individual students and controlling students’ movement from one online lesson to another, absent any teacher hand-prints on the skills and content being taught. Even though the rhetoric of “personalized learning” mythologizes the instructional materials and learning as student-centered, algorithms (mostly proprietary and unavailable for inspection) written by programmers making choices about what students should learn next are in control. “Personalized learning” is student-centered in its reliance on lessons tailored to ability and performance differences among students. And the work of teachers is student-centered in coaching, instructing, and individualizing their attention as well as monitoring small groups working together. All of that is important, to be sure. But when it comes to making choices out of their own interests and strengths in a subject area, such as math, students have little discretion. Algorithms rule (see here, here, and here).
Deeply embedded in these algorithms are theories of learning that seldom are made explicit. For example, adaptive or “personalized” learning is a contemporary, high-tech version of old-style mastery learning. Mastery learning, then and now, is driven by behavioral theories of learning. The savaging of “behaviorism” by cognitive psychologists and other social scientists in the past few decades has clearly given the theory a bad name. Nonetheless, behaviorism and its varied off-shoots drive contemporary affection for “personalized learning” as they did for “mastery learning” a half-century ago (see here and here). I state this as a fact, not a criticism.
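A mastery-learning rule of the kind described here can be sketched in a few lines of Python. This is a hypothetical illustration, not the code of any actual “personalized learning” product (those are mostly proprietary). The lesson names, the fixed playlist, and the 80 percent mastery threshold are all assumptions, and each one is a choice a programmer would have to make on the student's behalf.

```python
# Hypothetical sketch of a mastery-learning "playlist" rule.
# Lesson names, their order, and the threshold are invented for illustration.

MASTERY_THRESHOLD = 0.8  # assumed cutoff: a value choice, not a fact

PLAYLIST = ["fractions_1", "fractions_2", "decimals_1"]  # fixed by the programmer

def next_lesson(current, quiz_score, playlist=PLAYLIST, threshold=MASTERY_THRESHOLD):
    """Return the lesson the student sees next, or None when the playlist ends.

    Below the threshold, the student repeats the current lesson;
    at or above it, the student moves to the next item in the fixed sequence.
    """
    position = playlist.index(current)
    if quiz_score < threshold:
        return current  # behaviorist loop: repeat until the score reaches "mastery"
    if position + 1 < len(playlist):
        return playlist[position + 1]
    return None  # nothing left in the playlist

print(next_lesson("fractions_1", 0.90))  # advances to fractions_2
print(next_lesson("fractions_1", 0.79))  # repeats fractions_1
```

Note what the student cannot do in this sketch: skip ahead out of interest, choose a different topic, or dispute the cutoff. The sequence and the threshold were decided before the student ever logged in.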
With advances in compiling and analyzing masses of data by powerful computers, the age of the algorithm is here. These rules govern the choices we make as consumers in buying material goods and, as this post claims, shape how teachers are evaluated and how learning is “personalized.”
8 responses to “Algorithms in Use: Evaluating Teachers and “Personalizing” Learning (Part 2)”
Really interesting and needed post on the mystery of algorithms. In online learning, all of the choices are already preprogrammed. There is no adaptivity beyond what was already coded. I think this point gets lost, especially among people who have never coded. Perhaps the best reason to give students some experience in coding would be to help them see the limitations of the choices.
I also appreciate the capriciousness you acknowledge in the percentages for the teacher evaluation systems. I would argue that the same is true for cutoff scores for standardized tests, and probably the scoring in general…well, and probably the writing of the test items. There is not a universally agreed upon level of “advanced” or “basic,” and there certainly is not an agreed upon way to write test questions, especially when it comes to the details of what to include in them. In physics, there are many opinions surrounding what is appropriate to assess, such as whether to include unit conversions or not, whether to include extraneous information, whether to make the assessment item context rich, and whether to give the needed information graphically or in some other form that requires additional work to extract. I often say that if you put 10 physics teachers in a room, you will get 7 different opinions about details of teaching. I recently attended my physics teacher conference and went to just such a discussion session. My stats were off just a bit, but there were definitely a lot of disagreements about approaches to assessment and class management.
Anecdote bringing these two points together: On the Washington Post comment section for an article about standardized testing assessment items, there was a lively discussion surrounding a question about the volume of a spherical tent. One poster noted that she looked at tents on Amazon to evaluate the question and then had ads for tents coming up on the WaPo site as she typed her comments!
Nice connections, Alice. Thanks for elaborating on physics teachers and pointing out how the Amazon algorithm based on previous inquiries produces ad after ad on tents for that intrepid person seeking answers to the volume of a tent.
Thanks, Alice, for making the connection to near-capriciousness of cut-off scores on standardized tests. Great stories about physics teachers and hunting answers on Amazon.
The centerpiece strategy for DCPS has been the IMPACT teacher evaluation system. It is used for two purposes: as the basis for additional compensation and as the basis for dismissal. The reviewers used only quantitative measures to analyze the results to see if IMPACT is working to retain effective teachers and get rid of ineffective ones. Their conclusion is that there is some statistical evidence that more highly effective teachers are staying and more ineffective teachers are leaving, but that is a measure of correlation, not causation. No one bothered to ask teachers what they think of IMPACT. The reviewers also found problems with their own method of analysis:
• Changes in IMPACT since its initial implementation make it hard to determine whether teacher effectiveness ratings have improved over time.
• One-year Value Added scores are used, which testing experts say are less reliable.
• There is little quality control in the judgement-based ratings.
• IMPACT observation rubrics include a limited range of classroom practices.
• Teachers in high-poverty schools get significantly lower IMPACT scores than teachers in low-poverty schools, which may mean there’s an evaluation bias, not that worse teachers teach in more challenging schools.
What’s missing from the reviewers’ analysis is any way of determining whether IMPACT is improving the actual quality of teaching and learning in the school system. No attempt was made to get a subjective assessment of whether teachers feel they are benefitting from IMPACT or whether IMPACT has had a positive or negative effect on keeping good teachers or helping struggling ones. The only DCPS personnel interviewed by the evaluators were three Associate Superintendents who are in charge of supervising principals. Meanwhile,
• One third of the teachers granted bonuses in the first year refused them and a significant number still refuse their bonuses.
• The WTU conducted a survey in 2012 and over 90% of teachers wanted IMPACT ended.
• There is little coordination between IMPACT and school-based coaches charged with teacher training and improving instruction
• IMPACT master educators observe between 80 and 100 teachers as clients, ensuring that they are able to provide little or no support, only judgements.
• The investment in IMPACT for central office-based Master Educators and bonuses for teachers rated highly effective is huge. It was subsidized by foundation grants for the first three years but now competes with other school needs and staffing.
• No one in the charter sector has adopted IMPACT as a model.
Several of these factors were mentioned in the NRC conclusions, but did not lead to summative conclusions about the success or failure of the approach in IMPACT. As Vicki Phillips, Director of the Gates Foundation’s education programs, says, “If a teacher evaluation system is not valued and trusted by teachers themselves, it fails.”
Thank you, Elizabeth, for taking the time to comment. Representing the teachers’ union, you have ample knowledge of teachers’ perspectives on IMPACT. Some of your points I was familiar with, others not so. Vicki Phillips’s quote is apt. I do wonder then why IMPACT continues to be used in D.C. The changes Kaya Henderson has made suggest union pushback, but the system continues.
Not only is behaviorism inexorably taking over, but “instruction” is being reduced to granular doling out of examples of procedures. Connections and concepts? Well, if you’ve got some outside resource for that, you’ll do fine.
I am a retired DCPS teacher but I still volunteer at the middle school nearest my home.
Also, I mentored new math teachers for a while (2009-2010) through the Math for America – DC program (which is about the exact opposite of Teach for America in every important aspect). Through that program, I sat in on lots of HS and MS math classes in lots of DC public and charter schools.
In addition, I also earn money as a paid tutor for a number of MS, ES and HS students, mostly in DC, at private, public, and charter schools. Naturally, in those tutoring situations, I do not have the luxury of sitting in on the math classes, and can only go by what the students and their parents have to say, and the tests, quizzes, worksheets, and other written materials that they bring home to work on, or the occasional meeting with a teacher.
Bottom line: in my humble opinion
I was never a perfect math teacher, but I don’t think there is such a thing. I did win some teaching awards and also collaborated on a computer program written by a French math teacher for the study of geometry.
Bottom line: as an experienced but far-from-perfect mathematics teacher at the MS and HS level, I do not think that IMPACT and the Common Core have had a positive impact on the teaching of mathematics in DC public or charter schools. There are still clueless teachers out there, as there always have been, and if anything, the disciplinary situation in non-selective neighborhood schools in DC (both public and charter) appears to me to be worse than it was 10, 20, 30 or 40 years ago. The mandated test items (and pre-test, and pre-pre-test, and pre-pre-pre-test ad nauseam) that are foisted on the students in the name of the Common Core are simultaneously extremely verbose, confusing, and often flat-out incorrect. Students are still being passed along despite not having mastered any skills or concepts whatsoever — except in the selective schools.
Let us not forget the 62 numerical targets that Rhee and Henderson committed themselves to achieving when they got those extra millions from the Arnold and other foundations at the beginning of IMPACT. By my very generous accounting, they actually achieved one-and-a-half of those 62 goals. This never gets brought up.
But what it means is that IMPACT, the Common Core, the Teaching and Learning Framework, and all of the rest have been (by the yardsticks developed by Rhee and Henderson, not by me) complete and utter failures.
Unless you think that a score of 1.5 out of 62 is a good score.
Thank you, Mr. Brandenburg, for taking the time to comment on IMPACT and Elizabeth Davis’s comment.