Sport Stats and Teacher Stats

“Not everything that can be counted counts, and not everything that counts can be counted.” – Attributed to Albert Einstein

I just saw the film “Moneyball,” about General Manager Billy Beane (played by Brad Pitt), whose Oakland Athletics shocked the baseball world in 2002 by winning 103 games, just as many as the New York Yankees. Why “shocked”? Because Beane’s player payroll cost $41 million while the Yankees laid out $126 million.

According to the book of the same name, published in 2003, Beane upended the conventional wisdom about evaluating players and fielded a team of underrated athletes that won 20 straight games during the 2002 season (an American League record) before falling short in the playoffs.

How did he do it? Among baseball insiders, the conventional wisdom was to evaluate players’ potential by making subjective judgments that included numbers. For pitchers, that meant throwing speed, repertoire of pitches, and control. For an infielder or outfielder, it meant counting how many of the basic “tools” (e.g., hitting, running speed, fielding skills) he had. When scouts found a tool-rich player, the Athletics developed him into a star. Then–here’s the bad news–that star would skip to another team to earn more money. The Yankees, Red Sox, and Phillies had deeper pockets and often bought up these rising stars, leaving Oakland with a seriously depleted roster in 2001.

Cut to a scene in the film where Beane is listening to his professional scouts discuss the “tools” of various ballplayers they are considering for the 2002 season. He stops the discussion and tells them that traditional ways of evaluating ballplayers won’t work for the Athletics because, in a small market like Oakland, they cannot pay the top prices that big-market teams can. Beane tells the scouts they must think differently about evaluating ballplayers and use different metrics, such as how often a player gets on base rather than his batting average. He tells them that a high-priced star is a blend of several talents, so they should look for less costly players who, together, combine the talents that the one superstar had. Using measures that few baseball insiders had applied, Beane and his high-tech sidekick tell these experienced scouts to identify undervalued players who meet the new metrics. That is how Beane intended to build a new team for the 2002 season.

In this scene, the scouts are upset that Beane ignores the wisdom they have gained from decades of experience. They know which ballplayers are stars-in-the-rough; they do not need the geeky analysts Beane has hired to reel off percentages or show them computer printouts. Beane is rejecting their intuitions, experience, and “feel” for the game.

I must confess that this scene, showing the confrontation between professional scouts touting their traditional ways of evaluating talent and the new “business plan” built on “sabermetric” principles, reminded me, almost painfully, of the ongoing policy debates over evaluating teacher effectiveness. Why “painfully”?

Professional scouts, based upon their experiences, intuitions, and insights into the game of baseball, made qualitative judgments (always including relevant statistics) about players. Like many researchers, teachers, and administrators, I have argued against reducing the complexity of classroom teaching to an overall number based, in part or wholly, on students’ test scores in order to make judgments about effectiveness and salary. Districts now use long equations factoring in different aspects of teaching to evaluate teachers, getting rid of under-performing ones while paying bonuses to effective ones–all determined by complex algorithms.

[Image: the statistical model the New York City school system uses in calculating the effectiveness of teachers. Caption as printed in the New York Times, March 7, 2011, in Michael Winerip’s column: http://tinyurl.com/4ssvkbz]
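As a rough illustration only–a simplified sketch of the value-added idea, not New York City’s actual formula–the logic runs like this: predict each student’s score from the prior year’s score and background characteristics, then credit (or blame) the teacher for the average gap between her students’ actual and predicted scores:

\[
% Illustrative sketch only; not the New York City model.
\hat{y}_{it} = \beta_0 + \beta_1\, y_{i,t-1} + \beta_2^{\top} x_{it},
\qquad
\widehat{\mathrm{VA}}_j = \frac{1}{n_j} \sum_{i \in \text{teacher } j} \bigl( y_{it} - \hat{y}_{it} \bigr)
\]

Here \(y_{it}\) is student i’s test score in year t, \(x_{it}\) is a vector of student and classroom characteristics, and \(\widehat{\mathrm{VA}}_j\) is teacher j’s estimated “value added,” averaged over her \(n_j\) students. The formulas districts actually use layer on many more terms and adjustments, which is exactly the complexity at issue.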

My arguments–and here is where I winced during the film–sounded very much like what I heard those professional scouts say as they protested Beane’s new metrics and his hiring and firing decisions.

Many aspects of teaching (e.g., respect for students, teacher-student relationships, inspiring students to achieve) that are strongly linked to student behavior and performance cannot be easily captured by numbers. None appears in these algorithms. The quotation attributed to Albert Einstein above challenges the sabermetric bias.

Yet the current “business plan” for schools is to use a downsized version of educational “sabermetrics” to evaluate and pay teachers. In 2010, Washington, D.C., using algorithms to evaluate teachers, fired 241 teachers (about 4 percent of the district’s teacher corps) and put an additional 730 on notice that they could be fired if they did not improve. Ditto for New Haven (CT), where in 2010, 75 of the district’s 1,846 teachers (about 4 percent) were put on a list to be dismissed.

Oh, by the way, Billy Beane is still the Oakland A’s General Manager in 2011. His team made the American League playoffs in 2003 and 2006, but not once since then.

Filed under how teachers teach

21 responses to “Sport Stats and Teacher Stats”

  1. Baseball isn’t education. Beane was able to revolutionize baseball by using more relevant statistics (OPS, WAR, etc.). The reason Oakland hasn’t won much since 2006 is that just about every other team adopted his system; the big-money Red Sox did and have won the World Series twice. Education is currently using flawed statistics from flawed tests. See my summary of “The Myths of Standardized Tests” at http://bit.ly/lJLUNR. Subjective judgments by teachers may not be perfect, but they are better than anything the NCLB/RTT tests can offer. One reason subjective assessments don’t work that well in baseball is that high school players are too far from maturity to predict how well they will play in four or more years. We can learn a lesson from “Moneyball,” and that lesson is that we are at the pre-Billy Beane stage and need the kind of revolution he brought to his game.

  2. Pingback: Sport Stats and Teacher Stats | Larry Cuban on School Reform and … - Angryteach

  3. I strongly agree with you that many aspects of teaching and of student success cannot be captured in numbers. In my article “Fallacy of Good Grades” at Psychology Today (http://www.psychologytoday.com/blog/the-moment-youth/201108/the-fallacy-good-grades), I outline many qualities that are essential for positive youth development but don’t fit into any metric for school reform. A serious “business plan” for schools must incorporate more qualitative measures.

  4. Craig Hochbein

    Remember that one of the reasons for the lack of success after the book was published was that other teams learned of the A’s competitive advantage and thus began using sabermetrics. The A’s lack of playoff appearances was not due to the statistical models, but rather to the adoption of the technique by wealthier competitors. The sentiment of these sabermetricians was also echoed in the baseball movie “Bull Durham”: the main character explains that the difference between an everyday MLB player and one in the Hall of Fame is all but imperceptible to any observer, and that statistics help us make that determination. In both of these baseball examples, the statistics derive from daily measurements of performance, not a one-time measure. To more accurately apply the tenets of these sabermetricians, educators should try to understand the daily activities of a teacher that accrue over a school year and increase a student’s learning.

    • larrycuban

      Thanks for the reminder of the Kevin Costner film “Bull Durham” and the different measures that might be applied to teachers besides test scores.

  5. Sabermetrics spends a great deal of time determining what numbers really matter. Our high-stakes tests haven’t undergone the tuning that Bill James and friends undertook. We just make the numbers more complicated and fair – fair in the sense that the results are nearly random.

  6. So if we level the “playing field” in education, would it not translate into everyone winning? What a concept!

    • larrycuban

      No, I do not think that the concept of “leveling the playing field,” usually a metaphor for providing equal opportunity for all, was intended to mean that everyone “wins.” Offering equal opportunity was not (and is not) intended to mean equal results–as I understand the metaphor and its application to schooling.

    • larrycuban

      Jon,

      That is a nice piece you wrote when the book “Moneyball” came out. In replies to other comments, I will refer readers to your post. Thanks for putting me on to it.

  7. There are parallels for sure, but I think it’s even more removed/complex. It’s less like evaluating baseball players on their stats (person/direct action) and more like trying to evaluate the General Manager based on the combined stats of their players (in a situation where they can’t choose players or trade anyone and the entire team changes every year). So even if you get the right stats to measure people, it’s still not likely to give you the information you really want.

    • larrycuban

      Thanks, Tom, for the distinction you made between measuring the players and measuring the General Manager. The parallel that remains apt, in my opinion, is between relying on professional scouts’ traditional measures of player potential (although they clearly used stats in their judgments) and judging teachers via complex equations of measurable factors (while leaving out others because they are hard or impossible to quantify). For another look at comparing teaching and the use of data with sport stats, see Jon Becker’s much earlier post at http://dangerouslyirrelevant.org/2006/10/dddm_and_moneyb.html

  8. Don

    I’d avoid too much of a parallel here, at least until you read the far more statistically rigorous book. In it you’ll find as much ally as enemy.

    For example, one of the things they discuss about Bill James’ sabermetrics is that it represents a more accurate assessment of a player with regard to how his behavior turns into runs and therefore wins.

    Baseball traditionally was interested in errors, and still counts them. But you can often avoid having an error counted against you by simply doing nothing. Is that really the behavior we want to track and potentially reward? Sabermetrics also disdains batting average because it doesn’t reflect a player’s ability to get on base via, say, a walk.

    That doesn’t justify refusing to account for the things that matter but which we can’t currently, or perhaps ever, measure with numerical metrics. Beane’s use of sabermetrics had issues in that it ignored a lot of things that also make a team a successful financial venture. Fans like to come to see superstars, and if you always trade people away for more “affordable” players, then maybe you’re not filling seats even if you’re winning games.

    But making an enemy of the algorithms isn’t a successful strategy either. We can’t demand that people not use measurements because they’re incomplete. We have to demand that they improve the formulas and/or add other measurements. We have to say “that doesn’t account for X!” and make our voices heard.

  9. kwpalmer

    It’s good to know my own thoughts are trending in the right direction. I am currently taking a class with David Labaree at Stanford (your old class, I believe: History of School Reform). And I just wrote a paper making a similar connection between Moneyball and the modern education reform movement’s technocratic tendencies. In fact, you can find a version of the paper on my own education blog at the link below:

    educateddebate.org/2011/10/03/too-smart-by-half-education-moneyball/

    I would be thrilled if you took a look and we could compare notes!
    Kyle Palmer

    • larrycuban

      Kyle,
      Your point in the blog post you wrote is that any version of sabermetrics applied to schools–whether pushed by Ellwood P. Cubberley, Arne Duncan, or the most recent enthusiast for value-added equations measuring teacher effectiveness–is part of a historic pattern of technocratic and top-down school reform. I agree.

  10. Bob Calder

    Let’s not forget Stephen Jay Gould’s contribution to understanding/misunderstanding what metrics actually tell us in baseball. Most things stare us in the face for ages before we recognize we are making an error in thinking.

    I’m not sure that measurement adds value. Someone needs to prove it to me.

  11. Folks,

    I love watching baseball but have little interest in stats. It’s the moments on the diamond that ultimately produce the stats, but it’s the moment itself that gives us the delights and the joys. How did Doubleday know that precisely 90 feet from home to first would produce the delights of the flashing double play?

    That said, I wish we could somehow move this discussion away from inappropriately evaluating teachers with student tests and toward finding ways for students to better learn their lessons.

    Having spent the requisite semesters in stats courses, I still believe that what accounts for most of the variance in learning is the student, and then, perhaps, the parent. If the desire to learn and optimal planning and support aren’t there, we are spending our time counting angels on the head of a pin.

    Thanks for this fine discussion!

    Tom King
    Former Project Director of the St. Paul Saturn School of Tomorrow

    PS: Take a look at what happens when you put the kids in charge. Just click the video: http://www.bobpearlman.org/Learning21/saturn.htm

  12. Pingback: Education Shouldn’t be an Unfair Game! | School Finance 101
