Tag Archives: Accountability

Pay-for-Performance for CEOs and Teachers

Oracle CEO Larry Ellison earns $37,692 an hour. No, that is not a typo or misplaced comma. Ellison’s annual salary ran $78.4 million, much of it in stock option awards. His salary was based on the annual performance of the company’s stock. Oracle’s Board of Directors set the pay scale (Ellison owns one-fourth of the company’s shares) to spur better management to increase profits and shareholders’ dividends.

They pay Ellison to perform well on the metric they have chosen (“company earnings before income taxes minus the costs of stock-based compensation, acquisitions, restructuring, and other items”). This metric for CEO performance pay is not, however, one that other major corporations use for paying their top person. I return later to the point that companies use different measures to judge CEO performance.

Switch now to the average U.S. public school teacher who earns an annual salary of over $55,000. That figure translates to around $27.00 an hour for a 40-hour week. Like Ellison, hundreds of thousands of teachers are involved in pay-for-performance plans. In response to the federal Race To the Top competition, many states have mandated that teachers’ performance and salary be tied to students’ test scores to spur better teaching and student learning. Those test scores, as a factor in assessing effectiveness and determining salary (or bonuses), can account for anywhere from one-quarter to over half of the decision to set salary and retain or fire a teacher.
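The hourly figures above come from simple division. Here is a minimal sketch of that arithmetic, assuming a 2,080-hour work year (52 weeks of 40 hours). The post does not state that assumption, but it reproduces the $37,692 figure. The teacher salary is taken as exactly $55,000, the floor of "over $55,000," and the function name is only illustrative.

```python
HOURS_PER_YEAR = 52 * 40  # assumed work year: 52 weeks x 40 hours = 2,080 hours


def hourly_rate(annual_pay: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert annual pay to an hourly rate under the assumed work year."""
    return annual_pay / hours_per_year


ceo_hourly = hourly_rate(78_400_000)  # Ellison's annual pay as reported in the post
teacher_hourly = hourly_rate(55_000)  # assumed floor of "over $55,000"

print(f"CEO: ${ceo_hourly:,.0f} an hour")          # CEO: $37,692 an hour
print(f"Teacher: ${teacher_hourly:,.2f} an hour")  # Teacher: $26.44 an hour
print(f"Ratio: about {ceo_hourly / teacher_hourly:,.0f} to 1")  # about 1,425 to 1
```

Because the teacher figure uses the salary floor, it comes out slightly under the "around $27.00" quoted above. A salary nearer $56,000 would round to $27.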

While I have written about this pay-for-performance reform over the past few years (see here, here, and here), for this post I want to inspect how the private sector–often a model for U.S. school reform–has its own problems, often undisclosed by business-oriented champions of school metrics, in determining CEO pay.

The lesson to learn from this post is: Paying for CEO performance in companies and schools is as flawed as the measures used to determine it.

A recent study of the metrics used in 195 large companies over the past five years showed that the most popular gauge measuring CEO performance was “total shareholder return.” Over half of the companies using that measure, however, lost nearly two percent over the five-year period. Companies using less popular equations such as “earnings-per-share growth” gained almost three percent.

Now, here’s the clincher. Most companies judging CEO performance are relying on a metric that yielded losses for investors (“total shareholder return”), yet, at the same time, those very same companies continued to give their CEOs substantial raises year after year.

The authors of the study believe that the popularity of the performance measure, i.e., “total shareholder return,” stems from how easy it is for boards of directors and CEOs to manipulate the metric by “removing costs from the equation” such as “discontinuing product lines or closing factories.” Boards of directors then can reward CEOs with higher compensation packages. Earnings-per-share growth, a less popular metric and one of multiple measures that many firms use, sorts out under-performing from high performing firms, the authors found. This one as well as other measures, they concluded, are less easily manipulated by top corporate officials. CEO pay, then, can be better associated with company performance.

The main takeaways from this study are that boards of directors and CEOs do manipulate the numbers, that “one size does not fit all when measuring pay for performance,” and that multiple measures for determining effectiveness and salary have a better chance of capturing performance than single ones do.

Now, consider teacher pay-for-performance where one measure–student test scores–is often used to determine to what degree a teacher is effective. As with “total shareholder return,” there are serious problems with using this metric alone or even in concert with other measures to judge teacher performance (see here and here).

Consider the following:

Incentives corrupt measures.

Since the mid-1970s, social scientists have criticized the use of specific quantitative measures to monitor or steer policies because those implementing such policies alter their practices to ensure better numbers. The work of social scientist Donald T. Campbell and economists in the mid-1970s about the perverse outcomes of incentives was available but has largely been ignored. Campbell wrote in 1976:

“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor” (Campbell 1976, p. 54).

Campbell used examples drawn from statistics on police solving crimes (p. 55), the Soviets setting numerical goals in industry (p. 57), and the U.S.’s use of “body counts” in Vietnam as evidence of winning (p. 58). For public schools, Campbell said that “achievement tests are … highly corruptible indicators” (p. 57).

That was nearly forty years ago. In the past decade, researchers have documented (also see here) the link between standardized test scores and narrowed instruction to prepare students for test items, instances of state policymakers fiddling with cut-off scores on tests, increased dropouts, and outright cheating. Although how the distortions occur is unclear, the evidence confirms Campbell’s insight.

Easy To Measure Indicators Trump Hard To Measure Ones

Few in business, medicine or education question that some indicators are easier to quantify than others. In medicine, for example, hospital mortality and surgical procedures are fairly easy to measure but the results even when compared to other hospitals and surgeons hide as much as they reveal about effective health care. So it is with standardized tests.

Because test scores are inexpensive and efficient to collect, they draw attention away from important but hard-to-measure aspects of teaching and learning such as student engagement, rapport between teachers and students, academic climate in classroom and school, and principal leadership. Cumulative practitioner experience and stories about teaching over centuries have established these as crucial factors in working with gifted and vulnerable students.


These are known results of using single measures to judge individual or organizational performance. Consequences of their use can be anticipated. Historical examples abound. Some districts (e.g., Denver) wisely have moved to using multiple measures with student outcomes included that go beyond test scores but in most states where such mandates reign, test scores still remain a major part of the equation used to judge teacher performance (e.g., New York City, Washington, D.C., Houston, Texas) and allocate bonuses to teachers and principals.

This manipulation of data and these one-size-fits-all measures show up in businesses as well as schools, raising serious questions about the worth of this frenetic passion for pay-for-performance in both public and private sectors.

In the meantime, if Oracle’s Larry Ellison read this post in his office–say 10 minutes–he would have earned over $6,000. Ah, to be a CEO.





Filed under school reform policies

Chinese Third Graders Fall Behind U.S. Students (The Onion)*


CHESTNUT HILL, MA—According to an alarming new report published Wednesday by the International Association for the Evaluation of Educational Achievement, third-graders in China are beginning to lag behind U.S. high school students in math and science.

The study, based on exam scores from thousands of students in 63 participating countries, confirmed that in mathematical and scientific literacy, American students from the ages of 14 to 18 have now actually pulled slightly ahead of their 8-year-old Chinese counterparts.

“This is certainly a wake-up call for China,” said Dr. Michael Fornasier, an IEA senior fellow and coauthor of the report. “The test results unfortunately indicate that education standards in China have slipped to the extent that pre-teens are struggling to rank among even the average American high school student.”

“Simply put, how can these third-graders be expected to eventually compete in the global marketplace if they’re only receiving the equivalent of a U.S. high school education?” Fornasier added.

Fornasier stressed that while the gap is not yet dramatically sizable, it has widened over the past two years after American high schoolers tested marginally higher in algebra, biology, and chemistry than, shockingly, most of China’s 8- and 9-year-olds.

“For decades, young children in China have scored at the expected level of their peers in American high schools, so this is a very worrying drop in performance,” said Fornasier, adding that the majority of Chinese third-graders are now a full year behind the average U.S. 12th-grader in their knowledge of calculus. “In the chemistry portion of the exam, for example, Chinese children proved to be slightly deficient compared to American teenagers in their understanding of the periodic table, molecular structure, and the essential principles of atomic theory.”

“And even when they did test at the same level in mathematics, it often took Chinese elementary school students 10 to 15 minutes longer to do simple things like factor a polynomial equation or compute the derivative of a continuous function,” Fornasier added. “That just isn’t normal.”

In addition to disappointing marks from grade school children in China, 10-year-olds in Germany, South Korea, Japan, Switzerland, and New Guinea also reportedly tested an average of three percentage points lower than U.S. high school seniors in physics, with education officials from each country expressing deep concerns about the increasingly mediocre quality of their primary schools.

In light of the alarming study, many in China have called for considerable reforms of the country’s education system, including implementing far stricter standards for teachers, investing in better learning materials, and increasing the length of school days.

“Our third grade classes clearly cannot afford to lag behind American high schools if they are to be successful in the future,” read an official statement from China’s Minister of Education, Yuan Guiren. “Frankly, the scores are unacceptable, and we have to turn this around immediately. If there’s an American 17-year-old who can do something academically that a Chinese 8-year-old can’t, that’s a very big problem.”

 “At that rate, how do we expect our Chinese 13-year-olds to be ready for American colleges?” Yuan continued.


*If you have reached the end of the piece and have not yet figured out that The Onion specializes in satire, parody, and comic humor, I want readers to know that this is a fictitious article poking fun at U.S. school reformers’ obsessive focus on international test score comparisons, the supposed high quality of Chinese education and perceived low academic quality of U.S. high schools.

Thanks to Joel Westheimer for sending this piece to me.


Filed under school reform policies

Buying iPads, Common Core Standards, and Computer-Based Testing

The tsunami of computer-based testing for public school students is on the horizon. Get ready.

For adults, computer-based testing has been around for decades. For example, I have taken and re-taken the California online test to renew my driver’s license twice in the past decade. To get certified to drive as a volunteer driver for Packard Children’s Hospital in Palo Alto, I had to read gobs of material about hospital policies and federal regulations on confidentiality before taking a series of computer-based tests. To obtain approval from Stanford University for a research project of which I am the principal investigator and where I would interview teachers and observe classrooms, I had to read online a massive amount of material on university regulations about consent of subjects to participate, confidentiality, and handling of information gathered from interviews and classroom observations. And again, I took online tests that I had to pass in order to gain approval from the University to conduct research. Beyond the California Department of Motor Vehicles, Children’s Hospital, and Stanford University, online assessment has been a staple in the business sector from hiring through employee evaluations. So online testing is already part of adult experiences.

What about K-12 students? Increasingly, districts are adopting computer-based testing. For example, Measures of Academic Progress, a popular test used in many districts, is online. Speeding up this adoption of computer-based testing are the Common Core Standards and the two consortia that are preparing assessments for the 45 states on the cusp of implementing the Standards. Many states have already mandated online testing for their own standardized tests to prepare for impending national assessments. These tests will require students to have access to a computer with the right hardware, software, and bandwidth to accommodate online testing by 2014-2015 (see here, here, and here).

There are many pros and cons with online testing as compared with, say, paper-and-pencil tests. But whatever those pros are for paper-and-pencil tests, they are outslugged and outstripped by the surge of buying new devices and piloting of computer-based tests to get ready for Common Core assessments (see here and here). Los Angeles Unified School District, the second largest in the nation, just signed a $50 million contract with Apple for iPads. One of the key reasons to buy these devices for the initial rollout in 47 schools was Common Core standards and assessment. Each iPad comes with an array of pre-loaded software compatible with the state online testing system and impending national assessments. The entire effort is called The Common Core Technology Project.

The best (and most recent) gift to the hardware and software industry has been the Common Core standards and assessments. At a time of fiscal retrenchment in school districts across the country when schools are being closed and teachers are let go, many districts have found the funds to go on shopping sprees to get ready for the Common Core.

And here is the point that I want to make. The old reasons for buying technology have been shunted aside for a sparkling new one. Consider that for the past three decades the rationale for buying desktop computers, laptops, and now tablets has been three-fold:

1. Make schools more efficient and productive so that students learn more, faster, and better than they had before.

2. Transform teaching and learning into an engaging and active process connected to real life.

3. Prepare the current generation of young people for the future workplace.

After three decades of rhetoric and research, teachers, principals, students, and vendors have their favorite tales to prove that these reasons have been achieved. But for those who want more than Gee Whiz stories, who seek a reliable body of evidence that shows students learning more, faster, and better, that shows teaching and learning to have been transformed, that shows using these devices has prepared the current generation for actual jobs–well, that body of evidence is missing for each of these traditional reasons to buy computers.

With Common Core standards adopted, the rationale for getting devices has shifted. No longer does it  matter whether there is sufficient evidence to make huge expenditures on new technologies. Now, what matters are the practical problems of being technologically ready for the new standards and tests in 2014-2015: getting more hardware, software, additional bandwidth, technical assistance, professional development for teachers, and time in the school day to let students practice taking tests.

Whether the Common Core standards will improve student achievement–however measured–whether students learn more, faster, and better–none of this matters in deciding on which vendor to use. It is not whether to buy or not. The question is: how much do we have and when can we get the devices? That is the tidal wave on the horizon.


Filed under technology, testing

Cartoons from Robert Rendo

For this month’s feature on cartoons*, I chose a selection from Robert Rendo. We met through my blog and he sent along a sampling of his cartoons from which I selected some  on students and testing. He sent me the following description of himself.

Robert Rendo grew up in New Hyde Park, New York. He lives in New York City and in Massachusetts with his wife Rachel, an educator and children’s clothing designer.

Robert Rendo is an editorial illustrator and president of PoliticalCartoonsOnline.com. A native New Yorker, Mr. Rendo’s  work has earned him publication in the Op/Ed section of the New York Times, the Chicago Tribune, and the Sacramento Bee. Recently, Rendo designed the logo, masthead, and branding for educational historian Diane Ravitch’s Network for Public Education. His images can also be found on Stephen Krashen’s blog, Education Notes Online, and the blog “Susan Ohanian Speaks Out”.

“I hope to provoke all readers,” says Rendo. “Editorial art is a precise genre.”

When asked how he foresees illustration in media, Rendo said, “I think it’ll be shared more equally between hardcopy and the internet. For me, it’s more satisfying to hold a periodical in your fingers and turn the pages. Whatever the format, illustration will always help chronicle the follies of man. And we humans screwing up never seems to be out of vogue.”



In previous months, the following cartoons have been posted: “Digital Kids in School,” “Testing,” “Blaming Is So American,” “Accountability in Action,” “Charter Schools,” “Age-graded Schools,” “Students and Teachers,” “Parent-Teacher Conferences,” “Digital Teachers,” “Addiction to Electronic Devices,” “Testing, Testing, and Testing,” “Business and Schools,” “Common Core Standards,” “Problems and Dilemmas,” “Digital Natives (2),” “Online Courses,” “Students and Teachers Again,” “Doctors and Teachers,” “Parent/Teacher Conferences,” “Preschools,” and “Life at Lincoln Middle School.”


Filed under school reform policies

Testing, Testing, and Testing: More Cartoons

The U.S. has tests galore. Driving, alcohol, steroids, DNA, citizenship, blood, pregnancy–and on and on. Most serve a specific purpose and carry personal consequences if one passes or fails. School tests, however (to pass a course, to be promoted to another grade, to graduate, and to judge whether a school is satisfactory or on probation), have proliferated dramatically in the past three decades. Opinions are split among Americans about these tests.

Surveys report that most teachers (but by no means all) believe that there is too much standardized testing. Some parents have mobilized to boycott annual tests. Most respondents to opinion polls, however, support curriculum standards, accountability, and, yes, state tests.

Of the many cartoons on testing that I have located, most reflect the opinion that there is too much testing and too much is made of the results. I have found very few–none that I can recall or that I have posted–endorsing standardized tests. Here is a sampling of those cartoons.

For those readers who wish to see previous monthly posts of cartoons, see: “Digital Kids in School,” “Testing,” “Blaming Is So American,” “Accountability in Action,” “Charter Schools,” “Age-graded Schools,” “Students and Teachers,” “Parent-Teacher Conferences,” “Digital Teachers,” and “Addiction to Electronic Devices.”


Filed under testing

Remembering Test Scores and Learning about Regression toward the Mean

Here is a story about test scores. I was superintendent of the Arlington (VA) public schools from 1974 to 1981. In 1979 something happened that both startled me and gave me insight into the public power of test scores. The larger lesson, however, came years after I left the superintendency when I began to understand the powerful drive that we have to explain something, anything, by supplying a cause, any cause, just to make sense of what occurred.

In Arlington then, the school board and I were responsible for a district that had declined in population (from 20,000 students to 15,000) and had become increasingly minority (from 15 percent to 30). The public sense that the district was in free-fall decline, we felt, could be arrested by concentrating on academic achievement, critical thinking, expanding the humanities, and improved teaching. After five years, both the board and I felt we were making progress.

State  test scores–the coin of the realm in Arlington–at the elementary level climbed consistently each year. The bar charts I presented at press conferences looked like a stairway to the stars and thrilled school board members. When scores were published in local papers, I would admonish the school board to keep in mind that these scores were  a very narrow part of what occurred daily in district schools. Moreover, while scores were helpful in identifying problems, they were largely inadequate in assessing individual students and teachers. My admonitions were generally swept aside, gleefully I might add, when scores rose and were printed school-by-school in newspapers. This hunger for numbers left me deeply skeptical about standardized test scores as signs of district effectiveness.

Then along came  a Washington Post article in 1979 that showed Arlington to have edged out Fairfax County, an adjacent and far larger district, as having the highest Scholastic Aptitude Test (SAT) scores among eight districts in the metropolitan area (yeah, I know it was by one point but when test scores determine winners  and losers in a horserace, Arlington had won by a nose).

I knew that SAT results had nothing whatsoever to do with how our schools performed. It was a national standardized instrument to predict college performance of individual students; it was not constructed to assess district effectiveness. I also knew that the test had little to do with what Arlington teachers taught. I told that to the school board publicly and anyone else who asked about the SATs.

Nonetheless, the Post article with the box-score of  test results produced more personal praise, more testimonials to my effectiveness as a superintendent, and, I believe, more acceptance of the school board’s policies than any single act during the seven years I served. People saw the actions of the Arlington school board and superintendent as having caused those SAT scores to outstrip other Washington area districts.

That is what I remember about the test scores in Arlington and that Post article in 1979.

Since then, I have learned about “regression toward the mean.” It was an eye-opener. Here’s a psychologist who defines regression toward the mean as “random fluctuations in the quality of performance” meaning that both luck and skill are involved but randomness is the key.

Examples of this statistical concept include athletes whose rookie year is outstanding and who then slump in their second year; best-selling debut novelists whose subsequent book tanks; and hot TV shows that soar in their initial season and then get low ratings the next year. They “regress to the mean” or average.

Another example from Wikipedia:

“A class of students takes two editions of the same test on two successive days….[T]he worst performers on the first day will tend to improve their scores on the second day, and the best performers on the first day will tend to do worse on the second day. The phenomenon occurs because student scores are determined in part by underlying ability and in part by chance. For the first test, some will be lucky, and score more than their ability, and some will be unlucky and score less than their ability. Some of the lucky students on the first test will be lucky again on the second test, but more of them will have (for them) average or below average scores. Therefore a student who was lucky on the first test is more likely to have a worse score on the second test than a better score. Similarly, students who score less than the mean on the first test will tend to see their scores increase on the second test.”
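The two-test example can be checked with a small simulation. This is only a sketch under invented assumptions: each score is a fixed "ability" plus independent "luck" on each day, with ability centered on 70 and both spreads set to 10 points. None of these numbers or names come from the post or the quoted passage.

```python
import random
from statistics import mean


def simulate_two_tests(n_students: int = 10_000, seed: int = 0) -> dict:
    """Simulate two test days where score = stable ability + fresh random luck."""
    rng = random.Random(seed)
    ability = [rng.gauss(70, 10) for _ in range(n_students)]  # assumed ability spread
    day1 = [a + rng.gauss(0, 10) for a in ability]  # luck drawn independently...
    day2 = [a + rng.gauss(0, 10) for a in ability]  # ...on each day

    # Rank students by their day-one score and take the bottom and top tenth.
    order = sorted(range(n_students), key=lambda i: day1[i])
    k = n_students // 10
    bottom, top = order[:k], order[-k:]

    return {
        "overall_day1": mean(day1),
        "top_day1": mean(day1[i] for i in top),
        "top_day2": mean(day2[i] for i in top),
        "bottom_day1": mean(day1[i] for i in bottom),
        "bottom_day2": mean(day2[i] for i in bottom),
    }


result = simulate_two_tests()
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions, the top tenth on day one score lower on average on day two, and the bottom tenth score higher, though no one's ability changed. The top group still stays above the overall average, because only the luck washes out, not the ability.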

Because our mind loves causal explanations, we say that those students, those athletes, those novelists performed well and then had a bad year because their smarts and skills deteriorated. We say this instead of realizing and acknowledging that, with regression toward the mean, unusually good performance is usually followed by poorer performance (and vice versa) not because of talent and skill failing but because of luck and the “inevitable fluctuations of a random process.”

And that is how I came to see that the one-point victory that Arlington achieved in the SATs in 1979 was not the result of the school board’s and superintendent’s efforts but an instance of luck and the statistical chances embedded in regression toward the mean.


Filed under school leaders, school reform policies

Accountability in Action–Cartoons

More than any other word, “accountability” has become the keyword defining the past quarter-century in private and public sectors of life in America.  Presidents, governors, and mayors say that they answer to voters. CEOs and top managers proudly display their accountability to their boards of trustees. Small and mid-size owners of companies know that they are accountable to their customers. Appointed leaders and bureaucrats point to the outcomes they must meet in their evaluations. Or pay the consequences. So let’s call these political, market, and bureaucratic forms of accountability.

Anyone in K-12 or higher education knows that accountability is (and has been for decades) the magic word that opens doors for aspiring leaders and shows the exit to low-performing employees. For these institutions, “accountability imposes six demands” on educators at all levels that overlap these different versions of the accountability pervasive in the U.S.

“First, they must demonstrate that they have used their powers properly. Second, they must show that they are working to achieve the mission or priorities set for their office or organization. Third, they must report on their performance, for ‘power is opaque, accountability is public’ … Fourth, the two ‘E’ words of public stewardship–efficiency and effectiveness–require accounting ‘for the resources they use and the outcomes they create….’ Fifth, they must ensure the quality of the programs and services produced. Last, but far from least, they must show that they serve public needs.”

There are, then, political, market, and bureaucratic forms of accountability across private and public sectors in the U.S., including K-12 education. Schools are political inventions approved by voters and taxpayers and charged to carry out national and individual goals. With parental choice readily available, a version of customers buying in a market economy has developed in U.S. schooling. And as for bureaucratic accountability, K-12 schools in urban, suburban, and rural districts are hierarchical, rule driven, and constantly reporting to superiors as well as being evaluated.

I found a sampling of cartoons that illustrate humorously and, at times, harshly, various features of accountability across public and private institutions.




If readers come across other cartoons that cause chuckles or pinch (or both) on the different forms of accountability, please send them along.


Filed under Reforming schools