Here is a story about test scores. I was superintendent of the Arlington (VA) public schools from 1974 to 1981. In 1979 something happened that both startled me and gave me insight into the public power of test scores. The larger lesson, however, came years after I left the superintendency, when I began to understand the powerful drive that we have to explain something, anything, by supplying a cause, any cause, just to make sense of what occurred.
In Arlington then, the school board and I were responsible for a district that had declined in enrollment (from 20,000 students to 15,000) and had become increasingly minority (from 15 percent to 30 percent). The public sense that the district was in free-fall decline, we felt, could be arrested by concentrating on academic achievement, critical thinking, expanding the humanities, and improved teaching. After five years, both the board and I felt we were making progress.
State test scores–the coin of the realm in Arlington–at the elementary level climbed consistently each year. The bar charts I presented at press conferences looked like a stairway to the stars and thrilled school board members. When scores were published in local papers, I would admonish the school board to keep in mind that these scores were a very narrow part of what occurred daily in district schools. Moreover, while scores were helpful in identifying problems, they were largely inadequate in assessing individual students and teachers. My admonitions were generally swept aside, gleefully I might add, when scores rose and were printed school-by-school in newspapers. This hunger for numbers left me deeply skeptical about standardized test scores as signs of district effectiveness.
Then along came a Washington Post article in 1979 that showed Arlington to have edged out Fairfax County, an adjacent and far larger district, as having the highest Scholastic Aptitude Test (SAT) scores among eight districts in the metropolitan area (yeah, I know it was by one point but when test scores determine winners and losers in a horserace, Arlington had won by a nose).
I knew that SAT results had nothing whatsoever to do with how our schools performed. It was a national standardized instrument to predict college performance of individual students; it was not constructed to assess district effectiveness. I also knew that the test had little to do with what Arlington teachers taught. I told that to the school board publicly and anyone else who asked about the SATs.
Nonetheless, the Post article with the box-score of test results produced more personal praise, more testimonials to my effectiveness as a superintendent, and, I believe, more acceptance of the school board’s policies than any single act during the seven years I served. People saw the actions of the Arlington school board and superintendent as having caused those SAT scores to outstrip other Washington area districts.
That is what I remember about the test scores in Arlington and that Post article in 1979.
Since then, I have learned about “regression toward the mean.” It was an eye-opener. One psychologist defines regression toward the mean as “random fluctuations in the quality of performance,” meaning that both luck and skill are involved but randomness is the key.
Examples of this statistical concept are everywhere: athletes whose rookie year is outstanding and who then slump in their second year; best-selling debut novelists whose subsequent book tanks; hot TV shows that soar in their initial season and then get low ratings the next year. They “regress to the mean,” or average.
Another example from Wikipedia:
“A class of students takes two editions of the same test on two successive days….[T]he worst performers on the first day will tend to improve their scores on the second day, and the best performers on the first day will tend to do worse on the second day. The phenomenon occurs because student scores are determined in part by underlying ability and in part by chance. For the first test, some will be lucky, and score more than their ability, and some will be unlucky and score less than their ability. Some of the lucky students on the first test will be lucky again on the second test, but more of them will have (for them) average or below average scores. Therefore a student who was lucky on the first test is more likely to have a worse score on the second test than a better score. Similarly, students who score less than the mean on the first test will tend to see their scores increase on the second test.”
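The two-test scenario above can be sketched in a short simulation. This is a minimal illustration, not a model of any real test: I assume each student's score is an underlying "ability" plus an independent chance component, with hypothetical numbers (mean 500, arbitrary spreads) chosen only for readability.

```python
import random

random.seed(1)

# Each score = underlying ability + independent luck on that day.
N = 10_000
ability = [random.gauss(500, 80) for _ in range(N)]
test1 = [a + random.gauss(0, 60) for a in ability]
test2 = [a + random.gauss(0, 60) for a in ability]

# Take the top tenth of scorers on the first test...
cutoff = sorted(test1, reverse=True)[N // 10]
top = [i for i in range(N) if test1[i] >= cutoff]

avg1 = sum(test1[i] for i in top) / len(top)
avg2 = sum(test2[i] for i in top) / len(top)

# ...and their average falls on the second test, even though
# no one's ability changed: pure regression toward the mean.
print(f"top group, test 1 average: {avg1:.0f}")
print(f"top group, test 2 average: {avg2:.0f}")
```

Run it and the first-day stars score lower, on average, the second day; they are still above the overall mean, because real ability is part of the story, but the lucky part of their first score does not repeat.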
Because our minds love causal explanations, we say that those students, those athletes, those novelists performed well and then had a bad year because their smarts and skills deteriorated. We fail to realize that with regression toward the mean, good performance is usually followed by poorer performance (and vice versa) not because talent and skill failed but because of luck and the “inevitable fluctuations of a random process.”
And that is how I came to see that the one-point victory Arlington achieved in the SATs in 1979 reflected not the efforts of the school board and superintendent but luck and the statistical chance embedded in regression toward the mean.