Three Important Distinctions In How We Talk About Test Scores (Matt DiCarlo)

“Matthew Di Carlo is a senior fellow at the non-profit Albert Shanker Institute in Washington, D.C. His current research focuses mostly on education policy, but he is also interested in social stratification, work and occupations, and political attitudes/behavior.” The post appeared May 25, 2012.

In education discussions and articles, people (myself included) often say “achievement” when referring to test scores, or “student learning” when talking about changes in those scores. These words reflect implicit judgments to some degree (e.g., that the test scores actually measure learning or achievement). Every once in a while, it’s useful to remind ourselves that scores from even the best student assessments are imperfect measures of learning. But this is so widely understood – certainly in the education policy world, and I would say among the public as well – that the euphemisms are generally tolerated.

And then there are a few common terms or phrases that, in my personal opinion, are not so harmless. I’d like to quickly discuss three of them (all of which I’ve talked about before). All three appear many times every day in newspapers, blogs, and regular discussions. To criticize their use may seem like semantic nitpicking to some people, but I would argue that these distinctions are substantively important and may not be so widely acknowledged, especially among people who aren’t heavily engaged in education policy (e.g., average newspaper readers).

So, here they are, in no particular order.

In virtually all public testing data, trends in performance are not “gains” or “progress.” When you tell the public that a school or district’s students made “gains” or “progress,” you’re clearly implying that there was improvement. But you can’t measure improvement unless you have at least two data points for the same students – i.e., test scores in one year are compared with those in previous years. If you’re tracking the average height of your tomato plants, and the shortest one dies overnight, you wouldn’t say that there had been “progress” or “gains,” just because the average height of your plants suddenly increased.

Similarly, almost all testing trend data that are available to the public don’t actually follow the same set of students over time (i.e., they are cross-sectional). In some cases, such as NAEP, you’re comparing a sample of fourth and eighth graders in one year with a different cohort of fourth and eighth graders two years earlier. In other cases, such as the results of state tests across an entire school, there’s more overlap – many students remain in the sample between years – but there’s also a lot of churn. In addition to student mobility within and across districts, which is often high and certainly non-random, students at the highest tested grade leave the schools (unless they’re held back), while whole new cohorts of students enter the samples at the lowest tested grade (in middle schools serving grades seven and eight, this means that half the sample turns over every year).

So, whether it’s NAEP or state tests, you’re comparing two different groups of students over time. Often, those differences cannot be captured by standard education variables (e.g., lunch program eligibility), but are large enough to affect the results, especially in smaller schools (smaller samples are more prone to sampling error). Calling the differences between years “gains/progress” or “losses” therefore gives a false impression; at least in part, they are neither – reflecting nothing more than variations between the cohorts being compared.
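To make the tomato-plant point concrete, here is a minimal Python sketch. The plant labels and heights are invented for illustration; nothing here comes from real testing data. It contrasts the change in the group average (what cross-sectional data show) with the change measured on the same plants (what “gains” would actually require).

```python
from statistics import mean

# Hypothetical plant heights in centimeters (made-up numbers).
day1 = {"A": 30, "B": 62, "C": 65, "D": 70, "E": 73}

# Overnight the shortest plant (A) dies; no surviving plant grows at all.
day2 = {plant: height for plant, height in day1.items() if plant != "A"}

# Cross-sectional comparison: average of whoever is in the group each day.
cross_sectional_change = mean(list(day2.values())) - mean(list(day1.values()))

# Same-plant comparison: change measured only on plants present both days.
same_plant_change = mean([day2[plant] - day1[plant] for plant in day2])

print(f"change in group average: {cross_sectional_change:.1f} cm")
print(f"average growth of surviving plants: {same_plant_change:.1f} cm")
```

Under these made-up numbers, the group average rises by 7.5 cm even though no surviving plant grew. The same arithmetic applies when one cohort of students leaves a school’s tested sample and a different one enters.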

Proficiency rates are not “scores.” Proficiency or other cutpoint-based rates (e.g., percent advanced) are one huge step removed from test scores. They indicate how many students scored above a certain line. The choice of this line can be somewhat arbitrary, reflecting value judgments and, often, political considerations as to the definition of “proficient” or “advanced.” Without question, the rates are an accessible way to summarize the actual scale scores, which aren’t very meaningful to most people. But they are interpretations of scores, and severely limited ones at that.*

Rates can vary widely, using the exact same set of scores, depending on where the bar is set. In addition, all these rates tell you is whether students were above or below the designated line – not how far above it or below it they might be. Thus, the actual test scores of two groups of students might be very different even though they have the same proficiency rate, and scores and rates can move in opposite directions between years.

To mitigate the risk of misinterpretation, comparisons of proficiency rates (whether between schools/districts or over time) should be accompanied by comparisons of average scale scores whenever possible. At the very least, the two should not be conflated.**
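A short Python sketch may help show how rates and scores can diverge. Everything in it is an assumption made for illustration: the scale scores are invented, the cut score of 200 is arbitrary, and `proficiency_rate` is a hypothetical helper, not any state’s actual method.

```python
from statistics import mean

def proficiency_rate(scores, cut):
    """Share of students scoring at or above the cut score."""
    return sum(score >= cut for score in scores) / len(scores)

# Hypothetical scale scores for one school in two consecutive years.
year1 = [180, 190, 195, 205, 210, 260]
year2 = [150, 160, 201, 202, 203, 204]

CUT = 200  # arbitrary "proficient" line, chosen only for this example

for label, scores in (("year 1", year1), ("year 2", year2)):
    rate = proficiency_rate(scores, CUT)
    print(f"{label}: rate = {rate:.0%}, average score = {mean(scores):.1f}")

# The same year-1 scores produce very different rates under different cuts.
for cut in (190, 200, 210):
    print(f"cut {cut}: year-1 rate = {proficiency_rate(year1, cut):.0%}")
```

With these invented numbers, the proficiency rate rises from year 1 to year 2 while the average score falls, and moving the cut from 190 to 210 changes the year-1 rate from five in six students to two in six without any score changing.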

Schools with high average test scores are not necessarily “high-performing,” while schools with lower scores are not necessarily “low-performing.” As we all know, tests don’t measure the performance of schools. They measure (however imperfectly) the performance of students. One can of course use student performance to assess that of schools, but not with simple average scores.

Roughly speaking, you might define a high-performing school as one that provides high-quality instruction. Raw average test scores by themselves can’t tell you about that, since the scores also reflect starting points over which schools have no control, and you can’t separate the progress (school effect) from the starting points. For example, even the most effective school, providing the best instruction and generating large gains, might still have relatively low scores due to nothing more than the fact that the students it serves have low scores upon entry, and they only attend the school for a few years at most. Conversely, schools with very high scores might provide poor instruction, simply maintaining (or even decreasing) the already stellar performance levels of the students they serve.

We very clearly recognize this reality in how we evaluate teachers. We would never judge teachers’ performance based on how highly their students score at the end of the year, because some teachers’ students were higher-scoring than others’ at the beginning of the year.

Instead, to the degree that school (and teacher) effectiveness can be assessed using testing data, doing so requires growth measures, as these gauge (albeit imprecisely) whether students are making progress, independent of where they started out and other confounding factors. There’s a big difference between a high-performing school and a school that serves high-performing students; it’s important not to confuse them.
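The difference between score levels and growth can be sketched with two hypothetical schools. The entry and exit scores below are invented, and a simple average gain is only a crude stand-in for the growth measures discussed above, which typically adjust for more than the starting score.

```python
from statistics import mean

# Hypothetical (entry score, exit score) pairs for the same students.
school_a = [(150, 180), (160, 185), (155, 190)]  # low starting points, large gains
school_b = [(240, 238), (250, 250), (245, 246)]  # high starting points, nearly flat

def average_exit_score(students):
    """Average score at the end of the period (a 'level' measure)."""
    return mean([end for _, end in students])

def average_gain(students):
    """Average change for the same students (a crude 'growth' measure)."""
    return mean([end - start for start, end in students])

for name, students in (("School A", school_a), ("School B", school_b)):
    print(f"{name}: average exit score = {average_exit_score(students):.1f}, "
          f"average gain = {average_gain(students):.1f}")
```

In this made-up case School B has much higher average scores, yet its students barely move, while School A’s students gain 30 points on average. Ranking the two by average score alone would label the wrong one “high-performing.”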


* Although this doesn’t affect the point about the distinction between scores and rates, it’s fair to argue that scale scores also reflect value judgments and interpretations, as the process by which they are calculated is laden with assumptions – e.g., about the comparability of content on different tests.

** Average scores, of course, also have their strengths and weaknesses. Like all summary statistics, they hide a lot of the variation. And, unlike rates, they don’t provide much indication as to whether the score is “high” or “low” by some absolute standard (thus making them very difficult to interpret), and they are usually not comparable between grades. But they are a better measure of the performance of the “typical student,” and as such are critical for a more complete portrayal of testing results, especially viewed over time.




18 responses to “Three Important Distinctions In How We Talk About Test Scores (Matt DiCarlo)”

  1. Iaviator

    Thoughtful and well-written commentary.

    Our policy makers stress the need for math and science yet continue to abuse the numbers we attach to tests in spite of efforts to suggest otherwise.

    Seems to me the situation has gotten worse in the last 20 years due in part to a plague of economists who decided to use education as their play pen. Why can’t economists be content with messing up the economy rather than inflicting their statistical voodoo on us?

  2. Pingback: Three Important Distinctions In How We Talk About Test Scores (Matt DiCarlo) | CCSS News Curated by Core2Class |

  3. Pingback: Three Important Distinctions In How We Talk About Test Scores Matt DiCarlo | Larry Cuban on School Reform and Classroom Practice « The Sharing Tree

  4. Jeff Bowen

    Very clear and helpful. I suggest that additional clarifications might include the meanings of “teaching to the test”, grading, and the distinction between testing and assessment. One of the more damaging and deceptive measures of “progress” or “achievement” is the normal curve.

  5. I’m not so sure the euphemisms “achievement” and “student learning” are as harmless as you suggest, especially if their use is tied to test scores as almost every evaluative term around learning is. I would argue that few parents think very critically around test results in terms of how they are “imperfect measures of learning” and all the implications that carries with it. In fact, most policy makers now in the process of changing laws and policy around school “reform” are counting on a simplistic connection between scores and achievement.

  6. As Will Richardson’s comment suggests, there is a gulf between what most parents actively interested in their children’s education think about testing, and what most policy makers (and educators politically influenced by them) think.

    Competition is fundamental to effective “schooling” (note I didn’t use “learning”) and far more parents are both comfortable with this and approving of it, than policy makers. The latter are, in fact, often ideologically set on undermining it.

    A new book by Ian Robertson, Professor of Psychology at Trinity College, Dublin, “The Winner Effect,” looks at how winning affects us positively and brings with it more success. Good schools and parents have always known this and build it into their family and school cultures meaningfully. Poor teachers understand this but turn it into vacuous praise in the classroom. That very weakness found its way into the exam system where every possible manoeuvre has been employed in the last few decades to get as many students as possible to win. This is the insidious “all must have prizes” idea, which is now being unpicked by the current administration here in the UK, who are overhauling the entire school exam system in search of more rigour.

    And one sad reminder. There are very many parents who are not at all actively interested in their children’s education, before, when or after they attend school.

    • larrycuban

      Again, Joe, thanks for commenting. The Michael Gove speech you linked to illustrates well the thrust of school reform in UK. Polls in the U.S. show that parents–in general–support national standards, testing, and accountability regulations.

  7. Pingback: Three Important Distinctions In How We Talk About Test Scores (Matt DiCarlo) | Ripples |

  8. Cal

    “There are very many parents who are not at all actively interested in their children’s education, before, when or after they attend school.”

    Ira Lit (Stanford) examined a voluntary desegregation program and its effects on the children in Bus Kids. When I read this sentence above I remembered something he wrote about the low income parents who signed their kids up for the “Canford” program in “Arbor Town”:

    “Even though the parents may not be making an intricately considered choice (to participate in the program), it is an active one. The process of enrolling in the Canford Program is not simple. Parents need to fill out paperwork in a timely manner, attend informational meetings, and follow through with school district procedures. These are parents who are clearly committed to, motivated about, and actively participating in the educational experiences of their children. This point is particularly relevant in light of some of the Arbor Town teachers’ comments regarding parental support and school participation. Canford parents are often not as actively and directly involved at the school as their Arbor Town counterparts, and some teachers read this fact as an indication that the parents are disinterested in their children’s education. This stereotype is disproved by the process through which the parents enrolled their children in the Canford Program in the first place.”

    I’ve always thought parental involvement to be an overrated aspect of education myself, but whenever I read or hear someone talk about how the parents “don’t care”, I remember this passage. And inevitably, when I look closer at my own students, the ones whose parents I never seem to hear from or whose kids play truant or are wild in class, I find that the parents do care, even if they aren’t handling the situation in the way I’d prefer.

    Not that this has much to do with the larger topic of measuring achievement or student learning, so apologies for being off-topic.

    I thought the distinctions discussed were important and well-defined.

    • larrycuban

      Thanks, Cal, for the comment on parents.

    • You are exactly right in arguing that the parents do care. The problem is much more complex, having to do both with economic and social stressors (long hours, health problems, family issues) as well as parent levels of human capital.

      The latter is really complex, and we usually use “education level” as a socio-economic proxy for it. But it is basically the ability of a parent to provide a cognitively stimulating environment to the child, in the form of conversations and activities that involve higher-order thinking. It is also an emphasis on this type of thinking as a general, normative value. Lareau has done a lot of research on how this is a function of class. When I taught kindergarten in a poor, immigrant population, the parents all cared enormously. Yet their parenting style and ability had mostly not prepared the child very well for the academic rigor of the classroom environment.

      That said, we also can’t discount the effects of economic segregation. Poor communities have a large net deficit of societal and human capital. For instance, a family could be doing everything right, but be forced to work long hours, and leave a child in the care of elder siblings in an apartment complex where the rate of family dysfunction is going to be higher, and thus peer norms will be pulled down.

      Finally, there are certainly higher rates in poor communities of severely damaged and dysfunctional families. I currently teach at a continuation school where the parents are often as troubled as the students – many involved in gang activity themselves. I’ve heard plenty of horror stories of drug abuse, fighting, incarceration, etc. among parents and relatives. These too, of course, have origins in SES disadvantage, but we must not kid ourselves that there aren’t a lot of really screwed up families in poor communities.

      • larrycuban

        Thanks, Eli, for parsing what “parental caring” means and the correlates that affect how the caring occurs both at home and at school.

  9. I absolutely acknowledge what Cal and Eli have said about some socially disadvantaged parents caring. One of my key complaints about so much political interference in education, is the promulgation of the Great Dickensian Lie…that poverty inevitably undermines moral behaviour. So I wouldn’t for a moment accept that dysfunctional families are in some inevitable way, linked to class. I’ve met plenty of dysfunctionality at both ends of the social and economic spectrum.

    I just think we do ourselves no favours by ignoring the reality that there are families whose approach to schooling is a major issue for the child and their chances of learning. A comment by the head teacher of a UK school I’ve worked with should illustrate what I mean. She told me she had designed the entire school day to be as long as possible, including breakfast clubs and after-school clubs, specifically with the goal of keeping the pupils out of their homes for as long as possible. Her ambition was to create a boarding school because that was the best chance for them.

    And a less serious, but nonetheless apposite example. It’s not unusual (in the UK at least) to find schools organise their parents’ evenings around the TV soccer schedule. That’s the only way they can expect a turnout.

    • larrycuban

      In the U.S., the KIPP model is close to what your head teacher said about the extended school day. Different beliefs about the effects of poverty drive different models of schooling when it comes to parents and, in some instances, parenting where schools hold classes–at both ends of the socioeconomic spectrum.
