Throwback (2006): The evidence-based way to interpret standardized language test scores

When you give a standardized language test to a child, how do you interpret his or her standard score to decide whether a disorder is present or whether services should be offered? Is the magic number 85 (1 standard deviation below the mean)? Does your district or state give you some other number to use? Back in 2006, Spaulding et al. schooled us all by explaining that it depends on the test: there is no universal magic number. Instead, we need more information to determine an appropriate cutoff score for each test. Using the wrong number has serious implications: “If the cutoff criteria do not match the test selected, typically developing children may be misdiagnosed as language impaired, and children with language impairments [language disorder] may go undiagnosed.”
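Why is 85 "1 standard deviation below the mean"? Many standardized language tests report standard scores with a mean of 100 and a standard deviation of 15, though that scale is an assumption here, so check each test's manual. The short Python sketch below converts a standard score to a z-score and an approximate percentile rank, assuming normally distributed scores. The function names are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Assumption: the test uses the common standard-score scale
# (mean = 100, SD = 15). Check the manual for the actual scale.
MEAN = 100
SD = 15

def z_score(standard_score: float, mean: float = MEAN, sd: float = SD) -> float:
    """How many standard deviations the score falls from the mean."""
    return (standard_score - mean) / sd

def percentile_rank(standard_score: float, mean: float = MEAN, sd: float = SD) -> float:
    """Percent of the normative sample expected to score at or below this
    score, assuming scores are normally distributed."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(standard_score)

for score in (85, 78, 70):
    print(score, round(z_score(score), 2), round(percentile_rank(score), 1))
```

For example, a score of 85 gives z = -1.0 and a percentile rank of about 15.9. Note that this arithmetic only tells you where a score sits in the normative distribution; it says nothing about whether that score separates children with and without language disorder, which is exactly the article's point.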


The authors walk us through how to find sensitivity and specificity values for a specific cutoff score, whether in the test’s manual or in a research article, and how to interpret them. Remember, sensitivity tells us the percentage of children who have language disorder who were correctly identified by the test as having language disorder. Specificity tells us the percentage of children who are typically developing who were correctly identified by the test as typically developing. Ideally, both should be 80% or higher. If a test lists sensitivity and specificity values for multiple cutoff scores, we use the cutoff that best balances the two. If sensitivity and specificity aren’t listed, we can look at the difference between the mean scores of children with and without language disorder, but this evidence is not as strong.
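To make these definitions concrete, here is a minimal Python sketch that computes sensitivity and specificity at a given cutoff and then picks a "balanced" cutoff from several candidates. Everything in it is hypothetical: the scores are made up (not data from any real test), it assumes a child is flagged when his or her score is at or below the cutoff, and "balance" is defined as maximizing the lower of the two values (ties broken by their sum). That rule is just one simple way to balance them, not necessarily the procedure Spaulding et al. used.

```python
# Hypothetical standard scores, for illustration only (not real test data).
disorder_scores = [72, 75, 78, 80, 81, 83, 84, 86, 88, 90]
typical_scores = [82, 88, 91, 93, 95, 97, 99, 102, 105, 110]

def sensitivity(disorder, cutoff):
    """Share of children WITH language disorder who score at or below the
    cutoff, i.e., who are correctly flagged by the test."""
    return sum(s <= cutoff for s in disorder) / len(disorder)

def specificity(typical, cutoff):
    """Share of typically developing children who score above the cutoff,
    i.e., who are correctly identified as typically developing."""
    return sum(s > cutoff for s in typical) / len(typical)

def balanced_cutoff(disorder, typical, candidates):
    """Choose the candidate whose weaker value (sensitivity or specificity)
    is highest; break ties by the larger sum of the two."""
    def key(c):
        sens, spec = sensitivity(disorder, c), specificity(typical, c)
        return (min(sens, spec), sens + spec)
    return max(candidates, key=key)

best = balanced_cutoff(disorder_scores, typical_scores, range(78, 91))
for cutoff in (85, best):
    print(cutoff,
          sensitivity(disorder_scores, cutoff),
          specificity(typical_scores, cutoff))
```

With these made-up numbers, a cutoff of 85 catches only 7 of the 10 children with language disorder (sensitivity 70%, below the 80% benchmark), while the balanced cutoff comes out at 90 (sensitivity 100%, specificity 80%). A different test, or a different sample, would give a different answer, which is why no single number works for every test.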

The authors listed sensitivity and specificity values and cutoff scores for the tests they reviewed (see Table 4). The bad news is that many of the language tests reviewed by Spaulding et al. have been updated since this article was published. This means that we might have to dive into a test manual or the literature to find sensitivity and specificity values for a newer test like the CELF-5 (start with Table 10 in this recent article). 

Even if sensitivity and specificity are sufficient, we still need to make sure that the test has adequate reliability and validity and that its normative sample included children like the client we’re testing (considering things like dialect and SES). The authors say that evaluating these things is important, but that it isn’t worth our time if sensitivity and specificity are lacking (see Figure 6 for a decision tree).
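The order of checks described above can be sketched as a small Python function. This is a simplified illustration of the sequence in this summary (diagnostic accuracy first, then reliability and validity, then the normative sample), not a reproduction of the authors’ Figure 6. The `TestEvidence` fields, the 0.80 threshold parameter, and the returned messages are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestEvidence:
    """Hypothetical summary of what a manual or article reports for one test."""
    sensitivity: Optional[float]  # proportion, e.g. 0.85; None if not reported
    specificity: Optional[float]
    adequate_reliability: bool
    adequate_validity: bool
    norms_include_client: bool  # e.g., client's dialect and SES represented

def review_test(ev: TestEvidence, threshold: float = 0.80) -> str:
    # Step 1: diagnostic accuracy comes first.
    if ev.sensitivity is None or ev.specificity is None:
        return "accuracy not reported: only weaker evidence (mean group differences) available"
    if ev.sensitivity < threshold or ev.specificity < threshold:
        return "accuracy too low: other checks not worth the time"
    # Step 2: only now check reliability and validity.
    if not (ev.adequate_reliability and ev.adequate_validity):
        return "reliability or validity inadequate"
    # Step 3: the normative sample should include children like this client.
    if not ev.norms_include_client:
        return "normative sample does not include children like this client"
    return "test is a reasonable candidate"

print(review_test(TestEvidence(0.70, 0.90, True, True, True)))
print(review_test(TestEvidence(0.86, 0.82, True, True, False)))
```

Putting the accuracy check first mirrors the authors’ point: a test with excellent reliability and well-matched norms still can’t support a diagnostic decision if it doesn’t separate children with and without language disorder.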

Overall, this article is a good reminder of the potential pitfalls of making diagnostic decisions based on standardized tests alone. It’s also a good one to have in your pocket if you want to challenge your state’s or district’s policy for standardized test cutoff scores.


Spaulding, T. J., Plante, E., & Farinella, K. A. (2006). Eligibility criteria for language impairment: Is the low end of normal always appropriate? Language, Speech, and Hearing Services in Schools, 37(1), 61–72. doi: 10.1044/0161-1461(2006/007).