Assessment of Developmental Language Disorder

Before beginning:

1) Update your reporting information with us. This is required for reporting ASHA CEUs, but only needs to be done once a year:

2) This is an online, text-based course, with printable and audio access options. Select alternate access (optional) if you wish:

Also, note that:

  • You must take a quiz and survey at the end. These are required for reporting ASHA CEUs. Buttons are at the bottom of this page.

  • Also, this course contains links out to other websites. The links are provided for deeper reading, but are not required to fully understand and complete this course. If the links are distracting to you, don’t click; if you like the extra information, enjoy!


It’s one of the most fundamental questions we ask in our practices: which kids (out of this class, this school, this group coming through your clinic doors) have a language disorder? Which kids should be receiving our services? (Similar questions, but not always the same answer…) Straightforward questions, but given diverse populations, imperfect assessments, and restrictive policies from the powers-that-be, answering them can be confusing. And considering the implications of getting the answers wrong (withholding needed services, or kids getting wrongly labeled with a disability), it’s super important for us to understand the research. In this course, we’ve put together some of our favorite research reviews on assessment for language disorders. The focus is on published language tests and screeners, since that’s what a lot of us are relying on. For research specific to language sampling (the gold standard for assessment!), check out our course devoted to that subject.

The first review discusses the shift to the Developmental Language Disorder (DLD) term, away from Specific Language Impairment or other terms you might have learned or used in the past. Another four reviews discuss possible methods for screening school-aged children for language disorders, including some ideas that would save tons of time (woo!), like screening in groups or via teacher questionnaires.

Next, we share some research that should be required reading for anyone using standardized tests to diagnose language disorders, so, probably 95% of pediatric SLPs? We know that we have to consider diagnostic accuracy (AKA sensitivity and specificity), the population in the norming sample, and the other psychometrics of tests (reliability and validity) before we feel confident about them. These reviews dive into some of these issues, and show us why predetermined cutoff scores and cookie-cutter assessment processes can cause trouble.

Beyond general issues with standardized tests, we have to give special consideration when planning and analyzing evaluations for kids who speak non-mainstream dialects or are dual language learners*. The last two reviews dive into evaluating kids from those populations.

*This particular group is the focus of so much great research that we devoted a whole separate course to it, but you’ll get a taste here.


So, it's "developmental language disorder" now?

Typically, when we pull articles to review, we cover them one at a time. However, this topic ended up requiring a bit more than that. The initial paper of interest is a consensus study, which was a summary of recommendations from a panel of clinician and scientist experts on child language disorders (primarily from the U.K., but with representation from several English-speaking countries). The group convened to bring order to the diagnosis and description of child language disorders. Various points are covered in that article, but in this post we'll focus on terminology only, because it has been such a hot topic over the past few years. So, we ask:

For children with language disorders not associated with other conditions (e.g. autism, hearing loss), what is the term used for their language disorder?

In the consensus paper, the terms "Language Disorder" and “Developmental Language Disorder” are used. The recent DSM-5 uses “Language Disorder” as well. Dr. Susan Ebbels wrote a summary statement on the consensus here, indicating that for kids without co-existing conditions, current best practice is to use “Developmental Language Disorder” (big movement on Twitter these days to normalize this term, using #devlangdis). 

When I first noticed people using #devlangdis, it made sense why we’d want consistent terminology. After all, there is one term for autism, one term for dyslexia. Inconsistency in the terms used to label developmental language disorder interferes with our ability to advocate for and treat these children. However, I couldn’t quite figure out why we should use *this* term over others. I’m pretty accustomed to using "Specific Language Impairment," and wasn’t particularly interested in parting with it (and I’m not alone in that; see here). It wasn’t until I read this commentary that I began to understand and get behind #devlangdis (the commentary kicks off a series of papers that provide a more in-depth rationale). The core arguments are these: a) consistency in terminology is important for action, and b) rationales for using other terms, like “Specific Language Impairment”, are problematic (e.g. SLI implies cognitive exclusionary criteria).

Still not convinced? Try these exercises, and you’ll at least feel the frustration that has motivated the call for consistency:

  • Pretend you’re a parent searching for information on your child’s language disorder. You may start with Wikipedia (spoiler alert: it’s a hot mess). Wikipedia isn't exactly a trustworthy resource, but it does tend to represent knowledge trends. And, as you'll see, there are multiple pages representing language disorders, each using different terminology (e.g. here & here) and pointing to reputable resources that only further confuse the reader. How on earth are parents, students, and non-SLPs supposed to make sense of anything if the same disorder is referred to with several different terms?

  • Now try visiting ASHA's website (!). You’ll find that even ASHA needs some editing. They’re pretty consistent with using “Language Disorder”, but it's still a bit confusing when they have one page labeled “Spoken Language Disorders” and another page labeled “Preschool Language Disorders.” And the former page presents language disorder and SLI as though they’re entirely distinct diagnoses, confusing those who haven’t read up on the topic.

So, time for TISLP to start translating "language impairment" and SLI to DLD (and specifying differences in study samples, as needed)? Alright, then: #devlangdis 

Bishop, D.V.M., Snowling, M.J., Thompson, P.A., Greenhalgh, T., & CATALISE consortium. (2016). CATALISE: A Multinational and Multidisciplinary Delphi Consensus Study. Identifying Language Impairments in Children. PLOS ONE. doi: 10.1371/journal.pone.0168066.

What test do you want 30% of kindergarteners to fail? A language screener

Did you ever add a child to your caseload and think, “Why haven’t I seen this kid sooner?!” You’re not alone. Underidentification of developmental language disorder in young children is a major issue. So, how can we deal with this? One way is to identify good screening tools. Previous research shows that effective language screeners should result in a failure rate close to 30%, meaning that 30% of the children don’t pass, and you’ll capture the children most likely to have a language disorder.

The authors of this study found that probing for past-tense grammar was an effective way to screen for language disorder in kindergarten students. Specifically, they gave a large group of kindergarten students a screener of grammatical tense marking, the Rice Wexler Test of Early Grammatical Impairment (TEGI) Screening Test, which included past tense and third-person singular probes. Only the past-tense probes resulted in a failure rate close to 30%, suggesting their potential as an effective screening tool. If children* fail past-tense probes, this is a red flag and tells us that close monitoring or a formal evaluation may be the next appropriate step.
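To make the “close to 30%” idea concrete, here is a minimal Python sketch that computes a failure rate for each probe. The pass/fail results are made up for a toy class of 10 children (they are not data from Weiler et al.), and the plus-or-minus 5-point tolerance around 30% is our own assumption; the study doesn't define “close” that way.

```python
# Toy results for 10 kindergarteners (invented, NOT study data). True = failed the probe.
past_tense_failed = [True, False, False, True, False, False, True, False, False, False]
third_person_failed = [True, False, False, False, False, False, False, False, False, False]

# Target failure rate from the text; the tolerance band is an assumption for illustration.
TARGET, TOLERANCE = 0.30, 0.05


def failure_rate(results):
    """Proportion of children who did not pass a screening probe."""
    return sum(results) / len(results)


for name, results in [("past tense", past_tense_failed), ("third person", third_person_failed)]:
    rate = failure_rate(results)
    near_target = abs(rate - TARGET) <= TOLERANCE
    print(f"{name}: {rate:.0%} failed; near 30% target: {near_target}")
```

With these toy numbers, the past-tense probe lands at 30% and the third-person probe at 10%, which mirrors the pattern described above. A real screening program would compute this over the whole cohort.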

The students were also screened for nonverbal intelligence, articulation, and emergent literacy skills. Interestingly, the children who failed the past-tense probe often had age-appropriate skills in these areas. What does this tell us? We can’t rely on screeners of related skills to identify children at risk for language disorder—we have to screen oral language directly. If we don’t, we may miss kids who fly under the radar due to their relatively stronger articulation or literacy abilities.

Want to know the best part? The TEGI Screening Test is FREE and available here!

*One very important note: the TEGI is only valid for children who speak Standard (Mainstream) American English. Students who speak African American English or Spanish-influenced English should not be screened with this tool. Check out this review for an alternative.

Weiler, B., Schuele, C. M., Feldman, J. I., & Krimm, H. (2018). A multiyear population-based study of kindergarten language screening failure rates using the Rice Wexler Test of Early Grammatical Impairment. Language, Speech, and Hearing Services in Schools. doi: 10.1044/2017_LSHSS-17-0071.

Teacher ratings as a language screening for dialect speakers

In the last review, we shared research on a potentially valid tool to screen Mainstream English-speaking kindergarteners for language disorders. But what about our kiddos who speak other dialects of English, like African American English (AAE) or Southern White English (SWE)? In this study, researchers gave a group of AAE- and SWE-speaking kindergarteners a handful of language and literacy screeners, to see which one(s) could best identify possible language disorders, while avoiding “dialect effects.”

Their most successful screener (and TISLP’s winner for best acronym of the month) was the TROLL, or Teacher Rating of Oral Language and Literacy—available here for free. And yes, that’s a teacher questionnaire, rather than another individually-administered assessment for our students who spend so much time testing already. Importantly, the teachers completed the ratings at the end of the kindergarten year, not the beginning, so they had time to really get to know the students and their abilities.

The researchers calculated a new cut score of 89 for this population, since the TROLL itself only suggests cut scores through age 5. This resulted in a sensitivity of 77% for identifying language disorders. Now, 77% isn’t really high enough (we want a minimum of 80% for a good screener), but it may be a starting place until better tools come our way.
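Here is a minimal Python sketch of what “applying a cut score” and “sensitivity” mean in practice. The eight children and their scores are invented (not from Gregory & Oetting), and two details are our assumptions: that lower TROLL totals signal risk, and that a child is flagged at or below 89.

```python
# Toy data (invented, NOT study data): (TROLL total, has a language disorder?)
children = [
    (72, True), (85, True), (88, True), (95, True),
    (90, False), (104, False), (110, False), (86, False),
]

CUT_SCORE = 89  # the study's cut score; flagging "at or below" is our assumption


def flagged(score, cut=CUT_SCORE):
    """Lower teacher ratings signal risk, so scores at or below the cut are flagged."""
    return score <= cut


def sensitivity(data):
    """Share of children WITH a disorder whom the screener flags."""
    scores = [score for score, has_disorder in data if has_disorder]
    return sum(flagged(s) for s in scores) / len(scores)


def specificity(data):
    """Share of children WITHOUT a disorder whom the screener passes."""
    scores = [score for score, has_disorder in data if not has_disorder]
    return sum(not flagged(s) for s in scores) / len(scores)


print(f"sensitivity = {sensitivity(children):.0%}")  # 75%
print(f"specificity = {specificity(children):.0%}")  # 75%
```

Both toy values fall short of the 80% minimum, the same kind of gap that makes the study's 77% a “starting place” rather than a finished tool.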

Gregory, K. D., & Oetting, J. B. (2018). Classification Accuracy of Teacher Ratings When Screening Nonmainstream English-Speaking Kindergartners for Language Impairment in the Rural South. Language, Speech, and Hearing Services in Schools. doi: 10.1044/2017_LSHSS-17-0045.


Using group screening to find students at risk of DLD and dyslexia

If you work in a school that uses a response to intervention (RtI) framework, you can probably relate to the balancing act associated with screening: you want to use tools that accurately identify students needing additional assessment, but that also make good use of your time and are relatively easy to administer.

What if you could screen a whole class at the same time?

The authors of this study administered two screeners to groups of second graders:

  • The Test of Silent Word Reading Fluency (TOSWRF), to screen for word reading difficulties

  • The Listening Comprehension subtest of the Group Reading Assessment and Diagnostic Evaluation (GRADE LC), to screen for developmental language disorder (DLD)**

The researchers analyzed 381 students’ performance on the screeners as well as additional, individual standardized testing (CELF-4, the Word Identification and Word Attack subtests of the WRMT-III, and the TONI-4). The screeners, in combination, could reliably classify children as being at risk for (a) language disorder, (b) dyslexia, or (c) both, as determined by their scores on the individual assessments. Accuracy was somewhat higher for predicting risk for dyslexia vs. language disorder, which makes some intuitive sense, because the screeners chosen were both geared toward reading. Interestingly, only about a third of the parents of the identified children had reported concerns about their child’s language or reading abilities. We can’t rely on individual referrals to catch everyone!
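As a rough illustration of how two group screeners can sort a class into follow-up categories, here is a minimal Python sketch. The cutoffs, the one-to-one pairing of each screener with one kind of risk, and the student scores are all hypothetical simplifications, not the classification method Adlof et al. actually used. “At risk” here only means “give individual testing,” not a diagnosis.

```python
# Hypothetical cutoffs on a standard-score scale; NOT the criteria Adlof et al. used.
TOSWRF_CUTOFF = 85    # silent word reading fluency -> word reading (dyslexia) risk
GRADE_LC_CUTOFF = 85  # listening comprehension -> language (DLD) risk


def risk_category(toswrf, grade_lc):
    """Combine two group-screener scores into one of four follow-up categories."""
    reading_risk = toswrf < TOSWRF_CUTOFF
    language_risk = grade_lc < GRADE_LC_CUTOFF
    if reading_risk and language_risk:
        return "both"
    if language_risk:
        return "language disorder"
    if reading_risk:
        return "dyslexia"
    return "not at risk"


# One row per student: (name, TOSWRF score, GRADE LC score), all invented.
class_scores = [("A", 92, 78), ("B", 80, 95), ("C", 79, 81), ("D", 101, 99)]
for name, toswrf, grade_lc in class_scores:
    print(name, risk_category(toswrf, grade_lc))
```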

Although the efficiency of screening groups of students is certainly appealing, it is important to remember we don’t yet know what results the TOSWRF and GRADE LC screeners would yield with children in other age groups or populations. SLPs should be cautious and consider their individual contexts when applying these findings.

**Note: Most of the children in this study had Specific Language Impairment (SLI), a term for children with Developmental Language Disorder (DLD) and normal nonverbal intelligence. We use DLD throughout our website for consistency (read more here).

Adlof, S. M., Scoggins, J., Brazendale, A., Babb, S., & Petscher, Y. (2017). Identifying children at risk for language impairment or dyslexia with group-administered measures. Journal of Speech, Language, and Hearing Research. doi: 10.1044/2017_JSLHR-L-16-0473.


Can past tense accuracy during oral reading identify language disorder?

We know that difficulty with verb tense marking can be a good indicator of developmental language disorder (DLD). Errors in spontaneous speech can point to DLD in kids up through 1st grade; for older kids (10–12), you can look at their writing. But what about those middle grades? The authors of this study looked to past tense marking in oral reading. They recorded 21 children (ages 7–10) with DLD** and 30 children with typical language as they read passages from the Woodcock Reading Mastery Tests - 3rd Edition (WRMT-III) and scored their productions of regular past tense verbs for accuracy. They (helpfully!) provided the list of which verbs to score in each WRMT-III passage.

Results showed that children with DLD were less accurate than children with typical language in using regular past tense verbs when reading. Next, the authors looked at whether children’s past tense verb accuracy could correctly classify them as having DLD or typical language. Using 90% accuracy as the cutoff, sensitivity was 86% (so, pretty good for identifying children with DLD) and specificity was 73% (not as good at identifying children with typical language). Note that 80% sensitivity/specificity is generally considered the minimum acceptable level for a diagnostic test, so these numbers aren’t ideal. These results are preliminary, but if someone on the team is already giving the WRMT-III to a 2nd or 3rd grader with language concerns, you might consider adding the past tense accuracy score to your DLD assessment protocol.
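Here is a minimal Python sketch of the scoring arithmetic. Two things are assumptions: that “below 90%” (rather than “at or below”) counts as a flag, and the example reader's counts. The sensitivity and specificity counts are back-calculated from the reported percentages and group sizes (21 with DLD, 30 typical), assuming every child was included in the classification. 18 of 21 and 22 of 30 are the only whole-number counts that round to 86% and 73%.

```python
ACCURACY_CUTOFF = 0.90  # from the study; treating "below 90%" as a flag is our assumption


def past_tense_accuracy(correct, scored_verbs):
    """Proportion of scored regular past tense verbs read accurately."""
    return correct / scored_verbs


def flags_dld(accuracy, cutoff=ACCURACY_CUTOFF):
    return accuracy < cutoff


# A hypothetical reader: 17 of 20 scored verbs read accurately.
print(flags_dld(past_tense_accuracy(17, 20)))  # True (85% < 90%)

# Counts back-calculated from the reported percentages and group sizes.
true_positives, dld_group = 18, 21      # children with DLD correctly flagged
true_negatives, typical_group = 22, 30  # typical readers correctly passed
sensitivity = true_positives / dld_group
specificity = true_negatives / typical_group
print(f"sensitivity {sensitivity:.0%}, specificity {specificity:.0%}")  # 86%, 73%
```

Put another way, about 8 of the 30 typical readers would have been flagged, which is why specificity is the weaker number here.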

**Note: The children in this study had Specific Language Impairment (SLI), a term for children with Developmental Language Disorder (DLD) and normal nonverbal intelligence. We use DLD throughout our website for consistency (read more here).

Werfel, K. L., Hendricks, A. E., & Schuele, C. M. (2017). The potential of past tense marking in oral reading as a clinical marker of specific language impairment in school-age children. Journal of Speech, Language, and Hearing Research. doi: 10.1044/2017_JSLHR-L-17-0115.


The evidence-based way to interpret standardized language test scores

When you give a standardized language test to a child, how do you interpret his or her standard score to decide if a disorder is present, or if services should be offered? Is the magic number 85 (1 standard deviation below the mean)? Does your district or state give you some other number to use? Spaulding et al. school us all by explaining that it depends on the test. As in, there is no universal magic number. Instead, we need more information to determine an appropriate cutoff score for each test. Using the wrong number has serious implications: “If the cutoff criteria do not match the test selected, typically developing children may be misdiagnosed as language impaired, and children with language impairments [language disorder] may go undiagnosed.”

The authors walk us through how to find and interpret values for sensitivity and specificity at a specific cutoff score listed in the test’s manual or in a research article. Remember, sensitivity tells us the percentage of children who have language disorder who were correctly identified by the test as having language disorder. Specificity tells us the percentage of children who are typically developing who were correctly identified by the test as typically developing. Ideally, both should be 80% or higher. If a test lists sensitivity and specificity values for multiple cutoff scores, we use the one that balances both. If sensitivity and specificity aren’t listed, we can look at the mean group difference between the groups with and without language disorder, but this evidence is not as strong.
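Here is a minimal Python sketch of “picking the cutoff that balances both” from a manual's table. The table values are invented. We also read “balances” as “the cutoff whose weaker value (the lower of sensitivity and specificity) is highest.” That reading is our assumption. Other rules, such as maximizing sensitivity + specificity - 1 (Youden's index), are also used and can pick a different row.

```python
# Hypothetical manual table: cutoff score -> (sensitivity, specificity). Values are invented.
manual_table = {
    85: (0.95, 0.70),
    80: (0.88, 0.84),
    77: (0.82, 0.90),
    75: (0.74, 0.95),
}

MINIMUM = 0.80  # "ideally, both should be 80% or higher"


def balanced_cutoff(table):
    """Pick the cutoff whose weaker value (the lower of sens/spec) is highest."""
    return max(table, key=lambda cut: min(table[cut]))


def meets_minimum(sens, spec, minimum=MINIMUM):
    return sens >= minimum and spec >= minimum


best = balanced_cutoff(manual_table)
sens, spec = manual_table[best]
print(best, sens, spec, meets_minimum(sens, spec))  # 80 0.88 0.84 True
```

Notice that the familiar 85 has the best sensitivity in this made-up table but misses the 80% specificity floor. That is exactly the kind of mismatch Spaulding et al. warn about.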

The authors listed sensitivity and specificity values and cutoff scores for the tests they reviewed (see Table 4). The bad news is that many of the language tests reviewed by Spaulding et al. have been updated since this article was published. This means that we might have to dive into a test manual or the literature to find sensitivity and specificity values for a newer test like the CELF-5 (start with Table 10 in this recent article).

Even if sensitivity and specificity are sufficient, we still need to make sure that the test had adequate reliability and validity and that the normative sample of the test included children like the client we’re testing (considering things like dialect and SES). The authors say that evaluating these things is important, but isn’t worth our time if sensitivity and specificity are lacking (see Figure 6 for a decision tree).
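The order of checks described above can be sketched as a short Python function. This is our simplified paraphrase of the paragraph, not a reproduction of Spaulding et al.'s Figure 6. The function name and the yes/no inputs for reliability, validity, and norm fit are hypothetical stand-ins for what is really a judgment call.

```python
def evaluate_test(sens, spec, reliable_and_valid, norms_fit_child, minimum=0.80):
    """Simplified order of checks paraphrased from the text (not the paper's Figure 6).

    Returns (usable, reason).
    """
    # Step 1: diagnostic accuracy comes first; weak values end the review here.
    if sens < minimum or spec < minimum:
        return False, "sensitivity or specificity below 80%"
    # Step 2: adequate reliability and validity.
    if not reliable_and_valid:
        return False, "reliability or validity not adequate"
    # Step 3: the norming sample includes children like this client (dialect, SES, etc.).
    if not norms_fit_child:
        return False, "norming sample does not represent this child"
    return True, "reasonable to use as one piece of the evaluation"


print(evaluate_test(0.85, 0.90, reliable_and_valid=True, norms_fit_child=False))
```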

Overall, this article is a good reminder of the potential pitfalls of making diagnostic decisions based on standardized tests alone. It’s also a good one to have in your pocket if you want to challenge your state’s or district’s policy for standardized test cutoff scores.

Spaulding, T. J., Plante, E., & Farinella, K. A. (2006). Eligibility criteria for language impairment: Is the low end of normal always appropriate? Language, Speech, and Hearing Services in Schools. doi: 10.1044/0161-1461(2006/007).


Test norms problematic for high- and low-SES kids


In this study, the authors analyze high-SES and low-SES preschoolers’ language test scores and demonstrate a big difference in group scores on the Preschool Language Scale-4 and the Peabody Picture Vocabulary Test-3. Now, many of us already recognize that SES predicts child language performance somewhat. However, this isn’t just “somewhat.” And here’s the problem: our standardized tests provide a single set of norms, meant to represent an entire population (matching US census data), rather than showing us how particular groups (e.g., SES groups) perform. Taken together, this study and (many) previous ones “indicate that typical cutoff decisions (for speech–language services) using published norms will lead to identification of both (a) a large proportion of children from low SES homes, perhaps as great as 50%, and (b) only a very small proportion of higher SES children, perhaps as little as 1%.” Oof! That's going to mess with our clinical decision-making.
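To see how one fixed cutoff can produce such lopsided identification rates, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. For illustration only, it assumes that scores within each group are normally distributed with SD 15. The subgroup means (85 and 120) are hypothetical values we picked so the proportions land near the 50% and 1% quoted above. They are not numbers reported by Abel et al.

```python
from statistics import NormalDist

CUTOFF = 85  # a common fixed cutoff: 1 SD below the published mean of 100

# Published norms: one distribution meant to represent the whole population.
published = NormalDist(mu=100, sigma=15)

# Hypothetical subgroup distributions (illustrative means, NOT values from the study).
low_ses = NormalDist(mu=85, sigma=15)
high_ses = NormalDist(mu=120, sigma=15)

for label, dist in [("published norms", published), ("low SES", low_ses), ("high SES", high_ses)]:
    # cdf(x) gives the proportion of the group scoring below x.
    print(f"{label}: {dist.cdf(CUTOFF):.1%} score below {CUTOFF}")
```

The same cutoff flags about 16% of the norming population, half of the hypothetical low-SES group, and roughly 1% of the hypothetical high-SES group.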

So, what should we be doing? Many of us are well aware that test norms should be only one factor in eligibility decisions. However, our states and school districts are still using firm cutoffs from norm-referenced tests to make qualification decisions. This could put a disproportionate number of lower-SES children who do not have a language disorder in SLPs' therapy rooms, while under-qualifying higher-SES kids.

The authors call for test publishers to start providing us with sub-sample norms, so that a child’s age, grade, and SES could be considered when making peer comparisons. The authors suggest that while SLPs wait for test publishers to provide us with more useful normative data, our role is to simply make sure we understand the drawbacks of weighing test norms too heavily for certain groups of children. The authors state that there is an “… absolutely critical need for SLPs to consider family SES in the interpretation of child performance on norm-referenced measures of oral language when making eligibility decisions…” 

NOTE: This article would be great for SLP or SPED group reading. It's well-written, easy to read, short(ish), and has some fantastic discussion points, beyond what’s summarized above.

Abel, A.D., Schuele, C.M., Arndt, K.B., Lee, M.W., & Blankenship, K.G. (2016). Another look at the influence of maternal education on preschoolers' performance on two norm-referenced measures. Communication Disorders Quarterly. doi:10.1177/1525740116679886.


Reviewing the validity and reliability of comprehensive language assessments: Or, which test is best?


Many factors influence which tests SLPs use when we evaluate language skills. What tests do I have? What will capture the skill gaps that I see in this client? What test will show progress after intervention? What test do I not hate giving? Just kidding… Evidence-based practice requires us to ask also: which tests have been shown empirically to be good tests – meaning that they actually measure what we think they are measuring? This study looks at this last question by taking a systematic approach to finding and evaluating evidence of reliability and validity for 15 language tests. Importantly, the authors looked at evidence from peer-reviewed papers in addition to the stuff in the front of the test manuals. The tests they selected were all recent (20 years old at most), diagnostic, comprehensive spoken language assessments normed on monolingual English-speaking children between 4 and 12 years old. Check out Tables 5 and 6 for the full lowdown on what tests they included and excluded, respectively.**

Reliability and Validity

Bear with us for a brief journey back in time to your grad (or even undergrad) Assessment class. Imagine a hazy pink dream sequence and harp music if it helps. This study looked into six dimensions of reliability (how stable and consistent the test scores are) and validity (whether the test measures what it claims to be measuring). Let’s take a moment and remember what these actually mean:

  1. Internal consistency – Do you get similar answers to similar questions?

  2. Reliability – Can you repeat the test and get the same score?

  3. Measurement error – How much might the score you measure vary from the “true score?”

  4. Content validity – Is the test actually measuring all of the content it’s supposed to be? Think of a final exam covering the entire semester.

  5. Structural validity – Does the test's internal structure match the skill it claims to measure? Think of whether subtests meant to tap the same ability (e.g., receptive language) actually hang together in the data.

  6. Hypothesis testing – Can you make predictions based on some theory, and have them come out in the results of the test? Think of correlations between scores on two similar tests.
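Measurement error (item 3) is the easiest of these to make concrete. Here is a minimal Python sketch of the classical test theory formula SEM = SD × √(1 − reliability), with a confidence band around an observed score. The reliability of 0.90 and the observed score of 84 are hypothetical. Note also that some test manuals center the interval on an estimated true score rather than on the observed score, as we do here.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def confidence_interval(observed, sd, reliability, z=1.96):
    """Band around an observed score; z = 1.96 gives a ~95% interval."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical test: standard scores with SD 15 and reliability 0.90.
low, high = confidence_interval(observed=84, sd=15, reliability=0.90)
print(f"95% CI: {low:.1f} to {high:.1f}")  # 95% CI: 74.7 to 93.3
```

Even with fairly good reliability, a child scoring 84 (just under a common cutoff of 85) has a plausible range that reaches well above 85. This is one more reason not to treat a single score as exact.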

Check out Table 9 for a summary of the level of evidence the authors found in each area for the 15 targeted assessments. Because of issues with study methodologies, the authors found no compelling evidence of internal consistency, measurement error, or structural validity in ANY of the tests. Yikes. If there’s a test you give regularly, or one you have concerns about, it’s worth knowing specific strengths and weaknesses of that test.

So… which tests have the best evidence base?

“Whilst all assessments were identified as having notable limitations, four assessments: ALL, CELF-5, CELF:P-2, and PLS-5 were identified as currently having better evidence of reliability and validity. These four assessments are suggested for diagnostic use, provided they suit the purpose of the assessment process and are appropriate for the population being assessed.”

A few things to keep in mind

  • The authors are clear that, “…it should be noted that where evidence is reported as lacking, it does not mean that these assessments are not valid or reliable, but rather that further research is required to determine psychometric quality.”

  • As always, consider where the evidence is coming from. Most of the sources for reliability and validity data are the test manuals themselves. (And when the authors found independent sources of evidence, they didn’t always agree with the manuals.) The stuff in the manual is NOT peer reviewed, and you can only see it after you pony up for the test. This is not to say that it’s necessarily bad science, but we always want converging evidence from independent sources when possible.

  • ALL of the tests the authors looked at were found to have “limitations with regards to evidence of psychometric quality.” Meaning, there’s still a lot of work to be done. In the meantime, keep following best practices for evaluations. Don’t base a diagnosis or eligibility decision on a single test, and use other evaluation tools (language samples, dynamic assessment, interviews, RTI… all that good stuff) in addition to standardized testing.

**Important note: This review did NOT look at assessments published since 2014, such as the CASL-2 and the TILLS.

Denman, D., Speyer, R., Munro, N., Pearce, W.M., Chen, Y., & Cordier, R. (2017). Psychometric Properties of Language Assessments for Children Aged 4-12 Years: A Systematic Review. Frontiers in Psychology. doi: 10.3389/fpsyg.2017.01515.


Modification to standardized tests for speakers of nonmainstream dialects


The authors of this paper discuss how, when an SLP evaluates a young speaker of a nonmainstream American English dialect (NMAE), s/he is faced with two tasks: first, to determine if the child is a speaker of a nonmainstream dialect, and then to determine if that child does or does not have a language disorder.

Though the task may seem straightforward at first glance, it can be incredibly challenging. One major barrier is that children use NMAE variably. Conversational contexts are more likely to elicit NMAE use, and use can also change depending on the communication partner. Dialect use also changes with age; the authors state, “… the general trend is that use of NMAE features drops during the first few years of elementary school as students master code-switching strategies, and then increases during adolescence as students begin using NMAE dialect for more social reasons (N.P. Terry et al., 2010; Van Hofwegen & Wolfram, 2010).” On top of that variability, some forms that are ungrammatical in mainstream American English are grammatical in NMAE, which makes telling dialect apart from disorder even harder.

As part of the evaluation process, SLPs may choose to combine language sample analysis (LSA) with standardized testing. A common way to adjust a standardized test for the child’s dialect is a scoring modification: counting an item as “accurate” if it is accurate in the child’s dialect. This is in line with recommendations in test manuals, such as those for the CELF-4 and CELF-5.
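Here is a minimal Python sketch of what a scoring modification does mechanically. The three items are hypothetical, and so are the lists of accepted forms. The forms illustrate well-documented AAE features (third-person singular -s absence, was/were leveling, auxiliary “is” absence), but deciding which responses to accept is a clinical judgment, not something this code or any test manual settles automatically.

```python
# Hypothetical expressive items: a mainstream target plus forms accepted as dialect-appropriate.
# Which forms to accept is a clinical judgment; these are illustrations, not test-manual rules.
items = [
    {"target": "she walks", "dialect_ok": {"she walk"}},        # third-person singular -s absence
    {"target": "they were", "dialect_ok": {"they was"}},        # was/were leveling
    {"target": "he is running", "dialect_ok": {"he running"}},  # auxiliary "is" absence
]


def raw_score(responses, modified):
    """Count correct items, optionally also accepting dialect-appropriate forms."""
    total = 0
    for item, response in zip(items, responses):
        accepted = {item["target"]}
        if modified:
            accepted |= item["dialect_ok"]
        total += response in accepted
    return total


child = ["she walk", "they was", "he run"]
print(raw_score(child, modified=False), raw_score(child, modified=True))  # 0 2
```

“He run” still counts as an error under the modified scoring, because dropping -ing isn't on the accepted list. The catch the study points to is that accepting more forms raises scores for every child, including those who truly have a disorder.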

In this study, the researchers examined what happens when you try using scoring modifications on the CELF-4 with a sample of 299 2nd-grade students. They found that:

  • without scoring modifications, NMAE speakers were over-identified as having a language disorder

  • but with scoring modifications, over-identification was reduced, while under-identification of NMAE speakers who truly have a language disorder increased

Yikes. It’s well-known that using a standardized language assessment for a speaker of a nonmainstream dialect, when the test wasn’t designed with speakers of that dialect in mind, can provide inaccurate diagnostic results (see article for review). However, this study also provides clear data that the scoring modifications don’t exactly work well, either.

Currently, there isn’t a perfect solution. For now, it’s important for SLPs to simply understand the potential pitfalls they may encounter during assessment. The authors suggest that good options to add to the assessment protocol include: detailed case histories of the child’s abilities at both home and school, peer comparisons, LSA, and dynamic assessment. The authors acknowledge the huge need for more research on how to streamline this process, because even with some of the strategies that look promising (like dynamic assessment), we still don’t have adequate research to fully guide diagnostic decision-making.

Hendricks, A.E., & Adlof, S.M. (2017). Language Assessment With Children Who Speak Nonmainstream Dialects: Examining the Effects of Scoring Modifications in Norm-Referenced Assessment. Language, Speech, and Hearing Services in Schools. doi:10.1044/2017_LSHSS-16-0060.


Diagnosing developmental language disorder in English learners

Your Mission:

  • Accurately identify ELL children with language impairments

DANGER! Potential Hazards!

  • “Waiting and seeing” so long you lose the benefits of early intervention

  • Over-referral of typical English learners to Special Ed

Fortunately, there are a decent number of assessments around for Spanish-English bilinguals, but for the one million kids out there with less common home languages? Yeah, pretty much nothing. Until the day we have a Tagalog–English CELF, we have to improvise.

These researchers wondered whether they could discriminate typical vs. language-impaired ELL children based on a handful of English-only assessments and a parent questionnaire, which asked about development of the L1 and asked parents to compare their child to other kids they know. The kids were around kindergarten age, from immigrant families with a diverse assortment of home languages, and hadn’t had regular English exposure before age three.

They were able to reach over 90% diagnostic accuracy with a combination of their parent questionnaire (the most important factor by far), tests of nonword repetition and tense morphology (from the CTOPP and TEGI, respectively, which together made a smaller but still important contribution), and a narrative task (the ENNI, which was the least important factor). They also gave the PPVT, but that wasn’t helpful. This makes sense, because we already know it’s not good at diagnosing DLD in anybody.

Bottom Line: You can differentiate young ELL children with language disorder from their typically-developing ELL peers IF…

  1. You get good input from the parent on L1 development

  2. You test the skills that are known to be hard for ALL kids with DLD (like nonword repetition and tense morphology). Don’t rely on tests of single-word vocabulary.

  3. You compare ELL kids to one another, not the monolingual norming sample of most assessments. See if ELL norms are available for the tests you use!
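Point 3 is easy to see with a little arithmetic. Here is a minimal Python sketch that converts the same raw score into a standard score (mean 100, SD 15) against two different norm groups. Both sets of norms and the raw score are hypothetical; they are not taken from any real test manual.

```python
def standard_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a standard score (mean 100, SD 15) against a given norm group."""
    z = (raw - norm_mean) / norm_sd
    return 100 + 15 * z


# Hypothetical raw-score norms for the same task; NOT from any real test manual.
MONOLINGUAL = {"norm_mean": 40, "norm_sd": 6}
ELL_PEERS = {"norm_mean": 31, "norm_sd": 6}

raw = 30
vs_monolingual = standard_score(raw, **MONOLINGUAL)  # about 75
vs_ell = standard_score(raw, **ELL_PEERS)            # about 97.5
print(f"vs. monolingual norms: {vs_monolingual:.1f}; vs. ELL peers: {vs_ell:.1f}")
```

The same performance looks like a clear deficit against monolingual norms and solidly average against other English learners, which is why the comparison group matters so much.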

Paradis, J., Schneider, P., & Sorenson, T. (2013). Discriminating Children with Language Impairment Among English-Language Learners from Diverse First-Language Backgrounds. Journal of Speech, Language, and Hearing Research. doi: 10.1044/1092-4388(2012/12-0050).

Thank you for taking this course!

To earn continuing education credit, you must take this quiz (score 80% to pass; three attempts allowed):

Did you update your reporting info (top) and pass your quiz (above)? Then, congrats, you’re done! Please fill out a feedback survey:

Did you see a journal article you want (above), but don’t have access? Try our tips for finding free PDFs.

Need help? Contact us at