Speech Technology Has a Place in Ed Tech and English Language Learning
Speaking skills are a critical component of English-language proficiency and the academic success of non-native speakers in U.S. public schools. Yet challenges across K-12 education, like teacher and resource shortages, make it difficult to provide English learners with the support they need to improve their speaking proficiency.
A February report from the U.S. Department of Education indicated that there are an estimated 4.9 million children in U.S. public schools learning the English language, an increase of more than 1 million students since 2000. The growth of this population underscores the importance of using high-quality, research-backed, and rigorous education technology to support learners of English. Recent, dramatic improvements in the performance of speech technologies, such as automatic speech recognition, pronunciation error detection, and speech scoring, enable the development and optimization of effective digital learning tools to help this growing population improve their English speaking skills.
The COVID-19 pandemic further challenged education, but a surprising silver lining emerged when the shift to remote learning became, in effect, an accelerated pilot test for a range of emerging ed tech tools. Given that children learning English face additional challenges with remote learning, the release of promising technology that supports the development of communication skills is critically important and timely. These novel digital learning offerings include several new applications that focus on developing a wide range of skills, including speaking. However, the rapid deployment of these solutions sparks questions about their development and efficacy, such as: Were these tools previously user-tested? Were they developed based on research-backed principles? Is there evidence that the speech processing tools provide a valid method for improving speaking skills?
Questions about the validity and efficacy of novel ed tech tools for improving English-speaking proficiency might be difficult to answer definitively, but several applications that currently exist demonstrate the promise of this approach. Advancements in speech technology for education have developed into effective digital solutions that are already in use. These solutions enable learners to choose topics of interest, record spoken responses of 30 seconds or more, and then within seconds receive personalized feedback based on their responses, using their mobile devices at their convenience. The successful marriage of speech assessment technology and actionable feedback allows learners to gauge and improve their skills anytime, anywhere.
Speech-enabled language learning technology allows users to practice their speaking skills in lower-stress environments, which can help boost their confidence without the anxiety brought on by making mistakes in the presence of others. Applications that provide automated feedback enable learners to see how they can improve their speaking proficiency, even when teachers do not have the bandwidth to provide direct support. The feedback can offer detailed information about a range of aspects of English-speaking proficiency. For example, the application can provide a transcript of learners' spoken responses using automatic speech recognition and highlight specific portions of the responses that could be improved, such as unnaturally long pauses, pronunciation errors, grammar errors, issues with vocabulary collocations (e.g., saying "strong computer" instead of "powerful computer"), overuse of specific words, and other elements. Automated speech scoring technology can provide a numerical score for learners' responses, which can be compared against a reference population of English learners at different levels to show how individual learners are progressing toward their language learning goals.
Speech-enabled language learning applications can also be adaptive and personalized to the specific needs of individual learners. Using information about their strengths and weaknesses based on previous responses, systems can immediately provide new speaking activities, examples, and instructional materials that can help learners contextualize the feedback and improve more quickly. For example, in addition to learning which aspects need work (and where users excelled), learners can also see relevant content for the topics they chose, including vocabulary examples, potential ideas to include in future responses, and model responses to the same topics from both native and non-native English speakers.
The number of English learners is likely to continue to grow across the country, and the need for effective ed tech solutions that help those learners improve their speaking skills is critical. Although the rapid shift to remote learning brought on by the COVID-19 pandemic has accelerated the deployment of novel solutions, it remains important to collect evidence of the effectiveness and efficacy of these tools.
Speech-enabled English learning applications, especially those that leverage recent advancements in speech technology, have the potential to give English learners targeted, personalized, and actionable feedback on their English proficiency. As solutions continue to be adopted and implemented more widely, researchers, product developers, policymakers, and educators need to collect the evidence that demonstrates which approaches are most beneficial for helping English learners improve their skills as efficiently and effectively as possible.
G. Tanner Jackson is director of the Research Foundry in Research and Development at ETS. Keelan Evanini is director of speech research in Research and Development at ETS.