Yao, He, Chen, Zhu (2024) A Meta-Analysis of Second Language Phonetic Training
- Kathleen Brannen
- Jul 3, 2025
Updated: Feb 27
A synopsis of the article:
Phonetic training
This meta-analysis seeks to answer these questions:
What is the effect of L2 phonetic training?
What factors moderate the effectiveness of L2 phonetic training (educational level, proficiency, training approach, training stimuli, mode of delivery, phonetic sub-competence)?
The meta-analysis looked at studies that met the following criteria. The study had to:
be an empirical phonetic training study;
look at L2 segments, not suprasegmentals;
report on accuracy and/or response time of participants' perception and/or production;
look at second or foreign language phonetic training;
have participants without speech, language, or hearing impairments.
Based on these criteria, the meta-analysis included 65 studies.
Data analyses were performed using Comprehensive Meta-Analysis Version 3.0 (Biostat Inc.). Cohen's d was used as the effect size to compare treatment groups with control groups. To assess publication bias, Fail-Safe N and Trim and Fill analyses were conducted.
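To give a feel for what a Fail-Safe N analysis estimates, here is a minimal sketch. It assumes Rosenthal's classic formulation, which asks how many unpublished null-result studies would be needed to make the combined result non-significant; the summary above does not say which variant the authors ran. The function name and the z-scores are hypothetical and are not taken from the meta-analysis.

```python
def rosenthal_fail_safe_n(z_scores, z_crit=1.645):
    """Estimate the number of missing null-result studies needed to
    bring the combined result down to the critical z value.

    Assumes Rosenthal's formulation: N = (sum of z)^2 / z_crit^2 - k,
    where k is the number of included studies and z_crit = 1.645
    corresponds to a one-tailed alpha of .05.
    """
    k = len(z_scores)
    total_z = sum(z_scores)
    return (total_z ** 2) / (z_crit ** 2) - k


# Hypothetical z-scores for five studies (illustration only).
example_z = [2.1, 1.8, 2.5, 0.9, 3.0]
print(round(rosenthal_fail_safe_n(example_z), 1))
```

The larger the resulting number relative to the number of included studies, the less likely it is that unpublished null results would overturn the overall finding.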
Results
The effect size for groups with phonetic training versus groups without was large (Cohen's d = 0.762), indicating a large difference between the groups relative to the variability in the data.
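For readers unfamiliar with Cohen's d, the sketch below shows one common way to compute it: the difference between two group means divided by their pooled standard deviation. This is an illustration of the statistic, not the software's exact procedure, and the scores used are hypothetical.

```python
import math


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d for a treatment group versus a control group,
    using the pooled standard deviation of the two groups."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


# Hypothetical post-test accuracy scores (percent correct), illustration only.
d = cohens_d(mean_t=80.0, sd_t=10.0, n_t=30, mean_c=72.0, sd_c=10.0, n_c=30)
print(round(d, 3))
```

In this made-up example, the trained group scores 8 points higher than the control group, and both groups have a standard deviation of 10, so the difference is eight tenths of a standard deviation.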
Moderator analyses were conducted to explore potential factors influencing the effects of L2 phonetic training. The analyses found highly significant differences (p < .001) for educational level (university, language institute, high school, pre-middle school, unspecified), training approach (perceptual, production, combined), mode of delivery (auditory, visual, audiovisual), outcome measure (identification, discrimination, both identification and discrimination, subjective perception judgement, objective acoustic measurement), and phonetic sub-competence (perception, production). Significant differences (p < .05) were found for training stimuli (natural, synthetic, combined). Language proficiency (advanced, intermediate, novice, unspecified) was not a significant moderator.
Educational levels: The largest effects were observed at the high school level.
Training approach: Perceptual training yielded the largest effect size.
Mode of delivery: The audiovisual mode yielded the largest effect size.
Outcome measure: Identification tasks generated the largest effect size.
Phonetic sub-competence: Perception yielded the largest effect size.
Training stimuli: Synthetic stimuli yielded the largest effect size.
Generalization of phonetic training: Production training yielded a larger effect size than perceptual training.
Discussion
Phonetic training is helpful.
High school L2 learners performed better than university L2 learners. The authors suggest this may be due to the younger brain being more plastic. However, pre-middle school learners performed worse than high school learners. The authors suggest this may be due to task difficulties related to immature phonemic awareness.
Perceptual training transferred to production more than production training transferred to perception. This supports the hypothesis that perception is a precursor to production.
A combination of discrimination and identification tasks is more effective than each on its own.
Here are some questions I had after reading this article.
Do you have any questions or comments? Drop them below in the comments section.
What type of phonetic training was used in the studies used in this meta-analysis?
Some examples of phonetic training in studies examined in this meta-analysis:
*Bradlow, A. R., Akahane-Yamada, R., Pisoni, D. B., & Tohkura, Y. (1999). Training Japanese listeners to identify English /r/ and /l/: Long-term retention of learning in perception and production. Perception and Psychophysics, 61, 977–985. https://doi.org/10.3758/BF03206911
minimal-pair identification task
to encourage classification into broad phonetic categories rather than emphasizing the discrimination of fine-grained within-category acoustic differences
stimuli were naturally produced in a variety of phonetic environments
previous studies with synthetic stimuli not very successful in training /r/-/l/ distinction
stimuli were produced by multiple talkers (high-variability) of American English
*Carlet, A., & Cebrian, J. (2022). The roles of task, segment type, and attention in L2 perceptual training. Applied Psycholinguistics, 43(2), 271–299. https://doi.org/10.1017/S0142716421000515
Participants: Spanish & Catalan learners of English
Tasks:
High-variability phonetic training (HVPT) using
forced-choice identification task
AX discrimination task
Stimuli: CVC nonwords, with V = /æ ʌ ɪ iː ɜː/ and C = /p t k b d g/
e.g., vap, vup, vab, vub, deedge, teedge, vik, vig, parsh, barsh
*Earle, F. S., & Myers, E. B. (2015). Overnight consolidation promotes generalization across talkers in the identification of nonnative speech sounds. The Journal of the Acoustical Society of America, 137(1), EL91–EL97. https://doi.org/10.1121/1.4903918
Exploring the generalization of phonetic learning across talkers
Participants: Monolingual speakers of American English
Tasks:
Trained on Hindi dental-retroflex contrast /ɖɛ d̪ɛ ɖa d̪a/
forced-choice identification task
*Flege, J. E. (1995b). Two procedures for training a novel second language phonetic contrast. Applied Psycholinguistics, 16(4), 425–442. https://doi.org/10.1017/S0142716400066029
Exploring /t d/ in final position of English words
e.g., beat vs. bead
Participants: Mandarin learners of English
Tasks:
identification or same/different feedback training
*Georgiou, G. P. (2021a). Effects of phonetic training on the discrimination of second language sounds by learners with naturalistic access to the second language. Journal of Psycholinguistic Research, 50, 707–721. https://doi.org/10.1007/s10936-021-09774-3
Exploring the effect of HVPT on the discrimination of L2 vowel contrasts in a country where the L2 is dominant
Participants: Egyptian Arabic learners of Greek
Tasks:
identification task to categorize L2 vowels to the phonological categories of their L1
then an AXB task
stressed-unstressed /i e/, /o-u/
What is the difference between an identification task and a discrimination task?
An identification task requires the participant to classify a sound heard into a category stored in memory.
A discrimination task does not access stored (long-term) memory. It depends on short-term memory to make a comparison between two aural stimuli.
However, this dichotomy is not as clear-cut as it may seem. What happens when categories are not yet well-defined? Can long-term memory be accessed under certain conditions in a discrimination task?
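The difference between the two task types described above can also be seen in how responses are scored. The sketch below is a simplified illustration with hypothetical function names and made-up trials. In identification, each response is checked against a category label. In AX discrimination, the listener only judges whether two stimuli are the same or different. It assumes that "same" means the two stimuli belong to the same category, which is one common design but not the only one.

```python
def score_identification(trials):
    """Proportion correct for identification trials.

    Each trial is (presented_category, chosen_category): the listener
    hears one sound and picks a label for it.
    """
    correct = sum(1 for presented, chosen in trials if presented == chosen)
    return correct / len(trials)


def score_ax_discrimination(trials):
    """Proportion correct for AX (same/different) trials.

    Each trial is (category_a, category_x, response), where response is
    "same" or "different": the listener compares two sounds directly.
    """
    correct = sum(
        1 for a, x, response in trials
        if (response == "same") == (a == x)
    )
    return correct / len(trials)


# Hypothetical trials for an English /r/-/l/ contrast (illustration only).
identification_trials = [("r", "r"), ("l", "r"), ("l", "l"), ("r", "r")]
ax_trials = [("r", "l", "different"), ("r", "r", "same"),
             ("l", "r", "same"), ("l", "l", "same")]

print(score_identification(identification_trials))
print(score_ax_discrimination(ax_trials))
```

Note that the scoring code needs category labels in both cases, but only the identification task asks the listener to supply one.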
