Research on stuttering type detection based on YOLOv5

CHENG Zhen, JIA Jia-min, JIANG Zuo, WANG Xin

Journal of Yunnan University of Nationalities (Natural Sciences Edition), 2025, Vol. 34, Issue (01): 84-92. DOI: 10.3969/j.issn.1672-8513.2025.01.011


Abstract

The speech efficiency score quantifies the severity of stuttering, and computing it requires the times at which stuttering occurs. However, current research can only determine whether a speech segment contains stuttering; it cannot localize the stuttering in time, which hinders the assessment of severity. To address the fact that current deep-learning methods for stuttering type detection cannot visually locate the target, this paper first converts speech into a spectrogram using the short-time Fourier transform, then annotates the stuttering types on the spectrogram, and finally uses YOLOv5 to detect them. Within the basic YOLOv5 framework, four models of different depth and width (YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x) are tested for classifying and localizing stuttering types, and an efficient channel attention (ECA) mechanism and the CIoU bounding-box loss function are introduced into the best-performing model to improve on the baseline. Experimental results show that the improved YOLOv5l model achieves a significantly lower training loss, with accuracy, recall, and mAP_0.5 increased by 1.2, 0.6, and 0.4 percentage points respectively, and that it reduces the missed detections of the original model.
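
To make the bounding-box loss named in the abstract concrete, the following is a minimal sketch of the standard CIoU loss for two axis-aligned boxes. It is not the authors' code: the function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions made for illustration, and YOLOv5 itself computes this quantity on batched tensors rather than on single tuples.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """Return 1 - CIoU for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; the box layout is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Plain IoU: overlap area divided by union area.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = wa * ha + wb * hb - inter + eps
    iou = inter / union

    # Distance term: squared distance between box centres, normalised by
    # the squared diagonal of the smallest box enclosing both.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = (((ax1 + ax2) - (bx1 + bx2)) ** 2
            + ((ay1 + ay2) - (by1 + by2)) ** 2) / 4.0

    # Aspect-ratio term: penalises a mismatch in width/height ratio.
    v = (4.0 / math.pi ** 2) * (
        math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)


if __name__ == "__main__":
    # Two boxes of equal size, one shifted horizontally by half its width.
    print(ciou_loss((0, 0, 2, 2), (1, 0, 3, 2)))
```

On a spectrogram image the horizontal box coordinates would correspond to time, so the centre-distance term penalises a predicted box that sits at the wrong moment in the recording even when it overlaps the labelled stuttering event only partially or not at all.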

Key words

YOLOv5 / stuttering recognition / spectrogram / target detection

Cite this article

CHENG Zhen, JIA Jia-min, JIANG Zuo, et al. Research on stuttering type detection based on YOLOv5. Journal of Yunnan University of Nationalities (Natural Sciences Edition), 2025, 34(01): 84-92. https://doi.org/10.3969/j.issn.1672-8513.2025.01.011

