Research on stuttering type detection based on YOLOv5
CHENG Zhen, JIA Jia-min, JIANG Zuo, WANG Xin
The language communication efficiency score is a method for quantifying the severity of stuttering, and it requires the times at which stuttering occurs. However, existing studies can only determine whether a speech segment contains stuttering; they cannot locate it accurately, which hinders the assessment of severity. To address the inability of current deep learning methods for stuttering type detection to visually locate the target, this paper first uses the short-time Fourier transform to convert speech into a spectrogram, then annotates the stuttering types, and finally uses YOLOv5 to detect them. Within the basic YOLOv5 framework, four models of different depth and width (YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x) are tried for classifying and locating stuttering types, and an efficient attention mechanism and the CIOU bounding-box loss function are introduced into the best-performing model to improve on the baseline. The experimental results show that the improved YOLOv5l model achieves a significantly lower training loss, and that accuracy, recall, and mAP_0.5 increase by 1.2, 0.6, and 0.4 percentage points respectively, reducing the missed detections of the original model.
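The abstract names the CIOU bounding-box loss without stating it. As a rough illustration only, the sketch below follows the commonly published CIoU definition (IoU, minus a normalized center-distance term, minus an aspect-ratio term). The function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for this example and are not taken from the paper or from the YOLOv5 code base. On a spectrogram, one would expect the horizontal axis of a box to correspond to time and the vertical axis to frequency.

```python
import math


def ciou_loss(box_p, box_g, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; assumes x1 < x2 and y1 < y2.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g

    # Widths and heights of the predicted and ground-truth boxes.
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centers.
    rho2 = (((px1 + px2) - (gx1 + gx2)) ** 2
            + ((py1 + py2) - (gy1 + gy2)) ** 2) / 4.0

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)


if __name__ == "__main__":
    # Identical boxes give a loss near zero; disjoint boxes give a loss above 1.
    print(ciou_loss((0, 0, 2, 1), (0, 0, 2, 1)))
    print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))
```

Unlike a plain IoU loss, the center-distance term still varies when the boxes do not overlap, so the loss continues to distinguish nearer from farther predictions in that case.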
YOLOv5 / stuttering recognition / spectrogram / target detection