Performance and analysis of large language models in the qualification examination for traditional Chinese medicine practitioners
Zhuoma XIANGBA, Zhenzhen WANG, Yansong ZHAO, Qin MA, Lei NI, Xingguang MA
Objective: To evaluate the performance of different large language models on the qualification examination for traditional Chinese medicine (TCM) practitioners. Methods: Five large language models—ERNIE Bot 4.0, ChatGPT 4.0, Baichuan Big Model 3.0, Claude3-Sonnet, and ChatGLM 4.0—were tested on questions drawn from the TCM practitioner qualification examination question bank, and their answer accuracy was compared across disciplines. Results: ERNIE Bot 4.0 and Baichuan Big Model 3.0 achieved the highest overall accuracy across the TCM disciplines, with ERNIE Bot 4.0 exceeding 80%; ChatGLM 4.0 showed the lowest overall accuracy. The models performed better in disciplines such as TCM Internal Medicine and Traditional Chinese Pharmacology, but their accuracy dropped markedly in Formula Science and the TCM Classics, which require comprehension of ancient Chinese medical texts and advanced application skills. Clear performance differences among the models were also observed. Conclusion: The performance variations among the models indicate that the content and quality of training data, as well as the models' logical reasoning abilities, strongly influence their effectiveness. As artificial intelligence continues to evolve, large language models are expected to become valuable teaching aids with the potential to transform education. Targeted training in specialized fields could improve their understanding and application of professional terminology, better addressing educational needs and improving both teaching quality and learning efficiency.
artificial intelligence / large language models / qualification examination for traditional Chinese medicine practitioners / model evaluation
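The evaluation described in the Methods section amounts to computing per-discipline answer accuracy for each model over a shared question bank. The sketch below illustrates one way such a comparison could be organized; it is not the authors' code, and the question-bank path, record layout, and the query_model placeholder are assumptions for illustration only.

```python
# Minimal sketch of a per-model, per-discipline accuracy evaluation.
# All identifiers here (query_model, the JSON record layout, the model list)
# are illustrative assumptions, not the study's actual implementation.
import json
from collections import defaultdict

MODELS = ["ERNIE Bot 4.0", "ChatGPT 4.0", "Baichuan Big Model 3.0",
          "Claude3-Sonnet", "ChatGLM 4.0"]

def query_model(model: str, question: str, options: list[str]) -> str:
    """Placeholder for a vendor API call; should return the chosen option key."""
    raise NotImplementedError("Replace with the actual API call for each model.")

def evaluate(bank_path: str) -> dict[str, dict[str, float]]:
    """Return accuracy per model and per discipline.

    Assumes the question bank is a JSON list of records such as
    {"discipline": "TCM Internal Medicine", "question": "...",
     "options": ["A ...", "B ...", "C ...", "D ..."], "answer": "A"}.
    """
    with open(bank_path, encoding="utf-8") as f:
        bank = json.load(f)

    correct = defaultdict(lambda: defaultdict(int))  # model -> discipline -> hits
    total = defaultdict(int)                         # discipline -> question count
    for item in bank:
        total[item["discipline"]] += 1
        for model in MODELS:
            prediction = query_model(model, item["question"], item["options"])
            if prediction == item["answer"]:
                correct[model][item["discipline"]] += 1

    return {model: {d: correct[model][d] / n for d, n in total.items()}
            for model in MODELS}
```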