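Performance and analysis of large language models in the qualification examination for traditional Chinese medicine practitioners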

Zhuoma XIANGBA; Zhenzhen WANG; Yansong ZHAO; Qin MA; Lei NI; Xingguang MA

Education of Chinese Medicine, 2025, Vol. 44, Issue (1): 137-142. DOI: 10.3969/j.issn.1003-305X.2025.01.263

Section: Teaching Forum



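Abstract (Chinese version)

Objective: To evaluate the performance of different large language models on the qualification examination for traditional Chinese medicine (TCM) practitioners. Methods: Questions from different subjects in the TCM practitioner qualification examination question bank were used to test the answer accuracy of five large language models: ERNIE Bot 4.0, ChatGPT 4.0, Baichuan Big Model 3.0, Claude3-Sonnet, and ChatGLM 4.0. Results: ERNIE Bot 4.0 and Baichuan Big Model 3.0 achieved the highest overall accuracy across the TCM subjects, while ChatGLM 4.0 had the lowest. By subject, all five models were relatively accurate on TCM Internal Medicine and Traditional Chinese Pharmacology, but less accurate on Formula Science and TCM Classics, subjects that require understanding classical Chinese medical texts or applying knowledge; accuracy on these subjects also varied between models. Conclusion: The differences among the models suggest that performance depends on several factors, including the content and quality of the training data and the model's own logical reasoning ability. As artificial intelligence technology continues to iterate, using these models as teaching aids may help transform education. Strengthening model training in specific professional domains could further improve the models' understanding and use of specialized terminology, better meet practical teaching needs, and thereby improve teaching quality and learning efficiency.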

Abstract (English version)

Objective: This study aims to evaluate the performance of different large language models on the qualification examination for traditional Chinese medicine (TCM) practitioners. Methods: Five large language models (ERNIE Bot 4.0, ChatGPT 4.0, Baichuan Big Model 3.0, Claude3-Sonnet, and ChatGLM 4.0) were tested for accuracy in answering questions from different disciplines within the TCM practitioner qualification examination question bank. Results: ERNIE Bot 4.0 and Baichuan Big Model 3.0 achieved the highest overall accuracy across the TCM disciplines, with ERNIE Bot 4.0 exceeding 80%. ChatGLM 4.0 showed the lowest overall accuracy. The models performed better in disciplines such as TCM Internal Medicine and Traditional Chinese Pharmacology, but their accuracy dropped considerably in Formula Science and TCM Classics, which require understanding ancient Chinese medical texts and applying knowledge. Performance also differed between models. Conclusion: The performance differences among the models indicate that factors such as the content and quality of training data, as well as the models' logical reasoning abilities, play a significant role in their effectiveness. As artificial intelligence continues to evolve, large language models are expected to become valuable teaching aids, potentially transforming education. Strengthening model training in specialized fields could improve the models' understanding and application of professional terminology, thereby better addressing educational needs and improving both teaching quality and learning efficiency.
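The Methods step above scores each model by its answer accuracy within each subject and overall. The following is a minimal Python sketch of that tallying, assuming each question is graded simply as correct or incorrect and that "overall accuracy" means pooled correct answers divided by total questions answered (the abstract does not state whether subjects were pooled or averaged). The model labels, subject labels, and records are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict

# Hypothetical graded records: (model, subject, answered_correctly).
# Placeholder values for illustration only; NOT results from the study.
records = [
    ("model_A", "TCM Internal Medicine", True),
    ("model_A", "TCM Internal Medicine", True),
    ("model_A", "Formula Science", False),
    ("model_A", "Formula Science", True),
    ("model_B", "TCM Internal Medicine", True),
    ("model_B", "TCM Internal Medicine", False),
    ("model_B", "Formula Science", False),
    ("model_B", "Formula Science", False),
]


def accuracy_by_subject(records):
    """Return {model: {subject: accuracy}}, with accuracy as a fraction in [0, 1]."""
    # counts[model][subject] holds [number_correct, number_answered]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for model, subject, correct in records:
        cell = counts[model][subject]
        cell[0] += int(correct)
        cell[1] += 1
    return {
        model: {subject: c / t for subject, (c, t) in subjects.items()}
        for model, subjects in counts.items()
    }


def overall_accuracy(records):
    """Pooled (micro-averaged) accuracy per model: total correct / total answered."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for model, _subject, ok in records:
        correct[model] += int(ok)
        total[model] += 1
    return {model: correct[model] / total[model] for model in total}


if __name__ == "__main__":
    print(accuracy_by_subject(records))
    # Rank models from highest to lowest pooled accuracy.
    ranking = sorted(overall_accuracy(records).items(), key=lambda kv: kv[1], reverse=True)
    print(ranking)
```

Pooling gives subjects with more questions more weight in the overall figure; averaging the per-subject accuracies instead would weight every subject equally, and the two can rank models differently when subject sizes are unequal.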

Keywords

artificial intelligence / large language models / qualification examination for traditional Chinese medicine practitioners / model evaluation

CLC number: G642.474

Cite this article


Zhuoma XIANGBA, Zhenzhen WANG, Yansong ZHAO, et al. Performance and analysis of large language models in the qualification examination for traditional Chinese medicine practitioners[J]. Education of Chinese Medicine, 2025, 44(1): 137-142. https://doi.org/10.3969/j.issn.1003-305X.2025.01.263


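Funding

Beijing University of Chinese Medicine Philosophy and Social Sciences Cultivation Fund Project (2024-JYB-PY-006)

Beijing University of Chinese Medicine Educational Science Research Project (XJY22048)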


Received: 2024-04-22
Published: 2025-01-30
Issue date: 2025-03-17
