Geotechnical Named Entity Recognition Based on BERT-BiGRU-CRF Model

Wang Quanyu; Li Zhenhua; Tu Zhipeng; Chen Guanyu; Hu Jun; Chen Jiaqi; Chen Jianjun; Lv Guobin

doi:10.3799/dqkx.2022.462


Earth Science, 2023, Vol. 48, Issue (08): 3137-3150. DOI: 10.3799/dqkx.2022.462


Geotechnical Named Entity Recognition Based on BERT-BiGRU-CRF Model

Wang Quanyu ¹,², Li Zhenhua ¹,², Tu Zhipeng ², Chen Guanyu ¹, Hu Jun ¹, Chen Jiaqi ², Chen Jianjun ², Lv Guobin ¹,²


Abstract

Geotechnical named entity recognition is the foundation and an important prerequisite for geotechnical text mining and knowledge graph construction. To address this task, a small-scale geotechnical named entity corpus was designed and built with reference to the Standard for Fundamental Terms of Geotechnical Engineering (GB/T 50279-2014) and other national and industry standards. A deep learning model for named entity recognition in geotechnical texts, BERT-BiGRU-CRF (GENER for short), is then proposed. Its representation learning layer uses the BERT pretrained language model for transfer representation learning of geotechnical text features; its BiGRU context encoding layer encodes the contextual features of the text; and its CRF label decoding layer handles the dependency constraints between labels, generating label sequences that conform to the annotation scheme. Finally, GENER was evaluated experimentally on the geotechnical named entity corpus. In the comparison experiments, precision (P) reached 90.94%, recall (R) reached 92.88%, the F1-score reached 91.89%, and model training speed improved by 4.735%. The results show that, compared with the baseline BiLSTM-CRF model, the CNN-BiLSTM-CRF model, and other models based on pretrained language models, GENER performs better at recognizing geotechnical named entities on a small-scale corpus, and it could be extended to named entity recognition in other geological texts.
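The role of the CRF label decoding layer described in the abstract can be illustrated with a small, self-contained sketch of constrained Viterbi decoding. This is not the authors' implementation: the tag set (O, B-ROCK, I-ROCK), the emission scores, and the function names are hypothetical, and the hard-coded transition rule stands in for transition scores that a trained CRF would learn from data. The sketch only shows how decoding over whole sequences can reject label sequences that break the BIO scheme, which per-token argmax cannot do.

```python
# Illustrative sketch only. The tag set, scores, and function names are
# assumptions made for this example; they do not come from the paper.
NEG_INF = float("-inf")
TAGS = ["O", "B-ROCK", "I-ROCK"]  # hypothetical BIO tag set


def transition_score(prev_tag, tag):
    """Return -inf for transitions that break the BIO scheme, else 0.0.

    A trained CRF learns real-valued transition scores; a hard rule is
    used here as a simplifying assumption.
    """
    if tag.startswith("I-"):
        entity_type = tag[2:]
        # I-X may only follow B-X or I-X of the same entity type.
        if prev_tag not in ("B-" + entity_type, "I-" + entity_type):
            return NEG_INF
    return 0.0


def viterbi_decode(emissions, tags=TAGS):
    """Find the highest-scoring valid tag path.

    emissions: list with one dict per token, mapping tag -> score
    (in a BERT-BiGRU-CRF model these would come from the encoder).
    """
    # A sequence may not start with an I- tag.
    best = {t: (NEG_INF if t.startswith("I-") else emissions[0][t]) for t in tags}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + transition_score(p, t))
            new_best[t] = best[prev] + transition_score(prev, t) + scores[t]
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up emission scores for a three-token sentence.
emissions = [
    {"O": 1.0, "B-ROCK": 0.8, "I-ROCK": 0.1},
    {"O": 0.2, "B-ROCK": 0.3, "I-ROCK": 1.5},
    {"O": 1.0, "B-ROCK": 0.1, "I-ROCK": 0.2},
]
greedy = [max(TAGS, key=lambda t: s[t]) for s in emissions]
print(greedy)                     # ['O', 'I-ROCK', 'O'], invalid: I- follows O
print(viterbi_decode(emissions))  # ['B-ROCK', 'I-ROCK', 'O']
```

With these scores, per-token argmax yields an I-ROCK tag directly after O, while the constrained decoder trades a slightly lower first-token score for a sequence that is valid as a whole.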

Key words

named entity recognition / deep learning / geotechnical engineering / corpus / geological big data

CLC number

P642

Cite this article

Wang Quanyu, Li Zhenhua, Tu Zhipeng, et al. Geotechnical Named Entity Recognition Based on BERT-BiGRU-CRF Model[J]. Earth Science, 2023, 48(08): 3137-3150. https://doi.org/10.3799/dqkx.2022.462

References

Bengio, Y., Courville, A., Vincent, P., 2013. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis & Machine Intelligence, 35(8): 1798-1828. https://doi.org/10.1109/TPAMI.2013.50

Cho, K., van Merriënboer, B., Gulcehre, C., et al., 2014. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv preprint arXiv:1406.1078.

Chu, D. P., Wan, B., Li, H., et al., 2021. Geological Entity Recognition Based on ELMO-CNN-BiLSTM-CRF Model. Earth Science, 46(8): 3039-3048 (in Chinese with English abstract).

Chung, J., Gulcehre, C., Cho, K. H., et al., 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv preprint arXiv:1412.3555.

Devlin, J., Chang, M., Lee, K., et al., 2018. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.

Dong, L., Yang, N., Wang, W., et al., 2019. Unified Language Model Pre-Training for Natural Language Understanding and Generation. arXiv preprint arXiv:1905.03197.

Fan, R., Wang, L., Yan, J., et al., 2020. Deep Learning-Based Named Entity Recognition and Knowledge Graph Construction for Geological Hazards. ISPRS International Journal of Geo-Information, 9(1): 15. https://doi.org/10.3390/ijgi9010015

Goyal, A., Gupta, V., Kumar, M., 2018. Recent Named Entity Recognition and Classification Techniques: A Systematic Review. Computer Science Review, 29: 21-43. https://doi.org/10.1016/j.cosrev.2018.06.001

He, Y. X., Luo, C. W., Hu, B. Y., 2015. Geographic Entity Recognition Method Based on CRF Model and Rules Combination. Computer Applications and Software, 32(1): 179 (in Chinese with English abstract).

Lafferty, J., McCallum, A., Pereira, F., 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proceedings of the Eighteenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, 282-289.

Lample, G., Ballesteros, M., Subramanian, S., et al., 2016. Neural Architectures for Named Entity Recognition. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. The Association for Computational Linguistics, San Diego. https://doi.org/10.18653/v1/n16-1030

Li, J., Sun, A. X., Han, J. L., et al., 2022. A Survey on Deep Learning for Named Entity Recognition. IEEE Transactions on Knowledge and Data Engineering, 34(1): 50-70. https://doi.org/10.1109/TKDE.2020.2981314

Liu, D. S., Liu, H. L., Wu, Y., et al., 2022. Genetic Features of Geo-Materials and Their Testing Method. Journal of Civil and Environmental Engineering, 44(04): 1-9 (in Chinese with English abstract).

Liu, H. L., Zhang, R. H., Liu, D. S., et al., 2021. Study on the Characteristics of Physical and Mechanical Parameters of Engineering Geology Based on Data Fusion. Journal of Civil and Environmental Engineering, 1-11 (in Chinese with English abstract).

Liu, X., Zhang, S., Wei, F., et al., 2011. Recognizing Named Entities in Tweets. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, USA, 359-367.

Liu, Y., Ott, M., Goyal, N., et al., 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692.

Marrero, M., Urbano, J., Sánchez-Cuadrado, S., et al., 2013. Named Entity Recognition: Fallacies, Challenges and Opportunities. Computer Standards & Interfaces, 35(5): 482-489. https://doi.org/10.1016/j.csi.2012.09.004

Ministry of Housing and Urban-Rural Development of the People's Republic of China, 2013. GB/T 50330-2013: Technical Code for Building Slope Engineering. China Architecture & Building Press, Beijing (in Chinese).

Ministry of Housing and Urban-Rural Development of the People's Republic of China, 2015. JTGT 84-2015: Terminology Standard for Geotechnical Investigation. China Architecture & Building Press, Beijing (in Chinese).

Ministry of Water Resources of the People's Republic of China, 2014. GB/T 50279-2014: Standard for Fundamental Terms of Geotechnical Engineering. China Planning Press, Beijing (in Chinese).

Nadeau, D., Sekine, S., 2007. A Survey of Named Entity Recognition and Classification. Lingvisticae Investigationes, 30(1): 3-26. https://doi.org/10.1075/li.30.1.03nad

Qiu, Q., Xie, Z., Wu, L., et al., 2019. BiLSTM-CRF for Geological Named Entity Recognition from the Geoscience Literature. Earth Science Informatics, 12(4): 565-579. https://doi.org/10.1007/s12145-019-00390-3

Qiu, Q., Xie, Z., Wu, L., et al., 2019. GNER: A Generative Model for Geological Named Entity Recognition Without Labeled Data Using Deep Learning. Earth and Space Science, 6(6): 931-946. https://doi.org/10.1029/2019EA000610

Qiu, X., Sun, T., Xu, Y., et al., 2020. Pre-Trained Models for Natural Language Processing: A Survey. Science China Technological Sciences, 63(10): 1872-1897. https://doi.org/10.1007/s11431-020-1647-3

Quimbaya, A. P., Múnera, A. S., Rivera, R. A. G., et al., 2016. Named Entity Recognition over Electronic Health Records through a Combined Dictionary-Based Approach. Procedia Computer Science, 100: 55-61. https://doi.org/10.1016/j.procs.2016.09.123

Ritter, A., Clark, S., Etzioni, O., 2011. Named Entity Recognition in Tweets: An Experimental Study. Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, USA.

Rocktäschel, T., Weidlich, M., Leser, U., 2012. ChemSpot: A Hybrid System for Chemical Named Entity Recognition. Bioinformatics, 28(12): 1633-1640. https://doi.org/10.1093/bioinformatics/bts183

Sharnagat, R., 2014. Named Entity Recognition: A Literature Survey. Center for Indian Language Technology, 8-20.

Wang, C., Ma, X., Chen, J., et al., 2018. Information Extraction and Knowledge Graph Construction from Geoscience Literature. Computers & Geosciences, 112: 112-120. https://doi.org/10.1016/j.cageo.2017.12.007

Yang, J., Zhang, Y., Li, L., et al., 2018. YEDDA: A Lightweight Collaborative Text Span Annotation Tool. Proceedings of ACL 2018, System Demonstrations. Association for Computational Linguistics, Australia. https://doi.org/10.18653/v1/P18-4006

Yang, Z., Dai, Z., Yang, Y., et al., 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc., New York.

Zhang, G. Y., Fu, J. Y., Ouyang, Z. Z., et al., 2020. The Importance of Space Database Establishment Based on DGSS in Big Data Environment. Earth Science, 45(9): 3451-3460 (in Chinese with English abstract).

Zhang, S. D., Elhadad, N., 2013. Unsupervised Biomedical Named Entity Recognition: Experiments with Clinical and Biological Texts. Journal of Biomedical Informatics, 46(6): 1088-1098. https://doi.org/10.1016/j.jbi.2013.08.004

Zhang, X. Y., Ye, P., Wang, S., et al., 2018. Geological Entity Recognition Method Based on Deep Belief Networks. Acta Petrologica Sinica, 34(2): 343-351 (in Chinese with English abstract).

Zhang, X. Y., Zhu, S. N., Zhang, C. J., 2012. Annotation of Geographical Named Entities in Chinese Text. Acta Geodaetica et Cartographica Sinica, 41(1): 115-120 (in Chinese with English abstract).

Zhang, Z., Han, X., Liu, Z., et al., 2019. ERNIE: Enhanced Language Representation with Informative Entities. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence. https://doi.org/10.18653/v1/P19-1139


Funding

Open Project of the State Key Laboratory of Cognitive Intelligence (COGOS-2023HE09)

National Natural Science Foundation of China (42103024; 42130307)


Received: 2022-12-01
Published: 2023-08-30
Issue Date: 2025-06-18
