
Geotechnical Named Entity Recognition Based on BERT-BiGRU-CRF Model
王权于, 李振华, 涂志鹏, 陈冠宇, 胡君, 陈嘉麒, 陈建军, 吕国斌
Named entity recognition for geotechnical engineering is the foundation and an important prerequisite for geotechnical text mining and knowledge graph construction. To address this task, this article first designs and constructs a small-scale named entity corpus for geotechnical engineering, with reference to the Standard for Fundamental Terms of Geotechnical Engineering (GB/T 50279-2014) and other national and industry standards. It then proposes BERT-BiGRU-CRF (GENER for short), a deep learning model for named entity recognition in geotechnical engineering text. In GENER, the representation learning layer uses the BERT pretrained language model for transfer representation learning of geotechnical text features; the BiGRU context encoding layer encodes the contextual features of the text; and the CRF label decoding layer handles the dependency constraints between labels and generates label sequences for geotechnical named entities that conform to the annotation rules. Finally, the GENER model is evaluated experimentally on the geotechnical engineering named entity corpus. In the comparative experiments, precision (P) reaches 90.94%, recall (R) reaches 92.88%, the F1-score reaches 91.89%, and model training speed increases by 4.735%. The results show that, compared with the baseline BiLSTM-CRF model, the CNN-BiLSTM-CRF model, and other models based on pretrained language models, GENER is more effective for geotechnical named entity recognition on a small-scale corpus, and it could be extended in the future to named entity recognition tasks on other geological texts.
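The role of the CRF label decoding layer can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the tag set (`O`, `B-GEO`, `I-GEO`), the emission scores, and the function names are all hypothetical, and the learned CRF transition matrix is replaced here by hard BIO constraints (an `I-` tag may only follow a `B-` or `I-` tag of the same type). In the actual model, the per-token emission scores would presumably come from the BiGRU layer.

```python
import math

# Hypothetical BIO tag set with one entity type; the abstract does not give the real scheme.
TAGS = ["O", "B-GEO", "I-GEO"]


def transition_score(prev, curr):
    """Return 0.0 for an allowed tag transition and -inf for a forbidden one.

    Stand-in for learned CRF transition scores: I-X may only follow B-X or I-X.
    """
    if curr.startswith("I-") and prev not in ("B-" + curr[2:], "I-" + curr[2:]):
        return -math.inf
    return 0.0


def start_score(tag):
    """A sequence may not begin with an I- tag."""
    return -math.inf if tag.startswith("I-") else 0.0


def viterbi(emissions):
    """Return the highest-scoring valid tag path.

    emissions: one dict per token, mapping each tag to its score.
    """
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: start_score(tag) + emissions[0][tag] for tag in TAGS}]
    back = []  # back[t][tag] = best previous tag for `tag` at position t + 1
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for curr in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transition_score(p, curr))
            scores[curr] = best[-1][prev] + transition_score(prev, curr) + em[curr]
            pointers[curr] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(TAGS, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def greedy(emissions):
    """Pick the best tag per token independently, ignoring label dependencies."""
    return [max(TAGS, key=lambda t: em[t]) for em in emissions]


# Made-up emission scores for a three-token sentence.
emissions = [
    {"O": 2.0, "B-GEO": 1.0, "I-GEO": 0.0},
    {"O": 0.0, "B-GEO": 1.0, "I-GEO": 3.0},
    {"O": 0.5, "B-GEO": 0.0, "I-GEO": 2.0},
]

print(greedy(emissions))   # ['O', 'I-GEO', 'I-GEO'] (I-GEO after O is invalid)
print(viterbi(emissions))  # ['B-GEO', 'I-GEO', 'I-GEO']
```

Per-token decoding produces an `I-GEO` tag directly after `O`, which violates the annotation scheme, while the constrained Viterbi search returns the best-scoring sequence that respects it. A trained CRF layer scores transitions with learned weights rather than the 0 or negative-infinity values assumed here.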
named entity recognition / deep learning / geotechnical engineering / corpus / geological big data
P642
Bengio, Y., Courville, A., Vincent, P., 2013. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis & Machine Intelligence, 35(8): 1798-1828. https://doi.org/10.1109/TPAMI.2013.50
Cho, K., Van Merriënboer, B., Gulcehre, C., et al., 2014. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. ArXiv Preprint ArXiv:1406.1078.
Chu, D. P., Wan, B., Li, H., et al., 2021. Geological Entity Recognition Based on ELMO-CNN-BiLSTM-CRF Model. Earth Science, 46(8): 3039-3048 (in Chinese with English abstract).
Chung, J., Gulcehre, C., Cho, K. H., et al., 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. ArXiv Preprint ArXiv:1412.3555.
Devlin, J., Chang, M., Lee, K., et al., 2018. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. ArXiv Preprint ArXiv:1810.04805.
Dong, L., Yang, N., Wang, W., et al., 2019. Unified Language Model Pre-Training for Natural Language Understanding and Generation. ArXiv Preprint ArXiv:1905.03197.
Fan, R., Wang, L., Yan, J., et al., 2020. Deep Learning-Based Named Entity Recognition and Knowledge Graph Construction for Geological Hazards. ISPRS International Journal of Geo-Information, 9(1): 15. https://doi.org/10.3390/ijgi9010015
Goyal, A., Gupta, V., Kumar, M., 2018. Recent Named Entity Recognition and Classification Techniques: A Systematic Review. Computer Science Review, 29: 21-43. https://doi.org/10.1016/j.cosrev.2018.06.001
He, Y. X., Luo, C. W., Hu, B. Y., 2015. Geographic Entity Recognition Method Based on CRF Model and Rules Combination. Computer Application and Software, 32(1): 179 (in Chinese with English abstract).
Lafferty, J., McCallum, A., Pereira, F., 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proceedings of the Eighteenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, 282-289.
Lample, G., Ballesteros, M., Subramanian, S., et al., 2016. Neural Architectures for Named Entity Recognition. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. The Association for Computational Linguistics, San Diego. https://doi.org/10.18653/v1/n16-1030
Li, J., Sun, A. X., Han, J. L., et al., 2022. A Survey on Deep Learning for Named Entity Recognition. IEEE Transactions on Knowledge and Data Engineering, 34(1): 50-70. https://doi.org/10.1109/TKDE.2020.2981314
Liu, D. S., Liu, H. L., Wu, Y., et al., 2022. Genetic Features of Geo-Materials and Their Testing Method. Journal of Civil and Environmental Engineering, 44(4): 1-9 (in Chinese with English abstract).
Liu, H. L., Zhang, R. H., Liu, D. S., et al., 2021. Study on the Characteristics of Physical and Mechanical Parameters of Engineering Geology Based on Data Fusion. Journal of Civil and Environmental Engineering, 1-11 (in Chinese with English abstract).
Liu, X., Zhang, S., Wei, F., et al., 2011. Recognizing Named Entities in Tweets. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, USA, 359-367.
Liu, Y., Ott, M., Goyal, N., et al., 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv Preprint ArXiv:1907.11692.
Marrero, M., Urbano, J., Sánchez-Cuadrado, S., et al., 2013. Named Entity Recognition: Fallacies, Challenges and Opportunities. Computer Standards & Interfaces, 35(5): 482-489. https://doi.org/10.1016/j.csi.2012.09.004
Ministry of Housing and Urban-Rural Development of the People's Republic of China, 2013. GB/T 50330-2013: Technical Code for Building Slope Engineering. China Architecture & Building Press, Beijing (in Chinese).
Ministry of Housing and Urban-Rural Development of the People's Republic of China, 2015. JTGT 84-2015: Terminology Standard for Geotechnical Investigation. China Architecture & Building Press, Beijing (in Chinese).
Ministry of Water Resources of the People's Republic of China, 2014. GB/T 50279-2014: Standard for Fundamental Terms of Geotechnical Engineering. China Planning Press, Beijing (in Chinese).
Nadeau, D., Sekine, S., 2007. A Survey of Named Entity Recognition and Classification. Lingvisticae Investigationes, 30(1): 3-26. https://doi.org/10.1075/li.30.1.03nad
Qiu, Q., Xie, Z., Wu, L., et al., 2019. BiLSTM-CRF for Geological Named Entity Recognition from the Geoscience Literature. Earth Science Informatics, 12(4): 565-579. https://doi.org/10.1007/s12145-019-00390-3
Qiu, Q., Xie, Z., Wu, L., et al., 2019. GNER: A Generative Model for Geological Named Entity Recognition Without Labeled Data Using Deep Learning. Earth and Space Science, 6(6): 931-946. https://doi.org/10.1029/2019EA000610
Qiu, X., Sun, T., Xu, Y., et al., 2020. Pre-Trained Models for Natural Language Processing: A Survey. Science China Technological Sciences, 63(10): 1872-1897. https://doi.org/10.1007/s11431-020-1647-3
Quimbaya, A. P., Múnera, A. S., Rivera, R. A. G., et al., 2016. Named Entity Recognition over Electronic Health Records through a Combined Dictionary-Based Approach. Procedia Computer Science, 100: 55-61. https://doi.org/10.1016/j.procs.2016.09.123
Ritter, A., Clark, S., Etzioni, O., 2011. Named Entity Recognition in Tweets: An Experimental Study. Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, USA.
Rocktäschel, T., Weidlich, M., Leser, U., 2012. ChemSpot: A Hybrid System for Chemical Named Entity Recognition. Bioinformatics, 28(12): 1633-1640. https://doi.org/10.1093/bioinformatics/bts183
Sharnagat, R., 2014. Named Entity Recognition: A Literature Survey. Center for Indian Language Technology, 8-20.
Wang, C., Ma, X., Chen, J., et al., 2018. Information Extraction and Knowledge Graph Construction from Geoscience Literature. Computers & Geosciences, 112: 112-120. https://doi.org/10.1016/j.cageo.2017.12.007
Yang, J., Zhang, Y., Li, L., et al., 2018. YEDDA: A Lightweight Collaborative Text Span Annotation Tool. Proceedings of ACL 2018, System Demonstrations. Association for Computational Linguistics, Australia. https://doi.org/10.18653/v1/P18-4006
Yang, Z., Dai, Z., Yang, Y., et al., 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc., New York.
Zhang, G. Y., Fu, J. Y., Ouyang, Z. Z., et al., 2020. The Importance of Space Database Establishment Based on DGSS in Big Data Environment. Earth Science, 45(9): 3451-3460 (in Chinese with English abstract).
Zhang, S. D., Elhadad, N., 2013. Unsupervised Biomedical Named Entity Recognition: Experiments with Clinical and Biological Texts. Journal of Biomedical Informatics, 46(6): 1088-1098. https://doi.org/10.1016/j.jbi.2013.08.004
Zhang, X. Y., Ye, P., Wang, S., et al., 2018. Geological Entity Recognition Method Based on Deep Belief Networks. Acta Petrologica Sinica, 34(2): 343-351 (in Chinese with English abstract).
Zhang, X. Y., Zhu, S. N., Zhang, C. J., 2012. Annotation of Geographical Named Entities in Chinese Text. Acta Geodaetica et Cartographica Sinica, 41(1): 115-120 (in Chinese with English abstract).
Zhang, Z., Han, X., Liu, Z., et al., 2019. ERNIE: Enhanced Language Representation with Informative Entities. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence. https://doi.org/10.18653/v1/P19-1139