Journal of Shanghai Jiao Tong University ›› 2021, Vol. 55 ›› Issue (2): 117-123. doi: 10.16183/j.cnki.jsjtu.2020.009



• About the author: ZHANG Jingyi (1996-), female, a native of Nanyang, Henan Province, is an M.S. candidate whose research focuses on natural language processing.

Named Entity Recognition of Enterprise Annual Report Integrated with BERT

ZHANG Jingyi1, HE Guanghui1(), DAI Zhou2, LIU Yadong1   

  1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
    2. China Southern Power Grid Materials Co., Ltd., Guangzhou 510641, China
  • Received:2020-01-08 Online:2021-02-01 Published:2021-03-03
  • Contact: HE Guanghui E-mail:guanghui.he@sjtu.edu.cn


Abstract:

Automatically extracting key data from annual reports is an important means of automating enterprise assessment. To address the complex structure of key entities in corporate annual reports, their strong semantic coupling with the surrounding context, and the small scale of available corpora, a BERT-BiGRU-Attention-CRF (bidirectional encoder representations from transformers, bidirectional gated recurrent unit, attention mechanism, conditional random field) model was proposed to automatically identify and extract entities from enterprise annual reports. On the basis of the BiGRU-CRF model, the BERT pre-trained language model was first introduced to enhance the generalization ability of the word vectors and to capture long-range contextual information; an attention mechanism was then introduced to fully exploit the global and local features of the text. Experiments were performed on a self-constructed corpus of enterprise annual reports, and the model was compared with several traditional models. The results show that the BERT-BiGRU-Attention-CRF model achieves an F1 score (the harmonic mean of precision and recall) of 93.69%, outperforming the other traditional models on named entity recognition in annual reports, and it is expected to provide an effective means of automating enterprise assessment.
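The attention step in the pipeline above can be sketched as follows. This is a minimal NumPy illustration of scaled dot-product self-attention applied to a sequence of hidden states (stand-ins for BiGRU outputs), not the authors' implementation; the sequence length, hidden dimension, and random initialization are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h):
    """Scaled dot-product self-attention over hidden states h of shape
    (seq_len, d). Each output vector is a weighted mix of all positions,
    letting every token draw on global as well as local context."""
    d = h.shape[-1]
    scores = h @ h.T / np.sqrt(d)        # (seq_len, seq_len) pairwise similarities
    weights = softmax(scores, axis=-1)   # each row is a distribution over positions
    return weights @ h                   # (seq_len, d) context-enriched vectors

rng = np.random.default_rng(0)
h = rng.standard_normal((5, 8))   # hypothetical BiGRU outputs: 5 tokens, d = 8
ctx = self_attention(h)
print(ctx.shape)  # (5, 8)
```

In the full model these context-enriched vectors would feed the CRF layer, which scores whole tag sequences rather than labeling each token independently.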

Key words: named entity recognition, enterprise annual report, BERT, attention mechanism, BiGRU

CLC number: