J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (5): 1065-1072. doi: 10.1007/s12204-023-2675-y

Named Entity Identification of Chinese Poetry and Wine Culture Based on ALBERT

Yang Zhuang (杨壮)1, Li Zhaofei (李兆飞)1,2, Wang Jihua (王继华)1, Wei Xudong (魏旭东)1, Zhang Yijie (张逸杰)1

  1. School of Automation and Information Engineering, Sichuan University of Science and Engineering, Yibin 643002, Sichuan, China; 2. Artificial Intelligence Key Laboratory of Sichuan Province, Sichuan University of Science and Engineering, Yibin 643002, Sichuan, China
  • Received: 2023-03-08 Accepted: 2023-05-10 Online: 2025-09-26 Published: 2023-12-01

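Keywords: poetry and wine culture, named entity recognition, deep learning, ALBERT model, bi-directional long short-term memory (BILSTM), attention mechanism (Att), conditional random field (CRF)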

Abstract: Named entity recognition for Chinese poetry and wine culture is a key step in constructing a knowledge graph and a question answering system for this domain. Because entities in this domain vary widely in length and current named entity recognition models are costly to train, this study proposes a recognition method that combines a lite BERT, a bi-directional long short-term memory network, an attention mechanism, and a conditional random field (ALBERT+BILSTM+Att+CRF). The method first obtains character-level semantic information with the ALBERT module, then extracts high-dimensional features with the BILSTM module, uses the attention layer to weight the original word vectors and the learned text vectors, and finally predicts the label sequence in the CRF module, covering five entity types: poem title, author, time, genre, and category. Experiments on data sets related to Chinese poetry and wine culture show that the method outperforms existing mainstream models and efficiently extracts important entity information, making it an effective approach to recognizing poetry-related named entities of varying lengths.
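
The final step described in the abstract, in which a CRF layer turns per-character scores into a label sequence, can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The BIO tagging scheme, the label names `AUTHOR` and `TITLE`, the example sentence, and every score below are assumptions made for illustration. The hand-written emission scores stand in for the output of the ALBERT, BILSTM, and attention layers, and a hard BIO constraint stands in for the transition scores a trained CRF would learn.

```python
# Toy sketch of CRF-style (Viterbi) decoding followed by entity span extraction.
# ASSUMPTIONS: BIO tags, label names, the sentence, and all scores are invented.

NEG = -1e4  # penalty standing in for a "forbidden" transition score

LABELS = ["O", "B-AUTHOR", "I-AUTHOR", "B-TITLE", "I-TITLE"]


def trans(prev, cur):
    """Hard BIO constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-") and prev not in ("B-" + cur[2:], "I-" + cur[2:]):
        return NEG
    return 0.0


def viterbi(emissions, labels=LABELS):
    """Return the highest-scoring label path.

    emissions: one dict per character mapping label -> score (missing = 0.0).
    """
    # An I-* tag cannot open a sequence, so it is penalised at position 0.
    score = {l: emissions[0].get(l, 0.0) + (NEG if l.startswith("I-") else 0.0)
             for l in labels}
    backpointers = []
    for em in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in labels:
            best = max(labels, key=lambda p: score[p] + trans(p, cur))
            new_score[cur] = score[best] + trans(best, cur) + em.get(cur, 0.0)
            pointers[cur] = best
        backpointers.append(pointers)
        score = new_score
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=score.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def extract_spans(chars, tags):
    """Collect (text, entity_type) pairs from a BIO tag sequence."""
    spans, start, kind = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and not tag.startswith("I-"):
            spans.append(("".join(chars[start:i]), kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


# "李白作将进酒": an author name (2 characters) and a poem title (3 characters).
chars = list("李白作将进酒")
emissions = [
    {"O": 1.0, "B-AUTHOR": 0.8},  # taken alone, this character looks like "O"
    {"I-AUTHOR": 2.0},
    {"O": 2.0},
    {"B-TITLE": 2.0},
    {"I-TITLE": 1.5},
    {"I-TITLE": 1.5, "O": 0.5},
]

greedy = [max(LABELS, key=lambda l: em.get(l, 0.0)) for em in emissions]
tags = viterbi(emissions)
spans = extract_spans(chars, tags)
print(greedy)
print(tags)
print(spans)
```

In this toy run, per-character argmax labels the first character `O` and then emits `I-AUTHOR`, which is not a valid BIO sequence. Decoding the whole sequence jointly picks `B-AUTHOR` for the first character instead, so both the two-character name and the three-character title come out as complete spans. This is the usual motivation for placing a CRF on top of a character-level encoder when entity lengths vary.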
