J Shanghai Jiaotong Univ Sci, 2024, Vol. 29, Issue (6): 1169-1180. doi: 10.1007/s12204-022-2534-2

• Computer Technologies •

Named Entity Recognition of Design Specification Integrated with High-Quality Topic and Attention Mechanism

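(Chinese title) Named Entity Recognition Method for Design Specifications Integrating High-Quality Topics and an Attention Mechanism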

ZHOU Cheng (周成), JIANG Zuhua (蒋祖华)   

  1. (School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China)
  2. (School of Mechanical and Power Engineering, Shanghai Jiao Tong University, Shanghai 200240)
  • Received: 2022-04-24; Accepted: 2022-07-18; Online: 2024-11-28; Published: 2024-11-28

Abstract: Automatic extraction of key data from design specifications is an important means of supporting engineering design automation. Design specifications involve diverse data types, a small corpus scale, limited information carried by individual characters, and strong contextual dependence. To identify the entities in them automatically, a named entity recognition model integrating high-quality topics and an attention mechanism, namely Quality Topic-Char Embedding-BiLSTM-Attention-CRF, was proposed. First, an improved algorithm for extracting high-quality topics was proposed on the basis of the topic model, and the resulting high-quality topic information was added to the distributed representation of Chinese characters to enrich the character features. Next, an attention mechanism was applied in parallel on top of the BiLSTM-CRF model to fully mine contextual semantic information. Finally, experiments were performed on a collected corpus of Chinese ship design specifications, and the model was compared with multiple sets of other models. The results show that the model achieves an F-score (the harmonic mean of precision and recall) of 80.24%, outperforming the compared models on design specifications, and it is expected to provide an automatic means for engineering design.
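The F-score reported above is the harmonic mean of precision and recall, F = 2PR / (P + R). The abstract does not state how predicted entities are matched against the gold annotations; the minimal sketch below assumes the common entity-level, exact-span convention over BIO tags (as in CoNLL-style evaluation). The function names `bio_to_spans` and `prf` and the entity types `MAT` and `VAL` are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

# (entity_type, start, end_exclusive); assumed exact-span matching convention
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (e.g. B-MAT, I-MAT, O)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        opens_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if tag == "O" or opens_new:
            if etype is not None:
                spans.add((etype, start, i))  # close the entity that was open
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
        # an I- tag of the same type simply extends the open entity
    return spans


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F-score (harmonic mean of P and R)."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):  # spans are compared sentence by sentence
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)
        n_pred += len(ps)
        n_gold += len(gs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    gold = [["B-MAT", "I-MAT", "O", "B-VAL"]]
    pred = [["B-MAT", "I-MAT", "O", "O"]]
    print(prf(gold, pred))
```

In this toy example the model finds one of the two gold entities and makes no false predictions, so precision is 1.0, recall is 0.5 and the F-score is about 0.667. Under this convention a partially overlapping span counts as both a false positive and a false negative.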

Key words: named entity recognition, design specification, topic model, high-quality topic, attention mechanism

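Abstract (Chinese version): Automatically extracting key data from design specifications is an important means of assisting engineering design automation. Given that design specifications involve many data types, a small scale, insufficient information content in individual characters, and strong contextual relevance, a named entity recognition model integrating high-quality topics and an attention mechanism, namely "High-Quality Topic-Char Embedding-BiLSTM-CRF", is proposed for the automatic recognition of entities in design specifications. Based on the topic model, an improved high-quality topic extraction algorithm is proposed, and the obtained high-quality topic information is then added to the distributed representation of Chinese characters to better enrich the character features. Next, an attention mechanism is used in parallel on the basis of the BiLSTM-CRF model to fully mine contextual semantic information. Finally, experiments were conducted on a collected corpus of Chinese ship design specifications, and the model was compared with multiple groups of models. The results show that the F-score (the harmonic mean of recall and precision) of the model is 80.24%. The model outperforms the other models on design specifications and is expected to provide an automated means for engineering design.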

Key words (Chinese version): named entity recognition, design specification, topic model, high-quality topic, attention mechanism
