J Shanghai Jiaotong Univ Sci, 2024, Vol. 29, Issue (3): 537-556. doi: 10.1007/s12204-022-2474-x

• Automation & Computer Technologies •

Semantic Entity Recognition and Relation Construction Method for Assembly Process Document

面向装配工艺文档的装配语义实体识别与关系构建方法

GU Xinghai (顾星海), HUA Bao (花豹), LIU Yahui (刘亚辉), SUN Xuemin (孙学民), BAO Jinsong (鲍劲松)

  1. (College of Mechanical Engineering, Donghua University, Shanghai 201620, China)
  2. (东华大学 机械工程学院,上海201620)
  • Received: 2021-08-06 Accepted: 2021-11-08 Online: 2024-05-28 Published: 2024-05-28

Abstract: Assembly process documents record designers’ intentions or knowledge. However, common knowledge extraction methods are not well suited to assembly process documents because of their tabular form and unstructured natural language text. This paper proposes an assembly semantic entity recognition and relation construction method oriented to assembly process documents. First, assembly process sentences are extracted from the tables through concerned-region recognition and cell division, and are stored as a key-value object file. Then, the semantic entities in each sentence are identified by a sequence tagging model based on an attention mechanism specific to the assembly operation type, and syntactic rules are designed to construct the relations between entities automatically. Finally, experiments on a self-constructed corpus show that the sequence tagging model in the proposed method performs better than mainstream named entity recognition models when handling assembly process design language. The effectiveness of the proposed method is also analyzed through a simulation experiment in a small-scale real scene, in comparison with the manual method. The results show that the proposed method can help designers accumulate knowledge automatically and efficiently.

Key words: assembly process design, knowledge extraction, named entity recognition, text extraction in table, dependency syntactic parsing, attention mechanism

摘要: 装配工艺文档记录了工艺设计者的意图或知识。然而,由于其表格形式和非结构化的自然语言文本,普通知识抽取方法不适合于处理装配工艺文档。本文提出了一种面向装配工艺文档的装配语义实体识别与关系构建方法。首先,通过有效区域识别和单元格划分,从表格中提取装配工艺语句,并将其存储为键-值对象文件。然后,面向装配操作类型,通过基于注意力机制的序列标注模型识别语句中的语义实体,并设计句法规则实现实体间关系的自动构建。最后,通过使用自建的语料库,证明了该方法提出的序列标注模型在处理装配工艺设计语言时比主流的命名实体识别模型表现更好。并且,通过小规模真实场景下的仿真实验与人工方法进行比较,证明了该方法的有效性。结果表明,该方法可以帮助设计者自动、有效地积累知识。

关键词: 装配工艺设计,知识抽取,命名实体识别,表格文本抽取,依存句法分析,注意力机制

CLC Number: