Journal of Shanghai Jiao Tong University, 2022, Vol. 56, Issue (11): 1554-1560. doi: 10.16183/j.cnki.jsjtu.2021.079

• Electronic Information and Electrical Engineering •

Grammatical Error Correction by Transferring Learning Based on Pre-Trained Language Model

HAN Mingyue, WANG Yinglin

  1. School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
  • Received: 2021-03-16 Online: 2022-11-28 Published: 2022-12-02
  • Contact: WANG Yinglin


Grammatical error correction (GEC) is a low-resource task: it requires costly annotation, and training is time-consuming. This paper proposes MASS-GEC, which addresses this problem by transfer learning from a pre-trained language generation model, masked sequence to sequence pre-training for language generation (MASS). In addition, specific preprocessing and postprocessing strategies are applied to improve the performance of the GEC system. The system is evaluated on two public datasets and, with limited resources, achieves performance competitive with state-of-the-art work. Specifically, it achieves an F0.5 score of 57.9 on the CoNLL2014 task, where F0.5 weights precision more heavily than recall. On the JFLEG task, MASS-GEC achieves a GLEU score of 59.1, where GLEU measures the n-gram overlap between the model output and the manually annotated reference corrections. This paper provides a new perspective: the low-resource problem in GEC can be addressed well by transferring general language knowledge from a self-supervised pre-trained language model.
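
The statement that the F0.5 score emphasizes precision can be made concrete with the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The following is a minimal sketch of that general formula only, not the CoNLL2014 scorer used in the paper; the function name `f_beta` and the precision and recall values are hypothetical and chosen purely for illustration.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    With beta < 1, precision is weighted more heavily than recall.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values for illustration only (not results from the paper).
high_precision = f_beta(precision=0.8, recall=0.4)  # precise but cautious system
high_recall = f_beta(precision=0.4, recall=0.8)     # aggressive but noisy system

# Both systems have the same F1, but F0.5 ranks the precise one higher.
print(f"F0.5 (P=0.8, R=0.4): {high_precision:.4f}")
print(f"F0.5 (P=0.4, R=0.8): {high_recall:.4f}")
```

Under this weighting, a system that proposes fewer but more reliable corrections is scored above one that proposes many corrections of mixed quality, which is why the metric is commonly preferred for GEC.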

Key words: grammatical error correction (GEC), natural language generation, sequence to sequence
