Journal of Shanghai Jiaotong University (Science) ›› 2015, Vol. 20 ›› Issue (2): 143-148. doi: 10.1007/s12204-015-1602-2

Genetic Algorithm Based Feature Selection and Parameter Optimization for Support Vector Regression Applied to Semantic Textual Similarity

SU Bai-hua1 (苏柏桦), WANG Ying-lin2* (王英林)   

  (1. Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200240, China; 2. Department of Computer Science and Technology, Shanghai University of Finance and Economics, Shanghai 200433, China)
  • Online: 2015-04-30 Published: 2015-04-02
  • Contact: WANG Ying-lin (王英林) E-mail: yinglin.wang@outlook.com

Abstract: Semantic textual similarity (STS) is a common task in natural language processing (NLP) that measures the degree of semantic equivalence between two textual snippets. Recently, machine learning methods have been applied to this task, including methods based on support vector regression (SVR). However, the learning process involves a large number of features, some of which are noisy and irrelevant to the result. Furthermore, the choice of parameters significantly influences the prediction performance of the SVR model. In this paper, we propose using a genetic algorithm (GA) to select effective features and optimize the parameters simultaneously during learning. To evaluate the proposed approach, we use the STS-2012 dataset in our experiments. Compared with grid search, the proposed GA-based approach achieves better regression performance.
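
The idea of selecting features and tuning parameters simultaneously can be illustrated with a small sketch in which one bit-string chromosome holds both a feature mask and the encoded SVR parameters. This is not the authors' implementation. The parameter names (`C`, `gamma`, `epsilon`), their log2 ranges, the bit widths, the GA operators (binary tournament, one-point crossover, bit-flip mutation, elitism) and the `toy_fitness` function are all assumptions made for illustration; in a real experiment the fitness would presumably be a cross-validated score of an SVR trained on the selected features.

```python
import math
import random

N_FEATURES = 8        # hypothetical number of candidate features
BITS_PER_PARAM = 6    # hypothetical resolution for each encoded parameter
# Hypothetical log2 search ranges for the assumed SVR parameters.
PARAM_RANGES = {"C": (-5.0, 15.0), "gamma": (-15.0, 3.0), "epsilon": (-8.0, 0.0)}
CHROM_LEN = N_FEATURES + BITS_PER_PARAM * len(PARAM_RANGES)


def decode(chrom):
    """Split a bit list into a feature mask and real-valued parameters."""
    mask = chrom[:N_FEATURES]
    params = {}
    pos = N_FEATURES
    for name, (lo, hi) in PARAM_RANGES.items():
        bits = chrom[pos:pos + BITS_PER_PARAM]
        pos += BITS_PER_PARAM
        # Map the integer value of the bits to [0, 1], then to the log2 range.
        level = int("".join(map(str, bits)), 2) / (2 ** BITS_PER_PARAM - 1)
        params[name] = 2.0 ** (lo + level * (hi - lo))
    return mask, params


def toy_fitness(chrom):
    """Stand-in for a cross-validated SVR score (NOT a real model).

    Rewards selecting the first four features, penalises the other four,
    and prefers C close to 2**5.
    """
    mask, params = decode(chrom)
    c_penalty = abs(math.log2(params["C"]) - 5.0) / 20.0
    return sum(mask[:4]) - sum(mask[4:]) - c_penalty


def run_ga(fitness, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Evolve bit-string chromosomes and return the best one found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament selection from the current population.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best individual forward
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, CHROM_LEN)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    mask, params = decode(run_ga(toy_fitness))
    print("selected features:", mask)
    print("decoded parameters:", params)
```

Because the mask and the parameters share one chromosome, crossover and mutation explore both at once, whereas a grid search typically fixes the feature set and scans only the parameter grid.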

Key words: support vector regression (SVR)| feature selection| semantic textual similarity (STS)
