Genetic Algorithm Based Feature Selection and Parameter Optimization for Support Vector Regression Applied to Semantic Textual Similarity

doi:10.1007/s12204-015-1602-2

Abstract

Semantic textual similarity (STS) is a common task in natural language processing (NLP) that measures the degree of semantic equivalence between two text snippets. Recently, machine learning methods, including those based on support vector regression (SVR), have been applied to this task. However, the learning process involves a large number of features, some of which are noisy and irrelevant to the result. Furthermore, the choice of parameters significantly influences the prediction performance of the SVR model. In this paper, we propose a genetic algorithm (GA) based approach that selects effective features and optimizes the parameters simultaneously during learning. We evaluate the proposed approach on the STS-2012 dataset. Compared with grid search, the proposed GA-based approach achieves better regression performance.
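As a rough illustration of how one chromosome can carry both a feature subset and the model parameters, the following minimal sketch encodes two parameters and a feature mask in a single bit string and evolves it with a simple GA. The abstract does not specify the encoding, the parameters being tuned, or the GA operators, so everything here is an assumption: the binary encoding, the choice of `C` and `gamma`, the search ranges, the tournament selection, and the names `decode`, `toy_fitness`, and `evolve` are hypothetical. The fitness function is a toy stand-in; in a real setting it would be replaced by a cross-validated SVR score on the selected features.

```python
import random

N_FEATURES = 8               # hypothetical number of candidate features
BITS = 6                     # bits per encoded parameter (assumption)
C_RANGE = (0.1, 100.0)       # assumed search range for C
GAMMA_RANGE = (0.001, 1.0)   # assumed search range for gamma


def decode(chrom):
    """Split a bit list into (C, gamma, feature mask)."""
    def to_real(bits, lo, hi):
        # Map the binary value linearly onto [lo, hi].
        value = int("".join(map(str, bits)), 2)
        return lo + (hi - lo) * value / (2 ** len(bits) - 1)

    c = to_real(chrom[:BITS], *C_RANGE)
    gamma = to_real(chrom[BITS:2 * BITS], *GAMMA_RANGE)
    mask = chrom[2 * BITS:]
    return c, gamma, mask


def toy_fitness(chrom):
    """Toy stand-in for a cross-validated SVR score (not a real model).

    Rewards selecting features 0, 2 and 5, penalizes the other features,
    and prefers parameters near arbitrary target values.
    """
    c, gamma, mask = decode(chrom)
    useful = {0, 2, 5}
    hits = sum(mask[i] for i in useful)
    noise = sum(mask) - hits
    return hits - 0.5 * noise - abs(c - 10.0) / 100.0 - abs(gamma - 0.1)


def evolve(fitness, length, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Simple GA: elitism, tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = ranked[:2]  # keep the two best individuals unchanged
        while len(next_pop) < pop_size:
            parent1 = max(rng.sample(pop, 3), key=fitness)
            parent2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, length)
            child = parent1[:cut] + parent2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


best = evolve(toy_fitness, 2 * BITS + N_FEATURES)
best_c, best_gamma, best_mask = decode(best)
print(best_c, best_gamma, best_mask)
```

Because the parameters and the feature mask sit in the same chromosome, each fitness evaluation scores one complete configuration. A grid search, by contrast, would typically tune the parameters for a fixed feature set.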

Key words: support vector regression (SVR)| feature selection| semantic textual similarity (STS)

CLC Number: TP 311.5

SU Bai-hua (苏柏桦), WANG Ying-lin (王英林). Genetic Algorithm Based Feature Selection and Parameter Optimization for Support Vector Regression Applied to Semantic Textual Similarity[J]. Journal of Shanghai Jiaotong University (Science), 2015, 20(2): 143-148.
