[1] MARON M E. Automatic indexing: An experimental inquiry [J]. Journal of the ACM, 1961, 8(3): 404-417.
[2] COVER T, HART P. Nearest neighbor pattern classification [J]. IEEE Transactions on Information Theory, 1967, 13(1): 21-27.
[3] JOACHIMS T. Text categorization with support vector machines: Learning with many relevant features [M]//Machine learning: ECML-98. Berlin, Heidelberg: Springer, 1998: 137-142.
[4] SCHNEIDER K M. A new feature selection score for multinomial naive Bayes text classification based on KL-divergence [C]// ACL Interactive Poster and Demonstration Sessions. Barcelona: ACL, 2004: 186-189.
[5] DAI W, XUE G R, YANG Q, et al. Transferring naive Bayes classifiers for text classification [C]// 22nd National Conference on Artificial Intelligence. Vancouver: AAAI, 2007: 540-545.
[6] CORTES C, VAPNIK V. Support-vector networks [J]. Machine Learning, 1995, 20(3): 273-297.
[7] JOACHIMS T. Transductive inference for text classification using support vector machines [C]// 16th International Conference on Machine Learning. Bled: IMLS, 1999: 200-209.
[8] LAI S W, XU L H, LIU K, et al. Recurrent convolutional neural networks for text classification [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2015, 29(1): 2267-2273.
[9] SUTSKEVER I, MARTENS J, HINTON G E. Generating text with recurrent neural networks [C]// 28th International Conference on Machine Learning. Bellevue: IMLS, 2011: 1017-1024.
[10] MANDIC D P, CHAMBERS J. Recurrent neural networks for prediction: Learning algorithms, architectures and stability [M]. Chichester: John Wiley & Sons, Inc., 2001.
[11] JIANG M Y, LIANG Y C, FENG X Y, et al. Text classification based on deep belief network and softmax regression [J]. Neural Computing and Applications, 2018, 29(1): 61-70.
[12] LEWIS M, LIU Y H, GOYAL N, et al. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension [C]// 58th Annual Meeting of the Association for Computational Linguistics. Online: ACL, 2020: 7871-7880.
[13] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training [EB/OL]. [2024-12-01]. https://www.mikecaptain.com/resources/pdf/GPT-1.pdf
[14] ZHANG Q, CHEN X. Applying BERT on the classification of Chinese legal documents [M]//Advances in Internet, data & web technologies. Cham: Springer, 2023: 215-222.
[15] WANG J, ZHANG J, HU B F. Optimal class-dependent discretization-based fine-grain hypernetworks for classification of microarray data [J]. Journal of Shanghai Jiao Tong University, 2013, 47(12): 1856-1862 (in Chinese).
[16] KOWSARI K, HEIDARYSAFA M, BROWN D E, et al. RMDL: Random multimodel deep learning for classification [C]// 2nd International Conference on Information System and Data Mining. Lakeland: ACM, 2018: 19-28.
[17] WU Y, JIANG M, XU J, et al. Clinical named entity recognition using deep learning models [C]// AMIA Annual Symposium Proceedings. Washington: AMIA, 2017: 1812-1819.
[18] MAGGE A, SCOTCH M, GONZALEZ-HERNANDEZ G. Clinical NER and relation extraction using Bi-Char-LSTMs and random forest classifiers [C]// 1st International Workshop on Medication and Adverse Drug Event Detection. Worcester: PMLR, 2018: 25-30.
[19] BAXTER J. A model of inductive bias learning [J]. Journal of Artificial Intelligence Research, 2000, 12: 149-198.
[20] THRUN S. Is learning the n-th thing any easier than learning the first? [C]// 9th International Conference on Neural Information Processing Systems. Denver: NIPS, 1995: 640-646.
[21] CARUANA R. Multitask learning [M]//Learning to learn. Boston: Springer, 1998: 95-133.
[22] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[23] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [DB/OL]. (2013-01-16). https://arxiv.org/abs/1301.3781
[24] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// 31st Conference on Neural Information Processing Systems. Long Beach: NIPS, 2017: 1-11.
[25] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding [C]// 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis: ACL, 2019: 4171-4186.
[26] DAI Z H, YANG Z L, YANG Y M, et al. Transformer-XL: Attentive language models beyond a fixed-length context [DB/OL]. (2019-01-09). https://arxiv.org/abs/1901.02860
[27] SUN Y, WANG S H, FENG S K, et al. ERNIE 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation [DB/OL]. (2021-07-05). https://arxiv.org/abs/2107.02137
[28] DAUPHIN Y N, FAN A, AULI M, et al. Language modeling with gated convolutional networks [C]// 34th International Conference on Machine Learning. Sydney: PMLR, 2017: 933-941.
[29] LAFFERTY J, MCCALLUM A, PEREIRA F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data [C]// 18th International Conference on Machine Learning. Williamstown: IMLS, 2001: 282-289.
[30] CHUNG J, GULCEHRE C, CHO K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling [DB/OL]. (2014-12-11). https://arxiv.org/abs/1412.3555