Towards textual data augmentation for neural networks: synonyms and maximum loss
DOI: https://doi.org/10.7494/csci.2019.20.1.3023

Keywords: deep learning, data augmentation, neural networks, natural language processing, sentence classification

Abstract
Data augmentation is one way of dealing with labeled-data scarcity and overfitting. Both problems are crucial for modern deep learning algorithms, which require massive amounts of data. The problem is better explored in the context of image analysis than for text; this work is a step toward closing that gap. We propose a method for augmenting textual data when training convolutional neural networks for sentence classification. The augmentation is based on the substitution of words using a thesaurus as well as the Princeton WordNet. Our method improves upon the baseline in almost all cases; in terms of accuracy, the best of the variants outperforms the baseline by 1.2 percentage points.
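The paper's own implementation is not reproduced on this page. As a rough illustration of the kind of WordNet-based synonym substitution the abstract describes, the following Python sketch replaces words with randomly chosen WordNet synonyms. The use of NLTK, the function names, and the replacement probability `p` are assumptions made for this example, not the authors' code.

```python
# Illustrative sketch of synonym substitution with the Princeton WordNet.
# NOT the authors' implementation: NLTK, the function names, and the
# replacement probability `p` are assumptions made for this example.
import random

from nltk.corpus import wordnet  # first run: nltk.download('wordnet')


def wordnet_synonyms(word):
    """Single-word WordNet synonyms of `word`, excluding the word itself."""
    lemmas = {l.name() for s in wordnet.synsets(word) for l in s.lemmas()}
    return [w for w in lemmas if w != word and "_" not in w]


def augment(sentence, p=0.2, rng=random):
    """Independently replace each word by a random synonym with probability `p`."""
    out = []
    for word in sentence.split():
        candidates = wordnet_synonyms(word.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    # Each call yields a slightly different paraphrase of the input.
    print(augment("the movie was surprisingly good"))
```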
References
Harvard Kim CNN implementation. github.com/harvardnlp/sent-conv-torch. Accessed July 19, 2018.
PyDictionary. pypi.org/project/PyDictionary/. Accessed July 19, 2018.
Thesaurus.com. www.thesaurus.com. Accessed July 19, 2018.
WordNet online. wordnet.princeton.edu. Accessed July 19, 2018.
Bojanowski P., Grave E., Joulin A., Mikolov T.: Enriching Word Vectors with Subword Information. In: Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2017. ISSN 2307-387X.
Bottou L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010, pp. 177–186. Springer, 2010.
Bottou L., Bousquet O.: The tradeoffs of large scale learning. In: Advances in neural information processing systems, pp. 161–168. 2008.
Ciregan D., Meier U., Schmidhuber J.: Multi-column deep neural networks for image classification. In: Computer vision and pattern recognition (CVPR), 2012 IEEE conference on, pp. 3642–3649. IEEE, 2012.
Cireşan D.C., Meier U., Masci J., Gambardella L.M., Schmidhuber J.: High-performance neural networks for visual object classification. In: arXiv preprint arXiv:1102.0183, 2011.
Collobert R., Weston J., Bottou L., Karlen M., Kavukcuoglu K., Kuksa P.: Natural language processing (almost) from scratch. In: Journal of Machine Learning Research, vol. 12(Aug), pp. 2493–2537, 2011.
Duchi J., Hazan E., Singer Y.: Adaptive subgradient methods for online learning and stochastic optimization. In: Journal of Machine Learning Research, vol. 12(Jul), pp. 2121–2159, 2011.
Fausett L.: Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Prentice-Hall, 1994.
Fawzi A., Samulowitz H., Turaga D., Frossard P.: Adaptive data augmentation for image classification. In: Image Processing (ICIP), 2016 IEEE International Conference on, pp. 3688–3692. IEEE, 2016.
Friedman J., Hastie T., Tibshirani R.: The Elements of Statistical Learning, vol. 1. Springer Series in Statistics, New York, 2001.
Goodfellow I., Bengio Y., Courville A.: Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
Joulin A., Grave E., Bojanowski P., Mikolov T.: Bag of tricks for efficient text classification. In: arXiv preprint arXiv:1607.01759, 2016.
Karlik B., Olgac A.V.: Performance analysis of various activation functions in generalized MLP architectures of neural networks. In: International Journal of Artificial Intelligence and Expert Systems, vol. 1(4), pp. 111–122, 2011.
Kim Y.: Convolutional neural networks for sentence classification. In: arXiv preprint arXiv:1408.5882, 2014.
Krizhevsky A., Sutskever I., Hinton G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp. 1097–1105. 2012.
Lebret R.P.: Word Embeddings for Natural Language Processing. Ph.D. thesis, École Polytechnique Fédérale de Lausanne, 2016.
LeCun Y.: Une procédure d'apprentissage pour réseau à seuil asymétrique (A learning procedure for an asymmetric threshold network). In: Proceedings of Cognitiva 85, pp. 599–604, 1985.
LeCun Y., Bengio Y., Hinton G.: Deep learning. In: Nature, vol. 521(7553), p. 436, 2015.
Li X., Roth D.: Learning question classifiers. In: Proceedings of the 19th international conference on Computational linguistics-Volume 1, pp. 1–7. Association for Computational Linguistics, 2002.
Manning C.D.: Computational linguistics and deep learning. In: Computational Linguistics, vol. 41(4), pp. 701–707, 2015.
Mikolov T., Chen K., Corrado G., Dean J.: Efficient Estimation of Word Representations in Vector Space. In: CoRR, vol. abs/1301.3781, 2013. URL http://arxiv.org/abs/1301.3781.
Mikolov T., Sutskever I., Chen K., Corrado G.S., Dean J.: Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp. 3111–3119. 2013.
Miller G.: WordNet: An electronic lexical database. MIT Press, 1998.
Miller G.A.: WordNet: a lexical database for English. In: Communications of the ACM, vol. 38(11), pp. 39–41, 1995.
Nair V., Hinton G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp. 807–814. 2010.
Parker D.B.: Learning Logic. Technical Report TR-47, Center for Computational Research in Economics and Management Science, MIT, 1985.
Paulin M., Revaud J., Harchaoui Z., Perronnin F., Schmid C.: Transformation pursuit for image classification. In: Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 3646–3653. IEEE, 2014.
Pennington J., Socher R., Manning C.: Glove: Global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543. 2014.
Ptaszyński M., Leliwa G., Piech M., Smywiński-Pohl A.: Cyberbullying Detection – Technical Report 2/2018, Department of Computer Science, AGH University of Science and Technology. In: arXiv preprint arXiv:1808.00926, 2018. URL http://arxiv.org/abs/1808.00926.
Quijas J.K.: Analysing the effects of data augmentation and free parameters for text classification with recurrent convolutional neural networks. The University of Texas at El Paso, 2017.
Ratner A.J., Ehrenberg H., Hussain Z., Dunnmon J., Ré C.: Learning to Compose Domain-Specific Transformations for Data Augmentation. In: Advances in Neural Information Processing Systems, pp. 3239–3249. 2017.
Rosario R.R.: A Data Augmentation Approach to Short Text Classification. Ph.D. thesis, University of California, Los Angeles, 2017.
Rumelhart D.E., Hinton G.E., Williams R.J.: Learning representations by back-propagating errors. In: Nature, vol. 323(6088), p. 533, 1986.
Simard P.Y., Steinkraus D., Platt J.C., et al.: Best practices for convolutional neural networks applied to visual document analysis. In: ICDAR, vol. 3, pp. 958–962. 2003.
Socher R., Lin C.C., Manning C., Ng A.Y.: Parsing natural scenes and natural language with recursive neural networks. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp. 129–136. 2011.
Srivastava N., Hinton G.E., Krizhevsky A., Sutskever I., Salakhutdinov R.: Dropout: a simple way to prevent neural networks from overfitting. In: Journal of Machine Learning Research, vol. 15(1), pp. 1929–1958, 2014.
Toutanova K., Klein D., Manning C.D., Singer Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, pp. 173–180. Association for Computational Linguistics, 2003.
Wan L., Zeiler M., Zhang S., Le Cun Y., Fergus R.: Regularization of neural networks using DropConnect. In: International Conference on Machine Learning, pp. 1058–1066. 2013.
Werbos P.: Beyond regression: new tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University, 1974.
Wong S.C., Gatt A., Stamatescu V., McDonnell M.D.: Understanding data augmentation for classification: when to warp? In: Digital Image Computing: Techniques and Applications (DICTA), 2016 International Conference on, pp. 1–6. IEEE, 2016.
Young T., Hazarika D., Poria S., Cambria E.: Recent trends in deep learning based natural language processing. In: arXiv preprint arXiv:1708.02709, 2017.
Zeiler M.D.: ADADELTA: an adaptive learning rate method. In: arXiv preprint arXiv:1212.5701, 2012.
Zhang X., Zhao J., LeCun Y.: Character-level convolutional networks for text classification. In: Advances in neural information processing systems, pp. 649–657. 2015.