Towards textual data augmentation for neural networks: synonyms and maximum loss


  • Michał Jungiewicz AGH University of Science and Technology
  • Aleksander Smywinski-Pohl AGH University of Science and Technology



deep learning, data augmentation, neural networks, natural language processing, sentence classification


Data augmentation is one way of dealing with labeled-data scarcity and overfitting. Both of these problems are crucial for modern deep learning algorithms, which require massive amounts of data. Data augmentation is better explored in the context of image analysis than for text, and this work is a step towards closing that gap. We propose a method for augmenting textual data when training convolutional neural networks for sentence classification. The augmentation is based on substituting words using a thesaurus as well as the Princeton WordNet. Our method improves upon the baseline in almost all cases. In terms of accuracy, the best variant is 1.2 percentage points better than the baseline.
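As a rough illustration of augmentation by word substitution, the sketch below creates extra labeled training sentences by replacing words with synonyms drawn from a small hand-written dictionary. The dictionary `TOY_THESAURUS`, the functions `augment_sentence` and `augment_dataset`, and the replacement probability `p` are hypothetical names chosen for this example. This is not the authors' implementation, and a real setup would look synonyms up in a thesaurus or in WordNet rather than in a toy table.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical toy thesaurus (assumption): stands in for a real thesaurus or WordNet lookup.
TOY_THESAURUS: Dict[str, List[str]] = {
    "good": ["fine", "decent"],
    "movie": ["film"],
    "boring": ["dull", "tedious"],
}


def augment_sentence(tokens: List[str], thesaurus: Dict[str, List[str]],
                     p: float = 0.5, rng: Optional[random.Random] = None) -> List[str]:
    """Return a new token list where each word that has synonyms is replaced with probability p."""
    if rng is None:
        rng = random.Random()
    out = []
    for tok in tokens:
        synonyms = thesaurus.get(tok.lower())
        if synonyms and rng.random() < p:
            out.append(rng.choice(synonyms))  # substitute a randomly chosen synonym
        else:
            out.append(tok)  # keep words without synonyms (or not selected) unchanged
    return out


def augment_dataset(dataset: List[Tuple[List[str], int]], thesaurus: Dict[str, List[str]],
                    copies: int = 2, p: float = 0.5,
                    seed: int = 0) -> List[Tuple[List[str], int]]:
    """Keep each original example and add `copies` synonym-substituted variants with the same label."""
    rng = random.Random(seed)
    augmented = []
    for tokens, label in dataset:
        augmented.append((tokens, label))
        for _ in range(copies):
            augmented.append((augment_sentence(tokens, thesaurus, p, rng), label))
    return augmented


if __name__ == "__main__":
    data = [(["a", "good", "movie"], 1), (["a", "boring", "movie"], 0)]
    for tokens, label in augment_dataset(data, TOY_THESAURUS, copies=2, p=1.0):
        print(label, " ".join(tokens))
```

Each augmented copy inherits the label of its source sentence, so the sketch assumes that substitution preserves meaning. Synonym lists that mix different word senses can break that assumption, which is one reason the choice of thesaurus matters.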




Harvard Kim CNN implementation. Accessed July 19, 2018.

PyDictionary. Accessed July 19, 2018.

WordNet online. Accessed July 19, 2018.

Bojanowski P., Grave E., Joulin A., Mikolov T.: Enriching Word Vectors with Subword Information. In: Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2017. ISSN 2307-387X.

Bottou L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010, pp. 177–186. Springer, 2010.

Bottou L., Bousquet O.: The tradeoffs of large scale learning. In: Advances in neural information processing systems, pp. 161–168. 2008.

Ciregan D., Meier U., Schmidhuber J.: Multi-column deep neural networks for image classification. In: Computer vision and pattern recognition (CVPR), 2012 IEEE conference on, pp. 3642–3649. IEEE, 2012.

Cireşan D.C., Meier U., Masci J., Gambardella L.M., Schmidhuber J.: High-performance neural networks for visual object classification. In: arXiv preprint arXiv:1102.0183, 2011.

Collobert R., Weston J., Bottou L., Karlen M., Kavukcuoglu K., Kuksa P.: Natural language processing (almost) from scratch. In: Journal of Machine Learning Research, vol. 12(Aug), pp. 2493–2537, 2011.

Duchi J., Hazan E., Singer Y.: Adaptive subgradient methods for online learning and stochastic optimization. In: Journal of Machine Learning Research, vol. 12(Jul), pp. 2121–2159, 2011.

Fausett L.: Fundamentals of neural networks: architectures, algorithms, and applications, 1994.

Fawzi A., Samulowitz H., Turaga D., Frossard P.: Adaptive data augmentation for image classification. In: Image Processing (ICIP), 2016 IEEE International Conference on, pp. 3688–3692. IEEE, 2016.

Friedman J., Hastie T., Tibshirani R.: The elements of statistical learning, vol. 1. Springer Series in Statistics, New York, 2001.

Goodfellow I., Bengio Y., Courville A.: Deep Learning. MIT Press, 2016. http: //

Joulin A., Grave E., Bojanowski P., Mikolov T.: Bag of tricks for efficient text classification. In: arXiv preprint arXiv:1607.01759, 2016.

Karlik B., Olgac A.V.: Performance analysis of various activation functions in generalized MLP architectures of neural networks. In: International Journal of Artificial Intelligence and Expert Systems, vol. 1(4), pp. 111–122, 2011.

Kim Y.: Convolutional neural networks for sentence classification. In: arXiv preprint arXiv:1408.5882, 2014.

Krizhevsky A., Sutskever I., Hinton G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp. 1097–1105. 2012.

Lebret R.P.: Word Embeddings for Natural Language Processing. Ph.D. thesis, École Polytechnique Fédérale de Lausanne, 2016.

LeCun Y.: Une procédure d’apprentissage pour réseau à seuil asymétrique. In: Proceedings of Cognitiva 85, pp. 599–604, 1985.

LeCun Y., Bengio Y., Hinton G.: Deep learning. In: Nature, vol. 521(7553), p. 436, 2015.

Li X., Roth D.: Learning question classifiers. In: Proceedings of the 19th international conference on Computational linguistics-Volume 1, pp. 1–7. Association for Computational Linguistics, 2002.

Manning C.D.: Computational linguistics and deep learning. In: Computational Linguistics, vol. 41(4), pp. 701–707, 2015.

Mikolov T., Chen K., Corrado G., Dean J.: Efficient Estimation of Word Representations in Vector Space. In: CoRR, vol. abs/1301.3781, 2013. URL

Mikolov T., Sutskever I., Chen K., Corrado G.S., Dean J.: Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp. 3111–3119. 2013.

Miller G.: WordNet: An electronic lexical database. MIT Press, 1998.

Miller G.A.: WordNet: a lexical database for English. In: Communications of the ACM, vol. 38(11), pp. 39–41, 1995.

Nair V., Hinton G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp. 807–814. 2010.

Parker D.B.: Learning logic. 1985.

Paulin M., Revaud J., Harchaoui Z., Perronnin F., Schmid C.: Transformation pursuit for image classification. In: Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 3646–3653. IEEE, 2014.

Pennington J., Socher R., Manning C.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543. 2014.

Ptaszyński M., Leliwa G., Piech M., Smywiński-Pohl A.: Cyberbullying Detection – Technical Report 2/2018, Department of Computer Science AGH, University of Science and Technology. In: arXiv preprint arXiv:1808.00926, 2018. URL

Quijas J.K.: Analysing the effects of data augmentation and free parameters for text classification with recurrent convolutional neural networks. The University of Texas at El Paso, 2017.

Ratner A.J., Ehrenberg H., Hussain Z., Dunnmon J., Ré C.: Learning to Compose Domain-Specific Transformations for Data Augmentation. In: Advances in Neural Information Processing Systems, pp. 3239–3249. 2017.

Rosario R.R.: A Data Augmentation Approach to Short Text Classification. Ph.D. thesis, University of California, Los Angeles, 2017.

Rumelhart D.E., Hinton G.E., Williams R.J.: Learning representations by backpropagating errors. In: Nature, vol. 323(6088), p. 533, 1986.

Simard P.Y., Steinkraus D., Platt J.C., et al.: Best practices for convolutional neural networks applied to visual document analysis. In: ICDAR, vol. 3, pp. 958–962. 2003.

Socher R., Lin C.C., Manning C., Ng A.Y.: Parsing natural scenes and natural language with recursive neural networks. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp. 129–136. 2011.

Srivastava N., Hinton G.E., Krizhevsky A., Sutskever I., Salakhutdinov R.: Dropout: a simple way to prevent neural networks from overfitting. In: Journal of machine learning research, vol. 15(1), pp. 1929–1958, 2014.

Toutanova K., Klein D., Manning C.D., Singer Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, pp. 173–180. Association for Computational Linguistics, 2003.

Wan L., Zeiler M., Zhang S., Le Cun Y., Fergus R.: Regularization of neural networks using DropConnect. In: International Conference on Machine Learning, pp. 1058–1066. 2013.

Werbos P.: Beyond regression: new tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University, 1974.

Wong S.C., Gatt A., Stamatescu V., McDonnell M.D.: Understanding data augmentation for classification: when to warp? In: Digital Image Computing: Techniques and Applications (DICTA), 2016 International Conference on, pp. 1–6. IEEE, 2016.

Young T., Hazarika D., Poria S., Cambria E.: Recent trends in deep learning based natural language processing. In: arXiv preprint arXiv:1708.02709, 2017.

Zeiler M.D.: ADADELTA: an adaptive learning rate method. In: arXiv preprint arXiv:1212.5701, 2012.

Zhang X., Zhao J., LeCun Y.: Character-level convolutional networks for text classification. In: Advances in neural information processing systems, pp. 649–





How to Cite

Jungiewicz, M., & Smywinski-Pohl, A. (2019). Towards textual data augmentation for neural networks: synonyms and maximum loss. Computer Science, 20(1).