Comparison of incomplete data handling techniques for neuro-fuzzy system

Authors

  • Marcin Sikora, Independent researcher
  • Krzysztof Simiński, Silesian University of Technology, Faculty of Automatic Control, Electronics and Computer Science

DOI:

https://doi.org/10.7494/csci.2014.15.4.441

Keywords:

incomplete data, marginalization, imputation, neuro-fuzzy system, ANNBFIS, PDS, IFCM, OCS, NPS

Abstract

Real-life data sets sometimes lack some values. Incomplete data require either specialized algorithms or preprocessing that makes algorithms designed for complete data applicable. The paper compares various techniques for handling incomplete data in the neuro-fuzzy system ANNBFIS. The crucial step in building a fuzzy model for a neuro-fuzzy system is the partition of the input domain. The most popular approach (also used in ANNBFIS) is clustering. The analyzed approaches to clustering incomplete data are preprocessing (marginalization and imputation) and specialized clustering algorithms (PDS, IFCM, OCS, NPS). The objective of our research is to compare the preprocessing techniques with the specialized clustering algorithms in order to find the most advantageous technique for handling incomplete data with a neuro-fuzzy system. This approach also serves as an indirect validation of the clustering.
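The two preprocessing techniques named in the abstract, and the idea behind PDS (commonly expanded as partial distance strategy), can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of NaN to mark missing values, the column-mean imputation rule, and the toy data are all assumptions made for illustration, and the paper's exact variants are not shown on this page.

```python
import math


def marginalize(rows):
    """Marginalization: drop every row that has at least one missing (NaN) value."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]


def impute_mean(rows):
    """Imputation (assumed variant): fill each gap with the mean of the known
    values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        known = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(sum(known) / len(known) if known else math.nan)
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(row)]
            for row in rows]


def partial_distance_sq(x, center):
    """Squared partial distance: sum over the known attributes of x only,
    rescaled by (number of attributes / number of known attributes)."""
    known = [(a, c) for a, c in zip(x, center) if not math.isnan(a)]
    if not known:
        raise ValueError("data item has no known attributes")
    return len(x) / len(known) * sum((a - c) ** 2 for a, c in known)


# Hypothetical toy data set: two attributes, two incomplete items.
data = [[1.0, 2.0], [math.nan, 4.0], [3.0, math.nan], [5.0, 6.0]]

complete_only = marginalize(data)   # keeps only the two complete items
filled = impute_mean(data)          # gaps replaced by column means 3.0 and 4.0
dist = partial_distance_sq([math.nan, 4.0], [1.0, 1.0])  # uses the 2nd attribute only
```

Marginalization shrinks the data set, imputation keeps its size but introduces estimated values, and a partial distance lets a clustering algorithm use incomplete items directly without altering the data.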

Published

2014-11-25

How to Cite

Sikora, M., & Simiński, K. (2014). Comparison of incomplete data handling techniques for neuro-fuzzy system. Computer Science, 15(4), 441. https://doi.org/10.7494/csci.2014.15.4.441

Section

Articles