• Che Ngufor
• Janusz Wojtusiak




The task of identifying changes and irregularities in medical insurance claim payments is difficult; the traditional practice involves querying historical claims databases and flagging potential claims as normal or abnormal. Because what is considered a normal payment is usually unknown and may change over time, abnormal payments often pass undetected, only to be discovered after the payment period has passed.

This paper presents the problem of on-line unsupervised learning from data streams when the distribution that generates the data changes or drifts over time. Automated algorithms for detecting drifting concepts in a probability distribution of the data are presented. The idea behind the presented drift detection methods is to transform the distribution of the data within a sliding window into a more convenient distribution. Then, the p-value of a test statistic at a given significance level can be used to infer the drift rate, adjust the window size, and decide on the status of the drift. The detected concept drifts are used to label the data for subsequent learning of classification models by a supervised learner. The algorithms were tested on several synthetic and real medical claims data sets.
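The sliding-window idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a fixed window size (no adaptive resizing), swaps in a plain two-sample z-test on window means as the "more convenient distribution", and uses hypothetical names (`p_value`, `label_stream`, `window`, `alpha`) chosen only for illustration. Each detected drift starts a new concept label, which could then serve as a class label for a supervised learner.

```python
import math
from statistics import mean, variance


def p_value(reference, recent):
    """Two-sided p-value for a difference in means (normal approximation).

    Assumption: window means are treated as approximately normal, so the
    standardized difference can be compared with a standard normal.
    """
    se = math.sqrt(variance(reference) / len(reference)
                   + variance(recent) / len(recent))
    if se == 0.0:
        # Degenerate case: both windows are constant.
        return 1.0 if mean(reference) == mean(recent) else 0.0
    z = (mean(recent) - mean(reference)) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def label_stream(stream, window=30, alpha=0.001):
    """Label each point with a concept id; a detected drift starts a new id."""
    labels, concept = [], 0
    reference = []  # older points attributed to the current concept
    recent = []     # sliding window holding the newest points
    for x in stream:
        recent.append(x)
        if len(recent) > window:
            # The oldest point leaves the sliding window and joins the reference.
            reference.append(recent.pop(0))
        if len(reference) >= window and p_value(reference, recent) < alpha:
            # Drift detected: open a new concept and forget the old data.
            concept += 1
            reference, recent = [], []
        labels.append(concept)
    return labels


# Synthetic payment stream with one abrupt change in level.
payments = ([100.0 + (i % 5) for i in range(200)]
            + [150.0 + (i % 5) for i in range(200)])
concept_labels = label_stream(payments)
print("distinct concepts found:", len(set(concept_labels)))
```

Because the test is repeated at every step, a small `alpha` is used here to limit false alarms; a fuller treatment would adjust the window size and the significance level as the abstract describes.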








How to Cite

Ngufor, C., & Wojtusiak, J. (2013). UNSUPERVISED LABELING OF DATA FOR SUPERVISED LEARNING AND ITS APPLICATION TO MEDICAL CLAIMS PREDICTION. Computer Science, 14(2), 191. https://doi.org/10.7494/csci.2013.14.2.191