Machine learning models for predicting patients survival after liver transplantation

Wojciech Jarmulski; Alicja Wieczorkowska; Mariusz Trzaska; Michal Ciszek; Leszek Paczek

doi:10.7494/csci.2018.19.2.2746

Authors

Wojciech Jarmulski, Polish-Japanese Academy of Information Technology, http://orcid.org/0000-0003-3508-4606
Alicja Wieczorkowska, Polish-Japanese Academy of Information Technology
Mariusz Trzaska, Polish-Japanese Academy of Information Technology
Michal Ciszek, Medical University of Warsaw
Leszek Paczek, Medical University of Warsaw

Keywords:

machine learning, models interpretability, survival prediction, generalized additive models, liver transplantation

Abstract

We have built models that predict whether a patient will lose the transplanted organ within a specified time horizon after liver transplantation. We used observations of bilirubin and creatinine from the entire first year after transplantation to derive predictors that capture not only the static values of these markers but also their variability. Our models do have predictive power, which demonstrates the value of incorporating the variability of biochemical measurements; this is the first contribution of our paper.
The second contribution is the selection of the best model for the defined problem. We found that full-complexity models, such as random forests and gradient boosting, have the best predictive power but lack the interpretability that is important in medicine. Generalized additive models (GAM) provide the desired interpretability, and their predictive power is closer to that of full-complexity models than to that of simple linear models.
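
To make the idea of variability-aware predictors more concrete, the following is a minimal sketch of how static and variability features might be derived from one patient's first-year bilirubin and creatinine measurements. The abstract does not state which statistics the authors used, so the chosen summaries (last value, mean, standard deviation, coefficient of variation, range), the function names, and the example values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def summarize_series(values):
    """Summarize one marker's measurements for a single patient.

    The statistics below are illustrative assumptions, not the
    predictors used in the paper.
    """
    if len(values) < 2:
        raise ValueError("need at least two measurements to estimate variability")
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation
    return {
        "last": values[-1],                       # static value
        "mean": avg,                              # static level
        "std": sd,                                # absolute variability
        "cv": sd / avg if avg else float("nan"),  # relative variability
        "range": max(values) - min(values),       # spread
    }


def build_predictors(patient):
    """Flatten per-marker summaries into one feature dictionary."""
    features = {}
    for marker, values in patient.items():
        for name, value in summarize_series(values).items():
            features[f"{marker}_{name}"] = value
    return features


# Made-up measurements (hypothetical values, for illustration only).
example_patient = {
    "bilirubin": [1.0, 1.2, 0.8, 1.0],
    "creatinine": [1.1, 1.1, 1.1, 1.1],
}
predictors = build_predictors(example_patient)
```

A feature dictionary of this kind, computed for each patient, could then serve as the input row for any of the model families compared in the paper, whether linear models, GAMs, random forests, or gradient boosting.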
