Predicting quality of life using machine learning on community health data

Missing data is a common issue in community health and epidemiological studies. Directly removing samples with missing data reduces the sample size and can introduce information bias, which weakens the significance of the results. Data imputation methods are available to deal with missing data, but they are limited in performance and can introduce noise into the dataset. Instead of data imputation, this paper proposes a novel method based on the additive least squares support vector machine (LS-SVM) for predictive modeling when the input features of the model contain missing data. Using a fast leave-one-out cross-validation strategy, the method simultaneously determines the influence of the features with missing values on the classification accuracy. The performance of the method is evaluated by applying it to predict the quality of life (QOL) of elderly people from health data collected in the community.
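To make the two ingredients named above more concrete, the following is a minimal sketch of a generic LS-SVM classifier with an additive kernel, together with the standard closed-form leave-one-out shortcut for LS-SVMs. It is not the formulation from the paper: the function names, the per-feature RBF kernels, the choice to let a missing value (`None`) contribute nothing to the kernel, and the parameters `gamma` and `width` are all assumptions made for illustration, and the sketch does not estimate the influence of features with missing values.

```python
import math

def additive_kernel(a, b, width=1.0):
    """Sum of per-feature RBF kernels. A feature that is missing (None) in
    either sample contributes nothing: an assumption of this sketch."""
    total = 0.0
    for u, v in zip(a, b):
        if u is not None and v is not None:
            total += math.exp(-((u - v) ** 2) / (2.0 * width ** 2))
    return total

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * z for x, z in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_ls_svm(X, y, gamma=10.0, width=1.0):
    """Solve [[K + I/gamma, 1], [1^T, 0]] [alpha; b] = [y; 0]."""
    n = len(X)
    system = [[additive_kernel(X[i], X[j], width) + (1.0 / gamma if i == j else 0.0)
               for j in range(n)] + [1.0] for i in range(n)]
    system.append([1.0] * n + [0.0])
    inverse = invert(system)
    rhs = list(y) + [0.0]
    solution = [sum(inverse[i][j] * rhs[j] for j in range(n + 1))
                for i in range(n + 1)]
    return solution[:n], solution[n], inverse

def decision_value(x, X, alpha, b, width=1.0):
    """f(x) = sum_i alpha_i * k(x, x_i) + b; the predicted class is its sign."""
    return sum(a * additive_kernel(x, xi, width) for a, xi in zip(alpha, X)) + b

def fast_loo_outputs(y, alpha, inverse):
    """Leave-one-out outputs without refitting:
    f_loo_i = y_i - alpha_i / inverse[i][i]."""
    return [yi - ai / inverse[i][i] for i, (yi, ai) in enumerate(zip(y, alpha))]

if __name__ == "__main__":
    # Toy data with labels in {-1, +1}; None marks a missing feature value.
    X = [[0.1, 1.0], [0.4, None], [0.3, 0.8],
         [1.5, -0.9], [1.2, None], [None, -1.1]]
    y = [1, 1, 1, -1, -1, -1]
    alpha, b, inverse = fit_ls_svm(X, y)
    loo = fast_loo_outputs(y, alpha, inverse)
    accuracy = sum(1 for yi, fi in zip(y, loo) if yi * fi > 0) / len(y)
    print("leave-one-out accuracy:", accuracy)
```

The shortcut works because training reduces to a single linear system, so the output for a held-out sample can be recovered from the inverse of the full system matrix rather than by retraining once per sample. The sketch uses only the standard library to stay self-contained; a numerical library would normally replace the hand-written inverse.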

Publications:

  • G. Wang, Z. Deng, K.S. Choi. Tackling missing data in community health studies using additive LS-SVM classifier. IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 2, pp. 579-587, March 2018 (available online 1 Dec 2016). [doi]