A COMPARATIVE STUDY OF BIG DATA MINING ALGORITHMS FOR EARLY DETECTION OF HEART ATTACK RISK FACTORS IN ELECTRONIC MEDICAL RECORDS
Keywords:
Heart Attack Risk Prediction, Big Data Mining, Random Forest, Support Vector Machine, Electronic Medical Records, Machine Learning in Healthcare, Cardiovascular Disease, Early Detection
Abstract
Early detection of heart attack risk factors is critical for reducing mortality from cardiovascular disease. This study presents a comparative analysis of five big data mining algorithms (Decision Tree, Random Forest, Support Vector Machine (SVM), Artificial Neural Network (ANN), and K-Nearest Neighbors (KNN)) applied to Electronic Medical Records (EMRs) to predict heart attack risk. Random Forest outperformed the other models, achieving the highest accuracy, precision, recall, and F1-score. SVM also performed strongly, while KNN lagged behind in both accuracy and prediction efficiency. The findings suggest that advanced machine learning models, particularly Random Forest, offer significant potential for improving early detection and supporting healthcare decision-making.
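The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes a binary labelling (1 = at risk, 0 = not at risk), which the abstract does not state explicitly. The `binary_metrics` helper, the `Metrics` container, and the toy labels are hypothetical names and values introduced only for illustration; they are not the study's code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumes 1 marks the positive class (at risk) and 0 the negative class.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no positive predictions or labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return Metrics(accuracy, precision, recall, f1)


# Toy labels invented for illustration only (not data from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "model_b": [1, 1, 0, 1, 0, 1, 1, 0],
}

for name, y_pred in predictions.items():
    print(name, binary_metrics(y_true, y_pred))
```

In a screening setting, recall is often weighted heavily because a missed at-risk patient (a false negative) is usually costlier than a false alarm. Reporting all four metrics side by side, as the abstract does, makes that trade-off visible when comparing models.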