A COMPARATIVE STUDY OF BIG DATA MINING ALGORITHMS FOR EARLY DETECTION OF HEART ATTACK RISK FACTORS IN ELECTRONIC MEDICAL RECORDS

Authors

  • Ramesh Krishnamaneni, Independent Researcher, USA
  • Ashwin Narasimha Murthy, Independent Researcher, USA
  • Souptik Sen, Independent Researcher, USA

Keywords:

Heart Attack Risk Prediction, Big Data Mining, Random Forest, Support Vector Machine, Electronic Medical Records, Machine Learning in Healthcare, Cardiovascular Disease, Early Detection

Abstract

Early detection of heart attack risk factors is critical for reducing mortality from cardiovascular disease. This study compares five big data mining algorithms for predicting heart attack risk from Electronic Medical Records (EMRs): Decision Tree, Random Forest, Support Vector Machine (SVM), Artificial Neural Network (ANN), and K-Nearest Neighbors (KNN). Random Forest outperformed the other models, achieving the highest accuracy, precision, recall, and F1-score. SVM also performed strongly, while KNN lagged behind in both accuracy and prediction efficiency. The findings suggest that advanced machine learning models, particularly Random Forest, offer significant potential for improving early detection and supporting healthcare decision-making.
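The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes a binary labelling (1 = high risk, 0 = low risk) and uses made-up labels; the function name `binary_metrics`, the `BinaryMetrics` container, and the sample data are hypothetical and are not taken from the study, which does not describe its evaluation code here.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no positive predictions
    # or no positive cases); this is a convention chosen for the sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Made-up labels for illustration only (1 = high risk, 0 = low risk).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
m = binary_metrics(y_true, y_pred)
print(m)
```

In a comparison like the one described, each of the five models would be scored this way on the same held-out records. Recall is usually the metric to watch most closely in a screening setting, since a false negative corresponds to a missed high-risk patient.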

Published

2019-12-24