IDENTIFYING OUTLIERS IN POPULATION COHORTS: A BENCHMARK SCORE APPROACH WITH ML
Keywords:
Outlier Detection, ML Hyperparameter Tuning
Abstract
This paper presents a novel framework for detecting outlier groups based on group-specific predictions. A machine learning model is trained, and predictions are generated for each test group using control and no-control techniques. Each group's predictions are compared with its actual outcomes, and groups with significant discrepancies are flagged as outliers. The framework uses hyperparameter tuning to optimize model performance, enabling robust, automated detection of outlier groups.
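The predict-compare-flag loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a one-feature least-squares line stands in for the tuned ML model, the per-group score is the mean of (actual - predicted), and "significant discrepancy" is interpreted as a robust z-score (median and MAD across groups) above a hypothetical cutoff of 3.5. The control and no-control prediction techniques and the hyperparameter tuning step are not covered. The function names and toy data are hypothetical.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (an assumed stand-in for the ML model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def group_gaps(model, test_rows):
    """Mean of (actual - predicted) per group; test_rows holds (group, x, y)."""
    a, b = model
    residuals = {}
    for group, x, y in test_rows:
        residuals.setdefault(group, []).append(y - (a + b * x))
    return {g: mean(r) for g, r in residuals.items()}


def flag_outlier_groups(gaps, threshold=3.5):
    """Flag groups whose gap has a robust z-score above the (assumed) threshold."""
    values = list(gaps.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread across groups, so nothing can be ranked
    return sorted(
        g for g, v in gaps.items() if 0.6745 * abs(v - med) / mad > threshold
    )


# Toy training data, roughly y = 2x (hypothetical values).
train_x = [1, 2, 3, 4, 5, 6]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

# Toy test groups; group "E" is built to sit well above the model's predictions.
test_rows = [
    ("A", 2, 4.1), ("A", 4, 8.0),
    ("B", 1, 1.9), ("B", 5, 10.0),
    ("C", 3, 6.1), ("C", 6, 12.1),
    ("D", 2, 3.8), ("D", 5, 9.8),
    ("E", 3, 9.5), ("E", 4, 12.0),
]

model = fit_line(train_x, train_y)
gaps = group_gaps(model, test_rows)
flagged = flag_outlier_groups(gaps)
print(flagged)
```

On this toy data only group "E" is flagged. The median/MAD score is used here because a single extreme group inflates a plain standard deviation and can mask itself; any other significance rule could be substituted in `flag_outlier_groups` without changing the rest of the loop.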
License
Copyright (c) 2024 Vibhu Verma (Author)
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.