IDENTIFYING OUTLIERS IN POPULATION COHORTS: A BENCHMARK SCORE APPROACH WITH ML

Vibhu Verma

Authors

Vibhu Verma, Principal Data Scientist, GWU, Capital One, NY, USA

Keywords:

Outlier Detection, ML Hyperparameter Tuning

Abstract

This paper presents a novel framework for detecting outlier groups based on group-specific predictions. A machine learning model is trained, and predictions are made for test groups using control and no-control techniques. Each group's predictions are compared with its actual outcomes, and groups with significant discrepancies are flagged as outliers. The framework uses ML hyperparameter tuning to optimize model performance, enabling robust, automated detection of outlier groups.
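The predict-compare-flag loop described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: closed-form ridge regression stands in for the trained model, a small holdout grid search stands in for hyperparameter tuning, a robust z-score (median and MAD across groups) stands in for the test of "significant discrepancy", and the data are synthetic. The control and no-control techniques are not modeled. The function names (fit_ridge, tune_lambda, flag_outlier_groups) are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict(w, X):
    return np.column_stack([np.ones(len(X)), X]) @ w

def tune_lambda(X, y, grid, n_val):
    """Pick the penalty with the lowest squared error on a held-out tail."""
    X_tr, y_tr = X[:-n_val], y[:-n_val]
    X_val, y_val = X[-n_val:], y[-n_val:]
    errors = [np.mean((predict(fit_ridge(X_tr, y_tr, lam), X_val) - y_val) ** 2)
              for lam in grid]
    return grid[int(np.argmin(errors))]

def flag_outlier_groups(w, X, y, groups, threshold=3.0):
    """Flag groups whose mean (actual - predicted) gap is extreme.

    The robust z-score below is an assumed criterion, not the paper's.
    """
    residuals = y - predict(w, X)
    labels = np.unique(groups)
    gaps = np.array([residuals[groups == g].mean() for g in labels])
    med = np.median(gaps)
    mad = max(np.median(np.abs(gaps - med)), 1e-12)  # guard against zero MAD
    scores = 0.6745 * (gaps - med) / mad
    return {g: s for g, s in zip(labels.tolist(), scores.tolist())
            if abs(s) > threshold}

# Synthetic data (assumption): six cohorts, one with shifted outcomes.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
X_train = rng.normal(size=(500, 3))
y_train = 1.0 + X_train @ true_w + rng.normal(scale=0.5, size=500)

X_test = rng.normal(size=(600, 3))
groups = np.repeat(np.arange(6), 100)
y_test = 1.0 + X_test @ true_w + rng.normal(scale=0.5, size=600)
y_test[groups == 4] += 3.0  # inject an outlier cohort

best_lam = tune_lambda(X_train, y_train, [0.01, 0.1, 1.0, 10.0], n_val=100)
w = fit_ridge(X_train, y_train, best_lam)
flagged = flag_outlier_groups(w, X_test, y_test, groups)
print(best_lam, flagged)
```

Scoring each group against the median and MAD of all group gaps, rather than the mean and standard deviation, keeps a single extreme cohort from inflating the scale it is judged against. A tree-based model or a different discrepancy test could be substituted without changing the overall structure.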

