IMPROVING REAL-TIME JOB MONITORING FOR DATA PIPELINES IN THE CONTEXT OF CLOUD ANALYTICS

Authors

  • Prakash Somasundaram, Lead Software Engineer, Alteryx, Inc., North America

Keywords:

Data Pipelines, Job Monitoring, Cloud Analytics, Multi-cloud, Resource Allocation, Real-time Monitoring, Scalability

Abstract

Technology has driven a digital revolution, flooding today's data-driven society with digital information. Since 2002, mobile phones, IoT sensors, remote sensing technologies, and other devices have fueled massive growth in digital content. Today, 94% of information is digital, up from 1% decades ago. This flood of data amounts to an estimated 1.7 MB generated per person per second. Between 2020 and 2025, International Data Corporation predicts global data will grow from 44 to 163 zettabytes. Effective processes are needed to analyze this growing deluge and draw conclusions from it. Cloud analytics processes and analyzes this sea of data using cloud computing and related technologies, and it relies on data pipelines to move and process data efficiently from multiple sources to storage or processing systems. However, the data explosion presents many challenges: massive and diverse data strain traditional systems. The sheer volume of data, together with the urgent need for reliable control and analysis systems, came to be called "big data." Keeping cloud analytics data pipelines efficient requires real-time job monitoring. To improve real-time job monitoring, this paper discusses automated resource allocation, predictive maintenance using machine learning, scalability, SLA compliance, user-centric interfaces, error handling, security protocols, cost-effectiveness, RMaaS, edge and IoT data pipeline monitoring, and benchmarking of monitoring solutions. The research emphasizes visibility, early issue identification, and performance optimization to improve cloud analytics operations.

Published

2023-09-21