AWS STEP FUNCTIONS DISTRIBUTED MAP: A COMPREHENSIVE FRAMEWORK FOR SCALABLE DATA PROCESSING

Authors

  • Venkata Reddy Mulam, Jawaharlal Nehru Technological University, Kakinada, India

Keywords:

Serverless Workflow Orchestration, Distributed Map Processing, Cloud-Native State Machines, AWS Step Functions Architecture, Parallel Data Processing Optimization

Abstract

This article presents a comprehensive analysis of AWS Step Functions' capabilities in distributed data processing, with particular emphasis on the Distributed Map pattern for parallel processing implementations. Through rigorous examination of real-world deployments, including a large-scale genomic data processing system handling 10 petabytes monthly and a real-time financial transaction monitoring system processing over 1 million transactions per hour, we demonstrate the platform's effectiveness in managing complex distributed workflows. The article provides detailed insights into performance optimization techniques, achieving sub-second response times for 95% of transactions and sustained processing rates of 500,000 records per second. Our analysis reveals that optimized workflow designs can reduce execution costs by up to 40% compared to naive implementations while maintaining high reliability and scalability. The article encompasses architectural patterns, error-handling strategies, and operational best practices, supported by extensive performance metrics and cost analyses. Key contributions include a framework for implementing efficient state machine structures, guidelines for monitoring and observability, and validated patterns for handling large-scale data processing scenarios. These findings offer valuable insights for organizations seeking to implement robust distributed processing solutions using AWS Step Functions, while highlighting opportunities for future advancement in cloud-native workflow orchestration.

Published

2024-11-08

How to Cite

Venkata Reddy Mulam. (2024). AWS STEP FUNCTIONS DISTRIBUTED MAP: A COMPREHENSIVE FRAMEWORK FOR SCALABLE DATA PROCESSING. INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING AND TECHNOLOGY (IJCET), 15(6), 169-176. https://mylib.in/index.php/IJCET/article/view/IJCET_15_06_014