ADVANCING DATA LINEAGE ACCURACY WITH GENERATIVE AI: NEW TECHNIQUES AND TOOLS

Ankush Reddy Sugureddy

Authors

Ankush Reddy Sugureddy, Lead Engineer, Data Insights, Cloudflare Inc, Dallas, USA.

Keywords:

Generative AI, Data Lineage, AI

Abstract

Open-source AI code and models are growing in popularity, particularly among smaller firms, research institutions, and individual users, even as large technology companies increasingly dominate the growing market for generative AI. These smaller players face two obstacles: they have limited computational resources, and data-protection concerns often prevent them from disclosing high-quality data that could be used for training. One possible solution to both issues is to train generative AI using crowdsourcing principles and federated learning techniques, building a distributed architecture that protects privacy. This paper discusses how these two enablers, in conjunction with other emerging technologies, might be combined to form a community-driven ecosystem for generative AI, in which even minor players can safely contribute training data for generative AI models. In addition to outlining future research directions in AI moderation, the paper discusses relevant non-technical issues, such as community responsibility and intellectual property rights.

