DIGITIZED DOCUMENT CLASSIFICATION USING MACHINE AND DEEP LEARNING: A SURVEY
Keywords:
Data Mining, Natural Language Processing, Classifier, Text Classification, Machine Learning, Deep Learning
Abstract
Digital transformation has led to the widespread application of artificial intelligence (AI) to reduce human-induced errors in many of the systems used in daily life. The rapid expansion of the World Wide Web has made it impractical for humans to classify information manually, which has driven the development of methods based on data mining, natural language processing, and machine learning for the automatic classification of textual documents. Because information is now abundant and comes from many sources, classification tasks have become increasingly important, and automated text classification is one important way to handle and process large numbers of digital documents. This survey describes the steps involved in text classification and the different classifiers used for it. It also evaluates and compares currently available classifiers on several criteria, including performance and time complexity.