Affiliation:
1. Jawaharlal Nehru Technological University
2. GITAM University
Abstract
In recent years, the segmentation and recognition of multilingual text have attracted the attention of many researchers. This work presents a multilingual Optical Character Recognition (OCR) system that uses PyTesseract, OpenCV, and Recurrent Neural Networks (RNNs) to convert text in English, Telugu, Hindi, Tamil, and Kannada into digital form. Digitizing text in this way eases communication and supports cultural understanding. PyTesseract and OpenCV are used for character recognition, while the RNN improves language understanding. To maintain accuracy, the system applies techniques that address noise and distortion in the input data; combined with OCR algorithms, this makes text recognition adaptable to multilingual environments. The study highlights the importance of multilingual OCR in preserving languages, supporting international cooperation, and encouraging participation in the digital age. It explores ways to handle cross-language differences in grammar, fonts, and document layouts, using previously implemented techniques to create informative content. The RNN further improves the OCR process by capturing complex words. A user-friendly interface and integration with various platforms increase accessibility, allowing users to engage easily with multilingual content. In summary, multilingual OCR that combines PyTesseract, OpenCV, RNNs, and other techniques is used to overcome language barriers and to handle varied grammars and input data, with a positive impact on the development of OCR technology. This research contributes to a globally connected society in which knowledge is transmitted across language boundaries, fostering cultural exchange and growth while ensuring accurate understanding of written material.
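The OpenCV and PyTesseract stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `tesseract_lang_arg` and `recognize` are hypothetical, the median blur and Otsu thresholding are assumed stand-ins for the unspecified noise-handling techniques, and the RNN stage is omitted entirely. The sketch also assumes the Tesseract language data for all five languages is installed.

```python
# Tesseract language codes for the five languages named in the abstract.
TESSERACT_CODES = {
    "English": "eng",
    "Telugu": "tel",
    "Hindi": "hin",
    "Tamil": "tam",
    "Kannada": "kan",
}


def tesseract_lang_arg(languages):
    """Join language names into Tesseract's 'eng+tel+...' form (hypothetical helper)."""
    return "+".join(TESSERACT_CODES[name] for name in languages)


def recognize(image_path, languages):
    """Denoise and binarize an image, then run multilingual OCR on it."""
    # Imported lazily so the helper above works without OpenCV or Tesseract.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)

    # Assumed preprocessing: grayscale, median blur, Otsu binarization.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    denoised = cv2.medianBlur(gray, 3)
    _, binary = cv2.threshold(
        denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )

    # A single pass over all requested languages at once.
    return pytesseract.image_to_string(binary, lang=tesseract_lang_arg(languages))
```

Calling `recognize("page.png", ["English", "Telugu"])` on a hypothetical scan would pass `lang="eng+tel"` to Tesseract, so both scripts are considered in a single recognition pass.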