{"id":15157,"date":"2021-08-30T13:28:08","date_gmt":"2021-08-30T13:28:08","guid":{"rendered":"https:\/\/newver.innotech-vn.com\/?p=12469"},"modified":"2023-03-15T16:24:23","modified_gmt":"2023-03-15T16:24:23","slug":"all-you-need-to-know-about-ocr-in-2021","status":"publish","type":"post","link":"https:\/\/newver.innotech-vn.com\/vie\/all-you-need-to-know-about-ocr-in-2021\/","title":{"rendered":"All you need to know about OCR in 2021"},"content":{"rendered":"
OCR (Optical Character Recognition) is a technology that identifies characters from printed books, handwritten papers, or images. With this technology, businesses and users can rapidly transfer documents into their digital systems, and data analysis tools can process the relevant data.<\/p>\n
In 2021, OCR still delivers outstanding results only in particular use cases. In most practical applications, it remains far below human-level accuracy. Modern OCR applications perform especially poorly on low-quality images, on less common scripts and typefaces such as rarely used Arabic fonts, and on handwriting, particularly cursive.<\/p>\n
<\/p>\n
<\/p>\n
Using computer vision, OCR first detects characters one by one. It then uses image classification to identify each character. If both steps succeed, OCR outputs accurate results. However, characters that sit too close together can be hard to separate and may not be recognized. OCR therefore requires more than computer vision alone.<\/p>\n
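<p>The detection step can be illustrated with a minimal sketch. It assumes a binarized image stored as a list of rows of 0 and 1 and splits it wherever a column contains no ink; the function name segment_columns and the tiny sample images are hypothetical, and real OCR engines use far more robust detection. The classification step is left out.<\/p>\n

```python
def segment_columns(image):
    '''Split a binary image (rows of 0/1) into column ranges separated by blank columns.'''
    width = len(image[0])
    # A column has ink if any row holds a 1 at that position.
    ink = [any(row[x] for row in image) for x in range(width)]
    segments, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                      # a new character candidate begins
        elif not has_ink and start is not None:
            segments.append((start, x))    # a blank column ends the candidate
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


# Two toy glyphs with a blank column between them (illustrative data only).
separated = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
]
# The same glyphs touching: no blank column remains.
touching = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
]

print(segment_columns(separated))
print(segment_columns(touching))
```

<p>With the gap present, the sketch reports two character regions. Once the glyphs touch, it reports a single region, which is the failure case described above.<\/p>\n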
Although OCR identifies individual characters, those characters form words, sentences and paragraphs. Research in NLP has produced numerous algorithms that use probabilistic approaches to correct mistakes in character recognition. For example, missing characters can be estimated from context.<\/p>\n
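<p>As a rough illustration of estimating missing characters, the sketch below picks the most frequent known word that matches a partially recognized one. The word counts, the fill_missing name and the question mark used for unread characters are all assumptions made for this example; production systems rely on much richer language models.<\/p>\n

```python
from collections import Counter

# Hypothetical word frequencies standing in for a real language model.
WORD_COUNTS = Counter({'invoice': 50, 'involve': 20, 'total': 40, 'hotel': 10})


def fill_missing(word, counts=WORD_COUNTS, unknown='?'):
    '''Return the most frequent known word matching word; unknown marks unread characters.'''
    candidates = [
        w for w in counts
        if len(w) == len(word)
        and all(a == unknown or a == b for a, b in zip(word, w))
    ]
    if not candidates:
        return word  # nothing matches, so keep the raw OCR output
    return max(candidates, key=lambda w: counts[w])


print(fill_missing('invo?ce'))
print(fill_missing('invo??e'))
```

<p>When several words fit the pattern, as with the second call, the more frequent word wins. This is the simplest form of the probabilistic correction mentioned above.<\/p>\n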