This project is a Python pipeline that uses Optical Character Recognition (OCR) to extract text and structured data from scanned PDF documents. It processes each page, cleans the recognized text, ...
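The page-by-page flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `clean_text`, `process_pages`, and the `recognize` callback are hypothetical names, and the specific cleaning rules (Unicode normalization, rejoining hyphenated words, collapsing whitespace) are assumptions about what "cleaning" might involve.

```python
import re
import unicodedata
from typing import Callable, Iterable, List


def clean_text(raw: str) -> str:
    """Tidy raw OCR output (hypothetical cleaning rules, for illustration)."""
    # Fold compatibility characters such as ligatures and full-width forms.
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words split by a hyphen at a line break, e.g. "recog-\nnition".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim each line and drop the empty ones.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def process_pages(
    pages: Iterable[object],
    recognize: Callable[[object], str],
) -> List[str]:
    """Run the OCR callback on every page and clean each result.

    `recognize` stands in for whatever OCR engine the pipeline uses;
    it takes one page (image, path, ...) and returns the raw text.
    """
    return [clean_text(recognize(page)) for page in pages]
```

Passing the recognizer in as a callback keeps the cleaning step testable without an OCR engine installed, for example `process_pages(["a"], lambda p: "  some   text ")`.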
CnOCR: A Chinese/English OCR Python toolkit based on PyTorch. It comes with 20+ pre-trained models for different application scenarios and can be used directly after installation. [Based on ...
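As a rough sketch of what using it directly after installation might look like: the example below assumes a recent CnOCR release, where `CnOcr().ocr(...)` returns a list of dicts with `text` and `score` keys. The helper names `texts_above` and `recognize_image` and the `min_score` threshold are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def texts_above(result: List[Dict], min_score: float = 0.5) -> List[str]:
    """Keep the text of detections whose confidence meets the threshold.

    Assumes each item looks like {"text": ..., "score": ...}; the 0.5
    default is an arbitrary illustrative value.
    """
    return [item["text"] for item in result if item.get("score", 0.0) >= min_score]


def recognize_image(image_path: str) -> List[str]:
    """Run CnOCR with its default models on one image file."""
    # Imported lazily so the helper above works without cnocr installed.
    from cnocr import CnOcr

    ocr = CnOcr()  # default models; downloaded on first use
    return texts_above(ocr.ocr(image_path))
```

Filtering on the score is a design choice rather than something CnOCR requires; low-confidence lines could just as well be kept and flagged for review.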