Best OCR for template-based form extraction? [D]
Hi, I’m working on a school project and I’m currently testing OCR tools for forms.
The documents are mostly structured or semi-structured forms, similar to application/registration forms with labeled fields and sections. My idea is that an admin uploads a template of the document first, then a user uploads a completed form, and the system extracts the data from it. After extraction, the user reviews the result, checks if the fields are correct, and edits anything that was read incorrectly.
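To make the idea more concrete, here is a rough sketch of the template step as I currently picture it. It assumes the OCR engine returns words with bounding boxes and a confidence score, and that the admin's template is a set of named rectangles in normalized page coordinates. `Box`, `Word`, `extract_fields`, and the `review_below` threshold are names I made up for illustration, not part of any of the tools mentioned.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Normalized page coordinates in [0, 1]; (x0, y0) is the top-left corner.
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass
class Word:
    # Hypothetical shape of one OCR result; real engines differ in detail.
    text: str
    box: Box
    confidence: float


def extract_fields(template, words, review_below=0.8):
    """Assign OCR words to template zones by the center of each word box.

    template: dict mapping field name -> Box drawn by the admin.
    Returns a dict of field name -> value, confidence, needs_review.
    """
    results = {}
    for name, zone in template.items():
        hits = [
            w for w in words
            if zone.contains((w.box.x0 + w.box.x1) / 2,
                             (w.box.y0 + w.box.y1) / 2)
        ]
        # Approximate reading order: top to bottom, then left to right.
        hits.sort(key=lambda w: (round(w.box.y0, 2), w.box.x0))
        # The weakest word decides whether the field is flagged for review.
        lowest = min((w.confidence for w in hits), default=0.0)
        results[name] = {
            "value": " ".join(w.text for w in hits),
            "confidence": lowest,
            "needs_review": lowest < review_below,
        }
    return results
```

The `needs_review` flag is what the review screen would use to highlight fields for the user. Empty fields get a confidence of 0.0, so they are always flagged.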
So I’m looking for an OCR/document understanding tool that can work well for template-based extraction, but also has some flexibility in case document layouts change later on.
Right now I’m trying Google Document AI, and I’m planning to test PaddleOCR next. I wanted to ask what OCR tools you’d recommend for this kind of use case.
I’m mainly looking for something that:
- works well on scanned forms
- can map extracted text to the correct fields
- is still manageable if templates/layouts change
- is practical for a student research project
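For the layout-change point, one fallback I'm considering is to anchor on the printed label instead of fixed coordinates. This is only a sketch under my own assumptions: words arrive as `(text, x0, y0, x1, y1)` tuples in normalized coordinates, labels are single words, and the value sits to the right of the label on the same line. `value_right_of_label` and `line_tol` are hypothetical names.

```python
def value_right_of_label(words, label, line_tol=0.01):
    """Return the text to the right of a label on the same line, or None.

    words: list of (text, x0, y0, x1, y1) tuples in normalized coordinates.
    """
    # Find the label word, ignoring case and a trailing colon.
    anchor = next(
        (w for w in words if w[0].rstrip(":").lower() == label.lower()),
        None,
    )
    if anchor is None:
        return None
    _, _, anchor_y0, anchor_x1, _ = anchor
    # Keep words whose top edge is close to the label's and that start
    # at or after the label's right edge.
    same_line = [
        w for w in words
        if w is not anchor
        and abs(w[2] - anchor_y0) <= line_tol
        and w[1] >= anchor_x1
    ]
    same_line.sort(key=lambda w: w[1])
    return " ".join(w[0] for w in same_line)
```

This would break on multi-word labels or values printed below the label, which is part of why I'm asking what the tools handle out of the box.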
If you’ve used Document AI, PaddleOCR, Tesseract, AWS Textract, Azure AI Document Intelligence, or anything similar for forms, I’d really appreciate your thoughts.