OCR PDF Extractor

Project Information

  • Category: Machine Learning & OCR
  • Client: Air Conditioning Company
  • Date: 2025
  • Technologies: Python, Google Document AI, GPT, Django
  • Role: AI Engineer & Full-Stack Developer

An OCR pipeline built for an air conditioning company to extract structured data from handwritten receipts. It combines a fine-tuned Google model with GPT post-processing and has processed 20,000+ receipts at roughly 90% accuracy.

Key Capabilities:

  • OCR Pipeline: Fine-tuned Google model with GPT post-processing for handwritten receipts.
  • Targeted Extraction: Pulls structured fields (dates, amounts, items) from unstructured handwritten input.
  • Scale: Processes 20,000+ receipts at ~90% accuracy.
  • Validation: Confidence thresholds and rule checks to flag anomalies.
  • Export: CSV/Excel output for downstream reporting.
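
The validation and export steps above can be illustrated with a minimal sketch. It is not the project's actual code: the field names (date, amount, items), the 0.80 confidence cutoff, the ISO date format, and the helper names validate_receipt and to_csv are all assumptions chosen for illustration, and it uses only the Python standard library rather than the Document AI or GPT clients.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime

# Assumed cutoff; the real threshold used in the project is not stated.
CONFIDENCE_THRESHOLD = 0.80
REQUIRED_FIELDS = ("date", "amount")       # hypothetical required fields
EXPORT_FIELDS = ("date", "amount", "items")  # hypothetical export columns


@dataclass
class ExtractedField:
    """One value returned by the OCR step, with its confidence score."""
    value: str
    confidence: float


def validate_receipt(fields, threshold=CONFIDENCE_THRESHOLD):
    """Return a list of flags; an empty list means nothing looked anomalous."""
    flags = []
    # Rule check: required fields must be present.
    for name in REQUIRED_FIELDS:
        if name not in fields:
            flags.append(f"missing:{name}")
    # Confidence threshold: flag any field the OCR step was unsure about.
    for name, field in fields.items():
        if field.confidence < threshold:
            flags.append(f"low_confidence:{name}")
    # Rule check: the date must parse (ISO format assumed here).
    if "date" in fields:
        try:
            datetime.strptime(fields["date"].value, "%Y-%m-%d")
        except ValueError:
            flags.append("bad_date")
    # Rule check: the amount must be a positive number.
    if "amount" in fields:
        try:
            if float(fields["amount"].value) <= 0:
                flags.append("nonpositive_amount")
        except ValueError:
            flags.append("bad_amount")
    return flags


def to_csv(receipts):
    """Serialize receipts to CSV text, with a flags column for manual review."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(list(EXPORT_FIELDS) + ["flags"])
    for fields in receipts:
        row = [fields[n].value if n in fields else "" for n in EXPORT_FIELDS]
        row.append(";".join(validate_receipt(fields)))
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    receipt = {
        "date": ExtractedField("2025-03-14", 0.97),
        "amount": ExtractedField("125.50", 0.42),
        "items": ExtractedField("filter replacement", 0.91),
    }
    print(to_csv([receipt]))
```

Keeping the flags in the exported file, rather than dropping flagged receipts, lets a reviewer sort by the flags column and correct only the rows that need attention.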