Project Information
- Category: Machine Learning & OCR
- Client: Air Conditioning Company
- Date: 2025
- Technologies: Python, Google Document AI, GPT, Django
- Role: AI Engineer & Full-Stack Developer
OCR PDF Extractor
An OCR pipeline for an air conditioning company that reads handwritten receipts with a fine-tuned Google Document AI model and GPT post-processing. It has extracted structured data from 20,000+ receipts at roughly 90% accuracy.
Key Capabilities:
- OCR Pipeline: Fine-tuned Google Document AI model with GPT post-processing for handwritten receipts.
- Targeted Extraction: Pulls structured fields (dates, amounts, items) from unstructured handwritten input.
- Scale: Processes 20,000+ receipts at ~90% accuracy.
- Validation: Applies confidence thresholds and rule checks to flag anomalies.
- Export: CSV/Excel output for downstream reporting.
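
The validation and export steps listed above could look roughly like the sketch below. It is an illustration only, not the project's actual code: the receipt shape (field name mapped to a value and confidence pair), the 0.80 threshold, the ISO date format, and the function names `validate_receipt` and `export_csv` are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime

# Hypothetical cutoff; the project's real threshold is not stated.
MIN_CONFIDENCE = 0.80


def validate_receipt(receipt, min_confidence=MIN_CONFIDENCE):
    """Return a list of flags for one extracted receipt.

    `receipt` maps field name -> (value, confidence), an assumed shape.
    """
    flags = []
    # Confidence threshold: flag any field the OCR step was unsure about.
    for name, (value, confidence) in receipt.items():
        if confidence < min_confidence:
            flags.append(f"low confidence: {name}")
    # Rule check: the date must parse as YYYY-MM-DD (assumed format).
    try:
        datetime.strptime(receipt["date"][0], "%Y-%m-%d")
    except (KeyError, TypeError, ValueError):
        flags.append("invalid date")
    # Rule check: the amount must be a positive number.
    try:
        if float(receipt["amount"][0]) <= 0:
            flags.append("non-positive amount")
    except (KeyError, TypeError, ValueError):
        flags.append("invalid amount")
    return flags


def export_csv(receipts):
    """Write receipts and their validation flags to a CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "amount", "items", "flags"])
    for receipt in receipts:
        writer.writerow([
            receipt.get("date", ("", 0.0))[0],
            receipt.get("amount", ("", 0.0))[0],
            receipt.get("items", ("", 0.0))[0],
            "; ".join(validate_receipt(receipt)),
        ])
    return buffer.getvalue()
```

Keeping the flags in their own column, rather than dropping flagged rows, means downstream reports can still filter out or review uncertain receipts.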