
The Document-to-ERP Pipeline: 3 Years of Lessons

We have processed over 200,000 enterprise documents into ERP systems. The biggest lesson: the hard problem is not the AI — it is authentication, schema mapping, and the 3-way match. Here is what we would do differently.

The problem looks solved until you run it on real data

Document AI has matured fast. Modern extraction models handle scanned PDFs, multi-column layouts, handwritten annotations, and non-English text with accuracy that was impossible three years ago. If your benchmark is a clean invoice in a controlled environment, the problem is essentially solved.

Enterprise data is not a benchmark. It is 15 years of inconsistently formatted supplier invoices, purchase orders with handwritten amendments, delivery notes that reference PO numbers from a system that was retired in 2018, and accounts payable clerks who have been manually correcting the same categories of errors for so long they do not even notice them anymore. The AI gets you 80% of the way there. The last 20% is where most projects stall — and it has almost nothing to do with extraction quality.

200,000+ enterprise documents processed into live ERP systems. The lesson: extraction is the easy part; authentication, schema mapping, and the 3-way match are where pipelines break.

Lesson 1: Authentication is the actual bottleneck

The first thing we underestimated was authentication. Not for the AI pipeline — for the ERP. Enterprise ERPs have layered permission models: module-level permissions, record-level permissions, field-level permissions, and in some cases session-based permissions that expire at different intervals for different user roles.

A pipeline that works perfectly in a test environment — where you are using a superuser account — will fail silently in production when it hits a field that the service account does not have write permission for. The ERP often does not return an error. It just does not write the field. Your data is wrong and you do not know.

What we do now: map every field the pipeline writes to its permission requirements before build starts. Test with a minimal-privilege service account from day one. Treat OAuth and permission failures as first-class errors in the pipeline, not environment issues to fix later.
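
One way to turn a silent permission failure into a loud one is to read every record back after writing it and compare. The sketch below is a minimal illustration, not a real ERP client: `FakeErp`, `SilentWriteError`, and `write_and_verify` are hypothetical names, and the fake client simply mimics an ERP that ignores fields the service account cannot write.

```python
class SilentWriteError(Exception):
    """Raised when a write was accepted but not every field was persisted."""


class FakeErp:
    """Hypothetical stand-in for an ERP client that silently skips
    fields the service account has no write permission for."""

    def __init__(self, writable_fields):
        self.writable_fields = set(writable_fields)
        self.records = {}

    def write(self, record_id, values):
        stored = self.records.setdefault(record_id, {})
        for name, value in values.items():
            if name in self.writable_fields:  # other fields are dropped, no error
                stored[name] = value

    def read(self, record_id, fields):
        stored = self.records.get(record_id, {})
        return {name: stored.get(name) for name in fields}


def write_and_verify(erp, record_id, values):
    """Write values, read them back, and fail loudly on any mismatch."""
    erp.write(record_id, values)
    persisted = erp.read(record_id, list(values))
    dropped = sorted(k for k, v in values.items() if persisted.get(k) != v)
    if dropped:
        raise SilentWriteError(
            f"record {record_id}: fields not persisted: {dropped}"
        )
    return persisted
```

Run against a minimal-privilege account, a check like this surfaces a missing field permission at the first test document rather than in production.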

Watch out

Never build or test a document pipeline using a superuser or admin service account. You will not discover permission gaps until production — and at that point, the failure is silent. Use minimum-privilege credentials from the first day of development.

Lesson 2: Schema mapping is a business problem, not a technical one

Every ERP implementation has a custom schema. The out-of-the-box Odoo invoice model and the invoice model at a company that has been running Odoo for 6 years are different objects. Custom fields, renamed modules, disabled features, and years of workarounds mean there is no standard mapping between a supplier invoice and an ERP record.

We spent weeks on one project trying to understand why the GL code mapping was wrong — not because the extraction was wrong, but because the ERP had three different "GL code" fields depending on which accounting period the transaction was in, and the business rules lived in the head of the CFO.

What we do now: we treat schema mapping as a discovery activity, not a coding task. Before any build starts, we run a structured interview with the finance team to document every field, every edge case, and every informal rule. We put this in a mapping document that gets signed off before any code is written. The AI implements the mapping. Humans own the business logic.
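
A signed-off mapping document can be kept as data that the pipeline reads, so that a rule like "the GL code field depends on the accounting period" is explicit rather than remembered. The following is a sketch under invented assumptions: the field names (`x_gl_code_legacy`, `x_gl_code`), the date ranges, and the `approved_by` values are all hypothetical and only illustrate the shape such rules might take.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FieldRule:
    source_field: str   # field name in the extracted document
    erp_field: str      # target field in the ERP (hypothetical names)
    valid_from: date    # first accounting date the rule applies to
    valid_to: date      # last accounting date the rule applies to, inclusive
    approved_by: str    # who signed off this rule


# Hypothetical rules; real ones come from the interview with finance.
RULES = [
    FieldRule("gl_code", "x_gl_code_legacy", date(2015, 1, 1), date(2019, 12, 31), "finance"),
    FieldRule("gl_code", "x_gl_code", date(2020, 1, 1), date(9999, 12, 31), "finance"),
]


def resolve_erp_field(source_field, accounting_date, rules=RULES):
    """Return the single ERP field that applies, or fail with a clear reason."""
    matches = [
        r for r in rules
        if r.source_field == source_field
        and r.valid_from <= accounting_date <= r.valid_to
    ]
    if len(matches) != 1:
        raise LookupError(
            f"{len(matches)} mapping rules for {source_field!r} on {accounting_date}"
        )
    return matches[0].erp_field
```

Refusing to guess when zero or several rules match keeps gaps in the mapping document visible instead of hiding them behind a default.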

Lesson 3: The 3-way match is not a feature — it is a compliance requirement

The three-way match — confirming that a purchase order, a delivery receipt, and a supplier invoice all agree before approving payment — is standard in enterprise AP. It is also the step where automated pipelines most often fail silently, because a "match" in a rules engine and a "match" in real life are different things.

A real example: a supplier invoices for 100 units. The PO is for 100 units. The delivery note says 98 units. A naive three-way match fails and the invoice goes to a manual queue. An experienced AP clerk knows that for this supplier, a 2% delivery variance is within contract tolerance and approves it. This knowledge is not in any system — it is institutional knowledge.

What we do now: three-way match logic is always codified explicitly with the finance and procurement teams before automation. Tolerance rules, exception categories, and escalation paths are documented as business rules in the pipeline configuration — not inferred by the AI. Any match failure surfaces a specific, actionable reason, not a generic "requires review" flag.
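
The 100 / 98 example above can be sketched as an explicit rule with a per-supplier tolerance and a specific reason for every outcome. This is a minimal illustration for a single line item; the supplier ID, the 2% value, the reason codes, and the function name are assumptions made for the example, not a prescribed rule set.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class MatchResult:
    approved: bool
    reason: str  # specific reason code plus human-readable detail


# Hypothetical tolerances agreed with finance and procurement: the maximum
# delivery variance, as a fraction of the ordered quantity.
DELIVERY_TOLERANCE = {"SUP-001": Decimal("0.02")}


def three_way_match(supplier_id, po_qty, delivered_qty, invoiced_qty,
                    tolerances=DELIVERY_TOLERANCE):
    """Compare PO, delivery note, and invoice quantities for one line item."""
    po, delivered, invoiced = (
        Decimal(str(q)) for q in (po_qty, delivered_qty, invoiced_qty)
    )
    if po <= 0:
        return MatchResult(False, f"PO_QTY_INVALID: ordered quantity is {po}")
    if invoiced != po:
        return MatchResult(
            False, f"INVOICE_PO_MISMATCH: invoiced {invoiced}, ordered {po}"
        )
    # Suppliers without an agreed tolerance get zero, so any variance escalates.
    tolerance = tolerances.get(supplier_id, Decimal("0"))
    variance = abs(po - delivered) / po
    if variance > tolerance:
        return MatchResult(
            False,
            f"DELIVERY_VARIANCE_EXCEEDED: variance {variance} is above "
            f"tolerance {tolerance} for {supplier_id}",
        )
    if variance > 0:
        return MatchResult(
            True,
            f"WITHIN_TOLERANCE: variance {variance} is within "
            f"tolerance {tolerance} for {supplier_id}",
        )
    return MatchResult(True, "EXACT_MATCH: PO, delivery, and invoice agree")
```

With these assumed values, `three_way_match("SUP-001", 100, 98, 100)` is approved with a `WITHIN_TOLERANCE` reason, while the same quantities for a supplier with no configured tolerance are rejected with `DELIVERY_VARIANCE_EXCEEDED`.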

If we were starting over, we would spend the first two weeks of every document pipeline project doing nothing but mapping business rules with the finance team. Not building. Not integrating. Just mapping. In our experience, every week skipped here costs roughly three weeks in rework.

Lesson 4: What we would do differently from day one

Use structured extraction from the start

Do not rely on free-form output from a general-purpose LLM for document extraction in production. Use a model that is fine-tuned for structured output, or explicitly prompted and constrained to produce it: JSON with explicit field names, types, and confidence scores. It is slower to set up but faster to validate and maintain.
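
To show why structured output is easier to validate, here is a minimal sketch of a validator for one extraction result. The payload shape (`{"field": {"value": ..., "confidence": ...}}`), the field list, and the 0.9 threshold are assumptions chosen for the example; a real pipeline would take them from its own extraction contract.

```python
import json

# Hypothetical contract: field name -> expected Python type after JSON parsing.
# Amounts are carried as strings here to avoid float rounding.
EXPECTED_FIELDS = {"invoice_number": str, "total_amount": str, "currency": str}
MIN_CONFIDENCE = 0.9  # hypothetical threshold below which a field is flagged


def validate_extraction(raw_json):
    """Split an extraction payload into accepted values and specific problems."""
    payload = json.loads(raw_json)
    accepted, problems = {}, []
    for name, expected_type in EXPECTED_FIELDS.items():
        item = payload.get(name)
        if not isinstance(item, dict):
            problems.append(f"{name}: missing")
            continue
        value = item.get("value")
        confidence = item.get("confidence", 0.0)
        if not isinstance(value, expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}")
        elif confidence < MIN_CONFIDENCE:
            problems.append(f"{name}: low confidence {confidence}")
        else:
            accepted[name] = value
    return accepted, problems
```

Every rejected field carries its own reason, so a low-confidence total and a missing currency reach the reviewer as two distinct issues.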

Build the audit trail before the pipeline

Audit logging has to be in place before you run a single document in production. Every document that enters the pipeline, every field extracted, every validation result, and every ERP write should be logged with a timestamp and an actor ID. Retrofitting audit logging is painful. Regulators do not care that it was hard.
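
As a rough sketch of what one audit record could contain, the helper below appends a timestamped, actor-attributed event to a log. The entry fields, the stage names, and the use of a plain Python list as the log are assumptions for illustration; a production system would write to append-only storage.

```python
import json
from datetime import datetime, timezone


def audit_event(log, document_id, stage, actor_id, detail):
    """Append one audit entry and return it.

    `log` is any object with an append method; a list stands in for an
    append-only store in this sketch.
    """
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "document_id": document_id,
        "stage": stage,        # e.g. "received", "extracted", "erp_write"
        "actor_id": actor_id,  # service account or human reviewer
        "detail": detail,
    }
    # Serialize immediately so later mutation of `detail` cannot alter history.
    log.append(json.dumps(entry, sort_keys=True))
    return entry


audit_log = []
audit_event(audit_log, "doc-0001", "received", "svc-pipeline", {"source": "email"})
audit_event(audit_log, "doc-0001", "erp_write", "svc-pipeline", {"fields": ["amount_total"]})
```

Calling the same helper from every stage, including human review, keeps the trail uniform regardless of who or what acted on the document.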

Make human review a first-class state

Every pipeline needs a well-designed human review interface, not just an exception queue. The review interface determines whether humans catch errors quickly or approve them without reading. We have seen exception queues where the "approve" button is the path of least resistance — and that defeats the entire purpose.
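
One way to make review a first-class state is to model the document lifecycle explicitly and refuse approval until the reviewer has confirmed each flagged field. The state names and the confirmation rule below are assumptions made for this sketch, not a description of a specific product.

```python
from enum import Enum


class State(Enum):
    EXTRACTED = "extracted"
    NEEDS_REVIEW = "needs_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    POSTED = "posted"


# Hypothetical lifecycle: which states may follow which.
ALLOWED = {
    State.EXTRACTED: {State.NEEDS_REVIEW, State.APPROVED},
    State.NEEDS_REVIEW: {State.APPROVED, State.REJECTED},
    State.APPROVED: {State.POSTED},
    State.REJECTED: set(),
    State.POSTED: set(),
}


def transition(current, target, flagged_fields=(), confirmed_fields=()):
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    if current is State.NEEDS_REVIEW and target is State.APPROVED:
        # Approval is only possible once every flagged field was confirmed.
        unconfirmed = sorted(set(flagged_fields) - set(confirmed_fields))
        if unconfirmed:
            raise ValueError(f"cannot approve; unconfirmed fields: {unconfirmed}")
    return target
```

Because approval depends on per-field confirmation, a reviewer cannot clear a document with a single click without touching the fields that caused the review.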

Test with your ugliest data first

Find the worst 50 documents in the archive, the ones that gave humans the most trouble, and test against those before you test against clean samples. If the pipeline handles the worst cases acceptably, the clean cases are unlikely to cause trouble. If you test only clean cases, you will be surprised in production.