Engineering Practice · 7 min read

Building for Auditability from Day One

A client spent six weeks reconstructing a paper trail for a $2M accounts payable discrepancy. Their system had logs. It had dashboards. It had approval workflows. What it did not have was auditability — and those are not the same thing.

Logging is not auditability

Most teams, when asked whether their system is auditable, point to their logs. Application logs. Access logs. Error logs. A dashboard showing payment totals. And to be fair, those things have real value. But when an auditor sits across from you and asks why a $2M purchase order was approved on a Tuesday afternoon three months ago, application logs will not save you.

Auditors ask three questions. What happened? When did it happen? Who approved it — and what information did they have at the time they approved it? Most systems built without auditability in mind can answer the first question adequately. They struggle with the second. And they completely fail the third.

The distinction matters enormously. Logging records outcomes. Auditability reconstructs decisions. An outcome is: payment of $450,000 was sent to Vendor ID 8821 on March 14th. A decision is: at 14:23 on March 12th, the finance director saw a purchase order for $450,000 with supporting quote #Q-2024-8821-3, reviewed it against budget line 4420, and clicked approve. Those are different data. Systems built to record outcomes almost never capture the second type — and that is what auditors need.

We got this wrong on the first two projects we built. We had good logging. We had clear UI for approval workflows. But we were recording state changes, not decisions. The difference only became visible when an auditor arrived and asked for something we could not produce.

What six weeks of reconstruction actually looks like

The client was a regional distributor. A $2M discrepancy surfaced during a routine quarter-end reconciliation: a series of accounts payable (AP) transactions that did not match their corresponding goods received notes (GRNs). Nothing obviously fraudulent. Could have been duplicate processing, could have been a system migration artifact, could have been legitimate payments to a vendor that had been restructured under a new entity. The auditors needed to know which.

Their ERP had an approval workflow. It had user accounts and permission roles. It stored the current state of every invoice record in a normalized relational database — the standard way most enterprise systems are built. What it did not have was a record of what each invoice record looked like at the moment of approval. When an invoice was revised post-approval, the system updated the record in place. The approval timestamp remained. The original data it referenced did not.

Reconstructing what an approver actually saw required cross-referencing application logs (which had gaps), email archives, PDF attachments from an unstructured document store, and conversations with staff who may or may not have remembered correctly. Six weeks. Four people. ~$80,000 in consulting fees for what should have been a two-day query.

The underlying cause: mutable state with no event history. The fix would have been straightforward at design time and was effectively impossible to retrofit without rebuilding the data layer.

6 weeks to reconstruct a paper trail that should have been a two-day query: the real cost of building for state rather than events

Mutable state vs. immutable events

Standard relational database design records the current state of an entity. An invoice row has a status column. When the invoice is approved, the status updates from PENDING to APPROVED. The previous status is gone. This is efficient for querying current state. It is a disaster for auditing.

Event sourcing inverts this. Instead of storing the current state of an entity, you store every event that has ever happened to it. The current state is derived by replaying the event log. An invoice does not have a status field that updates — it has an event stream: InvoiceCreated, InvoiceLineItemAdded, InvoiceSubmittedForApproval, InvoiceApproved. Each event is immutable once written. You can always reconstruct what the invoice looked like at any point in time because you have a complete, ordered history of every change.
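To make the replay idea concrete, here is a minimal sketch in Python. The event names come from the example above; the `Event` class, the `apply` and `state_at` functions, and the field names inside each event are assumptions made for illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    seq: int               # position in the stream; never reused
    type: str              # e.g. "InvoiceCreated"
    data: dict[str, Any]   # payload fields are hypothetical


def apply(state: dict[str, Any], event: Event) -> dict[str, Any]:
    """Return the state after one event; the input state is left untouched."""
    new = {**state, "lines": list(state.get("lines", []))}
    if event.type == "InvoiceCreated":
        new.update(vendor=event.data["vendor"], status="DRAFT")
    elif event.type == "InvoiceLineItemAdded":
        new["lines"].append(event.data["amount"])
    elif event.type == "InvoiceSubmittedForApproval":
        new["status"] = "PENDING"
    elif event.type == "InvoiceApproved":
        new["status"] = "APPROVED"
        new["approved_by"] = event.data["actor"]
    return new


def state_at(events: list[Event], seq: int) -> dict[str, Any]:
    """Replay the stream up to and including sequence number `seq`."""
    state: dict[str, Any] = {}
    for event in sorted(events, key=lambda e: e.seq):
        if event.seq > seq:
            break
        state = apply(state, event)
    return state


stream = [
    Event(1, "InvoiceCreated", {"vendor": "8821"}),
    Event(2, "InvoiceLineItemAdded", {"amount": 450_000}),
    Event(3, "InvoiceSubmittedForApproval", {}),
    Event(4, "InvoiceApproved", {"actor": "finance.director"}),
]

current = state_at(stream, 4)          # status APPROVED
before_approval = state_at(stream, 3)  # status PENDING, no approver yet
```

The same `state_at` call answers both "what is the invoice now" and "what was it just before approval", which is the property an auditor cares about.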

You do not need full event sourcing to get most of the audit benefit. The minimum viable approach is append-only tables for anything that requires auditability: a separate table that records every state transition, who triggered it, what the entity looked like at that moment (serialized as JSON), and a timestamp. Never delete rows. Never update rows. The current state table can be as mutable as needed — the audit table is its permanent record.
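A rough sketch of that minimum viable approach, using Python's built-in `sqlite3` module so it runs anywhere. The table names (`invoice`, `invoice_audit`), the column names, and the `record_transition` helper are assumptions for illustration; a production system would use its own schema and database.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (              -- mutable current-state table
        id TEXT PRIMARY KEY, amount INTEGER, status TEXT);
    CREATE TABLE invoice_audit (        -- append-only: rows are only ever inserted
        audit_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        invoice_id  TEXT NOT NULL,
        transition  TEXT NOT NULL,
        actor       TEXT NOT NULL,
        snapshot    TEXT NOT NULL,      -- full entity state as JSON
        recorded_at TEXT NOT NULL);
""")


def record_transition(invoice_id, transition, actor, amount, status):
    """Change the state table and append a snapshot in one transaction."""
    snapshot = json.dumps(
        {"id": invoice_id, "amount": amount, "status": status}, sort_keys=True)
    with conn:  # both writes commit together or not at all
        conn.execute(
            "INSERT OR REPLACE INTO invoice (id, amount, status) VALUES (?, ?, ?)",
            (invoice_id, amount, status))
        conn.execute(
            "INSERT INTO invoice_audit "
            "(invoice_id, transition, actor, snapshot, recorded_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (invoice_id, transition, actor, snapshot,
             datetime.now(timezone.utc).isoformat()))


def snapshot_at_transition(invoice_id, transition):
    """Return the entity as it looked when the given transition was recorded."""
    row = conn.execute(
        "SELECT snapshot FROM invoice_audit "
        "WHERE invoice_id = ? AND transition = ? "
        "ORDER BY audit_id DESC LIMIT 1",
        (invoice_id, transition)).fetchone()
    return json.loads(row[0]) if row else None


record_transition("INV-1", "CREATED", "ap.clerk", 450_000, "PENDING")
record_transition("INV-1", "APPROVED", "finance.director", 450_000, "APPROVED")
record_transition("INV-1", "REVISED", "ap.clerk", 475_000, "APPROVED")

approved_view = snapshot_at_transition("INV-1", "APPROVED")
current_amount = conn.execute(
    "SELECT amount FROM invoice WHERE id = ?", ("INV-1",)).fetchone()[0]
```

After the post-approval revision, the state table holds the revised amount while the audit table still holds the amount that was approved. That is the record the client in the story above did not have.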

Our take

The architecture decision of mutable state vs. append-only events is almost impossible to change after go-live without a full data migration. It has to be made at the design stage. Teams that defer it are making a hidden bet that they will never face a serious audit.

Correlation IDs and why you cannot retrofit them

This is the one that causes the most pain when discovered late. A correlation ID is a unique identifier assigned at the start of a business transaction — when a document enters the system, when an approval workflow is triggered, when an automated process begins — that is propagated through every downstream service, database write, and log entry that the transaction touches.

Without correlation IDs, tracing a single business event across a multi-service system requires timestamp matching, educated guessing, and manual cross-referencing. With them, you run one query: give me everything that touched correlation ID txn_20240314_8821_x9k. Back comes every database write, every service call, every approval event, and every external API call that was part of that business transaction, in order.
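As an illustration of what that one query can look like, the sketch below assumes three hypothetical tables (`approval_event`, `payment_write`, `api_call`) that each carry a `correlation_id` column. The table names, columns, and sample rows are invented for the example; only the correlation ID value is taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE approval_event (correlation_id TEXT, recorded_at TEXT, detail TEXT);
    CREATE TABLE payment_write  (correlation_id TEXT, recorded_at TEXT, detail TEXT);
    CREATE TABLE api_call       (correlation_id TEXT, recorded_at TEXT, detail TEXT);
""")

CID = "txn_20240314_8821_x9k"
with conn:  # illustrative rows only
    conn.execute("INSERT INTO approval_event VALUES (?, ?, ?)",
                 (CID, "2024-03-12T14:23:00Z", "purchase order approved"))
    conn.execute("INSERT INTO payment_write VALUES (?, ?, ?)",
                 (CID, "2024-03-14T09:00:00Z", "payment row created"))
    conn.execute("INSERT INTO api_call VALUES (?, ?, ?)",
                 (CID, "2024-03-14T09:00:02Z", "bank transfer requested"))
    conn.execute("INSERT INTO payment_write VALUES (?, ?, ?)",
                 ("txn_other", "2024-03-13T10:00:00Z", "unrelated payment"))

# One statement gathers every record that shares the ID, oldest first.
TRACE_SQL = """
    SELECT 'approval_event' AS source, recorded_at, detail
      FROM approval_event WHERE correlation_id = :cid
    UNION ALL
    SELECT 'payment_write', recorded_at, detail
      FROM payment_write WHERE correlation_id = :cid
    UNION ALL
    SELECT 'api_call', recorded_at, detail
      FROM api_call WHERE correlation_id = :cid
    ORDER BY recorded_at
"""


def trace(correlation_id):
    """Return (source, recorded_at, detail) rows for one business transaction."""
    return conn.execute(TRACE_SQL, {"cid": correlation_id}).fetchall()


timeline = trace(CID)
```

The query only works because every table already has the column. That is the part that cannot be added after the fact for historical rows.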

The reason you cannot retrofit them: correlation IDs need to be present in every table, every log format, every service payload from the start. Adding them to a production system that has been running for two years means modifying dozens of database schemas, updating every service that writes to those tables, and — critically — having no correlation IDs for any historical data. Which is exactly the data auditors ask about.

We now treat the absence of a correlation ID strategy as a blocker before any enterprise financial system begins implementation. Not a nice-to-have. A blocker.

Every enterprise financial system should be designed to answer an auditor's question before that auditor ever arrives. If you are building auditability after the first audit request, you have already failed the design review — you just have not received the invoice yet.

ZATCA and GCC compliance as a forcing function

For clients in the Gulf, this is not theoretical. ZATCA — Saudi Arabia's Zakat, Tax and Customs Authority — mandates e-invoicing requirements under Phase 2 of the Fatoora program that effectively require auditability by law. Every invoice must have a UUID. Every invoice event must be cryptographically signed and timestamped. The authority can request a complete transaction history at any time, and the response window is short.
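To show the general shape of a signed, timestamped, UUID-tagged event log, here is a small sketch. It is not the ZATCA format and should not be read as one. It uses an HMAC from Python's standard library as a stand-in for certificate-based signatures, links each event to the previous one by hash as an extra tamper-evidence measure, and every name in it (`SIGNING_KEY`, `sign_event`, `verify_chain`, the field names) is hypothetical.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

SIGNING_KEY = b"demo-key-not-for-production"  # assumption: real keys live in an HSM or KMS
GENESIS = "0" * 64                            # previous_hash of the first event
BODY_FIELDS = ("uuid", "timestamp", "previous_hash", "payload")


def _canonical(body):
    """Serialize deterministically so the same body always hashes the same."""
    return json.dumps(body, sort_keys=True).encode()


def sign_event(payload, previous_hash):
    """Wrap a payload with a UUID, a UTC timestamp, a hash, and a signature."""
    body = {
        "uuid": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "previous_hash": previous_hash,
        "payload": payload,
    }
    raw = _canonical(body)
    return {
        **body,
        "hash": hashlib.sha256(raw).hexdigest(),
        "signature": hmac.new(SIGNING_KEY, raw, hashlib.sha256).hexdigest(),
    }


def verify_chain(events):
    """Check every link, hash, and signature; False on the first mismatch."""
    previous_hash = GENESIS
    for event in events:
        raw = _canonical({k: event[k] for k in BODY_FIELDS})
        expected_sig = hmac.new(SIGNING_KEY, raw, hashlib.sha256).hexdigest()
        if event["previous_hash"] != previous_hash:
            return False
        if hashlib.sha256(raw).hexdigest() != event["hash"]:
            return False
        if not hmac.compare_digest(expected_sig, event["signature"]):
            return False
        previous_hash = event["hash"]
    return True


chain = []
for payload in ({"invoice": "INV-1", "event": "issued", "amount": 450_000},
                {"invoice": "INV-1", "event": "adjusted", "amount": 449_000}):
    chain.append(sign_event(payload, chain[-1]["hash"] if chain else GENESIS))

intact = verify_chain(chain)

# Changing a stored amount after the fact breaks verification.
forged_first = dict(chain[0], payload={**chain[0]["payload"], "amount": 1})
tamper_detected = not verify_chain([forged_first, chain[1]])
```

A system that already keeps an append-only event log only needs a wrapper like this at the write path. A system that updates rows in place has nothing stable to sign.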

Businesses that were already running append-only event logs with correlation IDs found ZATCA compliance relatively straightforward — they just needed to add the cryptographic signing layer and the API integration. Businesses that were running mutable-state ERP systems had to either buy a compliance middleware layer that is expensive and fragile, or rebuild their invoicing data model from scratch.

The UAE and Bahrain are following similar frameworks, and so is Egypt, which sits outside the GCC. If you are building any financial system for a GCC business today, assume auditability requirements equivalent to ZATCA Phase 2 will apply within 24 months. The companies that build for it now will spend nothing adapting. The ones that do not will face the same choice: expensive middleware or expensive rebuilds.

Watch out

Retrofitting auditability into a running production system typically costs 3–5x what it would have cost to build it correctly at the start. We have done both. The retrofit is never just a data migration — it is a product rebuild with live data constraints and zero downtime requirements.

What auditability-first architecture looks like in practice

It is not complicated. It is just decisions made deliberately at the start.

Every entity that participates in a financial or compliance workflow gets an append-only event table alongside its standard state table. The event table records: event type, entity ID, actor ID (human or system), the full serialized state of the entity at that moment, a correlation ID linking it to the originating transaction, and a server-side timestamp. No application code is permitted to update or delete rows in event tables. This is enforced at the database level with row-level permissions, not just at the application level by convention.
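One way to sketch database-level enforcement is shown below. Because SQLite has no user accounts or grants, this example swaps in triggers that reject every UPDATE and DELETE; on a server database the equivalent would be granting the application role INSERT and SELECT only. The table name `invoice_event`, its columns, and the `attempt` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice_event (
        event_id       INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type     TEXT NOT NULL,
        entity_id      TEXT NOT NULL,
        actor_id       TEXT NOT NULL,   -- human or system
        state_json     TEXT NOT NULL,   -- full serialized entity state
        correlation_id TEXT NOT NULL,
        -- timestamp is set by the database, not by the caller
        recorded_at    TEXT NOT NULL
                       DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
    );

    CREATE TRIGGER invoice_event_no_update
    BEFORE UPDATE ON invoice_event
    BEGIN
        SELECT RAISE(ABORT, 'invoice_event is append-only');
    END;

    CREATE TRIGGER invoice_event_no_delete
    BEFORE DELETE ON invoice_event
    BEGIN
        SELECT RAISE(ABORT, 'invoice_event is append-only');
    END;
""")


def attempt(sql, params=()):
    """Run one statement in its own transaction and report the outcome."""
    try:
        with conn:
            conn.execute(sql, params)
        return "ok"
    except sqlite3.DatabaseError as exc:
        return f"rejected: {exc}"


insert_result = attempt(
    "INSERT INTO invoice_event "
    "(event_type, entity_id, actor_id, state_json, correlation_id) "
    "VALUES (?, ?, ?, ?, ?)",
    ("InvoiceApproved", "INV-1", "finance.director",
     '{"amount": 450000}', "txn_20240314_8821_x9k"))
update_result = attempt(
    "UPDATE invoice_event SET state_json = ? WHERE entity_id = ?",
    ("{}", "INV-1"))
delete_result = attempt(
    "DELETE FROM invoice_event WHERE entity_id = ?", ("INV-1",))
```

The insert succeeds and both the update and the delete are refused by the database itself, so a bug or a shortcut in application code cannot rewrite history.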

Decision logs are kept separate from action logs. When a human approves a payment, the decision log records not just "approved" but the data snapshot they approved: the invoice amount, the vendor, the budget line, the supporting documents referenced, and the approval screen version they were looking at. This means that even if the invoice is later revised or the supporting document is replaced, the record of what the approver saw is permanent.
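A decision record of this kind might be sketched as follows. The `DecisionRecord` class, the `record_decision` helper, and the field names are hypothetical; the sample values echo the purchase order example earlier in the article. The snapshot is stored as a JSON string plus a SHA-256 digest, so later changes to the live invoice cannot alter what was recorded.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRecord:
    decision: str          # e.g. "approved"
    actor_id: str
    decided_at: str        # UTC, ISO 8601
    screen_version: str    # which approval screen the actor was using
    snapshot_json: str     # exactly what was displayed, as canonical JSON
    snapshot_sha256: str   # digest for later integrity checks


def record_decision(decision, actor_id, displayed, screen_version):
    """Freeze the data shown to the approver at the moment of the decision."""
    snapshot_json = json.dumps(displayed, sort_keys=True)
    return DecisionRecord(
        decision=decision,
        actor_id=actor_id,
        decided_at=datetime.now(timezone.utc).isoformat(),
        screen_version=screen_version,
        snapshot_json=snapshot_json,
        snapshot_sha256=hashlib.sha256(snapshot_json.encode()).hexdigest(),
    )


# What the approval screen showed (illustrative values).
displayed = {
    "invoice_amount": 450_000,
    "vendor_id": "8821",
    "budget_line": "4420",
    "supporting_documents": ["Q-2024-8821-3"],
}
record = record_decision("approved", "finance.director",
                         displayed, "approval-screen-v3")

# The invoice is revised afterwards; the decision record does not move.
displayed["invoice_amount"] = 475_000
displayed["supporting_documents"].append("Q-2024-8821-4")

seen_by_approver = json.loads(record.snapshot_json)
```

In practice the record would be written to an append-only table rather than held in memory. The point of the sketch is that the snapshot is a copy, not a reference to the live invoice.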

Correlation IDs are generated at the entry point of every business workflow — document ingestion, manual data entry, API call from an external system — and propagated through every service boundary in the request context. They appear in every database write, every log line, and every outbound API call for the duration of that transaction.
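Within a single Python service, propagation through the request context could look roughly like this, using the standard `contextvars` and `logging` modules. The function names, the `txn_` prefix format, and the `X-Correlation-ID` header name are assumptions; the header is a common convention rather than a standard.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID for the current request or task; "-" means none was set.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Stamp every log record with the current correlation ID."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


log_output = io.StringIO()  # stand-in for a real log sink
handler = logging.StreamHandler(log_output)
handler.setFormatter(logging.Formatter("%(correlation_id)s %(name)s %(message)s"))
handler.addFilter(CorrelationFilter())
logger = logging.getLogger("ap")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False


def start_workflow(entry_point):
    """Generate the ID once, at the entry point of the business workflow."""
    cid = f"txn_{uuid.uuid4().hex}"
    correlation_id.set(cid)
    logger.info("workflow started via %s", entry_point)
    return cid


def write_payment(db_rows, amount):
    """Every database write carries the ID (a list stands in for a table)."""
    db_rows.append({"correlation_id": correlation_id.get(), "amount": amount})
    logger.info("payment row written")


def outbound_headers():
    """Every outbound API call forwards the ID to the next service."""
    return {"X-Correlation-ID": correlation_id.get()}


rows = []
cid = start_workflow("document_ingestion")
write_payment(rows, 450_000)
headers = outbound_headers()
```

None of the downstream functions take the ID as a parameter; they read it from the context. That keeps the instrumentation cheap when it is done from the start, and it is the same reason it is so invasive to add later.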

The cost at design time: maybe two days of schema planning and an additional 15–20% of implementation time to instrument everything correctly. The cost of not doing it: see above. Six weeks. $80,000. A relationship with an auditor you did not want to have.