Tech Reads
Operations Automation · 8 min read

Vendor Risk Scoring with AI: The 3 Signals That Actually Predict Problems

We built three vendor risk scoring systems. Two were wrong in expensive ways. The third works. The naive approach — sentiment analysis on supplier news, payment history — misses the signals that actually matter. Here is what we found, with specific thresholds from a system running in production today.

The first two systems that were wrong

The first system we built used a combination of payment history (days late, dispute rate) and public news sentiment analysis. It felt rigorous. We had 18 months of payment data and a news feed scanning for supplier mentions. The system gave every vendor a score from 0 to 100.

The problem: the scores were stable right up until vendors actually caused problems. A logistics supplier we had scored at 74/100 — medium-low risk — delivered late on five consecutive shipments over six weeks. Looking back at the signals, the payment history was fine. The news feed had found nothing. The score never moved.

The second system added financial health indicators scraped from public filings and trade credit databases. Better. But it still missed things the first system missed, and it introduced false positives — vendors with temporarily unusual financial metrics due to growth or restructuring that looked risky on paper but were operationally reliable.

The insight that led to the third system: vendor problems show up in behavior before they show up in financials or news. By the time a supplier is in the news for financial trouble, the relationship is already at risk. The signal you want is earlier.

Signal 1: Response time variance on quote requests

When a supplier's sales team starts taking longer to respond to RFQs, something has changed. They may be short-staffed. They may be overwhelmed with other clients. They may be in financial difficulty and reducing their commercial team. They may have deprioritized your account.

The signal is not the absolute response time but the change relative to the supplier's own baseline, in both the average and the variance. A supplier who historically responded within 4 hours and now averages 18 hours, with wide swings from one quote to the next, has changed their operational behavior. In our data, that change often precedes other problems by 6 to 10 weeks.

6–10 weeks: average lead time between a detectable rise in RFQ response time variance and the first operational problem, based on our production dataset of 340 supplier relationships

The threshold we use: flag a supplier if their rolling 30-day average RFQ response time has increased by more than 40% compared to their 12-month baseline, or if the standard deviation of their response times has doubled. Either condition triggers a yellow flag. Both conditions together trigger a review.
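As a rough illustration of the threshold above, here is a minimal Python sketch. It is not the production code. The function names, the input shape (a list of RFQ sent and quote received timestamps), and the choice to exclude the most recent 30 days from the 12-month baseline are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def response_hours(rfqs, start, end):
    """Response time in hours for RFQs sent in [start, end).

    rfqs: iterable of (sent_at, quote_received_at) datetime pairs (assumed shape).
    """
    return [
        (received - sent).total_seconds() / 3600
        for sent, received in rfqs
        if start <= sent < end
    ]


def rfq_flag(rfqs, as_of):
    """Return 'none', 'yellow', or 'review' for one supplier.

    Assumption: the baseline covers the 12 months before as_of, minus the
    most recent 30 days, so the recent window does not dilute it.
    """
    recent = response_hours(rfqs, as_of - timedelta(days=30), as_of)
    baseline = response_hours(
        rfqs, as_of - timedelta(days=365), as_of - timedelta(days=30)
    )
    if len(recent) < 2 or len(baseline) < 2:
        return "none"  # too little data to compare (assumed behavior)

    # Condition 1: rolling 30-day average up by more than 40% over baseline.
    mean_up = mean(recent) > 1.4 * mean(baseline)

    # Condition 2: standard deviation has at least doubled.
    # A perfectly constant baseline (sd == 0) is skipped in this sketch.
    base_sd = pstdev(baseline)
    sd_doubled = base_sd > 0 and pstdev(recent) >= 2 * base_sd

    if mean_up and sd_doubled:
        return "review"
    if mean_up or sd_doubled:
        return "yellow"
    return "none"
```

With only a handful of RFQs in a 30-day window the standard deviation is noisy, so a real implementation would probably want a minimum sample size larger than the two used here.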

This requires storing response times on every RFQ sent and received. Most ERPs record the RFQ sent date and the quote received date — the data is there, it just is not typically surfaced as a metric.

Signal 2: Invoice discrepancy rate

An invoice discrepancy — an invoice amount that does not match the purchase order, a billing code mismatch, a missing line item — is usually treated as an administrative error and corrected. Most finance teams handle this dozens of times a month and track it only as workload, not as a vendor signal.

When a supplier's invoice discrepancy rate rises, it is a behavioral signal. A vendor with a 2% discrepancy rate who climbs to 8% over two months is experiencing something internal: turnover in their billing team, accounting software issues, or a business under pressure cutting corners. In one case we tracked, a supplier's discrepancy rate climbed from 1% to 11% over eight weeks before they informed us they had switched accounting systems. In another, the rate climbed to 14% and the company had a receivables crisis — they were billing early and inflating quantities to accelerate cash collection.

The threshold: a discrepancy rate that rises more than 3 percentage points above baseline over a rolling 45-day window triggers a flag. A rate above 10% sustained for 30 days triggers a review, regardless of baseline.
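The discrepancy threshold can be sketched the same way. This is an illustration under stated assumptions, not our production logic: the function names and the (discrepant, total) count inputs are hypothetical, and "sustained for 30 days" is approximated here as the rate over the last 30 days of invoices exceeding 10%.

```python
def rate(discrepant, total):
    """Share of invoices that failed the match; 0.0 when there are no invoices."""
    return discrepant / total if total else 0.0


def discrepancy_flag(baseline_rate, last_45d, last_30d):
    """Return 'none', 'flag', or 'review' for one supplier.

    baseline_rate: the supplier's historical discrepancy rate as a fraction.
    last_45d, last_30d: (discrepant_invoices, total_invoices) counts for
    the trailing 45-day and 30-day windows (assumed input shape).
    """
    # Above 10% over the last 30 days triggers a review regardless of baseline.
    if rate(*last_30d) > 0.10:
        return "review"
    # More than 3 percentage points above baseline over the 45-day window.
    if rate(*last_45d) - baseline_rate > 0.03:
        return "flag"
    return "none"
```

For example, a supplier with a 2% baseline and 8 discrepant invoices out of 100 in the last 45 days would come back as "flag".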

Our take

Invoice discrepancy tracking requires a clean three-way match process — if you are not systematically comparing invoices to POs to goods receipts, you cannot compute discrepancy rate. This is a prerequisite, not a nice-to-have.

Signal 3: Communication pattern changes

This is the most difficult signal to extract systematically, but it is the most predictive one we have found. When the pattern of communication from a supplier changes — who is writing to you, how formal the language has become, whether escalations are happening more often — something has changed in that organization.

Concretely: a supplier whose account manager changes for the third time in 12 months is a different risk profile than one with the same contact for three years. A supplier who used to confirm deliveries proactively and now requires you to chase for confirmation has changed their operational behavior. A supplier escalating payment queries to senior contacts rather than resolving them at the accounts level has a cash problem.

We use a lightweight NLP classification on email threads with suppliers — available in the ERP inbox — to flag three patterns: contact churn (new email addresses from the same domain at a rate above 1 new contact per quarter), escalation frequency (emails from addresses at director level or above when prior history was working-level), and resolution latency on disputes (how long from dispute raised to resolution confirmed).

The contact churn signal sounds minor. It is not. High staff turnover at a supplier means institutional knowledge is leaving, account management continuity is breaking down, and the organization is likely under pressure. Three new contacts at a single supplier in six months is worth a phone call.
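Contact churn is the simplest of the three patterns to compute and needs no NLP at all. The sketch below is a hypothetical illustration: the function names, the (date, sender address) input shape, and the use of a trailing 90-day window as "one quarter" are assumptions, not a description of the classifier we run.

```python
from datetime import date, timedelta


def new_contacts(emails, domain, as_of, window_days=90):
    """Addresses at the supplier's domain first seen in the trailing window.

    emails: iterable of (sent_on, sender_address) pairs (assumed shape).
    """
    suffix = "@" + domain.lower()
    first_seen = {}
    for sent_on, sender in sorted(emails):
        addr = sender.strip().lower()
        if addr.endswith(suffix):
            # Oldest first, so setdefault keeps the first appearance.
            first_seen.setdefault(addr, sent_on)
    cutoff = as_of - timedelta(days=window_days)
    return sorted(a for a, seen in first_seen.items() if cutoff < seen <= as_of)


def contact_churn_flag(emails, domain, as_of):
    """True when more than 1 new contact appeared in the last quarter."""
    return len(new_contacts(emails, domain, as_of)) > 1
```

A supplier with no email history before the window would have every contact counted as new, so a real implementation would need a minimum history before trusting the flag.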

How the scoring system actually works

Each of the three signals produces a component score. The composite risk score is not a simple average — we weight signals by their predictive lead time and by the materiality of the supplier to operations (a critical single-source supplier with one yellow flag gets more attention than a commodity supplier with two yellow flags and easy alternatives).

Green: All signals within baseline. No flags in rolling 60 days.

Yellow: One signal flagged. Monitoring intensified, no action required.

Amber: Two signals flagged, or one signal at critical threshold. Procurement review scheduled. New orders above ~$15K held for manual approval.

Red: All three signals flagged, or supplier in Amber for 30+ days without resolution. Escalated to procurement director. Alternative sourcing activated.
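The four tiers can be read as a small decision function. The sketch below is a simplified illustration with hypothetical names. It leaves out the materiality weighting described earlier and the 60-day clean window for Green, and it assumes days_in_amber counts only days without resolution.

```python
def risk_tier(flagged, critical, days_in_amber=0):
    """Map current signal state to a tier name.

    flagged: set of signal names currently flagged, e.g. {"rfq", "invoice"}.
    critical: subset of flagged signals that are at their critical threshold.
    days_in_amber: consecutive unresolved days spent in Amber (assumption).
    """
    count = len(flagged)
    # All three signals flagged, or stuck in Amber for 30+ days.
    if count >= 3 or days_in_amber >= 30:
        return "Red"
    # Two signals flagged, or any single signal at critical threshold.
    if count == 2 or critical:
        return "Amber"
    if count == 1:
        return "Yellow"
    return "Green"
```

Checking Red first matters: a supplier with two flags that has sat unresolved in Amber for a month should escalate rather than stay in Amber.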

The system in production today covers 340 active supplier relationships and produces approximately 12 flags per week that warrant human review. Before it was in place, supplier problems were discovered at the point of failure — a missed delivery, a quality issue, a sudden unavailability. The flags are now catching the majority of those problems 4 to 8 weeks early.

What we still get wrong

The system still misses sudden failures — a supplier who goes from green to catastrophic in a week because of an event that does not show up in behavioral signals. A fire at a warehouse. A sudden regulatory sanction. A principal who disappears.

For those cases, the system does not help much. The right mitigation is not better scoring — it is supplier diversity, safety stock, and contract terms that protect you when a supplier fails suddenly. The scoring system optimizes for the predictable failures. The unpredictable ones require a different kind of resilience.
