Engineering Blog · Post #109

Three-Tier Location Matching for YoY Exposure Comparisons

From "the broker says the schedule is unchanged, but the renewal SOV has 31 locations, the prior policy bound with 29, and the addresses have lowercase / abbreviation / typo / unit-number drift, so a UW staring at two spreadsheets can't tell at a glance whether two locations are the same address with different formatting, a deliberate drop, an accidental drop, or two new pickups" to "three-tier matching: explicit broker-supplied prior_location_guid first, canonicalized-address exact match second, fuzzy similarity surfaced as suggestions third, with manual UW override at every tier, materialized into a snapshot pinned at bind time so the audit shows exactly what the matcher decided when the policy was bound." The path runs through two additive columns on Location (one of them indexed), one address canonicalizer, one tier ladder, and one explicit UW-override path that always wins.

The Problem

Acme Construction's renewal lands. The current SOV has 31 locations. The prior bound policy had 29. The UW needs to verify the broker's claim that "the schedule is unchanged" or quickly understand the material exposure shifts — TIV up 25%? New locations? New CAT-tier-1 zones? Construction-class downgrades?

Side-by-side comparison of a 31-row schedule against a 29-row schedule is mechanical for software and tedious for humans. The catch: the addresses don't match cleanly.

Many rows on the two schedules are the same locations, written with lowercase / abbreviation / ZIP+4 / punctuation drift. The string-equality test says "no match"; an underwriter says "obviously the same address." The mismatch matters because the diff is the basis for the YoY exposure-change report: counting a same-location row as Added (and its prior as Removed) when it's really Matched / Unchanged distorts the picture.

Two naïve framings:

String-equality match. Brittle. Drift kills it. Half the matches show up as Added/Removed; the report is noise.
Pure ML address matching. Black-box. Hard to audit ("why did the matcher say these are the same address?"). UW can't override it without going around the system. And you have to host a model that's available at exposure-comparison time.

The right framing borrows from how UWs already think:

First, trust the broker. If the broker supplied an explicit prior_location_guid link on the renewal SOV, the match is whatever the broker said it was. UW can override.
Then, normalize and equality-match. Lowercase, expand "St" → "Street", strip punctuation, drop ZIP+4 suffix, collapse whitespace. The drift cases above all collapse to the same canonical string and match.
Then, surface fuzzy candidates. When the first two tiers produce no match but a similar-looking prior location exists, surface it as a suggestion; UW makes the call.

Three tiers. Determinism first; UW-override on top. ML — when it ships — slots into tier 3 as a candidate scorer, never as an autonomous matcher.

The InsightUW Approach

graph TD
  subgraph Snap["Pinned snapshots"]
    Bind["Exposure Snapshot snapshot_kind='bind_time' (immutable)"]
    Quote["Exposure Snapshot snapshot_kind='quote_finalize'"]
    MAN["Exposure Snapshot snapshot_kind='manual'"]
  end
  subgraph Source["Location (mutable)"]
    LOC["Location + canonicalized_address (indexed) + prior_location_guid (FK or override)"]
  end
  subgraph Tier1["Tier 1: explicit"]
    FK["broker-supplied prior_location_guid"]
    OVR["UW manual override prior_location_guid"]
  end
  subgraph Tier2["Tier 2: canonicalized exact"]
    CAN["canonicalize(address) (lowercase / expand abbreviations / punct strip / ZIP-base)"]
    EQ["equality on canonicalized_address"]
  end
  subgraph Tier3["Tier 3: fuzzy suggestions (Phase 2 enriched)"]
    SIM["similarity score (token set ratio + zip prefix + state)"]
    SUG["surfaced as candidates, UW selects"]
  end
  subgraph Diff["uw_exposure_service"]
    Match["match_locations"]
    Comp["get_comparison (materialized to Exposure Comparison)"]
  end
  subgraph UI["UI"]
    Panel["exposure-comparison-panel (Matched / Added / Removed tabs)"]
    Ovrmodal["per-location manual link override modal"]
  end
  LOC --> Tier1
  Tier1 -->|priority 1| Match
  LOC --> CAN
  CAN --> EQ
  Tier2 -->|priority 2| Match
  Tier3 -->|"priority 3 (suggestion)"| Match
  Bind --> Comp
  LOC --> Comp
  Match --> Comp
  Comp --> Panel
  Panel --> Ovrmodal
  Ovrmodal --> Tier1

Three tiers; UW override always wins; pinned snapshots make the bound-time decision auditable.

Location — two additive columns

Both nullable. Backfill is mechanical.
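The migration itself is small. A minimal sketch, assuming a SQLite-compatible `location` table keyed by `guid` with a raw `address` column (those names, the index name, and `placeholder_canonicalize` are hypothetical; only `canonicalized_address` and `prior_location_guid` come from the post). The backfill registers a Python function as a SQL function and fills only rows that don't have a value yet, so it can be re-run safely.

```python
import sqlite3
from typing import Optional


def placeholder_canonicalize(address: Optional[str]) -> Optional[str]:
    # Hypothetical stand-in for the real canonicalizer: lowercase and collapse
    # whitespace only. Swap in the full rule set for a real backfill.
    if not address:
        return None
    return " ".join(address.lower().split())


def migrate(conn: sqlite3.Connection) -> None:
    # Both columns are additive and nullable, so existing rows stay valid.
    conn.execute("ALTER TABLE location ADD COLUMN canonicalized_address TEXT")
    conn.execute(
        "ALTER TABLE location ADD COLUMN prior_location_guid TEXT "
        "REFERENCES location(guid)"
    )
    # Tier 2 runs equality lookups on the canonical string, so index it.
    conn.execute(
        "CREATE INDEX ix_location_canonicalized_address "
        "ON location (canonicalized_address)"
    )


def backfill(conn: sqlite3.Connection) -> int:
    # Expose the Python function to SQL, then fill only rows that lack a value.
    conn.create_function("canonicalize", 1, placeholder_canonicalize)
    cur = conn.execute(
        "UPDATE location SET canonicalized_address = canonicalize(address) "
        "WHERE canonicalized_address IS NULL"
    )
    return cur.rowcount
```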

Canonicalization — deterministic, audit-friendly

The drift cases described earlier all canonicalize to the same string, so equality on the canonicalized string is sufficient.
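A minimal sketch of the rule-based canonicalizer, following the steps listed in the tier description (lowercase, expand abbreviations, reduce ZIP+4 to the ZIP base, strip punctuation, collapse whitespace). The `_ABBREV` entries shown are an assumed subset, not the production table, and the regexes are one reasonable way to implement the rules.

```python
import re
from typing import Optional

# Assumed subset of the US-centric _ABBREV table; the production table is larger.
# Note "st" is always read as "street" here, never "saint".
_ABBREV = {
    "st": "street",
    "ave": "avenue",
    "dr": "drive",
    "rd": "road",
    "blvd": "boulevard",
    "pkwy": "parkway",
    "ste": "suite",
    "bldg": "building",
    "apt": "apartment",
}

_ZIP_PLUS4 = re.compile(r"\b(\d{5})-\d{4}\b")
_PUNCT = re.compile(r"[^\w\s]")


def canonicalize(address: Optional[str]) -> str:
    """Pure function: identical input always yields identical output."""
    if not address:
        return ""
    s = address.lower()
    # Reduce ZIP+4 to the 5-digit base before the hyphen is stripped.
    s = _ZIP_PLUS4.sub(r"\1", s)
    # Turn punctuation into spaces so "St.," becomes the token "st".
    s = _PUNCT.sub(" ", s)
    # split() collapses whitespace; each token is expanded if it's an abbreviation.
    return " ".join(_ABBREV.get(tok, tok) for tok in s.split())
```

Under this assumed table, the prior Houston address from the worked example, `5678 Industrial Parkway, Houston, TX 77001-1234`, canonicalizes to `5678 industrial parkway houston tx 77001`, the same form the post quotes. Because the function has no state, a canonicalize-preview box can call it directly and show the UW exactly what tier 2 compares.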

The function is pure; identical input produces identical output; UWs can paste an address into the canonicalize-preview box and see exactly what string the matcher will compare. That property — deterministic and audit-readable — is the design bar. ML matching breaks both. Tier 3 keeps determinism by surfacing suggestions, never auto-applying them.

match_locations — the tier ladder

The order of operations is the contract:

If the broker (or UW override) said "this maps to that," that's the answer.
If the canonicalized strings are equal and unambiguous, that's the answer.
Otherwise, the row goes to Added with fuzzy suggestions. UW resolves manually.

Ambiguous tier 2 — two prior locations with the same canonicalized address (Acme has two sites at "1234 Main Street, Boston" — separate buildings on the same lot, neither carries a unit number) — falls through to tier 3 deliberately. Auto-picking one would mask the ambiguity.
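The ladder can be sketched as a two-pass function. This is a minimal sketch under assumptions: the `Loc` and `MatchResult` types are hypothetical, tier 1 runs over every row before tier 2 so an explicit link always claims its prior first, and tier 2 only matches when exactly one unmatched current row and exactly one unclaimed prior row share a canonical string. Anything else, including the two-buildings-on-one-lot case, falls through to Added for tier 3 suggestions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Loc:
    guid: str
    canonicalized_address: str = ""
    prior_location_guid: Optional[str] = None  # broker-supplied or UW override


@dataclass
class MatchResult:
    matched: Dict[str, Tuple[str, int]] = field(default_factory=dict)  # current -> (prior, tier)
    added: List[str] = field(default_factory=list)    # handed to tier 3 suggestions
    removed: List[str] = field(default_factory=list)  # priors nobody claimed


def match_locations(current: List[Loc], prior: List[Loc]) -> MatchResult:
    result = MatchResult()
    prior_guids = {p.guid for p in prior}
    claimed = set()

    # Tier 1: explicit links win outright.
    for loc in current:
        link = loc.prior_location_guid
        if link in prior_guids and link not in claimed:
            result.matched[loc.guid] = (link, 1)
            claimed.add(link)

    # Tier 2: group still-unmatched rows on both sides by canonical string.
    cur_by_canon = defaultdict(list)
    for loc in current:
        if loc.guid not in result.matched and loc.canonicalized_address:
            cur_by_canon[loc.canonicalized_address].append(loc.guid)
    prior_by_canon = defaultdict(list)
    for p in prior:
        # Legacy rows with an empty canonical string never match at tier 2.
        if p.guid not in claimed and p.canonicalized_address:
            prior_by_canon[p.canonicalized_address].append(p.guid)

    for canon, cur_guids in cur_by_canon.items():
        prior_hits = prior_by_canon.get(canon, [])
        # Only an unambiguous one-to-one pairing matches; anything else falls through.
        if len(cur_guids) == 1 and len(prior_hits) == 1:
            result.matched[cur_guids[0]] = (prior_hits[0], 2)
            claimed.add(prior_hits[0])

    result.added = [loc.guid for loc in current if loc.guid not in result.matched]
    result.removed = [p.guid for p in prior if p.guid not in claimed]
    return result
```

Requiring uniqueness on both sides keeps tier 2 independent of row order, which matters when the decision has to be reproduced later for an audit.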

Tier 3 fuzzy — suggestions, not decisions

The fuzzy score is a pure function: rapidfuzz's token set ratio, bounded [0, 100], plus cheap ZIP-prefix and state boosts. A threshold of 70 keeps the suggestion list short, and the top 3 candidates per Added row are surfaced in the UI.

When a Phase 2 ML matcher ships, it slots in here — a different scorer, same surface. The matcher suggests; UW decides.

UW override — set_location_match

UW clicks a row in the Added tab. Modal lists the top 3 fuzzy suggestions plus a search-prior-locations field. UW picks one; the override saves; the comparison re-materializes; the row moves from Added to Matched on the next refresh. Audit row written.

The same path covers "this isn't really a new location, it's the prior site renumbered" and "this is a new location and I'm explicit about that — strike out any auto-suggestion" (override prior_location_guid to NULL after the matcher tried).
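A minimal in-memory sketch of the override path, assuming dict-backed stores where the real service would use the database. `set_location_match` and `invalidate_comparison_cache` follow the names in the post; the row fields, the audit-row shape, and the `explicit_new` flag are hypothetical. The flag is there because a NULL `prior_location_guid` alone can't distinguish "UW said this is new" from "never linked", and without some marker tier 2 could re-link the row on the next materialization.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional

locations: Dict[str, dict] = {}           # guid -> location row (hypothetical store)
audit_log: List[dict] = []
comparison_cache: Dict[str, object] = {}  # submission_guid -> materialized comparison


def invalidate_comparison_cache(submission_guid: str) -> None:
    comparison_cache.pop(submission_guid, None)


def set_location_match(location_guid: str, prior_guid: Optional[str], user: str,
                       canonicalize: Callable[[str], str]) -> None:
    loc = locations[location_guid]
    before = loc.get("prior_location_guid")
    loc["prior_location_guid"] = prior_guid
    # None means "deliberately new"; remember that so tier 2 won't re-link it.
    loc["explicit_new"] = prior_guid is None
    if prior_guid is not None:
        prior = locations[prior_guid]
        if not prior.get("canonicalized_address"):
            # Opportunistic backfill for legacy rows that were never canonicalized.
            prior["canonicalized_address"] = canonicalize(prior["address"])
    # Who, when, what: enough to reconstruct the decision later.
    audit_log.append({
        "location_guid": location_guid,
        "from": before,
        "to": prior_guid,
        "user": user,
        "at": datetime.now(timezone.utc),
    })
    # Drop the cached comparison so the panel re-materializes on the next read.
    invalidate_comparison_cache(loc["submission_guid"])
```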

Pinned snapshots — what UW saw at bind time

Snapshots are idempotent on (submission_guid, snapshot_kind) for non-manual kinds. The Bind & Issue cap #3 bind hook auto-pins the bind_time snapshot using a soft import plus try/except: if the snapshot service isn't available, the bind doesn't fail. The audit shows what UW saw when the policy bound, including which tier each location matched at, which UW overrides were applied, and what the roll-ups looked like.

Comparison reads pinned snapshots when available; falls back to live Location rows when not.
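A sketch of the snapshot contract, under stated assumptions: an in-memory `SnapshotStore` stands in for the snapshot table, `bind_time` and `quote_finalize` are treated as the idempotent kinds, the comparison prefers the `bind_time` snapshot, and `exposure_snapshot_service` is a hypothetical module name used only to show the soft import. A repeat pin of a non-manual kind returns the existing row instead of overwriting it, and the bind hook logs snapshot failures rather than raising them into the bind.

```python
import copy
import logging
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional, Tuple

log = logging.getLogger("exposure.snapshots")
IDEMPOTENT_KINDS = {"bind_time", "quote_finalize"}  # 'manual' may be pinned repeatedly


class SnapshotStore:
    def __init__(self) -> None:
        self._keyed: Dict[Tuple[str, str], dict] = {}
        self._manual: List[dict] = []

    def pin(self, submission_guid: str, snapshot_kind: str, payload: dict) -> dict:
        key = (submission_guid, snapshot_kind)
        if snapshot_kind in IDEMPOTENT_KINDS and key in self._keyed:
            return self._keyed[key]  # first pin wins; never overwritten
        row = {
            "submission_guid": submission_guid,
            "snapshot_kind": snapshot_kind,
            # Deep copy so later edits to live rows can't leak into the snapshot.
            "payload": copy.deepcopy(payload),
            "pinned_at": datetime.now(timezone.utc),
        }
        if snapshot_kind in IDEMPOTENT_KINDS:
            self._keyed[key] = row
        else:
            self._manual.append(row)
        return row

    def get(self, submission_guid: str, snapshot_kind: str) -> Optional[dict]:
        return self._keyed.get((submission_guid, snapshot_kind))


def locations_for_comparison(store: SnapshotStore, submission_guid: str,
                             live_rows: List[dict]) -> List[dict]:
    # Prefer the frozen bind-time view; fall back to live Location rows.
    snap = store.get(submission_guid, "bind_time")
    return snap["payload"]["locations"] if snap else live_rows


def on_bind(submission_guid: str, payload: dict,
            pin: Optional[Callable[[str, str, dict], dict]] = None) -> Optional[dict]:
    """Bind hook: snapshot problems are logged, never raised into the bind."""
    if pin is None:
        try:  # soft import; the module name here is hypothetical
            from exposure_snapshot_service import pin_snapshot as pin
        except ImportError:
            log.warning("snapshot service unavailable; bind continues unpinned")
            return None
    try:
        return pin(submission_guid, "bind_time", payload)
    except Exception:
        log.exception("bind_time snapshot failed; bind continues")
        return None
```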

get_comparison — materialized + cached ~15 min

The 15-minute cache means exposure-panel reads are sub-millisecond on the second hit. A manual override invalidates the cache (set_location_match calls invalidate_comparison_cache), so UW changes show up immediately.
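The TTL-plus-invalidation behavior can be sketched as a small in-process cache. The post doesn't say where the real cache lives, so `ComparisonCache`, the injectable `clock`, and the `materialize` callback are illustrative; only the 15-minute TTL and the explicit invalidation on override come from the text.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 15 * 60


class ComparisonCache:
    def __init__(self, materialize: Callable[[str], dict],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._materialize = materialize  # the expensive diff + roll-up
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get_comparison(self, submission_guid: str) -> dict:
        now = self._clock()
        hit = self._entries.get(submission_guid)
        if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
            return hit[1]  # still fresh: serve without recomputing
        value = self._materialize(submission_guid)
        self._entries[submission_guid] = (now, value)
        return value

    def invalidate_comparison_cache(self, submission_guid: str) -> None:
        # Called from the override path so UW changes show up on the next read.
        self._entries.pop(submission_guid, None)
```

Injecting the clock keeps the expiry testable without sleeping.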

Resolving the prior submission — borrowed from Bind & Issue cap #1

Renewal-aware. New business returns no prior submission and the panel renders "No prior policy on file" gracefully.
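The resolution chain (prior_policy_guid → Policy → source_quote_guid → prior Quote → prior submission, as summarized later in this post) can be sketched with dict lookups. The dict-backed stores and the quote's `submission_guid` field are assumptions; any missing link returns None, which is what lets the panel say "No prior policy on file" instead of guessing.

```python
from typing import Dict, Optional


def resolve_prior_submission(submission: dict,
                             policies: Dict[str, dict],
                             quotes: Dict[str, dict],
                             submissions: Dict[str, dict]) -> Optional[dict]:
    """Walk prior_policy_guid -> Policy -> source_quote_guid -> Quote -> Submission."""
    # New business has no prior_policy_guid, so the first lookup yields None.
    policy = policies.get(submission.get("prior_policy_guid"))
    if policy is None:
        return None
    quote = quotes.get(policy.get("source_quote_guid"))
    if quote is None:
        return None
    # Hypothetical field linking the quote back to its submission.
    return submissions.get(quote.get("submission_guid"))
```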

Worked Example: Sarah's Acme Renewal Schedule Diff

Acme's renewal SOV has 31 locations. The prior bound policy had 29. Sarah opens the exposure-comparison panel.

Roll-up bar:
- TIV: $94M (prior $86M) · +9.3% (warn — under 10% threshold)
- Total payroll: $42M (prior $38M) · +10.5% (warn)
- Total employees: 2,400 (prior 2,200) · +9.1%
- Location count: 31 (prior 29) · +2 net
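The roll-up percentages are plain year-over-year deltas, (current − prior) / prior. This snippet reproduces the figures above; the metric keys are illustrative, and warn/alert classification is left out because the per-metric thresholds aren't given.

```python
from typing import Dict, Optional, Tuple


def pct_change(current: float, prior: float) -> Optional[float]:
    """Year-over-year change in percent, rounded to one decimal place."""
    if prior == 0:
        return None  # undefined when there is no prior exposure
    return round(100.0 * (current - prior) / prior, 1)


# Figures from Sarah's roll-up bar (TIV and payroll in $M).
ROLLUP: Dict[str, Tuple[float, float]] = {
    "tiv_musd": (94, 86),
    "payroll_musd": (42, 38),
    "employees": (2400, 2200),
}

deltas = {name: pct_change(cur, prior) for name, (cur, prior) in ROLLUP.items()}
net_locations = 31 - 29
```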

Tabs: Matched (28) · Added (3) · Removed (1)

She clicks Matched. Of the 28:
- Tier 1 — explicit (broker-supplied): 19 rows. The broker's renewal SOV included prior_location_guid for every location they considered unchanged.
- Tier 2 — canonicalized exact: 9 rows. The broker didn't link them, but normalized address strings collapse to identical values. Each row's match-tier chip shows "auto" with a tooltip explaining the canonicalization logic. The "Boston Suite 200" row is one of these — broker forgot the explicit link; canonicalization caught it.

She clicks Added. Three rows:

Charlotte, NC — TIV $4.2M, occupancy=warehouse · no fuzzy suggestions above threshold 70 — genuinely new location. She makes a note for exposure review.
"5678 Industrial Pkwy Houston TX" — TIV $8.1M · fuzzy suggestion: prior 5678 Industrial Parkway, Houston, TX 77001-1234 (score 94). Why didn't tier 2 catch it? Because two prior locations canonicalize to the same string, 5678 industrial parkway houston tx 77001: Acme had a second "5678 Industrial Parkway" office on the same lot (suite 200 vs suite 400), but the prior schedule didn't carry the suite. Tier 2 saw two candidates, treated the match as ambiguous, and pushed the row to Added with suggestions. Sarah opens the modal, sees both prior candidates, and picks the one matching the description; the row moves to Matched on refresh.
"9999 Lakeshore Dr Bldg 4 Chicago" — TIV $5.8M · fuzzy suggestion: prior 9999 Lakeshore Dr, Bldg 4, Chicago, IL 60601 (score 96). The canonicalizer should have matched this exactly — but the prior row's canonicalized address was empty (legacy data, never backfilled). Sarah picks the suggestion; the prior row's canonicalized_address is opportunistically backfilled by the override path; the row moves to Matched.

After Sarah's two overrides, the panel re-renders:
- Matched: 30
- Added: 1 (Charlotte, NC — genuine)
- Removed: 1 (a Cleveland warehouse the broker dropped; Sarah confirms by email with the broker)

She moves on. The submission proceeds to bind.

What the bind-time snapshot captures

Two weeks later, the policy binds. The Bind & Issue cap #3 bind hook calls pin_snapshot, which writes the bind_time snapshot.
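One plausible shape for that payload, sketched under assumptions: `MatchRecord`, `build_bind_snapshot`, and the field names are hypothetical. Each record carries its tier, where the match came from, and override attribution, which is what lets the audit answer "who decided this, and how" for every location.

```python
from collections import Counter
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class MatchRecord:
    location_guid: str
    prior_location_guid: Optional[str]  # None when the row stayed in Added
    tier: Optional[int]                 # 1 or 2 when matched, else None
    source: str                         # "broker", "canonical", "uw_override", or "none"
    overridden_by: Optional[str] = None
    overridden_at: Optional[str] = None


def build_bind_snapshot(records: List[MatchRecord], removed_prior_guids: List[str]) -> dict:
    matched = [r for r in records if r.prior_location_guid is not None]
    return {
        "location_count": len(records),
        "matched": len(matched),
        "added": len(records) - len(matched),
        "removed": len(removed_prior_guids),
        # How many matches came from each signal (broker, canonicalizer, UW).
        "by_source": dict(Counter(r.source for r in matched)),
        # Every UW decision, attributed and timestamped, for the audit trail.
        "overrides": [asdict(r) for r in records if r.source == "uw_override"],
        "records": [asdict(r) for r in records],
    }
```

Fed the worked example (19 broker links, 9 canonical matches, Sarah's 2 overrides, Charlotte unmatched, Cleveland removed), it reports 31 locations, 30 matched, and 2 attributed overrides.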

The audit row says: at bind time, this policy had 31 locations; 30 matched; Sarah overrode 2 of them; her decisions are timestamped and attributed. Two years from now, on a coverage dispute, the snapshot is the source of truth for "what UW saw."

When the matcher fails fast

A different submission: Beta Industries, brand-new business, no prior policy. resolve_prior_submission returns None. The panel renders: "No prior policy on file. New business; YoY comparison not applicable." No tier ladder runs; no matcher invoked. The panel doesn't pretend.

When the canonicalizer is silently wrong

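A subtle case involves apartment numbers. The broker's prior SOV had two rows at 123 Main Street, distinguished only by apartment number.

These are different addresses, and the canonicalizer correctly keeps them distinct. But the renewal SOV lists the location without an apartment number, so it canonicalizes to plain "123 main street".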

Tier 2 finds zero canonicalized matches (no prior canonicalizes to "123 main street" without an apt suffix). Tier 3 surfaces both prior rows as fuzzy candidates with scores 92/91. UW reads the modal, says "wait, the renewal didn't carry apt numbers — the broker stripped them" and either picks one + flags the data quality issue with the broker, or rejects both suggestions and adds the row as a new location, deliberately.

The system surfaced the ambiguity. The UW resolved it. The override is auditable.

What's Deferred (Phase 2)

Postal-API canonicalization. Today's canonicalizer is rule-based: abbreviation table, ZIP-base, punctuation strip. A Phase 2 USPS / Royal Mail / Canada Post API call would resolve "5678 Industrial Pkwy Houston TX" to a delivery point ID, narrowing the suite- and apartment-number ambiguity in the cases above. Stub designed; integration deferred until a postal-API budget is approved.
ML similarity scoring. Today's tier 3 uses token set ratio + zip/state boosts. A trained address-similarity model would improve the suggestion ranking, especially for international addresses. Slots into the same fuzzy_candidates interface; deferred.
Geocoding cross-check. Two textually different addresses that resolve to the same lat/lng (typo on one, correct on the other) wouldn't match by canonicalization. Geocoding both and computing distance would catch it. Cost-benefit favors deferring until the rule-based + fuzzy approach is observed in production for a year.
Multi-language address handling. Bermuda + Canadian + UK deployments use different address conventions (postal codes, parishes, two-line addresses). Canonicalizer is US-biased today. Phase 2: per-jurisdiction canonicalizer modules.
Snapshot-aware override propagation. Today, a UW override on the renewal panel updates the current submission's prior location guid. If the prior submission had a different override that's now wrong, the prior snapshot doesn't update — it's immutable. Generally correct (the snapshot is what UW saw at bind time, frozen) but occasionally surprises. Documented; not changing.
Per-jurisdiction abbreviation tables. Today's _ABBREV is US-centric. The UK has different street suffixes ("Crescent" abbreviated as "Cres," etc.), and US states have minor variations of their own. Not blocking.
Fuzzy match auto-apply at high score. A future optimization: when score ≥ 95 and there's exactly one candidate, auto-match instead of suggesting. Today the bar is determinism — the matcher never auto-matches at tier 3.

What This Means for Underwriters

The matcher prefers human signal over algorithm. A broker-supplied prior_location_guid (tier 1) beats canonicalization (tier 2), which beats fuzzy suggestion (tier 3). UW override is a tier-1 signal.
Canonicalization is deterministic and auditable. Lowercase, abbreviation expansion, ZIP-base, punctuation strip. UW can paste an address and see the canonical form. ML doesn't run at tier 2.
Fuzzy is suggestion-only. Tier 3 surfaces top-3 candidates above a similarity threshold. The matcher never auto-applies them. UW picks; override saves.
Ambiguous tier 2 falls through to tier 3. Two prior locations with the same canonicalized address don't auto-match; the matcher pushes the row to Added with suggestions and lets UW disambiguate.
Override always wins. UW sets prior_location_guid (or sets it to NULL). The comparison re-materializes; the audit captures who, when, and what.
Pinned snapshots freeze the bind-time decision. Bind auto-pins the bind_time snapshot; quote_finalize is a separate snapshot_kind. Two years later, the audit answers "what locations did UW match when the policy bound."
15-minute cache on the materialized comparison. Sub-millisecond reads after the first hit; UW overrides invalidate cache.
Prior submission resolved via the Bind & Issue cap #1 chain. prior_policy_guid → Policy → source_quote_guid → prior Quote → prior submission. New business returns "no prior policy"; the panel renders gracefully.
Severity is per-LOB. Exposure Threshold carries warn/alert thresholds per metric and per-attribute rules (occupancy_change, construction_class_downgrade, new_cat_tier1_location, new_jurisdiction). The renewal panel rolls up to clean / minor / major / unknown_risk.
The matcher fails fast on missing data. No prior policy = no comparison, panel says so. Empty canonicalized_address on a prior row = falls through to tier 3 with opportunistic backfill on UW override. Brittle paths surface; nothing pretends.
Adding a Phase 2 ML matcher is one substitution. fuzzy_candidates is the seam. A trained model slots in with the same return shape; UW workflow doesn't change; suggestion-only contract holds.
Determinism + override is the right bar for compliance. Insurance audit asks "why was this decision made?" The matcher's answer is always reproducible — broker said, canonicalization said, or UW said. ML can extend the suggestion side without breaking this.

What's Next

The Customer 360 module composition (blog #103) wired a renewal-time T&Cs differ for prior-policy fields; the exposure-matcher described here handles the location-schedule diff. Together they cover the two most common renewal-comparison surfaces. The next blogs will cover the rest of the comparison story — line-level loss experience snapshots, the renewal-readiness gate, and how UW manager review wraps the renewal decision.

Want to see how InsightUW matches 31 locations against 29 priors with explainable, override-able, audit-friendly logic? Request a demo.