Engineering Blog · Post #21

From 5-Year Loss Runs to Renewal Pricing: How Structured Claims Ingestion Powers Data-Driven Rate Changes

How InsightUW's InsightXtract pipeline turns a 47-page Mercy Health Partners loss run PDF into structured year-by-year claims data, detects a loss ratio deteriorating from 21.7% to 99.7%, and automatically triggers a rate_inadequacy flag that feeds the AI Rate Recommendation Engine, which suggests an 18% rate increase, all before the underwriter opens the file.

The Problem

Renewal pricing in medical malpractice is supposed to be data-driven. In practice, it is PDF-driven. The underwriter receives a 5-year loss run from the broker — a dense, multi-page PDF with inconsistent formatting, varying column headers, and claims scattered across tables that were never designed for machine consumption.

The typical renewal pricing workflow looks like this:

Manual extraction: The underwriter opens the loss run PDF, reads each claim line, and types values into a spreadsheet. For a hospital system with 45 claims over 5 years, this takes 60–90 minutes.
Inconsistent categorization: One underwriter codes a claim as "surgical error," another codes the same type as "operative complication." There is no enforced taxonomy.
Year-by-year trends are invisible: The spreadsheet shows individual claims but does not automatically compute loss ratios by policy year, trend lines, or deterioration velocity.
Rate recommendations are gut-feel: The underwriter looks at the spreadsheet, consults their experience, and proposes a rate change. Two underwriters looking at the same data propose different rates.
No audit trail: Three years later, when someone asks "why did we only increase rates by 8% when the loss ratio was 82%?", there is no structured record of what data drove the decision.

The result is a renewal pricing process that is slow, inconsistent, and disconnected from the actual claims data that should drive it.

The InsightUW Approach

InsightUW's renewal pricing pipeline connects three systems: InsightXtract (structured document extraction), the Loss Summary Engine (trend analysis and flagging), and the AI Rate Recommendation Engine (data-driven pricing). Loss run PDFs flow in one end; a defensible rate recommendation comes out the other.

graph TB
    subgraph Ingestion["Loss Run Ingestion"]
        A["Broker Email with Loss Run PDF (47 pages)"]
        B["InsightXtract Document Classification"]
        C["AI Table Extraction (GPT-4o + Layout Parser)"]
    end
    subgraph Structuring["Claims Structuring"]
        D["Claim Record Normalization"]
        E["Taxonomy Mapping (ICD-10 + Custom LOB Codes)"]
        F["Year-by-Year Aggregation"]
    end
    subgraph Analysis["Trend Analysis & Flagging"]
        G["Loss Ratio Calculation per Policy Year"]
        H["Trend Detection (Deteriorating / Stable / Improving)"]
        I["rate_inadequacy Flag Triggered at 70%+ LR"]
    end
    subgraph Pricing["AI Rate Recommendation"]
        J["Loss Data + Book Benchmarks + Market Data"]
        K["AI Rate Model (Gradient Boost + LLM Overlay)"]
        L["Rate Recommendation +18% with Confidence Interval"]
    end
    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
    H --> I
    I --> J
    J --> K
    K --> L

The InsightXtract Extraction Pipeline

When a loss run PDF arrives attached to a renewal submission, InsightXtract classifies it as a loss run document and routes it through the claims extraction pipeline. The pipeline handles the structural chaos of real-world loss run PDFs: multi-page tables with merged cells, continuation headers, varying date formats, and inconsistent claim status labels.
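The post does not show how InsightXtract handles these cases internally. As a rough illustration of the kind of cleanup involved, here is a minimal Python sketch that stitches per-page table fragments back into one table (dropping repeated continuation headers), parses a few common date layouts, and maps raw status labels onto a small canonical set. The date formats, the status vocabulary, and the function names (`parse_date`, `normalize_status`, `stitch_pages`) are hypothetical, and slash dates such as 03/05/2023 are assumed to be US month/day order.

```python
from datetime import date, datetime

# Hypothetical date layouts seen across broker loss runs (not exhaustive).
# "%d-%b-%Y" assumes English month abbreviations (C locale).
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d-%b-%Y", "%m/%d/%y")

# Hypothetical mapping from raw status labels to a canonical vocabulary.
STATUS_ALIASES = {
    "open": "OPEN",
    "o": "OPEN",
    "reopened": "REOPENED",
    "closed": "CLOSED",
    "c": "CLOSED",
    "closed w/o payment": "CLOSED_NO_PAY",
    "cwp": "CLOSED_NO_PAY",
}


def parse_date(raw: str) -> date:
    """Try each known layout in turn; raise so unparseable rows can be routed to review."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def normalize_status(raw: str) -> str:
    """Lowercase, drop periods, collapse whitespace, then look up the canonical label."""
    key = " ".join(raw.lower().replace(".", "").split())
    return STATUS_ALIASES.get(key, "UNKNOWN")


def stitch_pages(pages: list[list[list[str]]]) -> list[list[str]]:
    """Merge per-page fragments of one table, skipping headers repeated on continuation pages."""
    if not pages:
        return []
    header = pages[0][0]
    rows = list(pages[0][1:])
    for page in pages[1:]:
        body = page[1:] if page and page[0] == header else page
        rows.extend(body)
    return [header] + rows
```

Raising on an unknown date, rather than guessing, keeps ambiguous rows visible instead of silently mis-assigning them to the wrong policy year.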

Step 2: Structured Claim Extraction

Each claim is extracted into a normalized record with consistent field names, regardless of the PDF source format.
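A minimal sketch of what "consistent field names regardless of source format" can mean in practice: an alias table maps each broker's column labels onto one canonical schema, and each row becomes a typed record. The `ClaimRecord` fields, the alias entries, and the helper names are illustrative assumptions, not InsightXtract's actual schema; the sketch also assumes loss dates were already normalized to ISO format upstream.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical alias table: several broker column labels map to one canonical field.
HEADER_ALIASES = {
    "claim #": "claim_id", "claim no": "claim_id", "claim number": "claim_id",
    "date of loss": "loss_date", "loss date": "loss_date", "dol": "loss_date",
    "total incurred": "incurred", "incurred": "incurred",
    "total paid": "paid", "paid": "paid",
    "status": "status", "claim status": "status",
    "facility": "facility", "location": "facility",
    "department": "department", "dept": "department",
}


@dataclass(frozen=True)
class ClaimRecord:
    """One normalized claim. Field names are illustrative, not InsightXtract's real schema."""
    claim_id: str
    loss_date: date
    incurred: float
    paid: float
    status: str
    facility: str
    department: str


def canonical_headers(raw_headers: list[str]) -> list[str]:
    """Map raw headers to canonical names; unknown headers are tagged for human review."""
    out = []
    for h in raw_headers:
        key = " ".join(h.lower().replace(":", "").split())
        out.append(HEADER_ALIASES.get(key, f"unmapped:{key}"))
    return out


def parse_money(raw: str) -> float:
    """'$1,840,000' -> 1840000.0; accounting-style '(5,000)' -> -5000.0; blank -> 0.0."""
    text = raw.strip().replace("$", "").replace(",", "")
    if not text:
        return 0.0
    if text.startswith("(") and text.endswith(")"):
        return -float(text[1:-1])
    return float(text)


def to_record(raw_headers: list[str], cells: list[str]) -> ClaimRecord:
    """Build a ClaimRecord from one table row; assumes loss dates are already ISO (YYYY-MM-DD)."""
    row = dict(zip(canonical_headers(raw_headers), cells))
    return ClaimRecord(
        claim_id=row["claim_id"].strip(),
        loss_date=date.fromisoformat(row["loss_date"].strip()),
        incurred=parse_money(row["incurred"]),
        paid=parse_money(row.get("paid", "")),
        status=row.get("status", "unknown").strip().upper(),
        facility=row.get("facility", "").strip(),
        department=row.get("department", "").strip(),
    )
```

Tagging unknown headers as `unmapped:` instead of dropping them means a new broker layout surfaces as a review item rather than a silently missing column.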

The Loss Summary API: Year-by-Year Breakdown

Once claims are extracted and structured, the Loss Summary Engine computes year-by-year aggregations that reveal the trend the underwriter needs to see.
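To make the aggregation step concrete, the sketch below assigns each claim to a policy year (assuming a June 1 inception, matching the Mercy renewal date), sums claim counts and incurred losses per year, divides by that year's earned premium to get the loss ratio, and labels the trend from the least-squares slope of the loss ratios. Earned premium per year is an input the post does not show, and the 5-points-per-year trend threshold is a hypothetical choice, not the Loss Summary Engine's actual rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    loss_date: date
    incurred: float


@dataclass
class YearSummary:
    policy_year: int
    claim_count: int
    incurred: float
    earned_premium: float

    @property
    def loss_ratio(self) -> float:
        return self.incurred / self.earned_premium

    @property
    def avg_severity(self) -> float:
        return self.incurred / self.claim_count if self.claim_count else 0.0


def policy_year(loss_date: date, inception_month: int = 6, inception_day: int = 1) -> int:
    """Policy year in which the loss occurred; June 1 inception is an assumption."""
    on_or_after_inception = (loss_date.month, loss_date.day) >= (inception_month, inception_day)
    return loss_date.year if on_or_after_inception else loss_date.year - 1


def summarize(claims: list[Claim], earned_premium: dict[int, float]) -> list[YearSummary]:
    """One summary per year present in earned_premium (years without claims get zero losses)."""
    counts: dict[int, int] = defaultdict(int)
    incurred: dict[int, float] = defaultdict(float)
    for c in claims:
        py = policy_year(c.loss_date)
        counts[py] += 1
        incurred[py] += c.incurred
    return [YearSummary(py, counts[py], incurred[py], earned_premium[py]) for py in sorted(earned_premium)]


def classify_trend(loss_ratios: list[float], threshold: float = 0.05) -> str:
    """Label the least-squares slope of loss ratio per year (threshold is illustrative)."""
    n = len(loss_ratios)
    if n < 2:
        return "Stable"
    x_mean = (n - 1) / 2
    y_mean = sum(loss_ratios) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(loss_ratios))
    den = sum((i - x_mean) ** 2 for i in range(n))
    slope = num / den
    if slope > threshold:
        return "Deteriorating"
    if slope < -threshold:
        return "Improving"
    return "Stable"
```

Applied to the loss ratios reported for Mercy (21.7%, 35.0%, 56.7%, 72.3%, 99.7%), the fitted slope is about 19 points per year, so `classify_trend` returns "Deteriorating". A slope is less sensitive to a single outlier year than comparing only the first and last years.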

The Scenario

Mercy Health Partners operates 3 hospitals with 450 total beds in the Midwest. Its medical malpractice policy is up for renewal on June 1, 2026. The broker (Marsh) submits the renewal package, including a 47-page loss run PDF covering the five policy years from June 2021 through May 2026. The current premium is $3.8M.

Timeline: From PDF to Rate Recommendation

Time	Event	System	Output
8:15 AM	Broker email arrives with loss run PDF attached	Email Ingestion	Submission created, documents classified
8:15:02 AM	InsightXtract classifies PDF as LOSS_RUN (MedMal)	InsightXtract	Document type = LOSS_RUN, 7 tables detected
8:15:18 AM	Table extraction begins — 7 tables across 47 pages	InsightXtract	Layout parser identifies column headers per table
8:16:45 AM	45 claims extracted and normalized	InsightXtract	45 structured claim records, 93% confidence
8:16:50 AM	Taxonomy mapping applies ICD-10 codes	Claims Engine	Claim types standardized across all 5 years
8:17:00 AM	Year-by-year aggregation computed	Loss Summary Engine	5 policy years, loss ratios: 21.7% → 99.7%
8:17:01 AM	rate inadequacy flag triggered	Flag Engine	Loss ratio > 70% in 2 consecutive years
8:17:01 AM	frequency increase flag triggered	Flag Engine	250% claim frequency increase
8:17:02 AM	severity spike flag triggered	Flag Engine	108% average severity increase
8:17:05 AM	AI Rate Recommendation engine invoked	AI Rate Engine	Model ingests loss data + book benchmarks
8:17:12 AM	Rate recommendation generated: +18%	AI Rate Engine	$3.8M → $4.484M recommended
8:17:13 AM	Underwriter notification: 3 flags, rate recommendation ready	Notification Engine	Bell icon: HIGH severity alert

Total elapsed time from PDF receipt to rate recommendation: 2 minutes, 58 seconds.
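The three flags in the timeline can be expressed as simple rules over the year-by-year summary. This is a sketch under stated assumptions: the rate_inadequacy rule (loss ratio above 70% in two consecutive policy years) and the High/Medium/High severities come from the post, but the frequency and severity thresholds, the first-year-versus-last-year comparison, the use of raw claim counts as a frequency proxy, and the identifiers `frequency_increase` and `severity_spike` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    name: str
    severity: str
    detail: str


def pct_change(old: float, new: float) -> float:
    """Percent change from old to new; growth from zero is treated as infinite."""
    if old == 0:
        return float("inf") if new > 0 else 0.0
    return (new - old) / old * 100.0


def evaluate_flags(
    loss_ratios: list[float],
    claim_counts: list[int],
    avg_severities: list[float],
    lr_threshold: float = 0.70,
    freq_threshold_pct: float = 50.0,  # hypothetical trigger level
    sev_threshold_pct: float = 50.0,   # hypothetical trigger level
) -> list[Flag]:
    """All inputs are ordered oldest -> newest policy year."""
    flags: list[Flag] = []
    # rate_inadequacy: loss ratio above the threshold in two consecutive policy years.
    if any(a > lr_threshold and b > lr_threshold for a, b in zip(loss_ratios, loss_ratios[1:])):
        flags.append(Flag("rate_inadequacy", "High", f"LR > {lr_threshold:.0%} in 2 consecutive years"))
    # Assumed method: compare first and last policy years. Raw counts stand in for
    # frequency; an exposure-adjusted version would divide by beds or another exposure base.
    freq = pct_change(claim_counts[0], claim_counts[-1])
    if freq >= freq_threshold_pct:
        flags.append(Flag("frequency_increase", "Medium", f"{freq:.0f}% claim frequency increase"))
    sev = pct_change(avg_severities[0], avg_severities[-1])
    if sev >= sev_threshold_pct:
        flags.append(Flag("severity_spike", "High", f"{sev:.0f}% average severity increase"))
    return flags
```

Keeping each rule a pure function of the summary makes every flag reproducible from logged inputs, which is what lets it double as an audit record.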

Department-Level Claims Breakdown

The extraction pipeline does not just aggregate — it structures claims by facility and department, revealing concentration risk:

Facility	Department	Claims (5yr)	Total Incurred	Avg Severity	Trend
Mercy General Hospital	Orthopedics	8	$1,840,000	$230,000	Stable
Mercy General Hospital	OB/GYN	6	$1,620,000	$270,000	Increasing
Mercy St. Luke's	Emergency Medicine	14	$3,180,000	$227,143	Rapidly Increasing
Mercy St. Luke's	Radiology	5	$680,000	$136,000	Stable
Mercy Children's	Pediatrics	7	$920,000	$131,429	Stable
Mercy Children's	NICU	5	$500,000	$100,000	Decreasing

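The data immediately points to Mercy St. Luke's Emergency Medicine as the primary driver of deterioration: it accounts for 14 of the 45 claims (31%) and $3.18M of the $8.74M total incurred (36%), and it is the only department with a rapidly increasing trend.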
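The concentration reading above can be reproduced directly from the table. The sketch below recomputes average severity and each department's share of total incurred losses, then lists departments at or above a share threshold. The 30% threshold and the `concentration` helper are hypothetical; the input rows are the six departments from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeptLosses:
    facility: str
    department: str
    claims: int
    incurred: float


# The six rows from the department table above (5-year totals).
DEPARTMENTS = [
    DeptLosses("Mercy General Hospital", "Orthopedics", 8, 1_840_000),
    DeptLosses("Mercy General Hospital", "OB/GYN", 6, 1_620_000),
    DeptLosses("Mercy St. Luke's", "Emergency Medicine", 14, 3_180_000),
    DeptLosses("Mercy St. Luke's", "Radiology", 5, 680_000),
    DeptLosses("Mercy Children's", "Pediatrics", 7, 920_000),
    DeptLosses("Mercy Children's", "NICU", 5, 500_000),
]


def concentration(rows: list[DeptLosses], share_threshold: float = 0.30):
    """Return (name, avg severity, share of incurred) sorted by share, largest first,
    plus the names whose share meets the hypothetical threshold."""
    total = sum(r.incurred for r in rows)
    stats = sorted(
        (
            (f"{r.facility} / {r.department}", round(r.incurred / r.claims), r.incurred / total)
            for r in rows
        ),
        key=lambda s: s[2],
        reverse=True,
    )
    flagged = [name for name, _, share in stats if share >= share_threshold]
    return stats, flagged


if __name__ == "__main__":
    stats, flagged = concentration(DEPARTMENTS)
    for name, severity, share in stats:
        print(f"{name:<40} avg ${severity:>9,}  share {share:5.1%}")
    print("concentration:", flagged)
```

With these inputs, only Mercy St. Luke's Emergency Medicine crosses 30% (about 36% of $8.74M); the next largest, Mercy General Hospital Orthopedics, is about 21%.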

The AI Rate Recommendation

When the rate inadequacy flag fires, the AI Rate Recommendation engine ingests the structured loss data alongside book-level benchmarks and market data.
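The post describes the production model as a gradient boost model with an LLM overlay and says each factor (loss trend, frequency, severity, book benchmark, market) is weighted and shown. The sketch below swaps in a much simpler technique, a weighted blend of per-factor rate indications, purely to illustrate what a decomposed recommendation looks like and how it applies to the premium. The `FactorContribution` type, the `recommend` helper, and every weight and indication in the example are hypothetical; only the final arithmetic ($3.8M × 1.18 = $4.484M) comes from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactorContribution:
    factor: str       # e.g. "loss trend", "frequency", "severity", "book benchmark", "market"
    weight: float     # relative weight (hypothetical; normalized below)
    indicated: float  # rate change this factor alone would indicate, as a fraction


def recommend(
    factors: list[FactorContribution], current_premium: float
) -> tuple[float, float, list[tuple[str, float]]]:
    """Weighted blend of per-factor indications.

    Returns (total rate change, recommended premium, per-factor contributions).
    The contributions sum to the total, so each factor's share is visible.
    """
    total_weight = sum(f.weight for f in factors)
    breakdown = [(f.factor, f.weight / total_weight * f.indicated) for f in factors]
    rate_change = sum(c for _, c in breakdown)
    return rate_change, current_premium * (1 + rate_change), breakdown


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not the engine's actual weights.
    example = [
        FactorContribution("loss trend", weight=0.5, indicated=0.20),
        FactorContribution("book benchmark", weight=0.3, indicated=0.10),
        FactorContribution("market", weight=0.2, indicated=0.05),
    ]
    change, premium, parts = recommend(example, 3_800_000)
    for name, contrib in parts:
        print(f"{name:>15}: {contrib:+.1%}")
    print(f"total: {change:+.1%} -> ${premium:,.0f}")
```

Because the contributions add up to the total, an underwriter who disagrees with one factor can see exactly how much of the recommendation that override would remove.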

The Extraction-to-Pricing Pipeline in Detail

sequenceDiagram
    participant Broker as Broker (Marsh)
    participant Email as Email Ingestion
    participant IX as InsightXtract
    participant LSE as Loss Summary Engine
    participant Flag as Flag Engine
    participant AI as AI Rate Engine
    participant UW as Underwriter
    Broker->>Email: Renewal package with 47-page loss run PDF
    Email->>IX: Classify document → Loss Run (Med Mal)
    IX->>IX: Detect 7 tables across 47 pages
    IX->>IX: Extract 45 claims with layout parser + GPT-4o
    IX->>LSE: 45 structured claim records
    LSE->>LSE: Normalize taxonomy (ICD-10 mapping)
    LSE->>LSE: Aggregate by policy year (5 years)
    LSE->>LSE: Compute loss ratios: 21.7% → 35.0% → 56.7% → 72.3% → 99.7%
    LSE->>Flag: Loss ratio > 70% in consecutive years
    Flag->>Flag: Trigger rate_inadequacy (High)
    Flag->>Flag: Trigger frequency increase (Medium)
    Flag->>Flag: Trigger severity spike (High)
    Flag->>AI: 3 flags + structured loss data
    AI->>AI: Ingest loss data + book benchmarks + market data
    AI->>AI: Run gradient boost model with LLM overlay
    AI->>UW: Rate recommendation: +18% ($3.8M → $4.484M)
    Note over UW: Total pipeline time: 2 min 58 sec

Metrics: Before and After Structured Claims Ingestion

Metric	Before InsightUW	After InsightUW	Improvement
Loss run extraction time (45 claims)	60–90 minutes (manual)	90 seconds (automated)	98% faster
Claim categorization consistency	62% agreement between UWs	97% (enforced taxonomy)	+35 points (56% relative)
Year-by-year trend visibility	Manual spreadsheet (if done)	Automatic with flags	Always available
Time from PDF to rate recommendation	2–4 hours	2 minutes 58 seconds	98% faster
Rate recommendations with data backing	~30% (most are gut-feel)	100% (model-driven)	3.3x improvement
Audit trail for pricing decisions	None	Complete (every factor logged)	Full traceability
Claims missed in extraction	5–12% (human error)	< 1% (AI + validation)	90% reduction
Departmental concentration detection	Rarely done (too time-consuming)	Automatic	New capability

Key Takeaways

Loss runs are the foundation of renewal pricing, but they are locked in PDFs. InsightXtract converts unstructured loss run documents into structured, queryable claims data — automatically, in under 2 minutes.
Year-by-year trend analysis is the signal, not individual claims. The Loss Summary Engine computes loss ratios per policy year and detects deterioration velocity, giving the underwriter the trend line, not just data points.
Flags create urgency and accountability. The rate inadequacy flag is not a suggestion: it is a system-generated alert that the current rate is inadequate for the observed loss experience (a loss ratio above 70% in two consecutive policy years). It appears on the renewal, in the queue, and in the audit trail.
AI rate recommendations are defensible because they are decomposed. The recommendation is not a black box. Each factor (loss trend, frequency, severity, book benchmark, market) is weighted and shown, so the underwriter can agree, adjust, or override with full transparency.
The pipeline is the audit trail. Every step — extraction, normalization, aggregation, flagging, recommendation — is logged with timestamps and inputs. Three years from now, you can reconstruct exactly what data drove the pricing decision.
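One way to make "the pipeline is the audit trail" tamper-evident is to hash-chain the step log: each entry stores a SHA-256 digest of its canonicalized inputs plus the hash of the previous entry, so any later edit breaks verification. This is a minimal sketch; hash chaining and the `AuditLog` class are assumptions for illustration, not a description of how InsightUW stores its logs, and a real system would persist entries rather than keep them in memory.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(obj: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys) of obj."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only, hash-chained log of pipeline steps (in-memory sketch)."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def record(self, step: str, inputs: dict, output: dict) -> dict:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else ""
        body = {
            "step": step,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "input_sha256": _digest(inputs),
            "inputs": inputs,
            "output": output,
            "prev_hash": prev_hash,
        }
        # The entry hash covers every field above, including the link to the previous entry.
        body["entry_hash"] = _digest(body)
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        """Recompute every hash and check each link; False if anything was altered."""
        prev = ""
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "entry_hash"}
            if e["prev_hash"] != prev or _digest(body) != e["entry_hash"]:
                return False
            prev = e["entry_hash"]
        return True
```

Usage: call `record("extraction", {...}, {...})` after each stage, then `verify()` when reconstructing a past pricing decision; a changed input, output, or deleted entry makes it return False.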

Ready to turn loss run PDFs into data-driven renewal pricing? InsightUW's extraction-to-pricing pipeline transforms 47 pages of claims history into a defensible rate recommendation in under 3 minutes.

Schedule a Loss Run Pipeline Demo →