Case Study • AI-Powered Document Intelligence • Insurance & Compliance

Cutting Document Review Time From Minutes to Seconds.

Insurance, compliance, and cybersecurity teams are still spending 20–40 minutes searching long PDFs for answers. BIS Advisors implemented an AI-powered document intelligence system that turns those lookups into grounded, cited answers in a few seconds — across real policies, SOC 2 reports, NIST frameworks, cyber endorsements, ESG reports, regulatory guidance, and audit evidence packages.

BIS Advisors is incorporated in Colorado and operates out of Florida, providing AI document intelligence services to insurance and compliance organizations across the United States, with local availability in Florida for in-person consulting when needed.

  • 80–90%+ reduction in lookup time
  • 2–7 seconds typical response time
  • Low-to-mid 90% accuracy on a mixed test set
  • 0 unsupported claims in grounded-answer mode

Business Problem

Underwriting, claims, audit, and compliance teams rely on long, dense documents every day:

  • 30–200+ page cyber and property insurance policies
  • Endorsements, exclusions, riders, and cyber addenda
  • SOC 2 Type I/II reports
  • NIST 800-53 and 800-171 control frameworks
  • ESG and regulatory reports
  • Audit evidence PDF collections

These documents are technical, cross-referenced, carrier- or vendor-specific, and often stored across multiple systems. Manual lookup can take 20–40 minutes per question, especially around exclusions, deductibles, or control verification, leading to:

  • Slower underwriting turnaround and operational delays
  • Inconsistent interpretation across reviewers and teams
  • Elevated compliance and regulatory risk
  • Analyst frustration and cognitive fatigue
  • Time lost during submissions, audits, and investigations

“I spend at least a third of my time just searching for where things are in these PDFs.” — Analyst feedback during discovery

Solution Overview

BIS Advisors implemented a Python-based Retrieval-Augmented Generation (RAG) pipeline optimized specifically for insurance, audit, and compliance workflows. Instead of relying on keyword search and manual scrolling, analysts ask natural-language questions and receive grounded answers, supported by citations to their own documents.
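
At a high level, the ingestion side looks like the sketch below: parse each document, split it into overlapping chunks, embed the chunks, and index them for retrieval. This is a minimal illustration under assumptions, not the production code; the chunk sizes, the placeholder embedding, and the file path are hypothetical, and the deployed system uses self-hosted Llama 3.1 embeddings.

```python
# Minimal ingestion sketch: parse -> chunk -> embed -> index.
# The chunk sizes, the toy embedding, and the file path are
# hypothetical; the deployed pipeline uses PDF parsing plus
# self-hosted Llama 3.1 embeddings instead.
import numpy as np
import faiss  # pip install faiss-cpu

def chunk_text(text: str, size: int = 800, overlap: int = 200) -> list[str]:
    """Sliding-window chunker; size and overlap are tuned per document type."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def embed_texts(texts: list[str], dim: int = 128) -> np.ndarray:
    """Placeholder embedding (character histogram) so the sketch runs;
    swap in real Llama 3.1 embeddings in practice."""
    vecs = np.zeros((len(texts), dim), dtype="float32")
    for i, t in enumerate(texts):
        for ch in t:
            vecs[i, ord(ch) % dim] += 1.0
    faiss.normalize_L2(vecs)  # normalize so inner product == cosine
    return vecs

chunks = chunk_text(open("policy.txt", encoding="utf-8").read())  # hypothetical file
vectors = embed_texts(chunks)
index = faiss.IndexFlatIP(vectors.shape[1])  # exact cosine search
index.add(vectors)
```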

System Capabilities

  • Parses and indexes dense, real-world PDFs
  • Retrieves the most relevant paragraphs, clauses, or controls
  • Generates grounded answers using only retrieved content
  • Returns answers in 2–7 seconds for most workflows
  • Includes citations to policy sections or SOC/NIST controls
  • Returns “Not present in the document.” when content is absent (see the grounding sketch after this list)
  • Handles multi-document libraries and cross-document questions
  • Supports cloud, hybrid, and secure on-prem deployments
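
Continuing the ingestion sketch above, the query side retrieves the most relevant chunks, builds a grounding prompt, and refuses when nothing relevant is found. The prompt wording, thresholds, and the `generate` helper (sketched later under Infrastructure & Stability) are illustrative assumptions, not the production prompts.

```python
# Query-side sketch: retrieve -> ground -> answer or refuse.
# `index`, `chunks`, and `embed_texts` come from the ingestion sketch;
# `generate` is the hypothetical LLM call sketched later under
# Infrastructure & Stability.
REFUSAL = "Not present in the document."

GROUNDED_PROMPT = """Answer the question using ONLY the numbered excerpts below.
Cite the excerpt numbers you relied on. If the excerpts do not contain the
answer, reply exactly: {refusal}

Excerpts:
{context}

Question: {question}
Answer:"""

def answer(question: str, k: int = 5, min_score: float = 0.2) -> str:
    scores, ids = index.search(embed_texts([question]), k)
    hits = [i for s, i in zip(scores[0], ids[0]) if i != -1 and s >= min_score]
    if not hits:
        return REFUSAL  # nothing relevant retrieved: refuse rather than guess
    context = "\n".join(f"[{n + 1}] {chunks[i]}" for n, i in enumerate(hits))
    prompt = GROUNDED_PROMPT.format(refusal=REFUSAL, context=context,
                                    question=question)
    return generate(prompt)
```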

Technical Stack (High-Level)

The system uses a modern, self-hosted AI stack designed for regulated environments:

  • Language: Python (RAG pipeline)
  • LLM: Llama 3.1 Instruct 8B (self-hosted)
  • Inference: vLLM
  • Embeddings: Llama 3.1 Embeddings
  • Vector DB: FAISS
  • Deployment: GCP L4 and on-prem RTX 4090 (hybrid)
  • Guardrails: strict grounded-answer prompting + fallback refusal for unsupported questions

This is not a general-purpose chatbot. It is a focused, production-grade RAG system built for serious, document-centric work.

Engineering Notes From Real Deployments

What Mattered Most in Practice

  • PDF parsing and normalization often mattered more than model size: OCR quality, tables, fonts, and headers heavily influenced recall (see the extraction sketch after this list).
  • Chunk size and overlap had a large effect on coverage and cross-referenced sections.
  • Embedding quality was critical for insurance and SOC/NIST recall, especially in long, technical documents.
  • Grounding-first prompts significantly reduced hallucinations and kept outputs defensible.
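
As a small illustration of the first point, even a basic extraction pass benefits from normalization before chunking. The sketch below uses pypdf with a few assumed cleanup rules; the production parser and OCR handling are considerably more involved.

```python
# Minimal pypdf extraction + normalization sketch (assumed cleanup rules).
# Real deployments also deal with OCR output, tables, and font quirks.
import re
from pypdf import PdfReader  # pip install pypdf

def extract_normalized_text(path: str) -> str:
    pages = []
    for page in PdfReader(path).pages:
        text = page.extract_text() or ""
        text = re.sub(r"-\n(\w)", r"\1", text)   # re-join hyphenated line breaks
        text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spacing
        text = re.sub(r"\n{3,}", "\n\n", text)   # collapse blank-line runs
        pages.append(text.strip())
    return "\n\n".join(pages)
```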

Infrastructure & Stability

  • Running on a clean Linux install with an RTX 4090 proved far more stable than WSL for heavy inference workloads.
  • vLLM required careful VRAM tuning; the sweet spot for both the 4090 and the L4 (each with 24 GB) was around 80–85% GPU memory utilization (see the configuration sketch after this list).
  • Very long answers (>2–3k tokens) increased latency nonlinearly, so answer length and context depth were tuned per workflow.
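
For reference, a minimal version of the inference setup, assuming vLLM's offline Python API; the model path and token limits are illustrative, and the 0.85 memory utilization reflects the sweet spot noted above.

```python
# Sketch of the inference setup using vLLM's offline Python API.
# The model path and token limits are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    gpu_memory_utilization=0.85,  # the stable sweet spot on 24 GB cards
    max_model_len=8192,           # bound context depth to keep latency flat
)
params = SamplingParams(temperature=0.0, max_tokens=1024)  # cap answer length

def generate(prompt: str) -> str:
    """Single-prompt helper used by the query sketch above."""
    return llm.generate([prompt], params)[0].outputs[0].text
```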

Evaluation Approach & Performance

Evaluation Approach

To produce defensible metrics, we used a structured evaluation covering both insurance and SOC/NIST workloads:

  • ~120 total test questions
  • ~50–60 insurance coverage and endorsement questions
  • ~50–60 SOC 2 / NIST control interpretation questions
  • Scoring rubric: Correct / Partially Correct / Incorrect, with manual review by domain-aware evaluators (see the tally sketch after this list)
  • Grounding manually verified by tracing each answer back to source text
  • Latency measured across repeated runs on both GCP L4 and on-prem RTX 4090
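
A simple tally over the rubric might look like the sketch below; the label names and the helper are assumptions, and in practice every label came from manual review rather than automated scoring.

```python
# Rubric tally sketch; labels come from manual review, not from the model.
from collections import Counter

def summarize(labels: list[str]) -> dict[str, float]:
    counts = Counter(labels)  # "correct" / "partial" / "incorrect"
    total = sum(counts.values())
    return {
        "strict_accuracy": counts["correct"] / total,
        "lenient_accuracy": (counts["correct"] + counts["partial"]) / total,
    }

# Toy labels only, to show the shape of the output:
print(summarize(["correct", "correct", "partial", "incorrect"]))
```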

The goal was realistic performance — not “perfect demo” numbers.

Performance Highlights

Latency (L4 vs RTX 4090)

  • Typical response time on GCP L4: ~5–7 seconds across all question types.
  • Range: approximately 0.5 to 15 seconds, primarily driven by answer length and context depth.
  • Under identical conditions, an on-prem RTX 4090 ran ~3.5× faster than an L4, with similar answer quality.

Answer Quality

  • Accuracy in the low-to-mid 90% range on a mixed test set of insurance and SOC/NIST questions.
  • No unsupported claims in the evaluated set when using strict grounded-answer mode.
  • Consistent results between different reviewers, with clear traceability from every answer back to underlying source text.

Deployment & Cost Considerations

Different organizations have different risk profiles, data residency requirements, and cost constraints. The solution was validated across on-prem and cloud GPUs, with a focus on stability and predictable cost per question.

On-Prem RTX 4090

  • Fast, stable performance for 8B-class models
  • Full data control — documents and embeddings stay within internal networks
  • Hardware amortization drives incremental compute cost to near zero
  • Requires a clean Linux install; WSL was not stable enough under sustained load

Cloud (GCP L4)

  • Stable, easy to deploy, runs 24 GB VRAM models comfortably
  • Ideal for distributed teams or variable workloads
  • Typical configuration: ~8–10 hours/day scheduled uptime, yielding roughly $400–$500/month in compute cost
  • At ~40k–50k monthly questions, effective compute cost works out to roughly $0.01 per question ($400–$500 ÷ 40,000–50,000 ≈ $0.008–$0.0125), excluding implementation/integration work.

Hybrid (Recommended)

  • On-prem 4090 handling most daily workloads
  • Cloud GPUs used for burst capacity, after-hours usage, or special projects
  • Balanced cost, performance, and privacy for insurance and compliance teams

API Models & Escalation

The architecture supports a “local-first, escalate-when-needed” strategy:

  • Use local Llama 3.1 Instruct 8B for 80–90% of routine coverage / control / evidence questions.
  • Escalate only the hardest, highest-stakes questions to higher-priced API models (GPT-4-class, Claude Sonnet, etc.).
  • This approach keeps costs predictable while preserving access to top-tier models when warranted; a minimal routing sketch follows.
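
A minimal version of that routing logic might look like the sketch below; the trigger terms, threshold, and model names are assumptions, and a real deployment would use richer signals such as question type, dollar exposure, and human review.

```python
# Local-first routing sketch; the trigger terms, threshold, and model
# names are assumptions, not the production routing policy.
HIGH_STAKES = ("exclusion", "sublimit", "litigation", "breach notification")

def route(question: str, retrieval_score: float) -> str:
    """Send routine questions to the local 8B model; escalate the rest."""
    if retrieval_score < 0.3 or any(t in question.lower() for t in HIGH_STAKES):
        return "api-escalation"  # e.g., a GPT-4-class or Claude Sonnet endpoint
    return "local-llama-3.1-8b-instruct"
```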

Business Impact

Before AI

  • 20–40 minutes per lookup on long PDFs
  • Several documents open side-by-side for each question
  • Repeated manual cross-referencing of clauses and controls
  • High variance between reviewers and teams
  • Significant cognitive load and fatigue

After AI

  • Grounded answers in seconds instead of minutes
  • Cited, defensible responses that can be quickly verified in source documents
  • More consistent interpretation across analysts
  • ~80–90% reduction in manual lookup time for evaluated workflows
  • Lower analyst workload and improved throughput
  • Stronger audit defensibility and traceability

Impact Across Teams

Underwriting

Faster submission review, clearer interpretation of endorsements, sublimits, deductibles, and exclusions, and more consistent reading of carrier-specific wording.

Claims

Improved adjudication speed and consistency, with quick reference to the exact policy language that supports each decision.

Compliance & Audit

Immediate SOC/NIST evidence lookup, reliable control interpretation, faster audit readiness, and traceable answers for regulators and external assessors.

Adoption & Change Management

A key finding from deployments: the technology works, but adoption requires intentional change management.

  • Analysts must trust the system enough to use it daily
  • Managers need clarity on which questions AI should handle vs. when to escalate
  • Teams need training on grounded answers vs. free-form AI “guesses”
  • Compliance and audit functions need documentation, governance, and clear usage policies
  • Integration into existing underwriting, claims, and audit workflows is essential for real ROI

With the right structure in place, adoption tends to accelerate: analysts see immediate time savings, leadership sees improved consistency, and compliance teams gain stronger defensibility.

Key Lessons Learned

  • Python is highly productive for insurance/compliance RAG work.
  • Instruct models outperform base models for grounded, citation-heavy tasks.
  • Retrieval, ranking, and data quality often matter more than model size alone.
  • WSL is not suitable for production GPU workloads; clean Linux + GPU is preferred.
  • Integration and change management drive the real impact — not just the model choice.

Roadmap

Building on this capability, BIS Advisors is expanding AI-powered document intelligence in several directions:

  • Advanced policy comparison and version-difference views
  • Automated mapping between SOC, NIST, and ISO frameworks
  • Enhanced audit evidence extraction and tagging
  • Multi-document “chain-of-thought” review workflows
  • Support for larger and more complex document libraries
  • Running larger models (e.g., Llama 3.1 70B or equivalents) on A100 and H100 GPUs to compare accuracy and latency in high-complexity environments
  • Internal “Ask Your Documents” portals for enterprise teams, with access control and audit logging

These enhancements will further streamline document-heavy operations in regulated industries while maintaining strict compliance, traceability, and trust.

Ready to See This on Your Documents?

AI-powered document intelligence is no longer experimental — it is practical, repeatable, and already delivering measurable value in insurance, cybersecurity, compliance, and audit environments. If your teams are still scrolling through long PDFs to answer routine questions, a focused pilot on your own documents can usually demonstrate value quickly.

We are based in Florida with the company incorporated in Colorado, serving clients nationwide. Florida organizations may request optional on-site sessions in Orlando, Tampa, Miami, and surrounding regions.

📞 Call us: +1 (303) 632-7874
✉️ Email: consulting@bisadvisors.com

Good Fit If…

  • You rely on long insurance, compliance, cyber, ESG, or audit PDFs for daily decisions.
  • You have security or regulatory constraints on where documents and answers can live.
  • You want grounded, defensible answers — not black-box AI responses.
  • You’re open to a scoped pilot on your own documents before scaling to the enterprise.