Xenia

Overview

Extracts structured financial data from hotel proposals. Parses HTML, PDF, and scraped web pages into normalized quotes with confidence scoring and comparison.

Event planners receive hotel proposals as emails, PDFs, and web links, each formatted differently. Xenia parses these documents and extracts room rates, F&B minimums, meeting room costs, taxes, fees, attrition, and cancellation terms into a normalized quote. The extraction pipeline uses ~40 named regex patterns and a confidence scoring system that rates each field by how it was found (explicit label, prose context, or calculation). Post-extraction validation catches anomalies before a quote is saved. Side-by-side comparison highlights the lowest values across 2-3 saved quotes. The architecture is domain-driven and covered by 196 tests across 12 suites.

Category

Dev Tools

Stack

Next.js, React 19, Supabase, Firecrawl, Zod

Features

Multi-Source Input

Paste hotel proposal HTML or email text, upload PDF/HTML files, or scrape a proposal URL via Firecrawl. Pattern detection auto-selects the right parser.
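As a rough illustration of how pattern detection might route an input to a parser, here is a minimal sketch. The `SourceKind` type, the `detectSourceKind` function, and the specific heuristics are assumptions made for this example, not Xenia's actual detection rules.

```typescript
// Hypothetical input kinds; the names are illustrative only.
type SourceKind = "pdf" | "url" | "html" | "text";

// Guess which parser an input should go to, based on simple surface patterns.
function detectSourceKind(input: string): SourceKind {
  const trimmed = input.trim();
  // PDF files begin with the "%PDF-" magic header.
  if (trimmed.startsWith("%PDF-")) return "pdf";
  // A single whitespace-free http(s) token is treated as a link to scrape.
  if (/^https?:\/\/\S+$/i.test(trimmed)) return "url";
  // Any common HTML tag routes the input to the HTML parser.
  if (/<\/?(html|body|table|div|p|td|tr|span|br)\b[^>]*>/i.test(trimmed)) return "html";
  // Everything else is handled as plain email or pasted text.
  return "text";
}
```

In this sketch the checks run from most to least specific, so a pasted email that merely mentions a URL still falls through to the text parser.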

Confidence Scoring

Every extracted field carries a confidence tier: explicit-label, prose-direct, prose-loose, or calculated. The extraction method is recorded for each value, so results are auditable.
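One way to model these tiers is sketched below. The tier names come from the list above; the `ExtractedField` shape, the numeric ranking (which assumes the tiers are listed from most to least trusted), and `preferMoreConfident` are hypothetical.

```typescript
// Tier names as listed above.
const TIERS = ["explicit-label", "prose-direct", "prose-loose", "calculated"] as const;
type ConfidenceTier = (typeof TIERS)[number];

// Hypothetical field shape: the value, its tier, and the pattern that found it.
interface ExtractedField<T> {
  value: T;
  tier: ConfidenceTier;
  method: string; // e.g. the name of the regex pattern that matched
}

// Assumed ordering: higher rank means more trustworthy.
const TIER_RANK: Record<ConfidenceTier, number> = {
  "explicit-label": 3,
  "prose-direct": 2,
  "prose-loose": 1,
  "calculated": 0,
};

// Keep the candidate with the higher tier; on a tie, keep the first one.
function preferMoreConfident<T>(a: ExtractedField<T>, b: ExtractedField<T>): ExtractedField<T> {
  return TIER_RANK[b.tier] > TIER_RANK[a.tier] ? b : a;
}
```

Storing `method` alongside the value is what would make each extraction traceable back to a specific pattern.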

Side-by-Side Comparison

Compare 2-3 saved quotes with the lowest value highlighted in each category: room rates, F&B minimums, meeting costs, taxes, and cancellation terms.
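The highlighting logic for numeric categories could look roughly like the sketch below. `QuoteSummary`, its field names, and `lowestPerCategory` are assumptions for illustration; cancellation terms are left out because they are not a single number.

```typescript
// Hypothetical summary of a saved quote; missing values stay undefined.
interface QuoteSummary {
  hotel: string;
  roomRate?: number;
  fbMinimum?: number;
  meetingCost?: number;
  taxRate?: number;
}

type NumericKey = "roomRate" | "fbMinimum" | "meetingCost" | "taxRate";
const KEYS: NumericKey[] = ["roomRate", "fbMinimum", "meetingCost", "taxRate"];

// For each category, list the hotels holding the lowest value (ties included).
function lowestPerCategory(quotes: QuoteSummary[]): Record<NumericKey, string[]> {
  const result = {} as Record<NumericKey, string[]>;
  for (const key of KEYS) {
    // Quotes without a value for this category are skipped, not treated as zero.
    const present = quotes.filter((q) => typeof q[key] === "number");
    const min = Math.min(...present.map((q) => q[key] as number));
    result[key] = present.filter((q) => q[key] === min).map((q) => q.hotel);
  }
  return result;
}
```

A category with no values at all yields an empty list, so nothing is highlighted for it.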

Post-Extraction Validation

Detects negative amounts, date inversions, tax rate outliers, room count anomalies, and total mismatches. Warnings surface before you save.
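A minimal sketch of these five checks follows. The `QuoteDraft` shape and every threshold (a 30% tax ceiling, a 5000-room ceiling, a 1% total tolerance) are assumptions chosen for the example, not Xenia's actual limits.

```typescript
// Hypothetical pre-save quote shape.
interface QuoteDraft {
  checkIn: string; // ISO date, e.g. "2025-03-10"
  checkOut: string; // ISO date
  roomCount: number;
  lineItems: Record<string, number>; // pre-tax amounts by name
  taxRate: number; // fraction, e.g. 0.14 for 14%
  statedTotal?: number; // total as printed in the proposal, if any
}

// Return human-readable warnings; an empty array means no anomalies were found.
function validateQuote(q: QuoteDraft): string[] {
  const warnings: string[] = [];
  for (const [name, amount] of Object.entries(q.lineItems)) {
    if (amount < 0) warnings.push(`negative amount: ${name}`);
  }
  // Unparseable dates produce NaN, which never triggers this comparison.
  if (Date.parse(q.checkOut) <= Date.parse(q.checkIn)) {
    warnings.push("date inversion: check-out is not after check-in");
  }
  // Assumed plausible range for a combined tax rate.
  if (q.taxRate < 0 || q.taxRate > 0.3) warnings.push("tax rate outlier");
  // Assumed plausible range for a room block.
  if (!Number.isInteger(q.roomCount) || q.roomCount <= 0 || q.roomCount > 5000) {
    warnings.push("room count anomaly");
  }
  if (q.statedTotal !== undefined) {
    const subtotal = Object.values(q.lineItems).reduce((sum, a) => sum + a, 0);
    const expected = subtotal * (1 + q.taxRate);
    // Allow a 1% tolerance for rounding in the source document.
    if (Math.abs(expected - q.statedTotal) > 0.01 * Math.max(expected, 1)) {
      warnings.push("total mismatch");
    }
  }
  return warnings;
}
```

Returning warnings instead of throwing fits the behavior described above, where issues surface for review before saving rather than blocking extraction.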

Multi-Format Export

CSV for spreadsheets, PDF report with structured layout, or clipboard copy. All exports include confidence scores and extraction metadata.
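For the CSV path, a sketch of an export that carries confidence and extraction metadata alongside each value is shown below. The `ExportRow` shape and column names are assumptions; the quoting follows the usual CSV convention of wrapping cells that contain commas, quotes, or line breaks and doubling embedded quotes.

```typescript
// Hypothetical export row: one line per extracted field.
interface ExportRow {
  field: string;
  value: string;
  confidence: string;
  method: string;
}

// Quote a cell only when it contains a comma, a double quote, or a line break.
function csvCell(s: string): string {
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Build a CSV document with a header row and CRLF line endings.
function toCsv(rows: ExportRow[]): string {
  const header = ["field", "value", "confidence", "method"];
  const lines = rows.map((r) =>
    [r.field, r.value, r.confidence, r.method].map(csvCell).join(","),
  );
  return [header.join(","), ...lines].join("\r\n");
}
```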

Quote History

Save, search, edit, and delete parsed quotes via Supabase. Full-text search across hotel names, dates, and extracted values.

Extraction Pipeline

1. Input (HTML/PDF/URL)
2. Pattern detection
3. Key-value extraction
4. Prose fill
5. Validation
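The key-value extraction and prose fill steps can be sketched as two regex passes, where the prose pass only fills fields the labeled pass left empty. The two field names, the four patterns, and the `extract` function are hypothetical stand-ins for the ~40 named patterns mentioned in the overview.

```typescript
type FieldName = "roomRate" | "fbMinimum";
type Tier = "explicit-label" | "prose-direct";

interface Found {
  value: number;
  tier: Tier;
  pattern: string; // which pass and field produced the value
}
type Fields = Partial<Record<FieldName, Found>>;

// Hypothetical patterns for labeled key-value lines such as "Room Rate: $199.00".
const LABEL_PATTERNS: Record<FieldName, RegExp> = {
  roomRate: /room rate\s*[:\-]\s*\$?([\d,]+(?:\.\d+)?)/i,
  fbMinimum: /f&b minimum\s*[:\-]\s*\$?([\d,]+(?:\.\d+)?)/i,
};

// Hypothetical patterns for amounts stated in running prose.
const PROSE_PATTERNS: Record<FieldName, RegExp> = {
  roomRate: /\$([\d,]+(?:\.\d+)?)\s*(?:per|\/)\s*night/i,
  fbMinimum: /minimum (?:food and beverage|f&b) spend of \$([\d,]+(?:\.\d+)?)/i,
};

// Parse "12,500" or "199.00" into a number.
function toNumber(s: string): number {
  return Number(s.replace(/,/g, ""));
}

// Run one pass; fields already filled by an earlier pass are left untouched.
function runPass(text: string, patterns: Record<FieldName, RegExp>, tier: Tier, into: Fields): void {
  for (const key of Object.keys(patterns) as FieldName[]) {
    if (into[key]) continue;
    const m = patterns[key].exec(text);
    if (m) into[key] = { value: toNumber(m[1]), tier, pattern: `${tier}:${key}` };
  }
}

// Key-value extraction first, then prose fill for whatever is still missing.
function extract(text: string): Fields {
  const fields: Fields = {};
  runPass(text, LABEL_PATTERNS, "explicit-label", fields);
  runPass(text, PROSE_PATTERNS, "prose-direct", fields);
  return fields;
}
```

Running the labeled pass first means a value with an explicit label is never overwritten by a looser prose match, which matches the tiering described under Confidence Scoring.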