Methodology · How the data is made
How Luxstay collects, structures, validates, and updates the data on every page. The summary below is intentionally short — auditable detail lives in the codebase and the sources registry.
A human-curated list of destinations defines the Year-1 catalog (60 Vietnam destinations across three priority tiers). Curation considers search demand, traveler intent, and geographic coverage — not editorial favouritism.
For every destination we fetch records from GeoNames (geography, population, timezone), Wikidata (cross-locale identifiers), Wikipedia (narrative summaries), and OpenStreetMap (POIs). Each raw payload is tied to a row in the entity_sources audit table, so every fact can be traced back to its source.
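As a rough illustration of how a fact could be traced to its audit row, here is a minimal TypeScript sketch. Only the table name entity_sources comes from this page; the column names, the Fact shape, and the traceFact helper are hypothetical, and an array index stands in for a real foreign key.

```typescript
// Hypothetical shape of one entity_sources row; column names are assumptions.
type SourceName = "geonames" | "wikidata" | "wikipedia" | "openstreetmap";

interface EntitySourceRow {
  entityId: string;    // destination the payload belongs to
  source: SourceName;  // upstream provider
  fetchedAt: string;   // ISO-8601 timestamp of the fetch
  rawPayload: unknown; // payload stored as received
}

// Hypothetical fact record that points back at the row it was extracted from.
interface Fact {
  entityId: string;
  field: string;
  value: string | number;
  sourceRowIndex: number; // stands in for a foreign key into entity_sources
}

// Return the audit row behind a fact, or throw if the link is broken.
function traceFact(fact: Fact, rows: EntitySourceRow[]): EntitySourceRow {
  const row = rows[fact.sourceRowIndex];
  if (row === undefined || row.entityId !== fact.entityId) {
    throw new Error(`No audit row for ${fact.entityId}.${fact.field}`);
  }
  return row;
}
```

Throwing on a broken link, rather than returning a default, keeps an untraceable fact from reaching a page silently.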
Source data is fed to Anthropic's Claude with a tightly scoped extraction prompt. The model is instructed to extract factual values, not invent them. Where source data is missing, the field is omitted rather than hallucinated. Each call returns a JSON document validated against a Zod schema before it touches the database.
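The validate-before-write step might look roughly like the sketch below. The page names Zod for this; to stay dependency-free, the sketch swaps in a hand-written check. The DestinationFacts fields and the parseExtraction helper are assumptions for illustration, not the real schema.

```typescript
// Hypothetical extraction result; the real schema is not shown on this page.
interface DestinationFacts {
  name: string;
  population?: number; // omitted when the sources carry no value
  timezone?: string;   // omitted when the sources carry no value
}

type ParseResult =
  | { ok: true; data: DestinationFacts }
  | { ok: false; error: string };

// Parse the model's JSON output and reject anything that is not well-formed.
function parseExtraction(json: string): ParseResult {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { ok: false, error: "expected a JSON object" };
  }
  const obj = raw as Record<string, unknown>;

  const name = obj["name"];
  if (typeof name !== "string" || name.length === 0) {
    return { ok: false, error: "name must be a non-empty string" };
  }
  const data: DestinationFacts = { name };

  // Optional fields: absent is fine, but a present value must have the right type.
  if ("population" in obj) {
    const population = obj["population"];
    if (typeof population !== "number" || !Number.isInteger(population) || population < 0) {
      return { ok: false, error: "population must be a non-negative integer" };
    }
    data.population = population;
  }
  if ("timezone" in obj) {
    const timezone = obj["timezone"];
    if (typeof timezone !== "string") {
      return { ok: false, error: "timezone must be a string" };
    }
    data.timezone = timezone;
  }
  return { ok: true, data };
}
```

In this sketch a missing field passes, while a placeholder such as null or "unknown" fails, which mirrors the omit-rather-than-invent rule.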
Every Claude call writes a row to the content_generations table recording the model, prompt version, token counts, latency, and cost in USD. This makes content economics transparent and lets us regenerate specific pages without re-running the whole catalog.
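To make the per-call accounting concrete, here is a small TypeScript sketch. The table name content_generations and the recorded quantities come from the description above; the field names, the Pricing shape, and both helpers are assumptions, and prices are passed in as parameters rather than hard-coded.

```typescript
// Hypothetical shape of one content_generations row; field names are assumptions.
interface GenerationRow {
  pageId: string;
  model: string;
  promptVersion: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  costUsd: number;
}

// Per-million-token prices, supplied by the caller.
interface Pricing {
  inputUsdPerMTok: number;
  outputUsdPerMTok: number;
}

// Cost of one call, derived from its token counts.
function costUsd(inputTokens: number, outputTokens: number, pricing: Pricing): number {
  return (
    (inputTokens * pricing.inputUsdPerMTok + outputTokens * pricing.outputUsdPerMTok) /
    1_000_000
  );
}

// Sum recorded cost per page, e.g. to see what regenerating one page has cost so far.
function totalCostByPage(rows: GenerationRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row.pageId, (totals.get(row.pageId) ?? 0) + row.costUsd);
  }
  return totals;
}
```

Storing the computed cost on each row, instead of recomputing it later, keeps historical totals stable even if prices change.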
Head-to-head comparison pages are generated from already-extracted destination facts: Claude only writes the comparison narrative, never the underlying numbers. Data points displayed in the comparison table are derived from structured facts, not free-form text.
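One way to picture this separation is the sketch below: table rows are built directly from structured facts, and generated text never feeds into them. The Facts map, ComparisonRow, and buildComparisonTable are hypothetical names, not the codebase's own.

```typescript
type FactValue = string | number;
type Facts = Map<string, FactValue>; // field name -> extracted value

interface ComparisonRow {
  field: string;
  left: FactValue | null;  // null when the fact was omitted at extraction
  right: FactValue | null;
}

// Build table rows from structured facts only; no narrative text is consulted.
function buildComparisonTable(left: Facts, right: Facts, fields: string[]): ComparisonRow[] {
  return fields.map((field) => ({
    field,
    left: left.get(field) ?? null,
    right: right.get(field) ?? null,
  }));
}
```

A missing fact surfaces as null in the row, so the table can show a gap instead of a guessed value.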
Source data is re-pulled on a scheduled cadence (Year 1: monthly). Pages are regenerated when source data changes meaningfully or when the prompt version is bumped. Every regeneration increments the page's generation_version so historical content is auditable.
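The regeneration rule can be sketched as a small pure function. Only generation_version and the two triggers come from the description above; the PageState fields and the nextPageState helper are hypothetical, and "changes meaningfully" is approximated here by comparing a hash of the normalized source facts, which is an assumption rather than the documented test.

```typescript
// Hypothetical per-page state; only generation_version is named on this page.
interface PageState {
  sourceHash: string;        // hash of the normalized source facts used last time
  promptVersion: string;     // prompt version used last time
  generationVersion: number; // incremented on every regeneration
}

// Return the state after a refresh cycle: unchanged if nothing triggers
// regeneration, otherwise updated with generationVersion bumped by one.
function nextPageState(
  page: PageState,
  latestSourceHash: string,
  currentPromptVersion: string,
): PageState {
  const sourceChanged = page.sourceHash !== latestSourceHash;
  const promptBumped = page.promptVersion !== currentPromptVersion;
  if (!sourceChanged && !promptBumped) {
    return page;
  }
  return {
    sourceHash: latestSourceHash,
    promptVersion: currentPromptVersion,
    generationVersion: page.generationVersion + 1,
  };
}
```

Because the version only ever increases, earlier generations stay distinguishable from later ones when auditing a page's history.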
Editorial content (rankings, comparisons, recommendations) is produced before any affiliate or partner data is layered on. We never re-rank destinations or hide negatives based on commission. Affiliate links are clearly disclosed and tracked separately from page content.
Constraints we don't deviate from. Violations are bugs.
Spotted an inaccuracy? Email [email protected] with the page URL and we'll trace it back to the source row in the audit table.