Reference § 00 · Methodology · Public document
Information only

How the Market Compendium is made

Methodology

Revised on the engine's own cadence · Provenance versioned per event-link row

Storm is a market-data and research service. It reads published prices from several prediction-market venues, maps the same real-world event across those venues into a single canonical record, and reports the published price difference normalised for each venue's posted fees — the net_edge_bps field — on every page that shows a price. This document explains how each of those steps works, and where the method has real limits.

The site does not place orders, custody funds, or recommend buying or selling any contract. It records what the venues themselves publish, normalises it, and presents the comparison.

§ 1. How we aggregate

Storm pulls public market data on a fixed cadence from each venue's own API. No account linking, no order routing, no private endpoints. The current coverage set, drawn live from the venue directory:

  • Betfair Exchange — UK Gambling Commission regulated (GB entity) + NTRWC-licensed (AU entity); geo-blocks US
  • Crypto.com — CFTC-regulated — Crypto.com Derivatives North America (Nadex)
  • ForecastEx — CFTC-registered DCM + DCO; IBKR subsidiary (Aug 2024 launch). Trading via IBKR ForecastTrader or Robinhood only.
  • Futuur — Offshore (Curaçao-based); accepts international users including US/non-US
  • Hypermind — Forecasting-tournament platform (tournament prize structure, not a regulated exchange)
  • Iowa Electronic Markets — University of Iowa academic exchange (CFTC no-action letter; ~$500 position cap) (catalog only — not actively polled)
  • Kalshi — CFTC-regulated DCM
  • Limitless Exchange — Decentralized CLOB prediction-market protocol on Base; no jurisdictional regulator
  • Manifold Markets — Play-money platform — mana is not legal tender; not a prediction-market venue in the regulatory sense. Operated by Manifold for Charity (US 501c3-adjacent).
  • Matchbook — Alderney Gambling Control Commission licensed peer-to-peer betting exchange (UKGC license also held by parent)
  • Metaculus — Academic / play-money forecasting platform — no money changes hands; no regulatory regime applies.
  • Polymarket — CFTC-registered DCM via QCX acquisition (Nov 2025); US retail access restored via intermediated brokerages
  • PredictIt — Academic exchange (Victoria University of Wellington, NZ) under CFTC No-Action Letter 14-130 (2014); status contested in litigation since 2022

Per-venue ingest cadence ranges from ~90s on the deepest order-book venues to ~30min on scrape-heavy ones; each cadence is tuned to the venue's API constraints and rate limits. See individual venue pages for per-venue specifics.

Each poll upserts the venue's raw market record and updates both the outcome-level current price and a separate columnar price-snapshot table that backs the per-event sparklines. Newsfeed ingestion runs separately on its own cadences (30 min – 24 h, source-dependent; see § 5).

Scope note. Storm covers binary and categorical prediction-market contracts on the venues above. Sportsbook lines (DraftKings, FanDuel) are out of scope — no public API — and any single-venue market without a cross-venue analogue is noted on the event page but produces no spread.

§ 2. Matching markets to canonical events

The matching layer is the hardest part of the product to replicate. Raw prices are public, but deciding that Polymarket's “Will Trump win 2024?” and Kalshi's PRES-24-DJT are the same canonical event, with aligned outcome labels, requires per-pair judgement. Storm's pipeline runs four passes:

  1. Rule-based first pass. Normalized question text is scored against the canonical event ontology via title-token Jaccard, outcome-alias overlap, category heuristics, and a set of refuse discriminators that recognize known false-positive shapes before the score is even returned — cross-state (Texas Senate vs NY Governor), deadline-conditional ("Warsh by May 1" vs unconditional), sports cross-team, scope (state-level vs federal), geopolitics-topic (deal vs ceasefire), and inverse-polarity (where Yes-of-A maps to No-of-canonical).
  2. Auto-approve high-confidence. Rule-based proposals scoring at or above a configurable threshold (default 0.90) are auto-linked with matched_by = 'rule'.
  3. Tiered LLM review. Mid-confidence link proposals, event-discovery proposals (cluster→new canonical event), and outcome-expansion proposals (add a candidate to an existing categorical event) each flow through their own LLM-agent reviewer on independent cadences. Each tier has approve/reject authority; outcome-expansion adds a third "keep_pending — re-evaluate next tick" verdict for cases where the cluster is plausibly correct but evidence is thin. Accepted links are written with matched_by = 'agent' and a review timestamp.
  4. Re-review. A weekly sweep re-runs the current prompts against previously-approved links and demotes any the agent now rejects — drift-correction for prompt changes that have tightened the approval bar since a link was first written.
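As a rough illustration of the first two passes, the sketch below scores two titles with token-set Jaccard and routes the proposal by threshold. Only the 0.90 default and the matched_by = 'rule' value come from the text above; the tokeniser, the function names, and the 0.50 floor for LLM review are assumptions made for the example. The outcome-alias overlap, category heuristics, and refuse discriminators are omitted.

```python
import re

AUTO_APPROVE_THRESHOLD = 0.90  # default stated in the text
REVIEW_FLOOR = 0.50            # hypothetical: the text gives no lower bound


def tokens(title: str) -> set:
    """Lowercase a title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: size of intersection over size of union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def route(score: float) -> str:
    """Decide what happens to a rule-based link proposal (labels are hypothetical)."""
    if score >= AUTO_APPROVE_THRESHOLD:
        return "auto_link(matched_by='rule')"
    if score >= REVIEW_FLOOR:
        return "llm_review"
    return "drop"


score = jaccard(
    "Will Trump win the 2024 election?",
    "Will Trump win the 2024 presidential election?",
)
print(round(score, 3), route(score))
```

Here the two titles share six of seven distinct tokens, so the pair falls short of the auto-approve bar and would be handed to the LLM tier under these assumed settings.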

No human is in the loop at any tier. A backfill task fires when a new canonical event is created, scanning the dark surface of unmatched markets and proposing outcome-expansion for every candidate name that matches the new event's title — the LLM reviewer drains those proposals and the event self-populates from a thin seed.

Every event_link row carries its provenance — rule vs LLM vs agent — confidence, and an outcome_mapping JSON that describes which venue outcome labels correspond to which canonical outcomes. When venues subdivide differently (one lists a Super Bowl winner market, a sportsbook lists individual team futures), the canonical event represents the coarser shared concept and the finer-grained markets are tagged accordingly. See the per-event page for the full mapping record on any given link.

§ 3. What a spread is

A cross-venue price difference is the gap between the implied probabilities a venue publishes for the same outcome. When venue A publishes 0.52 and venue B publishes 0.46 for the same outcome, the gross gap is (0.52 − 0.46) × 10,000 = 600 basis points.

The gross mid-vs-mid number does not account for venue fees. The net figure Storm reports normalises by the published ask on the cheap-side venue and the published 1 − bid complement on the other (which equals the opposite-side ask there). Storm reports this normalised cost-to-enter when both venues publish orderbook depth.

For venues that don't expose ask depth on their public read path (Polymarket's Gamma API, Manifold, ForecastEx, Futuur, PredictIt, Metaculus), the calculation falls back to mid-based math and the resulting row is annotated (mid-based) on the event page so the reader knows the printed edge is approximate. Metaculus is additionally marked tradeable=false so its prices are excluded from spread computations entirely — it's a reference signal, not a venue you can trade against. Fees are the venue's published taker rate (typical_taker_fee_bps on the venue record). Gas is computed from typical_gas_cost_usd against a reference notional of $1,000 — illustrative, not a recommendation; effective bps impact scales inversely with position size. Spreads where either leg's price falls below 0.005 are dropped from display: at those levels the bid-ask gap typically erases the printed edge regardless of fees.
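The arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not Storm's published formula: it assumes a payout of 1 per contract, that taker fees on both legs subtract linearly in basis points, and that gas is converted to bps against the $1,000 reference notional. The function and parameter names are hypothetical; typical_taker_fee_bps and typical_gas_cost_usd are the venue-record fields named in the text.

```python
from typing import Optional

REFERENCE_NOTIONAL_USD = 1_000.0  # reference size stated in the text
MIN_LEG_PRICE = 0.005             # legs priced below this are dropped from display


def net_edge_bps(
    ask_cheap: float,       # published ask for the outcome on the cheaper venue
    bid_other: float,       # published bid for the same outcome on the other venue
    fee_bps_cheap: float,   # typical_taker_fee_bps on the cheaper venue
    fee_bps_other: float,   # typical_taker_fee_bps on the other venue
    gas_usd: float = 0.0,   # combined typical_gas_cost_usd across both legs
) -> Optional[float]:
    """Return an illustrative fee-normalised edge in bps, or None if dropped."""
    if ask_cheap < MIN_LEG_PRICE or bid_other < MIN_LEG_PRICE:
        return None
    # Cost to hold both sides: the outcome on the cheap venue plus its
    # complement (1 - bid) on the other venue. A payout of 1 is assumed.
    cost_to_enter = ask_cheap + (1.0 - bid_other)
    gross_bps = (1.0 - cost_to_enter) * 10_000
    # Gas scales inversely with position size; here it is fixed at the reference notional.
    gas_bps = gas_usd / REFERENCE_NOTIONAL_USD * 10_000
    return gross_bps - fee_bps_cheap - fee_bps_other - gas_bps
```

With an ask of 0.47, a bid of 0.51, a 100 bps fee on one leg, and $0.50 of gas, this sketch gives about 295 bps.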

The edge-badge colour on every page follows fixed thresholds:

  • ≥ 100 bps: Meaningful after fees; worth a look on a liquid event.
  • 30 – 100 bps: Thin. Watch for it to widen or tighten.
  • < 30 bps: Noise at reference size; fees likely dominate.
  • Negative: Fees and gas exceed the gross spread. Not an opportunity.
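The badge thresholds above map onto a small classifier. The sketch below is a minimal illustration; the function name and the returned labels are hypothetical, and it assumes a value of exactly 100 bps falls in the top band.

```python
def edge_badge(net_edge_bps: float) -> str:
    """Map a net edge in basis points to one of the four badge bands."""
    if net_edge_bps < 0:
        return "negative"    # fees and gas exceed the gross spread
    if net_edge_bps < 30:
        return "noise"       # fees likely dominate at reference size
    if net_edge_bps < 100:
        return "thin"        # watch for it to widen or tighten
    return "meaningful"      # 100 bps or more after fees
```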

§ 4. Signal types beyond cross-venue spreads

A cross-venue price difference (§ 3) is one form of mispricing, but not the only one Storm reports. Three other signal types appear on the home-page ticker and per-event pages, each diagnosing a different kind of disagreement.

Insider proxy — same orderbook, two currencies

Futuur is unusual: it prices each outcome on the same orderbook in both a real-money currency (USDC) and a play-money currency (OOM, "Ooms"). When the two prices diverge by a meaningful margin, the hypothesis is that informed traders are choosing not to express on the USDC leg. Real money carries legal exposure on US-regulated speech; anyone holding non-public information rationally avoids leaving a subpoena trail. Pseudonymous play-money trades don't.

Storm records the divergence in the insider_signal table and surfaces the widest current gaps on the home-page ticker. A freshness filter skips outcomes whose OOM leg hasn't actually moved within a 48-hour window (configurable via STORM_INSIDER_PROXY_OOM_FRESHNESS_HOURS) — a price pinned at 0.01 for weeks is a dead book, not active disagreement. The signal is intra-venue and intra-market: same orderbook, same outcome, two prices, one population choosing which leg to express on.
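A minimal sketch of the freshness filter, assuming each outcome carries a timestamp for the last time its OOM price changed. The environment variable name and the 48-hour default come from the text; the function names and the last_oom_move parameter are hypothetical.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Optional

ENV_VAR = "STORM_INSIDER_PROXY_OOM_FRESHNESS_HOURS"  # name taken from the text


def freshness_window() -> timedelta:
    """Read the freshness window from the environment, defaulting to 48 hours."""
    return timedelta(hours=float(os.environ.get(ENV_VAR, "48")))


def insider_gap(
    usdc_price: float,
    oom_price: float,
    last_oom_move: datetime,  # hypothetical: when the OOM price last changed
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Return the absolute USDC-vs-OOM gap, or None if the OOM leg is stale."""
    now = now or datetime.now(timezone.utc)
    if now - last_oom_move > freshness_window():
        return None  # dead book, not active disagreement
    return abs(usdc_price - oom_price)
```

Returning None rather than a zero gap keeps stale outcomes out of any ranking of the widest divergences.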

Real money vs play money / forecaster — cross-venue, cross-population

This signal compares the same canonical event priced on a real-money venue and on a venue where bad predictions don't cost real money, whether the stake is play money or forecaster reputation (see venues for which sites fall in which category). When the two populations disagree by a meaningful margin, the gap is information about whose model is more right.

The diagnosis differs from the insider proxy. Insider proxy is about a single population — Futuur traders — choosing which currency leg to express on. Real-money vs play-money is about different populations entirely: real money has skin in the game and legal exposure; play money reflects crowd sentiment without those constraints; forecasters compete on Brier-score calibration over years. The gap is a research signal, never an arbitrage opportunity — the currencies don't settle across the boundary.

Outcomes are aligned across venues using each event_link's outcome_mapping JSON, so a Polymarket "Donald Trump" outcome and a Manifold "Trump" outcome both canonicalise to the same key and compare correctly even when the surface labels differ.
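To make the alignment concrete, the sketch below shows one possible shape for an event_link row and a lookup through its outcome_mapping. The fields match_confidence, matched_by, and outcome_mapping are named in this document; the other field names, the JSON layout (venue label to canonical key), and the example IDs are assumptions.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventLink:
    venue: str                # hypothetical field name
    canonical_event_id: str   # hypothetical field name
    matched_by: str           # 'rule', 'llm', or 'agent'
    match_confidence: float   # score in [0, 1]
    outcome_mapping: str      # JSON: venue outcome label -> canonical outcome key


def canonical_outcome(link: EventLink, venue_label: str) -> Optional[str]:
    """Translate a venue's outcome label into the canonical outcome key."""
    mapping = json.loads(link.outcome_mapping)
    return mapping.get(venue_label)


poly = EventLink("polymarket", "evt-123", "agent", 0.93,
                 json.dumps({"Donald Trump": "trump"}))
mani = EventLink("manifold", "evt-123", "rule", 0.95,
                 json.dumps({"Trump": "trump"}))

# Different surface labels resolve to the same canonical key.
same = canonical_outcome(poly, "Donald Trump") == canonical_outcome(mani, "Trump")
print(same)
```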

News-driven signals: mispricing and stale book

When credible news matches a tracked event (see § 5), two things can follow. News mispricing fires when news has resolved the event but a linked venue's book is still pricing the resolved outcome ≥ 10 percentage points away from 0¢ or 100¢ — the venue hasn't finished updating. Stale book fires when news has landed within the last 48 hours but a linked outcome shows less than half a percentage point of movement against its pre-news baseline — the book is slower than the headline. Both are flagged for research; neither is a trade signal.
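The two news-driven checks can be sketched as simple predicates. The 10-point, 48-hour, and half-point figures come from the paragraph above; the function names, the parameter names, and the reading of "away from 0¢ or 100¢" as distance from the settled value of the resolved outcome are assumptions.

```python
def is_news_mispriced(price: float, resolved_yes: bool,
                      tolerance_pp: float = 10.0) -> bool:
    """True when a resolved outcome is still priced at least tolerance_pp
    percentage points away from its settled value (1.0 or 0.0)."""
    settled = 1.0 if resolved_yes else 0.0
    return abs(price - settled) * 100 >= tolerance_pp


def is_stale_book(baseline_price: float, current_price: float,
                  hours_since_news: float,
                  window_hours: float = 48.0,
                  min_move_pp: float = 0.5) -> bool:
    """True when news landed inside the window but the price has moved
    less than min_move_pp percentage points from its pre-news baseline."""
    if hours_since_news > window_hours:
        return False
    return abs(current_price - baseline_price) * 100 < min_move_pp
```

Both predicates only flag an outcome for research; neither says anything about which direction the book will move.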

All four signal types — cross-venue spread, insider proxy, real-money vs other-population, and news-driven — appear on the persistent ticker at the bottom of every page in a single combined feed, with each tile colour-tagged by kind so signal types are distinguishable at a glance:

  • Spread — gold; cross-venue arbitrage edge after fees.
  • Insider — bright purple; Futuur USDC vs OOM divergence on the same orderbook.
  • Real/Play — cyan; real-money venue vs play-money or forecaster venue on the same canonical event.
  • Mispriced — orange; news has resolved the event but a venue's book hasn't updated.
  • Stale — coral; news landed but the linked book hasn't moved.
  • ★ Yours — moss green (paid tiers only); your own recent alerts inserted into the feed.

§ 5. News ingestion and event-linking

Storm pulls headlines from a curated set of public RSS / Atom / government-data feeds and ties each item to canonical events via a confidence-scored matcher. Twenty active sources at the time of writing:

  • Government & public-data feeds — FRED (Fed economic series), BEA (national accounts), Census, NOAA NCEI, USGS (significant earthquakes), NIST, FEC (campaign filings), CFTC, Treasury, the White House feed, State Dept, DOD news. Public domain.
  • Mainstream news RSS — BBC News (World & Politics), BBC Sport, NPR Top Stories, ESPN top headlines, Variety, Hollywood Reporter, CoinDesk, The Block, TechCrunch, The Verge. Headlines + click-through to source; license posture is fair-use aggregation.
  • Crowd-curated — Wikinews (CC BY 2.5, new-article creations only), Wikipedia (per-event revisions, see § 6), arXiv (recent abstracts), Metaculus (long-form analysis).

Cadence: most news sources poll every 30 or 60 min. High-volume data feeds (FRED, BEA, Census, NOAA, arXiv, NIST) and slow-moving curated content (Metaculus notebooks) poll every 6 h. FEC certified results poll every 24 h because filings lag by weeks anyway.

Every news item is upserted into a separate news_event table keyed on (source_id, external_id). A second pass proposes news_event_link rows tying each item to candidate canonical events with a confidence score; high-confidence links surface on the per-event page's Recent news section and feed the news-driven alert path for subscribers watching the event. Low-confidence links are retained for audit but not surfaced.
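The keyed upsert can be illustrated with an in-memory SQLite table. SQLite is only a stand-in here, since this document does not say which database Storm uses; the title and fetched_at columns and the example source ID are hypothetical, while the table name and the (source_id, external_id) key come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE news_event (
        source_id   TEXT NOT NULL,
        external_id TEXT NOT NULL,
        title       TEXT NOT NULL,   -- hypothetical column
        fetched_at  TEXT NOT NULL,   -- hypothetical column
        PRIMARY KEY (source_id, external_id)
    )
    """
)


def upsert_news_event(conn, source_id, external_id, title, fetched_at):
    """Insert a news item, or refresh it if the same key was seen before."""
    conn.execute(
        """
        INSERT INTO news_event (source_id, external_id, title, fetched_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (source_id, external_id) DO UPDATE SET
            title = excluded.title,
            fetched_at = excluded.fetched_at
        """,
        (source_id, external_id, title, fetched_at),
    )


# Polling the same item twice leaves a single, refreshed row.
upsert_news_event(conn, "bbc-world", "item-1", "Headline", "2025-01-01T00:00:00Z")
upsert_news_event(conn, "bbc-world", "item-1", "Headline (updated)", "2025-01-01T00:30:00Z")
```

Keying on the pair rather than on external_id alone lets two sources reuse the same item identifier without colliding.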

§ 6. Wikipedia article resolution

Storm's event titles are LLM-generated propositions (“Will X happen by Y?”, “[Year] [Topic] [Outcome]”) and rarely match Wikipedia article titles by direct equality. Resolving the right article requires a separate two-tier pass:

  1. Heuristic resolver. Strip “Will…by [Date]?” wrappers, drop year prefixes and trailing parenthetical qualifiers, run MediaWiki opensearch on a few cleaned variants, and accept the first result that shares a proper-noun token with the event title.
  2. LLM fallback. On heuristic miss, ask Haiku for the single best Wikipedia article title; verify the proposed title actually resolves to a Wikipedia page before storing.

The resolved title is cached on event.wikipedia_article_title; the resolver runs daily and re-queries entries older than 30 days. Once a title is cached, the Wikipedia ingester pulls recent revisions of that page every 30 min — fresh edits near the event's canonical_resolution_date are strong evidence signals for the matcher and surface as “milestone”-claim news_events. Events flagged as having no Wikipedia coverage (the resolver tried and failed) are skipped to avoid burning the budget.
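The title-cleaning part of the heuristic resolver might look like the sketch below. It covers only the string cleanup, not the MediaWiki opensearch call or the proper-noun check; the regular expressions and the function name are assumptions about one reasonable way to implement the steps listed above.

```python
import re


def clean_title_variants(title: str) -> list:
    """Return a short, de-duplicated list of cleaned search variants."""
    t = title.strip()
    # Drop a trailing parenthetical qualifier, e.g. "(national)".
    t = re.sub(r"\s*\([^)]*\)\s*$", "", t)
    # Strip a "Will ... by <date>?" wrapper, keeping the inner proposition.
    m = re.match(r"^will\s+(.*?)(?:\s+by\s+[^?]*)?\??$", t, flags=re.IGNORECASE)
    if m:
        t = m.group(1)
    # Also offer a variant without a leading four-digit year.
    no_year = re.sub(r"^(?:19|20)\d{2}\s+", "", t)
    variants = []
    for v in (t, no_year):
        if v and v not in variants:
            variants.append(v)
    return variants


print(clean_title_variants("Will the Fed cut rates by June 2025?"))
print(clean_title_variants("2024 United States presidential election (national)"))
```

Returning both the with-year and without-year forms leaves the choice to the search step, since the year is sometimes part of the article title itself.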

§ 7. Venue calibration

For every resolved binary event in the last 365 days, Storm compares each linked venue's published price at fixed horizons before resolution (1 day, 7 days, 30 days) to the actual outcome. Results are aggregated per (venue, category, horizon).

The aggregation is recomputed weekly (full DELETE + INSERT against venue_calibration_summary). Results are published at /calibration; the per-venue page renders a category-by-horizon breakdown. Categorical (multi-candidate) events are not yet scored.
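This document does not state which calibration statistic is stored in venue_calibration_summary. As one plausible example, the sketch below computes a mean Brier score (squared error between the published price and the 0/1 outcome) per (venue, category, horizon). The choice of Brier score, the row layout, and the function name are all assumptions.

```python
from collections import defaultdict


def brier_by_group(rows):
    """rows: iterable of (venue, category, horizon_days, price, outcome),
    where outcome is 1 if the event resolved Yes and 0 otherwise.
    Returns {(venue, category, horizon_days): mean squared error}."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [sum of squared errors, count]
    for venue, category, horizon, price, outcome in rows:
        acc = sums[(venue, category, horizon)]
        acc[0] += (price - outcome) ** 2
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


sample = [
    ("kalshi", "politics", 7, 0.80, 1),
    ("kalshi", "politics", 7, 0.60, 0),
    ("kalshi", "politics", 30, 0.50, 1),
]
scores = brier_by_group(sample)
```

Lower is better under this statistic: a venue that always published 0.50 would score 0.25 on every group.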

§ 8. Confidence and review

Every event-link record carries a match_confidence score in [0, 1], a matched_by channel (rule / llm / agent), and a timestamp. When match_confidence is below 0.90, the mapping is presented as proposed rather than confirmed, and the corresponding spread is not alerted on until Storm clears it. Review is done by Storm using a second LLM pass with cross-checking; no human is in the loop.

When a link is superseded — because the underlying market was re-listed under a new ID, or an ontology merge reshaped the canonical event — the old row is marked status = 'superseded' rather than deleted, so the historical spread series remains attributable.

§ 9. Resolution-basis risk

A published cross-venue price difference does not imply that the two contracts will pay out the same way. Nominally-same markets can and do resolve differently across venues because each venue writes its own resolution source into its own contract. The clean archetype is the 2020 Georgia call:

One venue called Georgia for Biden three days earlier than another, on the strength of a different source's projection.

Storm records the resolution_basis on every event-link row so the divergence is never silent — but the risk is live and there is no universal fix. The per-venue resolution text is published on every event page; when two venues disagree on resolution basis, the event page flags it explicitly.

§ 10. Jurisdictional notes

  • Polymarket. CFTC-registered via the QCX acquisition, but retail US access is brokered and jurisdictionally gated. Geoblocked or broker-blocked users cannot transact on the venue even when the price comparison is visible here. Storm does not police this; the subscriber is responsible for venue eligibility at every venue.
  • Kalshi. US-native, CFTC designated contract market. Open to US residents without a broker intermediary.
  • Betfair Exchange. UK / international. Geoblocks US residents. Price data is shown here for reference comparisons; trading eligibility is the user's responsibility.
  • Futuur. Offshore (Curaçao). Operates outside US regulatory oversight. Listed here because the venue's USDC orderbook and OOM play-money orderbook trade on the same questions; the divergence between them is recorded as a distinct insider-proxy signal — a pseudonymous play-money price diverging from the same venue's real-money price can express information that wouldn't legally surface on the CFTC-regulated venues. Not advice that any US resident is eligible to trade Futuur directly.
  • Manifold Markets. Play-money. Displayed as a wisdom-of-crowds reference and explicitly excluded from net_edge_bps calculations. Any published price gap against a real-money venue is a sentiment divergence, not a fee-normalised price comparison.
  • PredictIt. Operated by Victoria University of Wellington (New Zealand) under CFTC No-Action Letter 14-130. Per-market $850 cap makes this a long-tail political signal rather than a deep arb venue; PredictIt's bulk feed doesn't publish per-contract volume or liquidity, so those columns render as "—" on the event panel.
  • Metaculus. Forecasting-tournament platform; not a money venue. Storm reads the public website (browsewrap surface) via a headless Chromium session because Cloudflare gates anonymous HTTP. Marked tradeable=false so spread math excludes it; useful as a divergence signal between expert-forecaster consensus and real-money venues. The authenticated Metaculus API is license-blocked for commercial redistribution, so Storm uses only the public-website path.

§ 11. Limitations

Eyewall Markets is information only. The service does not:

  • Hold user funds, route orders, or sign trades on any venue.
  • Log in to any venue on behalf of a user.
  • Recommend a bet size. net_edge_bps and liquidity are facts; position sizing is the user's decision.
  • Guarantee that any spread will still exist at the moment a user opens a venue tab. Spreads frequently close in seconds.
  • Substitute for a broker, financial advisor, tax advisor, or legal counsel.

The cross-venue mapping is curated autonomously by Storm, and it has errors. Every page displays the match_confidence for the underlying link; corrections sent to [email protected] reach Storm's support inbox and are read by the agent, not a person. Resolution-basis divergence (§ 9) is a live risk, not an edge case. Prices and liquidity figures are snapshots taken at the ingest cadence for each venue (§ 1), typically 90s to 3 min old by the time they reach a reader, and never guaranteed to reflect fillable depth at the listed price. Metaculus prices may be up to 30 minutes stale by design.

Every alert email repeats the core disclaimer: informational only, spreads may close before you can act, venue eligibility is the user's responsibility, Eyewall Markets holds no position in any market described.