By Matthieu Credou — Afnic 2011-2017, on the new gTLD program team. Lived through the 2012 round.
Published 2026-04-24
Methodology
The tld.report 2026 forecast combines four data primitives: historical ICANN applications (1,930 from the 2012 round), 50,721 UDRP cases drawn from four providers, the 2026 RSP applicant disclosures (31 organizations cleared by ICANN as of April 2026), and World Bank macro indicators. These feed a hierarchical Bayesian negative-binomial model that predicts per-country 2026 application counts, which is then joined to a candidate-level layer that names probable applicants.
This document is fully open. Every number quoted in our paid country, sector, and master reports rests on the methods described here. If you are evaluating whether to buy a report, you can — and should — read this page first. Buyers receiving a country report can cross-check our predictions against the model spec and the retrodiction fit to 2012 actuals.
1. Data sources
ICANN 2012 application baseline
We start with the complete public archive of the 2012 ICANN new gTLD application round, fetched from the ICANN application status site. After fixing a pagination bug in the source archive (a 302 redirect that broke naive scrapers), we have a clean dataset of 1,930 applications from 1,155 unique applicants. The dataset includes 116 IDN strings (Chinese, Arabic, Cyrillic, Hindi). The top applicant countries by 2012 volume:
| Rank | Country | 2012 apps |
|---|---|---:|
| 1 | United States | 940 |
| 2 | Cayman Islands | 91 |
| 3 | Luxembourg | 86 |
| 4 | Japan | 71 |
| 5 | Germany | 70 |
| 6 | Gibraltar | 62 |
| 7 | Switzerland | 51 |
| 8 | France | 49 |
| 9 | Australia | 41 |
| 10 | China | 41 |
The Cayman Islands, Luxembourg, and Gibraltar concentrations reflect the structural use of offshore IP-holding vehicles by major portfolio operators in 2012 (Donuts, Famous Four, Uniregistry). We treat these as a separate "shell jurisdiction" class in the model — see Section 2.
For each application we additionally hold:
- Outcome class (delegated-in-root, withdrawn, RA-terminated, lost-contention, rejected, etc.): derived from ICANN status pages and DNS validation (`dig <tld> NS @1.1.1.1`).
- Operator inferred: the back-end DNS provider (Identity Digital, XYZ.COM, ShortDot, GMO, etc.), reverse-mapped from NS records. Identity Digital alone operates 740 of the 1,104 currently in-root strings.
UDRP enforcement corpus
UDRP cases (Uniform Domain-Name Dispute-Resolution Policy proceedings) are our primary proxy for ongoing brand IP enforcement intensity. A brand that files many UDRP complaints is a brand that takes domain protection seriously — and is therefore a credible 2026 gTLD applicant.
We aggregate from four providers totaling 50,721 cases through April 2026:
| Provider | Cases | Notes |
|---|---:|---|
| WIPO | 17,824 | Geneve-based (Geneva); the largest single provider |
| NAF (National Arbitration Forum) | 24,199 | US-based |
| CAC (Czech Arbitration Court) | 5,887 | Prague; EU complainants |
| ADNDRC (Asian Domain Name Dispute Resolution Centre) | 2,811 | Hong Kong + Beijing + Seoul + Kuala Lumpur consolidated |
The ADNDRC ingestion was added in April 2026 specifically to fix a Western-bias problem in earlier versions of our model. Without ADNDRC, Asian brand IP enforcement was systematically under-represented: the addition multiplies UDRP counts for China (×2.76), Hong Kong (×124), South Korea (×2.18), and adds dozens of cases for Singapore and Japan. This rebalanced the model's view of which Asian brands are credible 2026 applicants.
Country resolution from raw complainant text uses a four-step cascade:
- Regex normalization (90 patterns): collapses variants like `United States of America ("U.S.")` to `US`
- Manual brand overrides (270 brands): Google → US, Saudi Aramco → SA, Stellantis → NL, etc.
- Corporate-form suffix heuristic (22 rules): `GmbH` → DE, `S.p.A` → IT, `B.V.` → NL, `Pty Ltd` → AU
- Fuzzy matching (Jaro-Winkler ≥ 0.90) against the 1,155 known 2012 applicants
This cascade resolves 88.7% of the 50,721 cases to ISO-2 country. The 11.3% unresolved are predominantly individuals or obscure brands without enforceable trademarks, and are excluded from the country-level model.
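The control flow of the cascade can be sketched as below. This is a minimal illustration, not the production resolver: the rule tables are tiny stand-ins, `KNOWN_APPLICANTS` is a hypothetical sample, and the last step uses the standard library's `difflib.SequenceMatcher` as a swapped-in similarity measure rather than Jaro-Winkler, so the 0.90 threshold is not directly comparable.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Tiny illustrative rule tables; the real lists (90 / 270 / 22 entries) are not reproduced.
COUNTRY_PATTERNS = [
    (re.compile(r"united states|\bu\.s\.|\busa\b", re.I), "US"),
    (re.compile(r"\b(germany|deutschland)\b", re.I), "DE"),
]
BRAND_OVERRIDES = {"google": "US", "saudi aramco": "SA", "stellantis": "NL"}
SUFFIX_RULES = [("gmbh", "DE"), ("s.p.a", "IT"), ("b.v.", "NL"), ("pty ltd", "AU")]
# Hypothetical stand-in for the 1,155 known 2012 applicants.
KNOWN_APPLICANTS = {"example registry holdings": "LU"}


def resolve_country(raw: str, threshold: float = 0.90) -> Optional[str]:
    """Return an ISO-2 code from the first cascade step that fires, else None."""
    text = raw.strip().lower()
    # Step 1: regex normalization of explicit country mentions.
    for pattern, iso in COUNTRY_PATTERNS:
        if pattern.search(text):
            return iso
    # Step 2: manual brand overrides.
    for brand, iso in BRAND_OVERRIDES.items():
        if brand in text:
            return iso
    # Step 3: corporate-form suffix heuristic.
    for suffix, iso in SUFFIX_RULES:
        if re.search(rf"(^|\s){re.escape(suffix)}\.?($|[\s,])", text):
            return iso
    # Step 4: fuzzy match against known applicants (SequenceMatcher, not Jaro-Winkler).
    best_iso, best_score = None, 0.0
    for name, iso in KNOWN_APPLICANTS.items():
        score = SequenceMatcher(None, text, name).ratio()
        if score > best_score:
            best_iso, best_score = iso, score
    return best_iso if best_score >= threshold else None
```

Returning `None` rather than a default country keeps unresolved complainants out of the country-level counts, which matches the exclusion described above.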
We define active brands as complainants with at least 5 UDRP cases in the 2020-2026 window. This excludes one-shot enforcers and focuses on the institutional IP teams that drive 2026 application decisions. Total: 14,151 distinct active complainants across all jurisdictions.
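As a sketch of that filter, assuming each case is a record with a resolved `complainant` name and a `filed` date (both field names are hypothetical), and assuming the window runs from 1 January 2020 to the end of April 2026:

```python
from collections import Counter
from datetime import date


def active_brands(cases, start=date(2020, 1, 1), end=date(2026, 4, 30), min_cases=5):
    """Return complainants with at least `min_cases` filings inside [start, end]."""
    counts = Counter(c["complainant"] for c in cases if start <= c["filed"] <= end)
    return {name for name, n in counts.items() if n >= min_cases}


# Hypothetical sample: one repeat enforcer, one one-shot complainant, and one
# complainant whose filings mostly predate the window.
sample = (
    [{"complainant": "Brand A", "filed": date(2021, m, 1)} for m in range(1, 7)]
    + [{"complainant": "Brand B", "filed": date(2023, 5, 1)}]
    + [{"complainant": "Brand C", "filed": date(2018, m, 1)} for m in range(1, 6)]
    + [{"complainant": "Brand C", "filed": date(2022, 3, 1)}]
)
print(active_brands(sample))  # {'Brand A'}
```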
RSP 2026 applicant disclosures
ICANN's Registry Service Provider (RSP) program publishes the list of organizations who have completed the technical evaluation required to operate a gTLD in 2026. As of April 2026, 31 organizations have cleared at least one tier (Main + DNS + DNSSEC). Their geographic distribution is itself a leading indicator:
- US (4 cleared Main): Charleston Road Registry (Google), DigiCert/UltraDNS, GoDaddy Registry, Unstoppable Domains
- China (3 cleared)
- Korea, Turkey, Indonesia, Thailand, Russia (1 each)
- Plus the traditional EU/Anglo presence (Identity Digital, Verisign, CentralNic-derived entities)
Notably new versus 2012: PT AIDI Digital Global (Indonesia), Yandex (Russia, despite sanctions), Thai Name Server Co. (Thailand), Korean entity (KIDRC infrastructure). These are countries with effectively zero 2012 applications. Our model treats RSP presence as an ordinal signal with three tiers (cleared Main, cleared DNS-only, testing-only), entered as the country's maximum tier.
World Bank macro
We pull current-USD GDP (NY.GDP.MKTP.CD) and population (SP.POP.TOTL) from the World Bank API for 32 of our 34 model countries. Two require hardcoded fallbacks:
- Taiwan — the World Bank does not publish data for Taiwan due to UN non-recognition. We use Taiwan's Directorate-General of Budget data: GDP $474B (2012) → $792B (2024), population 23.3M → 23.4M.
- Gibraltar — the World Bank lists Gibraltar but returns NULL for all years (British Overseas Territory). We use Gibraltar Statistics Office data: GDP $2.0B (2012) → $3.1B (2024), population 32,400 → 33,700.
The Gibraltar fallback was a critical fix. An earlier version of the model used fillna(0) on missing GDP, which standardized Gibraltar to ~7σ below the GDP mean and — combined with a then-negative GDP coefficient — produced an absurd 531-app forecast for a 32K-population territory. After the fallback was applied, Gibraltar's forecast collapsed to 9.6 apps, and the GDP coefficient itself reverted to null (it had been a data hygiene artefact, not a real signal). This is a cautionary tale about NULL-handling that we generalize: any World Bank-sourced covariate must enumerate British Overseas Territories and disputed-status economies before use.
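One way to enforce that rule is to make missing GDP an error unless an explicit fallback exists. The sketch below is an assumption about how such a guard could look, not the pipeline's actual code; the function names are hypothetical and the fallback figures are the ones quoted above.

```python
import math

# Fallback values quoted in the text (current USD), keyed by (ISO-2, year).
FALLBACK_GDP_USD = {
    ("TW", 2012): 474e9, ("TW", 2024): 792e9,
    ("GI", 2012): 2.0e9, ("GI", 2024): 3.1e9,
}


def gdp_or_fallback(api_value, iso, year):
    """Return a usable GDP figure, or raise instead of silently imputing zero."""
    missing = api_value is None or (isinstance(api_value, float) and math.isnan(api_value))
    value = FALLBACK_GDP_USD.get((iso, year)) if missing else api_value
    if value is None or value <= 0:
        raise ValueError(f"No GDP for {iso} in {year}: add an explicit fallback")
    return value


def log_gdp(api_rows, year):
    """Map {iso: api_value} to {iso: log(GDP)} with fallbacks applied."""
    return {iso: math.log(gdp_or_fallback(v, iso, year)) for iso, v in api_rows.items()}
```

Failing loudly on an unlisted territory is the design point: a zero-filled covariate produces a plausible-looking but extreme standardized value, whereas an exception forces someone to add a sourced number.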
2. Model architecture (NB GLM v3)
We model 2012 country-level application counts with a Bayesian negative binomial generalized linear model:
y_c ∼ NegBin(μ_c, α)
log μ_c = β_0 + β · X_c
where y_c is the count of 2012 applications from country c, X_c is a 6-dimensional covariate vector, and α is the NB dispersion parameter. The negative binomial likelihood (rather than Poisson) accommodates overdispersion in the count data — a single country can submit dozens or hundreds of applications.
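To make the likelihood concrete, here is a standard-library sketch of the negative binomial log-probability and the log-link mean. It assumes α follows the mean/dispersion convention used by PyMC's `NegativeBinomial(mu, alpha)`, where the variance is μ + μ²/α; the helper names are hypothetical.

```python
import math


def nb_logpmf(y: int, mu: float, alpha: float) -> float:
    """log P(Y = y) for a negative binomial with mean mu and dispersion alpha."""
    return (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        + alpha * math.log(alpha / (alpha + mu))
        + y * math.log(mu / (alpha + mu))
    )


def linear_mu(beta0: float, beta, x) -> float:
    """Mean count under the log link: mu = exp(beta0 + beta . x)."""
    return math.exp(beta0 + sum(b * xi for b, xi in zip(beta, x)))


def nb_variance(mu: float, alpha: float) -> float:
    """Variance implied by the mean/dispersion parameterization."""
    return mu + mu * mu / alpha
```

Under this convention a smaller α means more overdispersion, and the Poisson model is recovered as α grows large.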
Sampling: PyMC 5.28.4 NUTS, 2 chains × 1,000 draws after 1,000 tuning steps. Convergence verified: r̂ = 1.0 across all 8 parameters, ESS ≥ 1,017, zero divergences. Results reproducible with random_seed=42.
Features (n=6, 89% HDI shown)
| Feature | Mean coefficient | 89% HDI | Interpretation |
|---|---:|---|---|
| log(GDP) | +0.38 | [-0.39, +1.07] | Null: overlaps zero. Once UDRP is in the model, GDP adds little. |
| log(UDRP active brands) | +0.82 | [+0.42, +1.19] | Dominant, HDI excludes zero. The single strongest predictor. |
| RSP 2026 max tier | -0.02 | [-0.38, +0.34] | Null on 2012 outcome (RSP is a 2026 forward signal). |
| log(population) | -0.40 | [-1.00, +0.23] | Null but lean negative: weak colinearity, not a causal claim. |
| Shell jurisdiction (binary) | +1.57 | [+0.43, +2.86] | Strongly positive, HDI excludes zero. |
| US first-mover (binary) | +0.94 | [-0.21, +1.99] | Half-absorbs the US 2012 over-performance. HDI includes zero. |
| NB dispersion α | 1.24 | n/a | Slight tightening vs v2 (was 1.13). |
The fit-vs-forecast asymmetry on binaries
Two of our binary features use a deliberate fit-vs-forecast asymmetry:
- Shell jurisdiction is set to 1 at fit (Gibraltar, Luxembourg, Cayman) to absorb their over-application in 2012 as a clean residual. At forecast time it is set to 0 for everyone, expressing the modeling assumption that the 2012 shell-entity wave does NOT recur in 2026 — post-OECD BEPS and IP-holding regulatory tightening have closed the structural arbitrage that drove their 2012 volume.
- US first-mover is set to 1 only for the US at fit (absorbing the +147 residual that no other feature explains). At forecast time = 0 for everyone, expressing the assumption that the US first-mover premium does not fully persist (the ICANN process is now battle-tested globally; Identity Digital, Donuts, and CentralNic have international competitors).
This asymmetry is a modeling choice with structural consequences. We surface its impact explicitly via two scenarios:
- Base case (us_first_mover=0 at forecast) — US 2026 forecast = 334 apps; total 1,244 apps
- High case (us_first_mover=1 at forecast) — US 2026 forecast = 908 apps; total 1,821 apps
We do not pretend to resolve which posture is "correct." Both are defensible; the choice between them is a thesis, not a parameter the data can pin down with n=34.
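Mechanically, the asymmetry amounts to evaluating the same log-linear predictor with a dummy switched on at fit and off at forecast. The sketch below plugs in the posterior-mean coefficients from the feature table; `beta0` is a hypothetical placeholder for the intercept plus all continuous terms, and a plug-in at the posterior mean is not the same calculation as the published scenarios.

```python
import math


def expected_count(beta0, betas, features, zeroed=()):
    """exp(beta0 + beta . x), with the features named in `zeroed` forced to 0."""
    x = {name: (0.0 if name in zeroed else value) for name, value in features.items()}
    return math.exp(beta0 + sum(betas[name] * x[name] for name in betas))


# Posterior-mean coefficients for the two binaries, from the feature table.
betas = {"shell_jurisdiction": 1.57, "us_first_mover": 0.94}
beta0 = 5.0  # hypothetical: stands in for the intercept plus all continuous terms

us = {"shell_jurisdiction": 0.0, "us_first_mover": 1.0}
at_fit = expected_count(beta0, betas, us)
at_forecast = expected_count(beta0, betas, us, zeroed=("shell_jurisdiction", "us_first_mover"))
ratio = at_fit / at_forecast  # equals exp(0.94), about 2.56
```

Because the link is logarithmic, switching a dummy off divides the expected count by exp(β) regardless of the other covariates, which is why the choice of posture moves the US figure by a large multiplicative factor.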
3. Retrodiction on 2012
The standard ML validation strategy — train/test split — is not available with one historical round. Instead we evaluate the model's retrodiction quality: given the covariates as of 2012, how well does the fitted model reconstruct the actual 2012 applications per country?
Top 12 residuals (observed − model-predicted):
| Country | 2012 actual | v3 predicted | Residual |
|---|---:|---:|---:|
| US | 940 | 792 | +148 |
| LU | 86 | 166 | -80 |
| NL | 19 | 81 | -62 |
| FR | 49 | 109 | -60 |
| JP | 71 | 29 | +42 |
| IT | 6 | 47 | -41 |
| GI | 62 | 34 | +28 |
| GB | 40 | 64 | -24 |
| HK | 41 | 17 | +24 |
| SE | 11 | 35 | -24 |
| AE | 37 | 14 | +23 |
| AU | 41 | 21 | +20 |
The US still has +148 residual even after the us_first_mover dummy absorbs most of the 2012 over-performance. This is the strongest evidence that the model cannot fully explain the 2012 US dominance from features alone.
Negative residuals on FR, NL, IT, LU mean the model OVER-predicts these countries in 2012 — they have strong UDRP signal but did not actually apply at that intensity. This is the "EU under-representation" pattern that recurs across the literature: continental European brands have heavy IP enforcement profiles but historically conservative gTLD strategies. Our 2026 forecast reflects this: France gets 110 apps (vs 49 in 2012, ×2.2 catch-up), Italy 48 (vs 6, ×8 catch-up).
Positive residuals on JP, HK, AE mean those countries OVER-applied in 2012 relative to their then-features. Possible explanations: 2012 launch enthusiasm, regional registry influences, defensive brand registrations not driven by UDRP intensity.
These residuals are honest. They tell readers where the model is uncertain.
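Buyers who want to cross-check the residual column can recompute it from the two count columns. The snippet below uses only the twelve rows published in the table, so the mean absolute error it reports covers those rows and not all 34 model countries.

```python
# (country, 2012 actual, v3 predicted), copied from the residual table.
rows = [
    ("US", 940, 792), ("LU", 86, 166), ("NL", 19, 81), ("FR", 49, 109),
    ("JP", 71, 29), ("IT", 6, 47), ("GI", 62, 34), ("GB", 40, 64),
    ("HK", 41, 17), ("SE", 11, 35), ("AE", 37, 14), ("AU", 41, 21),
]

# Residual = observed minus model-predicted.
residuals = {country: actual - predicted for country, actual, predicted in rows}

over_predicted = sorted(c for c, r in residuals.items() if r < 0)
under_predicted = sorted(c for c, r in residuals.items() if r > 0)
mae_top12 = sum(abs(r) for r in residuals.values()) / len(residuals)

print(over_predicted)   # countries the model over-predicts
print(under_predicted)  # countries that applied more than predicted
print(round(mae_top12, 1))
```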
4. Bucket construction
The country-level forecast (Section 2) tells us how many applications to expect per country. The candidate layer tells us who is likely to apply.
We construct the candidate set from 11 source buckets:
| Bucket | n rows (master v5) | Source |
|---|---:|---|
| A: 2012 reapplicants | 1,153 | All 1,155 entities that applied in 2012, scored for retry probability |
| B: G2000 non-2012 | 165 | Forbes Global 2000 brands that did NOT apply in 2012 |
| C: Post-2012 entrants | 2,352 | Companies founded or that grew significantly after 2012 (manually curated by sector) |
| D: Named portfolio operators | 17 | Named back-end operators with disclosed 2026 ambitions |
| E: Scraped strings | 55 | Strings explicitly disclosed in 2026 RSP application descriptions |
| F: Mid-cap aggregate | 88 | Bucket F unnamed mid-cap tail (Monte Carlo aggregate) |
| CN: China-focused | 89 | Chinese conglomerates and tech (ByteDance, BYD, Alibaba subsidiaries) |
| GCC: Gulf-focused | 137 | Saudi/UAE/Qatar/Bahrain entities (Aramco, ADNOC, Neom, PIF investees) |
| IN: India-focused | 56 | Indian unicorns + conglomerates (Reliance, Tata, BYJU'S, Razorpay) |
| PORTFOLIO | 11 | Portfolio operator stubs (Identity Digital, Donuts...) |
| PORTFOLIO_CO | 9 | Co-applicant rows (Unstoppable Domains coalition: .anime, .privacy, .brave, .hub, .xmr) |
Total: 4,132 candidate × string rows. The same canonical entity may appear in multiple rows (one per likely string) — Amazon for instance has 6 rows across 6 strings.
Each row has:
- `canonical`: the entity name
- `country_iso`: domicile (used to join the country-level forecast)
- `string`: the gTLD string this row predicts (e.g., "ai", "bank", "amazon")
- `p_variant`: Bernoulli probability that THIS entity files for THIS string
- `p_applies_overall`: overall probability the entity applies for at least one string
- `rationale`: 1-3 sentences justifying the candidate
- `bucket` and `sub_sector`: categorization for sector reports
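A typed record makes these fields and their constraints explicit. The class below is a hypothetical sketch, not the schema of the licensed database; the one non-obvious check follows from the definitions themselves, since the probability of filing for at least one string cannot be lower than the probability of filing for a particular one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateRow:
    canonical: str
    country_iso: str
    string: str
    p_variant: float
    p_applies_overall: float
    rationale: str
    bucket: str
    sub_sector: str

    def __post_init__(self):
        for name in ("p_variant", "p_applies_overall"):
            p = getattr(self, name)
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {p}")
        # P(at least one string) can never be below P(this specific string).
        if self.p_variant > self.p_applies_overall:
            raise ValueError("p_variant cannot exceed p_applies_overall")
        if len(self.country_iso) != 2:
            raise ValueError("country_iso must be an ISO-2 code")


# Hypothetical row with made-up probabilities, for illustration only.
row = CandidateRow(
    canonical="Example Corp", country_iso="FR", string="example",
    p_variant=0.20, p_applies_overall=0.35,
    rationale="Illustrative placeholder.", bucket="C", sub_sector="Retail",
)
```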
The 16-sector taxonomy
Each row is also tagged with a sector from a 16-sector taxonomy: Banking & Insurance, AI & ML, Web3 & Crypto, Cloud / DevInfra / Cyber, Biotech & Pharma, Climate & Energy, Space / Defense / Quantum, Robotics & Industrial, Media / Telecom / Creator, Consumer & Retail, Legal & Professional, EdTech, AgriTech, Auto & Semis, Conglomerate / Sovereign, Brand Heritage. The taxonomy assigns each candidate to exactly one sector via a rules-first classifier (keyword matching on canonical name + string + bucket-of-origin) with optional LLM fallback for unmatched rows. Audited noise rate: ~3-5% miscategorizations, accepted for v1.
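A rules-first classifier with an optional fallback can be sketched as follows. The keyword sets are hypothetical and cover only three of the 16 sectors, and `llm_fallback` is just a placeholder callable; no particular model or API is implied.

```python
import re
from typing import Callable, Optional

# Hypothetical keyword rules for three sectors; the real rule set is not shown here.
SECTOR_KEYWORDS = {
    "Banking & Insurance": {"bank", "insurance", "finance"},
    "AI & ML": {"ai", "ml"},
    "Web3 & Crypto": {"crypto", "web3", "xmr"},
}


def classify_sector(
    canonical: str,
    string: str,
    bucket: str,
    llm_fallback: Optional[Callable[[str, str, str], str]] = None,
) -> Optional[str]:
    """Return the first sector whose keywords match; otherwise defer to the fallback."""
    tokens = set(re.findall(r"[a-z0-9]+", f"{canonical} {string} {bucket}".lower()))
    for sector, keywords in SECTOR_KEYWORDS.items():
        if tokens & keywords:
            return sector
    return llm_fallback(canonical, string, bucket) if llm_fallback else None
```

Matching on whole tokens rather than substrings avoids, for example, tagging every name containing the letters "ai" as AI & ML, which is one plausible source of the miscategorizations mentioned above.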
5. Country × scenario matrix
Combining the country-level forecast with the candidate layer gives us the headline 2026 outlook, which we report as 6 Monte Carlo scenarios along two orthogonal axes:
- Bucket F unnamed mid-cap tail (λ): low (150) / base (250) / high (350)
- US first-mover posture: base (usfm=0) / high (usfm=1)
| Bucket F × US usfm | Median | 80% CI |
|---|---:|---|
| low × base | 1,682 | [1,434 ; 1,967] |
| low × high | 2,709 | [2,314 ; 3,162] |
| base × base (default) | 1,777 | [1,513 ; 2,079] |
| base × high | 2,801 | [2,393 ; 3,274] |
| high × base | 1,871 | [1,596 ; 2,190] |
| high × high | 2,898 | [2,477 ; 3,382] |
We refer to base × base = 1,777 apps median as our point forecast. The 80% credible interval [1,513 ; 2,079] frames the uncertainty most buyers care about. The US first-mover swing (roughly +1,000 apps from the base to the high posture at every Bucket F level) is the single largest source of forecast uncertainty.
The Monte Carlo combines the candidate-level Poisson-binomial draws (from p_variant per row) with a multiplicative scenario factor m ∼ LogNormal(0, 0.12) capturing global aggregate uncertainty. 10,000 draws per scenario.
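The structure of one scenario can be sketched with the standard library alone. Several choices here are assumptions rather than the published procedure: the Bucket F tail is drawn as Poisson(λ), the lognormal factor multiplies the named and unnamed counts together, and the `p_variant` values passed in are synthetic.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for lambda in the hundreds."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_totals(p_variants, bucket_f_lambda, sigma=0.12, draws=1000, seed=42):
    """Simulate total application counts for one scenario."""
    rng = random.Random(seed)
    totals = []
    for _ in range(draws):
        named = sum(rng.random() < p for p in p_variants)  # one Poisson-binomial draw
        tail = poisson_draw(rng, bucket_f_lambda)           # assumed Poisson tail
        m = rng.lognormvariate(0.0, sigma)                  # global scenario factor
        totals.append(m * (named + tail))
    return totals


def summarize(totals):
    """Return (median, 10th percentile, 90th percentile), i.e. an 80% interval."""
    deciles = statistics.quantiles(totals, n=10)
    return statistics.median(totals), deciles[0], deciles[-1]


# Synthetic input: 400 rows at p_variant = 0.3, base Bucket F setting.
median, lo, hi = summarize(simulate_totals([0.3] * 400, bucket_f_lambda=250))
```

With σ = 0.12 the multiplicative factor alone spans roughly 0.86 to 1.17 at the 80% level, so it accounts for most of the interval width; the Bernoulli and Poisson noise add comparatively little.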
6. Honest known-unknowns
Several limitations of the v3 model deserve to be in front of every buyer rather than buried in a footnote:
- n = 34 is small for hierarchical methods. We do not yet implement region random effects (AMERICAS / EUROPE / APAC / GCC / OFFSHORE) because partial pooling at that granularity needs more data than we currently have.
- No "Asian wave 2026" covariate. Our country_matrix narrative gives China a growth factor of 5.5 (41 apps → 226). The v3 model, working purely from features, predicts only 32 apps for China. Neither is wrong; they encode different information. The gap is the cost of not having an explicit "tech-sector × CAGR" or "ntldstats adoption-per-app" feature. We flag the discrepancy in country reports.
- The `log_pop` coefficient is null but lean-negative. Conditional on UDRP, bigger-population countries appear to apply less. We do not endorse this as a causal claim: at n=34 it is more likely colinearity redistribution between UDRP, GDP, and population. It is published to document what the model literally produces, not to support a thesis.
- us_first_mover HDI spans zero. With only one US data point, the model cannot estimate a clean US-specific intercept. We force the structural assumption from outside the data, and we surface both base and high scenarios in every report.
- Rationale field cleaned, not curated. Candidate rationale text is pre-cleaned (citation markers stripped) but not editorially rewritten. Some rows read as machine-generated summaries. v2 of the data pipeline plans to source-cite each rationale to a checkable URL.
- 2012 is one observation. The fundamental statistical problem: we are forecasting a once-a-decade event with one prior occurrence. All Bayesian quantification of uncertainty rests on priors and structural assumptions, not on cross-validation. Buyers should treat the headline numbers as informed posteriors, not as measurements.
We are confident enough in the qualitative direction (UDRP-active EU brands will catch up; shells will fade; the Asian wave is real but unmodeled by features) to publish. We are not confident enough in any single point estimate to defend it without context.
7. External validation
The model design follows a methodology framework provided by an external forecaster consulted in April 2026, whose recommendations we adopted in their key elements:
- Lognormal prior on total N, centered near 1,930 with σ chosen so the 80% interval spans roughly 0.5×–1.5× the 2012 baseline. This regularizes extreme scenarios.
- Hierarchical Poisson / negative binomial regression with partial pooling across countries — adopted as our v3 structure, though the partial pooling step (region random effects) is deferred.
- Revealed-preference outcome classes — a 4-class taxonomy for 2012 reapplicants ("never applied," "still operating," "self-terminated," "lost contention"). Implemented as outcome_class on the ICANN 2012 detail data (1,930 apps × 9 actual classes).
- P(retry) calibrated against external benchmarks — withdrawn IPO re-filing rate (9-13%), failed M&A return rate (25-40%), withdrawn SEO reissue rate (~25%). Used as priors on Bucket A retry probability per outcome class.
- Decomposition into baseline + named entity layers: `N_c = N_baseline + N_named`, to avoid double-counting. Implemented as the country forecast (top-down) + bucket aggregation (bottom-up).
- Sensitivity analysis on EU blending parameter α: the weighting between 2012 base and contemporary UDRP intensity. Adopted via the explicit shell_jurisdiction and us_first_mover binaries that capture structural deviations.
Validation by retrodiction (Section 3) rather than train-test split — also adopted from the external framework.
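For the lognormal prior on total N listed first above, the parameters can be backed out from a target interval. The sketch below takes the 0.5× to 1.5× bounds literally as the 10th and 90th percentiles, which is an assumption since the text says "roughly"; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_from_interval(lo, hi, mass=0.80):
    """Return (mu, sigma) of a lognormal whose central `mass` interval is [lo, hi]."""
    z = NormalDist().inv_cdf(0.5 + mass / 2)
    mu = (math.log(lo) + math.log(hi)) / 2
    sigma = (math.log(hi) - math.log(lo)) / (2 * z)
    return mu, sigma


baseline = 1930
mu, sigma = lognormal_from_interval(0.5 * baseline, 1.5 * baseline)
prior_median = math.exp(mu)
```

Taken literally, these bounds give σ of about 0.43 and a prior median of about 1,670 rather than 1,930, because an interval that is symmetric in multiples of the baseline is not symmetric on the log scale. A prior centered exactly on 1,930 would need slightly different bounds.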
The full external methodology note is held internally and not published verbatim here. Master tier buyers receive an appendix referencing it; the prose adoption above captures the structural choices that shape every forecast on tld.report.
Citing this methodology
If you reference our forecasts in your own work, please cite:
tld.report. (2026). 2026 gTLD Application Forecast Methodology v3. Retrieved from https://tld.report/methodology
For data licensing inquiries (Master tier with full database, 4,132 rows), contact hello@tld.report.
Methodology version: v3 (April 2026). Refreshes follow the model's release cadence; subscribers are notified of substantial changes via the model version number in the report header.