Cross-Sectional Coverage Analysis
What transport data exists, where, and for when — across all sources.
What transport data actually exists, where, and for when? This analysis cuts across all sources surveyed in the project — TDC, World Bank, OPSIS, PortWatch, OECD/ITF, and others — to map coverage by topic, geography, and time period.
The point isn't to catalogue every dataset. It's to answer a practical question: if you need transport data for a given country and topic, what's available and what's missing?
Data taxonomy
We classify transport data into ten categories. These emerged from the topics
used by TDC and World Bank, cross-referenced with the transport modes taxonomy
in docs/research/taxonomies/transport-modes.md.
| Category | What it covers | Primary sources |
|---|---|---|
| Road & Vehicles | Road network length, paved %, vehicle registrations, traffic volumes | TDC (246), WB WDI, OECD/ITF, SUM4All |
| Road Safety | Fatalities, mortality rates, crash data by road user type | WHO GHO, WB WDI, OECD/ITF, SUM4All |
| Rail | Network length, passenger-km, freight-km, rolling stock | TDC (95), WB WDI, OECD/ITF, EUROSTAT |
| Aviation | Passengers, freight, departures, airport connectivity | TDC (76), WB WDI, WEF TTDI, IMF BOP |
| Maritime & Ports | Port traffic, container throughput, liner shipping connectivity, vessel calls | TDC (72), UNCTAD, PortWatch, WB WDI |
| Public Transit | Bus, metro, tram, BRT ridership and networks | TDC (35), WB WDI, GTFS feeds, Transitland |
| Logistics & Trade | LPI scores, customs clearance, freight corridors, trade facilitation | TDC (21), WB LPI, WB Enterprise Surveys |
| Transport Emissions | CO2 from transport, fuel prices, energy consumption | TDC (31), Climate Watch, WB WDI, Climate TRACE |
| Infrastructure (Geospatial) | Vector networks for roads, rail, ports, airports; climate risk overlays | OPSIS, African Transport DB, OSM, Overture |
| Transport Finance | ODA flows (DAC 210xx), private investment, project finance | IATI/DAC CRS, WB PPI, ieConnect |
Numbers in parentheses are TDC dataset counts. World Bank contributes 264 indicators across 13+ databases; these overlap multiple categories.
Geographic coverage
The overall picture
Coverage is uneven. Southeast Asia and parts of Africa have the most datasets on TDC. World Bank indicators cover 200+ countries but at varying depth. OPSIS has global geospatial coverage. PortWatch covers 145+ port countries.
The table below shows TDC dataset counts for 20 selected countries, in total and by category. World Bank, OPSIS, and PortWatch add coverage on top of these numbers for almost every country listed.
| Country | Total | Road | Rail | Aviation | Maritime | Transit | Logistics | Emissions |
|---|---|---|---|---|---|---|---|---|
| Vietnam | 137 | 39 | 19 | 16 | 28 | 11 | 7 | 9 |
| Indonesia | 133 | 39 | 15 | 17 | 25 | 13 | 8 | 8 |
| South Korea | 122 | 35 | 14 | 17 | 25 | 10 | 8 | 5 |
| Malawi | 118 | 36 | 15 | 17 | 17 | 13 | 12 | 8 |
| Myanmar | 111 | 32 | 13 | 15 | 24 | 8 | 7 | 4 |
| India | 107 | 35 | 11 | 13 | 17 | 8 | 7 | 8 |
| Kenya | 100 | 32 | 15 | 13 | 11 | 15 | 3 | 11 |
| Ghana | 93 | 29 | 13 | 9 | 15 | 18 | 3 | 6 |
| Pakistan | 91 | 28 | 11 | 12 | 16 | 5 | 6 | 5 |
| Bangladesh | 90 | 27 | 11 | 12 | 16 | 5 | 6 | 5 |
| Nepal | 90 | 27 | 11 | 12 | 16 | 5 | 6 | 5 |
| Zambia | 86 | 22 | 15 | 11 | 18 | 9 | 6 | 5 |
| South Africa | 74 | 21 | 8 | 8 | 12 | 13 | 4 | 8 |
| Germany | 72 | 28 | 17 | 5 | 10 | 10 | 3 | 10 |
| Nigeria | 61 | 16 | 7 | 7 | 11 | 11 | 3 | 6 |
| Uganda | 61 | 15 | 8 | 7 | 11 | 11 | 3 | 6 |
| Rwanda | 60 | 14 | 8 | 7 | 11 | 11 | 3 | 6 |
| UK | 56 | 9 | 3 | 4 | 4 | 6 | 3 | 6 |
| Ethiopia | — | — | — | — | — | — | — | — |
| Tanzania | — | — | — | — | — | — | — | — |
Ethiopia and Tanzania fall outside the top 80 in TDC dataset count — a significant gap given their importance to FCDO transport programmes.
Regional patterns
Southeast Asia has the densest coverage. The Asian Transport Observatory contributes ~55 datasets per country across the region, covering most modes. Vietnam, Indonesia, Philippines, Malaysia, Thailand, and Laos all exceed 100 datasets.
Sub-Saharan Africa is the next strongest region, driven by Transport for Cairo (Malawi, Kenya, Zambia, Ghana: 46–65 datasets each) and CCG (South Africa, Nigeria, Uganda, Rwanda: 18–26 datasets each). But this coverage is concentrated in a few countries. Ethiopia, Tanzania, and Mozambique are notably thin.
South Asia has moderate coverage. India sits at 107 datasets; Pakistan, Bangladesh, and Nepal each have about 90. Sri Lanka and Afghanistan are lower.
Europe is well covered by EUROSTAT and OECD/ITF but those sources sit outside TDC. TDC's European datasets are mostly EUROSTAT re-publications.
The Americas are the weakest region on TDC (19 datasets for all of North America, 31 for South America). World Bank and OECD fill some of this gap.
The Middle East has almost no dedicated coverage on any platform.
What each source adds geographically
| Source | Geographic scope | Strength |
|---|---|---|
| TDC | 246 countries, but very uneven | Best for SE Asia and selected African countries |
| World Bank Data360 | 200+ countries | Global statistical indicators, but 1–2 year lag |
| OPSIS | 220+ countries | Only source for global geospatial infrastructure networks |
| PortWatch | 145+ port countries, 1,985 ports | Only source for near-real-time maritime trade |
| OECD/ITF | 66 member + partner countries | OECD and a few LMICs — strong for road safety and rail |
| African Transport DB | 54 African countries | All of Africa, but static 2023 snapshot |
| IATI/DAC | 140+ recipient countries | Transport ODA flows — global but financial data only |
| WHO GHO | 194 WHO member states | Road safety only — broadest country coverage |
| SUM4All GTF | 183 countries | Composite indicators, no API (web/PDF only) |
Temporal coverage
Update frequency spectrum
Transport data ranges from real-time to static snapshots. Where data sits on this spectrum matters as much as whether it exists at all.
| Frequency | Sources | Typical lag |
|---|---|---|
| Weekly / near-real-time | PortWatch (port calls, disruptions) | Days |
| Quarterly | IATI (aid disbursements) | Weeks to months |
| Annual | World Bank WDI, OECD/ITF, WHO GHO, TDC bulk | 1–2 years |
| Biennial | WHO Global Status Report on Road Safety | 2–3 years |
| Irregular / one-off | SUM4All GTF, WB LPI, enterprise surveys | Varies |
| Static snapshot | OPSIS, African Transport DB, Overture Maps | Point-in-time |
| Per-operator | GTFS feeds, GBFS bike-share | Varies (days to months) |
TDC temporal depth by category
Most TDC datasets cluster in 2010–2024. Historical depth before 2000 is thin.
| Category | Earliest | Latest | Peak decade | Datasets in 2010s | Datasets in 2020s |
|---|---|---|---|---|---|
| Road & Vehicles | 1960 | 2030 | 2010s | 1,142 | 445 |
| Rail | 1960 | 2025 | 2010s | 591 | 182 |
| Maritime & Ports | 1960 | 2026 | 2010s | 413 | 129 |
| Aviation | 1929 | 2026 | 2010s | 375 | 115 |
| Public Transit | 1990 | 2030 | 2010s | 160 | 91 |
| Transport Emissions | 1969 | 2030 | 2010s | 149 | 95 |
| Logistics & Trade | 1990 | 2026 | 2010s | 113 | 57 |
"Datasets" here means dataset-years — a dataset covering 2010–2020 counts once per year in the range. The 2020s numbers are lower partly because many datasets haven't been updated past 2022–2023.
Aviation has the longest historical tail (ICAO data back to 1929), but most of that is sparse. Rail and road data starts meaningfully in the 1990s. Public transit and logistics data barely exists before 2000.
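The dataset-years counting described above can be sketched as follows. This is a minimal illustration, not the project script itself: it assumes the `temporal_coverage_start` and `temporal_coverage_end` fields hold integer years, and the sample records are made up.

```python
from collections import Counter

def dataset_years_by_decade(datasets):
    """Count dataset-years per decade from inclusive start/end years."""
    counts = Counter()
    for ds in datasets:
        start = ds.get("temporal_coverage_start")
        end = ds.get("temporal_coverage_end")
        if start is None or end is None or end < start:
            continue  # skip records with missing or inverted ranges
        for year in range(start, end + 1):
            counts[(year // 10) * 10] += 1  # key is the first year of the decade
    return counts

# Hypothetical records, for illustration only.
sample = [
    {"temporal_coverage_start": 2010, "temporal_coverage_end": 2020},
    {"temporal_coverage_start": 2018, "temporal_coverage_end": 2022},
]
decades = dataset_years_by_decade(sample)
print(decades[2010], decades[2020])  # 12 dataset-years in the 2010s, 4 in the 2020s
```

Because one long-running dataset contributes a count for every year it spans, a decade's total reflects duration as well as the number of distinct datasets.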
World Bank temporal coverage
World Bank WDI indicators typically have annual data from the 1990s or 2000s onward, with a 1–2 year publication lag. Some series (air passengers, rail freight) go back to the 1970s. The WEF indices (TTDI, GCI) start around 2006. Enterprise surveys are cross-sectional — one year per country, no time series.
Gaps in temporal coverage
No source provides consistent time series before 1990 for most LMICs. Pre-2000 data exists for OECD countries (via ITF) and for aviation (ICAO), but not for African or South Asian road, rail, or transit metrics.
Near-real-time data is limited to maritime trade (PortWatch) and some transit feeds (GTFS where available). No freely accessible source offers the equivalent for road traffic, rail operations, or aviation movements.
The 2020s gap: many datasets stop at 2021–2022. COVID disrupted data collection in several countries, and some national statistical offices haven't caught up. The most current annual data available is typically 2022 or 2023.
Coverage matrix: topic vs. source
This matrix shows which sources contribute to each topic category, with rough data-quality indicators.
| Topic | TDC | WB Data360 | OPSIS | PortWatch | OECD/ITF | WHO | IATI/DAC | SUM4All |
|---|---|---|---|---|---|---|---|---|
| Road & Vehicles | 246 ds | 28 ind | networks | — | stats | — | — | RAI |
| Road Safety | — | 6 ind | — | — | fatalities | mortality | — | composite |
| Rail | 95 ds | 14 ind | networks | — | stats | — | — | — |
| Aviation | 76 ds | 17 ind | — | — | — | — | — | — |
| Maritime & Ports | 72 ds | 21 ind | ports | trade flows | — | — | — | — |
| Public Transit | 35 ds | 36 ind | — | — | — | — | — | — |
| Logistics & Trade | 21 ds | 41 ind | — | — | freight | — | — | composite |
| Emissions | 31 ds | 8 ind | hazards | — | — | — | — | composite |
| Infrastructure (Geo) | — | — | full | — | — | — | — | — |
| Transport Finance | — | 10 ind | — | — | — | — | ODA flows | — |
"ds" = datasets, "ind" = indicators. Dashes mean no meaningful coverage.
What this tells us
Road & Vehicles is the best-covered topic across sources. Every major platform has something. But the data is fragmented — TDC has registration counts, World Bank has network length and paved percentages, OPSIS has the actual road geometries, OECD/ITF has traffic volumes. No single source gives you the full picture for a given country.
Road Safety suffers from too many inconsistent sources rather than too little data. WHO, World Bank, OECD/ITF, and SUM4All all publish road fatality data, but with different definitions, reference years, and estimation methods. TDC has no road safety topic at all.
Geospatial infrastructure depends entirely on OPSIS and the African Transport DB. If those sources don't cover your area of interest at the resolution you need, there's no fallback except raw OpenStreetMap.
Transport finance is only available through IATI/DAC (for ODA) and World Bank PPI (for private investment). Neither is on TDC.
Public transit looks data-rich in the indicator counts but is actually the weakest category for LMICs. Most WB and TDC transit data covers aggregate statistics. The granular route-level data that transit planning needs (GTFS feeds, schedules, stop locations) exists for perhaps 50–60 cities in Africa and South Asia, and most of those feeds are incomplete or stale.
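To answer "which sources cover this topic?" programmatically, the coverage matrix can be held as a small lookup structure. The sketch below transcribes the matrix cells as printed, reading the columns in the order TDC through SUM4All and using `None` for a dash. The helper name `sources_for` is our own, not part of the project script.

```python
SOURCES = ["TDC", "WB Data360", "OPSIS", "PortWatch",
           "OECD/ITF", "WHO", "IATI/DAC", "SUM4All"]

# One row per topic, transcribed from the matrix; None marks "no meaningful coverage".
MATRIX = {
    "Road & Vehicles":      ["246 ds", "28 ind", "networks", None, "stats", None, None, "RAI"],
    "Road Safety":          [None, "6 ind", None, None, "fatalities", "mortality", None, "composite"],
    "Rail":                 ["95 ds", "14 ind", "networks", None, "stats", None, None, None],
    "Aviation":             ["76 ds", "17 ind", None, None, None, None, None, None],
    "Maritime & Ports":     ["72 ds", "21 ind", "ports", "trade flows", None, None, None, None],
    "Public Transit":       ["35 ds", "36 ind", None, None, None, None, None, None],
    "Logistics & Trade":    ["21 ds", "41 ind", None, None, "freight", None, None, "composite"],
    "Emissions":            ["31 ds", "8 ind", "hazards", None, None, None, None, "composite"],
    "Infrastructure (Geo)": [None, None, "full", None, None, None, None, None],
    "Transport Finance":    [None, "10 ind", None, None, None, None, "ODA flows", None],
}

def sources_for(topic):
    """Return {source: coverage note} for every source that covers the topic."""
    return {src: note for src, note in zip(SOURCES, MATRIX[topic]) if note is not None}

# Topics with a single contributing source have no fallback in the matrix.
single_source = [t for t in MATRIX if len(sources_for(t)) == 1]

print(sources_for("Transport Finance"))
print(single_source)
```

Counting non-empty cells per row is a crude measure, since it treats "246 ds" and "RAI" as equal, but it is enough to flag topics that rest on one source.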
FCDO-priority country profiles
For countries central to FCDO's transport programme (RIDE and related), here's what's actually available.
Kenya
Best-covered African country after Malawi. TDC has 100 datasets (Transport for Cairo and CCG are the main contributors). Road, rail, and public transit are all reasonably represented. OPSIS covers the road and rail network. PortWatch covers Mombasa. Digital Matatus provides GTFS for Nairobi. World Bank has full indicator coverage. Main gap: no high-frequency road traffic or safety data.
Nigeria
61 TDC datasets — lower than expected for Africa's largest economy. Road and maritime are the strongest categories. Lagos has some transit data but no official GTFS feed. PortWatch covers Lagos and other ports. OPSIS has the road network. World Bank indicator coverage is full. Gap: limited rail data (reflecting Nigeria's limited rail system), very little on informal transport.
India
107 TDC datasets, strong across all categories. World Bank and OECD provide additional indicator depth. OPSIS covers the full road and rail network. Transit GTFS exists for a handful of cities (Delhi, Mumbai, Bangalore, Kochi). Gap: India's state-level variation is enormous, and most sources only report national aggregates.
Ethiopia
Not in the top 80 TDC countries — a genuine blind spot. World Bank has standard indicators. OPSIS covers the road and rail network (including the Addis-Djibouti railway). No known GTFS feeds. No PortWatch coverage (landlocked). ODA data available via IATI. This country needs dedicated data work.
Bangladesh
90 TDC datasets with reasonable spread across categories. Dhaka has some transit data. PortWatch covers Chittagong and Mongla. OPSIS has road and inland waterway networks. World Bank coverage is full. Gap: inland waterway transport data is thin despite waterways being a major mode.
Gaps and priorities
Structural gaps
- No unified country-level summary exists. You can't go to one source and ask "what transport data is available for Tanzania?" Each source has its own scope, coverage, and access pattern.
- Geospatial and statistical data don't talk to each other. OPSIS gives you road geometries; World Bank gives you road-km by country. Joining them requires geographic reference translation that no source does automatically.
- Road safety is fragmented across sources with incompatible definitions. WHO estimates, police-reported figures, and modelled rates give different numbers for the same country.
- Transit data in LMICs is near-absent at the route level. Aggregate statistics exist, but the operational data (GTFS feeds, real-time positions) that's standard in OECD cities barely exists in Africa and South Asia.
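The geographic reference translation mentioned in the second gap might look like the sketch below. Everything here is illustrative: we assume road segments already carry an ISO 3166-1 alpha-3 code, that the statistical source labels countries by name, and that a hand-maintained lookup (`NAME_TO_ISO3`, hypothetical) bridges the two. The figures are placeholders, not real road lengths.

```python
# Hypothetical lookup from source-specific country labels to ISO 3166-1 alpha-3 codes.
NAME_TO_ISO3 = {
    "Kenya": "KEN",
    "Tanzania": "TZA",
    "United Republic of Tanzania": "TZA",
}

def join_road_data(segments, indicators):
    """Join mapped segment lengths with reported road-km per country.

    segments:   iterable of (iso3, length_km) pairs from a geospatial source
    indicators: iterable of (country_name, road_km) pairs from a statistical source
    """
    mapped_km = {}
    for iso3, length_km in segments:
        mapped_km[iso3] = mapped_km.get(iso3, 0.0) + length_km

    joined, unmatched = {}, []
    for name, road_km in indicators:
        iso3 = NAME_TO_ISO3.get(name)
        if iso3 is None:
            unmatched.append(name)  # surface labels the lookup cannot translate
            continue
        joined[iso3] = {"reported_km": road_km, "mapped_km": mapped_km.get(iso3)}
    return joined, unmatched

# Placeholder values, for illustration only.
segments = [("KEN", 120.5), ("KEN", 79.5), ("TZA", 300.0)]
indicators = [("Kenya", 250.0), ("United Republic of Tanzania", 310.0), ("Atlantis", 1.0)]
joined, unmatched = join_road_data(segments, indicators)
print(joined["KEN"], unmatched)
```

Returning the unmatched labels separately keeps translation failures visible instead of silently dropping countries from the join.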
Geographic gaps
- Ethiopia, Tanzania, Mozambique — important FCDO countries with very thin data on TDC
- Middle East — almost no dedicated transport data on any platform
- Central Asia — minimal coverage beyond World Bank indicators
- Pacific Islands — OPSIS covers infrastructure, but statistical data is sparse
Temporal gaps
- Pre-2000 historical data for most LMICs simply doesn't exist in digital form
- 2023–2025 data hasn't appeared yet for many annual sources (publication lag)
- Real-time road and rail data has no freely accessible global source
Method
This analysis was generated by a Python script (scripts/extract-coverage.py) that processes:
- TDC: 460 datasets from all-datasets.json (fetched via TDC's tRPC API)
- World Bank: 264 transport indicators from transport-indicators-filtered.json (fetched via Data360 API)
- Additional sources: manually characterised from dataset profiles in docs/datasets/ and research docs in docs/research/
The taxonomy was built by mapping TDC's topics and sectors fields and
World Bank's _transport_subtopics to ten broad categories, with keyword
fallback matching for datasets that lack structured topic metadata.
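A minimal sketch of this two-step mapping, for TDC-style records, is shown below. The `topics` and `sectors` field names come from the description above; the `title` field, both mapping tables, and the function name are hypothetical stand-ins, since the script's actual tables are not reproduced here.

```python
# Hypothetical mapping tables, for illustration only.
TOPIC_TO_CATEGORY = {
    "roads": "Road & Vehicles",
    "railways": "Rail",
    "ports": "Maritime & Ports",
}
KEYWORD_TO_CATEGORY = [
    ("metro", "Public Transit"),
    ("airport", "Aviation"),
    ("co2", "Transport Emissions"),
]

def categorise(dataset):
    """Return the set of categories for one dataset record."""
    categories = set()
    # Step 1: structured metadata, matched case-insensitively.
    for field in ("topics", "sectors"):
        for tag in dataset.get(field) or []:
            category = TOPIC_TO_CATEGORY.get(tag.strip().lower())
            if category:
                categories.add(category)
    # Step 2: keyword fallback, used only when step 1 found nothing.
    if not categories:
        title = (dataset.get("title") or "").lower()
        for keyword, category in KEYWORD_TO_CATEGORY:
            if keyword in title:
                categories.add(category)
    return categories

print(categorise({"topics": ["Roads"], "title": "Airport access roads"}))
print(categorise({"topics": [], "title": "Metro ridership and airport access"}))
```

Returning a set lets one dataset fall into several categories. Plain substring matching is deliberately naive and would need word-boundary handling before use on real titles.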
Geographic counts are TDC-only (TDC is the only source with per-dataset
country tagging). World Bank and other sources are described qualitatively.
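The per-country counts can be sketched in a few lines. The `countries` field name is an assumption about how the per-dataset country tags are stored, and the records are invented for the example.

```python
from collections import Counter

def count_by_country(datasets):
    """Count datasets per country tag; a dataset counts once per country it lists."""
    counts = Counter()
    for ds in datasets:
        # set() guards against a country being listed twice on one record.
        for code in set(ds.get("countries") or []):
            counts[code] += 1
    return counts

# Hypothetical records, for illustration only.
sample_datasets = [
    {"countries": ["KEN", "UGA", "RWA"]},  # one regional dataset
    {"countries": ["KEN"]},
    {"countries": []},                     # untagged dataset, not counted
]
country_counts = count_by_country(sample_datasets)
print(country_counts.most_common(1))
```

Under this counting rule a single regional dataset raises the total for every country it tags, which is worth keeping in mind when comparing country totals.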
Temporal counts use TDC's temporal_coverage_start and temporal_coverage_end
fields, expanded to dataset-years.