TGIS/AI

Cross-Sectional Coverage Analysis

What transport data exists, where, and for when — across all sources.

What transport data actually exists, where, and for when? This analysis cuts across all sources surveyed in the project — TDC, World Bank, OPSIS, PortWatch, OECD/ITF, and others — to map coverage by topic, geography, and time period.

The point isn't to catalogue every dataset. It's to answer a practical question: if you need transport data for a given country and topic, what's available and what's missing?


Data taxonomy

We classify transport data into ten categories. These emerged from the topics used by TDC and World Bank, cross-referenced with the transport modes taxonomy in docs/research/taxonomies/transport-modes.md.

Category | What it covers | Primary sources
Road & Vehicles | Road network length, paved %, vehicle registrations, traffic volumes | TDC (246), WB WDI, OECD/ITF, SUM4All
Road Safety | Fatalities, mortality rates, crash data by road user type | WHO GHO, WB WDI, OECD/ITF, SUM4All
Rail | Network length, passenger-km, freight-km, rolling stock | TDC (95), WB WDI, OECD/ITF, EUROSTAT
Aviation | Passengers, freight, departures, airport connectivity | TDC (76), WB WDI, WEF TTDI, IMF BOP
Maritime & Ports | Port traffic, container throughput, liner shipping connectivity, vessel calls | TDC (72), UNCTAD, PortWatch, WB WDI
Public Transit | Bus, metro, tram, BRT ridership and networks | TDC (35), WB WDI, GTFS feeds, Transitland
Logistics & Trade | LPI scores, customs clearance, freight corridors, trade facilitation | TDC (21), WB LPI, WB Enterprise Surveys
Transport Emissions | CO2 from transport, fuel prices, energy consumption | TDC (31), Climate Watch, WB WDI, Climate TRACE
Infrastructure (Geospatial) | Vector networks for roads, rail, ports, airports; climate risk overlays | OPSIS, African Transport DB, OSM, Overture
Transport Finance | ODA flows (DAC 210xx), private investment, project finance | IATI/DAC CRS, WB PPI, ieConnect

Numbers in parentheses are TDC dataset counts. World Bank contributes 264 indicators across 13+ databases; these overlap multiple categories.


Geographic coverage

The overall picture

Coverage is uneven. Southeast Asia and parts of Africa have the most datasets on TDC. World Bank indicators cover 200+ countries but at varying depth. OPSIS has global geospatial coverage. PortWatch covers 145+ port countries.

The table below shows TDC dataset counts for 30 selected countries across all categories. World Bank, OPSIS, and PortWatch add coverage on top of these numbers for almost every country listed.

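Country | Total | Road | Rail | Aviation | Maritime | Transit | Logistics | Emissions
Vietnam | 137 | 39 | 19 | 16 | 28 | 11 | 7 | 9
Indonesia | 133 | 39 | 15 | 17 | 25 | 13 | 8 | 8
South Korea | 122 | 35 | 14 | 17 | 25 | 10 | 8 | 5
Malawi | 118 | 36 | 15 | 17 | 17 | 13 | 12 | 8
Myanmar | 111 | 32 | 13 | 15 | 24 | 8 | 7 | 4
India | 107 | 35 | 11 | 13 | 17 | 8 | 7 | 8
Kenya | 100 | 32 | 15 | 13 | 11 | 15 | 3 | 11
Ghana | 93 | 29 | 13 | 9 | 15 | 18 | 3 | 6
Pakistan | 91 | 28 | 11 | 12 | 16 | 5 | 6 | 5
Bangladesh | 90 | 27 | 11 | 12 | 16 | 5 | 6 | 5
Nepal | 90 | 27 | 11 | 12 | 16 | 5 | 6 | 5
Zambia | 86 | 22 | 15 | 11 | 18 | 9 | 6 | 5
South Africa | 74 | 21 | 8 | 8 | 12 | 13 | 4 | 8
Germany | 72 | 28 | 17 | 5 | 10 | 10 | 3 | 10
Nigeria | 61 | 16 | 7 | 7 | 11 | 11 | 3 | 6
Uganda | 61 | 15 | 8 | 7 | 11 | 11 | 3 | 6
Rwanda | 60 | 14 | 8 | 7 | 11 | 11 | 3 | 6
UK | 56 | 9 | 3 | 4 | 4 | 6 | 3 | 6
Ethiopia
Tanzania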

Ethiopia and Tanzania fall outside the top 80 in TDC dataset count — a significant gap given their importance to FCDO transport programmes.

Regional patterns

Southeast Asia has the densest coverage. The Asian Transport Observatory contributes ~55 datasets per country across the region, covering most modes. Vietnam, Indonesia, Philippines, Malaysia, Thailand, and Laos all exceed 100 datasets.

Sub-Saharan Africa is the next strongest region, driven by Transport for Cairo (Malawi, Kenya, Zambia, Ghana: 46–65 datasets each) and CCG (South Africa, Nigeria, Uganda, Rwanda: 18–26 datasets each). But this coverage is concentrated in a few countries. Ethiopia, Tanzania, and Mozambique are notably thin.

South Asia has moderate coverage. India sits at 107 datasets; Pakistan, Bangladesh, and Nepal at 90–91. Sri Lanka and Afghanistan are lower.

Europe is well covered by EUROSTAT and OECD/ITF but those sources sit outside TDC. TDC's European datasets are mostly EUROSTAT re-publications.

Americas are the weakest region on TDC (19 datasets for all of North America, 31 for South America). World Bank and OECD fill some of this gap.

Middle East has almost no dedicated coverage on any platform.

What each source adds geographically

Source | Geographic scope | Strength
TDC | 246 countries, but very uneven | Best for SE Asia and selected African countries
World Bank Data360 | 200+ countries | Global statistical indicators, but 1–2 year lag
OPSIS | 220+ countries | Only source for global geospatial infrastructure networks
PortWatch | 145+ port countries, 1,985 ports | Only source for near-real-time maritime trade
OECD/ITF | 66 member + partner countries | OECD and a few LMICs; strong for road safety and rail
African Transport DB | 54 African countries | All of Africa, but static 2023 snapshot
IATI/DAC | 140+ recipient countries | Transport ODA flows; global but financial data only
WHO GHO | 194 WHO member states | Road safety only; broadest country coverage
SUM4All GTF | 183 countries | Composite indicators, no API (web/PDF only)

Temporal coverage

Update frequency spectrum

Transport data ranges from real-time to static snapshots. Where data sits on this spectrum matters as much as whether it exists at all.

Frequency | Sources | Typical lag
Weekly / near-real-time | PortWatch (port calls, disruptions) | Days
Quarterly | IATI (aid disbursements) | Weeks to months
Annual | World Bank WDI, OECD/ITF, WHO GHO, TDC bulk | 1–2 years
Biennial | WHO Global Status Report on Road Safety | 2–3 years
Irregular / one-off | SUM4All GTF, WB LPI, enterprise surveys | Varies
Static snapshot | OPSIS, African Transport DB, Overture Maps | Point-in-time
Per-operator | GTFS feeds, GBFS bike-share | Varies (days to months)

TDC temporal depth by category

Most TDC datasets cluster in 2010–2024. Historical depth before 2000 is thin.

Category | Earliest | Latest | Peak decade | Datasets in 2010s | Datasets in 2020s
Road & Vehicles | 1960 | 2030 | 2010s | 1,142 | 445
Rail | 1960 | 2025 | 2010s | 591 | 182
Maritime & Ports | 1960 | 2026 | 2010s | 413 | 129
Aviation | 1929 | 2026 | 2010s | 375 | 115
Public Transit | 1990 | 2030 | 2010s | 160 | 91
Transport Emissions | 1969 | 2030 | 2010s | 149 | 95
Logistics & Trade | 1990 | 2026 | 2010s | 113 | 57

"Datasets" here means dataset-years — a dataset covering 2010–2020 counts once per year in the range. The 2020s numbers are lower partly because many datasets haven't been updated past 2022–2023.
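The dataset-year expansion can be sketched in a few lines. This is a minimal illustration, not the project's script: it assumes temporal_coverage_start and temporal_coverage_end have already been reduced to integer years (the real TDC fields may hold full dates), and the sample records are made up.

```python
from collections import Counter


def dataset_years_by_decade(datasets):
    """Count dataset-years per decade.

    Each dataset contributes one count for every calendar year between
    its start and end year, inclusive.
    """
    counts = Counter()
    for ds in datasets:
        start = ds.get("temporal_coverage_start")
        end = ds.get("temporal_coverage_end")
        if start is None or end is None or end < start:
            continue  # skip records without a usable year range
        for year in range(start, end + 1):
            counts[(year // 10) * 10] += 1
    return counts


# Hypothetical records for illustration only.
sample = [
    {"temporal_coverage_start": 2010, "temporal_coverage_end": 2020},
    {"temporal_coverage_start": 2018, "temporal_coverage_end": 2022},
    {"temporal_coverage_start": None, "temporal_coverage_end": 2015},
]
decades = dataset_years_by_decade(sample)
print(decades[2010], decades[2020])  # 12 4
```

The first sample record spans 11 years, so it adds 10 dataset-years to the 2010s and 1 to the 2020s. A long-running dataset therefore weighs more heavily than a one-off survey.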

Aviation has the longest historical tail (ICAO data back to 1929), but most of that is sparse. Meaningful rail and road coverage starts in the 1990s. Public transit and logistics data barely exists before 2000.

World Bank temporal coverage

World Bank WDI indicators typically have annual data from the 1990s or 2000s onward, with a 1–2 year publication lag. Some series (air passengers, rail freight) go back to the 1970s. The WEF indices (TTDI, GCI) start around 2006. Enterprise surveys are cross-sectional — one year per country, no time series.

Gaps in temporal coverage

No source provides consistent time series before 1990 for most LMICs. Pre-2000 data exists for OECD countries (via ITF) and for aviation (ICAO), but not for African or South Asian road, rail, or transit metrics.

Real-time data is limited to maritime trade (PortWatch) and some transit feeds (GTFS where available). There's no equivalent real-time source for road traffic, rail operations, or aviation movements that's freely accessible.

The 2020s gap: many datasets stop at 2021–2022. COVID disrupted data collection in several countries, and some national statistical offices haven't caught up. The most current annual data available is typically 2022 or 2023.


Coverage matrix: topic vs. source

This matrix shows which sources contribute to each topic category, with rough data-quality indicators.

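Topic | TDC | WB Data360 | OPSIS | PortWatch | OECD/ITF | WHO | IATI/DAC | SUM4All
Road & Vehicles | 246 ds | 28 ind | networks | - | stats | - | - | RAI
Road Safety | - | 6 ind | - | - | fatalities | mortality | - | composite
Rail | 95 ds | 14 ind | networks | - | stats | - | - | -
Aviation | 76 ds | 17 ind | - | - | - | - | - | -
Maritime & Ports | 72 ds | 21 ind | ports | trade flows | - | - | - | -
Public Transit | 35 ds | 36 ind | - | - | - | - | - | -
Logistics & Trade | 21 ds | 41 ind | - | - | freight | - | - | composite
Emissions | 31 ds | 8 ind | hazards | - | - | - | - | composite
Infrastructure (Geo) | - | - | full | - | - | - | - | -
Transport Finance | - | 10 ind | - | - | - | - | ODA flows | -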

"ds" = datasets, "ind" = indicators. Dashes mean no meaningful coverage.

What this tells us

Road & Vehicles is the best-covered topic across sources. Every major platform has something. But the data is fragmented — TDC has registration counts, World Bank has network length and paved percentages, OPSIS has the actual road geometries, OECD/ITF has traffic volumes. No single source gives you the full picture for a given country.

Road Safety suffers from too many inconsistent sources, not from too little data. WHO, World Bank, OECD/ITF, and SUM4All all publish road fatality data, but with different definitions, reference years, and estimation methods. TDC has no road safety topic at all.

Geospatial infrastructure depends entirely on OPSIS and the African Transport DB. If those sources don't cover your area of interest at the resolution you need, there's no fallback except raw OpenStreetMap.

Transport finance is only available through IATI/DAC (for ODA) and World Bank PPI (for private investment). Neither is on TDC.

Public transit looks data-rich in the indicator counts but is actually the weakest category for LMICs. Most WB and TDC transit data covers aggregate statistics. The granular route-level data that transit planning actually needs (GTFS feeds, schedules, stop locations) exists for perhaps 50–60 cities in Africa and South Asia, and most of those feeds are incomplete or stale.


FCDO-priority country profiles

For countries central to FCDO's transport programme (RIDE and related), here's what's actually available.

Kenya

Best-covered African country after Malawi. TDC has 100 datasets (Transport for Cairo and CCG are the main contributors). Road, rail, and public transit are all reasonably represented. OPSIS covers the road and rail network. PortWatch covers Mombasa. Digital Matatus provides GTFS for Nairobi. World Bank has full indicator coverage. Main gap: no high-frequency road traffic or safety data.

Nigeria

61 TDC datasets — lower than expected for Africa's largest economy. Road and maritime are the strongest categories. Lagos has some transit data but no official GTFS feed. PortWatch covers Lagos and other ports. OPSIS has the road network. World Bank indicator coverage is full. Gap: limited rail data (reflecting Nigeria's limited rail system), very little on informal transport.

India

107 TDC datasets, strong across all categories. World Bank and OECD provide additional indicator depth. OPSIS covers the full road and rail network. Transit GTFS exists for a handful of cities (Delhi, Mumbai, Bangalore, Kochi). Gap: India's state-level variation is enormous, and most sources only report national aggregates.

Ethiopia

Not in the top 80 TDC countries — a genuine blind spot. World Bank has standard indicators. OPSIS covers the road and rail network (including the Addis-Djibouti railway). No known GTFS feeds. No PortWatch coverage (landlocked). ODA data available via IATI. This country needs dedicated data work.

Bangladesh

90 TDC datasets with reasonable spread across categories. Dhaka has some transit data. PortWatch covers Chittagong and Mongla. OPSIS has road and inland waterway networks. World Bank coverage is full. Gap: inland waterway transport data is thin despite waterways being a major mode.


Gaps and priorities

Structural gaps

  1. No unified country-level summary exists. You can't go to one source and ask "what transport data is available for Tanzania?" Each source has its own scope, coverage, and access pattern.

  2. Geospatial and statistical data don't talk to each other. OPSIS gives you road geometries; World Bank gives you road-km by country. Joining them requires translating between geographic references (geometries on one side, country codes or names on the other), and no source does that automatically.

  3. Road safety is fragmented across sources with incompatible definitions. WHO estimates, police-reported figures, and modelled rates give different numbers for the same country.

  4. Transit data in LMICs is near-absent at the route level. Aggregate statistics exist, but the operational data (GTFS feeds, real-time positions) that's standard in OECD cities barely exists in Africa and South Asia.
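To make the join problem in item 2 concrete, here is a rough sketch of what the translation step involves. It assumes road segments carry an ISO 3166-1 alpha-3 country code and a length in km, while the statistical table is keyed by country name. The field names (iso3, length_km), the lookup table, and all figures are hypothetical and do not come from OPSIS or World Bank.

```python
from collections import defaultdict

# Hypothetical name-to-code lookup. A real one would need to cover
# every country and its spelling variants.
NAME_TO_ISO3 = {
    "Kenya": "KEN",
    "Tanzania": "TZA",
    "United Republic of Tanzania": "TZA",
}


def network_km_by_country(segments):
    """Sum segment lengths per ISO3 country code."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["iso3"]] += seg["length_km"]
    return dict(totals)


def join_geometry_to_statistics(segments, reported_km_by_name):
    """Pair mapped network length with reported road-km for each country.

    Returns the joined records plus the country names that could not be
    translated to a code, so they can be reviewed by hand.
    """
    mapped = network_km_by_country(segments)
    joined, unmatched = {}, []
    for name, reported in reported_km_by_name.items():
        iso3 = NAME_TO_ISO3.get(name)
        if iso3 is None:
            unmatched.append(name)
            continue
        joined[iso3] = {
            "reported_km": reported,
            "mapped_km": mapped.get(iso3, 0.0),
        }
    return joined, unmatched


# Made-up figures for illustration only.
segments = [
    {"iso3": "KEN", "length_km": 120.5},
    {"iso3": "KEN", "length_km": 79.5},
    {"iso3": "TZA", "length_km": 50.0},
]
reported = {"Kenya": 210.0, "United Republic of Tanzania": 55.0, "Atlantis": 1.0}
joined, unmatched = join_geometry_to_statistics(segments, reported)
```

Returning the unmatched names separately is a deliberate choice in this sketch: silent drops are how coverage gaps go unnoticed.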

Geographic gaps

  • Ethiopia, Tanzania, Mozambique — important FCDO countries with very thin data on TDC
  • Middle East — almost no dedicated transport data on any platform
  • Central Asia — minimal coverage beyond World Bank indicators
  • Pacific Islands — OPSIS covers infrastructure, but statistical data is sparse

Temporal gaps

  • Pre-2000 historical data for most LMICs simply doesn't exist in digital form
  • 2023–2025 data hasn't appeared yet for many annual sources (publication lag)
  • Real-time road and rail data has no freely accessible global source

Method

This analysis was generated by a Python script (scripts/extract-coverage.py) that processes:

  • TDC: 460 datasets from all-datasets.json (fetched via TDC's tRPC API)
  • World Bank: 264 transport indicators from transport-indicators-filtered.json (fetched via Data360 API)
  • Additional sources: manually characterised from dataset profiles in docs/datasets/ and research docs in docs/research/

The taxonomy was built by mapping TDC's topics and sectors fields and World Bank's _transport_subtopics to ten broad categories, with keyword fallback matching for datasets that lack structured topic metadata.
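A minimal sketch of that two-step mapping, structured tags first and keyword fallback second, might look like the following. The lookup tables hold a few illustrative entries only, the tag values are invented, and the title field is an assumption; only the topics and sectors field names come from the description above.

```python
import re

# Hypothetical structured-tag mapping (the real one covers ten categories).
TOPIC_TO_CATEGORY = {
    "roads": "Road & Vehicles",
    "railways": "Rail",
    "aviation": "Aviation",
    "ports": "Maritime & Ports",
}

# Hypothetical fallback keywords, matched against whole words in the title.
KEYWORD_TO_CATEGORY = {
    "gtfs": "Public Transit",
    "bus": "Public Transit",
    "co2": "Transport Emissions",
    "container": "Maritime & Ports",
}


def categorise(dataset):
    """Return the set of categories for one dataset record."""
    tags = (dataset.get("topics") or []) + (dataset.get("sectors") or [])
    matched = {
        TOPIC_TO_CATEGORY[tag.lower()]
        for tag in tags
        if tag.lower() in TOPIC_TO_CATEGORY
    }
    if matched:
        return matched
    # Fallback: no structured tag matched, so scan the title for keywords.
    title = (dataset.get("title") or "").lower()
    words = set(re.findall(r"[a-z0-9]+", title))
    return {cat for kw, cat in KEYWORD_TO_CATEGORY.items() if kw in words}


print(categorise({"topics": ["Roads"], "title": "Bus lanes"}))
print(categorise({"topics": [], "title": "Nairobi GTFS feed"}))
```

Matching on whole words rather than substrings keeps "bus" from firing on "business". Because the function returns a set, one dataset can land in several categories.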

Geographic counts are TDC-only (TDC is the only source with per-dataset country tagging). World Bank and other sources are described qualitatively. Temporal counts use TDC's temporal_coverage_start and temporal_coverage_end fields, expanded to dataset-years.
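The per-country counts could be produced along these lines. This is a sketch under assumptions: each record is taken to carry a countries list and an already-assigned categories list, and both field names are hypothetical.

```python
from collections import defaultdict


def coverage_by_country(datasets):
    """Build {country: {"Total": n, category: n, ...}} from tagged datasets."""
    table = defaultdict(lambda: defaultdict(int))
    for ds in datasets:
        for country in ds.get("countries", []):
            table[country]["Total"] += 1
            for category in ds.get("categories", []):
                table[country][category] += 1
    return table


# Hypothetical records for illustration only.
sample = [
    {"countries": ["KEN", "UGA"], "categories": ["Rail"]},
    {"countries": ["KEN"], "categories": ["Road & Vehicles", "Transport Emissions"]},
]
table = coverage_by_country(sample)
print(dict(table["KEN"]))
```

In this sketch a dataset tagged with two categories counts once toward the country total but once in each category column, so category counts need not sum to the total.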