
Data Quality Scores

Purpose: Data quality scores give data users a transparent, at-a-glance assessment of the reliability and completeness of the Ecological Footprint and biocapacity results for each country. The scores both guide the publication of data and inform users about appropriate applications.

Approach for Determining Quality Scores and its Limitations: The National Footprint and Biocapacity Accounts calculate national Ecological Footprint and biocapacity back to 1961, primarily using UN and complementary international datasets.

While the accounts accept published data from established international (mostly UN) data sources without independent verification, the production process includes conservative corrective measures. Conservative here means that, when in doubt, no correction is made. Researchers identify and estimate missing values, correct obvious errors, and flag data with extreme volatility or implausible values. These methods enhance the consistency of results; however, the conservative approach does not address all issues, and the reliability of results therefore varies across countries and years.

Structure and Interpretation of Quality Scores: Results for each country are reviewed and assigned a quality score based on the presence of detected output data anomalies. Anomalies are defined as unexpected or unexplained data points, such as zeros or extreme outlier values, dips, spikes, and trends. The score has two components:

  • Time series score (1–3, 3 = highest), and
  • Latest complete year score (A–D, A = highest).

The “latest year” refers to the most recent year with complete input data, which typically lags two to four years behind the current year because of reporting delays at UN and similar agencies. The two dimensions were chosen because they reflect different data challenges: latest-year results can be based on provisional data with a higher probability of errors, while the time series may include historical gaps or inconsistencies.

Recent editions of the National Footprint and Biocapacity Accounts also provide estimates up to the present. These extended estimates beyond the last complete year are not yet scored, but they are only published if the historical time series achieves a score of 3.

These scores determine which portions of the accounts can be published. The table below defines each score number and letter and spells out its implications for the use of output data.

Score element   Publishable data (“latest year” means the last year with complete input data)
3               Timeline minus latest year: all EF & BC components
2               Timeline minus latest year: only EF and BC totals
1               Timeline minus latest year: no data
A               Latest year: all EF & BC components
B               Latest year: only EF and BC totals
C               Latest year: only deficit/reserve status
D               Latest year: no data

EF stands for Ecological Footprint; BC stands for biocapacity.
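The two score elements can be read as a pair of independent lookups. The following Python sketch transcribes the table above. The names `TIME_SERIES_SCOPE`, `LATEST_YEAR_SCOPE`, and `publishable_scope` are hypothetical and are not part of any National Footprint and Biocapacity Accounts tooling.

```python
# Hypothetical lookup tables transcribing the score definitions above.
# Time series score: 3 is highest. Latest year score: "A" is highest.
TIME_SERIES_SCOPE = {
    3: "all EF & BC components",
    2: "only EF and BC totals",
    1: "no data",
}
LATEST_YEAR_SCOPE = {
    "A": "all EF & BC components",
    "B": "only EF and BC totals",
    "C": "only deficit/reserve status",
    "D": "no data",
}


def publishable_scope(time_series_score: int, latest_year_score: str) -> dict:
    """Return what may be published for each of the two score elements."""
    letter = latest_year_score.upper()
    if time_series_score not in TIME_SERIES_SCOPE:
        raise ValueError(f"unknown time series score: {time_series_score!r}")
    if letter not in LATEST_YEAR_SCOPE:
        raise ValueError(f"unknown latest year score: {latest_year_score!r}")
    return {
        "timeline_minus_latest_year": TIME_SERIES_SCOPE[time_series_score],
        "latest_year": LATEST_YEAR_SCOPE[letter],
    }
```

For example, `publishable_scope(2, "C")` reports that only EF and BC totals are publishable for the timeline and only the deficit/reserve status for the latest year.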

The following table summarizes the implications of the scores for publishing the data:

Score Data Completeness Criteria and Implications
3A No component of BC or EF is unreliable or unlikely for any year.
All can be published.
3B No component of BC or EF is unreliable or unlikely for the years prior to the latest data year. These results can be published.
For the latest year, however, some individual components of the EF or BC are incomplete or unlikely.
The latest year can nevertheless be published as totals, as the affected components are only minor.
3C No component of BC or EF is unreliable or unlikely for the years prior to the latest data year. These results can be published.
For the latest year, however, some individual components of the EF or BC are incomplete or unlikely.
Even totals of the latest year cannot be published, as the affected components are major. Still, the deficit/reserve status can be ascribed.
3D No component of BC or EF is unreliable or unlikely for the years prior to the latest data year. These results can be published.
For the latest year, however, components of the EF or BC are too incomplete or too unlikely to determine deficit/reserve status.
No results of the latest year can be published.
2A EF or BC component time series have results that are unreliable or very unlikely, except in the latest data year. Still, the total EF and BC time series results are not significantly affected by unlikely data and can be published.
In the latest year, no EF or BC results, including components, are significantly affected by unlikely data, so all of them can be published.
2B EF or BC component time series include results that are unreliable or very unlikely, including the latest year.
The total EF and BC time series results are not significantly affected by unlikely data and can be published.
Components cannot be published, neither for the latest year nor for the rest of the time series.
2C Total EF or BC time series and component EF and BC time series results are unreliable or unlikely, especially in the latest year.
The total EF and BC time series results (minus latest year) are not significantly affected by unlikely data and can be published.
The results for the latest year can only be used to determine deficit/reserve status.
2D Total EF or BC time series and component EF and BC time series results are unreliable or unlikely, especially in the latest year.
The total EF and BC time series results (minus latest year) are not significantly affected by unlikely data and can be published.
EF and BC results in the latest year are significantly impacted by the unlikely or unreliable values, making them unpublishable.
1A Several components of the EF or BC are very unreliable or unlikely, except in the latest year.
The EF and BC time series results are significantly affected by unlikely data, and are unpublishable.
No EF and BC results in the latest year are significantly affected by unlikely data. Therefore, the last year’s results, including components, can be published.
1B Several components of the EF or BC are very unreliable or unlikely, except in the latest year.
The EF and BC time series results are significantly affected by unlikely data, and cannot be published.
The total EF and BC results in the latest year are not significantly affected by unlikely data and can be published.
1C Several components of the EF or BC are very unreliable or unlikely.
The EF and BC time series results are significantly affected by unlikely data, and are not publishable.
The unlikely or unreliable values have not affected the deficit/reserve (creditor/debtor) status in the latest year. That status can be published.
1D There is too much unreliable or unlikely data to draw any conclusions about the timeline or latest year of this country. No data can be published.

Special Considerations and Notes

“Latest year”: Refers to the most recent year with complete data. The latest year may therefore be two to four years old, since UN data sets typically come with a two- to four-year delay.

Extended estimates: The National Footprint and Biocapacity Accounts now provide extended timeline results up to the current year, estimated from incomplete data sets, additional data sources, and timeline extensions. These estimates beyond the “latest year” are not yet scored, but as a minimum condition they can only be published if the country has a timeline score of at least 2 and a latest year score of at least B.
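The minimum condition in this note amounts to a two-part threshold check. The Python sketch below illustrates it; the function name `extended_estimates_publishable` and its parameters are hypothetical. The thresholds are exposed as arguments, with defaults that follow the wording of this note, so a stricter condition can be passed in where one applies.

```python
# Latest year letters ordered from highest ("A") to lowest ("D").
LATEST_YEAR_LETTERS = ["A", "B", "C", "D"]


def extended_estimates_publishable(
    time_series_score: int,
    latest_year_score: str,
    min_time_series: int = 2,    # assumption: default taken from this note
    min_latest_year: str = "B",  # assumption: default taken from this note
) -> bool:
    """Check the minimum condition for publishing extended estimates."""
    letter = latest_year_score.upper()
    if time_series_score not in (1, 2, 3):
        raise ValueError(f"unknown time series score: {time_series_score!r}")
    if letter not in LATEST_YEAR_LETTERS:
        raise ValueError(f"unknown latest year score: {latest_year_score!r}")
    # A lower list index means a better letter, so "at least B" is index <= 1.
    letter_ok = (
        LATEST_YEAR_LETTERS.index(letter)
        <= LATEST_YEAR_LETTERS.index(min_latest_year)
    )
    return time_series_score >= min_time_series and letter_ok
```

Under the default thresholds, a country scored 2B or 3A passes the check, while 2C or 1A does not.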

Score Improvements: Through additional country-specific research, ideally in collaboration with researchers from those countries, particularly government agencies, it may be possible to improve the Data Quality score (i.e., the reliability of the results). Such reviews of the National Footprint and Biocapacity Accounts have been performed with over a dozen countries or international agencies.

In past editions, improved datasets, methodological enhancements to the National Footprint and Biocapacity Accounts, and better data-cleaning processes have helped raise the Data Quality score for some countries. Similar improvements are likely in the future.

Learn more about National Footprint and Biocapacity Accounts data.