
Data Quality Scores

National Footprint and Biocapacity Accounts use various UN and, where necessary, para-UN datasets to calculate the Footprint and the biocapacity for all countries for each year going back to 1961.

In some cases data may be limited, unavailable, or may contain apparent errors. Account production includes procedures to eliminate the more obvious data errors and to estimate missing data, particularly when data exist for the years surrounding the missing data points. Even so, results for individual countries and years are inevitably of variable reliability. UN statistical offices do some basic data checking, but ultimately the data are self-reported by the member states.

When calculating the National Footprint and Biocapacity Accounts, the input data published by those UN and para-UN agencies are taken at face value and are not cross-checked or verified. Still, as part of the production of each edition of the National Footprint and Biocapacity Accounts, the researchers assess the results using rudimentary filters that flag incomplete data, reported zeros that more likely represent missing data, excessive variance from year to year, and values outside expected ranges.
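The kinds of checks described above could be sketched roughly as follows in Python. This is only an illustration: the function name, the flag labels, and both thresholds (`max_rel_change`, `valid_range`) are assumptions made for the example, not the filters or cut-offs actually used in producing the accounts.

```python
from typing import Optional

def flag_irregularities(
    series: list[Optional[float]],
    max_rel_change: float = 0.5,                     # hypothetical threshold
    valid_range: tuple[float, float] = (0.0, 1.0e9), # hypothetical bounds
) -> list[tuple[int, str]]:
    """Return (index, reason) pairs for values that look irregular."""
    low, high = valid_range
    flags: list[tuple[int, str]] = []
    last_good: Optional[float] = None  # most recent value accepted as plausible

    for index, value in enumerate(series):
        if value is None:
            # Incomplete data: nothing was reported for this year.
            flags.append((index, "missing"))
        elif value == 0 and last_good is not None and last_good > 0:
            # A zero following positive values more likely means "missing".
            flags.append((index, "zero_likely_missing"))
        elif not (low <= value <= high):
            # Value falls outside the expected range.
            flags.append((index, "out_of_range"))
        elif last_good and abs(value - last_good) / abs(last_good) > max_rel_change:
            # Jump relative to the previous plausible value is too large.
            flags.append((index, "excessive_variance"))
            last_good = value
        else:
            last_good = value
    return flags

# Example with made-up numbers (not real account data):
example = [10.0, 11.0, 0, None, 30.0, 31.0, -5.0]
print(flag_irregularities(example))
```

A flagged value is only marked for review here; how many flags, and of what kind, would lower a country's score is a separate judgment that this sketch does not attempt to model.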

Once the National Footprint and Biocapacity Accounts are completed, results for each country are reviewed and given a quality score based on detected irregularities as mentioned above. The score for data quality builds on two elements: one score reflects the quality of the time series [1-3, with 3 being the highest quality score] and the other score assesses the latest year [A-D, with A being the highest quality score]. These two dimensions are chosen because those portions of the data are subject to different types of data problems: the latest year can be compromised by data delays or reporting errors, while the time series can be challenged by historical irregularities or data gaps in those data sets.

These scores determine which portions of the accounts can be published. The table below shows what the quality score numbers and letters mean (EF stands for Ecological Footprint, BC for biocapacity):

Score elements   Publishable data
3                Timeline minus latest year – all EF & BC components
2                Timeline minus latest year – only EF and BC totals
1                Timeline minus latest year – no data
A                Latest year – all EF & BC components
B                Latest year – only EF and BC totals
C                Latest year – only deficit/reserve status
D                Latest year – no data
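To make the two-part structure of a score concrete, the table above can be read as a pair of lookups, one for the digit and one for the letter. The sketch below assumes a score is written as a two-character string such as "2C"; the names `TIME_SERIES_PUBLISHABLE`, `LATEST_YEAR_PUBLISHABLE`, and `publishable_data` are hypothetical and exist only for this example.

```python
# Digit: what can be published for the timeline minus the latest year.
TIME_SERIES_PUBLISHABLE = {
    "3": "all EF & BC components",
    "2": "only EF and BC totals",
    "1": "no data",
}

# Letter: what can be published for the latest year.
LATEST_YEAR_PUBLISHABLE = {
    "A": "all EF & BC components",
    "B": "only EF and BC totals",
    "C": "only deficit/reserve status",
    "D": "no data",
}

def publishable_data(score: str) -> dict[str, str]:
    """Decode a score such as '3A' into its two publishing rules."""
    score = score.strip().upper()
    if (
        len(score) != 2
        or score[0] not in TIME_SERIES_PUBLISHABLE
        or score[1] not in LATEST_YEAR_PUBLISHABLE
    ):
        raise ValueError(f"not a valid data quality score: {score!r}")
    return {
        "time_series": TIME_SERIES_PUBLISHABLE[score[0]],
        "latest_year": LATEST_YEAR_PUBLISHABLE[score[1]],
    }

print(publishable_data("2C"))
```

For "2C" this returns `{'time_series': 'only EF and BC totals', 'latest_year': 'only deficit/reserve status'}`. Because the two elements are looked up independently, each of the twelve combinations from 3A to 1D follows from the same two small tables.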

The following table summarizes the implications of the scores for publishing the data:

Score Data Completeness Criteria and Implications
3A No component of BC or EF is unreliable or unlikely for any year.
All can be published.
3B No component of BC or EF is unreliable or unlikely for the years prior to the latest data year. These results can be published.
For the latest year, however, some individual components of the EF or BC are incomplete or unlikely.
The latest year can nevertheless be published as totals, as the affected components are only minor.
3C No component of BC or EF is unreliable or unlikely for the years prior to the latest data year. These results can be published.
For the latest year, however, some individual components of the EF or BC are incomplete or unlikely.
Even totals for the latest year cannot be published, as the affected components are major. Still, the deficit/reserve status can be ascribed.
3D No component of BC or EF is unreliable or unlikely for the years prior to the latest data year. These results can be published.
For the latest year, however, components of the EF or BC are too incomplete or too unlikely to determine deficit/reserve status.
No results of the latest year can be published.
2A EF or BC component time series have results that are unreliable or very unlikely, except in the latest data year. Still, the total EF and BC time series results are not significantly affected by unlikely data and can be published.
In the latest year, no EF or BC results, including components, are significantly affected by unlikely data, so all of them can be published.
2B EF or BC component time series include results that are unreliable or very unlikely, including the latest year.
The total EF and BC time series results are not significantly affected by unlikely data and can be published.
Components cannot be published for either the latest year or the time series.
2C Total EF or BC time series and component EF and BC time series results are unreliable or unlikely, especially in the latest year.
The total EF and BC time series results (minus latest year) are not significantly affected by unlikely data and can be published.
The results for the latest year can only be used to determine deficit/reserve status.
2D Total EF or BC time series and component EF and BC time series results are unreliable or unlikely, especially in the latest year.
The total EF and BC time series results (minus latest year) are not significantly affected by unlikely data and can be published.
EF and BC results in the latest year are significantly impacted by the unlikely or unreliable values, making them unpublishable.
1A Several components of the EF or BC are very unreliable or unlikely, except the latest year.
The EF and BC time series results are significantly affected by unlikely data, and are unpublishable.
No EF or BC results in the latest year are significantly affected by unlikely data. Therefore, the latest year's results, including components, can be published.
1B Several components of the EF or BC are very unreliable or unlikely, except the latest year.
The EF and BC time series results are significantly affected by unlikely data, and cannot be published.
The total EF and BC results in the latest year are not significantly affected by unlikely data and can be published.
1C Several components of the EF or BC are very unreliable or unlikely.
The EF and BC time series results are significantly affected by unlikely data, and are not publishable.
The unlikely or unreliable values have not affected the deficit/reserve (creditor/debtor) status in the latest year. That status can be published.
1D There are too many unreliable or unlikely data to draw any conclusions about the timeline or the latest year for this country. No data can be published.

Note: Further nation-specific research, preferably in collaboration with researchers from those countries (particularly from government agencies), may improve the Data Quality score (i.e., the quality of the results). Improved data sets, methodological improvements in the National Footprint and Biocapacity Accounts, and better data cleaning processes have also raised the Data Quality score of some country results in past editions, and are likely to do so again in the future.

Learn more about National Footprint and Biocapacity Accounts data.