View source: R/compare_datasets.R
| compare_datasets | R Documentation |
Compare new datasets from source with current datasets in WDI/DCS.
compare_datasets(new, current, alpha = 0.05)
| new | data.frame: A |
| current | data.frame: A |
| alpha | numeric: Significance level for a two-tailed test. Defaults to 0.05. |
The main use of compare_datasets() is to compare the individual
country-year values in the new source dataset with the current values in
WDI/DCS.
Comparison is done by merging the two datasets (left join on new),
calculating the absolute difference between the two value columns, and then
running outlier detection on the diff column.
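The three steps above can be sketched in base R. This is only an illustration, not the package's implementation: the column names (iso3c, year, value, source) and the toy data are assumptions, and because the outlier method is not specified here, a simple two-tailed z-test on diff is used as a stand-in.

```r
# Toy inputs. Column names and values are hypothetical.
new <- data.frame(
  iso3c = rep("AAA", 8),
  year  = 2010:2017,
  value = c(1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 9.0)
)
current <- data.frame(
  iso3c  = rep("AAA", 8),
  year   = 2010:2017,
  value  = c(1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7),
  source = "wdi"
)
alpha <- 0.05

# Rename the columns coming from the current dataset so they do not
# clash with the columns in new.
names(current)[names(current) == "value"]  <- "current_value"
names(current)[names(current) == "source"] <- "current_source"

# Step 1: left join on new (all.x = TRUE keeps every row of new).
res <- merge(new, current, by = c("iso3c", "year"), all.x = TRUE)

# Step 2: absolute difference between the two value columns.
res$diff <- abs(res$value - res$current_value)

# Step 3: stand-in outlier rule (assumption): two-tailed z-test on diff.
z <- (res$diff - mean(res$diff, na.rm = TRUE)) / sd(res$diff, na.rm = TRUE)
res$p_value <- 2 * pnorm(-abs(z))
res$outlier <- !is.na(res$p_value) & res$p_value < alpha
```

In this toy data only the 2017 value differs between the two datasets, so it is the single row this stand-in rule flags.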
Users should look for both large differences in values (diff) and large
p-values (p_value) to identify outliers or other possible unwanted changes
in the data.
If a few values for a specific country are substantially different from the
current dataset in WDI/DCS, they should pop out as outliers with large
p-values. On the other hand, it might be the case that most or all values for
a specific country have changed. In that case there are unlikely to be any
outliers, but the changes can be found by inspecting the diff and n_diff
columns.
A tibble with the following columns added to new:
current_value: Value in current dataset.
current_source: Source in current dataset.
diff: Absolute difference between value and current_value.
outlier: TRUE if the diff value is an outlier.
p_value: p-value for the diff value.
n_diff: Sum of diff by country.
n_outlier: Sum of outlier by country, i.e. the number of outliers.
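As an illustration of how the per-country columns relate to diff and outlier, the sketch below builds them on a toy data frame and then separates isolated spikes from countries whose values changed across the board. The country column name (iso3c) and all values are hypothetical, and this is not necessarily how the package computes these columns internally.

```r
# Toy stand-in for part of the output of compare_datasets().
res <- data.frame(
  iso3c   = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  year    = rep(2015:2017, times = 2),
  diff    = c(0, 0, 7.3, 2.0, 2.1, 1.9),
  outlier = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Per-country aggregates, repeated on every row of that country.
res$n_diff    <- ave(res$diff, res$iso3c, FUN = sum)
res$n_outlier <- ave(as.numeric(res$outlier), res$iso3c, FUN = sum)

# Isolated spikes: rows flagged as outliers.
spikes <- res[res$outlier, ]

# Countries where values changed but no single row stands out.
shifted <- unique(res$iso3c[res$n_diff > 0 & res$n_outlier == 0])
```

Here spikes holds the single flagged row for AAA, while shifted picks up BBB, where every value moved by a similar amount and nothing was flagged.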
## Not run:
# Fetch indicator from source
df <- fetch_indicator("SH.MED.NUMW.P3", "who")
# Compare with WDI
dl <- compare_with_wdi(df)
# Compare new (source) and current (WDI) datasets
res <- compare_datasets(new = dl$source, current = dl$wdi)
## End(Not run)