Haga clic aquí para instrucciones en español.
You can install the development version of verdata from GitHub with:
if (!require("devtools")) {install.packages("devtools")}
devtools::install_github("HRDAG/verdata")
One of verdata's dependencies requires the GNU Scientific Library. You may need to install this library separately before installing verdata.
verdata includes two data frames that serve as data dictionaries for the replicate files. In diccionario_replicas, you will find the definition of each variable in the replicate files. In diccionario_vars_adicional, you will find additional variables that were constructed for the final report of the Colombian Truth Commission. These data dictionaries are currently only available in Spanish, but we are working on translating them to English.
To use this package, it is necessary to have previously downloaded the data from one of the sites where they are published. This package offers 8 functions to handle the data divided into 5 categories:
The confirm_files function allows you to authenticate the downloaded files, making sure that they correspond exactly to the files originally published. This function accepts files in either of the two published formats (parquet or csv).
Additionally, the read_replicates function allows you to authenticate the content of the files and to import the desired number of replicates into R. This function accepts files in either of the two published formats (parquet or csv).
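The idea behind this kind of authentication can be sketched in base R: compute a checksum for the local file and compare it with one released by the publisher. This is only an illustration of the concept. The hash algorithm that verdata uses internally is not stated here, and the toy file, the `published_md5` value, and the `file_matches` helper below are all stand-ins invented for the example.

```r
# Illustration only: a toy file stands in for a downloaded replicate, and
# `published_md5` stands in for a checksum released by the data publisher.
toy_file <- tempfile(fileext = ".csv")
writeLines(c("id,violation", "1,example"), toy_file)

# In practice this value would come from the publisher, not from your own copy.
published_md5 <- unname(tools::md5sum(toy_file))

# Hypothetical helper: TRUE when the local file matches the expected checksum.
file_matches <- function(path, expected_md5) {
  identical(unname(tools::md5sum(path)), expected_md5)
}

ok_before <- file_matches(toy_file, published_md5)

# Any change to the file, however small, changes the checksum.
cat("2,another\n", file = toy_file, append = TRUE)
ok_after <- file_matches(toy_file, published_md5)

c(before_edit = ok_before, after_edit = ok_after)
```

With the real data you would not compute anything by hand; the package functions described above perform the check for you.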
The filter_standard_cev function is optional and allows you to filter the data in the same way that the Truth Commission did, depending on the human rights violation you're analyzing.
The summary_observed function offers a count of the observed number of victims, either in total or grouped by different variables, before the statistical imputation of missing fields. The number obtained is the mean across the replicates.
The combine_replicates function uses a Normal approximation based on the laws of total expectation and total variance to combine the replicates, yielding a 95% confidence interval and a point estimate of the mean number of documented victims that takes the imputation uncertainty into account. See Section 18.2 of Bayesian Data Analysis for more information.
The estimates_exist function allows you to see whether your strata of interest already exist in the pre-calculated estimation files that you downloaded from the Truth Commission website onto your local machine. This function takes as inputs the stratified data and the directory where you've saved the pre-calculated estimates. It returns a data frame with a logical value indicating whether the estimate exists and, if it does, the path to the file containing the estimation results. If you would like to replicate the Truth Commission's results, the data objects estratificacion (in Spanish) and stratification (in English) specify the stratifications used for each of the estimates presented in the methodological report.
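The "mean across replicates" calculation described for summary_observed can be illustrated with a small base R sketch. It does not call verdata. The toy data frame and its column names (`replica`, `sexo`) are assumptions made for the example, and the package function handles filters and options that are not shown here.

```r
# Toy stand-in for imported replicates; group sizes vary between replicates.
# Column names and values are made up for this illustration.
toy <- data.frame(
  replica = rep(c("R1", "R2", "R3"), each = 4),
  sexo    = c("HOMBRE", "HOMBRE", "MUJER", "MUJER",
              "HOMBRE", "HOMBRE", "HOMBRE", "MUJER",
              "HOMBRE", "HOMBRE", "MUJER", "MUJER")
)

# Step 1: count records per group within each replicate.
counts <- as.data.frame(table(replica = toy$replica, sexo = toy$sexo),
                        responseName = "n")

# Step 2: average those counts across replicates.
mean_counts <- aggregate(n ~ sexo, data = counts, FUN = mean)
mean_counts
```

Because the counts differ from one replicate to the next, the averaged count does not have to be a whole number.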
The mse function allows you to make estimates of underreporting using an LCMCR specification (see Section 6 of the methodological report). To use this function, you need to define stratification variables and apply the stratification, i.e., group the data according to these variables (see the function's example and Section 8.4.2 of the methodological report). These estimates take time and computational resources to run. If you would like to use the estimates already calculated by our team, you'll need to download them from the Truth Commission website onto your local machine. You can then use the pre-calculated estimates by passing the path of the directory that contains them to the estimates_dir argument. Keep in mind that when you provide a directory, the function assumes the same model specifications that were used in the project. If you want to use other specifications, don't provide a directory to the estimates.
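As a rough illustration of what "applying the stratification" means, the base R sketch below groups toy records by two stratification variables. It does not call verdata. The column names (`yy_hecho`, `sexo`, and the `in_*` list indicators) and all values are assumptions made for the example, and the real workflow also has to account for the replicates.

```r
# Toy records: `yy_hecho` and `sexo` are the stratification variables, and
# the `in_*` columns mark which (hypothetical) source lists documented each
# record. All names and values here are made up for the example.
toy <- data.frame(
  yy_hecho = c(2001, 2001, 2001, 2002, 2002, 2002),
  sexo     = c("HOMBRE", "MUJER", "HOMBRE", "MUJER", "MUJER", "HOMBRE"),
  in_1     = c(1, 0, 1, 1, 0, 1),
  in_2     = c(0, 1, 1, 0, 1, 1),
  in_3     = c(1, 1, 0, 0, 1, 0)
)

# Apply the stratification: one data frame per combination of the variables.
strata <- split(toy[, c("in_1", "in_2", "in_3")],
                list(toy$yy_hecho, toy$sexo),
                drop = TRUE)

# Each element would be the input for one stratum-level estimate.
stratum_sizes <- sapply(strata, nrow)
stratum_sizes
```

Each element of `strata` holds only the list-membership columns for one stratum, which is the kind of input a capture-recapture model works from.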
Finally, the combine_estimates function allows you to combine the results of the estimation, yielding an approximate 95% credibility interval and the point estimate of the mean of the total number of victims in a stratum of interest, including both the uncertainty from the missing data imputation and from the multiple systems estimation model. The function uses the Normal approximation using the laws of total expectation and total variance. See Section 18.2 of Bayesian Data Analysis for more information.
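The combining step can be sketched in base R with the standard multiple-imputation combining rules. This is a minimal illustration under stated assumptions, not the package's code: the per-replicate estimates and variances are made-up numbers, and whether verdata applies exactly this finite-m correction is an assumption here.

```r
# Hypothetical per-replicate results for one stratum: a point estimate and
# its variance from each of m = 5 imputed replicates (made-up numbers).
estimates <- c(1020, 980, 1005, 995, 1010)
variances <- c(400, 420, 390, 410, 405)
m <- length(estimates)

# Law of total expectation: the combined estimate is the mean of the estimates.
point <- mean(estimates)

# Law of total variance: average within-replicate variance plus the
# between-replicate variance, inflated by (1 + 1/m) because m is finite.
within  <- mean(variances)
between <- var(estimates)
total   <- within + (1 + 1 / m) * between

# Normal approximation for an approximate 95% interval.
half_width <- qnorm(0.975) * sqrt(total)
c(lower = point - half_width, estimate = point, upper = point + half_width)
```

The between-replicate term is what carries the imputation uncertainty: if every replicate gave the same estimate, the interval would reflect only the within-replicate variance.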
We thank Micaela Morales for her thoughtful beta testing of the package.
Comments and suggestions are very welcome. If you have a problem, question, or issue with verdata, please open an issue on GitHub. If you would like to add new functionality to the package, please open a pull request. Continuous integration is set up to run the tests automatically when a pull request is opened. If you would like to run the existing tests locally before opening a pull request, you can do so using testthat::test_local().
You can cite the package as:
Gargiulo et al., (2024). verdata: An R package for analyzing data from the Truth Commission in Colombia. Journal of Open Source Software, 9(93), 5844, https://doi.org/10.21105/joss.05844.
BibTeX entry:
@article{Gargiulo2024,
doi = {10.21105/joss.05844},
url = {https://doi.org/10.21105/joss.05844},
year = {2024},
publisher = {The Open Journal},
volume = {9},
number = {93},
pages = {5844},
author = {Maria Gargiulo and María Juliana Durán and Paula Andrea Amado and Patrick Ball},
title = {verdata: An R package for analyzing data from the Truth Commission in Colombia},
journal = {Journal of Open Source Software}
}