```{css, echo=FALSE}
body .main-container {
  max-width: 640px !important;
  width: 640px !important;
}
body {
  max-width: 640px !important;
}
.caption {
  font-size: 70% !important;
}
```
```{r}
knitr::opts_chunk$set(
  collapse = TRUE,
  out.width = "100%",
  echo = FALSE,
  comment = "#>"
)
```
The COVID19 Sampling Toolbox is primarily intended for sampling from list frames with a rich set of auxiliary information. At this stage, it is not intended for complex sampling designs.
The graphical user interface (GUI) is implemented in flexdashboard with `runtime: shiny`; for more details, see <https://rmarkdown.rstudio.com/flexdashboard/shiny.html>.
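Currently, it contains two sampling approaches:

- stratified sampling with optimal stratification based on the SamplingStrata package, and
- balanced sampling with the cube method.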
In the following, I only present how to use the two approaches in the application, assuming that the user is already familiar with the underlying methods.
After installation, you can run the following commands:
```r
library(SurveySolutionsCOVID19tools)
suso_covid19_samplingApp()
```
This will open the application in your default browser (recommended browser: MS Edge or Google Chrome) with the following start screen.
```{r}
knitr::include_graphics("./img/start.png")
```
Upload the frame by clicking the Browse... button. The frame file must be in .csv format and should ideally contain only the variables used for sampling, which are:
Another important requirement is that none of the variables used in either approach may contain missing values.
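Since missing values are not allowed, it can help to check the frame in R before uploading it. The following is a minimal base-R sketch: the small data frame and its column names are made up, and for a real frame you would read your own file instead (e.g. with `read.csv()`).

```r
# Made-up stand-in for a sampling frame; replace with your own data,
# e.g. frame <- read.csv("frame.csv")  (hypothetical file name)
frame <- data.frame(
  id     = 1:5,
  region = c(1, 1, 2, 2, NA),
  hhsize = c(3, 5, NA, 2, 4)
)

# Number of missing values per variable
na_counts <- colSums(is.na(frame))
na_counts

# Row numbers of units with at least one missing value
incomplete_rows <- which(!complete.cases(frame))
incomplete_rows
#> [1] 3 5
```

Any variable with a non-zero count has to be cleaned before it can be used for sampling.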
After uploading, the available variables can be selected from the corresponding inputs.
```{r}
knitr::include_graphics("./img/upload.png")
```
The next step is to select the desired sampling approach with the available radio buttons.
SamplingStrata requires the specification of several input parameters:
Select the (single) variable specifying the desired domains for the estimation. These can be geographic domains (e.g. provinces) or socio-economic domains (e.g. gender). The more domains you provide, the larger the sample size will be. If you only require the desired precision at the national level, your domain variable should contain only a single value (e.g. 1).
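If you only need national-level precision, a constant domain variable can be added to the frame before upload. A minimal sketch, assuming the frame is held in a data frame; the column names `province` and `domain` are hypothetical.

```r
frame <- data.frame(id = 1:6, province = c(1, 1, 2, 2, 3, 3))

# National level only: a single domain with the value 1
frame$domain <- 1

# The number of domains equals the number of distinct values
length(unique(frame$domain))
#> [1] 1
length(unique(frame$province))
#> [1] 3
```

Using `province` as the domain variable instead would give three domains and, in general, a larger sample.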
After selecting the domain variable, specify the target variable(s), i.e. the variables for which you require estimates at the desired level of precision in each of the domains. Once this is done, the CV table appears on the right.
```{r}
knitr::include_graphics("./img/cvtab.png")
```
The table contains one column per target variable, in the same order as the specified variables, and one row per domain. Each value in the table can be modified, so you can specify a separate CV for each domain and variable. In the following, the desired CV is changed from 5% to 1% for the first target variable only.
```{r}
knitr::include_graphics("./img/strat_cv_mod.png")
```
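The CV table can be thought of as a small data frame with one row per domain and one column per target variable. The sketch below mimics the modification described above; the column names (`DOM`, `CV1`, `CV2`, `domainvalue`) and the starting value of 5% are assumptions for illustration and need not match what the application uses internally.

```r
n_domains <- 3

# One row per domain, one CV column per target variable (all 5% to start)
cv_table <- data.frame(
  DOM         = "DOM1",
  CV1         = rep(0.05, n_domains),
  CV2         = rep(0.05, n_domains),
  domainvalue = seq_len(n_domains)
)

# Tighten the desired CV from 5% to 1% for the first target variable only
cv_table$CV1 <- 0.01

# A separate CV per domain is also possible, e.g. a looser target in domain 3
cv_table$CV2[cv_table$domainvalue == 3] <- 0.10
cv_table
```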
After setting the target variables, select the stratification variables. Currently, the stratification only works for categorical variables, which need to be provided as numeric codes. You may, however, also provide continuous variables, which are transformed into categorical ones; this transformation is described further below.
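Categorical stratification variables stored as text can be recoded to numeric codes before upload. A minimal base-R sketch; the variables `urban` and `sex` are made up.

```r
frame <- data.frame(
  urban = c("urban", "rural", "rural", "urban"),
  sex   = c("F", "M", "F", "F")
)

# factor() sorts levels alphabetically, so "rural" -> 1 and "urban" -> 2
frame$urban_code <- as.integer(factor(frame$urban))
frame$sex_code   <- as.integer(factor(frame$sex))

# Keep the code list so that the strata can be labelled later
levels(factor(frame$urban))
#> [1] "rural" "urban"
```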
Let's start with a set of categorical stratification variables for now.
```{r}
knitr::include_graphics("./img/strat_cat_sel.png")
```
That's it. You can now start the stratification by clicking on the Start Stratified Sampling button.
```{r}
knitr::include_graphics("./img/strat_start1.png")
```
A progress bar in the lower right corner informs you when the optimization has finished.
Attention: The application uses a genetic algorithm for the optimization, and depending on the number of domains and target variables, this may require substantial computational resources. The optimization supports parallel execution; however, whether it is used depends on the number of (logical) CPU cores. If your system has 4 or fewer cores, the optimization is carried out sequentially and may take significantly longer to complete.
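To check in advance whether the optimization is likely to run in parallel, you can query the number of logical cores with `parallel::detectCores()`, which ships with base R. The threshold of 4 is taken from the note above; the exact rule the application applies internally is assumed here, not verified.

```r
library(parallel)

# Number of logical CPU cores reported by the operating system (may be NA)
n_cores <- detectCores(logical = TRUE)

if (is.na(n_cores) || n_cores <= 4) {
  message("Expect sequential execution: the optimization may take a while.")
} else {
  message("Parallel execution should be available (", n_cores, " logical cores).")
}
```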
If continuous variables are provided, they are transformed into categorical variables with the function `SamplingStrata::var.bin()`, which requires the number of desired categories to be specified. The default is 3. Setting this parameter to an unreasonable number of categories may prevent the optimization from converging.
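To get a feel for what such a transformation does, the sketch below cuts a continuous variable into 3 classes using quantile cut points. This is a swapped-in stand-in written in base R: `bin_quantiles()` is a hypothetical helper and is not how `SamplingStrata::var.bin()` is implemented, so the resulting classes can differ from those produced by the application.

```r
# Hypothetical helper: bin a continuous variable into classes of roughly
# equal size using quantile cut points (NOT the var.bin() algorithm)
bin_quantiles <- function(x, bins = 3) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = bins + 1), names = FALSE))
  as.integer(cut(x, breaks = breaks, include.lowest = TRUE))
}

income <- c(120, 340, 560, 80, 910, 450, 230, 700, 60)  # made-up values
income_cat <- bin_quantiles(income, bins = 3)
income_cat
#> [1] 1 2 3 1 3 2 2 3 1
```

Here each class receives three of the nine values; `unique()` guards against duplicate cut points when a variable has many tied values.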
To make both the optimization and the final sample reproducible, it is recommended to provide a seed value. Using the same seed in the application gives you exactly the same sample every time you run the stratification (assuming all other inputs are unchanged). It is therefore recommended to write down the seed together with the final sample once it has been created.
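The effect of a seed can be illustrated with base R's random number generator: setting the same seed before a random draw reproduces the draw exactly. This only illustrates the principle; the seed value is arbitrary, and how the application passes the seed on internally is not shown here.

```r
frame_ids <- 1:1000

set.seed(20200415)  # arbitrary example seed; record yours with the sample
draw1 <- sample(frame_ids, size = 10)

set.seed(20200415)
draw2 <- sample(frame_ids, size = 10)

identical(draw1, draw2)
#> [1] TRUE
```

Results can still differ across R versions if the default random number generator changed between them (R 3.6.0, for instance, changed the default method used by `sample()`), so it is worth recording the R version as well.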
For the final estimation, it is helpful to have at least 2 units in each stratum (the minimum needed to estimate the variance within a stratum), and increasing this minimum is recommended. Be careful, however: increasing it too much may prevent the optimization from converging.
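The minimum per stratum can also be checked on any allocation by hand. A minimal sketch with made-up stratum names and sample sizes; within the application, the minimum is handled through the parameter described above.

```r
# Sample sizes per stratum (made-up numbers)
allocation <- c(s1 = 12, s2 = 1, s3 = 5, s4 = 2)
min_units  <- 2  # 2 units are needed to estimate a within-stratum variance

# Strata below the minimum
too_small <- names(allocation)[allocation < min_units]
too_small
#> [1] "s2"

# Simple fix: raise those strata to the minimum
# (assumes each stratum population has at least min_units units)
allocation_fixed <- pmax(allocation, min_units)
sum(allocation_fixed)
#> [1] 21
```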
Selecting the Sample Properties section allows you to view the quality of the specified design and to check whether all restrictions on your CVs are met. Currently, the screen shows the CV and bias of each variable, overall and across domains, as well as the total and domain sample sizes and the number of strata. If you require this for a report, you may simply take a screenshot now.
```{r}
knitr::include_graphics("./img/strat_eval.png")
```
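For intuition about the CV figures on this screen, the sketch below computes the expected CV of an estimated mean under stratified simple random sampling without replacement, using the textbook variance formula. The stratum figures are made up, `expected_cv()` is a hypothetical helper, and the application's own evaluation may compute the CV differently.

```r
# Var(ybar_st) = sum_h W_h^2 * (1 - n_h / N_h) * S_h^2 / n_h,  with W_h = N_h / N
# CV = sqrt(Var(ybar_st)) / Ybar
expected_cv <- function(N_h, n_h, mean_h, sd_h) {
  W_h    <- N_h / sum(N_h)
  var_st <- sum(W_h^2 * (1 - n_h / N_h) * sd_h^2 / n_h)
  ybar   <- sum(W_h * mean_h)
  sqrt(var_st) / ybar
}

cv <- expected_cv(N_h = c(500, 300, 200), n_h = c(50, 30, 20),
                  mean_h = c(10, 20, 30), sd_h = c(4, 6, 8))
round(cv, 4)
#> [1] 0.0314
```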
To download the data, switch to the Final Sample section, which includes the download button.
```{r}
knitr::include_graphics("./img/strat_dwl.png")
```
The download is a .zip archive containing three files:
To switch to balanced sampling, select Cube Sample after uploading the file:
```{r}
knitr::include_graphics("./img/cube_radio.png")
```
This also results in a slightly different set of inputs.
The first step is to select a single target variable, either continuous or categorical. A categorical target variable must be numeric and coded with 0 and 1. After selecting the variable and specifying the desired CV, the sample size window shows the required sample size. This is only the theoretical sample size; if you require more (e.g. to compensate for non-response), you may increase the value.
```{r}
knitr::include_graphics("./img/cube_target.png")
```
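The theoretical sample size can be approximated by hand for simple random sampling. A minimal sketch, assuming equal inclusion probabilities and the usual formula with finite population correction; `srs_sample_size()` is a hypothetical helper, it ignores the balancing variables, and the application may use a different formula.

```r
# n0 = (S / (CV * Ybar))^2, then finite population correction n = n0 / (1 + n0 / N)
srs_sample_size <- function(y, cv) {
  N  <- length(y)
  n0 <- (sd(y) / (cv * mean(y)))^2
  ceiling(n0 / (1 + n0 / N))
}

# Made-up 0/1 target variable: 25% of 10,000 units have the characteristic
y <- rep(c(1, 0, 0, 0), times = 2500)
srs_sample_size(y, cv = 0.05)
#> [1] 1072
```

To allow for non-response, you could divide by the expected response rate before entering the value, e.g. 1072 / 0.8 = 1340 for an expected response rate of 80%.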
In the final step, specify the balancing variables (i.e. the variables whose sample means should equal their means in the frame population). You can then start the cube sampling algorithm.
```{r}
knitr::include_graphics("./img/cube_balancingvars.png")
```
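Once a sample has been drawn, you can check how well it is balanced by comparing the sample means of the balancing variables with their frame means. A minimal sketch, assuming equal inclusion probabilities (with unequal probabilities, Horvitz-Thompson estimates would have to be compared instead); `balance_check()` and the data are hypothetical.

```r
balance_check <- function(frame, sample_ids, vars) {
  frame_means  <- colMeans(frame[, vars, drop = FALSE])
  sample_means <- colMeans(frame[sample_ids, vars, drop = FALSE])
  data.frame(
    variable    = vars,
    frame_mean  = unname(frame_means),
    sample_mean = unname(sample_means),
    # relative difference; undefined if a frame mean is 0
    rel_diff    = unname((sample_means - frame_means) / frame_means)
  )
}

frame <- data.frame(id = 1:10, x1 = 1:10, x2 = rep(c(0, 1), each = 5))
res <- balance_check(frame, sample_ids = c(1, 5, 6, 10), vars = c("x1", "x2"))
res
```

In this toy example both relative differences are zero, i.e. the sample is perfectly balanced on `x1` and `x2`.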
As with the stratification, providing a seed makes your sample reproducible.
After the cube sampling has finished, you can download the sample, the frame and the design in the last section, Final Sample.
```{r}
knitr::include_graphics("./img/cube_download.png")
```
The download is a .zip archive containing three files:
That is all for the moment; however, check back frequently over the next few days, since updates will follow. Current projects are:
In case of any questions or suggestions, feel free to drop me an email.