This package provides results from the Delhi-NCR Coronavirus Telephone Survey (DCVTS) in an easily analysable tidy format.
The objective of the Delhi-NCR Coronavirus Telephone Survey (DCVTS) is to conduct rapid assessments using phone surveys to track people’s knowledge of and attitudes towards the risk of coronavirus, the feasibility of adhering to control measures, its impact on people’s lives, and finally the sustainability of those measures in the Delhi National Capital Region.
The survey data is being collected on the Limesurvey platform, which also provides remote access for exporting survey results.
You can install the development version of dcvts as follows (refreshed on an occasional basis):
remotes::install_github("maskegger/dcvts")
The dcvts package currently includes 3 rounds of data from DCVTS.
There are 2 important types of datasets for each round of the survey.
dcvts_responses_r1 and dcvts_responses_r2 are survey datasets for rounds 1 and 2 respectively.
dcvts_timing_r1 and dcvts_timing_r2 are process data (or paradata) datasets for rounds 1 and 2 respectively.
Survey data responses from round 2 can be selected as follows:
library(dcvts)
library(dplyr)
## Selecting questions on Impact of Coronavirus from round 2 responses
dcvts_responses_r2 %>%
  select(sampleid, iwstat, starts_with("im"))
#> # A tibble: 2,970 x 42
#> sampleid iwstat im1 im2_sq001 im2_sq002 im2_sq003 im2_sq004 im2_sq005
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 0601180~ compl~ "Ver~ Yes No Yes Yes Yes
#> 2 0603102~ compl~ "Not~ No Yes Yes Yes Yes
#> 3 0604161~ compl~ "Ver~ Yes Yes No No Yes
#> 4 0601020~ compl~ "Som~ Yes No No No Yes
#> 5 0602220~ incom~ "" N/A N/A N/A N/A N/A
#> 6 0601050~ compl~ "Not~ No Yes Yes No No
#> 7 0601120~ incom~ "" N/A N/A N/A N/A N/A
#> 8 0701031~ compl~ "Ver~ Yes Yes No No Yes
#> 9 0604032~ wrong~ "" N/A N/A N/A N/A N/A
#> 10 0604070~ compl~ "Ver~ Yes No No No No
#> # ... with 2,960 more rows, and 34 more variables: im2_sq006 <chr>,
#> # im2_sq007 <chr>, im2_sq008 <chr>, im3 <chr>, im4 <chr>, im5_sq001 <chr>,
#> # im5_sq002 <chr>, im5_sq003 <chr>, im5_sq004 <chr>, im5_sq005 <chr>,
#> # im5_sq006 <chr>, im5_sq007 <chr>, im5_other <chr>, im6_sq001 <chr>,
#> # im6_sq002 <chr>, im6_sq003 <chr>, im6_sq004 <chr>, im6_sq005 <chr>,
#> # im6_sq006 <chr>, im6_sq007 <chr>, im6_sq008 <chr>, im7 <chr>, im8 <chr>,
#> # im0a <lgl>, im9 <chr>, im10 <chr>, im11 <chr>, im12 <chr>, im13 <chr>,
#> # im14 <chr>, im15 <chr>, im16 <chr>, im17 <chr>, im18 <chr>
Documentation for each question variable within the survey can be found by searching the help file for that dataset. For example, the documentation for the round 2 responses can be found by typing ?dcvts_responses_r2.
Responses to any question-variable in the survey data can be summarised as follows:
library(dcvts)
library(dplyr)
library(tidyr)
library(ggplot2)
library(forcats)
# Symptoms of Coronavirus (Kn1)
kn1_responses <- dcvts_responses_r2 %>%
  filter(iwstat == "complete interview") %>%
  select(starts_with("kn1")) %>%
  # Treat any non-empty free-text "other" answer as a "Yes"
  mutate(kn1_other = case_when(kn1_other != "" ~ "Yes",
                               TRUE ~ kn1_other)) %>%
  # Share of "Yes" responses for each sub-question
  summarise_all(~sum(. == "Yes")/n()) %>%
  pivot_longer(everything()) %>%
  mutate(label = case_when(name == "kn1_sq001" ~ "Fever",
                           name == "kn1_sq002" ~ "Sneezing",
                           name == "kn1_sq003" ~ "Runny nose",
                           name == "kn1_sq004" ~ "Pain in throat",
                           name == "kn1_sq005" ~ "Loose Motion",
                           name == "kn1_sq006" ~ "Cough",
                           name == "kn1_sq007" ~ "Difficulty in breathing",
                           name == "kn1_sq008" ~ "Common cold",
                           name == "kn1_sq009" ~ "Body/Joint Pain",
                           name == "kn1_sq010" ~ "Headache",
                           name == "kn1_sq011" ~ "Tiredness",
                           name == "kn1_sq012" ~ "Don't Know",
                           name == "kn1_other" ~ "Other")) %>%
  # Order the bars by the share of respondents
  mutate(label = fct_reorder(label, value))
kn1_responses %>%
  ggplot(aes(label, value)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Coronavirus Symptoms - DCVTS, Round 2",
       y = "Percentage of Respondents",
       x = "")
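The core of the summary above is the share of "Yes" answers per sub-question. Here is a minimal base R sketch of that single step, run on a made-up data frame so that it works without the package installed. The `toy` data and its values are invented for illustration and are not DCVTS results; only the `kn1_sq00x` naming pattern is borrowed from the package.

```r
# Toy stand-in for a few kn1 columns; the values are invented, not survey results.
toy <- data.frame(
  kn1_sq001 = c("Yes", "Yes", "No", "Yes"),
  kn1_sq002 = c("No", "No", "No", "Yes"),
  kn1_sq006 = c("Yes", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Share of "Yes" per column: the mean of a logical vector is a proportion.
yes_share <- vapply(toy, function(x) mean(x == "Yes"), numeric(1))

# Sort so that the most frequently mentioned item comes first.
yes_share <- sort(yes_share, decreasing = TRUE)
print(yes_share)
```

The same idea should carry over to the real data by replacing `toy` with the filtered `kn1` columns of dcvts_responses_r2.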
Since Couper (1998) first introduced the term “paradata” to the field of survey methodology, the term has expanded to cover all types of data about the process of collecting survey data, such as interviewer call records, length of interview, keystroke data, and interviewer characteristics.
The key process data from Limesurvey are the timing statistics, which are recorded for each section. Timing statistics are not available from Limesurvey’s API endpoint and are updated manually, so they might occasionally be out of date.
Currently, the dcvts_timing_r1 and dcvts_timing_r2 datasets are available for analysing timing statistics of rounds 1 and 2 respectively.
dcvts_timing_r1
#> # A tibble: 2,270 x 13
#> sampleid iwstat startdatetime iwstartdate iwlen iwlen_s1 iwlen_s2
#> <chr> <chr> <dttm> <date> <dbl> <dbl> <dbl>
#> 1 0602101~ compl~ 2020-04-03 14:08:58 2020-04-03 11.3 4.8 1
#> 2 0602051~ compl~ 2020-04-03 14:15:04 2020-04-03 20.5 6.3 1.1
#> 3 0701060~ compl~ 2020-04-03 14:24:19 2020-04-03 14.1 0.6 2.3
#> 4 0701061~ compl~ 2020-04-03 14:25:06 2020-04-03 10.1 2.2 1.7
#> 5 0701070~ compl~ 2020-04-03 14:27:12 2020-04-03 15.3 0.5 1.4
#> 6 0602050~ compl~ 2020-04-03 14:31:59 2020-04-03 6.5 0.8 1.1
#> 7 0602060~ compl~ 2020-04-03 14:33:46 2020-04-03 10.2 0.3 2
#> 8 0701070~ compl~ 2020-04-03 14:35:14 2020-04-03 15.1 1.4 1.8
#> 9 0701051~ compl~ 2020-04-03 14:36:17 2020-04-03 14.1 0.4 2.1
#> 10 0602061~ compl~ 2020-04-03 14:37:53 2020-04-03 24.8 6.6 2.3
#> # ... with 2,260 more rows, and 6 more variables: iwlen_s3 <dbl>,
#> # iwlen_s4 <dbl>, iwlen_s5 <dbl>, iwlen_s6 <dbl>, iwlen_s7 <dbl>,
#> # iwlen_s8 <dbl>
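As one possible use of the timing data, the sketch below computes the median interview length per interview date. It runs on a small invented data frame that only mimics the `iwstartdate` and `iwlen` columns shown above. The numbers are not survey results, and the unit of `iwlen` should be checked in ?dcvts_timing_r1 rather than assumed.

```r
# Toy stand-in for two columns of a timing dataset; the values are invented.
toy_timing <- data.frame(
  iwstartdate = as.Date(c("2020-04-03", "2020-04-03", "2020-04-03",
                          "2020-04-04", "2020-04-04")),
  iwlen = c(11.3, 20.5, 14.1, 10.1, 15.3)
)

# Median total interview length for each interview date.
median_len <- tapply(toy_timing$iwlen, toy_timing$iwstartdate, median)
print(median_len)
```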
Reducing non-response is an important quality control mechanism in household surveys. Visualizing the missingness of section-level paradata among incomplete responses helps us understand which sections contribute most to this non-response.
The example from round 2 suggests that, as expected, most of the non-response occurs at the very beginning of the interview. We can also see that a respondent who has completed section 5a is very likely to stay on and complete the interview. The reasons for this non-response will be explored in further analysis.
dcvts_timing_r2 %>%
  arrange(startdatetime) %>%
  filter(iwstat == "incomplete") %>%
  select(starts_with("iwlen_")) %>%
  visdat::vis_miss()
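The missingness plot can be complemented with plain numbers. The base R sketch below computes the share of missing timings per section. It uses an invented data frame whose `iwlen_s*` columns only imitate the naming of the real timing data. The values are made up and do not reflect the survey.

```r
# Toy stand-in for section timings of incomplete interviews; values are invented.
toy_incomplete <- data.frame(
  iwlen_s1 = c(0.5, 1.2, NA, 0.8),
  iwlen_s2 = c(1.1, NA, NA, 2.0),
  iwlen_s3 = c(NA, NA, NA, 1.4)
)

# Share of missing timings per section: is.na() gives a logical matrix,
# and colMeans() turns each column into a proportion.
miss_share <- colMeans(is.na(toy_incomplete))
print(miss_share)
```

A share that rises sharply from one section to the next would point to the place where respondents tend to drop out.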
In addition to sharing data in a tidy format, the dcvts package can also be used to fetch live data from Limesurvey’s website while the survey is in progress. dcvts uses the limer package, which provides access to LimeSurvey’s RemoteControl 2 API, allowing you to collect and analyze survey data in a simple, reproducible workflow.
First, set up Limesurvey authentication after loading the dcvts package. To do this, simply type dcvts_limesurvey_login(), which asks for your Limesurvey username and password. dcvts_getresponses() then fetches the responses from the Limesurvey website.
Note: if you do not have access to the Limesurvey API endpoint, you need to contact your administrator (i.e., me!).
library(dcvts)
# set up Limesurvey authentication
dcvts_limesurvey_login()
# get all responses (complete/incomplete are other two options available for status)
dcvts_getresponses(round = "r1", status = "all")