README.md

DCVTS

This package provides results from the Delhi-NCR Coronavirus Telephone Survey (DCVTS) in an easily analysable tidy format.

The objective of the Delhi-NCR Coronavirus Telephone Survey (DCVTS) is to conduct rapid assessments using phone surveys to track people’s knowledge of and attitudes towards the risk of coronavirus, the feasibility of adhering to the control measures, its impact on people’s lives, and finally the sustainability of those measures in the Delhi National Capital Region.

The survey data is being collected on the Limesurvey platform, which also provides remote access for exporting survey results.

Installation

You can install the development version of dcvts as follows (refreshed on an occasional basis):

remotes::install_github("maskegger/dcvts")

Usage

The dcvts package currently includes 3 rounds of data from DCVTS. There are 2 important types of datasets for each round of the survey.

Survey Data

Survey data responses from round 2 can be selected as follows:

library(dcvts)
library(dplyr)

## Selecting questions on Impact of Coronavirus from round 2 responses
dcvts_responses_r2 %>% 
  select(sampleid, iwstat, starts_with("im"))
#> # A tibble: 2,970 x 42
#>    sampleid iwstat im1   im2_sq001 im2_sq002 im2_sq003 im2_sq004 im2_sq005
#>    <chr>    <chr>  <chr> <chr>     <chr>     <chr>     <chr>     <chr>    
#>  1 0601180~ compl~ "Ver~ Yes       No        Yes       Yes       Yes      
#>  2 0603102~ compl~ "Not~ No        Yes       Yes       Yes       Yes      
#>  3 0604161~ compl~ "Ver~ Yes       Yes       No        No        Yes      
#>  4 0601020~ compl~ "Som~ Yes       No        No        No        Yes      
#>  5 0602220~ incom~ ""    N/A       N/A       N/A       N/A       N/A      
#>  6 0601050~ compl~ "Not~ No        Yes       Yes       No        No       
#>  7 0601120~ incom~ ""    N/A       N/A       N/A       N/A       N/A      
#>  8 0701031~ compl~ "Ver~ Yes       Yes       No        No        Yes      
#>  9 0604032~ wrong~ ""    N/A       N/A       N/A       N/A       N/A      
#> 10 0604070~ compl~ "Ver~ Yes       No        No        No        No       
#> # ... with 2,960 more rows, and 34 more variables: im2_sq006 <chr>,
#> #   im2_sq007 <chr>, im2_sq008 <chr>, im3 <chr>, im4 <chr>, im5_sq001 <chr>,
#> #   im5_sq002 <chr>, im5_sq003 <chr>, im5_sq004 <chr>, im5_sq005 <chr>,
#> #   im5_sq006 <chr>, im5_sq007 <chr>, im5_other <chr>, im6_sq001 <chr>,
#> #   im6_sq002 <chr>, im6_sq003 <chr>, im6_sq004 <chr>, im6_sq005 <chr>,
#> #   im6_sq006 <chr>, im6_sq007 <chr>, im6_sq008 <chr>, im7 <chr>, im8 <chr>,
#> #   im0a <lgl>, im9 <chr>, im10 <chr>, im11 <chr>, im12 <chr>, im13 <chr>,
#> #   im14 <chr>, im15 <chr>, im16 <chr>, im17 <chr>, im18 <chr>

Documentation for each question variable within the survey can be found by searching the help file for that dataset. For example, the documentation for the round 2 responses can be opened by typing ?dcvts_responses_r2

Summarising the data

Responses to any question-variable in the survey data can be summarised as follows:

library(dcvts)
library(dplyr)
library(tidyr)
library(ggplot2)
library(forcats)

# Symptoms of Coronavirus (Kn1)

kn1_responses <- dcvts_responses_r2 %>%
  filter(iwstat == "complete interview") %>% 
  select(starts_with("kn1")) %>%
  mutate(kn1_other = case_when(kn1_other != "" ~ "Yes",
                               TRUE ~ kn1_other)) %>% 
  summarise_all(~sum(. == "Yes")/n()) %>%
  pivot_longer(everything()) %>%
  mutate(label = case_when(name == "kn1_sq001" ~ "Fever",
                           name == "kn1_sq002" ~ "Sneezing",
                           name == "kn1_sq003" ~ "Runny nose",
                           name == "kn1_sq004" ~ "Pain in throat",
                           name == "kn1_sq005" ~ "Loose Motion",
                           name == "kn1_sq006" ~ "Cough",
                           name == "kn1_sq007" ~ "Difficulty in breathing",
                           name == "kn1_sq008" ~ "Common cold",
                           name == "kn1_sq009" ~ "Body/Joint Pain",
                           name == "kn1_sq010" ~ "Headache",
                           name == "kn1_sq011" ~ "Tiredness",
                           name == "kn1_sq012" ~ "Don't Know",
                           name == "kn1_other" ~ "Other")) %>% 
  mutate(label = fct_reorder(label, value)) 

kn1_responses %>% 
  ggplot(aes(label, value)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Coronavirus Symptoms - DCVTS, Round 2",
       y = "Percentage of Respondents",
       x = "")
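The proportion step in the pipeline above can be checked on a small table where the expected shares are easy to work out by hand. The following is a minimal sketch, not the package's own code: `toy_responses` and its values are invented for illustration, and `across()` is used in place of `summarise_all()` as the current dplyr idiom.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data that mimics the layout of the kn1 columns;
# the values are invented for illustration only.
toy_responses <- tibble(
  iwstat    = c("complete interview", "complete interview",
                "complete interview", "incomplete"),
  kn1_sq001 = c("Yes", "Yes", "No", "N/A"),
  kn1_sq006 = c("Yes", "No", "No", "N/A")
)

toy_summary <- toy_responses %>%
  # keep completed interviews only, as in the example above
  filter(iwstat == "complete interview") %>%
  select(starts_with("kn1")) %>%
  # share of completed interviews answering "Yes" to each item
  summarise(across(everything(), ~ mean(.x == "Yes"))) %>%
  # one row per question variable
  pivot_longer(everything(), names_to = "name", values_to = "value")

toy_summary
```

Because incomplete interviews are filtered out first, the denominator is the number of completed interviews (three in the toy data), not the number of rows in the table.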

Process Data

Since Couper (1998) first introduced the term “paradata” to the field of survey methodology, the term has expanded to cover all types of data about the process of collecting survey data, such as interviewer call records, length of interview, keystroke data, and interviewer characteristics.

The key process data from Limesurvey are the timing statistics, which are recorded for each section. Timing statistics are not available from Limesurvey’s API endpoint and are updated manually. Hence, they might not be up to date on occasion.

Currently, the dcvts_timing_r1 and dcvts_timing_r2 datasets are available for analysing the timing statistics of rounds 1 and 2 respectively.

dcvts_timing_r1
#> # A tibble: 2,270 x 13
#>    sampleid iwstat startdatetime       iwstartdate iwlen iwlen_s1 iwlen_s2
#>    <chr>    <chr>  <dttm>              <date>      <dbl>    <dbl>    <dbl>
#>  1 0602101~ compl~ 2020-04-03 14:08:58 2020-04-03   11.3      4.8      1  
#>  2 0602051~ compl~ 2020-04-03 14:15:04 2020-04-03   20.5      6.3      1.1
#>  3 0701060~ compl~ 2020-04-03 14:24:19 2020-04-03   14.1      0.6      2.3
#>  4 0701061~ compl~ 2020-04-03 14:25:06 2020-04-03   10.1      2.2      1.7
#>  5 0701070~ compl~ 2020-04-03 14:27:12 2020-04-03   15.3      0.5      1.4
#>  6 0602050~ compl~ 2020-04-03 14:31:59 2020-04-03    6.5      0.8      1.1
#>  7 0602060~ compl~ 2020-04-03 14:33:46 2020-04-03   10.2      0.3      2  
#>  8 0701070~ compl~ 2020-04-03 14:35:14 2020-04-03   15.1      1.4      1.8
#>  9 0701051~ compl~ 2020-04-03 14:36:17 2020-04-03   14.1      0.4      2.1
#> 10 0602061~ compl~ 2020-04-03 14:37:53 2020-04-03   24.8      6.6      2.3
#> # ... with 2,260 more rows, and 6 more variables: iwlen_s3 <dbl>,
#> #   iwlen_s4 <dbl>, iwlen_s5 <dbl>, iwlen_s6 <dbl>, iwlen_s7 <dbl>,
#> #   iwlen_s8 <dbl>
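One simple use of the timing columns is to track interview length by day. The sketch below runs on invented data: `toy_timing` is hypothetical and only borrows the column names shown in the output above, and the "complete interview" status label is assumed to match the one used in the responses data. The result is in whatever unit `iwlen` is recorded in.

```r
library(dplyr)

# Hypothetical toy data with the same column names as dcvts_timing_r1;
# the values are invented for illustration only.
toy_timing <- tibble(
  iwstat      = c("complete interview", "complete interview",
                  "complete interview", "incomplete"),
  iwstartdate = as.Date(c("2020-04-03", "2020-04-03",
                          "2020-04-04", "2020-04-04")),
  iwlen       = c(10, 20, 15, 2)
)

daily_length <- toy_timing %>%
  # incomplete interviews would pull the typical length down
  filter(iwstat == "complete interview") %>%
  group_by(iwstartdate) %>%
  summarise(n_interviews = n(),
            median_iwlen = median(iwlen),
            .groups = "drop")

daily_length
```

To try this on the packaged data, replace `toy_timing` with `dcvts_timing_r1`, after checking the status labels it actually contains.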

Visualizing the data

Reducing non-response is an important quality control mechanism in household surveys. Visualizing the missingness of section level paradata among incomplete responses will help us understand which sections contributed most to this non-response.

The example from round 2 suggests that, as expected, most of the non-response occurs at the very beginning of the interview. We can also see that a respondent who has completed section 5a is very likely to stay on and complete the interview. The reasons for this non-response will be explored in further analysis.

dcvts_timing_r2 %>%
  arrange(startdatetime) %>% 
  filter(iwstat == "incomplete") %>% 
  select(starts_with("iwlen_")) %>%
  visdat::vis_miss()
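The missingness plot can be complemented with a numeric summary of the share of incomplete interviews that have no timing recorded for each section. This is a minimal sketch on invented data: `toy_timing` is hypothetical and only mimics the `iwlen_` column naming used in the timing datasets.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; NA means no timing was recorded for that section.
toy_timing <- tibble(
  iwstat   = c("incomplete", "incomplete", "incomplete", "complete interview"),
  iwlen_s1 = c(0.5, 0.8, NA, 1.2),
  iwlen_s2 = c(1.1, NA, NA, 2.0),
  iwlen_s3 = c(NA, NA, NA, 3.1)
)

missing_share <- toy_timing %>%
  filter(iwstat == "incomplete") %>%
  select(starts_with("iwlen_")) %>%
  # proportion of incomplete interviews with a missing timing, per section
  summarise(across(everything(), ~ mean(is.na(.x)))) %>%
  pivot_longer(everything(),
               names_to = "section", values_to = "share_missing")

missing_share
```

A share that jumps between two consecutive sections points to the section where respondents tend to drop out.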

Additional Functionality (Not required to use the package)

In addition to sharing data in a tidy format, the dcvts package can also be used to fetch live data from Limesurvey’s website while the survey is in progress. dcvts uses the limer package, which provides access to LimeSurvey’s RemoteControl 2 API, allowing you to collect and analyze survey data in a simple, reproducible workflow.

Fetching live data

First, we need to set up Limesurvey authentication after loading the dcvts package. To do this, simply call dcvts_limesurvey_login(), which asks for your Limesurvey username and password. dcvts_getresponses() then fetches the responses from the Limesurvey website.

Note - If you do not have access to Limesurvey API endpoint, you need to contact your administrator (i.e., me!)

library(dcvts)

# set up Limesurvey authentication
dcvts_limesurvey_login()

# get all responses (status can also be "complete" or "incomplete")
dcvts_getresponses(round = "r1", status = "all")


maskegger/dcvts documentation built on Aug. 8, 2020, 4:34 p.m.