scrubRows: Function to scrub sensata data

View source: R/scrubRows.R

scrubRows  R Documentation

Function to scrub sensata data

Description

This function removes cases that are duplicates, that belong to minors, that took less than the minimum survey duration, or that have no geolocation coordinates. It also stores attributes with the number of cases left after each scrubbing step.
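To make the scrubbing steps concrete, here is a minimal base R sketch on toy data. It is not the package's implementation: the column names (`id`, `minutes`, `age`, `lat`), the order of the steps, and the way each rule is applied are all assumptions made for illustration.

```r
# Toy data; every column name here is hypothetical, not the package's schema.
toy <- data.frame(
  id      = c("a", "a", "b", "c", "d"),
  minutes = c(5.0, 5.0, 1.2, 6.3, 4.0),
  age     = c("18-25", "18-25", "26-35", "Under 18", "26-35"),
  lat     = c(4.6, 4.6, 4.7, 4.6, NA),
  stringsAsFactors = FALSE
)

# 1. Drop duplicated respondents (keeps the first occurrence of each id).
step1 <- toy[!duplicated(toy$id), ]

# 2. Drop rows whose age value is in the excluded set.
step2 <- step1[!(step1$age %in% "Under 18"), ]

# 3. Drop surveys shorter than the minimum duration (2.5 minutes here).
step3 <- step2[step2$minutes >= 2.5, ]

# 4. Drop rows without a geolocation coordinate.
step4 <- step3[!is.na(step3$lat), ]

# Number of cases left after each step.
counts <- c(original = nrow(toy), dupes = nrow(step1), age = nrow(step2),
            time = nrow(step3), geo = nrow(step4))
print(counts)
```

Each step filters the result of the previous one, so the counts can only stay the same or shrink from left to right.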

Usage

scrubRows(
  df,
  removeDupes = T,
  timeMin = 2.5,
  geoloc = F,
  ageVar = NULL,
  ageVal = NULL,
  testParamName = NULL,
  completeVars = NULL,
  maxSkippedQs = NULL,
  skipQuestionString = "S99",
  particularVal = NULL
)

Arguments

df

data downloaded from Mongo and cleaned with cleanData.R

removeDupes

logical, if TRUE it scrubs duplicate data

timeMin

minimum number of minutes that the survey should have taken. Default is 2.5 minutes. Set it to 0 if no scrubbing by time is required.

geoloc

logical, if TRUE it will scrub surveys that have no geolocation.

ageVar

name of the age variable. If left empty (NULL), no scrubbing by age is done.

ageVal

value(s) of ageVar that should be excluded. If ageVar is numeric, or more than one value should be scrubbed, provide all values as a vector.

testParamName

character string with the name of the test parameter, usually "test" (the full column is called params.test)

completeVars

character vector of variables that have to be complete. Individuals who did not answer ALL of them are removed.

maxSkippedQs

maximum number of skipped questions accepted. Anyone who skips more than this number of questions is scrubbed.

skipQuestionString

string that marks a skipped question, "S99" by default

particularVal

named vector, where name is the variable to be used as filter and the value is the value to be kept
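The last four arguments describe row filters that can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's code: the toy columns (`q1`, `q2`, `q3`, `city`) are hypothetical, "complete" is taken to mean "not NA", and skips are counted only over the three question columns.

```r
# Toy answers; column names and values are hypothetical.
ans <- data.frame(
  q1   = c("yes", "S99", "no",  NA),
  q2   = c("no",  "S99", "yes", "yes"),
  q3   = c("yes", "S99", "S99", "no"),
  city = c("Bogota", "Bogota", "Cali", "Bogota"),
  stringsAsFactors = FALSE
)

# completeVars: keep rows with no NA in any of the listed variables
# (treating "complete" as "not NA" is an assumption).
completeVars <- c("q1", "q2")
keepComplete <- stats::complete.cases(ans[, completeVars, drop = FALSE])

# maxSkippedQs / skipQuestionString: count skip codes per row and keep
# rows that skipped at most maxSkippedQs questions.
skipQuestionString <- "S99"
maxSkippedQs <- 1
questionCols <- c("q1", "q2", "q3")
nSkipped <- rowSums(ans[, questionCols] == skipQuestionString, na.rm = TRUE)
keepSkips <- nSkipped <= maxSkippedQs

# particularVal: the name is the filter variable, the value is what to keep.
particularVal <- c(city = "Bogota")
keepValue <- ans[[names(particularVal)]] == particularVal[[1]]

kept <- ans[keepComplete & keepSkips & keepValue, ]
print(kept)
```

Building one logical vector per rule and combining them with `&` keeps each rule easy to inspect on its own before any rows are dropped.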

Value

Data frame with the scrubbed cases removed, plus attributes that record the number of cases left after each step, for use in the report: oriNum, removeDupesNum, ageNum, timeNum, geoNum, and finNum
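Assuming the counts are stored as ordinary R attributes under the names listed above, they could be read back with `attr()`. The sketch below uses a stand-in data frame with made-up counts set by hand, since it does not call the package.

```r
# Stand-in for a scrubbed data frame; the counts below are made up.
scrubbed <- data.frame(id = c("a", "d"), stringsAsFactors = FALSE)
attr(scrubbed, "oriNum") <- 5               # cases before scrubbing
attr(scrubbed, "finNum") <- nrow(scrubbed)  # cases after all steps

# Read the counts back, for example to report how many cases were removed.
removed <- attr(scrubbed, "oriNum") - attr(scrubbed, "finNum")
print(removed)
```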

Author(s)

Gabriel N. Camargo-Toledo gcamargo@sensata.io

Examples

library(magrittr)  # provides the %>% pipe
bogData1 <- bogData1 %>% scrubRows(removeDupes = FALSE, ageVar = "EVCS2", ageVal = "Menos de 18 años", geoloc = TRUE)

SensataUX/sensataDataProg documentation built on April 18, 2023, 3:48 p.m.