filter_data: Prepare input data


View source: R/input_processing.R

Description

This function prepares the raw input data for model fitting.

Usage

filter_data(data, detection_threshold = 20, censortime = 365,
  censor_value = 10, decline_buffer = 500, initial_buffer = 3,
  n_min_single = 3, threshold_buffer = 10, nsuppression = 1)

Arguments

data

raw data set. Must be a data frame with the following columns: 'id' - stating the unique identifier for each subject; 'vl' - numeric vector with the viral load measurements for each subject; 'time' - numeric vector of the times at which each measurement was taken.

detection_threshold

numeric value indicating the detection threshold of the assay used to measure viral load. Measurements below this value will be assumed to represent undetectable viral load levels. Default value is 20.

censortime

numeric value indicating the maximum time point to include in the analysis. Subjects who do not suppress viral load below the detection threshold within this time will be discarded. Units are assumed to be the same as the 'time' column. Default value is 365.

censor_value

positive numeric value indicating the value assigned to measurements below the detection threshold. Must be less than or equal to the detection threshold. Default value is 10.

decline_buffer

numeric value indicating the buffer range within which a subject's viral load may deviate from a decreasing sequence and still be included in the analysis (see Details). Default value is 500.

initial_buffer

numeric (integer) value indicating the maximum number of initial observations from which the beginning of each trajectory will be chosen. Default value is 3.

n_min_single

numeric value indicating the minimum number of data points a subject must have to be included in the analysis. Default value is 3. It is highly advised not to go below this value.

threshold_buffer

numeric value indicating the range above the detection threshold within which measurements could skew the model fit. Subjects whose last two data points both fall within this range will have the last point removed. Default value is 10.

nsuppression

numeric value (1 or 2) indicating whether suppression is defined as one observation below the detection threshold or as two sustained observations below it. Default value is 1.
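
As an illustration of how detection_threshold and nsuppression interact, the following base R sketch builds a small data frame in the required format ('id', 'vl', 'time') and checks each subject for suppression. The helper is_suppressed() and the toy values are hypothetical and are not part of ushr. Reading "two sustained observations" as two consecutive measurements below the threshold is an assumption made for this sketch.

```r
# Toy data in the required format: one row per measurement.
toy_data <- data.frame(
  id   = rep(c("A", "B", "C"), each = 4),
  time = rep(c(0, 30, 90, 180), times = 3),
  vl   = c(1e5, 1e3, 15, 10,    # A: two consecutive values below 20
           1e5, 1e3, 15, 40,    # B: a single value below 20
           1e5, 1e4, 1e3, 100)  # C: never below 20
)

# Hypothetical helper (not a ushr function): TRUE if `vl`, assumed to be
# ordered by time, contains a run of at least `nsuppression` consecutive
# values below `detection_threshold`.
is_suppressed <- function(vl, detection_threshold = 20, nsuppression = 1) {
  runs <- rle(vl < detection_threshold)
  any(runs$values & runs$lengths >= nsuppression)
}

# Apply the check per subject under both definitions of suppression.
one_obs <- tapply(toy_data$vl, toy_data$id, is_suppressed, nsuppression = 1)
two_obs <- tapply(toy_data$vl, toy_data$id, is_suppressed, nsuppression = 2)
```

Under these assumptions, subjects A and B count as suppressed when nsuppression = 1, while only subject A does when nsuppression = 2.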

Details

Steps include:

1. Setting values below the detection threshold to censor_value (by default half the detection threshold, following standard practice).
2. Filtering out subjects who do not suppress viral load below the detection threshold by censortime.
3. Filtering out subjects who do not have a decreasing sequence of viral load (within the buffer range set by decline_buffer).
4. Filtering out subjects who have fewer than n_min_single data points, which is too few for model fitting.
5. Removing the last data point of subjects whose last two points are very close to the detection threshold (within threshold_buffer). This prevents skewing of the model fit.

Further details can be found in the vignette.
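
To make the order of operations concrete, here is a minimal base R sketch of steps 1, 2, 4 and 5. It is not the ushr implementation: sketch_filter() is a hypothetical name, step 3 is omitted because its exact rule is not spelled out here, suppression is taken to mean a single observation below the threshold, and "close to the detection threshold" is assumed to mean at or above the threshold and at most threshold_buffer above it.

```r
# Hypothetical sketch of steps 1, 2, 4 and 5 (not the ushr source code).
sketch_filter <- function(data, detection_threshold = 20, censortime = 365,
                          censor_value = 10, n_min_single = 3,
                          threshold_buffer = 10) {
  # Step 1: flag measurements below the threshold, then censor them.
  data$below <- data$vl < detection_threshold
  data$vl[data$below] <- censor_value

  kept <- lapply(split(data, data$id), function(subject) {
    # Keep only time points up to censortime, in chronological order.
    subject <- subject[order(subject$time), ]
    subject <- subject[subject$time <= censortime, ]

    # Step 2: drop subjects with no suppressed measurement by censortime.
    if (!any(subject$below)) return(NULL)

    # Step 4: drop subjects with too few data points.
    n <- nrow(subject)
    if (n < n_min_single) return(NULL)

    # Step 5: if the last two points are both detectable but within
    # threshold_buffer of the threshold, remove the last point.
    if (n >= 2) {
      last_two <- c(n - 1, n)
      near <- !subject$below[last_two] &
        subject$vl[last_two] <= detection_threshold + threshold_buffer
      if (all(near)) subject <- subject[-n, ]
    }
    subject
  })

  # Combine the retained subjects and return only the documented columns.
  kept <- Filter(Negate(is.null), kept)
  if (length(kept) == 0) return(data[0, c("id", "vl", "time")])
  result <- do.call(rbind, kept)
  rownames(result) <- NULL
  result[, c("id", "vl", "time")]
}

toy <- data.frame(
  id   = c(rep("A", 4), rep("B", 4), rep("C", 2)),
  time = c(0, 30, 90, 180, 0, 30, 90, 180, 0, 30),
  vl   = c(1e5, 1e3, 15, 10,    # A: suppresses, enough points
           1e5, 1e4, 1e3, 100,  # B: never suppresses
           1e4, 5)              # C: suppresses, but only two points
)
filtered <- sketch_filter(toy)
```

In this toy example only subject A is retained, with its two measurements below the threshold replaced by censor_value.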

Value

data frame of individuals whose viral load trajectories meet the criteria for model fitting. Includes columns for 'id', 'vl', and 'time'.

Examples

set.seed(1234567)

simulated_data <- simulate_data(nsubjects = 20)

filter_data(simulated_data)

ushr documentation built on April 22, 2020, 1:05 a.m.