dlabel: Distance label

View source: R/dlabel.R

dlabel    R Documentation

Distance label

Description

This function adds a label column to a data frame based on the distance between records, as defined by the distance column and calculated by dfilter. The label is a logical value stating whether the distance between records lies within the range given by min.dist and max.dist.

bc_contamination calls dlabel with default parameters for the labelling of possibly contaminated blood cultures.
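
The notion of a distance lying "within the specified range" can be illustrated in base R. This is a minimal sketch, not the implementation used by dlabel or dfilter: it assumes that the distance between two records is the absolute time difference between their timestamps and that both bounds are inclusive. The helper name within_range is hypothetical, and its units argument follows difftime ("mins", "days", ...), which need not match the values accepted by temporal.unit.

```r
# Hypothetical helper, not part of the package: TRUE if the absolute time
# difference between two timestamps lies in [min.dist, max.dist].
within_range <- function(t1, t2, min.dist, max.dist, units = "days") {
  d <- abs(as.numeric(difftime(t2, t1, units = units)))
  d >= min.dist & d <= max.dist
}

t1 <- as.POSIXct("2024-01-01 08:00:00", tz = "UTC")
t2 <- as.POSIXct("2024-01-04 08:00:00", tz = "UTC")  # 3 days after t1
t3 <- as.POSIXct("2024-03-01 08:00:00", tz = "UTC")  # 60 days after t1

within_range(t1, t2, min.dist = 2, max.dist = 40)  # TRUE
within_range(t1, t3, min.dist = 2, max.dist = 40)  # FALSE
```

How dfilter groups records by id and category before measuring distances is described in its own help page and is not reproduced here.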

Usage

dlabel(
  df,
  id,
  category,
  distance,
  min.dist,
  max.dist,
  temporal.unit,
  label_name = "dlabel",
  invert = FALSE,
  df_filter = NULL
)

bc_contamination(
  ...,
  min.dist = 5,
  max.dist = 7 * 60 * 24,
  temporal.unit = "minutes",
  label_name = "contamination",
  invert = TRUE
)
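
The defaults of bc_contamination are expressed in minutes. The short calculation below only spells out the arithmetic of the max.dist default shown above; the variable name max_dist_minutes is illustrative.

```r
# Default upper bound of bc_contamination, expressed in minutes
max_dist_minutes <- 7 * 60 * 24
max_dist_minutes              # 10080

# The same bound converted to days (60 minutes * 24 hours per day)
max_dist_minutes / (60 * 24)  # 7
```

With these defaults the range runs from 5 minutes (min.dist) to 7 days (max.dist).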

Arguments

df

A data frame.

id

A character string specifying the column name of the id.

category

A character string specifying the column name of the category.

distance

A character string specifying the column name of the distance column; see dfilter for details.

min.dist

A numeric value specifying the minimum distance between records; see dfilter for details.

max.dist

A numeric value specifying the maximum distance between records; see dfilter for details.

temporal.unit

A character string specifying the temporal unit of the distance; see dfilter for details.

label_name

A character string specifying the name of the label column.

invert

A logical value specifying whether to invert the label.

df_filter

A character string specifying the filter expression to be applied to the data frame before the distance label is calculated.

...

Arguments to be passed to dlabel.
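
The df_filter and invert arguments can be pictured with a small base R sketch. It is an assumption about their general meaning, not the package's code: it shows one way a filter expression supplied as a character string could be evaluated against the columns of a data frame, and what inverting a logical label amounts to. The objects df, keep and label are illustrative only.

```r
# Toy data; column names are illustrative only
df <- data.frame(id = c(1, 1, 2, 2),
                 category = c("a", "b", "a", "a"))

# A filter expression given as a string, evaluated against the columns of df
df_filter <- "category == 'a'"
keep <- eval(parse(text = df_filter), envir = df)
keep    # TRUE FALSE TRUE TRUE

# A stand-in logical label and its inverted form
label <- c(TRUE, FALSE, FALSE, TRUE)
!label  # FALSE TRUE TRUE FALSE
```

In bc_contamination, invert = TRUE is the default, so the contamination label is FALSE for records whose distance satisfies the range criterion.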

Value

A data frame with the distance label added.

Examples

# create test data
set.seed(123)
dl.test <- data.frame(id = sample(1:10, 30, replace = TRUE), 
           category = sample(letters[1:4], 30, replace = TRUE), 
           timestamp = as.POSIXct(runif(30, 1704063600, 1711922400), 
                                        origin = "1970-01-01"))

# test: dlabel will reveal three id-category combinations with temporal 
# distances within the range of 2 to 40 days pertaining to category 'a'
test <- dlabel(dl.test, id = "id", category = "category", 
        distance = "timestamp", 
        min.dist = 2, max.dist = 40,
        temporal.unit = "days",
        label_name = "within_range",
        df_filter = "category == 'a'")

set.seed(123)

bugs <- data.frame(species = c("S. epidermidis", "C. acnes", "S. aureus", "E. coli"),
                  category = c("skin flora", "skin flora", "pathogen", "pathogen"))

blood_cultures <- data.frame(lab_no = 1:50,
                       patient = sample(1:10, 50, replace = TRUE),
                       species = sample(bugs$species, 50, replace = TRUE),
                       timestamp = as.POSIXct(runif(50, 1704063600, 1711000000), 
                                               origin = "1970-01-01"))

# %>% and left_join() are provided by dplyr
library(dplyr)

blood_cultures <- blood_cultures %>% left_join(bugs, by = "species")

bc_conta <- bc_contamination(blood_cultures, 
                            id = "patient", 
                            category = "species", 
                            distance = "timestamp",
                            df_filter = "category == 'skin flora'")

# Patient 9 has 5 cultures that, despite yielding skin flora, could
# correspond to infection (the contamination field is FALSE), as these
# cultures satisfy the temporal distance criterion given by min.dist
# and max.dist.

# check:

# bc_conta %>% filter(category=="skin flora" & !contamination)

# The remaining samples yielding skin flora likely represent 
# contamination, as their temporal occurrence is outside the range
# given by min.dist and max.dist.


joheli/kungfu documentation built on March 25, 2024, 10:10 a.m.