findCondition: Find conditions in registry data

findCondition    R Documentation

Find conditions in registry data

Description

This function is useful for the very common task of selecting cases based on a code that completely or partially matches the values of one or more character variables.

The function is designed to search a group of character variables for multiple conditions defined in a list of named character vectors. The function produces a data.table with the selected variables for the cases where a match is found. In addition, a second list of named character vectors can define exclusions from the search. This facility is useful if, for example, all cancers except non-melanoma skin cancer are sought. In that case the inclusion list can hold all cancers and the exclusion list just the non-melanoma skin cancer codes.
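The inclusion and exclusion logic described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the helper `starts_with_any` and the codes are made up for the example.

```r
# Illustrative sketch only (base R); helper name and codes are hypothetical.
diag    <- c("DC509", "DC443", "DI219", "DC180")  # made-up diagnosis codes
include <- list(cancer = c("DC"))    # inclusion: codes starting with "DC"
exclude <- list(cancer = c("DC44"))  # exclusion: codes starting with "DC44"

# TRUE when a code starts with at least one of the given prefixes
starts_with_any <- function(x, prefixes) {
  Reduce(`|`, lapply(prefixes, function(p) startsWith(x, p)),
         rep(FALSE, length(x)))
}

# Keep codes that match the inclusion list but not the exclusion list
hit <- starts_with_any(diag, include$cancer) &
  !starts_with_any(diag, exclude$cancer)
diag[hit]  # "DC509" "DC180"
```

A code is kept only when it matches an inclusion string and no exclusion string for the same condition.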

See the examples for common uses of the output.

Usage

findCondition(data, vars, keep, conditions, exclusions=NULL, 
match="contain",condition.name="X")

Arguments

data

Data in which to search for conditions

vars

Name(s) of variable(s) in which to search.

keep

A character vector naming the columns of data to keep in the output.

conditions

A named list of (vectors of) search strings. See examples.
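As a rough illustration of the expected structure, the list names act as condition labels and each element holds the search strings for that condition. The condition names and codes below are invented, and the matching shown is a base R stand-in rather than a call to findCondition.

```r
# Hypothetical search definition: list names label the conditions,
# list elements hold the search strings for each condition.
conditions <- list(
  MI     = c("DI21", "DI22"),
  stroke = c("DI61", "DI63", "DI64")
)

codes <- c("DI219", "DI639", "DJ189")  # made-up diagnosis codes

# For each condition, flag codes that start with any of its search strings.
# The result is a logical matrix: one row per code, one column per condition.
hits <- sapply(conditions, function(s) {
  vapply(codes, function(code) any(startsWith(code, s)), logical(1))
})
hits
```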

exclusions

A named list of (vectors of) search strings to exclude from the output.

match

A character string that tells how the search strings are matched: "exact" = exactly equals the search string, "contain" = contains the search string, "start" = starts with the search string, "end" = ends with the search string.
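The meaning of the four matching modes can be illustrated with base R string functions. This is a minimal sketch under the assumption of plain, case-sensitive matching; the helper `match_codes` is hypothetical, is not part of heaven, and may differ from what findCondition does internally. Mode names follow the default shown in Usage.

```r
# Hypothetical helper illustrating the four matching modes with base R.
match_codes <- function(x, pattern,
                        match = c("contain", "exact", "start", "end")) {
  match <- match.arg(match)
  switch(match,
    exact   = x == pattern,                     # whole string equals pattern
    contain = grepl(pattern, x, fixed = TRUE),  # pattern occurs anywhere
    start   = startsWith(x, pattern),           # pattern is a prefix
    end     = endsWith(x, pattern)              # pattern is a suffix
  )
}

codes <- c("DT14", "DI21", "XDT", "DT")
match_codes(codes, "DT", "exact")    # FALSE FALSE FALSE  TRUE
match_codes(codes, "DT", "contain")  #  TRUE FALSE  TRUE  TRUE
match_codes(codes, "DT", "start")    #  TRUE FALSE FALSE  TRUE
match_codes(codes, "DT", "end")      # FALSE FALSE  TRUE  TRUE
```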

condition.name

Name of the output variable whose values identify the matched condition. The values of this variable are the names from the parameter "conditions".

Value

A data.table that includes the variables named in "keep" and a variable, named by condition.name, which identifies the condition that was matched.

Author(s)

Christian Torp-Pedersen <ctp@heart.dk>, Thomas A. Gerds <tag@biostat.ku.dk>

Examples

library(heaven)
library(data.table)

# find all diagnoses that start with "DT" 
set.seed(800); adm <- simAdmissionData(800)
x <- findCondition(adm,vars=c("diag"),
        keep=c("pnr","inddto","uddto"),
        conditions=list(THIS=c("DT")),
        match="start",condition.name="THAT")
x
# restrict to first by pnr
x[x[,.I[1],by=list(pnr)]$V1]
# restrict to last by pnr
x[x[,.I[.N],by=list(pnr)]$V1]
 
opr <- data.table(
  pnr=1:100,opr=paste0(rep(c('A','B'),50),seq(0,100,10)),
  oprtil=paste0(rep(c('A','C'),50),seq(0,100,10)),
  odto=101:200
)
search <- list(Cond1=c('A1','A2'),Cond2=c('B10','B40','B5'),
Cond3=c('A1','C20','B2'))

excl <- list(Cond2='B100')

out <- findCondition(opr,vars=c("opr","oprtil"),
        keep=c("pnr","odto"),
        conditions=search, exclusions=excl,
        match="start",condition.name="cond")
### How to use the result:
# Find the first occurrence of each condition and then use "dcast" to create
# a data.table with one column per condition.
test <- out[,list(min=min(odto)),by=c("pnr","cond")]
# reshape to wide format: one row per pnr, one column per condition
test2 <- dcast(pnr~cond,data=test,value.var="min")
test2 # A data.table with the first date of each condition for each pnr, but only
      # for pnr with at least one condition
# Define a condition as present when it occurs on or before a certain index date
dates <- data.table(pnr=1:100,basedate=sample(0:200,size=100,replace=TRUE))
test3 <- merge(out,dates,by="pnr")
test3[,before:=as.numeric(odto<=basedate)] # 1 when the condition is fulfilled
test3 <- test3[,list(before=max(before)),by=c("pnr","cond")]
test4 <- dcast(pnr~cond,value.var="before",data=test3)
test4[is.na(test4)] <- 0 # Converts NAs to zero
test4[]


tagteam/heaven documentation built on Oct. 24, 2024, 7:40 p.m.