getFeatureVectors: Create feature vectors that can be used for logit modeling


Description

Create feature vectors that can be used for logit modeling

Usage

getFeatureVectors(patternDays, events)

Arguments

patternDays

List of patterns found by cSPADE. Each pattern holds a list of patientIDs (see getPatIDs), and each patientID holds a list of the eventIDs at which the pattern was observed.

events

Data frame in which each row is a single event (as used for input to cSPADE). Columns hold event details plus patient demographics, tumor laterality, survival labels, and the MGMT biomarker.
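
The nested shape described for patternDays can be pictured with a small mock object. This is only a sketch under assumptions: the pattern strings, patient IDs, and event IDs below are invented, and the real object returned by findPatternDays may differ in detail (for example, in how eventIDs are stored).

```r
# Hypothetical illustration of the nested shape described for patternDays.
# Pattern strings, patient IDs, and event IDs are made up for this sketch.
pattern_days_demo <- list(
  "<{drugA},{mri}>" = list(
    "101" = c(3, 7),     # patient 101: pattern observed at eventIDs 3 and 7
    "102" = c(5)         # patient 102: pattern observed at eventID 5
  ),
  "<{mri}>" = list(
    "101" = c(2),
    "103" = c(1, 4, 9)
  )
)

# Patients in whom the first pattern was observed
first_pattern_patients <- names(pattern_days_demo[[1]])

# Number of observations of each pattern, per patient
obs_counts <- lapply(pattern_days_demo, lengths)
```

Indexing by pattern first and patient second mirrors the order in which the description nests the lists.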

Value

Data frame in which each row is a clinical visit and each column is a feature of that visit usable for logit modeling: binary temporal events (cSPADE patterns), demographic data, biomarkers, and tumor laterality.
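
As a rough illustration of how a data frame of this shape might feed a logit model, the sketch below builds a mock table and fits it with base R's glm. All column names (pattern_1, laterality, mgmt, survived, and so on) are assumptions made for the example, not the names produced by getFeatureVectors, and the data are random.

```r
# Hypothetical stand-in for the data frame returned by getFeatureVectors():
# one row per clinical visit. All column names are illustrative assumptions.
set.seed(1)
n <- 40
feat_vecs_demo <- data.frame(
  pattern_1  = rbinom(n, 1, 0.5),   # 1 if a cSPADE pattern was observed at the visit
  pattern_2  = rbinom(n, 1, 0.3),
  age        = round(runif(n, 30, 80)),
  laterality = factor(sample(c("left", "right"), n, replace = TRUE)),
  mgmt       = rbinom(n, 1, 0.4),   # biomarker status (assumed binary here)
  survived   = rbinom(n, 1, 0.5)    # assumed binary survival label
)

# Logistic regression (logit link) on the visit-level features
fit <- glm(survived ~ pattern_1 + pattern_2 + age + laterality + mgmt,
           data = feat_vecs_demo, family = binomial(link = "logit"))

# Predicted probability for each visit
p_hat <- predict(fit, type = "response")
```

With real output, the formula would be built from whichever pattern and clinical columns the returned data frame actually contains.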

See Also

vectorizePatterns

Examples

library(gbmSpm)
data("fake_data")
outputDir <- '~/test'

# save tumor location and laterality strings before event cleaning
fake_tumorInfo <- fake_data$events 
fake_demo <- fake_data$demo

# clean data
fake_data$events <- cleanData(fake_data$events, tType = 'rate')
cat('...',nrow(fake_data$events), " events left for SPM after cleaning", '\n')

# collect patient info for each event
fake_data <- merge(fake_data$events, fake_data$person, by = 'iois', all.x = TRUE)

# prep demographics for each event, since age changes over time
# get survival labels, which also change from event to event
fake_data <- prepDemographics(fake_data, fake_demo) 
fake_data <- prepSurvivalLabels(fake_data) 

# get first tumor location
fake_data <- getTumorLocation(fake_data, fake_tumorInfo) 

# sequential pattern mining (SPM) with cSPADE
pSPM <- getSeqPatterns(event = fake_data,
                       transFilename = file.path(outputDir, 'example_transactions.txt'),
                       createT = TRUE,
                       support = 0.4,
                       maxgap = 60,
                       maxlen = 2,
                       maxsize = 2)
pSPM$patterns <- as(pSPM$freqseq, "data.frame")
pSPM$patterns$sequence <- as.character(pSPM$patterns$sequence)

# days when each pattern occurs
patternDays <- findPatternDays(pSPM$patterns, pSPM$data, maxgap=60)

# feature vectors to supply to the logit models
feat_vecs <- getFeatureVectors(patternDays, events=pSPM$data)

novasmedley/gbmSpm documentation built on May 17, 2019, 10:39 a.m.