In antononcube/ERTMon-R: Event Records Transformations Monad

library(stringi)
library(stringr)
library(RcppRoll)
library(devtools)
library(SparseMatrixRecommender)
library(magrittr)  # provides %>%, which is used in the summaries below
#library(ERTMon)
devtools::load_all()

Data directory

Here we specify the directory with data and transformations specifications:

directoryName <- if( grepl("testthat", getwd()) ) { 
  file.path( getwd(), "..", "..", "data", "FakeData") 
} else { 
  file.path( getwd(), "..", "data", "FakeData")
}
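Before running the ingestion steps it can help to confirm that the directory holds the three CSV files this walkthrough reads. The following is a minimal sketch of such a check. The helper findMissingFiles is a hypothetical name, not part of ERTMon, and a temporary directory stands in for the real data directory.

```r
# Hypothetical helper (not part of ERTMon): return the expected files
# that are absent from a directory.
findMissingFiles <- function(dirName, fileNames) {
  fileNames[ !file.exists( file.path(dirName, fileNames) ) ]
}

# File names read later in this walkthrough.
expectedFiles <- c("computationSpecification.csv",
                   "eventRecords.csv",
                   "entityAttributes.csv")

# Demonstration with a temporary directory that holds only one of the files.
demoDir <- file.path(tempdir(), "FakeDataDemo")
dir.create(demoDir, showWarnings = FALSE)
invisible(file.create(file.path(demoDir, "eventRecords.csv")))

missingFiles <- findMissingFiles(demoDir, expectedFiles)
print(missingFiles)
```

With the real data, the same call would be findMissingFiles(directoryName, expectedFiles), and an empty result means all three files are present.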

Ingest computations specification data

compSpecObj <- new( "ComputationSpecification" )
compSpecObj <- readSpec( compSpecObj,  file.path( directoryName, "computationSpecification.csv" ) )  
compSpecObj <- ingestSpec( compSpecObj )

The ingestion process below is done with this data transformation specification:

compSpecObj@parameters

Ingest data

diObj <- new( "DataIngester")

diObj <- readData( diObj, 
                   file.path( directoryName, "eventRecords.csv" ),
                   file.path( directoryName, "entityAttributes.csv" ) )

diObj <- ingestData( diObj, "Label" )

dwObj <- diObj@dataObj

dwObj@labels

ERTMon does not require the fields "diedLabel" and "survivedLabel" of the data ingester object to hold the correct values. Setting them is only a convenience, a kind of memo. (The label values are set through the parameters CSV table in the class ComputationSpecification.)

#dwObj@survivedLabel <- compSpecObj@parameters[ compSpecObj@parameters$Variable == "Label", "Critical.label"]
## Note that the two assignments to diedLabel below are alternative approaches.
#dwObj@diedLabel <- paste0("Non.", dwObj@survivedLabel)
#dwObj@diedLabel <- setdiff( dwObj@labels, dwObj@survivedLabel)

## If this is really needed it can be in the validation function for DataWrapper.
#assertthat::assert_that( mean( c(dwObj@diedLabel, dwObj@survivedLabel) %in% dwObj@labels ) == 1 )

Split data

Obtaining splitting indices (params is assumed to be the parameter list of the R Markdown document this code comes from):

set.seed(1456)
entityIDs <- unique(dwObj@eventRecords$EntityID)
trainingEntityIDs <- sample( entityIDs, floor( params$trainingDataFraction * length(entityIDs) ) )
testEntityIDs <- setdiff( entityIDs, trainingEntityIDs )

Splitting of data into training and test parts:

trainingData <- dwObj@eventRecords[ dwObj@eventRecords$EntityID %in% trainingEntityIDs, ]
testData <- dwObj@eventRecords[ dwObj@eventRecords$EntityID %in% testEntityIDs, ]

Remark: PCCPF has a data splitter object, but here we use a more direct approach in order to simulate real-life scenarios.
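The split above is done per entity rather than per record, so all records of one entity end up on the same side. A self-contained sketch on made-up data illustrates this property. The toy data frame and the fraction 0.75 are assumptions for illustration only.

```r
# Made-up event records: 8 entities with 3 records each.
eventRecords <- data.frame(
  EntityID = rep(1:8, each = 3),
  Value    = seq_len(24)
)
trainingDataFraction <- 0.75  # assumed value; the walkthrough reads it from params

set.seed(1456)
entityIDs <- unique(eventRecords$EntityID)
trainingEntityIDs <- sample(entityIDs, floor(trainingDataFraction * length(entityIDs)))
testEntityIDs <- setdiff(entityIDs, trainingEntityIDs)

trainingData <- eventRecords[eventRecords$EntityID %in% trainingEntityIDs, ]
testData <- eventRecords[eventRecords$EntityID %in% testEntityIDs, ]

# No entity contributes records to both parts, and no record is lost.
noOverlap <- length(intersect(trainingData$EntityID, testData$EntityID)) == 0
allKept <- nrow(trainingData) + nrow(testData) == nrow(eventRecords)
print(c(noOverlap = noOverlap, allKept = allKept))
```

Sampling record rows directly would instead let records of the same entity leak into both parts.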

Transform training data

Make a new data transformer object:

if( params$categoricalMatricesQ ) {
  dtObj <- new( "DataTransformerCatMatrices" )
} else {
  dtObj <- new( "DataTransformer" )
}

Note that the data has not been "seen" by the data transformation object:

compSpecObj@parameters

dtObj <- transformData( object = dtObj, 
                        compSpec = compSpecObj, 
                        eventRecordsData = trainingData, 
                        entityAttributes = dwObj@entityAttributes[ (dwObj@entityAttributes$EntityID %in% trainingEntityIDs), ], 
                        outlierIdentifierParameteres = SPLUSQuartileIdentifierParameters) # also HampelIdentifierParameters or QuartileIdentifierParameters

transformedTrainingDataDF <- dtObj@transformedData

summary( as.data.frame(unclass(dtObj@transformedData), stringsAsFactors = T),  maxsum=20 )

summary( as.data.frame(unclass(dtObj@transformedData %>% dplyr::filter( MatrixName == "HR.OutFrc" )), stringsAsFactors = T) )
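The transformData call above takes an outlier identifier, with Hampel and quartile variants named as alternatives. As a rough illustration of what such identifiers do, the sketch below assumes that an identifier maps a numeric vector to a pair of lower and upper thresholds. The two functions are hypothetical, use textbook definitions, and are not necessarily what SPLUSQuartileIdentifierParameters, HampelIdentifierParameters, or QuartileIdentifierParameters compute.

```r
# Textbook Hampel-style thresholds: median +/- k * MAD.
# stats::mad() applies the 1.4826 consistency constant by default.
hampelThresholds <- function(x, k = 3) {
  center <- median(x)
  spread <- mad(x)
  c(lower = center - k * spread, upper = center + k * spread)
}

# Textbook quartile (Tukey) fences: Q1 - k * IQR and Q3 + k * IQR.
quartileFences <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

# Hypothetical helper: the values that fall outside a pair of thresholds.
outliersOf <- function(x, thresholds) {
  x[x < thresholds[["lower"]] | x > thresholds[["upper"]]]
}

values <- c(60, 62, 64, 66, 68, 70, 180)  # made-up values with one extreme point
hampelOutliers <- outliersOf(values, hampelThresholds(values))
quartileOutliers <- outliersOf(values, quartileFences(values))
print(hampelOutliers)
print(quartileOutliers)
```

Both variants rely on robust statistics (median, MAD, quartiles), so a single extreme value does not move the thresholds much.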

Matrix version of the transformed data:

transformedTrainingDataMat <- dtObj@dataMat

dtObj@groupAggregatedValues

What data do we have?

The ingestion function call above encapsulates a lot of steps. Here we show summaries of the entity data and medical data that are going to be used in the classification.

Remark: These data objects contain transformed versions of the data that is placed in the specified directory.

Patient data

summary(as.data.frame(unclass(dwObj@entityAttributes), stringsAsFactors = T))
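The summary calls in this walkthrough wrap the data in as.data.frame(unclass(...), stringsAsFactors = T). A small sketch on made-up data shows why: with character columns converted to factors, summary() reports counts per value. The column names and values below are assumptions for illustration.

```r
# Made-up entity attributes with character columns.
entityAttributes <- data.frame(
  EntityID  = c("e1", "e2", "e3", "e4"),
  Attribute = c("Sex", "Sex", "Sex", "Sex"),
  Value     = c("F", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# Character columns: summary() reports only length, class and mode.
print(summary(entityAttributes))

# unclass() turns the data frame into a plain list, and as.data.frame()
# rebuilds it with character columns converted to factors, so summary()
# reports the count of each level instead.
asFactors <- as.data.frame(unclass(entityAttributes), stringsAsFactors = TRUE)
print(summary(asFactors))
```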

Event records

summary(as.data.frame(unclass(dwObj@eventRecords), stringsAsFactors = T))

Transformed event records data (training)

dim(transformedTrainingDataDF)

summary(as.data.frame(unclass(transformedTrainingDataDF), stringsAsFactors = T), maxsum=12)

The corresponding data matrix:

dim(transformedTrainingDataMat)

Transformed event records data (test)

The test data should not be "known" at this point.

if( exists("transformedTestDataDF") ) { rm("transformedTestDataDF") }  # avoids a warning when the object does not exist
exists("transformedTestDataDF")

Transform test data

Here we repeat the transformations over the test data using the aggregation values from the training data transformation. (This is specified with the parameter testDataRun.)

dtObj <- transformData( dtObj, 
                        compSpecObj, 
                        testData, 
                        dwObj@entityAttributes[ dwObj@entityAttributes$EntityID %in% testEntityIDs, ], 
                        testDataRun = TRUE )
transformedTestDataDF <- dtObj@transformedData 
transformedTestDataMat <- dtObj@dataMat
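The idea behind testDataRun can be illustrated without ERTMon: aggregation values are computed on the training data only and then applied unchanged to the test data. The sketch below uses per-variable training means as a stand-in. What ERTMon actually stores in groupAggregatedValues may differ, and the data and the helper normalizeWith are made up for illustration.

```r
# Made-up long-form records: one row per (entity, variable, value).
trainingRecords <- data.frame(
  EntityID = c(1, 1, 2, 2),
  Variable = c("HR", "RR", "HR", "RR"),
  Value    = c(60, 12, 80, 20),
  stringsAsFactors = FALSE
)
testRecords <- data.frame(
  EntityID = c(3, 3),
  Variable = c("HR", "RR"),
  Value    = c(77, 8),
  stringsAsFactors = FALSE
)

# Aggregation values come from the training data only.
trainingMeans <- tapply(trainingRecords$Value, trainingRecords$Variable, mean)

# Hypothetical helper: divide each value by the stored mean of its variable.
normalizeWith <- function(records, aggregates) {
  records$NormalizedValue <- records$Value / as.numeric(aggregates[as.character(records$Variable)])
  records
}

normalizedTraining <- normalizeWith(trainingRecords, trainingMeans)
# The test data reuses trainingMeans; its own means are never computed.
normalizedTest <- normalizeWith(testRecords, trainingMeans)
print(normalizedTest)
```

Reusing the training aggregates keeps information about the test set out of the transformation, which matches the aim of simulating real-life scenarios.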

Sparse matrix object (for explanations and proofs)

Make a Sparse Matrix Recommender (SMR) object.

smats <- dtObj@sparseMatrices
# Prefix the column names of each matrix with the name of that matrix.
smats <- setNames( 
  purrr::map( names(smats), function(x) { 
    m <- smats[[x]]
    colnames(m) <- paste( x, colnames(m) )
    m 
  }), 
  names(smats) )

purrr::map_df( smats, function(x) data.frame( NRow = nrow(x), NCol = ncol(x) ), .id = "MatrixName" )

names(smats)
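The renaming step above prefixes every column name with the name of its matrix, presumably so that columns from different matrices stay distinguishable once the matrices are combined. The sketch below reproduces the step with base R on made-up dense matrices. The matrix names and column names are assumptions, and lapply stands in for purrr::map.

```r
# Made-up stand-ins for dtObj@sparseMatrices; dense matrices keep the
# sketch free of package dependencies.
mats <- list(
  HR = matrix(1:4, nrow = 2, dimnames = list(c("e1", "e2"), c("Mean", "Max"))),
  RR = matrix(5:8, nrow = 2, dimnames = list(c("e1", "e2"), c("Mean", "Max")))
)

# Prefix each column name with the name of its matrix.
prefixed <- setNames(
  lapply(names(mats), function(nm) {
    m <- mats[[nm]]
    colnames(m) <- paste(nm, colnames(m))
    m
  }),
  names(mats)
)

# Column-binding the renamed matrices gives unique column names;
# without the prefixes "Mean" and "Max" would each appear twice.
combined <- do.call(cbind, prefixed)
print(colnames(combined))
```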

clSMRFreq <- SMRCreateFromMatrices( matrices = smats[1:8], tagTypes = NULL, itemColumnName = "EntityID" )

antononcube/ERTMon-R documentation built on Oct. 14, 2021, 2:27 p.m.
