CoRF: improved high-dimensional prediction with the RF by the...


View source: R/CoRF.R

Description

Fits a random forest (RF) guided by co-data (CoRF).

Usage

CoRF(Y, X, CoData, CoDataModelText = NULL, CoDataRelation = NULL,
  ScreenCoData = FALSE, ScreenPvalThreshold = NULL, TuneGamma = FALSE,
  GammaSeq = NULL, GammaNVar = NULL, BaseRF = NULL,
  ForestsInOutput = TRUE, setseed = 1, importance = c("none"),
  nodesize = 2, ntree = 2000, ...)

Arguments

Y

A response variable.

X

The primary set of variables (n rows and p columns).

CoData

A data.frame containing the co-data, with p rows and one column per set of co-data.

CoDataModelText

Optionally a text string containing the specification of the co-data model. Not needed if CoDataRelation is specified.

CoDataRelation

The relationship to be fitted in the co-data model, e.g. linear, (monotone) increasing, (monotone) decreasing. Alternatively, choose one of the scam shape-constrained smooth constructs (mpd, mpi, mdcv, mdcx, micv, micx, cv, cx, see ??shape.constrained.smooth.terms).

ScreenCoData

Boolean that indicates whether or not the co-data selection step should be conducted. Default is FALSE.

ScreenPvalThreshold

The threshold value used in the co-data selection step (when ScreenCoData is TRUE). Defaults to 0.05/CD, where CD is the number of co-data sets.

TuneGamma

If TRUE, this sets GammaSeq to c(1/3, 2/3, 0.9, 1, 1.1, 1.2, 1.3). CoRF is fitted for each value of gamma and returns results for all refitted forests. Default is FALSE, in which case GammaSeq = 1.

GammaSeq

Specifies the sequence of gamma values. Default is to only use a gamma of 1.

BaseRF

Optionally use an earlier fitted base RF (i.e. uniform sampling probabilities).

ForestsInOutput

Whether or not to save the various rfsrc objects. These objects are needed to make further predictions. Defaults to TRUE. Optionally set to FALSE if the CoRF objects become too big and are not needed.

setseed

Seed used to fit the RF.

importance

Default set to "none". Not needed to fit CoRF and computationally very expensive.

nodesize

Sets the minimal node size, set at recommended default for CoRF (2).

ntree

The number of trees used in CoRF, default set to 2000. Convergence improves with a larger number of trees. If set too low, the co-data model will give a poor fit.
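
The dimension requirements and documented defaults above can be checked before calling CoRF. The following is a minimal sketch in base R on simulated data, not part of the package: the object sizes, the co-data column names (Corrs, InSignature) and the name GammaSeqTuned are made up for illustration, and the threshold line only restates the documented default of 0.05/CD.

```r
# Simulated inputs with the documented shapes (all values are illustrative).
set.seed(1)
n <- 20   # number of samples
p <- 50   # number of primary variables

X <- matrix(rnorm(n * p), nrow = n, ncol = p)   # n rows, p columns
Y <- rbinom(n, 1, 0.5)                          # one response value per row of X

# Co-data: p rows, one column per set of co-data (hypothetical column names).
CoData <- data.frame(Corrs       = runif(p, -1, 1),
                     InSignature = rbinom(p, 1, 0.2))

# Each variable (column of X) needs one row of co-data.
stopifnot(nrow(CoData) == ncol(X), length(Y) == nrow(X))

# Documented default screening threshold: 0.05 / CD.
CD <- ncol(CoData)
ScreenPvalThreshold <- 0.05 / CD

# Gamma values documented for TuneGamma = TRUE.
GammaSeqTuned <- c(1/3, 2/3, 0.9, 1, 1.1, 1.2, 1.3)
```

With two co-data sets, as here, the default threshold works out to 0.025.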

Value

Returns a CoRF object with the following components.

InitialCall

Details of the statements used when calling CoRF.

SamplingProbabilities

The sampling probabilities used in refitting. A matrix of size p x number of gammas.

ScreenPval

The screening p-values. Each p-value results from a univariable glm fit of one type of co-data to the variables used. For monotone increasing/decreasing relationships the p-value is one-sided.

SavedForests

A list of the fitted RFs. Element [[1]] contains the base RF, subsequent elements contain the refitted RFs. See attr(SavedForests[[1]],"WhichForest").

CoDataModel

Fit of the co-data model.

ResultPerGamma

Result overview of the fit of the base RF and of the refitted RFs.
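
The manual route shown in the Examples section turns the co-data model predictions into sampling weights by subtracting 1/P and truncating at zero, then sets mtry from the number of non-zero weights. A minimal sketch of that step in base R on made-up predictions follows. The vector preds is hypothetical, and whether CoRF performs exactly these steps internally, or how gamma enters, is not stated on this page.

```r
# Hypothetical predicted probabilities from a co-data model, one per variable.
preds <- c(0.02, 0.10, 0.15, 0.30, 0.43)
P <- length(preds)

# Subtract the uniform probability 1/P and truncate at zero, so that
# variables predicted to be used less often than average get weight 0.
weights <- pmax(preds - 1 / P, 0)

# mtry based on the number of variables that keep a non-zero weight.
Mtry <- ceiling(sqrt(sum(weights != 0)))
```

In the manual route these weights are passed to rfsrc through xvar.wt, together with mtry.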

Reference

CoRF paper

Author(s)

authors

Examples

#---Run CoRF:
#data(LNM_Example)
#CoDataRelation <- c("increasing","linear","decreasing")
#CoRF_Fit <- CoRF(Y=RespTrain,X=TrainData,CoData=CoDataTrain,CoDataRelation=CoDataRelation,ntree=2000)

#---These elements contain the rfsrc objects
#CoRF_Fit$SavedForests contains the rfsrc objects
#CoRF_Fit$SavedForests[[1]]  #The base RF
#CoRF_Fit$SavedForests[[2]]  #Subsequent numbers contain CoRF fits

#---Overview of results, per gamma. The first row is the base RF:
#CoRF_Fit$ResultPerGamma

#---SamplingProbabilities used to refit:
#CoRF_Fit$SamplingProbabilities

#---Co-data model:
#CoRF_Fit$CoDataModel

#---Plot fit of the co-data model:
#plot(CoDataTrain$Corrs,plogis(predict(CoRF_Fit$CoDataModel)))

#---The second option to run CoRF is through specifying the co-data model with CoDataModelText
#Example to run CoRF through CoDataModelText

#---first take care of the missing values.
#CoDataTrain$pvalsVUmc[is.na(CoDataTrain$pvalsVUmc)] <- mean(CoDataTrain$pvalsVUmc,na.rm=TRUE)

#---specify the co-data model:
#CoDataModelText <- "~ s(Corrs,k=25,bs=\"mpi\",m=2)+RoepmanGenes+s(pvalsVUmc,k=25,bs=\"mpd\",m=2)"
#CoRF_Fit <- CoRF(Y=RespTrain,X=TrainData,CoData=CoDataTrain,CoDataModelText=CoDataModelText)

#---The third way to run CoRF is directly through randomForestSRC and scam (or glm).
#library(randomForestSRC); library(scam)
#DF <- data.frame(Ydf=RespTrain,Xdf=TrainData)
#Forest <- rfsrc(Ydf ~ .,data=DF,ntree=2000,var.used="all.trees",importance=c("none"),nodesize=2,seed=1)
#VarUsed <- Forest$var.used  #assumed definition: how often each variable is used in the base RF
#CoDataTrain$pvalsVUmc[is.na(CoDataTrain$pvalsVUmc)] <- mean(CoDataTrain$pvalsVUmc,na.rm=TRUE)
#CoDataModell <- scam(VarUsed/sum(VarUsed)~  s(Corrs,k=25,bs="mpi",m=2)+RoepmanGenes+s(pvalsVUmc,k=25,bs="mpd",m=2),data=CoDataTrain,family=quasibinomial)
#preds <- as.numeric(plogis(predict(CoDataModell)))
#P <- length(preds)
#preds2 <- pmax(preds-1/P,0)  #subtract 1/P and truncate at zero
#Mtry <- ceiling(sqrt(sum(preds2!=0)))
#ReffitedCoRF <- rfsrc(Ydf ~ .,data=DF,ntree=2000,var.used="all.trees",importance=c("none"),xvar.wt=preds2,mtry=Mtry,nodesize=2,seed=1)
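
The mean imputation applied to pvalsVUmc in the examples above can be tried on a toy vector. This is a small base R sketch; the vector pvals is made up and stands in for one co-data column with missing entries.

```r
# Hypothetical co-data column with missing values.
pvals <- c(0.01, NA, 0.20, NA, 0.09)

# Replace each missing value by the mean of the observed values.
pvals[is.na(pvals)] <- mean(pvals, na.rm = TRUE)
```

After this step the column has no missing values, so it can be used as a term in the co-data model.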

DennisBeest/CoRF documentation built on Feb. 20, 2020, 11:06 p.m.