CoRF	R Documentation

CoRF: improved high-dimensional prediction with the RF by the use of co-data.

Description

Fits a random forest (RF) guided by co-data (CoRF).

Usage

CoRF(Y, X, CoData, CoDataModelText = NULL, CoDataRelation = NULL,
  ScreenCoData = FALSE, ScreenPvalThreshold = NULL, TuneGamma = FALSE,
  GammaSeq = NULL, GammaNVar = NULL, BaseRF = NULL,
  ForestsInOutput = TRUE, setseed = 1, importance = c("none"),
  nodesize = 2, ntree = 2000, ...)

Arguments

`Y`	A response variable.
`X`	The primary set of variables (n rows and p columns).
`CoData`	A data.frame containing the co-data, with p rows and one column per set of co-data.
`CoDataModelText`	Optionally a text string containing the specification of the co-data model. Not needed if CoDataRelation is specified.
`CoDataRelation`	The relationship to be fitted in the co-data model, e.g. linear, (monotone) increasing, (monotone) decreasing. Alternatively, choose one of the scam shape-constrained smooth constructs (mpd, mpi, mdcv, mdcx, micv, micx, cv, cx; see ??shape.constrained.smooth.terms).
`ScreenCoData`	Boolean that indicates whether or not the co-data selection step should be conducted. Default is FALSE.
`ScreenPvalThreshold`	The threshold value used in the co-data selection step (when ScreenCoData is TRUE). If NULL, it defaults to 0.05/CD, where CD is the number of co-data sets.
`TuneGamma`	If TRUE, this sets GammaSeq to c(1/3, 2/3, 0.9, 1, 1.1, 1.2, 1.3). CoRF is fitted for each value of gamma, and results are returned for all refitted forests. Default is FALSE, in which case GammaSeq = 1.
`GammaSeq`	Specifies the sequence of gamma values. Default is to only use a gamma of 1.
`BaseRF`	Optionally use a previously fitted base RF (i.e. one fitted with uniform sampling probabilities).
`ForestsInOutput`	Whether or not to save the various rfsrc objects. These objects are needed to make further predictions. Defaults to TRUE. Optionally set to FALSE if the CoRF objects become too big and are not needed.
`setseed`	Seed used to fit the RF.
`importance`	Defaults to "none". Variable importance is not needed to fit CoRF and is computationally very expensive.
`nodesize`	Sets the minimal node size; set at the recommended default for CoRF (2).
`ntree`	The number of trees used in CoRF; defaults to 2000. Convergence improves with a larger number of trees. If set too low, the co-data model will give a poor fit.
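
As a small illustration of the ScreenPvalThreshold default of 0.05/CD, the sketch below computes the threshold for a made-up co-data frame and applies it to made-up screening p-values. The object names (CoDataToy, ScreenPvalToy, KeptCoData) and the strict "<" comparison are assumptions for illustration only; they are not taken from the package.

```r
# Hypothetical co-data with three co-data sets (one per column); values are made up.
CoDataToy <- data.frame(Corrs        = c(0.1, 0.4, 0.7),
                        RoepmanGenes = c(0, 1, 0),
                        pvalsVUmc    = c(0.20, 0.01, 0.50))

# CD is the number of co-data sets, i.e. the number of columns.
CD <- ncol(CoDataToy)

# Default threshold as described for ScreenPvalThreshold.
ScreenPvalThreshold <- 0.05 / CD

# Hypothetical screening p-values, one per co-data set.
ScreenPvalToy <- c(Corrs = 0.001, RoepmanGenes = 0.30, pvalsVUmc = 0.012)

# Keep the co-data sets whose p-value falls below the threshold
# (strict inequality is an assumption of this sketch).
KeptCoData <- names(ScreenPvalToy)[ScreenPvalToy < ScreenPvalThreshold]
```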

Value

Returns a CoRF object with the following components.

`InitialCall`	Details of the statements used when calling CoRF.
`SamplingProbabilities`	The sampling probabilities used in refitting. A matrix of size p x number of gammas.
`ScreenPval`	The screening p-values. Each p-value results from a univariable glm fit of one type of co-data to the variables used. For monotone increasing/decreasing relationships the p-value is one-sided.
`SavedForests`	A list of the fitted RFs. Element [[1]] contains the base RF, subsequent elements contain the refitted RFs. See attr(SavedForests[[1]],"WhichForest").
`CoDataModel`	Fit of the co-data model.
`ResultPerGamma`	Result overview of the fit of the base RF and of the refitted RFs.

Reference

CoRF paper

Author(s)

authors

Examples

#---Run CoRF:
#data(LNM_Example)
#CoDataRelation <- c("increasing","linear","decreasing")
#CoRF_Fit <- CoRF(Y=RespTrain,X=TrainData,CoData=CoDataTrain,CoDataRelation=CoDataRelation,ntree=2000)

#---These elements contain the rfsrc objects:
#CoRF_Fit$SavedForests       #List of all saved rfsrc objects
#CoRF_Fit$SavedForests[[1]]  #The base RF
#CoRF_Fit$SavedForests[[2]]  #Subsequent elements contain the CoRF fits

#---Overview of results, per gamma. The first row is the base RF:
#CoRF_Fit$ResultPerGamma

#---SamplingProbabilities used to refit:
#CoRF_Fit$SamplingProbabilities

#---Co-data model:
#CoRF_Fit$CoDataModel

#---Plot fit of the co-data model:
#plot(CoRF_Fit$CoData$Corrs,plogis(predict(CoRF_Fit$CoDataModel)))

#---The second option to run CoRF is to specify the co-data model with CoDataModelText.

#---first take care of the missing values.
#CoDataTrain$pvalsVUmc[is.na(CoDataTrain$pvalsVUmc)] <- mean(CoDataTrain$pvalsVUmc,na.rm=TRUE)

#---specify the co-data model:
#CoDataModelText <- "~ s(Corrs,k=25,bs=\"mpi\",m=2)+RoepmanGenes+s(pvalsVUmc,k=25,bs=\"mpd\",m=2)"
#CoRF_Fit <- CoRF(Y=RespTrain,X=TrainData,CoData=CoDataTrain,CoDataModelText=CoDataModelText)

#---The third way to run CoRF is directly through randomForestSRC and scam (or glm).
#library(randomForestSRC)
#library(scam)
#DF <- data.frame(Ydf=RespTrain,Xdf=TrainData)
#Forest <- rfsrc(Ydf ~ .,data=DF,ntree=2000,var.used="all.trees",importance=c("none"),nodesize=2,seed=1)
#VarUsed <- Forest$var.used  #Assumed here: per-variable usage counts of the base RF
#CoDataTrain$pvalsVUmc[is.na(CoDataTrain$pvalsVUmc)] <- mean(CoDataTrain$pvalsVUmc,na.rm=TRUE)
#CoDataModell <- scam(VarUsed/sum(VarUsed)~  s(Corrs,k=25,bs="mpi",m=2)+RoepmanGenes+s(pvalsVUmc,k=25,bs="mpd",m=2),data=CoDataTrain,family=quasibinomial)
#preds <- as.numeric(plogis(predict(CoDataModell)))
#P <- length(preds)
#preds2 <- pmax(preds-1/P,0)  #Subtract the uniform probability 1/P; negative values become 0
#Mtry <- ceiling(sqrt(sum(preds2!=0)))
#ReffitedCoRF <- rfsrc(Ydf ~ .,data=DF,ntree=2000,var.used="all.trees",importance=c("none"),xvar.wt=preds2,mtry=Mtry,nodesize=2,seed=1)
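
The thresholding step of the third approach can be tried without fitting a forest. The sketch below replaces the output of the co-data model with made-up predicted probabilities (preds), subtracts the uniform probability 1/P, truncates at zero, and derives mtry from the number of variables that keep a positive weight. The numbers are hypothetical and serve only to show the arithmetic; this is not the package's internal code.

```r
# Made-up predicted selection probabilities for P = 8 variables; in the example
# above these would come from plogis(predict(CoDataModell)).
preds <- c(0.30, 0.25, 0.20, 0.10, 0.05, 0.04, 0.03, 0.03)
P <- length(preds)

# Subtract the uniform probability 1/P and set negative values to zero, so that
# variables predicted to be used less often than under uniform sampling get weight 0.
preds2 <- pmax(preds - 1/P, 0)

# mtry is based on the number of variables that keep a non-zero weight.
Mtry <- ceiling(sqrt(sum(preds2 != 0)))
```

With these toy values, three variables keep a positive weight and Mtry equals 2. The vector preds2 is what would be passed as xvar.wt in the refit shown above.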

DennisBeest/CoRF documentation built on July 27, 2024, 8:32 p.m.