fia: A function identifying features of importance.

View source: R/mrbin.R

fia R Documentation

A function identifying features of importance.

Description

This function finds features that can change the outcome of a model's prediction. Example: fia=1.00 means a single compound was found in all but 0 percent of samples; fia=2.45 indicates this compound is found in pairs in all but 45 percent of tested samples. A function named predict needs to be present for this to work. If the prediction function has a different name, that name has to be provided in the parameter functionNamePredict.
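
To make the scoring convention concrete, here is a minimal sketch (the variable names are made up for illustration and are not part of the package) that splits such a score into its two parts:

 #Minimal sketch (hypothetical names, not package code): split a fia score into
 #the size of the feature combination and the share of samples in which that
 #combination was not found, following the convention described above
 score <- 2.45
 combinationSize <- floor(score)              #2: the compound acts in pairs
 fractionNotFound <- score - combinationSize  #0.45: absent in 45 percent of samples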

Usage

fia(
  model,
  dataSet,
  factors,
  nSeed = 6,
  numberOfSamples = 100,
  maxFeatures = 10000,
  innerLoop = 300,
  verbose = TRUE,
  maxNumberAllTests = 5,
  firstLevel = 1,
  saveMemory = FALSE,
  kerasClearMemory = 0,
  functionNamePredict = "predict",
  parameterNameObject = "object",
  parameterNameData = "x",
  ...
)

Arguments

model

A predictive model. Make sure all required packages are loaded before calling this function.

dataSet

An object containing the data, with columns = features and rows = samples. This should be either a matrix or a data frame, depending on which of the two the specific prediction function requires.

factors

A factor vector with the group membership of each sample in the data set. The order of levels must correspond to the numbers predicted by the model.

nSeed

Number of times the test will be repeated, selecting different random features each time.

numberOfSamples

Number of samples that will be randomly chosen from each group.

maxFeatures

Maximum number of features that will be tested. Larger numbers will be split into child nodes without testing, to increase speed.

innerLoop

Number of repeated loops to test additional child nodes.

verbose

A logical value to turn messages on or off.

maxNumberAllTests

Combinations of features of this length or shorter will not be split in half to create two children, but into multiple children with one feature left out in each. This is done to make sure no combination is missed.

firstLevel

Numeric value of the first level or group. Usually 1, but for glm, as in the example below, this needs to be 0.

saveMemory

Save memory by performing only two predictions per step; this will be much slower. If you are using keras, use the parameter kerasClearMemory=2 instead, which frees more memory and is a lot faster. Set to FALSE to turn off.

kerasClearMemory

Save memory by clearing the model from memory and reloading it between chunks of predictions. This only works when using the package keras. 0 = off, 1 = medium (reload between repeats with different seeds), 2 = maximum memory savings (reload after each run for a single sample). This will write a model file to the working directory.

functionNamePredict

The name of the prediction function. This only needs to be changed if the prediction function is not called predict; see the sketch after this argument list.

parameterNameObject

The name of the parameter used to pass the model to the prediction function.

parameterNameData

The name of the parameter used to pass the data to the prediction function.

...

Optional, additional parameters that will be passed to the prediction function.
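
As a hedged sketch of how functionNamePredict, parameterNameObject and parameterNameData fit together (the names myPredict, fit, newX, myModel, myData and myGroups are made up for illustration), a prediction function whose argument names differ from predict(object, x) could be connected like this:

 #Hypothetical wrapper: a prediction function whose argument names differ
 #from predict(object, x)
 myPredict <- function(fit, newX) {
   predict(fit, newdata = newX, type = "response")
 }
 #fia() can then be pointed at this interface (sketch, assuming myModel,
 #myData and myGroups exist):
 #fiaresults <- fia(model = myModel, dataSet = myData, factors = myGroups,
 #  functionNamePredict = "myPredict", parameterNameObject = "fit",
 #  parameterNameData = "newX")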

Value

A list of results:

scoresSummary: A vector of fia scores for the whole data set.
scores: Vectors of fia scores for each predicted group.
scoresIndividual: A list of fia scores for each individual sample.
fiaListPerSample: A list of important combinations of features for each predicted sample.
fiaMatrix: A list of fia scores for each predicted group.

Examples

 #First, define group membership and create the example feature data
 group<-factor(c(rep("Group1",4),rep("Group2",5)))
 names(group)<-paste("Sample",1:9,sep="")
 dataset<-data.frame(
   Feature1=c(5.1,5.0,6.0,2.9,4.8,4.6,4.9,3.8,5.1),
   Feature2=c(2.6,4.0,3.2,1.2,3.1,2.1,4.5,6.1,1.3),
   Feature3=c(3.1,6.1,5.8,5.1,3.8,6.1,3.4,4.0,4.4),
   Feature4=c(5.3,5.2,3.1,2.7,3.2,2.8,5.9,5.8,3.1),
   Feature5=c(3.2,4.4,4.8,4.9,6.0,3.6,6.1,3.9,3.5),
   Feature6=c(6.8,6.7,7.2,7.0,7.3,7.1,7.2,6.9,6.8)
   )
 rownames(dataset)<-names(group)
 #Train a model - here we use a logit model instead of an ANN as a demonstration
 mod<-glm(group~Feature1+Feature2+Feature3+Feature4+Feature5+Feature6,
   data=data.frame(group=group,dataset),family="binomial")
 fiaresults<-fia(model=mod,dataSet=dataset,factors=group,parameterNameData="newdata",
   firstLevel=0,type="response")
 fiaresults$scores
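 #The remaining elements of the returned list (see Value above) can be
 #inspected in the same way
 fiaresults$scoresSummary      #fia scores for the whole data set
 fiaresults$fiaListPerSample   #important feature combinations per predicted sample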
