findFeatures: findFeatures

View source: R/featurefinder.r

findFeatures    R Documentation

findFeatures

Description

Performs an analysis of residuals, grouped by factor, to identify features that explain the target variable.

Usage

findFeatures(
  OutputPath,
  fcsv,
  ExclusionVars,
  FactorToNumericList,
  treeGenerationMinBucket = 50,
  treeSummaryMinBucket = 20,
  treeSummaryResidualThreshold = 0,
  treeSummaryResidualMagnitudeThreshold = 0,
  doAllFactors = TRUE,
  maxFactorLevels = 20
)

Arguments

OutputPath

A string containing the location of the input csv file. Results are also stored in this location.

fcsv

A string containing the name of the input csv file (the example below passes its full path)

ExclusionVars

A single string consisting of a comma-separated list of variable names to exclude, with double quotes around each name, e.g. "\"residual\",\"expected\""

FactorToNumericList

A list of variable names as strings

treeGenerationMinBucket

Desired minimum number of data points per leaf (default 50)

treeSummaryMinBucket

Minimum number of data points in each leaf for the summary (default 20)

treeSummaryResidualThreshold

Minimum residual in the summary (default 0 for positive residuals)

treeSummaryResidualMagnitudeThreshold

Minimum residual magnitude in the summary (default 0 i.e. no restriction)

doAllFactors

Flag to indicate whether to analyse the levels of all factor variables (default TRUE)

maxFactorLevels

Maximum number of levels per factor before it is converted to numeric (default 20)
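
The ExclusionVars format described above (one string, each name wrapped in double quotes) is awkward to type by hand. The following is a minimal sketch of building it from a character vector, assuming the names are joined with commas as in the Examples section. quoteVars is a hypothetical helper and not part of featurefinder; whether whitespace between entries matters is not stated in this page, so the sketch emits none.

```r
# Hypothetical helper (not part of featurefinder): wrap each variable name
# in double quotes and join the results with commas into one string.
quoteVars <- function(vars) {
  paste0("\"", vars, "\"", collapse = ",")
}

ExclusionVars <- quoteVars(c("residual", "expected", "actual", "y"))

# Print the raw string to check the quoting
cat(ExclusionVars, "\n")
```

The resulting string can then be passed as the ExclusionVars argument in place of a hand-escaped literal.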

Value

Saves residual CART trees and associated highlighted residuals for each to the path provided.
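
Because the results are written to disk rather than returned, one way to see what a run produced is to compare the directory listing before and after the call. The sketch below uses base R only and makes no assumption about the file names findFeatures writes. So that it runs on its own, a placeholder file stands in for the real call; the file name stand_in_output.txt is purely illustrative.

```r
# Use a fresh, empty directory so the comparison is unambiguous
OutputPath <- tempfile("ff_out_")
dir.create(OutputPath)

# Snapshot the directory before the run
before <- list.files(OutputPath)

# Stand-in for the real call so that this sketch is self-contained;
# in practice this line would be findFeatures(OutputPath, fcsv, ...)
writeLines("placeholder", file.path(OutputPath, "stand_in_output.txt"))

# Anything present now but not before was written in between
newFiles <- setdiff(list.files(OutputPath), before)
print(newFiles)
```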

Examples


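library(featurefinder)
# Load the bundled example data; the code below accesses it as `data`
data(mycsv)
# Prefix each SMIfactor value with "smi"
data$SMIfactor=paste("smi",as.matrix(data$SMIfactor),sep="")
# Index of the midpoint of the series
nn=floor(length(data$DAX)/2)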

# Can we predict the relative movement of DAX and SMI?
data$y=data$DAX*0
data$y[1:(nn-1)]=((data$DAX[2:nn])-(data$DAX[1:(nn-1)]))/
                  (data$DAX[1:(nn-1)])-(data$SMI[2:nn]-(data$SMI[1:(nn-1)]))/(data$SMI[1:(nn-1)])

thismodel=lm(formula=y ~ .,data=data)
expected=predict(thismodel,data)
actual=data$y
residual=actual-expected
data=cbind(data,expected, actual, residual)

OutputPath=tempdir()
fcsv <- file.path(OutputPath, "mycsv.csv")
write.csv(data[(nn+1):(length(data$y)),], file = fcsv, row.names=FALSE)

ExclusionVars="\"residual\",\"expected\", \"actual\",\"y\""
FactorToNumericList=c()
findFeatures(OutputPath, fcsv, ExclusionVars, FactorToNumericList,
             treeGenerationMinBucket=50,
             treeSummaryMinBucket=20)
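
The input preparation in the example above is not specific to the bundled data. As a hedged sketch, the same steps are shown here on the built-in mtcars data set, assuming (as the example suggests) that the csv file should carry columns named expected, actual and residual. The model formula and the file name mtcars_residuals.csv are illustrative choices, not requirements of featurefinder.

```r
# Built-in data set, used purely for illustration
df <- mtcars

# Any model could be used here; a linear model mirrors the example above
fit <- lm(mpg ~ wt + hp, data = df)

# Column names follow the example above: expected, actual, residual
df$expected <- predict(fit, df)
df$actual   <- df$mpg
df$residual <- df$actual - df$expected

# Write the prepared data to a csv file in a temporary location
OutputPath <- tempdir()
fcsv <- file.path(OutputPath, "mtcars_residuals.csv")
write.csv(df, file = fcsv, row.names = FALSE)

# Exclude the model outputs and the target from the candidate features
ExclusionVars <- "\"residual\",\"expected\",\"actual\",\"mpg\""
FactorToNumericList <- c()

# OutputPath, fcsv, ExclusionVars and FactorToNumericList are now in the
# form that the findFeatures call in the example above takes.
```

Note that mtcars has only 32 rows, so the min-bucket arguments would presumably need values well below their defaults for any tree to be grown on it.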

featurefinder documentation built on April 4, 2025, 12:30 a.m.