findFeatures: findFeatures
In featurefinder: Feature Finder

View source: R/featurefinder.r

findFeatures

Description

Analyses model residuals grouped by factor to identify features that explain the target variable.
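To make "residuals grouped by factor" concrete, the following is a minimal base-R sketch, not the package's implementation. The data frame `df`, its columns `x`, `g` and `y`, and the use of `tapply` are all assumptions made for illustration; featurefinder itself fits CART trees to the residuals rather than taking simple group means.

```r
# Illustration only: base R, not featurefinder internals.
# df, x, g and y are made-up names and values.
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  g = factor(c("a", "a", "a", "a", "b", "b", "b", "b")),
  y = c(1.1, 2.0, 3.2, 3.9, 6.0, 7.1, 7.9, 9.2)
)

# Fit a model that ignores the factor g, then compute residuals.
fit <- lm(y ~ x, data = df)
df$residual <- df$y - predict(fit, df)

# Average the residuals within each level of g. A level whose mean
# residual is far from zero hints that g carries information the
# model has not captured.
meanResidual <- tapply(df$residual, df$g, mean)
print(meanResidual)
```

A factor level with a consistently positive or negative mean residual is a candidate feature to add to the model.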

Usage

findFeatures(
  OutputPath,
  fcsv,
  ExclusionVars,
  FactorToNumericList,
  treeGenerationMinBucket = 50,
  treeSummaryMinBucket = 20,
  treeSummaryResidualThreshold = 0,
  treeSummaryResidualMagnitudeThreshold = 0,
  doAllFactors = TRUE,
  maxFactorLevels = 20
)

Arguments

`OutputPath`	A string containing the location of the input csv file. Results are also stored in this location.
`fcsv`	A string containing the name of the input csv file (the example below passes the full path)
`ExclusionVars`	A single string containing a comma-separated list of variable names, each enclosed in double quotes
`FactorToNumericList`	A list of variable names as strings
`treeGenerationMinBucket`	Desired minimum number of data points per leaf (default 50)
`treeSummaryMinBucket`	Minimum number of data points in each leaf for the summary (default 20)
`treeSummaryResidualThreshold`	Minimum residual in the summary (default 0 for positive residuals)
`treeSummaryResidualMagnitudeThreshold`	Minimum residual magnitude in the summary (default 0 i.e. no restriction)
`doAllFactors`	Flag to indicate whether to analyse the levels of all factor variables (default TRUE)
`maxFactorLevels`	Maximum number of levels per factor before it is converted to numeric (default 20)
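Because `ExclusionVars` is a single string rather than a character vector, it can be convenient to build it from a vector of names. The sketch below assumes a hypothetical helper, `quoteVars`, which is not part of featurefinder. It joins the names without spaces after the commas; whether the function is sensitive to such spaces is not stated here.

```r
# Hypothetical helper (not part of featurefinder): wrap each name in
# double quotes and join the results with commas into one string.
quoteVars <- function(vars) {
  paste0("\"", vars, "\"", collapse = ",")
}

ExclusionVars <- quoteVars(c("residual", "expected", "actual", "y"))

# cat() shows the string as findFeatures would receive it.
cat(ExclusionVars, "\n")
```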

Value

Saves the residual CART trees, and the highlighted residuals associated with each tree, to the path provided.

Examples


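library(featurefinder)
data(mycsv)
# Prefix each SMIfactor value with "smi"
data$SMIfactor=paste("smi",as.matrix(data$SMIfactor),sep="")
# Index of the midpoint of the data
nn=floor(length(data$DAX)/2)

# Can we predict the relative movement of DAX and SMI?
# y is the one-step relative change in DAX minus that in SMI,
# computed on the first half of the data; all other entries stay 0
data$y=data$DAX*0
data$y[1:(nn-1)]=((data$DAX[2:nn])-(data$DAX[1:(nn-1)]))/
                  (data$DAX[1:(nn-1)])-(data$SMI[2:nn]-(data$SMI[1:(nn-1)]))/(data$SMI[1:(nn-1)])

# Fit a linear model on all other columns and compute its residuals
thismodel=lm(formula=y ~ .,data=data)
expected=predict(thismodel,data)
actual=data$y
residual=actual-expected
data=cbind(data,expected, actual, residual)

# Write the second half of the data to a csv file in a temporary directory
OutputPath=tempdir()
fcsv <- file.path(OutputPath, "mycsv.csv")
write.csv(data[(nn+1):(length(data$y)),], file = fcsv, row.names=FALSE)

# Exclude the target and the model outputs from the candidate features
ExclusionVars="\"residual\",\"expected\", \"actual\",\"y\""
FactorToNumericList=c()
findFeatures(OutputPath, fcsv, ExclusionVars, FactorToNumericList,
             treeGenerationMinBucket=50,
             treeSummaryMinBucket=20)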

featurefinder documentation built on April 4, 2025, 12:30 a.m.