TwoSpamH: The 2SpamH algorithm
In jihuilee/TwoSpamH:

TwoSpamH

R Documentation

Description

This function performs the 2SpamH algorithm on a specified variable in the input data.

Usage

TwoSpamH(
  data,
  variable,
  PC.vars,
  step2.var = c(""),
  imp.method = na_mean,
  thresholds = list(low = c(0.3), high = c(0.7)),
  num.neighbor = 5,
  check.cor = 0.8,
  plot.data = F,
  seed = NULL
)

Arguments

`data`	A data frame containing the variable to be labelled and the variables on which principal component analysis is performed (PC vars).
`variable`	The name or position index of the variable to be labelled.
`PC.vars`	A list in which each element is a vector of either the names or the position indexes of a group of PC vars in the data frame.
`step2.var`	A vector of variable names to be added into the step 2 KNN feature space.
`imp.method`	A function that serves as the imputation method for missing data in the PC vars. This function should take a vector with missing values and return a vector without them.
`thresholds`	A list whose first element contains the 'low' quantile thresholds for each group of PC vars and whose second element contains the 'high' thresholds. Each element should be a vector either of length 1 or of the same length as the number of PC var groups.
`num.neighbor`	Number of neighbors considered for each unlabelled data point in stage 2.
`check.cor`	Whether highly correlated variables should be removed when performing stage 2 of the TSknn. If no, the input should be NULL. If yes, the input should be the correlation threshold for variables to be removed.
`plot.data`	If TRUE, the function returns results intended for plotting. If FALSE, the function returns the original data frame in which the filtered variable is labelled with extra NAs.
`seed`	The seed to be set, default is NULL.
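
The argument descriptions above can be illustrated with a minimal sketch. It assumes only what the descriptions state: `imp.method` takes a vector with missing values and returns one without, `PC.vars` is a list of variable groups, and each element of `thresholds` has length 1 or one entry per group. The function `median_impute`, the data frame `my_data`, and all column names are hypothetical and are not part of the package. The final call is left commented out because it needs the package and a suitable data set.

```r
# Hypothetical imputation function following the contract described for
# `imp.method`: takes a vector with NAs, returns a vector without them.
# (If every value is NA, the median is NA and nothing can be imputed.)
median_impute <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

median_impute(c(1, NA, 3, 10))
# 1 3 3 10

# Two hypothetical groups of PC vars, given by column name.
pc_vars <- list(
  c("steps", "screen_time"),
  c("hr_mean", "hr_sd")
)

# One 'low' and one 'high' quantile threshold per group of PC vars.
thresholds <- list(low = c(0.2, 0.3), high = c(0.8, 0.7))

# Check the documented shape: each element has length 1 or one entry per group.
stopifnot(all(lengths(thresholds) %in% c(1, length(pc_vars))))

# Illustrative call only; `my_data` and "sleep_hours" are assumptions.
# result <- TwoSpamH(
#   data = my_data,
#   variable = "sleep_hours",
#   PC.vars = pc_vars,
#   imp.method = median_impute,
#   thresholds = thresholds,
#   seed = 1
# )
```

Supplying a custom function such as this one replaces the default `na_mean`; any function that meets the same vector-in, vector-out contract should be usable in its place.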