Caps numerical input variables at a chosen level. Two types of capping are available: by quantile and by standard deviation.
Data |
data frame with at least two columns |
keeplist |
Names of the independent variables to cap; if missing, all independent variables are capped |
y |
Name of the dependent variable |
level |
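Quantile or standard deviation level at which to cap; any numeric value. With captype='quant', typically 1 or 5 (caps at the level% and (100 - level)% quantiles), default 5. With captype='std', typically 3 or 5 (caps at the mean minus/plus level standard deviations), default 3 |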
captype |
Type of capping: 'quant' (by quantile) or 'std' (by standard deviation). With captype='quant', the lower/upper capping bounds are the low/high quantiles at the given level. With captype='std', the lower/upper capping bounds are the mean minus/plus the given number of standard deviations |
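To make the two bound types concrete, here is a minimal sketch that computes both pairs of bounds for a single vector, following the same formulas as the function body. The vector `x` and the helper names (`level_q`, `level_s`, `quant_bounds`, `std_bounds`) are made up for illustration, and `pmin()`/`pmax()` stand in for the indexed assignments the function uses.

```r
# Made-up numeric vector with one extreme value on each side.
x <- c(-50, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 200)

# captype = "quant" with level = 5: bounds are the 5% and 95% quantiles.
level_q <- 5
quant_bounds <- quantile(x, c(level_q / 100, (100 - level_q) / 100))

# captype = "std" with level = 3: bounds are mean -/+ 3 standard deviations.
level_s <- 3
std_bounds <- c(mean(x) - level_s * sd(x), mean(x) + level_s * sd(x))

# Capping replaces values outside the bounds with the nearest bound.
x_quant <- pmin(pmax(x, quant_bounds[1]), quant_bounds[2])
x_std   <- pmin(pmax(x, std_bounds[1]), std_bounds[2])

print(rbind(quant = unname(quant_bounds), std = std_bounds))
```

Because the standard deviation is itself inflated by extreme values, the 'std' bounds can end up much wider than the 'quant' bounds on heavily skewed data.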
Test: the data is from the Titanic project, https://www.kaggle.com/c/titanic/data
traindata <- read.csv('train.csv', header=T, na.strings=c(""))
Data <- subset(traindata, select=c(2,3,5,6,7,8,10,12))
capping(Data, y='Survived')
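The function calls `quantile()`, `mean()` and `sd()` on every column in `keeplist` without `na.rm`, so columns that are non-numeric or contain missing values may cause an error. The sketch below shows one way to build a `keeplist` of usable columns before calling the function. The data frame is a made-up stand-in for the Titanic subset, not the real file, and the names `candidates` and `is_usable` are hypothetical.

```r
# Made-up stand-in for the Titanic subset: mixed column types and a missing Age.
Data <- data.frame(
  Survived = c(0, 1, 1, 0, 1),
  Pclass   = c(3, 1, 3, 1, 2),
  Sex      = c("male", "female", "female", "male", "female"),
  Age      = c(22, 38, NA, 35, 27),
  Fare     = c(7.25, 71.28, 7.92, 53.10, 13.00),
  stringsAsFactors = FALSE
)
y <- "Survived"

# Candidate predictors: every column except the dependent variable.
candidates <- setdiff(names(Data), y)

# Keep only numeric columns without missing values.
is_usable <- vapply(
  candidates,
  function(col) is.numeric(Data[[col]]) && !anyNA(Data[[col]]),
  logical(1)
)
keeplist <- candidates[is_usable]
print(keeplist)
```

The resulting vector could then be passed as the `keeplist` argument, for example `capping(Data, keeplist = keeplist, y = 'Survived')`.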
Yuan Yao
##---- Should be DIRECTLY executable !! ----
##-- ==> Define data, use random,
##-- or do help(data=index) for the standard data sets.
## The function is currently defined as
function (Data, keeplist = NULL, y, level = NULL, captype = "quant")
{
    # Default to every column except the dependent variable.
    if (is.null(keeplist)) {
        keeplist <- names(Data)[names(Data) != y]
    }
    # Default level: 5 (percent) for quantiles, 3 for standard deviations.
    if (is.null(level)) {
        if (captype == "quant") {
            level <- 5
        }
        if (captype == "std") {
            level <- 3
        }
    }
    if (captype == "quant") {
        varlist <- lapply(keeplist, function(x) {
            # Bounds at the level% and (100 - level)% quantiles.
            quantiles <- quantile(Data[, x], c(level/100, (100 - level)/100))
            Data[, x][Data[, x] < quantiles[1]] <- quantiles[1]
            Data[, x][Data[, x] > quantiles[2]] <- quantiles[2]
            Data[, x]
        })
    }
    if (captype == "std") {
        varlist <- lapply(keeplist, function(x) {
            # Bounds at the mean minus/plus level standard deviations.
            stdbond <- c(mean(Data[, x]) - level * sd(Data[, x]),
                         mean(Data[, x]) + level * sd(Data[, x]))
            Data[, x][Data[, x] < stdbond[1]] <- stdbond[1]
            Data[, x][Data[, x] > stdbond[2]] <- stdbond[2]
            Data[, x]
        })
    }
    # Unnamed list of capped vectors, one per entry of keeplist.
    return(varlist)
}
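As defined above, the function returns an unnamed list of capped vectors rather than a data frame. The following sketch shows one way to write such a list back into a copy of the data. The data frame is made up, and `varlist` is built here with `pmin()`/`pmax()` only to mimic the shape of the return value; it is not the function itself.

```r
# Made-up data with one extreme value in each predictor.
Data <- data.frame(
  y = c(0, 1, 0, 1, 1),
  a = c(1, 2, 3, 4, 100),
  b = c(-90, 5, 6, 7, 8)
)
keeplist <- c("a", "b")

# Stand-in for the return value: one capped vector per entry of keeplist.
varlist <- lapply(keeplist, function(x) {
  q <- quantile(Data[, x], c(0.05, 0.95))
  pmin(pmax(Data[, x], q[1]), q[2])
})

# Name the elements after their source columns and write them back.
names(varlist) <- keeplist
capped <- Data
capped[keeplist] <- varlist
print(capped)
```

Working on a copy (`capped`) keeps the original values available for comparison.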