Univariate statistics for all independent variables in a data frame: mean, median, variance, standard deviation, missing rate, and unique rate.
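As a rough illustration of the statistics listed above, the sketch below computes each of them for a single numeric vector in base R. The vector `x` is made up for illustration, and the definition of unique rate used here (distinct non-missing values divided by the total length) is an assumption, since the description does not define it.

```r
# Hypothetical numeric variable with one missing value.
x <- c(2, 4, 4, NA, 10)

n        <- length(x)        # number of observations, including NA
nmiss    <- sum(is.na(x))    # number of missing values
missrate <- nmiss / n        # missing rate

# Assumed definition: distinct non-missing values over total length.
uniqrate <- length(unique(x[!is.na(x)])) / n

# na.rm = TRUE drops missing values before each statistic is computed.
stats <- c(
    mean     = mean(x, na.rm = TRUE),
    median   = median(x, na.rm = TRUE),
    var      = var(x, na.rm = TRUE),
    sd       = sd(x, na.rm = TRUE),
    missrate = missrate,
    uniqrate = uniqrate
)
print(stats)
```

Without `na.rm = TRUE`, `mean`, `median`, `var` and `sd` all return NA for a vector that contains a missing value.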
Data | Data frame with at least two columns.
keeplist | Names of the independent variables to keep; if missing, all independent variables are used.
intmiss | Value used to fill missing values automatically. The documented default for numeric variables is NA, but it can be set to 0 (a model may have several types of missing values). Note that the function definition in the examples uses 0 when intmiss is not supplied.
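The effect of keeplist and intmiss can be sketched in base R as follows. The data frame `df` and its column names are hypothetical. The selection and fill steps mirror those in the function definition shown in the examples, but this is only a sketch under those assumptions, not the package's own code path.

```r
# Hypothetical data with missing values in the numeric columns.
df <- data.frame(
    age  = c(22, NA, 35),
    fare = c(7.25, 71.28, NA),
    sex  = c("male", "female", "female")
)

keeplist <- c("age", "fare")  # variables to keep
intmiss  <- 0                 # value used to fill missing entries

# drop = FALSE keeps a data frame even if keeplist names a single column.
kept <- df[, keeplist, drop = FALSE]

# Replace every NA in the kept columns with intmiss.
filled <- kept
filled[is.na(filled)] <- intmiss

print(filled)
```

Filling with 0 changes any statistic computed afterwards, so it may be worth computing the missing rate on the unfilled data first.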
test: data is from the Titanic project https://www.kaggle.com/c/titanic/data
traindata <- read.csv('train.csv', header = TRUE, na.strings = c(""))
Data <- subset(traindata, select = c(2, 3, 5, 6, 7, 8, 10, 12))
univ(Data)
##---- Should be DIRECTLY executable !! ----
##-- ==> Define data, use random,
##-- or do help(data=index) for the standard data sets.
## The function is currently defined as
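function (Data, keeplist = NULL, intmiss = NULL)
{
    # Restrict to the requested variables; drop = FALSE keeps a data frame
    # even when only one column is selected.
    if (!is.null(keeplist)) {
        Data <- Data[, keeplist, drop = FALSE]
    }
    # Fill value for missing entries; 0 is used when none is supplied.
    if (is.null(intmiss)) {
        intmiss <- 0
    }
    # Keep numeric columns only.
    nums <- sapply(Data, is.numeric)
    Data <- Data[, nums, drop = FALSE]
    # One row of statistics per variable; na.rm = TRUE so that missing
    # values do not turn every statistic into NA.
    varlist <- lapply(Data, function(x) {
        data.frame(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE),
            var = var(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE),
            nmiss = sum(is.na(x)), n = length(x),
            missrate = sum(is.na(x))/length(x))
    })
    transfvar <- do.call(rbind, varlist)
    # Missing values are filled in a local copy only; it is not returned.
    filldata <- Data
    filldata[is.na(filldata)] <- intmiss
    return(transfvar)
}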