univ: Univariate calculation for a data frame

Description

Univariate calculation for all independent variables in a data frame: mean, median, variance, standard deviation, missing rate and unique rate.
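
As a rough illustration of the statistics listed above, the following sketch computes them for a single numeric vector in base R. The helper name `describe_one` is hypothetical and not part of the package. Excluding missing values with `na.rm = TRUE` is an assumption, as is the definition of unique rate as the number of distinct non-missing values divided by the number of observations.

```r
# Hypothetical helper, not part of the package: statistics for one vector.
describe_one <- function(x) {
  n <- length(x)
  c(mean     = mean(x, na.rm = TRUE),
    median   = median(x, na.rm = TRUE),
    var      = var(x, na.rm = TRUE),
    sd       = sd(x, na.rm = TRUE),
    missrate = sum(is.na(x)) / n,                  # share of missing values
    uniqrate = length(unique(x[!is.na(x)])) / n)   # assumed definition
}

x <- c(1, 2, 2, NA, 5)
stats <- describe_one(x)
print(stats)
```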

Usage

univ(Data, keeplist = NULL, intmiss = NULL)

Arguments

Data

data frame with at least two columns

keeplist

Names of the independent variables to keep; if missing, all independent variables are used.

intmiss

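Value used to fill missing values automatically. The default for numeric variables is described as NA, but 0 can be used instead (a model may have several types of missing value). In the definition listed under Examples, a NULL intmiss is replaced by 0.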
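
The effect of the two optional arguments can be sketched on a toy data frame. The data frame `toy` and its column names are invented for illustration, and the fill step assumes `intmiss = 0`; this shows the idea only and is not the package's own code.

```r
# Toy data, invented for illustration.
toy <- data.frame(age  = c(22, NA, 35),
                  fare = c(7.25, 71.28, NA),
                  sex  = c("m", "f", "f"))

# keeplist: restrict the data to the named variables.
# drop = FALSE keeps a data frame even if only one name is given.
keeplist <- c("age", "fare")
kept <- toy[, keeplist, drop = FALSE]

# intmiss: replace missing values with the chosen fill value (0 here).
intmiss <- 0
filled <- kept
filled[is.na(filled)] <- intmiss
print(filled)
```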

Value

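A data frame with one row per numeric variable and the columns mean, median, var, sd, nmiss, n and missrate.

Test: the data are from the Titanic project, https://www.kaggle.com/c/titanic/data

traindata <- read.csv('train.csv', header=TRUE, na.strings=c(""))
Data <- subset(traindata, select=c(2,3,5,6,7,8,10,12))
univ(Data)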

Examples

##---- Should be DIRECTLY executable !! ----
##-- ==>  Define data, use random,
##--	or do  help(data=index)  for the standard data sets.

## The function is currently defined as
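function (Data, keeplist = NULL, intmiss = NULL)
{
    # Restrict to the requested variables; drop = FALSE keeps a data
    # frame even when a single column is selected.
    if (!is.null(keeplist)) {
        Data <- Data[, keeplist, drop = FALSE]
    }
    # A NULL intmiss is replaced by 0.
    if (is.null(intmiss)) {
        intmiss <- 0
    }
    # Keep only the numeric columns.
    nums <- sapply(Data, is.numeric)
    Data <- Data[, nums, drop = FALSE]
    # One set of statistics per variable. mean, median, var and sd are
    # called without na.rm = TRUE, so they are NA for any column that
    # contains missing values.
    varlist <- sapply(Data, function(x) {
        data.frame(mean = mean(x), median = median(x), var = var(x),
            sd = sd(x), nmiss = sum(is.na(x)), n = length(x),
            missrate = sum(is.na(x))/length(x))
    })
    transfvar <- data.frame(t(varlist))
    # Fill missing values with intmiss. The filled copy stays local to
    # the function; only the statistics table is returned.
    filldata <- Data
    filldata[is.na(filldata)] <- intmiss
    return(transfvar)
  }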
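
Because the definition above calls mean, median, var and sd without `na.rm = TRUE`, any variable that contains missing values gets NA for those statistics. A minimal sketch of an NA-aware variant follows. The name `univ_na` is hypothetical, the use of the built-in `airquality` data set is only for demonstration, and this is one possible way to write such a summary rather than the package's implementation.

```r
# Hypothetical NA-aware variant; not the package's own code.
univ_na <- function(Data, keeplist = NULL) {
  if (!is.null(keeplist)) {
    Data <- Data[, keeplist, drop = FALSE]
  }
  # Keep numeric columns only; drop = FALSE preserves the data frame.
  nums <- vapply(Data, is.numeric, logical(1))
  Data <- Data[, nums, drop = FALSE]
  rows <- lapply(Data, function(x) {
    data.frame(mean     = mean(x, na.rm = TRUE),
               median   = median(x, na.rm = TRUE),
               var      = var(x, na.rm = TRUE),
               sd       = sd(x, na.rm = TRUE),
               nmiss    = sum(is.na(x)),
               n        = length(x),
               missrate = mean(is.na(x)))
  })
  # Stack the one-row data frames; row names are the variable names.
  do.call(rbind, rows)
}

# airquality ships with base R and has missing values in some columns.
res <- univ_na(airquality)
print(res[, c("mean", "nmiss", "missrate")])
```

Building the result with `lapply` and `rbind` gives ordinary numeric columns, which is easier to filter or sort than the list columns produced by transposing the output of `sapply`.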

billyuanyao/WOECredit documentation built on May 28, 2019, 7:11 p.m.