dataPreprocess: Data Preprocess for Fast Linear Regression


View source: R/LinearRegression.R

Description

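Splits a data frame into training and test sets for fast linear regression. The selected feature columns are converted to a matrix with a leading column of ones for the intercept, and the target column is returned separately.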

Usage

dataPreprocess(data, features, target, training_part = 0.8, seed = 200)

Arguments

data

A data.frame containing the feature and target columns.

features

A character vector with the names of the columns to use as predictors.

target

A string with the name of the target column. Only a single target is supported in this package.

training_part

The fraction of rows assigned to the training set (default 0.8). The remaining rows form the test set.

seed

The random seed, used to make the split reproducible (default 200).
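How training_part and seed interact can be illustrated with a small base-R sketch. The helper split_rows below is hypothetical and is not part of the package; it assumes the seed is applied with set.seed() before the rows are sampled, and it uses the built-in mtcars data purely as an example.

```r
# Hypothetical stand-in for the row-splitting step; not the package's own code.
split_rows <- function(n, training_part = 0.8, seed = 200) {
  set.seed(seed)                       # assumed: the seed fixes the random draw
  n_train <- floor(training_part * n)  # number of rows kept for training
  sample(seq_len(n), n_train)          # row indices of the training set
}

idx_a <- split_rows(nrow(mtcars))
idx_b <- split_rows(nrow(mtcars))

identical(idx_a, idx_b)  # TRUE: the same seed gives the same split
length(idx_a)            # 25, i.e. floor(0.8 * 32)
```

Changing seed gives a different but equally reproducible split, while training_part only changes how many rows land in the training set.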

Value

A list with four elements:

X_train

The feature matrix used to train the linear regression model, including a leading column of ones for the intercept.

Y_train

The target values used to train the linear regression model.

X_test

The feature matrix used for prediction and for computing RMSE, including a leading column of ones for the intercept.

Y_test

The target values used for prediction and for computing RMSE.
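As a rough illustration of how the four returned elements might be used, the sketch below builds stand-ins for them by hand from mtcars and then fits ordinary least squares through the normal equations. The column choices (wt, hp, mpg) and the fitting approach are assumptions made for this example; the package's own fitting routine may differ.

```r
# Build stand-ins for the four elements from mtcars (assumed example data).
features <- c("wt", "hp")
target <- "mpg"

set.seed(200)
index <- sample(seq_len(nrow(mtcars)), floor(0.8 * nrow(mtcars)))

X <- cbind(intercept = 1, as.matrix(mtcars[, features]))  # leading column of ones
Y <- mtcars[[target]]

X_train <- X[index, ]
Y_train <- Y[index]
X_test <- X[-index, ]
Y_test <- Y[-index]

# Ordinary least squares via the normal equations: (X'X) beta = X'Y.
beta <- solve(crossprod(X_train), crossprod(X_train, Y_train))

# Root mean squared error on the held-out rows.
rmse <- sqrt(mean((Y_test - X_test %*% beta)^2))
```

Because the intercept column is already part of the feature matrix, no separate intercept term needs to be handled when solving for the coefficients.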

Author(s)

Li (Richard) Liu

Examples


## The function is currently defined as
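function (data, features, target, training_part = 0.8, seed = 200)
{
    # select() comes from dplyr, which must be attached
    X = select(data, features)
    Y = select(data, target)
    X = as.matrix(X)
    Y = as.matrix(Y)
    # prepend a column of ones for the intercept term
    X_row = nrow(X)
    seq_1 = rep(1, X_row)
    X = cbind(seq_1, X)
    # assumed: seed is applied here so the random split is reproducible
    set.seed(seed)
    # draw the training rows; the remaining rows form the test set
    index = sample(1:nrow(data), training_part * nrow(data))
    X_train = X[index, ]
    X_test = X[-index, ]
    Y_train = Y[index, ]
    Y_test = Y[-index, ]
    return(list(X_train = X_train, X_test = X_test, Y_train = Y_train,
        Y_test = Y_test))
}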

WeakCha/BIOSTAT625_HW4 documentation built on Dec. 18, 2021, 7:16 p.m.