dataPreprocess: Data Preprocess for Fast Linear Regression
In WeakCha/BIOSTAT625_HW4: Linear Regression

View source: R/LinearRegression.R

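Preprocesses data for fast linear regression. The selected feature columns are converted to a matrix with a leading column of ones for the intercept, and the rows are randomly split into a training set and a test set.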

dataPreprocess(data, features, target, training_part = 0.8, seed = 200)

`data`	A `data.frame` containing the feature and target columns.
`features`	A character vector with the names of the columns to use as predictors.
`target`	A string with the name of the target column. This package supports only a single target.
`training_part`	The fraction of rows assigned to the training set (default 0.8).
`seed`	The random seed, used to make the split reproducible (default 200).
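
The role of `seed` and `training_part` can be illustrated with a small base R sketch. It does not call the package; the row count `n` is a hypothetical value, and the sampling call mirrors the one in the function body shown in the Examples, with an explicit `floor()` added here for clarity.

```r
n <- 10                 # hypothetical number of rows
training_part <- 0.8    # fraction of rows used for training
seed <- 200

# Setting the same seed before each draw yields the same training rows.
set.seed(seed)
first <- sample(seq_len(n), floor(training_part * n))

set.seed(seed)
second <- sample(seq_len(n), floor(training_part * n))

identical(first, second)  # TRUE
```

Rows not drawn into the index form the test set, so a fixed seed fixes both parts of the split.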

`X_train`	The feature matrix used to train the linear regression model, with a leading column of ones for the intercept.
`Y_train`	The target values used to train the linear regression model.
`X_test`	The feature matrix used for prediction and for computing RMSE, with the same columns as `X_train`.
`Y_test`	The target values used for prediction and for computing RMSE.
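
As a rough illustration of how the four returned pieces might be used, the sketch below builds an equivalent list with base R on the built-in `mtcars` data, fits ordinary least squares through the normal equations, and computes the test RMSE. The column choices (`wt`, `hp`, `mpg`) and the names `parts`, `beta`, `pred`, and `rmse` are assumptions made for this example, and the fitting step is not necessarily how the package itself fits the model.

```r
# Hypothetical stand-in for the documented return value, base R only.
features <- c("wt", "hp")   # assumed predictor columns
target <- "mpg"             # assumed target column

# Feature matrix with a leading column of ones for the intercept.
X <- cbind(intercept = 1, as.matrix(mtcars[, features]))
Y <- mtcars[[target]]

set.seed(200)
n <- nrow(mtcars)
index <- sample(seq_len(n), floor(0.8 * n))

parts <- list(
  X_train = X[index, , drop = FALSE],
  X_test  = X[-index, , drop = FALSE],
  Y_train = Y[index],
  Y_test  = Y[-index]
)

# Ordinary least squares via the normal equations: (X'X) beta = X'Y.
beta <- solve(crossprod(parts$X_train),
              crossprod(parts$X_train, parts$Y_train))

# Predict on the held-out rows and compute the root mean squared error.
pred <- parts$X_test %*% beta
rmse <- sqrt(mean((parts$Y_test - pred)^2))
```

Because the intercept column is already part of `X_train`, no separate intercept term is needed when solving for `beta`.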

Li (Richard) Liu

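## The function is defined along these lines. It requires dplyr for select().
## The seed handling is an assumption added to match the Usage signature.
function (data, features, target, training_part = 0.8, seed = 200)
{
    # Extract the predictor and target columns and convert them to matrices.
    X = dplyr::select(data, dplyr::all_of(features))
    Y = dplyr::select(data, dplyr::all_of(target))
    X = as.matrix(X)
    Y = as.matrix(Y)
    # Prepend a column of ones so the model includes an intercept.
    X_row = nrow(X)
    seq_1 = rep(1, X_row)
    X = cbind(seq_1, X)
    # Fix the random seed so the train/test split is reproducible.
    set.seed(seed)
    # Draw the training rows; the remaining rows form the test set.
    index = sample(1:nrow(data), floor(training_part * nrow(data)))
    X_train = X[index, ]
    X_test = X[-index, ]
    Y_train = Y[index, ]
    Y_test = Y[-index, ]
    return(list(X_train = X_train, X_test = X_test, Y_train = Y_train,
        Y_test = Y_test))
}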