In DSCI-310/DSCI-310-Group-6-Package: Helper Functions for DSCI-310-Group-6

package6

The goal of package6 is to provide helper functions for data analysis projects similar to the New Taipei City Real Estate Value Prediction project.

Installation

You can install the development version of package6 from GitHub with:

# install.packages("devtools")
devtools::install_github("DSCI-310/DSCI-310-Group-6-Package")

Usage

`format_column_names()`

Raw data sets sometimes have column names that contain blank spaces.

# check.names = FALSE keeps the spaces in the column names
df <- data.frame(`col Name1` = c(1, 2), `col Name2` = c("3", "4"), check.names = FALSE)
df
#>   col Name1 col Name2
#> 1         1         3
#> 2         2         4

It is generally a good idea to replace the blank spaces, for example with underscores. In base R this can be done with base::gsub():

names(df) <- gsub(" ", "_", names(df))
df
#>   col_Name1 col_Name2
#> 1         1         3
#> 2         2         4

Notice that gsub() takes three arguments and returns a character vector rather than a data frame, so its result has to be assigned back with names(df) <- ..., which modifies the original data frame. package6::format_column_names() takes only one argument: it leaves the original data frame unchanged and returns a data frame with formatted column names.

library(package6)
df <- data.frame(`col Name1` = c(1, 2), `col Name2` = c("3", "4"), check.names = FALSE)
formatted_df <- format_column_names(df)
formatted_df
#>   col_Name1 col_Name2
#> 1         1         3
#> 2         2         4
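A minimal sketch of how a function with this behaviour could be written, assuming that "formatting" only means replacing blank spaces with underscores (the package's actual rules may differ). The name format_column_names_sketch() is hypothetical. Because R copies a data frame when it is modified inside a function, the caller's df keeps its original names.

```r
# Hypothetical re-implementation for illustration; not the package's source code.
# Assumption: formatting means replacing each blank space with an underscore.
format_column_names_sketch <- function(df) {
  stopifnot(is.data.frame(df))
  # Modifying `df` here changes a local copy (copy-on-modify),
  # so the data frame passed in by the caller is left untouched.
  names(df) <- gsub(" ", "_", names(df), fixed = TRUE)
  df
}

df <- data.frame(`col Name1` = c(1, 2), `col Name2` = c("3", "4"),
                 check.names = FALSE)
formatted_df <- format_column_names_sketch(df)
names(formatted_df)  # [1] "col_Name1" "col_Name2"
names(df)            # [1] "col Name1" "col Name2" (unchanged)
```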

`split_data(dataset, train_perc, vs_prec, test_perc)`

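Splits a data set (data frame) into three parts: training data, variable selection data and test data. The parts are returned as a list with the elements train, cv (the variable selection data) and test.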

library(dplyr)  # provides glimpse()
df <- mtcars
glimpse(df)
#> Rows: 32
#> Columns: 11
#> $ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
#> $ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
#> $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
#> $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
#> $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
#> $ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
#> $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
#> $ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
#> $ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
#> $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
#> $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…

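Now split mtcars using the default arguments. The rows are assigned at random, so the rows you get will differ from those shown below.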

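# Call split_data() once so that all three parts come from the same random split;
# separate calls can produce different splits whose parts overlap.
splits <- split_data(mtcars)
train <- splits$train
cv <- splits$cv
test <- splits$test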
glimpse(train)
#> Rows: 19
#> Columns: 11
#> $ mpg  <dbl> 15.0, 18.7, 21.4, 22.8, 21.4, 15.2, 13.3, 15.2, 10.4, 19.2, 27.3,…
#> $ cyl  <dbl> 8, 8, 4, 4, 6, 8, 8, 8, 8, 6, 4, 8, 4, 4, 8, 4, 8, 4, 8
#> $ disp <dbl> 301.0, 360.0, 121.0, 108.0, 258.0, 304.0, 350.0, 275.8, 472.0, 16…
#> $ hp   <dbl> 335, 175, 109, 93, 110, 150, 245, 180, 205, 123, 66, 180, 97, 65,…
#> $ drat <dbl> 3.54, 3.15, 4.11, 3.85, 3.08, 3.15, 3.73, 3.07, 2.93, 3.92, 4.08,…
#> $ wt   <dbl> 3.570, 3.440, 2.780, 2.320, 3.215, 3.435, 3.840, 3.780, 5.250, 3.…
#> $ qsec <dbl> 14.60, 17.02, 18.60, 18.61, 19.44, 17.30, 15.41, 18.00, 17.98, 18…
#> $ vs   <dbl> 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0
#> $ am   <dbl> 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0
#> $ gear <dbl> 5, 3, 4, 4, 3, 3, 3, 3, 3, 4, 4, 3, 3, 4, 3, 5, 3, 4, 3
#> $ carb <dbl> 8, 2, 2, 1, 1, 2, 4, 3, 4, 4, 1, 3, 1, 1, 4, 2, 3, 2, 4
glimpse(cv)
#> Rows: 7
#> Columns: 11
#> $ mpg  <dbl> 33.9, 24.4, 30.4, 21.5, 26.0, 19.2, 15.2
#> $ cyl  <dbl> 4, 4, 4, 4, 4, 6, 8
#> $ disp <dbl> 71.1, 146.7, 95.1, 120.1, 120.3, 167.6, 275.8
#> $ hp   <dbl> 65, 62, 113, 97, 91, 123, 180
#> $ drat <dbl> 4.22, 3.69, 3.77, 3.70, 4.43, 3.92, 3.07
#> $ wt   <dbl> 1.835, 3.190, 1.513, 2.465, 2.140, 3.440, 3.780
#> $ qsec <dbl> 19.90, 20.00, 16.90, 20.01, 16.70, 18.30, 18.00
#> $ vs   <dbl> 1, 1, 1, 1, 0, 1, 0
#> $ am   <dbl> 1, 0, 1, 0, 1, 0, 0
#> $ gear <dbl> 4, 4, 5, 3, 5, 4, 3
#> $ carb <dbl> 1, 2, 2, 1, 2, 4, 3
glimpse(test)
#> Rows: 6
#> Columns: 11
#> $ mpg  <dbl> 13.3, 21.5, 14.3, 15.2, 17.3, 21.0
#> $ cyl  <dbl> 8, 4, 8, 8, 8, 6
#> $ disp <dbl> 350.0, 120.1, 360.0, 304.0, 275.8, 160.0
#> $ hp   <dbl> 245, 97, 245, 150, 180, 110
#> $ drat <dbl> 3.73, 3.70, 3.21, 3.15, 3.07, 3.90
#> $ wt   <dbl> 3.840, 2.465, 3.570, 3.435, 3.730, 2.620
#> $ qsec <dbl> 15.41, 20.01, 15.84, 17.30, 17.60, 16.46
#> $ vs   <dbl> 0, 1, 0, 0, 0, 0
#> $ am   <dbl> 0, 0, 0, 0, 0, 1
#> $ gear <dbl> 3, 3, 3, 3, 3, 4
#> $ carb <dbl> 4, 1, 4, 2, 3, 4
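A minimal sketch of one way such a three-way split could work, assuming rows are assigned at random and the three percentages are fractions that sum to 1. The function name split_data_sketch(), its argument vs_perc and its default fractions (0.6, 0.2, 0.2) are hypothetical and chosen for illustration only; they are not the package's defaults and give different row counts from the output above. Taking all three parts from one shuffled index vector guarantees that no row lands in more than one part.

```r
# Hypothetical sketch of a random three-way split; not the package's source code.
# Assumptions: percentages are fractions summing to 1, rows are assigned at
# random, and the parts are returned as list(train, cv, test).
split_data_sketch <- function(dataset, train_perc = 0.6, vs_perc = 0.2,
                              test_perc = 0.2) {
  stopifnot(is.data.frame(dataset),
            isTRUE(all.equal(train_perc + vs_perc + test_perc, 1)))
  n <- nrow(dataset)
  shuffled <- sample(n)                      # random permutation of row indices
  n_train <- floor(n * train_perc)
  n_vs <- floor(n * vs_perc)
  train_idx <- shuffled[seq_len(n_train)]
  vs_idx <- shuffled[n_train + seq_len(n_vs)]
  test_idx <- tail(shuffled, n - n_train - n_vs)  # all remaining rows
  list(train = dataset[train_idx, , drop = FALSE],
       cv = dataset[vs_idx, , drop = FALSE],
       test = dataset[test_idx, , drop = FALSE])
}

set.seed(310)                 # makes this illustration reproducible
splits <- split_data_sketch(mtcars)
sapply(splits, nrow)          # train: 19, cv: 6, test: 7
```

Since mtcars has unique row names, you can confirm that the parts are disjoint by checking that the combined row names contain no duplicates.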

`cal_rmse(x,y)`

Calculates the root mean squared error (RMSE) between two numeric vectors, here predicted and actual values.
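For reference, the standard definition of RMSE for n predicted values \hat{y}_i and actual values y_i is shown below; it is assumed, not confirmed from the package source, that cal_rmse() follows it. Applied to the example values that follow, the differences are 1, 1, 4 and -3:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}
= \sqrt{\frac{1^2 + 1^2 + 4^2 + (-3)^2}{4}}
= \sqrt{\frac{27}{4}} \approx 2.598
```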

predicted <- c(12, 5, 19, 3)
actual <- c(11, 4, 15, 6)

cal_rmse(predicted, actual)
#> [1] 2.6
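A minimal sketch of the same calculation in base R, assuming both inputs are numeric vectors of equal length with no missing values. The name rmse_sketch() is hypothetical. The unrounded result is 2.598076; the 2.6 printed by cal_rmse() suggests the package may round its result, but that is a guess, so the sketch leaves rounding to the caller.

```r
# Hypothetical sketch; not the package's source code.
# Assumes x and y are numeric vectors of equal length with no missing values.
rmse_sketch <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  sqrt(mean((x - y)^2))   # square root of the mean squared difference
}

predicted <- c(12, 5, 19, 3)
actual <- c(11, 4, 15, 6)
rmse_sketch(predicted, actual)             # [1] 2.598076
round(rmse_sketch(predicted, actual), 2)   # [1] 2.6
```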

Dependencies

| Package   | Version |
|-----------|---------|
| tidyverse | 1.3.1   |

DSCI-310/DSCI-310-Group-6-Package documentation built on April 21, 2022, 3:55 a.m.
