README.md

1. Contents

2. Documentation with R

2.1. Installation

To use the DataCleaning package, run the following lines in your R script:

install.packages("devtools")
devtools::install_github("https://github.com/christophperrins/Documentation-with-R")
library(DataCleaning)

2.2. Usage

The library includes 3 functions:

- removeNaRows
- normalise
- normaliseDf

The removeNaRows function removes rows that contain NA values. To check only certain columns, pass their names as a parameter.
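As a minimal sketch of how this behaves, the snippet below copies the removeNaRows definition from section 3.2 so it runs without installing the package. The `drop = FALSE` argument and the example data frame `df` are additions made for this illustration, not part of the package.

```r
# Copy of removeNaRows from section 3.2 so the snippet runs on its own.
# Assumption: drop = FALSE is added so that selecting one column keeps a data frame.
removeNaRows <- function(dataframe, applicableColumns = c()) {
  if (length(applicableColumns) == 0) {
    isNa <- is.na(dataframe)
  } else {
    isNa <- is.na(dataframe[, applicableColumns, drop = FALSE])
  }
  rowsWithoutNa <- rowSums(isNa) == 0
  dataframe[rowsWithoutNa, ]
}

# Made-up example data: row 2 has NA in A, row 3 has NA in B.
df <- data.frame(A = c(1, NA, 3), B = c("x", "y", NA))

cleanedAll <- removeNaRows(df)       # checks every column, keeps only row 1
cleanedA   <- removeNaRows(df, "A")  # checks only column A, keeps rows 1 and 3
```

Passing a column name narrows the check, so a row with NA in an unlisted column is kept.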

The normalise function rescales data so that no field is given a higher weighting than another.

The normaliseDf function normalises the entire dataframe. Columns that don't need to be normalised can be passed as a parameter.
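This README does not say which normalisation method the package uses. As a rough illustration of the idea only, the sketch below assumes min-max scaling to the range 0 to 1. The names `minMaxNormalise`, `minMaxNormaliseDf` and `excludedColumns` are hypothetical and may not match the package's actual functions or parameters.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1].
minMaxNormalise <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    return(rep(0, length(x)))  # constant column: avoid dividing by zero
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical helper: rescale every column except the excluded ones.
minMaxNormaliseDf <- function(dataframe, excludedColumns = c()) {
  targets <- setdiff(names(dataframe), excludedColumns)
  dataframe[targets] <- lapply(dataframe[targets], minMaxNormalise)
  dataframe
}

# Made-up data: income is on a much larger scale than age.
people <- data.frame(id = c(1, 2, 3), age = c(20, 30, 40), income = c(1000, 3000, 2000))
scaled <- minMaxNormaliseDf(people, excludedColumns = "id")
```

After rescaling, age and income share the same range, so neither dominates a distance calculation purely because of its units.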

For more information, see the help documentation, e.g. ?removeNaRows

3. How to create my own R documentation

3.1. Create a Package

To start, create an R package. This can be done easily with RStudio:

File > New Project... > New Directory > R Package > Type: Package, Package Name: DataCleaning, Create project as subdirectory of: C:/Users/<User>/Desktop.

A folder called DataCleaning should now exist on your Desktop. Inside the folder there are the following files:

- NAMESPACE - This allows R packages to talk to one another using import() and export(). This is an advanced topic, so I would recommend deleting the file - we will create an autogenerated NAMESPACE file later with Roxygen.
- DESCRIPTION - Information about the package, who owns and maintains the package, as well as dependencies on other packages.
- .Rbuildignore - When a package is bundled, any files listed in here will not be bundled in the package. Similar to a .gitignore.

There will also be two folders:

- man - for man(uals). Here is where help documentation will live.
- R - for your R scripts. This is where your functions will live.
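If RStudio is not available, a similar layout can be produced from the console. The sketch below uses base R's utils::package.skeleton() in a temporary directory. This is an alternative to the wizard described above, not what RStudio runs, and the exact set of generated files may differ slightly (for example, no .Rbuildignore is created). The placeholder function body is only there so that the skeleton has something to contain.

```r
# Build the skeleton in a temporary directory so nothing on the Desktop is touched.
root <- tempfile("pkgdemo")
dir.create(root)

# package.skeleton() needs at least one object to put into the package.
# Placeholder body (assumption for this sketch): returns its input unchanged.
env <- new.env()
assign("removeNaRows",
       function(dataframe, applicableColumns = c()) dataframe,
       envir = env)

utils::package.skeleton(
  name = "DataCleaning",
  list = "removeNaRows",
  environment = env,
  path = root
)

pkgDir <- file.path(root, "DataCleaning")
list.files(pkgDir)  # lists the generated files and folders
```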

3.2. Create a function

Write a function and save it as an R script in the "R" folder.

For instance:

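removeNaRows <- function(dataframe, applicableColumns=c()) {
  if (length(applicableColumns)==0) {
    # No columns given: check every column for NA values
    isNa <- is.na(dataframe)
  } else {
    # drop = FALSE keeps a data frame even when only one column is selected,
    # which rowSums() needs
    isNa <- is.na(dataframe[, applicableColumns, drop = FALSE])
  }
  # Keep only the rows in which none of the checked values is NA
  rowsWithoutNa <- rowSums(isNa) == 0
  dataframe[rowsWithoutNa, ]
}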
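One detail worth knowing when selecting columns with `dataframe[, columns]`: if only a single column is selected, R drops the result to a plain vector by default, and rowSums() then fails because it needs two dimensions. The sketch below shows the difference on a made-up data frame `df`.

```r
# Made-up example data.
df <- data.frame(A = c(1, NA, 3), B = c("x", "y", NA))

oneColumn <- df[, "A"]                # a plain numeric vector, dimensions dropped
keptFrame <- df[, "A", drop = FALSE]  # still a data frame with one column

is.data.frame(oneColumn)
is.data.frame(keptFrame)

# rowSums() needs two dimensions, so only the drop = FALSE version works here.
rowsWithoutNa <- rowSums(is.na(keptFrame)) == 0
```

Selecting two or more columns always returns a data frame, so the issue only appears with a single column name.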

3.3. Adding Documentation with Roxygen comments (#')

Let's give it some documentation. Documentation can be written manually and added to the man folder, as long as the file shares the function's name and is an .Rd file. However, the syntax for writing such files is quite frustrating, and most documentation is written with Roxygen.
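To give a feel for the manual route, here is a sketch of a hand-written .Rd file for removeNaRows. It is written to a temporary file and parsed with the tools package that ships with R. The wording of the help text is an assumption made for this example; in a real package the file would live at man/removeNaRows.Rd.

```r
# Hand-written Rd source; backslashes are doubled because these are R strings.
rdLines <- c(
  "\\name{removeNaRows}",
  "\\alias{removeNaRows}",
  "\\title{Remove rows with NA values from a dataframe}",
  "\\usage{",
  "removeNaRows(dataframe, applicableColumns = c())",
  "}",
  "\\arguments{",
  "  \\item{dataframe}{a dataframe}",
  "  \\item{applicableColumns}{column names to check, other columns are ignored}",
  "}",
  "\\value{The dataframe without the rows that contain NA values}"
)

rdFile <- file.path(tempdir(), "removeNaRows.Rd")
writeLines(rdLines, rdFile)

# parse_Rd() parses the Rd file into an R object of class "Rd".
parsed <- tools::parse_Rd(rdFile)
tools::checkRd(rdFile)  # reports problems in the file, if any
```

Every section needs its own macro and matching braces, which is the bookkeeping that Roxygen comments take care of.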

A Roxygen comment uses #' instead of just #.

There are many tags available for documenting your code. To see them, simply type:

#' @

followed by Ctrl+Space.

More information about Roxygen documentation can be found in the roxygen2 package documentation.

I decorated my function with the following documentation:

- The first line - implicitly calls @title. The tag can be added to be explicit, but many authors prefer the implicit version.
- The second line - implicitly calls @description. Again, the tag can be added if the author wants to be explicit.
- @param - describes what each of the arguments being passed should be.
- @return - describes the value that the function returns.
- @examples - shows with examples how to use the function.
- @export - tells Roxygen to add this function to the NAMESPACE file.

After adding in my Roxygen comments my removeNaRows.R file now looks like this:

#' Remove rows with NA values from a dataframe
#'
#' The function will remove all rows which contain NA values.
#' Specific columns can be cleaned whilst ignoring the other columns
#' @param dataframe
#' a dataframe
#' @param applicableColumns
#' a vector of column names which should be checked; other columns are ignored
#' @return The dataframe with rows which have NA values removed
#' @examples
#' removeNaRows(data)
#' removeNaRows(read.csv("train.csv"))
#' removeNaRows(data, c("A", "B", "C"))
#' @export
removeNaRows <- function(dataframe, applicableColumns=c()) {
  if (length(applicableColumns)==0) {
    isNa <- is.na(dataframe)
  } else {
    # drop = FALSE keeps a data frame even when only one column is selected
    isNa <- is.na(dataframe[, applicableColumns, drop = FALSE])
  }
  rowsWithoutNa <- rowSums(isNa) == 0
  dataframe[rowsWithoutNa, ]
}

3.4. Creating help documentation

It is time to get Roxygen to run through our code and create the help docs.

Be sure to delete the NAMESPACE file in your package first. roxygen2 will not overwrite a NAMESPACE file that it did not generate itself.

In the RStudio console run the following lines of code:

install.packages("devtools")
devtools::document()

The document() function in devtools calls the roxygenize() function in roxygen2.

It runs through the R scripts, finds the Roxygen comments attached to each function, and creates the corresponding .Rd help files in the man folder.

Once it's complete, you should be able to see the help docs associated with the function:

?removeNaRows

3.5. Troubleshooting

Q: It keeps updating and loading my package!

A: You have a call to devtools::document() inside one of your package's scripts. The document() function runs over your code, which calls document() again, which runs over your code again, and so on. Remove the call from the script and run it from the console instead.

Q: My help documentation shows everything twice

A: Close the file, restart RStudio, then install your package and open the help file again. It should be fine now.



christophperrins/Documentation-with-R documentation built on Nov. 4, 2019, 8:51 a.m.