In thisisdaryn/workshop:

options(htmltools.dir.version = FALSE)
knitr::opts_chunk$set(fig.align="center", fig.width=5, fig.height=5, warning = FALSE, message = FALSE)

library(xaringanthemer)
duo_accent(
  primary_color = "ivory",
  secondary_color = "#310A31",
  header_font_google = google_font("Roboto", "400"),
  text_font_google   = google_font("Lato", "300"),
  code_font_family = "Fira Code",
  code_font_url = "https://cdn.rawgit.com/tonsky/FiraCode/1.204/distr/fira_code.css",
  header_color = "#f54278",
  title_slide_text_color = "#354a66"
)

A note

These slides make use of the chi_emps data set contained in the workshop package.

library(workshop)
data(chi_emps)

Getting summary information

How big is the data set?

dim(chi_emps)

What are the names of the columns?

names(chi_emps)

You can also use summary or View to get more information.

summary

What happens when you run the following?

summary(chi_emps)

(View)

And what about this?

View(chi_emps)

Selecting only some columns

In base R, [] is used to subset data. For a two-dimensional data structure, the row conditions and the column conditions go inside the [], separated by a comma: rows first, then columns.

For example, we can select the first 5 rows and the first 4 columns of the chi_emps data frame:

chi_emps[c(1:5), c(1:4)]

Selecting columns by name

It's often more practical to select columns by name, however. The following code:

- keeps only the Name, Dept and AnnSalary columns
- displays the dimensions of the smaller data set

chi2 <- chi_emps[, c("Name", "Dept", "AnnSalary")]
dim(chi2)

Note that there is no row constraint in the first line, as we intend to keep all rows of the data.

Rows can also be selected using logical operators

How do we select all the rows where the annual salary is in the interquartile range?

The code below:

- keeps only the rows where the salary is between $75,408 and $97,440 (inclusive)
- displays the first few rows

midsal <- chi2[chi2$AnnSalary >= 75408 & chi2$AnnSalary <= 97440, ]
head(midsal)

Note that there is no column constraint, as we are keeping all the columns.

What are the other logical operators that can be used in filtering rows?

The previous slide made use of >= and <=. Several other operators are available for filtering:

Operator <- c("==", "!=", "<", "<=", ">", ">=", "|", "&", "!")
Meaning <- c("equal", "not equal", "less than", "less than or equal to",
             "greater than", "greater than or equal to",
             "Or: at least one of the expressions is true",
             "And: Both expressions are true",
             "Not: the expression is not true")
optable <- data.frame(Operator, Meaning)
kableExtra::kable(optable)

Some more examples

How would we get a data set with only hourly paid employees?

hourly <- chi_emps[chi_emps$SalHour == "Hourly", ]

Following up on that

How would we get employees that meet both of the following:

- have an hourly rate of more than $20/hr
- have typical hours of more than 25

df <- hourly[(hourly$HourlyRate > 20) & (hourly$TypicalHours > 25), ]
head(df)

Another example

How would we get employees that meet either of the following:

- the Department is PUBLIC LIBRARY
- the employee is Part-time (FullPart: P)

lib_part <- chi_emps[chi_emps$Dept == "PUBLIC LIBRARY" | chi_emps$FullPart == "P", ]
head(lib_part)

Using $ to reference columns

Another means of subsetting data that we would have seen before is the $ operator:

hist(chi_emps$HourlyRate)

We have been using it to extract single columns from the data.

$ can be used to create new columns

We can also create new columns in the data set. For example, here's a new column that says whether or not the annual salary is greater than $100,000:

chi2$sal_gt100 <- ifelse(chi2$AnnSalary > 100000, TRUE, FALSE)
head(chi2)

Explaining ifelse

The ifelse function is very handy and has three arguments:

- an expression to evaluate
- a result if the expression is TRUE
- a result if the expression is FALSE

thisisdaryn/workshop documentation built on Jan. 17, 2020, 7:31 p.m.
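Returning to the ifelse slide above, the three arguments can be seen at work in a minimal sketch. It uses a small made-up data frame (toy, with invented names and salaries) rather than chi_emps, so it runs without the workshop package; the column names sal_gt100_direct and band are hypothetical and chosen only for illustration.

```r
# Toy data frame standing in for chi2; the names and salaries are made up.
toy <- data.frame(
  Name = c("A", "B", "C", "D"),
  AnnSalary = c(52000, 101000, 98000, 120500),
  stringsAsFactors = FALSE
)

# ifelse(test, yes, no) is vectorised: the test is evaluated for every row
# and one result is returned per row.
toy$sal_gt100 <- ifelse(toy$AnnSalary > 100000, TRUE, FALSE)

# The comparison on its own already returns a logical vector,
# so this gives the same column without ifelse.
toy$sal_gt100_direct <- toy$AnnSalary > 100000

# ifelse earns its keep when the two results are not simply TRUE and FALSE.
toy$band <- ifelse(toy$AnnSalary > 100000, "over 100k", "100k or under")

print(toy)
```

When the two results are just TRUE and FALSE, the bare comparison is enough; ifelse is the better fit when each outcome should map to some other value, such as a label.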
