Version Note: Up-to-date with v0.3.0
knitr::opts_chunk$set(message = FALSE, warning = FALSE, comment = NA)
old.hooks <- fansi::set_knit_hooks(knitr::knit_hooks)
library(psycModel)
library(dplyr)
This article briefly introduces the dplyr::select syntax and how it is applied in this package. In the first section, I will briefly describe the dplyr::select syntax for R beginners. If you are already familiar with it, you can skip to the next section, where I describe how to apply the syntax in this package.
dplyr::select (abbreviated as select hereafter) is an extremely powerful function in R. It lets you subset columns with a set of syntax that is also known as the select syntax / semantics in the R community. A side note: with the introduction of the dplyr::across function, the select syntax can also be applied inside dplyr::mutate and dplyr::filter, which makes these two already powerful functions even more powerful.

I will first introduce the usage of :, c() and -. Then, I will discuss how to use everything, starts_with, ends_with, contains, and where. This is not an exhaustive list of the select syntax, but these are the most relevant ones. If you want to learn more, I encourage you to check the dplyr vignettes or search online; there are plenty of articles that discuss this in detail.
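The side note about across can be sketched as follows, using the built-in iris data. This is a minimal illustration rather than anything specific to this package. For filter, recent versions of dplyr recommend the companion functions if_all and if_any in place of across, so the sketch uses if_all; the object names rounded and large_sepal are only for illustration.

```r
library(dplyr)

# Apply round() to every numeric column, chosen with select syntax.
rounded <- iris %>% mutate(across(where(is.numeric), round))

# Keep only the rows where every column starting with "Sepal" is greater than 3.
# if_all() plays the role of across() inside filter().
large_sepal <- iris %>% filter(if_all(starts_with("Sepal"), ~ .x > 3))

head(rounded, 1)
nrow(large_sepal)
```

The column selection inside across and if_all uses exactly the helpers described in the rest of this section.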
I am going to use the iris dataset for the demonstration. Let's take a quick peek at the dataset.
iris %>% head() # head() shows the first 6 rows of the data frame
If you want to select the first 3 columns, you can use : to do that, either by position or by name:
iris %>% select(1:3) %>% head(1)
iris %>% select(Sepal.Length:Petal.Length) %>% head(1)
Next, if you want to combine selections, you can use c(). For example, to select the 1st, 3rd and 4th columns, you can do it like this:
iris %>% select(c(1, 3:4)) %>% head(1)
iris %>% select(Sepal.Length, Petal.Length:Petal.Width) %>% head(1)
Finally, if you want to drop a column from the selection, you can use -. For example, to select all columns except the 3rd column, you can do it like this:
iris %>% select(1:5, -3) %>% head(1)
iris %>% select(Sepal.Length:Species, -Petal.Length) %>% head(1)
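As a quick check that selecting by position and selecting by name are interchangeable, the sketch below compares the two results. It also shows !, which takes the complement of a selection and is another way to drop a column. The object names are only for illustration.

```r
library(dplyr)

by_position <- iris %>% select(1:5, -3)
by_name <- iris %>% select(Sepal.Length:Species, -Petal.Length)
by_negation <- iris %>% select(!Petal.Length) # everything except Petal.Length

# Both comparisons should return TRUE.
identical(by_position, by_name)
identical(by_name, by_negation)

names(by_name)
```

Selecting by name is usually the safer choice, because it keeps working if the column order changes.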
OK, now you understand the basic usage. Let's get to something a little more advanced. First, let's talk about my favorite, everything. As the name implies, it selects all the variables in the data frame. Inside the select function, it is usually combined with c(). However, it is very powerful in other use cases, like the ones in this package. For example, if you want to fit a linear regression with all the variables, you can use everything (a more detailed discussion is presented in the next section).
# select all columns
iris %>% select(everything()) %>% head(1)
# select everything except Sepal.Width
iris %>% select(c(everything(), -Sepal.Width)) %>% head(1)
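Another common use of everything inside select is to move a column to the front. A column that has already been selected is not repeated, so everything fills in the remaining columns in their original order. A minimal sketch (the name reordered is only for illustration):

```r
library(dplyr)

# Put Species first, followed by all remaining columns.
reordered <- iris %>% select(Species, everything())
names(reordered)
```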
Next, we can talk about starts_with. starts_with selects all columns whose names start with a specified string. For example, to select all columns that start with Sepal, we can do something like this:
iris %>% select(starts_with('Sepal')) %>% head(1)
Similar to starts_with, ends_with selects all columns whose names end with a specified string. For example, we can select all columns that end with Width:
iris %>% select(ends_with('Width')) %>% head(1)
Next, we are going to talk about contains. As the name implies, it selects all columns whose names contain a specified string.
iris %>% select(contains('Sepal')) %>% head(1) # same result as starts_with('Sepal') here
iris %>% select(contains('Width')) %>% head(1) # same result as ends_with('Width') here
iris %>% select(contains('.')) %>% head(1) # columns whose names contain "." are selected
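One detail worth knowing is that contains matches its argument as a literal string, which is why contains('.') only picks up names with an actual dot. Its sibling matches treats the argument as a regular expression, where an unescaped dot matches any character. The sketch below contrasts the two; the object names are only for illustration.

```r
library(dplyr)

# Literal match: only columns with a "." in their name.
literal_dot <- iris %>% select(contains(".")) %>% names()

# Regular expression: "." matches any character, so every column is selected.
regex_any <- iris %>% select(matches(".")) %>% names()

# Escaping the dot makes matches() behave like the literal version.
regex_dot <- iris %>% select(matches("\\.")) %>% names()

literal_dot
regex_any
regex_dot
```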
Finally, we are going to conclude this section with where. where is not used alone. It is usually paired with a function that returns TRUE or FALSE. I think the most common use case in this package is pairing it with is.numeric. where(is.numeric) will select all numeric variables. A little tip: you need to pass is.numeric instead of is.numeric(). I will not go into the details of why because it is outside the scope of this article; it requires a slightly more advanced understanding of how functions work in R. If you have that, you wouldn't be reading this article anyway.
iris %>% select(where(is.numeric)) %>% head(1)
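For readers who do want a short version of the reason: is.numeric without parentheses is the function itself, and where calls it once per column. Any function that returns a single TRUE or FALSE for a column can be used in the same way, including one you write yourself. The sketch below is only an illustration; the cutoff of 3.5 and the object names are arbitrary choices.

```r
library(dplyr)

# is.numeric (no parentheses) is a function object that where() can call.
is.function(is.numeric)

# Another built-in predicate: select the factor columns.
factor_cols <- iris %>% select(where(is.factor)) %>% names()

# A custom predicate: numeric columns whose mean is greater than 3.5.
big_mean_cols <- iris %>%
  select(where(function(col) is.numeric(col) && mean(col) > 3.5)) %>%
  names()

factor_cols
big_mean_cols
```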
First, I will demonstrate the usage with linear regression. I will first create a data frame. You don't need to know anything about how this data frame is created. Just know that it has 1 DV / outcome / response variable (i.e., y) and 5 IVs / predictor variables (i.e., x1 to x5).
set.seed(1)
test_data = data.frame(y = rnorm(n = 100, mean = 2, sd = 3),
                       x1 = rnorm(n = 100, mean = 1.5, sd = 4),
                       x2 = rnorm(n = 100, mean = 1.7, sd = 4),
                       x3 = rnorm(n = 100, mean = 1.5, sd = 4),
                       x4 = rnorm(n = 100, mean = 2, sd = 4),
                       x5 = rnorm(n = 100, mean = 1.5, sd = 4))
OK, let's fit that linear regression now.
# Without this package:
model1 = lm(data = test_data, formula = y ~ x1 + x2 + x3 + x4 + x5)
# With this package:
model2 = lm_model(data = test_data, response_variable = y, predictor_variable = c(everything(), -y))
This is already a step up from the basic lm() function. We can make it even simpler by just passing everything(). The function is designed to remove the response variable from the predictor variables (if selected) automatically. The following model3 is the same as model2:
model3 = lm_model(data = test_data, response_variable = y, predictor_variable = everything())
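To give a feel for how a function can accept select syntax and drop the response variable automatically, here is a minimal sketch built on tidyselect::eval_select. It is not the package's actual implementation: build_formula is a hypothetical helper written for this illustration, and the small demo_data frame is made up so that the example runs on its own.

```r
library(tidyselect)
library(rlang)

set.seed(1)
demo_data <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

# Hypothetical helper (an assumption for illustration, not part of psycModel):
# turn select-style arguments into a model formula.
build_formula <- function(data, response_variable, predictor_variable) {
  # eval_select() resolves select syntax to named column positions.
  response <- names(eval_select(enquo(response_variable), data))
  predictors <- names(eval_select(enquo(predictor_variable), data))
  # Drop the response if the predictor selection happened to include it.
  predictors <- setdiff(predictors, response)
  reformulate(predictors, response = response)
}

f <- build_formula(demo_data, response_variable = y, predictor_variable = everything())
f

# The generated formula fits the same model as the hand-written one.
all.equal(coef(lm(f, data = demo_data)), coef(lm(y ~ x1 + x2, data = demo_data)))
```

The key design choice in this sketch is the setdiff step, which is what lets everything() be passed for the predictors without listing -y by hand.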
The same logic applies to all other functions in this package. Arguments that support the dplyr::select syntax will end with "support dplyr::select syntax" in the description of the argument.

That's it for this brief introduction. If you want to learn more about this package, I encourage you to check out this article or use vignette('quick-introduction') if you are in RStudio.