chooseCRANmirror(graphics = FALSE, ind = 1)
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, out.width = "100%")
if (!require(pacman)) install.packages("pacman")
pacman::p_load(lubridate, here, rio, tidyverse, drhur)
Does perceived inequality have an impact on a person's social and political behavior?
Basic Concepts
Object-Oriented Programming (OOP)
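Classes: user-definable object structures with their own characteristics and methods.

Polymorphism: the same command, e.g., summary(), gives output suited to the class of the object it receives.

summary(wvs7)
summary(wvs7$age)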
R's polymorphism lets one generic function handle many object types, even classes defined after the function was written, because the generic dispatches to the method written for each object's class. Besides summary(), plot() is another example of polymorphism. Please use plot() on two different objects.
plot(wvs7$age)
plot(wvs7$age, wvs7$incomeLevel)
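To see the dispatch behind polymorphism, the sketch below defines a hypothetical class "monk" (not part of wvs7 or drhur) and a summary() method for it. The generic summary() then picks the method that matches each object's class, under the assumption that you are using R's S3 system.

```r
# A hypothetical class "monk", defined only for this illustration
wukong <- structure(list(name = "Wukong Sun", power = 100), class = "monk")

# A summary() method for that class: S3 dispatch finds it by the name summary.monk
summary.monk <- function(object, ...) {
  cat("Monk:", object$name, "| power:", object$power, "\n")
  invisible(object)
}

summary(wukong)              # dispatches to summary.monk()
summary(c(100, 20, 90, 40))  # dispatches to the default numeric summary
```

The same call, summary(), produces a custom line for the new class and the usual five-number summary for a numeric vector.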
Inheritance: every subclass of a parent class automatically has the characteristics of that parent class. By analogy, if humans have legs, then every individual human has legs too.
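A minimal S3 sketch of inheritance, assuming a hypothetical generic describe() and hypothetical classes "human" and "student": the object only needs a method for its parent class, and R falls back to it.

```r
# A hypothetical generic describe() with a method for the parent class "human"
describe <- function(x, ...) UseMethod("describe")
describe.human <- function(x, ...) paste(x$name, "has legs")

# "student" is a subclass of "human": the class vector lists the child first, then the parent
alice <- structure(list(name = "Alice"), class = c("student", "human"))

describe(alice)           # no describe.student(), so R uses describe.human()
inherits(alice, "human")  # TRUE
```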
Safety: when a command acts on an object, it first checks the object's class. If the command is not defined for that class, it stops running and returns an error message.
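A small illustration of this check, using base R's log(): it is defined for numbers, so a character string makes it stop. tryCatch() is used here only so the example keeps running after the error.

```r
# log() is defined for numbers; given a character string, it stops with an error
result <- tryCatch(
  log("a"),
  error = function(e) paste("Stopped with error:", conditionMessage(e))
)
result

log(100)  # the same command works on a numeric object
```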
The principle of OOP: don't change data manually; let the commands do it.

Syntax: <command name>(<target data>, <condition 1>, <condition 2>, ...)
# A function: input `finger`, output `shadow`
light <- function(finger) {
  shadow <- finger + 5
}
handShadow <- light(finger = 3)  # the function returns its last value, 8
handShadow
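The same idea with an explicit return() and a second argument that has a default value. The argument name distance is hypothetical, added only for illustration.

```r
light <- function(finger, distance = 5) {
  shadow <- finger + distance  # the body computes the output from the inputs
  return(shadow)               # return() makes the output explicit
}

light(finger = 3)                 # uses the default distance = 5, returns 8
light(finger = 3, distance = 10)  # overrides the default, returns 13
```

Default values let users skip arguments they rarely change while still allowing them to override the default.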
Packages available on CRAN: nrow(available.packages()) (more on GitHub).

install.packages("drhur")
# or the development version from GitHub
devtools::install_github("sammo3182/drhur")
<-

Assignment operator, the shorthand for the assign() command.

Syntax: <variable name> <- <object>

aValidObject <- 1:5
aValidObject
Other assignment symbols: ->, =, <<-

In total, there are four ways to assign in R: assign(), <- (and its mirror ->), <<-, and =.
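A side-by-side sketch of the ways listed above (<<- is covered separately below). The names w, x, y, and z are arbitrary.

```r
assign("w", 1)  # function form: the name is given as a string
x <- 2          # the standard assignment operator
3 -> y          # rightward assignment
z = 4           # also assigns at the top level
c(w, x, y, z)   # 1 2 3 4
```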
<-

a <- 12
25 -> b

Keyboard shortcut for <- in RStudio: Alt + - (Windows) or Option + - (macOS).
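=
: Use it when you do not want to create an object, e.g., to pass a value to a function argument.

median(y <- 1:10); y  # <- inside the call also creates y
median(x = 1:10); x   # = only sets the argument, so no object x is created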
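To confirm which call creates an object, exists() can check the workspace directly. This sketch first removes any existing x and y so the result does not depend on earlier code.

```r
rm(list = intersect(c("x", "y"), ls()))  # start without x or y in the workspace

median(y <- 1:10)              # <- inside the call also creates y
exists("y", inherits = FALSE)  # TRUE

median(x = 1:10)               # = only names the argument x of median()
exists("x", inherits = FALSE)  # FALSE
```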
<<-
: Assigns to a variable in the parent (enclosing) environment instead of creating a local one.

new_counter <- function() {
  i <- 0
  function() {
    # do something useful, then ...
    i <<- i + 1
    i
  }
}
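Using the counter shows what <<- buys you. The comparison function broken_counter() is hypothetical: it uses <- instead, so each call creates a fresh local i and the count never advances.

```r
new_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1  # updates i in the enclosing function's environment
    i
  }
}

broken_counter <- function() {
  i <- 0
  function() {
    i <- i + 1   # creates a new local i on every call; the parent's i stays 0
    i
  }
}

counter <- new_counter()
counter()  # 1
counter()  # 2: i survived between calls

broken <- broken_counter()
broken()   # 1
broken()   # 1 again
```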
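Naming rules for objects:

1. Do not start with a number (Error: 1stday).
2. The only symbols allowed are . and _ (Error: M&M).
3. Names are case-sensitive (X != x). Here ! means "not"/"no", and != means "not equal to".
4. Avoid overwriting built-in commands (Bad: list <- c(1:5)).

Please create a compliant and a non-compliant object: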
# Create a non-compliant object (commented out because it would raise an error)
# 5var_name <- data_frame(wvs7$education)
# Create a compliant object
var_name5 <- data_frame(wvs7$education)
question("Which variable names are compliant? Please select all valid ones.",
  answer("my_data_frame <- data_frame(wvs7$education)", correct = TRUE),
  answer("mydata&frame <- data_frame(wvs7$education)"),
  answer("MyDataFrame <- data_frame(wvs7$education)", correct = TRUE),
  answer("1data_frame <- data_frame(wvs7$education)"),
  incorrect = "Incorrect")
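To check a name without running the assignment, base R's make.names() is handy: it turns any string into a syntactically valid name, so a name that comes back unchanged is valid. A sketch applied to the four quiz options (note that it does not warn about masking built-ins such as list):

```r
candidates <- c("my_data_frame", "mydata&frame", "MyDataFrame", "1data_frame")

# a name is syntactically valid if make.names() leaves it unchanged
is_valid <- candidates == make.names(candidates)
setNames(is_valid, candidates)  # TRUE FALSE TRUE FALSE
```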
data()

Pick a dataset from the list shown by data(), load it, and check its variables with summary():

# example
data(uspop)
summary(uspop)
Common data formats in base R: .RDS (single object), .RData (multiple objects), .txt, .csv
Syntax: <name> <- <read command>(<data path>)
df_rds <- readRDS("aDataset.rds")
df_txt <- read.table("D:/aDataset.txt")
df_csv <- read.csv("./aDataset.csv")
load("aDataset.RData")  # .RData restores its objects under their saved names
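A self-contained round trip, assuming a small hypothetical data frame and temporary files in place of real paths, shows that .rds preserves an object exactly while .csv works for simple tables.

```r
# A small hypothetical data frame written to temporary files
df <- data.frame(id = 1:3, score = c(90, 85, 77))

path_rds <- tempfile(fileext = ".rds")
path_csv <- tempfile(fileext = ".csv")

saveRDS(df, file = path_rds)
write.csv(df, file = path_csv, row.names = FALSE)  # no extra row-name column

df_rds <- readRDS(path_rds)
df_csv <- read.csv(path_csv)

identical(df, df_rds)  # .rds keeps the object exactly
all.equal(df, df_csv)  # .csv round-trips this simple table
```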
For other formats, load a package with library() or require(), and then use its commands.
# SPSS, Stata, SAS
library(haven)
df_spss <- read_spss("<FileName>.sav")
df_stata <- read_dta("<FileName>.dta")
df_sas <- read_sas("<FileName>.sas7bdat")
# Quick import of delimited text files
library(readr)
df_csv <- read_csv("<FileName>.csv")
df_table <- read_table("<FileName>.txt")
# Excel
library(readxl)
df_excel <- read_excel("<FileName>.xls")
df_excel2 <- read_excel("<FileName>.xlsx")
# JSON (JavaScript Object Notation)
library(rjson)
df_json <- fromJSON(file = "<FileName>.json")
# XML/HTML
library(XML)
df_xml <- xmlTreeParse("<url>")
df_html <- readHTMLTable("<url>", which = 3)
rio
: The Swiss Army knife of data reading.

library(rio)
df_anything <- import("<FileName>.<extension>")  # the format is guessed from the file extension
The command c() ("combine") creates a vector.

vec_integer <- c(1L, -2L, NA)  # the L suffix stores whole numbers as integers
vec_double <- c(1.5, -2.34, 1/3)
Notes:
1. NA means "not available", i.e., a missing value.
2. All data in a single vector must share the same type (numeric, character, or logical).
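What happens when types are mixed anyway? R silently coerces every element to one common type, as this short sketch shows.

```r
c(1, "a", TRUE)    # mixing with a character: everything becomes character
c(1, TRUE, FALSE)  # mixing numbers and logicals: TRUE/FALSE become 1/0
class(c(1, NA))    # NA adopts the type of the other elements: "numeric"
```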
Generate a vector containing all even numbers from 1 to 100:
# hint: help(seq)
x <- seq(2, 100, by = 2)
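The same vector can be built in other ways. This sketch compares seq() with vectorized arithmetic and with logical filtering via the modulo operator %%; == is used for the comparison because the three results differ in storage type (double vs. integer) but not in value.

```r
x1 <- seq(2, 100, by = 2)           # step through the range
x2 <- (1:50) * 2                    # vectorized arithmetic
x3 <- (1:100)[1:100 %% 2 == 0]      # keep only values divisible by 2
length(x1)                          # 50
all(x1 == x2) && all(x2 == x3)      # TRUE
```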
vec_chr <- c("牛", "^_^", "R is hard,but I can nail it.")
Generate a sequence of letters a-z:
vec_letters <- c("a", "b", "c", "d", "e")
letters[1:26]
vec_tf <- c(TRUE, TRUE, FALSE)
vec_tf
# c(TRUE, TRUE, FALSE) == c(1, 1, 0)
Assuming x is a vector containing (1, 1, 0), convert it to a logical vector:

x <- c(1, 1, 0)
x <- as.logical(x)
vec_fac <- factor(c(1, 2, 2, 3))

vec_ord <- ordered(c(1, 2, 2, 3))
vec_fac2 <- factor(c(1, 2, 2, 3), levels = c(3, 2, 1), labels = c("Apple", "Pear", "Orange"))
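To see how levels and labels map onto the values, and what an ordered factor adds, inspect the factors directly:

```r
vec_fac2 <- factor(c(1, 2, 2, 3), levels = c(3, 2, 1),
                   labels = c("Apple", "Pear", "Orange"))
as.character(vec_fac2)  # "Orange" "Pear" "Pear" "Apple": 3 -> Apple, 2 -> Pear, 1 -> Orange
levels(vec_fac2)        # "Apple" "Pear" "Orange": the order given in levels =

vec_ord <- ordered(c(1, 2, 2, 3))
vec_ord[1] < vec_ord[4] # TRUE: ordered factors support comparisons
```

Comparing two elements of an unordered factor with < returns NA with a warning, which is why ordered() exists for ranked categories.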
After getting a dataset, first get a general understanding of the data.

Check the types of the variables in the wvs7 data:

str(wvs7)

Check the class of the incomeLevel variable:

class(wvs7$incomeLevel)

View the values of incomeLevel and the frequency of each value:

table(wvs7$incomeLevel)
as.POSIXct
: Stores a date-time as a single number ("calendar time"): the seconds elapsed since the start of the UNIX epoch, 1970-01-01 00:00:00 UTC.

as.POSIXlt
: Stores a date-time as a list ("local time"): each component of the time (seconds, minutes, hours, day, month, year, ...) is an element of the list.

# as.POSIXct and as.POSIXlt
ct <- as.POSIXct("2023-03-20 10:11:12")
lt <- as.POSIXlt("2023-03-20 10:11:12")
unclass(ct)  # the underlying number of seconds
unlist(lt)   # the list components of the time
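A worked check of both storage schemes, with the time zone fixed to UTC so the numbers do not depend on your computer's settings. Note the offsets in POSIXlt: year counts from 1900 and months run 0 to 11.

```r
ct <- as.POSIXct("1970-01-01 00:01:30", tz = "UTC")
as.numeric(ct)  # 90: seconds since 1970-01-01 00:00:00 UTC

lt <- as.POSIXlt("2023-03-20 10:11:12", tz = "UTC")
lt$hour         # 10
lt$year + 1900  # 2023: year is stored as years since 1900
lt$mon + 1      # 3: month is stored as 0-11
```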
Sys.time()  # get the current date-time
today()     # get the current date (year, month, day); from lubridate
now()       # get the current date-time (year, month, day, hour, minute, second) with time zone; from lubridate
# the time zone (e.g., CST) comes from the computer's system settings
# the full set of components:
time1 <- Sys.time()
time2 <- as.POSIXlt(Sys.time())
time2$wday  # day of the week (0 = Sunday, ..., 6 = Saturday)
## What if we only care about the date?
Sys.Date()
date1 <- as.Date("2019-01-02")
class(date1)  # check type of data
lubridate
: The Swiss Army knife of time data.

library(lubridate)
ymd("20221016")
mdy("10-16-2022")
dmy("16/10/2022")
ymd_hms("2022-10-16 09:00:00", tz = "Etc/GMT+8")  # note: "Etc/GMT+8" means UTC-8 (the sign is inverted)
OlsonNames()  # list all valid time-zone names
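Time-zone names from this list can surprise you: in the "Etc/GMT" family the sign is inverted, so "Etc/GMT+8" is 8 hours behind UTC. For Beijing time, "Asia/Shanghai" (or "Etc/GMT-8") is the name to use. A base-R sketch, assuming your system has the standard time-zone database installed:

```r
t_utc     <- as.POSIXct("2022-10-16 09:00:00", tz = "UTC")
t_etc     <- as.POSIXct("2022-10-16 09:00:00", tz = "Etc/GMT+8")
t_beijing <- as.POSIXct("2022-10-16 09:00:00", tz = "Asia/Shanghai")

as.numeric(difftime(t_etc, t_utc, units = "hours"))      # 8: 09:00 at UTC-8 is 17:00 UTC
as.numeric(difftime(t_beijing, t_utc, units = "hours"))  # -8: 09:00 in Beijing is 01:00 UTC
```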
When a vector mixes different date orders, such as:

x <- c("20190101", "01012019", "021901")

how can R identify each date?

# help(parse_date_time)
parse_date_time(x, orders = c("ymd", "dmy", "dym"))
For matrices, see drhur("algebra").
Array: as the name implies, an arrangement of data that can extend beyond two dimensions (rows and columns); create one with the array() command.
# create two vectors of different lengths
vector1 <- c(5, 9, 3)
vector2 <- c(10, 11, 12, 13, 14, 15)
# enter these vectors into an array; the 9 values are recycled to fill all 3 x 3 x 2 = 18 cells
result <- array(c(vector1, vector2), dim = c(3, 3, 2))
result
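Indexing an array takes one position per dimension (row, column, layer), as this sketch of the same array shows:

```r
vector1 <- c(5, 9, 3)
vector2 <- c(10, 11, 12, 13, 14, 15)
result <- array(c(vector1, vector2), dim = c(3, 3, 2))

dim(result)      # 3 3 2: rows, columns, layers
result[1, 3, 2]  # 13: row 1, column 3 of the second layer
result[, , 1]    # the whole first 3 x 3 layer
```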
List: a container that can hold objects of many different types.

ls_monks <- list(name = c("Wukong Sun", "Sanzang Tang", "Wuneng Zhu", "Wujing Sha"),
                 power = c(100, 20, 90, 40),
                 buddha = c(TRUE, TRUE, FALSE, FALSE))
ls_monks
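Elements of a list can be pulled out by name. The sketch contrasts $ and [[ ]], which return the element itself, with single [ ], which returns a smaller list.

```r
ls_monks <- list(name = c("Wukong Sun", "Sanzang Tang", "Wuneng Zhu", "Wujing Sha"),
                 power = c(100, 20, 90, 40),
                 buddha = c(TRUE, TRUE, FALSE, FALSE))

ls_monks$name[1]          # "Wukong Sun": $ extracts one element by name
ls_monks[["power"]]       # 100 20 90 40: [[ ]] also extracts the element itself
class(ls_monks["power"])  # "list": single [ ] returns a sub-list
```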
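Data frame: a special kind of list whose elements are columns of equal length; it looks like a matrix, but each column can hold a different type.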
In R:
df_toy <- data.frame(female = c(0, 1, 1, 0),
                     age = c(29, 39, 38, 12),
                     name = c("Iron Man", "Black Widow", "Captain Marvel", "Captain America"))
df_toy
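A few everyday operations on df_toy: extract a column with $, summarize it, and select rows with a logical condition inside [rows, columns].

```r
df_toy <- data.frame(female = c(0, 1, 1, 0),
                     age = c(29, 39, 38, 12),
                     name = c("Iron Man", "Black Widow", "Captain Marvel", "Captain America"))

df_toy$age                          # one column as a vector
mean(df_toy$age)                    # 29.5
df_toy[df_toy$female == 1, "name"]  # "Black Widow" "Captain Marvel"
```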
class, typeof
: Query an object's class and its internal storage type.

nchar
: Get the number of characters in each string.

levels
: Get or set the levels of a factor.

length
: Get the number of elements (for a data frame, the number of columns).

nrow
: Return the number of rows of a matrix or data frame.

ncol
: Return the number of columns of a matrix or data frame.

dim
: Return the dimensions (rows and columns).

vec_integer <- c(1L, -2L, NA)
vec_double <- c(1.5, -2.34, 1/3)
vec_chr <- c("牛", "^_^", "R is hard,but I can nail it.")
vec_fac <- factor(c(1, 2, 2, 3))
ls_monks <- list(name = c("Wukong Sun", "Sanzang Tang", "Wuneng Zhu", "Wujing Sha"),
                 power = c(100, 20, 90, 40),
                 buddha = c(TRUE, TRUE, FALSE, FALSE))
df_toy <- data.frame(female = c(0, 1, 1, 0),
                     age = c(29, 39, 38, 12),
                     name = c("Iron Man", "Black Widow", "Captain Marvel", "Captain America"))
class(vec_double)
typeof(vec_integer)
nchar(vec_chr)
levels(vec_fac)
length(vec_double)
length(ls_monks)
length(df_toy)
nrow(df_toy)
ncol(df_toy)
dim(df_toy)
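class() and typeof() can disagree, because class() reports how R treats an object while typeof() reports how it is stored. A short sketch with a throwaway data frame df:

```r
typeof(1L)  # "integer": the L suffix makes an integer
typeof(1)   # "double": plain numbers are doubles
class(1)    # "numeric"

df <- data.frame(a = 1:2, b = c("x", "y"))
class(df)   # "data.frame"
typeof(df)  # "list": a data frame is stored as a list of columns
length(df)  # 2: the number of columns
```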
Convert the following vector to numeric type:
c(FALSE, TRUE)
# help(as.numeric)
as.numeric(c(FALSE, TRUE))
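In wvs7, the gender variable female takes the values TRUE and FALSE.

In analysis, a non-numeric variable like this is inconvenient to work with, so it can be converted into a numeric one. Assuming female is stored as a factor with levels FALSE and TRUE, as.numeric() returns the level codes 1 and 2, and subtracting 1 gives 0/1:

as.numeric(wvs7$female) - 1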
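Whether "- 1" is right depends on how the variable is stored. This sketch uses three hypothetical versions of the same TRUE/FALSE data (factor, logical, character) to show the correct conversion for each.

```r
f_fac <- factor(c("FALSE", "TRUE", "TRUE"))  # levels sort as "FALSE", "TRUE"
f_lgl <- c(FALSE, TRUE, TRUE)
f_chr <- c("FALSE", "TRUE", "TRUE")

as.numeric(f_fac) - 1          # 0 1 1: level codes 1/2 shifted to 0/1
as.numeric(f_lgl)              # 0 1 1: no shift needed
as.numeric(f_lgl) - 1          # -1 0 0: shifting a logical is wrong
as.numeric(as.logical(f_chr))  # 0 1 1: as.numeric(f_chr) alone would give NA
```

Checking class() before converting avoids silently producing -1/0 or NA.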
Syntax: <command>(<data to be saved>, file = <storage path>)
saveRDS(df_toy, file = "df_toy.rds")
save(df_toy, ls_monks, file = "test.rdata")
write.csv(df_toy, file = "toy.csv")
Hint: if your data contain Chinese characters, the saved CSV may show garbled characters when opened in other software (e.g., Excel) because of a text-encoding mismatch.
You can also save data in Stata, SPSS, SAS, Excel, JSON, Matlab, HTML, and other formats through dedicated packages or the "Swiss Army knife" (rio::export()), but do you really want to?
File size of the same data: Stata (.dta, version < 14) 3.16 GB vs. R (.rds) 0.05 GB
| Method          | Average time (s) | Minimum (s) | Maximum (s) |
|:----------------|:----------------:|:-----------:|:-----------:|
| base::readRDS   | 19.65            | 18.64       | 21.01       |
| fst::read_fst   | 1.39             | 0.56        | 3.41        |
| haven::read_sav | 104.78           | 101.00      | 111.85      |
| qs::qread       | 3.33             | 3.00        | 4.24        |

: Average time (in seconds) taken by four ways of reading GSS data in R
| Method         | Average time (s) | Minimum (s) | Maximum (s) | File size |
|:---------------|:----------------:|:-----------:|:-----------:|:---------:|
| base::saveRDS  | 98.36            | 93.09       | 103.24      | 30.9 MB   |
| fst::write_fst | 2.70             | 1.86        | 4.05        | 122.1 MB  |
| qs::qsave      | 5.03             | 4.35        | 6.62        | 44.6 MB   |

: Average time (in seconds) taken to write GSS data in R, with resulting file size