Basic Syntax: Input and Output

chooseCRANmirror(graphics = FALSE, ind = 1)

knitr::opts_chunk$set(echo = TRUE,
                      message = FALSE,
                      warning = FALSE,
                      out.width="100%")

if (!require(pacman)) install.packages("pacman")
library(pacman)  # require() does not attach pacman if it had to be installed first

p_load(
  lubridate,
  here,
  rio,
  tidyverse,
  drhur
)

Key Points

Basic Concepts

Object-Oriented Programming Language

summary(wvs7)
summary(wvs7$age)

Learning by Doing

R's polymorphism lets one generic function, such as summary(), behave differently depending on the class of the object it receives, including classes that had not been defined when the function was written. Besides summary(), plot() is another polymorphic function. Use plot() on two different kinds of input:

plot(wvs7$age)
plot(wvs7$age, wvs7$incomeLevel)
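To see polymorphism at work on a class that summary() has never met, here is a minimal S3 sketch. The class name monk, the object wukong, and the method summary.monk() are hypothetical illustrations, not part of drhur or wvs7:

```r
# Hypothetical object of a brand-new class "monk" (an illustration, not drhur data)
wukong <- structure(list(name = "Wukong Sun", power = 100), class = "monk")

# An S3 method named summary.<class>; the generic summary() itself is left untouched
summary.monk <- function(object, ...) {
  cat(object$name, " has power ", object$power, "\n", sep = "")
  invisible(object)
}

summary(wukong)      # dispatches to summary.monk() and prints "Wukong Sun has power 100"
summary(c(1, 2, 3))  # same generic, but dispatches to the default method for numbers
```

R finds the method by combining the generic's name with the object's class, which is why summary() can handle monk objects without being rewritten.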

Function

The principle of OOP: don't change data by hand; let commands (functions) do it. Syntax: <command name>(<target data>, <argument 1>, <argument 2>, ...)


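light <- function(finger){
  shadow <- finger + 5  # the function body: compute the shadow from the finger
  shadow                # the value of the last expression is returned
}
handShadow <- light(finger = 3)
handShadow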
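The syntax allows several arguments. As a sketch, assuming we extend the toy light() function with a hypothetical second argument distance that has a default value, arguments can be given by name, by position, or left out:

```r
# Hypothetical extension of light(): `distance` is a second argument with a default value
light <- function(finger, distance = 5) {
  shadow <- finger + distance  # the body uses both arguments
  shadow                       # the last value is returned
}

light(finger = 3)                 # 8: distance falls back to its default of 5
light(finger = 3, distance = 10)  # 13: the default is overridden
light(3, 10)                      # 13: arguments matched by position
light(distance = 10, finger = 3)  # 13: named arguments can come in any order
```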

Data Packages


<-

The assignment operator; x <- 1 has the same effect as assign("x", 1).

Syntax: <variable name> <- <object>

aValidObject <- 1:5
aValidObject

->, =, <<-

Four ways to assign in R (assign() is a function; the others are operators):

  1. assign()
  2. <- (and its mirror image ->)
  3. <<-
  4. =
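A quick check that the assignment forms listed above all bind a value to a name. The object names a1 to a5 and the helper set_a5() are arbitrary illustrations:

```r
assign("a1", 12)  # function form: the name is given as a string
a2 <- 12          # the usual operator
12 -> a3          # the same operator written right to left
a4 = 12           # also works at the top level
identical(a1, a2) && identical(a2, a3) && identical(a3, a4)  # TRUE

# `<<-` assigns in an enclosing environment; here a5 ends up outside the function
set_a5 <- function() a5 <<- 12
set_a5()
a5  # 12
```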

Why <-

a <- 12
25 -> b

When to Use Which Operator?

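median(y <- 1:10); y   # `<-` inside a call also creates y in the workspace
median(x = 1:10); x    # `=` only names median()'s argument x; x is not created, so printing x fails unless it already exists
new_counter <- function() {
  i <- 0
  function() {
    # do something useful, then ...
    i <<- i + 1  # `<<-` updates the i of the enclosing function, not a local copy
    i
  }
}
counter <- new_counter()
counter()  # 1
counter()  # 2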

Naming Rules

  1. Don't start with a number (invalid: 1stday).
  2. No special symbols except . and _ (invalid: M&M).
  3. Names are case sensitive (X != x). Here ! means "not", so != means "not equal to".
  4. Don't overwrite built-in functions unless necessary (avoid: list <- c(1:5)).
  5. Descriptive: a name should tell you what the object holds.
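Base R can show what a valid version of a name would look like. make.names() is a base function; the object names below are only illustrations:

```r
# make.names() turns any string into a syntactically valid name
make.names("1stday")  # "X1stday": an X is prepended because a name cannot start with a digit
make.names("M&M")     # "M.M": the invalid & is replaced by a dot

# Backticks force R to accept a non-syntactic name; possible, but best avoided
`1stday` <- "Monday"
`1stday`

# Names are case sensitive: these are two different objects
Height <- 1
height <- 2
Height != height      # TRUE
```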

Create one compliant and one non-compliant object:

# Create a non-compliant object (commented out because it would throw an error)

# 5var_name <- data.frame(wvs7$education)


# Create a compliant object

var_name5 <- data.frame(wvs7$education)

Learning by Doing

question("Which variables are compliant? Please select all valid variable names.",
  answer("my_data_frame <- data_frame(wvs7$education)", correct = TRUE),
  answer("mydata&frame <- data_frame(wvs7$education)"),
  answer("MyDataFrame <- data_frame(wvs7$education)", correct = TRUE),
  answer("1data_frame <- data_frame(wvs7$education)"),
  incorrect = "Incorrect")

Data Input

Built-in Data

data()

Learning by Doing

Pick a dataset listed by data(), load it, and check its variables with summary():


# example

data(uspop)
summary(uspop)
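Besides browsing the data() window, you can list the datasets programmatically and inspect one. This sketch uses mtcars from the base datasets package in place of uspop:

```r
# Names of the datasets shipped with the base "datasets" package
head(data(package = "datasets")$results[, "Item"])

# Load one, then look at its structure and first rows
data(mtcars)
str(mtcars)
head(mtcars, n = 3)
```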

File Types That Base R Can Read Directly

Syntax: <name> <- <read command>(<data path>)

df_rds <- readRDS("aDataset.rds")        # R's own single-object format
df_txt <- read.table("D:/aDataset.txt")  # an absolute path
df_csv <- read.csv("./aDataset.csv")     # a path relative to the working directory
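Since aDataset.csv is only a placeholder, here is a self-contained sketch that writes a small CSV to a temporary file and reads it back. The file contents are made up for illustration:

```r
# Write a tiny CSV file to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Iron Man,29", "Black Widow,39"), path)

# read.csv() assumes a header row and comma separators
df_csv <- read.csv(path)
df_csv

# read.table() needs both settings spelled out to read the same file
df_table <- read.table(path, header = TRUE, sep = ",")
all.equal(df_csv, df_table)  # TRUE
```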

File Types That Require a Package

Load the package with library() or require(), then call its functions.

# SPSS, Stata, SAS
library(haven)
df_spss <- read_spss("<FileName>.sav")
df_stata <- read_dta("<FileName>.dta")
df_sas <- read_sas("<FileName>.sas7bdat")  

# Quick import of delimited text files
library(readr)
df_csv <- read_csv("<FileName>.csv")
df_table <- read_table("<FileName>.txt")

# Excel
library(readxl)
df_excel <- read_excel("<FileName>.xls")
df_excel2 <- read_excel("<FileName>.xlsx")

# JSON (JavaScript Object Notation)
library(rjson)
df_json <- fromJSON(file = "<FileName>.json" )

# XML/HTML
library(XML)
df_xml <- xmlTreeParse("<url>")
df_html <- readHTMLTable("<url>", which = 3)  # which = 3: return only the third table

rio: the Swiss Army knife of data import.

library(rio)
df_anything <- import("<FileName>.<extension>")  # the format is inferred from the file extension

Data Structures

  1. Vector
  2. Matrix
  3. Data frame
  4. List
  5. Array

Vector

The c() function (short for "combine") creates a vector from the values passed to it.

vec_integer <- c(1L, -2L, NA)  # the L suffix makes integers; a plain 1 would be a double
vec_double <- c(1.5, -2.34, 1/3)

Notes: 1. NA means "not available", i.e., a missing value. 2. All elements of a vector must have the same type (numeric, character, or logical); if you mix types, R coerces them to a common type.
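The notes above can be checked directly. This sketch also shows that plain numbers are stored as doubles, so an integer vector needs the L suffix:

```r
# Plain numbers are doubles; the L suffix makes integers
typeof(c(1, -2, NA))    # "double"
typeof(c(1L, -2L, NA))  # "integer"

# Mixing types coerces every element to the most flexible type present
c(1, "a")   # "1" "a": the number becomes character
c(TRUE, 2)  # 1 2: the logical becomes numeric

# NA marks a missing value; is.na() finds it
is.na(c(1, -2, NA))  # FALSE FALSE TRUE
```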

Learning by Doing

Generate a vector containing all even numbers from 1 to 100:

# hint: help(seq)
x <- seq(2,100,by=2)

vec_chr <- c("牛", "^_^", "R is hard,but I can nail it.")

Learning by Doing

Generate a sequence of letters a-z:

vec_letters <- c("a", "b", "c", "d", "e")  # typing every letter by hand is tedious
letters[1:26]  # the built-in constant letters already holds "a" to "z"

vec_tf <- c(TRUE, TRUE, FALSE)
vec_tf
# in arithmetic TRUE counts as 1 and FALSE as 0, so c(TRUE, TRUE, FALSE) == c(1, 1, 0) is all TRUE

Learning by Doing

Assuming x is a vector containing (1,1,0), convert it to a logical vector:


x <- c(1, 1, 0)
x <- as.logical(x)

vec_fac <- factor(c(1, 2, 2, 3))
vec_ord <- ordered(c(1, 2, 2, 3))
vec_fac2 <- factor(c(1, 2, 2, 3), 
                   levels = c(3, 2, 1), 
                   labels = c("Apple", "Pear", "Orange"))
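The labels argument is easy to misread: labels are matched to levels by position, not to the raw values in their original order. A sketch of what these factors contain:

```r
vec_ord <- ordered(c(1, 2, 2, 3))
vec_fac2 <- factor(c(1, 2, 2, 3),
                   levels = c(3, 2, 1),
                   labels = c("Apple", "Pear", "Orange"))

levels(vec_fac2)        # "Apple" "Pear" "Orange": the order set by `levels`
as.character(vec_fac2)  # "Orange" "Pear" "Pear" "Apple": 3 -> Apple, 2 -> Pear, 1 -> Orange
as.integer(vec_fac2)    # 3 2 2 1: each value's position in the level order
table(vec_fac2)         # counts per level

# Only ordered factors support < and >; a plain factor returns NA with a warning
vec_ord[1] < vec_ord[4]  # TRUE
```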

Learning by Doing

When you receive a new dataset, start by getting a general overview of it.

Check the types of the variables in wvs7:


str(wvs7)

Check the class of the incomeLevel variable:


class(wvs7$incomeLevel)

View the values of incomeLevel and the frequency of each value:


table(wvs7$incomeLevel)

# as.POSIXct() and as.POSIXlt()
ct <- as.POSIXct("2023-03-20 10:11:12")  # stored as seconds since 1970-01-01 UTC
lt <- as.POSIXlt("2023-03-20 10:11:12")  # stored as a list of components
unlist(ct)
unlist(lt)
Sys.time() # the current date-time
today()    # the current date (lubridate)
now()      # the current date-time with its time zone (lubridate)
# The time zone shown (e.g., CST) comes from your computer's time-zone setting

# The components of a POSIXlt object
time1 <- Sys.time()
time2 <- as.POSIXlt(Sys.time())
time2$wday # day of the week (0 = Sunday)

## What if we only care about the date?
Sys.Date()
date1 <- as.Date("2019-01-02")
class(date1)  # check type of data
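POSIXlt components and Date arithmetic have a few offsets worth knowing. A sketch using the same timestamp as above, parsed in UTC so the result does not depend on your time zone:

```r
lt <- as.POSIXlt("2023-03-20 10:11:12", tz = "UTC")
lt$year + 1900  # 2023: year is counted from 1900
lt$mon + 1      # 3: months are numbered from 0
lt$mday         # 20: day of the month
lt$hour         # 10
lt$wday         # 1: 2023-03-20 was a Monday (0 = Sunday)

# Dates support arithmetic in days
d1 <- as.Date("2019-01-02")
d2 <- as.Date("2019-03-01")
as.numeric(d2 - d1)  # 58 days between the two dates
d1 + 30              # "2019-02-01"
```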

lubridate: the Swiss Army knife of date-time data

library(lubridate)

ymd("20221016")
mdy("10-16-2022")
dmy("16/10/2022")
ymd_hms("2022-10-16 09:00:00", tz = "Etc/GMT+8")  # careful: "Etc/GMT+8" means UTC-8; use "Asia/Shanghai" for China time
OlsonNames()  # all valid time-zone names
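The sign in "Etc/GMT+8" is inverted (POSIX style), so that zone is eight hours behind UTC. This base R sketch, assuming your system's time-zone database includes these zones (standard installations do), shows one instant on three clocks:

```r
# A fixed instant, written in UTC
t_utc <- as.POSIXct("2022-10-16 09:00:00", tz = "UTC")

# The same instant shown on different clocks
format(t_utc, format = "%Y-%m-%d %H:%M", tz = "Asia/Shanghai")  # "2022-10-16 17:00" (UTC+8)
format(t_utc, format = "%Y-%m-%d %H:%M", tz = "Etc/GMT-8")      # "2022-10-16 17:00" (UTC+8)
format(t_utc, format = "%Y-%m-%d %H:%M", tz = "Etc/GMT+8")      # "2022-10-16 01:00" (UTC-8)
```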

Learning by Doing

When facing vectors with different orders, such as:

x <- c("20190101", "01012019", "021901")

How can we parse all of them as dates?

#help(parse_date_time)
parse_date_time(x,orders = c("ymd","dmy","dym"))

Matrix

See drhur("algebra") for matrices.

Array

Array: an array arranges data in rows, columns, and further layers, so it can record data with more than two dimensions. Create one with the array() command.

# create two vectors of different lengths
vector1 <- c(5, 9, 3)
vector2 <- c(10, 11, 12, 13, 14, 15)

# fill a 3 x 3 x 2 array with the combined vectors; the 9 values are recycled to fill all 18 cells
result <- array(c(vector1, vector2), dim = c(3, 3, 2))
result
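To read values back out of result, give one index per dimension: row, column, then layer. A short sketch:

```r
vector1 <- c(5, 9, 3)
vector2 <- c(10, 11, 12, 13, 14, 15)
result <- array(c(vector1, vector2), dim = c(3, 3, 2))

dim(result)      # 3 3 2: rows, columns, layers
result[1, 2, 1]  # 10: row 1, column 2, layer 1
result[3, 3, 2]  # 15: row 3, column 3, layer 2
result[, , 2]    # the whole second layer, a 3 x 3 matrix
```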

List

List: a container that can hold objects of different types and lengths.

ls_monks <- list(name = c("Wukong Sun", "Sanzang Tang", "Wuneng Zhu", "Wujing Sha"),
                 power = c(100, 20, 90, 40),
                 buddha = c(TRUE, TRUE, FALSE, FALSE))

ls_monks
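Elements of a list are retrieved with $ or [[ ]], while single brackets return a smaller list. A sketch using ls_monks; the extra origin element is a made-up addition showing that elements may differ in length:

```r
ls_monks <- list(name = c("Wukong Sun", "Sanzang Tang", "Wuneng Zhu", "Wujing Sha"),
                 power = c(100, 20, 90, 40),
                 buddha = c(TRUE, TRUE, FALSE, FALSE))

ls_monks$name[1]     # "Wukong Sun"
ls_monks[["power"]]  # the numeric vector itself
ls_monks["power"]    # a list of length 1 that contains power

# A new element of a different length is fine in a list
ls_monks$origin <- "Journey to the West"
names(ls_monks)      # "name" "power" "buddha" "origin"
```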

Data Frame

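Data Frame: a special kind of list whose elements (columns) all have the same length, so it looks like a matrix, but each column can have its own type.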

In R:

df_toy <- data.frame(female = c(0,1,1,0),
           age = c(29, 39, 38, 12),
           name = c("Iron Man", "Black Widow", "Captain Marvel", "Captain America"))

df_toy
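Because a data frame is a list of equal-length columns, you can take columns with $ and rows and columns with [row, column]. A sketch on df_toy (stringsAsFactors = FALSE is spelled out for R versions before 4.0, where it was not the default):

```r
df_toy <- data.frame(female = c(0, 1, 1, 0),
                     age = c(29, 39, 38, 12),
                     name = c("Iron Man", "Black Widow", "Captain Marvel", "Captain America"),
                     stringsAsFactors = FALSE)

is.list(df_toy)                     # TRUE: a data frame is a list of columns
df_toy$age                          # one column, as a vector
df_toy[2, ]                         # the second row, as a one-row data frame
df_toy[df_toy$female == 1, "name"]  # "Black Widow" "Captain Marvel"
mean(df_toy$age)                    # 29.5
```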


Data Attributes

  1. class, typeof: query an object's class and its internal storage type
  2. nchar: get the number of characters in each string
  3. levels: get or set the levels of a factor
  4. nrow: return the number of rows of a matrix or data frame
  5. ncol: return the number of columns of a matrix or data frame
  6. dim: return the dimensions (rows, columns, ...) of a matrix, array, or data frame
  7. length: return the number of elements (for a data frame, the number of columns)

class(vec_double)
typeof(vec_integer)

nchar(vec_chr)
levels(vec_fac)

length(vec_double)
length(ls_monks)
length(df_toy)  # a data frame's length is its number of columns

nrow(df_toy)
ncol(df_toy)
dim(df_toy)

Learning by Doing

Convert the following vector to numeric type:

c(FALSE, TRUE)
# help(as.numeric)
as.numeric(c(FALSE, TRUE))

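The gender variable female in wvs7 takes the values TRUE and FALSE.

In analysis, a categorical variable like this can be inconvenient to work with, so we convert it into a numeric 0/1 variable. If female is stored as a factor with levels FALSE and TRUE, as.numeric() returns the level codes 1 and 2, and subtracting 1 turns them into 0 and 1.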


as.numeric(wvs7$female) - 1
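The subtract-one trick relies on factor level codes, which can mislead when the levels are themselves numbers. A sketch with a hypothetical stand-in for wvs7$female and a made-up numeric factor:

```r
# Hypothetical stand-in for wvs7$female: a factor with levels FALSE and TRUE
female <- factor(c(TRUE, FALSE, TRUE))
levels(female)          # "FALSE" "TRUE"
as.numeric(female)      # 2 1 2: level codes, not the original values
as.numeric(female) - 1  # 1 0 1

# With numbers stored as a factor, the codes are misleading
f_num <- factor(c("10", "20", "10"))
as.numeric(f_num)                # 1 2 1: level codes
as.numeric(as.character(f_num))  # 10 20 10: convert to character first
```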

Data Output

Syntax: <command>(<data to be saved>, file = <storage path>)

Save as R data

saveRDS(df_toy, file = "df_toy.rds")         # a single object; read it back with readRDS()
save(df_toy, ls_monks, file = "test.rdata")  # several objects; restore them with load()
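A self-contained sketch of the two R formats, writing to temporary files instead of the working directory. readRDS() returns the object so you choose its name, while load() restores the names that were saved:

```r
df_toy <- data.frame(female = c(0, 1, 1, 0),
                     age = c(29, 39, 38, 12),
                     name = c("Iron Man", "Black Widow", "Captain Marvel", "Captain America"))
ls_monks <- list(name = c("Wukong Sun", "Sanzang Tang", "Wuneng Zhu", "Wujing Sha"),
                 power = c(100, 20, 90, 40),
                 buddha = c(TRUE, TRUE, FALSE, FALSE))

# saveRDS()/readRDS(): one object, named by you when reading
rds_path <- tempfile(fileext = ".rds")
saveRDS(df_toy, file = rds_path)
df_back <- readRDS(rds_path)
identical(df_back, df_toy)  # TRUE

# save()/load(): several objects, restored under their original names
rdata_path <- tempfile(fileext = ".rdata")
save(df_toy, ls_monks, file = rdata_path)
rm(df_toy, ls_monks)        # remove them from the workspace
load(rdata_path)            # df_toy and ls_monks exist again
exists("df_toy") && exists("ls_monks")  # TRUE
```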

Save as csv file

write.csv(df_toy, file = "toy.csv")
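One detail of write.csv(): by default it also writes the row names, which come back as an extra column named X. A sketch using a temporary file:

```r
df_toy <- data.frame(female = c(0, 1, 1, 0),
                     age = c(29, 39, 38, 12),
                     name = c("Iron Man", "Black Widow", "Captain Marvel", "Captain America"))
path <- tempfile(fileext = ".csv")

write.csv(df_toy, file = path)                     # row names 1 to 4 are written too
names(read.csv(path))                              # "X" "female" "age" "name"

write.csv(df_toy, file = path, row.names = FALSE)  # leave the row names out
names(read.csv(path))                              # "female" "age" "name"
```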

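Hint: if your data contain Chinese characters, the saved CSV may show garbled text, for example when opened in Excel. readr::write_excel_csv() writes UTF-8 with a byte-order mark, which helps Excel detect the encoding.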

You can store data in Stata, SPSS, SAS, Excel, JSON, MATLAB, HTML, and other formats through specialized packages or the "Swiss Army knife" rio::export(), but do you really want to do this?

The same data take 3.16 GB as a Stata file (.dta, version < 14) but only 0.05 GB as an R file (.rds).

| Method          | Average Time | Minimum | Maximum |
|:----------------|:------------:|:-------:|:-------:|
| base::readRDS   |    19.65     |  18.64  |  21.01  |
| fst::read_fst   |     1.39     |  0.56   |  3.41   |
| haven::read_sav |    104.78    | 101.00  | 111.85  |
| qs::qread       |     3.33     |  3.00   |  4.24   |

: Average time (in seconds) taken by four ways of reading GSS data in R

| Method         | Average Time | Minimum | Maximum | File Size |
|:---------------|:------------:|:-------:|:-------:|:---------:|
| base::saveRDS  |    98.36     |  93.09  | 103.24  |  30.9 MB  |
| fst::write_fst |     2.70     |  1.86   |  4.05   | 122.1 MB  |
| qs::qsave      |     5.03     |  4.35   |  6.62   |  44.6 MB  |

: Average time taken to write GSS data (and resulting file size) in R

Summary




drhur documentation built on May 31, 2023, 6:03 p.m.