In benjnguyen/NguyenTools: Tools for STAT 3701

This vignette demonstrates the contents of my package "NguyenTools". The package contains data sets and functions, and the following documentation illustrates basic ways to use both.

library(ggplot2)
library(magrittr)
library(NguyenTools)

Demonstration of using functions in package

The following functions were provided in Homework 1 and Quiz 1 of STAT 3701.

The function 'continentAsia()' is a wrapper around the data set 'gp2007', which is described later in the vignette. It filters the data by continent, returning only the part of the data in which the continent is 'Asia'.

continentAsia()

The function 'func1' computes the mean, variance, and standard deviation of a vector of data.

func1(rnorm(10))
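The internals of 'func1' are not shown in this vignette. As a rough illustration only, a function with the behavior described above might be sketched as follows. The name 'summarize_vector' and the named-list return value are hypothetical, and the sketch uses base R's sample variance (denominator n - 1), which may or may not match what 'func1' does.

```r
# Hypothetical stand-in for func1; the package's actual implementation may differ.
summarize_vector <- function(x) {
  # var() and sd() in base R use the n - 1 denominator.
  list(mean = mean(x), var = var(x), sd = sd(x))
}

set.seed(1)
summarize_vector(rnorm(10))
```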

The function 'func2' does the same thing as 'func1', with the additional requirement that the supplied vector be numeric, finite, and of length greater than 0.

func2(rnorm(10))
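One way to express argument checks of the kind described for 'func2' is with 'stopifnot()'. The helper below is a sketch under that assumption; the name 'check_numeric_vector' is hypothetical and the package may validate its input differently.

```r
# Hypothetical input checks; not the package's actual code.
check_numeric_vector <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0, all(is.finite(x)))
  invisible(x)
}

check_numeric_vector(rnorm(10))          # passes silently
try(check_numeric_vector(c(1, NA, 3)))   # fails: NA is not finite
try(check_numeric_vector(character(0)))  # fails: not numeric
```

Putting the checks first means the function stops before any arithmetic is attempted on unsuitable input.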

The function 'func3' computes a maximum likelihood estimate for data generated from the gamma distribution.

func3(rgamma(10, pi))

The function 'func4' computes the mean, variance, and standard deviation of a vector of values weighted by a vector of probabilities.

data(d)
head(d)
func4(d)
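For values $x_i$ with probabilities $p_i$ that sum to 1, the usual definitions are $\mu = \sum_i x_i p_i$ and $\sigma^2 = \sum_i (x_i - \mu)^2 p_i$. The sketch below computes these on a small made-up example. The function name 'weighted_moments' and its two-vector interface are hypothetical; 'func4' itself takes the data frame 'd', and its internals are not shown here.

```r
# Hypothetical sketch of probability-weighted moments; not the package's code.
weighted_moments <- function(x, p) {
  mu <- sum(x * p)                 # weighted mean
  sigma2 <- sum((x - mu)^2 * p)    # weighted variance
  list(mean = mu, var = sigma2, sd = sqrt(sigma2))
}

# Toy values and probabilities chosen only for illustration.
weighted_moments(x = c(0, 1, 2), p = c(0.25, 0.5, 0.25))
```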

The function 'func5' does the same thing as 'func4', with the additional requirement that the supplied vectors be numeric, finite, and of length greater than 0.

d <- read.table(url("http://www.stat.umn.edu/geyer/3701/data/q1p4.txt"),header = TRUE)
func5(d)

The function 'func6' produces an error message when the arguments supplied to a function are not valid.

func6(NA)

The function 'func7' computes the maximum likelihood estimate for a supplied data set generated from some distribution. In the following example, the data were generated from the gamma distribution, and the function passed to 'func7' specifies that the estimation should be done over the log likelihood of the gamma distribution. The output returned is the maximum likelihood estimate of the parameter theta.

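x1 <- ga_data
# Log density of the gamma distribution with shape parameter theta.
# Named so that it does not mask the package function 'func1'.
logdens_gamma <- function(theta, x) dgamma(x, shape = theta, log = TRUE)
# Third argument: the interval passed to func7 for theta.
result7_gamma <- func7(x1, logdens_gamma, c(0, 3))
result7_gamma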
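To make the idea concrete, a one-parameter maximum likelihood estimate over an interval can be sketched with base R's 'optimize()'. This is an illustration under my own assumptions, not the code inside 'func7': the name 'mle_on_interval' is hypothetical, the data are simulated rather than taken from 'ga_data', and the search interval is chosen for the simulated data.

```r
# Hypothetical sketch: maximize the summed log density over an interval.
mle_on_interval <- function(x, logdens, interval) {
  loglik <- function(theta) sum(logdens(theta, x))
  optimize(loglik, interval = interval, maximum = TRUE)$maximum
}

set.seed(42)
x_sim <- rgamma(200, shape = 2)  # simulated data with a known shape parameter
theta_hat <- mle_on_interval(
  x_sim,
  function(theta, x) dgamma(x, shape = theta, log = TRUE),
  c(0.01, 10)
)
theta_hat
```

With the rate fixed at 1, the estimate should satisfy the score equation digamma(theta) = mean(log(x)), which gives a quick way to check the result.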

The following functions were provided in Homework 2 and Quiz 2 of STAT 3701.

The function 'xAx' and the binary operator '%xAx%' both compute the scalar $x^{T}A^{-1}x$, where $x$ is an $n \times 1$ vector and $A$ is an invertible $n \times n$ matrix.

load(url("http://www.stat.umn.edu/geyer/3701/data/q2p1.rda"))
a <- as.matrix(a)
x <- as.matrix(x)
xAx(a, x)
a %xAx% x
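The quantity $x^{T}A^{-1}x$ can be computed without forming $A^{-1}$ explicitly by solving $Ay = x$ and taking $x^{T}y$. The sketch below does this on a small made-up matrix. The names 'quad_form_inv' and '%qfi%' are hypothetical and are not the package's 'xAx' or '%xAx%', whose internals are not shown here.

```r
# Hypothetical sketch of x^T A^{-1} x; not the package's implementation.
quad_form_inv <- function(A, x) {
  x <- as.matrix(x)
  # solve(A, x) returns A^{-1} x without computing the inverse itself.
  drop(t(x) %*% solve(A, x))
}

# A user-defined binary operator is any function whose name is wrapped in %...%.
`%qfi%` <- function(A, x) quad_form_inv(A, x)

# Toy inputs chosen only for illustration.
A_toy <- matrix(c(2, 0, 0, 4), nrow = 2)
x_toy <- c(1, 2)
quad_form_inv(A_toy, x_toy)
A_toy %qfi% x_toy
```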

The function 'standardize' takes a matrix A and standardizes its columns. Here, a column $x$ of A is standardized as $\frac{x - \text{mean}(x)}{\text{sd}(x)}$.

load(url("http://www.stat.umn.edu/geyer/3701/data/q2p3.rda"))
standardize(a)
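As an illustration of the formula, column standardization can be sketched with 'sweep()'. The name 'standardize_cols' is hypothetical and the package's 'standardize' may be written differently; base R's 'scale()' performs the same centering and scaling by default.

```r
# Hypothetical sketch of column standardization; not the package's code.
standardize_cols <- function(A) {
  centered <- sweep(A, 2, colMeans(A), "-")   # subtract each column's mean
  sweep(centered, 2, apply(A, 2, sd), "/")    # divide by each column's sd
}

# Toy matrix chosen only for illustration.
m_toy <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)
z_toy <- standardize_cols(m_toy)
colMeans(z_toy)      # each column mean should be (numerically) 0
apply(z_toy, 2, sd)  # each column sd should be 1
```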

The function 'myapply' is intended to replicate the results of base R's 'apply()' for matrices. It is called as myapply(MATRIX, MARGIN, FUNCTION), where MATRIX is a matrix, MARGIN is the integer 1 or 2 (corresponding to the rows or columns of the matrix, respectively), and FUNCTION is the function applied along that margin.

fred <- matrix(1:6, ncol = 2)
myapply(fred, 1, mean)
myapply(fred, 2, mean)
myapply(fred, 1, max)
myapply(fred, 2, max)
myapply(fred, 1, function(x) quantile(x, 0.75))
myapply(fred, 2, function(x) quantile(x, 0.75))
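A function of this kind can be sketched by looping over the row or column indices and applying the function to each slice. The version below is only an illustration: the name 'apply_sketch' is hypothetical, it assumes the function returns a single value per slice, and it is not the package's 'myapply'.

```r
# Hypothetical sketch of an apply()-like function for matrices.
apply_sketch <- function(X, MARGIN, FUN, ...) {
  stopifnot(is.matrix(X), length(MARGIN) == 1, MARGIN %in% c(1, 2))
  n <- dim(X)[MARGIN]  # number of rows (MARGIN = 1) or columns (MARGIN = 2)
  sapply(seq_len(n), function(i) {
    slice <- if (MARGIN == 1) X[i, ] else X[, i]
    FUN(slice, ...)
  })
}

# Same shape as the example matrix above, under a different name.
m_small <- matrix(1:6, ncol = 2)
apply_sketch(m_small, 1, mean)
apply_sketch(m_small, 2, max)
```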

The wrapper 'p6wrapper()' returns the median values computed along a three-dimensional data set.

load(url("http://www.stat.umn.edu/geyer/3701/data/q2p6.rda"))
pat
p6wrapper()

ggplot2 Graphics

The following code chunk outputs three ways to display data presented in a TED talk given by Hans Rosling called "The best stats you've ever seen."

The main variables used are 'gdpPercap' and 'lifeExp', which stand for gross domestic product per capita (in US dollars) and life expectancy in years, respectively. The first plot also maps 'pop' to point size and 'continent' to color.

The plots were produced by using the package ggplot2.

gp2007 <- read.csv('http://users.stat.umn.edu/~almquist/3811_examples/gapminder2007ex.csv')
ggplot2::ggplot(data=gp2007, ggplot2::aes(x=`gdpPercap`, y=`lifeExp`, size = `pop`, col = `continent`)) + ggplot2::geom_point()
ggplot2::ggplot(data = gp2007, mapping = ggplot2::aes(x = gdpPercap)) + ggplot2::geom_histogram()
ggplot2::ggplot(data = gp2007, ggplot2::aes(x = gdpPercap, y = lifeExp)) + ggplot2::geom_line()

In Quiz 3, one requirement was to write a wrapper that generates a plot of data. The wrapper is shown working below; it produces a plot of the Rosling data shown above.

NguyenTools::plotstuff()

Plot of Data Sets included in the package

The data sets included in the package are binomial data, Cauchy data, gamma data, Alaskan flight data, early January weather data, life expectancy ~ GDP data, and weather data. The data were provided in STAT 3701 for use in various quizzes and homework assignments.

plot(ga_data)
plot(cau_data)
plot(bin_data)

The plots for the Alaskan flight data, early January weather data, and weather data are generated below. The plot for the data set 'gp2007' has already been shown above and is omitted here.

The following chunk shows two ways to make the same plot of the Alaskan flight data.

s <- all_alaska_flights
plot(arr_delay ~ dep_delay, data = s, pch = 19, cex = .5, col = rgb(0, 0, 0, .2) )
plot(s$dep_delay, s$arr_delay, pch = 19, cex = .5, col = rgb(0, 0, 0, .2) )

ggplot(data = early_january_weather, aes(x = time_hour, y = temp)) + geom_line()
ggplot(data = weather, mapping = aes(x = temp, y = factor("A"))) + 
  geom_point() +
  theme(axis.ticks.y = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank())
hist_title <- "Histogram of Hourly Temperature Recordings from NYC 2013"
ggplot(data = weather, mapping = aes(x = temp)) + geom_histogram() + labs(title = hist_title)