README.md
In sfeuerriegel/ResearchGroupTools: Collection of Custom Helper Functions

ResearchGroupTools
Functionality

ResearchGroupTools

ResearchGroupTools provides a collection of utilitiy function for rapid prototyping. These functions facilitate implemenation works related to advanced analytics. As such, it specifically supports data handling, preprocessing, visualization and analytics.

Using the devtools package, you can easily install the latest development version of ResearchGroupTools with

install.packages("devtools")

# Recommended option: download and install latest version from "GitHub"
devtools::install_github("sfeuerriegel/ResearchGroupTools", dependencies = TRUE)

Notes:

The package will only be shipped via GitHub; CRAN support is not intended due to several hooks.

This section shows the basic functionality of how accelerate data science in R. First, load the corresponding package ResearchGroupTools.

library(ResearchGroupTools)

By default, the seed for the random number generator is initialized to 0.

Some export routines require a few changes to your LaTeX document in order to get it running. The steps are documented in the help of e.g. ; below is a minimal working example:

\documentclass{article}
\usepackage{SIunitx}
  \newcommand{\sym}[1]{\rlap{$^{#1}$}}
  \sisetup{input-symbols={()*}}
\begin{document}

\begin{tabular}{l SSS}
\toprule
\include{table_cor}
\end{tabular}
\end{document}

Above, we included SIunitx, introduced a command \sym, changed the input-symbols and used custom column alignments (S).

Functionality

Library() (note the capital "L") loads packages. If not available, these are automatically installed.

Library("texreg")
#> texreg

loadRegressionLibraries() loads and installs common libraries for econometric purposes.

loadRegressionLibraries()

%+% concatenates strings (as an alterantive to paste()).

"a" %+% "b"
#> [1] "ab"
3 %+% 4
#> [1] "34"
do.call(`%+%`, as.list(letters))
#> [1] "abcdefghijklmnopqrstuvwxyz"

ceil() computes the largest integer less or equal given a numerical value. It is a wrapper for ceiling with a more consistent naming.

ceil(3.4)
#> [1] 4

cumskewness(), cumkurtosis(), cumsd() (standard deviation) and cumadev() (average deviation) return a vector with cumulative results of the specific function.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- data_frame(x = 1:10, y = rnorm(10))
cumsd(df$x)
#>  [1]        NA 0.7071068 1.0000000 1.2909944 1.5811388 1.8708287 2.1602469
#>  [8] 2.4494897 2.7386128 3.0276504

df %>%
  mutate_all(funs("mean" = cummean, "sd" = cumsd))
#> # A tibble: 10 × 6
#>        x            y x_mean    y_mean      x_sd      y_sd
#>    <int>        <dbl>  <dbl>     <dbl>     <dbl>     <dbl>
#> 1      1  1.262954285    1.0 1.2629543        NA        NA
#> 2      2 -0.326233361    1.5 0.4683605 0.7071068 1.1237254
#> 3      3  1.329799263    2.0 0.7555067 1.0000000 0.9374104
#> 4      4  1.272429321    2.5 0.8847374 1.2909944 0.8078538
#> 5      5  0.414641434    3.0 0.7907182 1.5811388 0.7305264
#> 6      6 -1.539950042    3.5 0.4022735 1.8708287 1.1542404
#> 7      7 -0.928567035    4.0 0.2121534 2.1602469 1.1675809
#> 8      8 -0.294720447    4.5 0.1487942 2.4494897 1.0957240
#> 9      9 -0.005767173    5.0 0.1316207 2.7386128 1.0262500
#> 10    10  2.404653389    5.5 0.3589240 3.0276504 1.2053364

pull(), pull_string() and pull_ith() extract single columns from a dplyr tbl object and return them as a vector.

d %>% pull(x)
#>  [1]  1  2  3  4  5  6  7  8  9 10
d %>% pull("x")
#>  [1]  1  2  3  4  5  6  7  8  9 10

v <- "x"
d %>% pull_string(v)
#>  [1]  1  2  3  4  5  6  7  8  9 10

d %>% pull_ith(1)
#>  [1]  1  2  3  4  5  6  7  8  9 10

completeLowResolutionData() takes data in low resolution (e.g. monthly) and copies its values to match a high resolution (e.g. daily).

ts <- data.frame(Date = seq(from = as.Date("2000-01-01"), to = as.Date("2000-03-31"), by = "1 day"))
df_monthly <- data.frame(Month = c(as.Date("2000-01-31"), as.Date("2000-02-29"), as.Date("2000-03-31")),
                         Values = 1:3)

df_daily <- completeLowResolutionData(ts$Date, df_monthly, "Month")

# example of how to bind things together
ts <- ts %>%
  left_join(df_daily, by = c("Date" = "Month"))

lags() computes several lags of a vector.

lags(1:5, c(1, 2, 3))
#>      [,1] [,2] [,3]
#> [1,]   NA   NA   NA
#> [2,]    1   NA   NA
#> [3,]    2    1   NA
#> [4,]    3    2    1
#> [5,]    4    3    2
lags(ts(1:5), c(1, 2, 5))
#> Time Series:
#> Start = 1 
#> End = 5 
#> Frequency = 1 
#>   structure(c(NA, 1L, 2L, 3L, 4L), .Tsp = c(1, 5, 1), class = "ts")
#> 1                                                                NA
#> 2                                                                 1
#> 3                                                                 2
#> 4                                                                 3
#> 5                                                                 4
#>   structure(c(NA, NA, 1L, 2L, 3L), .Tsp = c(1, 5, 1), class = "ts")
#> 1                                                                NA
#> 2                                                                NA
#> 3                                                                 1
#> 4                                                                 2
#> 5                                                                 3
#>   structure(c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
#> 1                                                               NA
#> 2                                                               NA
#> 3                                                               NA
#> 4                                                               NA
#> 5                                                               NA

differences() calculates lagged differences of a given order. It is more convenient thant diff() as it adds leading NA values.

differences(1:10)
#>  [1] NA  1  1  1  1  1  1  1  1  1
differences(c(1, 2, 4, 8, 16, 32))
#> [1] NA  1  2  4  8 16
differences(c(1, 2, 4, 8, 16, 32), order = 2)
#> [1] NA NA  1  2  4  8
differences(c(1, 2, 4, 8, 16, 32), na_padding = FALSE)
#> [1]  1  2  4  8 16

returns() calculates returns of a time series (similar to diff() for differenes).

returns(1:10)
#>  [1]        NA 1.0000000 0.5000000 0.3333333 0.2500000 0.2000000 0.1666667
#>  [8] 0.1428571 0.1250000 0.1111111
returns(c(1, 2, 4, 8, 16, 32))
#> [1] NA  1  1  1  1  1
returns(c(1, 2, 4, 8, 16, 32), na_padding = FALSE) # remove trailing NA's
#> [1] 1 1 1 1 1

logReturns() computes log-returns (by default, with base exp(1)).

logReturns(c(1, 2, 4, 8, 16, 32), base = 2)
#> [1] NA  1  1  1  1  1

findRowsNA() and showRowsNA(), as well as findColsNA() and showColsNA(), help find NA values within a dataset.

m <- matrix(letters[c(1, 2, NA, 3, NA, 4, 5, 6, 7, 8)], ncol = 2, byrow = FALSE)
colnames(m) <- c("x", "y")
m
#>      x   y  
#> [1,] "a" "d"
#> [2,] "b" "e"
#> [3,] NA  "f"
#> [4,] "c" "g"
#> [5,] NA  "h"

anyNA(m)      # use built-in routine to test for NA values
#> [1] TRUE

findRowsNA(m) # returns indices of that rows
#> [1] 3 5
showRowsNA(m) # prints rows with NA values
#>      x  y  
#> [1,] NA "f"
#> [2,] NA "h"

findColsNA(m) # returns name of that columns
#> [1] "x"
showColsNA(m) # print columns with NA values
#> [1] "a" "b" NA  "c" NA

last_non_NA() returns the last entry in a vector which is not NA. This is helpful when aggregating high resolution data (see example below).

last_non_NA(c(1, 2, 3, 4, NA))
#> [1] 4

values <- 1:100
values[sample(1:100, 10)] <- NA
df <- cbind(Year = c(rep(2000, 5), rep(2001, 5)),
              as.data.frame(matrix(values, nrow = 10)))

df %>%
  group_by(Year) %>%
  summarize_each(funs(last_non_NA)) %>%
  ungroup() %>%
  head()
#> # A tibble: 2 × 11
#>    Year    V1    V2    V3    V4    V5    V6    V7    V8    V9   V10
#>   <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1  2000     5    15    25    35    45    55    65    74    85    95
#> 2  2001    10    20    30    40    50    60    70    80    90   100

removeOutlierObservations() trims the dataset with regard to certain variables. It thus removes outliers at the 0.5% level at both ends (or any other threshold defined by the argument cutoff).

d <- data.frame(x1 = rnorm(200), x2 = rnorm(200), y = rnorm(200))

d_trimmed <- removeOutlierObservations(d)
#> Dropping 6 observations, i.e. 0.03 %.
dim(d_trimmed)
#> [1] 194   3

d_trimmed <- removeOutlierObservations(d, variables = "y", cutoff = 2.0)
#> Dropping 8 observations, i.e. 0.04 %.
dim(d_trimmed)
#> [1] 192   3

d_trimmed <- removeOutlierObservations(d, variables = c("x1", "x2"), cutoff = 2.0)
#> Dropping 16 observations, i.e. 0.08 %.
dim(d_trimmed)
#> [1] 184   3

descriptiveStatistics() produces pretty summary statistics. By default, it exports the statistics into a LaTeX file. An optional parameter filename can be used to change the filename for the export.

data(USArrests)
descriptiveStatistics(USArrests)
#> Column names: \textbf{Mean} & \textbf{Median} & \textbf{Min.} & \textbf{Max} & \textbf{Std. dev.} & \textbf{Skewness} & \textbf{Excess kurtosis} \\
#>             mean median  min   max     sd   skew excess_kurtosis
#> Murder     7.788   7.25  0.8  17.4  4.356  0.371          -0.949
#> Assault  170.760 159.00 45.0 337.0 83.338  0.221          -1.145
#> UrbanPop  65.540  66.00 32.0  91.0 14.475 -0.213          -0.872
#> Rape      21.232  20.10  7.3  46.0  9.366  0.754           0.075
unlink("table_descriptives.tex")

correlationMatrix() computes a pretty correlation matrix. An optional parameter filename can be used to specify a LaTeX file to which the result is exported with significance stars. Note: this requires a few changes to your LaTeX preamble.

correlationMatrix(USArrests)
#>            Murder  Assault UrbanPop Rape
#> Murder                                  
#> Assault  0.802***                       
#> UrbanPop    0.070    0.259              
#> Rape     0.564*** 0.665***  0.411**
correlationMatrix(USArrests, filename = "table_cor.tex") # stores output in LaTeX file
#> Column names: \textbf{Murder} & \textbf{Assault} & \textbf{UrbanPop} & \textbf{Rape} \\
#>            Murder  Assault UrbanPop Rape
#> Murder                                  
#> Assault  0.802***                       
#> UrbanPop    0.070    0.259              
#> Rape     0.564*** 0.665***  0.411**
unlink("table_cor.tex")

jplot() is an alternative to ggplot() but with a journal-style layout

library(ggplot2)
#> 
#> Attaching package: 'ggplot2'
#> The following object is masked from 'package:ResearchGroupTools':
#> 
#>     %+%

df <- data.frame(x = 1:20,
                 y = 1:20,
                 z = as.factor(rep(1:4, each = 5)))

jplot(df) +
  geom_line(aes(x = x, y = y, color = z, linetype = z))

# For comparison:
# ggplot(df) +
#  geom_line(aes(x = x, y = y, color = z, linetype = z))

jplot(df) +
  geom_point(aes(x = x, y = y, color = z))

# For comparison:
# ggplot(df) +
#   geom_point(aes(x = x, y = y, color = z))

linePlot() is a simple wrapper to ggplot2.

linePlot(1:10)


x <- seq(0, 4, length.out = 100)
linePlot(x, sin(x))

scientificLabels() enables a nice exponential notation in ggplot2 plots.

ggplot(df, aes(x=x, y=y)) +
  geom_point() +
  scale_x_continuous(labels=scientificLabels) +
  scale_y_continuous(labels=scientificLabels)

allDigitsLabels() enforces that all digits are displayed in ggplot2 plots.

ggplot(df, aes(x=x, y=y)) +
  geom_point() +
  scale_x_continuous(labels=allDigitsLabels) +
  scale_y_continuous(labels=allDigitsLabels)

makeFormula() lets one build formulae based on strings to identify the individual variables.

makeFormula("y", "x")
#> y ~ x
makeFormula("y", c("x1", "x2", "x3"))
#> y ~ x1 + x2 + x3
makeFormula("y", c("x1", "x2", "x3"), "dummies")
#> y ~ x1 + x2 + x3 + dummies

regression() is a customized, all-in-one routine for ordinary least squares with optional dummy variables. It can filter for a subset of observations, remove outliers at a certain cutoff and remove dummies that are NA. It also changes to covariance matrix internally if desired (note: this requires a different estimator from sandwich).

x1 <- 1:100
x2 <- sin(1:100)
clusters <- rep(c(1, 2), 50)
dummies <- model.matrix(~ clusters)
y <- x1 + x2 + clusters + rnorm(100)
d <- data.frame(x1 = x1, x2 = x2, y = y)

m_dummies <- regression(formula("y ~ x1 + x2 + dummies"), data = d, subset = 1:90,
                        dummies = "dummies", cutoff = 0.5)
#> Removing 2 observations; i.e. 0.02222222 percent.
#> Dropping 1 coefficients: dummies(Intercept)
summary(m_dummies)
#> 
#> Call:
#> lm(formula = formula, data = data)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.04179 -0.76100  0.00665  0.54697  2.30934 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   0.2678     0.3829   0.699    0.486    
#> x1            0.9982     0.0041 243.467  < 2e-16 ***
#> x2            1.1542     0.1503   7.680 2.64e-11 ***
#> dummies       0.8815     0.2135   4.128 8.58e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.001 on 84 degrees of freedom
#> Multiple R-squared:  0.9986, Adjusted R-squared:  0.9985 
#> F-statistic: 1.979e+04 on 3 and 84 DF,  p-value: < 2.2e-16

library(sandwich)

m_dummies <- regression(formula("y ~ x1 + x2 + dummies"), data = d, subset = 1:90,
                        dummies = "dummies", cutoff = 0.5, vcov = NeweyWest)
#> Removing 2 observations; i.e. 0.02222222 percent.
#> Dropping 1 coefficients: dummies(Intercept) 
#> Overriding internal attributes to use non-OLS covariance matrix.
summary(m_dummies)
#> 
#> Call:
#> lm(formula = formula, data = data)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.04179 -0.76100  0.00665  0.54697  2.30934 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 0.267773   0.424562   0.631 0.529946    
#> x1          0.998245   0.003324 300.360  < 2e-16 ***
#> x2          1.154176   0.147325   7.834  1.3e-11 ***
#> dummies     0.881494   0.253195   3.481 0.000794 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.001 on 84 degrees of freedom
#> Multiple R-squared:  0.9986, Adjusted R-squared:  0.9985 
#> F-statistic: 1.979e+04 on 3 and 84 DF,  p-value: < 2.2e-16

regressionStepwise() is an extension to iteratively incorporate regressors one by one. The resulting list can then easily be exported. It also changes to covariance matrix internally if desired (note: this requires a different estimator from sandwich).

models <- regressionStepwise(formula("y ~ x1 + x2 + dummies"), data = d, subset = 1:90,
                            dummies = "dummies", cutoff = 0.5)
#> Removing 2 observations; i.e. 0.02222222 percent.
#> Dropping 1 coefficients: dummies(Intercept) 
#> Removing 2 observations; i.e. 0.02222222 percent.
#> Dropping 1 coefficients: dummies(Intercept)

length(models)
#> [1] 2

library(texreg)

texreg(models, omit.coef = "dummies")
#> 
#> \begin{table}
#> \begin{center}
#> \begin{tabular}{l c c }
#> \hline
#>  & Model 1 & Model 2 \\
#> \hline
#> (Intercept) & $0.53$       & $0.27$       \\
#>             & $(0.50)$     & $(0.38)$     \\
#> x1          & $0.99^{***}$ & $1.00^{***}$ \\
#>             & $(0.01)$     & $(0.00)$     \\
#> x2          &              & $1.15^{***}$ \\
#>             &              & $(0.15)$     \\
#> \hline
#> R$^2$       & 1.00         & 1.00         \\
#> Adj. R$^2$  & 1.00         & 1.00         \\
#> Num. obs.   & 88           & 88           \\
#> RMSE        & 1.28         & 1.00         \\
#> \hline
#> \multicolumn{3}{l}{\scriptsize{$^{***}p<0.001$, $^{**}p<0.01$, $^*p<0.05$}}
#> \end{tabular}
#> \caption{Statistical models}
#> \label{table:coefficients}
#> \end{center}
#> \end{table}

models <- regressionStepwise(formula("y ~ x1 + x2 + dummies"), data = d, subset = 1:90,
                            dummies = "dummies", cutoff = 0.5, vcov = NeweyWest)
#> Removing 2 observations; i.e. 0.02222222 percent.
#> Dropping 1 coefficients: dummies(Intercept) 
#> Overriding internal attributes to use non-OLS covariance matrix.
#> Removing 2 observations; i.e. 0.02222222 percent.
#> Dropping 1 coefficients: dummies(Intercept) 
#> Overriding internal attributes to use non-OLS covariance matrix.
texreg(models, omit.coef = "dummies")
#> 
#> \begin{table}
#> \begin{center}
#> \begin{tabular}{l c c }
#> \hline
#>  & Model 1 & Model 2 \\
#> \hline
#> (Intercept) & $0.53$       & $0.27$       \\
#>             & $(0.50)$     & $(0.38)$     \\
#> x1          & $0.99^{***}$ & $1.00^{***}$ \\
#>             & $(0.01)$     & $(0.00)$     \\
#> x2          &              & $1.15^{***}$ \\
#>             &              & $(0.15)$     \\
#> \hline
#> R$^2$       & 1.00         & 1.00         \\
#> Adj. R$^2$  & 1.00         & 1.00         \\
#> Num. obs.   & 88           & 88           \\
#> RMSE        & 1.28         & 1.00         \\
#> \hline
#> \multicolumn{3}{l}{\scriptsize{$^{***}p<0.001$, $^{**}p<0.01$, $^*p<0.05$}}
#> \end{tabular}
#> \caption{Statistical models}
#> \label{table:coefficients}
#> \end{center}
#> \end{table}

showCoeftest() shows coefficient tests, but hides (dummy) variables starting with a certain string. Note: this is designed for output in the R console or within Rmarkdown. For exporting, better use texreg which has an argument named omit.coef.

showCoeftest(m_dummies, hide = "x") # leaves only the intercept
#>              Estimate Std..Error   t.value     Pr...t.. Stars
#> (Intercept) 0.2677732  0.4245619 0.6307047 0.5299458851      
#> dummies     0.8814936  0.2531952 3.4814776 0.0007937735   ***

standardizeCoefficients() extracts standardized coefficients and hides (dummy) variables if needed.

library(vars)
#> Loading required package: MASS
#> 
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#> 
#>     select
#> Loading required package: strucchange
#> Loading required package: urca
data(Canada)

prod <- differences(as.numeric(Canada[, 2]))
production <- data.frame(Prod = prod, Lag1 = dplyr::lag(prod), Lag2 = dplyr::lag(prod, 2))

m <- lm(Prod ~ Lag1, data = production)
standardizeCoefficients(m)
#>           Coef  SdChange StandardizedCoef
#> Lag1 0.2969171 0.2143436        0.2968569

library(quantreg)
#> Loading required package: SparseM
#> 
#> Attaching package: 'SparseM'
#> The following object is masked from 'package:base':
#> 
#>     backsolve
data(stackloss)

qr <- rq(stack.loss ~ stack.x, 0.25)
standardizeCoefficients(qr)
#>                           Coef      SdChange StandardizedCoef
#> stack.xAir.Flow    5.00000e-01  4.584134e+00     4.506788e-01
#> stack.xWater.Temp  1.00000e+00  3.160771e+00     3.107441e-01
#> stack.xAcid.Conc. -4.57967e-16 -2.454049e-15    -2.412642e-16

extractRegressionStatistics() extracts key statistics of regression and returns them as a data.frame (so that it can later be stacked via row-wise binding).

x <- 1:10
y <- 1 + x + rnorm(10)
m <- lm(y ~ x)

extractRegressionStatistics(m)
#>   Observations DegreesFreedom ResidualError  Rsquared AdjRsquared      AIC
#> 1           10              8      1.110207 0.8902741   0.8765584 34.23826
#>        BIC Fstatistic Fsignficance Fstars
#> 1 35.14601   64.90894 4.150563e-05    ***

getRowsOutlierRemoval() helps to remove outliers at the 0.5% level at both ends (or any other threshold defined by the argument cutoff).

 d <- data.frame(x = 1:200, y = 1:200 + rnorm(200))
m <- lm(y ~ x, d)                  # fit original model

idx_rm <- getRowsOutlierRemoval(m) # identify row indices of outliers
m <- lm(y ~ x, d[-idx_rm, ])       # refit model with outliers removed

texreg_tvalues() converts a the result of an ordinary least squares regression into in LaTeX. Instead of reporting standard errors, it gives t-values as a common alternative in finance. An optional parameter dummies can be specified which removes certain coefficients in the output. More than one model can be passed via a list.

texreg_tvalues(m_dummies)
#> 
#> \begin{table}
#> \begin{center}
#> \begin{tabular}{l c }
#> \hline
#>  & Model 1 \\
#> \hline
#> (Intercept) & $0.27$       \\
#>             & $(0.63)$     \\
#> x1          & $1.00^{***}$ \\
#>             & $(300.36)$   \\
#> x2          & $1.15^{***}$ \\
#>             & $(7.83)$     \\
#> dummies     & $0.88^{***}$ \\
#>             & $(3.48)$     \\
#> \hline
#> R$^2$       & 1.00         \\
#> Adj. R$^2$  & 1.00         \\
#> Num. obs.   & 88           \\
#> RMSE        & 1.00         \\
#> \hline
#> \multicolumn{2}{l}{\scriptsize{$^{***}p<0.001$, $^{**}p<0.01$, $^*p<0.05$}}
#> \end{tabular}
#> \caption{Statistical models}
#> \label{table:coefficients}
#> \end{center}
#> \end{table}
texreg_tvalues(m_dummies, hide = "dummies")
#> 
#> \begin{table}
#> \begin{center}
#> \begin{tabular}{l c }
#> \hline
#>  & Model 1 \\
#> \hline
#> (Intercept) & $0.27$       \\
#>             & $(0.38)$     \\
#> x1          & $1.00^{***}$ \\
#>             & $(0.00)$     \\
#> x2          & $1.15^{***}$ \\
#>             & $(0.15)$     \\
#> dummies     & $0.88^{***}$ \\
#>             & $(0.21)$     \\
#> \hline
#> R$^2$       & 1.00         \\
#> Adj. R$^2$  & 1.00         \\
#> Num. obs.   & 88           \\
#> RMSE        & 1.00         \\
#> \hline
#> \multicolumn{2}{l}{\scriptsize{$^{***}p<0.001$, $^{**}p<0.01$, $^*p<0.05$}}
#> \end{tabular}
#> \caption{Statistical models}
#> \label{table:coefficients}
#> \end{center}
#> \end{table}
texreg_tvalues(list(m, m_dummies))
#> 
#> \begin{table}
#> \begin{center}
#> \begin{tabular}{l c c }
#> \hline
#>  & Model 1 & Model 2 \\
#> \hline
#> (Intercept) & $0.11$       & $0.27$       \\
#>             & $(0.77)$     & $(0.63)$     \\
#> x           & $1.00^{***}$ &              \\
#>             & $(809.74)$   &              \\
#> x1          &              & $1.00^{***}$ \\
#>             &              & $(300.36)$   \\
#> x2          &              & $1.15^{***}$ \\
#>             &              & $(7.83)$     \\
#> dummies     &              & $0.88^{***}$ \\
#>             &              & $(3.48)$     \\
#> \hline
#> R$^2$       & 1.00         & 1.00         \\
#> Adj. R$^2$  & 1.00         & 1.00         \\
#> Num. obs.   & 198          & 88           \\
#> RMSE        & 1.00         & 1.00         \\
#> \hline
#> \multicolumn{3}{l}{\scriptsize{$^{***}p<0.001$, $^{**}p<0.01$, $^*p<0.05$}}
#> \end{tabular}
#> \caption{Statistical models}
#> \label{table:coefficients}
#> \end{center}
#> \end{table}

qr25 <- rq(stack.loss ~ stack.x, 0.25)
qr50 <- rq(stack.loss ~ stack.x, 0.50)
qr75 <- rq(stack.loss ~ stack.x, 0.75)
texreg_tvalues(list(qr25, qr50, qr75))
#> Warning in summary.rq(model, se = se): 1 non-positive fis

#> Warning in summary.rq(model, se = se): 1 non-positive fis
#> 
#> \begin{table}
#> \begin{center}
#> \begin{tabular}{l c c c }
#> \hline
#>  & Model 1 & Model 2 & Model 3 \\
#> \hline
#> (Intercept)       & $-36.00^{***}$ & $-39.69^{***}$ & $-54.19^{***}$ \\
#>                   & $(-7.70)$      & $(-5.56)$      & $(-5.78)$      \\
#> stack.xAir.Flow   & $0.50^{*}$     & $0.83^{***}$   & $0.87^{***}$   \\
#>                   & $(2.70)$       & $(6.55)$       & $(6.96)$       \\
#> stack.xWater.Temp & $1.00^{*}$     & $0.57$         & $0.98^{**}$    \\
#>                   & $(2.46)$       & $(1.68)$       & $(3.08)$       \\
#> stack.xAcid.Conc. & $0.00$         & $-0.06$        & $0.00$         \\
#>                   & $(0.00)$       & $(-1.01)$      & $(0.00)$       \\
#> \hline
#> Num. obs.         & 21             & 21             & 21             \\
#> Percentile        & 0.25           & 0.50           & 0.75           \\
#> \hline
#> \multicolumn{4}{l}{\scriptsize{$^{***}p<0.001$, $^{**}p<0.01$, $^*p<0.05$}}
#> \end{tabular}
#> \caption{Statistical models}
#> \label{table:coefficients}
#> \end{center}
#> \end{table}

testDiagnostics() checks if non-autocollreation, no serial correlation, homoskedasticity and no multicollinearity is present.

library(car)
#> 
#> Attaching package: 'car'
#> The following object is masked from 'package:dplyr':
#> 
#>     recode
m <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

testDiagnostics(m)
#> * VIF: The following variables do have a variance inflation factor of 4 or greater:  .
#> * Breusch-Pagan Test: The hypothesis of homoskedasticity cannot be rejected for the given significance level. Hence, homoskedasticity
#> * Breusch-Godfrey Test: The hypothesis of no serial correlation cannot be rejected for the given significance level. Hence, no serial correlation.
#> * Durbin-Watson Test: The hypothesis of no autocorrelation cannot be rejected for the given significance level. Hence, no autocorrelation.
#> The following diagnostic test caused troubles:  Non-multicollinearity .
#>  Non-multicollinearity       Homoskedasticity    Non-autocorrelation 
#>                  FALSE                   TRUE                   TRUE 
#> Non-serial correlation 
#>                   TRUE

standardizeCoefficients() returns standardized coefficients.

var.2c <- VAR(Canada, p = 2, type = "none")

standardizeCoefficients(var.2c$varresult$e)
#>                Coef   SdChange StandardizedCoef
#> e.l1     1.62046761 14.4485612       1.60957876
#> prod.l1  0.17973134  0.7418013       0.08263712
#> rw.l1   -0.04425592 -0.9971380      -0.11108180
#> U.l1     0.11310425  0.1799192       0.02004311
#> e.l2    -0.64815156 -5.7426482      -0.63973460
#> prod.l2 -0.11683270 -0.4640806      -0.05169887
#> rw.l2    0.04475537  1.0337409       0.11515939
#> U.l2    -0.06581206 -0.1040299      -0.01158900

std <- standardizeCoefficients(var.2c)
std$e
#>                Coef   SdChange StandardizedCoef
#> e.l1     1.62046761 14.4485612       1.60957876
#> prod.l1  0.17973134  0.7418013       0.08263712
#> rw.l1   -0.04425592 -0.9971380      -0.11108180
#> U.l1     0.11310425  0.1799192       0.02004311
#> e.l2    -0.64815156 -5.7426482      -0.63973460
#> prod.l2 -0.11683270 -0.4640806      -0.05169887
#> rw.l2    0.04475537  1.0337409       0.11515939
#> U.l2    -0.06581206 -0.1040299      -0.01158900

adf() checks a time series for stationarity using the Augmented Dickey-Fuller (ADF) test. It returns the result in a pretty format and, if an optional argument filename is specified, it also exports it as LaTeX.

adf(USArrests, verbose = FALSE)
#> The following time series appear stationary, as P-values > 0.05:  Murder, Assault, UrbanPop, Rape
#>   Variable Type Lags   TestStat CriticalValue10 CriticalValue5
#> 1   Murder none    1 -1.7630314           -1.61          -1.95
#> 2  Assault none    1 -1.5869393           -1.61          -1.95
#> 3 UrbanPop none    1 -0.4833016           -1.61          -1.95
#> 4     Rape none    1 -1.7395692           -1.61          -1.95
#>   CriticalValue1     Pvalue
#> 1          -2.62 0.07789519
#> 2          -2.62 0.11252638
#> 3          -2.62 0.62888163
#> 4          -2.62 0.08193470
adf(USArrests, vars = c("Murder", "Rape"), type = "drift",
   filename = "adf.tex", verbose = FALSE)
#> 
#> 
#> \begin{tabular}{ll SSSSS} 
#> \toprule 
#> \multicolumn{1}{l}{Variable} & \multicolumn{1}{l}{Deterministic trend} & \multicolumn{1}{c}{Lags}& \multicolumn{1}{c}{Test value} & \multicolumn{3}{c}{\textbf{Critical values}}\\ 
#> \cline{5-7} 
#> &&&& $10\,\%$ & $5\,\%$ & $1\,\%$ \\ 
#> 
#> 
#> All time series appear stationary, since all P-values < 0.05.
#>   Variable     Type Lags  TestStat CriticalValue10 CriticalValue5
#> 1   Murder Constant    1 -5.653178            -2.6          -2.93
#> 2     Rape Constant    1 -4.762830            -2.6          -2.93
#>   CriticalValue1       Pvalue
#> 1          -3.58 1.575081e-08
#> 2          -3.58 1.908966e-06
unlink("adf.tex")

exportAdfDifferences() allows to export an ADF test in levels and in differences in a combined table.

adf_levels <- adf(USArrests)
#> 1
#> Length  Class   Mode 
#>      1  ur.df     S4 
#> 2
#> Length  Class   Mode 
#>      1  ur.df     S4 
#> 3
#> Length  Class   Mode 
#>      1  ur.df     S4 
#> 4
#> Length  Class   Mode 
#>      1  ur.df     S4 
#> The following time series appear stationary, as P-values > 0.05:  Murder, Assault, UrbanPop, Rape
adf_diff1 <- adf(data.frame(Murder = diff(USArrests$Murder),
                            Assault = diff(USArrests$Assault),
                            UrbanPop = diff(USArrests$UrbanPop),
                            Rape = diff(USArrests$Rape)))
#> 1
#> Length  Class   Mode 
#>      1  ur.df     S4 
#> 2
#> Length  Class   Mode 
#>      1  ur.df     S4 
#> 3
#> Length  Class   Mode 
#>      1  ur.df     S4 
#> 4
#> Length  Class   Mode 
#>      1  ur.df     S4 
#> All time series appear stationary, since all P-values < 0.05.
exportAdfDifferences(adf_levels, adf_diff1)
#> 
#> 
#> \begin{tabular}{ll SSSSS} 
#> \toprule 
#> \multicolumn{1}{l}{Variable} & \multicolumn{1}{l}{Deterministic trend} & \multicolumn{1}{c}{Lags}& \multicolumn{1}{c}{Test value} & \multicolumn{3}{c}{\textbf{Critical values}}\\ 
#> \cline{5-7} 
#> &&&& $10\,\%$ & $5\,\%$ & $1\,\%$ \\ 
#> 
#> 
#>   tmp$Variable tmp$Type Lags TestStat CriticalValue10 CriticalValue5
#> 1       Murder     none    1   -1.763           -1.61          -1.95
#> 5      Assault     none    1   -8.020           -1.61          -1.95
#> 2     UrbanPop     none    1   -1.587           -1.61          -1.95
#> 6         Rape     none    1   -8.360           -1.61          -1.95
#> 3       Murder     none    1   -0.483           -1.61          -1.95
#> 7      Assault     none    1   -8.389           -1.61          -1.95
#> 4     UrbanPop     none    1   -1.740           -1.61          -1.95
#> 8         Rape     none    1  -10.775           -1.61          -1.95
#>   CriticalValue1
#> 1          -2.62
#> 5          -2.62
#> 2          -2.62
#> 6          -2.62
#> 3          -2.62
#> 7          -2.62
#> 4          -2.62
#> 8          -2.62
#>   tmp$Variable tmp$Type Lags TestStat CriticalValue10 CriticalValue5
#> 1       Murder     none    1   -1.763           -1.61          -1.95
#> 5      Assault     none    1   -8.020           -1.61          -1.95
#> 2     UrbanPop     none    1   -1.587           -1.61          -1.95
#> 6         Rape     none    1   -8.360           -1.61          -1.95
#> 3       Murder     none    1   -0.483           -1.61          -1.95
#> 7      Assault     none    1   -8.389           -1.61          -1.95
#> 4     UrbanPop     none    1   -1.740           -1.61          -1.95
#> 8         Rape     none    1  -10.775           -1.61          -1.95
#>   CriticalValue1
#> 1          -2.62
#> 5          -2.62
#> 2          -2.62
#> 6          -2.62
#> 3          -2.62
#> 7          -2.62
#> 4          -2.62
#> 8          -2.62
unlink("adf.tex")

cointegrationTable() performs a cointegration test following the Johansen procedure. The output is written as LaTeX into a file named filename.

cointegrationTable(USArrests, vars = c("Murder", "Rape"), K = 2, filename = "cointegration_eigen.tex")
#> Test statistic in the top row is larger than the 1% values:  All time-series variables are stationary, i.e. I(0), to start with. Cointegration is not relevant here. 
#> 
#> 
#> \begin{tabular}{l SSSS} 
#> \toprule 
#> \textbf{$H_{0}$} 
#> & \textbf{Test statistic} & \multicolumn{3}{c}{\textbf{Critical Values}}\\ 
#> \crule{3-5} 
#> & {$n = 2$}& {$10\,\%$} & {$5\,\%$} & {$1\,\%$} \\
#>     H0 TestStatistic CriticalValue10 CriticalValue5 CriticalValue1
#> 1  r=0         28.63           12.91           14.9          19.19
#> 2 r<=1        21.702             6.5           8.18          11.65
unlink("cointegration_eigen.tex")

plotIrf() returns a ggplot with a nice impulse response function in black/white.

plotIrf(irf, ylab = "Production")

impulseResponsePlot() combines computation and plot, thereby returning a ggplot with a nice impulse response function in black/white. If the optional argument filename is specified, the plot is automatically saved on the disk.

impulseResponsePlot(var.2c, impulse = "e", response = "prod", ylab = "Production", n.ahead = 5, filename = "irf_e_prod.pdf")
#> Saving 7 x 5 in image

unlink("irf_e_prod.pdf")

testSpecification() checks if non-autocorrelation, normally distributed residuals and homoskedasticity is present.

testSpecification(var.2c)
#> All specification tests seem fine. No autocorrelation, normally distributed residuals, homoskedasticity.
#>                           BG                PT               JB
#> Statistic   97.8554991662177  193.690038336454 4.37397926053634
#> Pvalue    0.0853032132233613 0.753481554600346 0.82190467653541
#> Stars                                                          
#>                        MARCH
#> Statistic   541.888980939688
#> Pvalue    0.0950466601495369
#> Stars

The default theme of ggplot2 is changed to theme_bw().
coeftostring() from the texreg package is overwritten. This also fixes the behavior of texreg() itself.

coeftostring(-0.000001, digits = 4) # the original function would return "-.0000"
#> [1] ".0000"

d <- data.frame(y = 1:1000 - 0.0000001, x = 1:1000)
m <- lm(y ~ x, data = d)
texreg(m) # intercept would otherwise be "-0.00"
#> Warning in summary.lm(model, ...): essentially perfect fit: summary may be
#> unreliable
#> 
#> \begin{table}
#> \begin{center}
#> \begin{tabular}{l c }
#> \hline
#>  & Model 1 \\
#> \hline
#> (Intercept) & $0.00^{***}$ \\
#>             & $(0.00)$     \\
#> x           & $1.00^{***}$ \\
#>             & $(0.00)$     \\
#> \hline
#> R$^2$       & 1.00         \\
#> Adj. R$^2$  & 1.00         \\
#> Num. obs.   & 1000         \\
#> RMSE        & 0.00         \\
#> \hline
#> \multicolumn{2}{l}{\scriptsize{$^{***}p<0.001$, $^{**}p<0.01$, $^*p<0.05$}}
#> \end{tabular}
#> \caption{Statistical models}
#> \label{table:coefficients}
#> \end{center}
#> \end{table}

sanitize.numbers() inside xtable is overwritten.

xtable(matrix(1:4, nrow = 2) * -0.000001) # would otherwise return "-0.00"
#> % latex table generated in R 3.3.2 by xtable 1.8-2 package
#> % Fri Feb 03 18:21:03 2017
#> \begin{table}[ht]
#> \centering
#> \begin{tabular}{rrr}
#>   \hline
#>  & 1 & 2 \\ 
#>   \hline
#> 1 & 0.00 & 0.00 \\ 
#>   2 & 0.00 & 0.00 \\ 
#>    \hline
#> \end{tabular}
#> \end{table}

rebuildPackage() builds, loads and checks package during the development process all at once. In particular, the manual is updated.

rebuildPackage()
rebuildPackage(TRUE) # also runs README.Rmd

ResearchGroupTools is released under the MIT License

sfeuerriegel/ResearchGroupTools documentation built on May 29, 2019, 8:01 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

sfeuerriegel/ResearchGroupTools
Collection of Custom Helper Functions

README.md
In sfeuerriegel/ResearchGroupTools: Collection of Custom Helper Functions

ResearchGroupTools

Installation

Usage

Changes to LaTeX

Functionality

Library handling

Strings

Numerical functions

Data handling

Time series

Matrix functions (or data.frame)

Descriptive statistics

Visualization

Regressions

Time series analysis

Hooks to other packages

Package development

License

R Package Documentation

Browse R Packages

We want your feedback!

sfeuerriegel/ResearchGroupTools Collection of Custom Helper Functions

README.md In sfeuerriegel/ResearchGroupTools: Collection of Custom Helper Functions

ResearchGroupTools

Installation

Usage

Changes to LaTeX

Functionality

Library handling

Strings

Numerical functions

Data handling

Time series

Matrix functions (or data.frame)

Descriptive statistics

Visualization

Regressions

Time series analysis

Hooks to other packages

Package development

License

R Package Documentation

Browse R Packages

We want your feedback!

sfeuerriegel/ResearchGroupTools
Collection of Custom Helper Functions

README.md
In sfeuerriegel/ResearchGroupTools: Collection of Custom Helper Functions