README.md
In statar: Tools Inspired by 'Stata' to Manipulate Tabular Data

statar

This package contains R functions corresponding to useful Stata commands.

The package includes: - panel data functions (monthly/quarterly dates, lead/lag, fillin) - data.frame functions (tabulate, merge) - vector functions (xtile, pctile, winsorize) - graph functions (binscatter)

Data Frame Functions

sum_up prints detailed summary statistics (corresponds to Stata summarize)

N <- 100
df <- tibble(
  id = 1:N,
  v1 = sample(5, N, TRUE),
  v2 = sample(1e6, N, TRUE)
)
sum_up(df)
df %>% sum_up(starts_with("v"), d = TRUE)
df %>% group_by(v1) %>%  sum_up()

tab prints distinct rows with their count. Compared to the dplyr function count, this command adds frequency, percent, and cumulative percent.

N <- 1e2 ; K = 10
df <- tibble(
  id = sample(c(NA,1:5), N/K, TRUE),
  v1 = sample(1:5, N/K, TRUE)       
)
tab(df, id)
tab(df, id, na.rm = TRUE)
tab(df, id, v1)

join is a wrapper for dplyr merge functionalities, with two added functions

The option check checks there are no duplicates in the master or using data.tables (as in Stata).

r # merge m:1 v1 join(x, y, kind = "full", check = m~1) - The option gen specifies the name of a new variable that identifies non matched and matched rows (as in Stata).

r # merge m:1 v1, gen(_merge) join(x, y, kind = "full", gen = "_merge")

The option update allows to update missing values of the master dataset by the value in the using dataset

Vector Functions

# pctile computes quantile and weighted quantile of type 2 (similarly to Stata _pctile)
v <- c(NA, 1:10)                   
pctile(v, probs = c(0.3, 0.7), na.rm = TRUE) 

# xtile creates integer variable for quantile categories (corresponds to Stata xtile)
v <- c(NA, 1:10)                   
xtile(v, n_quantiles = 3) # 3 groups based on terciles
xtile(v, probs = c(0.3, 0.7)) # 3 groups based on two quantiles
xtile(v, cutpoints = c(2, 3)) # 3 groups based on two cutpoints

# winsorize (default based on 5 x interquartile range)
v <- c(1:4, 99)
winsorize(v)
winsorize(v, replace = NA)
winsorize(v, probs = c(0.01, 0.99))
winsorize(v, cutpoints = c(1, 50))

Panel Data Functions

The classes "monthly" and "quarterly" print as dates and are compatible with usual time extraction (ie month, year, etc). Yet, they are stored as integers representing the number of elapsed periods since 1970/01/0 (resp in week, months, quarters). This is particularly handy for simple algebra:

 # elapsed dates
 library(lubridate)
 date <- mdy(c("04/03/1992", "01/04/1992", "03/15/1992"))  
 datem <- as.monthly(date)
 # displays as a period
 datem
 #> [1] "1992m04" "1992m01" "1992m03"
 # behaves as an integer for numerical operations:
 datem + 1
 #> [1] "1992m05" "1992m02" "1992m04"
 # behaves as a date for period extractions:
 year(datem)
 #> [1] 1992 1992 1992

tlag/tlead a vector with respect to a number of periods, not with respect to the number of rows

year <- c(1989, 1991, 1992)
value <- c(4.1, 4.5, 3.3)
tlag(value, 1, time = year)
library(lubridate)
date <- mdy(c("01/04/1992", "03/15/1992", "04/03/1992"))
datem <- as.monthly(date)
value <- c(4.1, 4.5, 3.3)
tlag(value, time = datem)

In constrast to comparable functions in zoo and xts, these functions can be applied to any vector and be used within a dplyr chain:

df <- tibble(
    id    = c(1, 1, 1, 2, 2),
    year  = c(1989, 1991, 1992, 1991, 1992),
    value = c(4.1, 4.5, 3.3, 3.2, 5.2)
)
df %>% group_by(id) %>% mutate(value_l = tlag(value, time = year))

is.panel checks whether a dataset is a panel i.e. the time variable is never missing and the combinations (id, time) are unique.

df <- tibble(
    id1    = c(1, 1, 1, 2, 2),
    id2   = 1:5,
    year  = c(1991, 1993, NA, 1992, 1992),
    value = c(4.1, 4.5, 3.3, 3.2, 5.2)
)
df %>% group_by(id1) %>% is.panel(year)
df1 <- df %>% filter(!is.na(year))
df1 %>% is.panel(year)
df1 %>% group_by(id1) %>% is.panel(year)
df1 %>% group_by(id1, id2) %>% is.panel(year)

fill_gap transforms a unbalanced panel into a balanced panel. It corresponds to the stata command tsfill. Missing observations are added as rows with missing values.

df <- tibble(
    id    = c(1, 1, 1, 2),
    datem  = as.monthly(mdy(c("04/03/1992", "01/04/1992", "03/15/1992", "05/11/1992"))),
    value = c(4.1, 4.5, 3.3, 3.2)
)
df %>% group_by(id) %>% fill_gap(datem)
df %>% group_by(id) %>% fill_gap(datem, full = TRUE)
df %>% group_by(id) %>% fill_gap(datem, roll = "nearest")

Graph Functions

stat_binmean() (a stat for ggplot2) returns the mean of y and x within 20 bins of x. It's a barebone version of the Stata command binscatter

ggplot(iris, aes(x = Sepal.Width , y = Sepal.Length)) + stat_binmean()
# change number of bins
ggplot(iris, aes(x = Sepal.Width , y = Sepal.Length, color = Species)) + stat_binmean(n = 10) 
# add regression line
ggplot(iris, aes(x = Sepal.Width , y = Sepal.Length, color = Species)) + stat_binmean() + stat_smooth(method = "lm", se = FALSE)

Installation

You can install

The latest released version from CRAN with

R install.packages("statar") - The current version from github with

R devtools::install_github("matthieugomez/statar")

Any scripts or data that you put into this service are public.

statar documentation built on Aug. 8, 2025, 6:40 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

statar
Tools Inspired by 'Stata' to Manipulate Tabular Data

README.md
In statar: Tools Inspired by 'Stata' to Manipulate Tabular Data

statar

Data Frame Functions

sum_up = summarize

tab = tabulate

join = merge

Vector Functions

Panel Data Functions

Elapsed dates

lag / lead

is.panel

fill_gap

Graph Functions

stat_binmean

Installation

Try the statar package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

statar Tools Inspired by 'Stata' to Manipulate Tabular Data

README.md In statar: Tools Inspired by 'Stata' to Manipulate Tabular Data

statar

Data Frame Functions

sum_up = summarize

tab = tabulate

join = merge

Vector Functions

Panel Data Functions

Elapsed dates

lag / lead

is.panel

fill_gap

Graph Functions

stat_binmean

Installation

Try the statar package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

statar
Tools Inspired by 'Stata' to Manipulate Tabular Data

README.md
In statar: Tools Inspired by 'Stata' to Manipulate Tabular Data