banks07: U.S. Commercial Banks Data

banks07 R Documentation

U.S. Commercial Banks Data

Description

banks07 is a data frame containing selected variables for 500 U.S. commercial banks, randomly sampled from approximately 5000 banks in the dataset of Koetter et al. (2012), for the year 2007. The dataset is provided solely for illustrative and pedagogical purposes and is not suitable for empirical research.

Usage

data(banks07)

Format

A data frame with the following variables:

year

Year (2007).

id

Entity (bank) identifier.

TA

Gross total assets.

LLP

Loan loss provisions.

Y1

Total securities (thousands of USD).

Y2

Total loans and leases (thousands of USD).

W1

Cost of fixed assets divided by the cost of borrowed funds.

W2

Cost of labor (thousands of USD) divided by the cost of borrowed funds.

W3

Price of financial capital.

ER

Equity-to-assets ratio (gross).

TC

Total operating cost.

LA

Ratio of total loans and leases to gross total assets.

SDROA

Standard deviation of return on assets.

ZSCORE

Z-score risk measure.

ZSCORE3

Alternative Z-score risk measure.

lnsdroa

Natural logarithm of SDROA.

lnzscore

Natural logarithm of ZSCORE.

lnzscore3

Natural logarithm of ZSCORE3.

ms_county

Market share in county.

scope

Scope measure.
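Several variables in the list are simple transformations of others: LA is loans and leases over gross total assets, ER is equity over gross total assets, and the ln* variables are natural logarithms of the corresponding risk measures. The sketch below reproduces these relations on a tiny hypothetical panel. The columns EQ (equity) and ROA (return on assets) and all numbers are invented for illustration. The Z-score uses a common textbook definition, (mean ROA + equity ratio) / SD(ROA); this is an assumption, and ZSCORE and ZSCORE3 in banks07 may have been constructed differently.

```r
# Hypothetical toy panel: 2 banks x 3 years (invented values)
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  year = c(2005, 2006, 2007, 2005, 2006, 2007),
  TA   = c(200, 210, 220, 400, 420, 440),   # gross total assets
  Y2   = c(120, 130, 132, 200, 210, 242),   # total loans and leases
  EQ   = c( 20,  21,  22,  40,  42,  44),   # equity (hypothetical column)
  ROA  = c(0.010, 0.012, 0.008, 0.015, 0.009, 0.012)  # return on assets (hypothetical)
)

# Ratios as described under Format
toy$LA <- toy$Y2 / toy$TA   # loans-to-assets ratio
toy$ER <- toy$EQ / toy$TA   # equity-to-assets ratio

# Per-bank standard deviation and mean of ROA over the available years
toy$SDROA   <- ave(toy$ROA, toy$id, FUN = sd)
toy$meanROA <- ave(toy$ROA, toy$id, FUN = mean)

# A common Z-score definition (assumption; may differ from ZSCORE/ZSCORE3)
toy$ZSCORE <- (toy$meanROA + toy$ER) / toy$SDROA

# Log transforms, analogous to lnsdroa and lnzscore
toy$lnsdroa  <- log(toy$SDROA)
toy$lnzscore <- log(toy$ZSCORE)

print(toy[toy$year == 2007, c("id", "LA", "ER", "SDROA", "ZSCORE")])
```

A higher Z-score means more equity and earnings buffer per unit of earnings volatility, i.e. a lower estimated risk of insolvency; taking logs reduces the strong right skew these measures typically have.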

Details

U.S. Commercial Banks Data (2007)

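The sampling and variable-construction steps are shown in the section Examples; note that the example code retains only a subset of the variables listed under Format (W3, the risk measures, ms_county and scope are not carried over there). The dataset is intended to illustrate the functions of this package, e.g., stochastic frontier models with skew-normal noise.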
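To make the intended use more concrete, the following sketch simulates data from a stochastic cost frontier whose noise term v is skew-normal and whose inefficiency term u is half-normal. It uses only base R and Azzalini's stochastic representation of the skew-normal distribution; the Cobb-Douglas form, the parameter values and the helper rsn_std() are illustrative assumptions, and no snreg function is called (see the package's own help pages for its estimation interface).

```r
set.seed(123)
n <- 2000

# Illustrative parameters (assumptions, not estimates from banks07)
beta    <- c(1.0, 0.4, 0.5)  # intercept, elasticities w.r.t. ln Y1 and ln Y2
omega   <- 0.10              # scale of the noise term v
delta   <- 0.8               # in (-1, 1); slant alpha = delta / sqrt(1 - delta^2)
sigma_u <- 0.20              # scale of the half-normal inefficiency term u

# Hypothetical helper: standard skew-normal draws via
# X = delta * |Z0| + sqrt(1 - delta^2) * Z1, with Z0, Z1 iid N(0, 1)
rsn_std <- function(n, delta) {
  z0 <- rnorm(n)
  z1 <- rnorm(n)
  delta * abs(z0) + sqrt(1 - delta^2) * z1
}

lnY1 <- rnorm(n, mean = 10, sd = 1)
lnY2 <- rnorm(n, mean = 11, sd = 1)
v <- omega * rsn_std(n, delta)     # skew-normal noise (right-skewed for delta > 0)
u <- abs(rnorm(n, sd = sigma_u))   # half-normal inefficiency, u >= 0

# Cost frontier: inefficiency raises cost, so u enters with a plus sign
lnTC <- beta[1] + beta[2] * lnY1 + beta[3] * lnY2 + v + u

# Sample mean of v versus its theoretical mean omega * delta * sqrt(2 / pi)
c(sample = mean(v), theory = omega * delta * sqrt(2 / pi))
```

Because v is no longer symmetric, part of the right skew of the composed error v + u comes from the noise itself; this is the situation in which a symmetric-noise frontier could misattribute noise to inefficiency.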

Source

http://qed.econ.queensu.ca/jae/2014-v29.2/restrepo-tobon-kumbhakar/

References

Koetter, M., Kolari, J., & Spierdijk, L. (2012). Enjoying the quiet life under deregulation? Evidence from adjusted Lerner indices for U.S. banks. Review of Economics and Statistics, 94(2), 462–480.

Restrepo-Tobon, D., & Kumbhakar, S. (2014). Enjoying the quiet life under deregulation? Not quite. Journal of Applied Econometrics, 29(2), 333–343.

Examples

## ------------------------------------------------------------------
## Construct sample panel dataset (banks00_07)
## ------------------------------------------------------------------

# Download the file 2b_QLH.txt from the link in "Source" into the working directory
banks00_07 <- read.delim("2b_QLH.txt")

# rename 'entity' to 'id'
colnames(banks00_07)[colnames(banks00_07) == "entity"] <- "id"

# keep only years 2000–2007
banks00_07 <- banks00_07[
  banks00_07$year >= 2000 & banks00_07$year <= 2007, ]

# restrict sample to interquartile range of total assets
q1q3 <- quantile(banks00_07$TA, probs = c(.25, .75))
banks00_07 <- banks00_07[
  banks00_07$TA >= q1q3[1] & banks00_07$TA <= q1q3[2], ]

# generate required variables
banks00_07$TC <- banks00_07$TOC                  # Total operating cost
banks00_07$ER <- banks00_07$Z  / banks00_07$TA   # Equity ratio (Z = equity)
banks00_07$LA <- banks00_07$Y2 / banks00_07$TA   # Loans-to-assets ratio

# keep only needed variables (Ti is computed in the next step)
keep.vars <- c("id", "year", "TC", "Y1", "Y2", "W1", "W2",
               "ER", "LA", "TA", "LLP")
banks00_07 <- banks00_07[, colnames(banks00_07) %in% keep.vars]

# number of periods per id (ave() aligns each count with its own rows,
# so the data need not be sorted by id at this point)
banks00_07$Ti <- ave(banks00_07$year, banks00_07$id, FUN = length)

# keep if Ti > 4
banks00_07 <- banks00_07[banks00_07$Ti > 4, ]

# complete observations only
banks00_07 <- banks00_07[complete.cases(banks00_07), ]

# sample 500 banks at random
set.seed(816376586)
id_names <- unique(banks00_07$id)
ids2choose <- sample(id_names, 500)
banks00_07 <- banks00_07[banks00_07$id %in% ids2choose, ]

# recompute Ti (sampling keeps whole banks, so this is a safeguard)
banks00_07$Ti <- ave(banks00_07$year, banks00_07$id, FUN = length)
banks00_07 <- banks00_07[banks00_07$Ti > 4, ]

# sort
banks00_07 <- banks00_07[order(banks00_07$id, banks00_07$year), ]

# extract the 2007 cross-section
banks07 <- banks00_07[banks00_07$year == 2007, ]
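The panel-construction logic above (counting periods per bank, dropping banks with 4 or fewer periods, sampling whole banks, and taking the 2007 cross-section) can be checked on a small synthetic panel. This sketch counts periods per id with ave(), which assigns each row the size of its own id group regardless of row order; the panel below is invented and deliberately unsorted, and sampling 2 banks stands in for sampling 500.

```r
set.seed(1)

# Hypothetical unbalanced panel, deliberately not sorted by id
panel <- data.frame(
  id   = c(3, 1, 2, 1, 3, 2, 1, 3, 1, 3, 3, 1, 2),
  year = c(2003, 2000, 2006, 2001, 2004, 2007, 2002,
           2005, 2003, 2006, 2007, 2007, 2005)
)

# Number of periods per id, aligned with the rows whatever their order
panel$Ti <- ave(panel$year, panel$id, FUN = length)

# Keep banks observed in more than 4 periods (drops id 2, which has 3)
panel <- panel[panel$Ti > 4, ]

# Sample whole banks (here 2 of the remaining ids), keeping all their years
ids2choose <- sample(unique(panel$id), 2)
panel <- panel[panel$id %in% ids2choose, ]

# Sort by id and year, then extract the 2007 cross-section
panel <- panel[order(panel$id, panel$year), ]
cross07 <- panel[panel$year == 2007, ]
print(cross07)
```

Sampling ids rather than rows keeps each selected bank's full time series, which is why the per-bank counts are unchanged after sampling.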

snreg documentation built on Feb. 6, 2026, 5:08 p.m.