Lahman-package: Sean Lahman's Baseball Database

Lahman-package R Documentation

Sean Lahman's Baseball Database

Description

This database contains pitching, hitting, and fielding statistics for Major League Baseball from 1871 through 2023. It includes data from the two current leagues (American and National), the four other "major" leagues (American Association, Union Association, Players League, and Federal League), and the National Association of 1871-1875.

This database was created by Sean Lahman, who pioneered the effort to make baseball statistics freely available to the general public. What started as a one-man effort in 1994 has grown tremendously, and now a team of researchers has combined their efforts to make this the largest and most accurate source for baseball statistics available anywhere.

This database, in the form of an R package, offers a variety of interesting challenges and opportunities for data processing and visualization in R.

In the current version, the examples make extensive use of the dplyr package for data manipulation (tabulation, queries, summaries, merging, etc.), reflecting the original relational database design, and of the ggplot2 package for graphics.
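
As a minimal sketch of that style (not one of the package's own examples), the following totals home runs by season with dplyr and plots the result with ggplot2. It assumes the Lahman, dplyr, and ggplot2 packages are installed; the object names hr_by_year and p are illustrative only.

```r
library(Lahman)
library(dplyr)
library(ggplot2)

# Tabulate: one row per season with the league-wide home run total
hr_by_year <- Batting %>%
  group_by(yearID) %>%
  summarise(HR = sum(HR, na.rm = TRUE))

# Plot the yearly totals as a line
p <- ggplot(hr_by_year, aes(x = yearID, y = HR)) +
  geom_line() +
  labs(x = "Season", y = "Total home runs")
print(p)
```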

Details

Package: Lahman
Type: Package
Version: 12.0-0
Date: 2024-08-24
License: GPL version 2 or newer
LazyLoad: yes
LazyData: yes

The main form of this database is a relational database in Microsoft Access format. The design follows these general principles: Each player is assigned a unique code (playerID). All of the information in different tables relating to that player is tagged with his playerID. The playerIDs are linked to names and birthdates in the People table. Similar links exist among other tables via analogous *ID variables.
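
The linking scheme can be illustrated with a short sketch in base R, assuming the Lahman package is installed. It sums home runs per playerID in Batting and then attaches names from People through the shared key; career_hr is an illustrative name, not part of the package.

```r
library(Lahman)

# Career home runs per player, keyed by playerID
career_hr <- aggregate(HR ~ playerID, data = Batting, FUN = sum)

# Attach player names from People via the shared playerID key
career_hr <- merge(career_hr,
                   People[, c("playerID", "nameFirst", "nameLast")],
                   by = "playerID")

# Show the ten highest career totals
head(career_hr[order(-career_hr$HR), ], 10)
```

The same pattern should apply to the other tables that carry a playerID column, and to team-level tables through their own *ID variables.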

The database is composed of the following main tables:

People

Player names, dates of birth and death, and other biographical information

Batting

batting statistics

Pitching

pitching statistics

Fielding

fielding statistics

A collection of other tables is also provided:

Teams:

Teams - yearly stats and standings
TeamsHalf - split season data for teams
TeamsFranchises - franchise information

Post-season play:

BattingPost - post-season batting statistics
PitchingPost - post-season pitching statistics
FieldingPost - post-season fielding data
SeriesPost - post-season series information

Awards:

AwardsManagers - awards won by managers
AwardsPlayers - awards won by players
AwardsShareManagers - award voting for manager awards
AwardsSharePlayers - award voting for player awards

Hall of Fame: links to People via playerID

HallOfFame - Hall of Fame voting data

Other tables:

AllstarFull - All-Star game appearances
Managers - managerial statistics
FieldingOF - outfield position data
ManagersHalf - split season data for managers
Salaries - player salary data
Appearances - data on player appearances
Schools - information on schools players attended
CollegePlaying - information on schools players attended, by player and year

Variable label tables are provided for some of the tables:

battingLabels, pitchingLabels, fieldingLabels
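
A label table can serve as a lookup from variable name to description. The sketch below assumes that battingLabels has columns named variable and label, which is worth confirming with str(battingLabels) before relying on it; batting_lookup is an illustrative name.

```r
library(Lahman)

# Named character vector: variable name -> descriptive label
# (assumes battingLabels has columns `variable` and `label`)
batting_lookup <- setNames(as.character(battingLabels$label),
                           as.character(battingLabels$variable))

# Look up the descriptions of a few Batting columns
batting_lookup[c("AB", "H", "HR")]
```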

Author(s)

Michael Friendly, Dennis Murphy, Chris Dalzell, Martin Monkman

Maintainer: Chris Dalzell <cdalzell@gmail.com>

Source

Lahman, S. (2024) Lahman's Baseball Database, 1871-2023, Main page, http://www.seanlahman.com/


Lahman documentation built on Sept. 27, 2024, 1:06 a.m.