nameage

Description

This package uses the U.S. Social Security Administration's baby names dataset and actuarial tables to estimate the age of an American based on their first name. It uses the datasets conveniently collected in the babynames package and follows the same general format as the gender package.

The most famous example of using names to estimate age probably comes from FiveThirtyEight.

Installation

To install from GitHub, use the following commands.

# install.packages("devtools")
devtools::install_github("andland/nameage")

Using the package

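The main function is nameage(), which takes a vector of names as its first argument. There are two additional arguments, both used in the examples below: base_year, the reference year at which ages are computed, and age_range, which restricts the estimate to a range of ages.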

The function returns a data frame with one row for each name it can find. Each row summarizes the age distribution (mean, standard deviation, first quartile, median, and third quartile) and gives the number of people born with the name (n) and an estimate of how many of them are still alive in the reference year (n_alive).
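To make the summary columns concrete, here is a minimal base R sketch of how such a summary could be derived from birth counts and survival probabilities. The numbers are invented for illustration, the helper weighted_quantile() is hypothetical, and this is not necessarily how nameage() computes its results internally.

```r
# Toy inputs, invented for illustration (not real SSA or actuarial figures).
birth_year <- c(1940, 1960, 1980, 2000, 2010)
n_born     <- c(1000, 500, 200, 800, 1500)      # babies given the name per year
p_alive    <- c(0.40, 0.80, 0.95, 0.99, 0.995)  # assumed share alive in base_year
base_year  <- 2015

age   <- base_year - birth_year
alive <- n_born * p_alive     # expected survivors per birth cohort
w     <- alive / sum(alive)   # weights of the age distribution

# Hypothetical helper: youngest age at which the cumulative weight reaches p.
weighted_quantile <- function(x, w, p) {
  ord <- order(x)
  x[ord][which(cumsum(w[ord]) >= p)[1]]
}

mean_age <- sum(w * age)
summary_row <- data.frame(
  n       = sum(n_born),
  n_alive = sum(alive),
  mean    = mean_age,
  sd      = sqrt(sum(w * (age - mean_age)^2)),
  q1      = weighted_quantile(age, w, 0.25),
  median  = weighted_quantile(age, w, 0.50),
  q3      = weighted_quantile(age, w, 0.75)
)
print(summary_row)
```

The point of the sketch is that n_alive is an expected count, which is why it is not a whole number in the output, and that the mean and quartiles describe the people estimated to be alive rather than everyone ever born with the name.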

To start off, we will estimate the ages for a few names as of 2015. The names argument is not case sensitive.

library(nameage)
names <- c("Ava", "liam", "Jack", "ELLA", "gertrude", "elmer", "Violet")

nameage(names, base_year = 2015)
#> # A tibble: 7 x 8
#>       name      n   n_alive      mean        sd    q1 median    q3
#>      <chr>  <int>     <dbl>     <dbl>     <dbl> <dbl>  <dbl> <dbl>
#> 1      Ava 218673 211933.04  8.565290 11.895632     3      6     9
#> 2     ELLA 281604 170996.91 23.702506 28.987314     4      9    47
#> 3    elmer 129647  34744.58 58.583760 24.460978    45     65    77
#> 4 gertrude 177359  28260.08 76.450368 14.851850    68     79    87
#> 5     Jack 670805 393056.09 38.155401 28.934937    10     35    65
#> 6     liam 155843 154631.51  6.160169  6.783567     2      4     9
#> 7   Violet 127793  59330.14 34.583680 35.197413     3     11    72

The average age of people with a given name depends on the reference year. In 1990, people named Violet were on average much older (mean age of about 59) than in 2015 (about 35).

nameage(names, base_year = 1990)
#> # A tibble: 7 x 8
#>       name      n    n_alive     mean        sd    q1 median    q3
#>      <chr>  <int>      <dbl>    <dbl>     <dbl> <dbl>  <dbl> <dbl>
#> 1      Ava  16333  13598.669 36.21455 20.246055    23     35    48
#> 2     ELLA 156965  92272.773 55.17945 18.151856    43     58    69
#> 3    elmer 124568  71598.627 55.00142 17.462330    45     60    68
#> 4 gertrude 177020  93380.556 65.12151 13.443510    59     68    74
#> 5     Jack 489076 362527.842 46.42406 18.163925    34     49    61
#> 6     liam   3108   3038.564 10.64685  9.108123     3      8    16
#> 7   Violet  94798  64561.901 59.03853 16.903045    52     63    70

To look only at working-age adults, restrict the ages with the age_range argument.

nameage(names, base_year = 2015, age_range = c(18, 65))
#> # A tibble: 7 x 8
#>       name      n    n_alive     mean        sd    q1 median    q3
#>      <chr>  <int>      <dbl>    <dbl>     <dbl> <dbl>  <dbl> <dbl>
#> 1      Ava  11473  10722.867 44.43247 15.521935    29     49    59
#> 2     ELLA  22364  20542.477 49.71888 14.131531    42     55    61
#> 3    elmer  16250  14255.575 47.78234 14.019756    38     52    60
#> 4 gertrude   5834   5244.547 56.11059  9.220711    53     59    63
#> 5     Jack 156823 138622.649 45.86359 15.170339    33     50    59
#> 6     liam   9808   9534.760 24.48821  8.921051    19     20    27
#> 7   Violet  10251   9515.762 46.84895 14.539712    34     51    59
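The effect of restricting the age range can be sketched in base R as a filter on birth cohorts before averaging. The cohort sizes are invented, and treating both ends of the range as inclusive is an assumption, not something confirmed for nameage().

```r
# Hypothetical cohorts: current age and expected number still alive.
age       <- c(5, 15, 35, 55, 75)
alive     <- c(1492.5, 792, 190, 400, 400)
age_range <- c(18, 65)

# Keep only cohorts inside the range (inclusive bounds are an assumption).
keep <- age >= age_range[1] & age <= age_range[2]

mean_all      <- weighted.mean(age, alive)
mean_in_range <- weighted.mean(age[keep], alive[keep])
alive_in_range <- sum(alive[keep])

print(c(mean_all = mean_all, mean_in_range = mean_in_range,
        alive_in_range = alive_in_range))
```

Dropping the cohorts outside the range shrinks the counts as well as shifting the mean, which matches the much smaller n and n_alive values in the output above.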

The package also includes a function to plot the distribution of ages for each name. Besides a few arguments that control the plot's appearance, it takes a type parameter that determines whether to plot by age...

plot_nameage(c("Joseph", "Anna"), type = "age")

or by year.

plot_nameage(c("Joseph", "Anna"), type = "year")

To do



andland/nameage documentation built on May 7, 2019, 8:57 p.m.