nplyr

Author: Mark Rieke
License: MIT

Lifecycle: experimental

Overview

{nplyr} is a grammar of nested data manipulation that lets users perform dplyr-like manipulations on data frames nested within a list-column of another data frame. Most dplyr verbs have nested equivalents in nplyr. For example, nest_filter(), nest_mutate(), nest_group_by(), and nest_summarise() are the nested equivalents of filter(), mutate(), group_by(), and summarise().

As of version 0.2.0, nplyr also supports nested versions of some tidyr functions.

nplyr is largely a wrapper for dplyr. For the most up-to-date information on dplyr, please visit dplyr’s website. If you are new to dplyr, the best place to start is the data transformation chapter in R for Data Science.
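
To make the wrapper relationship concrete, the sketch below compares a hand-written purrr::map() over a list-column with the corresponding nplyr call. It uses the built-in iris data. The column names measurements and Sepal.Area are illustrative, and the comparison is conceptual: it is not a statement about how nplyr is implemented internally.

```r
library(dplyr)
library(purrr)
library(tidyr)

# nest the built-in iris data so each species gets its own data frame
iris_nest <- iris %>% nest(measurements = -Species)

# by hand: map a dplyr verb over every nested data frame
by_hand <- iris_nest %>%
  mutate(
    measurements = map(
      measurements,
      ~ mutate(.x, Sepal.Area = Sepal.Length * Sepal.Width)
    )
  )

# with nplyr: name the list-column, then write the verb as usual
with_nplyr <- iris_nest %>%
  nplyr::nest_mutate(measurements, Sepal.Area = Sepal.Length * Sepal.Width)
```

Both results should contain the new Sepal.Area column inside each nested data frame. The nplyr version avoids the explicit map() call and the anonymous function.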

Installation

You can install the released version of nplyr from CRAN, or the development version from GitHub with the devtools or remotes package:

# install from CRAN
install.packages("nplyr")

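# install the development version from GitHub with devtools
devtools::install_github("markjrieke/nplyr")

# or with remotes
remotes::install_github("markjrieke/nplyr")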

Usage

To get started, we’ll create a nested column for the country data within each continent from the gapminder dataset.

library(nplyr)

gm_nest <- 
  gapminder::gapminder_unfiltered %>%
  tidyr::nest(country_data = -continent)

gm_nest
#> # A tibble: 6 × 2
#>   continent country_data        
#>   <fct>     <list>              
#> 1 Asia      <tibble [578 × 5]>  
#> 2 Europe    <tibble [1,302 × 5]>
#> 3 Africa    <tibble [637 × 5]>  
#> 4 Americas  <tibble [470 × 5]>  
#> 5 FSU       <tibble [139 × 5]>  
#> 6 Oceania   <tibble [187 × 5]>
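
Each element of the list-column is an ordinary tibble, so it can be inspected without unnesting. The following is a minimal sketch that assumes the gapminder package is installed; the object name asia is illustrative.

```r
library(dplyr)

# rebuild the nested data frame from the example above
gm_nest <-
  gapminder::gapminder_unfiltered %>%
  tidyr::nest(country_data = -continent)

# pull out the first nested tibble (Asia in the printed output above)
asia <- gm_nest$country_data[[1]]
head(asia)

# count the rows of every nested tibble without unnesting
purrr::map_int(gm_nest$country_data, nrow)
```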

dplyr can perform operations on the top-level data frame, but with nplyr, we can perform operations on the nested data frames:

gm_nest_example <- 
  gm_nest %>%
  nest_filter(country_data, year == max(year)) %>%
  nest_mutate(country_data, pop_millions = pop/1000000)

# each nested tibble is now filtered to the most recent year
gm_nest_example
#> # A tibble: 6 × 2
#>   continent country_data     
#>   <fct>     <list>           
#> 1 Asia      <tibble [43 × 6]>
#> 2 Europe    <tibble [34 × 6]>
#> 3 Africa    <tibble [53 × 6]>
#> 4 Americas  <tibble [33 × 6]>
#> 5 FSU       <tibble [9 × 6]> 
#> 6 Oceania   <tibble [11 × 6]>

# if we unnest, we can see that a new column for pop_millions has been added
gm_nest_example %>%
  slice_head(n = 1) %>%
  tidyr::unnest(country_data)
#> # A tibble: 43 × 7
#>    continent country           year lifeExp        pop gdpPercap pop_millions
#>    <fct>     <fct>            <int>   <dbl>      <int>     <dbl>        <dbl>
#>  1 Asia      Afghanistan       2007    43.8   31889923      975.       31.9  
#>  2 Asia      Azerbaijan        2007    67.5    8017309     7709.        8.02 
#>  3 Asia      Bahrain           2007    75.6     708573    29796.        0.709
#>  4 Asia      Bangladesh        2007    64.1  150448339     1391.      150.   
#>  5 Asia      Bhutan            2007    65.6    2327849     4745.        2.33 
#>  6 Asia      Brunei            2007    77.1     386511    48015.        0.387
#>  7 Asia      Cambodia          2007    59.7   14131858     1714.       14.1  
#>  8 Asia      China             2007    73.0 1318683096     4959.     1319.   
#>  9 Asia      Hong Kong, China  2007    82.2    6980412    39725.        6.98 
#> 10 Asia      India             2007    64.7 1110396331     2452.     1110.   
#> # … with 33 more rows
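
Note that an expression such as year == max(year) is evaluated separately inside each nested data frame, so max(year) is the latest year within that group rather than across the whole dataset. The toy data below is made up for illustration; the names toy, data, and value are not part of nplyr.

```r
library(dplyr)
library(nplyr)

# two groups whose latest years differ (illustrative values)
toy <-
  tibble::tibble(
    group = c("a", "a", "b", "b"),
    year  = c(2000, 2005, 1990, 1995),
    value = 1:4
  ) %>%
  tidyr::nest(data = -group)

# keep the latest year within each nested data frame
toy_latest <- toy %>% nest_filter(data, year == max(year))

# group "a" keeps 2005 and group "b" keeps 1995
tidyr::unnest(toy_latest, data)
```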

nplyr also supports grouped operations with nest_group_by():

gm_nest_example <- 
  gm_nest %>%
  nest_group_by(country_data, year) %>%
  nest_summarise(
    country_data, 
    n = n(),
    lifeExp = median(lifeExp),
    pop = median(pop),
    gdpPercap = median(gdpPercap)
  )

gm_nest_example
#> # A tibble: 6 × 2
#>   continent country_data     
#>   <fct>     <list>           
#> 1 Asia      <tibble [58 × 5]>
#> 2 Europe    <tibble [58 × 5]>
#> 3 Africa    <tibble [13 × 5]>
#> 4 Americas  <tibble [57 × 5]>
#> 5 FSU       <tibble [44 × 5]>
#> 6 Oceania   <tibble [56 × 5]>

# unnesting shows summarised tibbles for each continent
gm_nest_example %>%
  slice(2) %>%
  tidyr::unnest(country_data)
#> # A tibble: 58 × 6
#>    continent  year     n lifeExp      pop gdpPercap
#>    <fct>     <int> <int>   <dbl>    <dbl>     <dbl>
#>  1 Europe     1950    22    65.8 7408264      6343.
#>  2 Europe     1951    18    65.7 7165515      6509.
#>  3 Europe     1952    31    65.9 7124673      5210.
#>  4 Europe     1953    17    67.3 7346100      6774.
#>  5 Europe     1954    17    68.0 7423300      7046.
#>  6 Europe     1955    17    68.5 7499400      7817.
#>  7 Europe     1956    17    68.5 7575800      8224.
#>  8 Europe     1957    31    67.5 7363802      6093.
#>  9 Europe     1958    18    69.6 8308052.     8833.
#> 10 Europe     1959    18    69.6 8379664.     9088.
#> # … with 48 more rows
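
The same grouped pattern can be tried on a smaller built-in dataset. This sketch uses mtcars in place of gapminder; the column name car_data is illustrative, and dplyr is attached explicitly so that the pipe and n() are available.

```r
library(dplyr)
library(nplyr)

# nest mtcars by number of cylinders
mt_nest <- mtcars %>% tidyr::nest(car_data = -cyl)

# within each nested data frame, group by gear and summarise
mt_summary <-
  mt_nest %>%
  nest_group_by(car_data, gear) %>%
  nest_summarise(
    car_data,
    n = n(),
    mpg = median(mpg)
  )

# each nested tibble now has one row per gear value seen for that cyl
tidyr::unnest(mt_summary, car_data)
```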

More examples can be found in the package vignettes and function documentation.

Bug reports/feature requests

If you notice a bug, want to request a new feature, or have recommendations on improving documentation, please open an issue in the package repository.




nplyr documentation built on Feb. 16, 2023, 7:24 p.m.