knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

When I design longer_dt and wider_dt, I could find the pivot_longer and pivot_wider in tidyr and melt and dcast in data.table. Still, designing this API is not easy, as my goal is to let users use it with least pain. Here we would try to reproduce the results in the vignette of tidyr(https://tidyr.tidyverse.org/articles/pivot.html). First load the packages:

library(tidyfst)
library(tidyr)

Longer

First inspect the data:

relig_income

In tidyr, to get the longer format you need:

relig_income %>% 
  pivot_longer(-religion, names_to = "income", values_to = "count")

In tidyfst, we have:

relig_income %>% 
  longer_dt("religion",name = "income",value = "count")

Another example from tidyr:

billboard

# tidyr way:
 billboard %>%
   pivot_longer(
     cols = starts_with("wk"),
     names_to = "week",
     values_to = "rank",
     values_drop_na = TRUE
   )

# tidyfst way:
billboard %>% 
  longer_dt(-"wk",
            name = "week",
            value = "rank",
            na.rm = TRUE
            )
# regex could select the groups to keep, and minus could select the reverse

A warning would could come out because the merging column has different data types and do the coercion automatically.

Wider

## data
fish_encounters

## tidyr way:
fish_encounters %>% 
  pivot_wider(names_from = station, values_from = seen)

## tidyfst way:
fish_encounters %>% 
  wider_dt(name = "station",value = "seen")

# if no keeped groups are selected, use all except for name and value columns

If you want to fill with 0s, use:

fish_encounters %>% 
  wider_dt(name = "station",value = "seen",fill = 0)

Note that the parameter of name and value should always be provided and should be explicit called (with the parameter names attached).

More complicated example

This example comes from data.table (https://rdatatable.gitlab.io/data.table/articles/datatable-reshape.html), and has been used in tidyr too. We'll try to do it in tidyfst in this example. If we have a data.frame as below:

family <- fread("family_id age_mother dob_child1 dob_child2 dob_child3 gender_child1 gender_child2 gender_child3
1         30 1998-11-26 2000-01-29         NA             1             2            NA
2         27 1996-06-22         NA         NA             2            NA            NA
3         26 2002-07-11 2004-04-05 2007-09-02             2             2             1
4         32 2004-10-10 2009-08-27 2012-07-21             1             1             1
5         29 2000-12-05 2005-02-28         NA             2             1            NA")

family

And want to reshape the data.table to be like this:

#     family_id age_mother  child        dob gender
#         <int>      <int> <char>     <char> <char>
#  1:         1         30 child1 1998-11-26      1
#  2:         1         30 child2 2000-01-29      2
#  3:         1         30 child3       <NA>   <NA>
#  4:         2         27 child1 1996-06-22      2
#  5:         2         27 child2       <NA>   <NA>
#  6:         2         27 child3       <NA>   <NA>
#  7:         3         26 child1 2002-07-11      2
#  8:         3         26 child2 2004-04-05      2
#  9:         3         26 child3 2007-09-02      1
# 10:         4         32 child1 2004-10-10      1
# 11:         4         32 child2 2009-08-27      1
# 12:         4         32 child3 2012-07-21      1
# 13:         5         29 child1 2000-12-05      2
# 14:         5         29 child2 2005-02-28      1
# 15:         5         29 child3       <NA>   <NA>

The data.table::dcast and tidyr::pivot_longer could transfer it in one step, however, not so easy to understand. Here we'll do it step by step to see what actually happens in this transfer.

family %>% 
  longer_dt(1:2) %>% 
  separate_dt("name",into = c("class","child")) %>% 
  wider_dt(-"class|value",
           name = "class",
           value = "value")

In such a process, we could find that we actually get a longer table, then separate it, and wider it later. tidyfst is not going to support the complicated transfer in one step, because it might be easier to implement, but much harder to understand 3 procedures in 1 step. If you still prefer that way, use data.table::dcast and tidyr::pivot_longer instead.



hope-data-science/tidyfst documentation built on Sept. 23, 2024, 8:05 p.m.