Introduction to 'dataMojo
In dataMojo: Reshape Data Table

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Built on the top of 'data.table', 'dataMojo' is a grammar of data manipulation with 'data.table', providing a consistent a series of utility functions that help you solve the most common data manipulation challenges:

Calculate the row wise percentage
Calculate the survey type percentage table
Select columns
Split one column to multiple columns based on patterns
Filter cases based on their values
Fill missing values
Summarize and reduces multiple values down to a single summary
Reshape long to wide or wide to long

Calculate the row wise percentage

Calculate the row wise percentage of a frequency table

library(dataMojo)
library(data.table)
test_df <- data.frame(
      Group = c("A", "B", "C"),
      Female = c(2,3,5),
      Male = c(10,11, 13)
    )
print(test_df)
dataMojo::row_percent_convert(test_df, cols_rowsum = c("Female", "Male"))

Calculate the survey type percentage table for single question

library(dataMojo)
library(data.table)
   test_dt <- data.table::data.table(
      Question = c(rep("Good", 3), rep("OK", 3), rep("Bad", 3)),
      Gender = c(rep("F", 4), rep("M", 5))
    )
   print(test_dt)
   dataMojo::pivot_percent_at(test_dt, 
                                 question_col = "Question", aggregated_by_cols = "Gender")

Calcuate the survey type percentage table for multiple question

library(dataMojo)
library(data.table)
test_dt <- data.table(
      Question1 = c(rep("Good", 3), rep("OK", 3), rep("Bad", 3)),
      Question2 = c(rep("Good", 2), rep("OK", 2), rep("Bad", 5)),
      Gender = c(rep("F", 4), rep("M", 5))
    )
print(test_dt)
dataMojo::pivot_percent_at_multi(test_dt, 
                                    question_col = c("Question1","Question2") , aggregated_by_cols = "Gender")

Calculate the column wise percentage with desired numerator and denominator

This function is to calculate column-wise percentage in a new column with desired numerator columns and denominator columns. If denominator is 0, the percentage will be N/A.

library(dataMojo)
test_df <- data.frame(
      hc1 = c(2, 0, 1, 5, 6, 7, 10),
      hc2 = c(1, 0, 10, 12, 4, 1, 9 ),
      total = c(10, 2, 0, 39, 23, 27, 30)
    )
print(test_df)
dataMojo::col_cal_percent(test_df, 
                          new_col_name = "hc_percentage", 
                          numerator_cols = c("hc1", "hc2"), 
                          denominator_cols = "total"
                          )

Select columns

Select variables in a data table. You can also use predicate functions like is.numeric to select variables based on their properties (e.g. 1:3 selects the first column to the third column).

library(dataMojo)
library(data.table)
data("dt_dates")
dt_dates <- setDT(dt_dates)
dataMojo::select_cols(dt_dates, c("Start_Date", "Full_name"))

Split a column

Split a column with its special pattern, and assign to multiple columns respectively. For example, split full name column to first name and last name column.

data("dt_dates")
library(data.table)
data("dt_dates")
dataMojo::str_split_col(dt_dates,
              by_col = "Full_name",
              by_pattern = ", ",
              match_to_names = c("First Name", "Last Name"))

Filter cases based on values

filter_all() is to return a data table with ALL columns (greater than/ less than/ equal to) a desired value.

data("dt_values")
dataMojo::filter_all(dt_values, operator = "l", .2)

filter_any() is to return a data table with ANY columns (greater than/ less than/ equal to) a desired value.

data("dt_values")
dataMojo::filter_any(dt_values, operator = "l", .1)

Similarly, filter_all_at() is to return a data table with ALL selected columns (greater than/ less than/ equal to) a desired value.

data("dt_values")
dataMojo::filter_all_at(dt_values, operator = "l", .1, c("A1", "A2"))

Similarly, filter_any_at() is to return a data table with ANY selected columns (greater than/ less than/ equal to) a desired value.

data("dt_values")
dataMojo::filter_any_at(dt_values, operator = "l", .1, c("A1", "A2"))

Fill missing values

fill_NA_with() will fill NA value with a desired value in the selected columns. If fill_cols is All (same columns type), it will apply to the whole data table.

data("dt_missing")
dataMojo::fill_NA_with(dt_missing, fill_cols = c("Full_name"), fill_value = "pending")

Group by and summarize

dt_group_by() is to group by desired columns and summarize rows within groups.

data("dt_groups")
print(head(dt_groups))

Now we see the dt_groups data table has A1, A2 as numeric columns, and group1, group2 as group infomation.

data("dt_groups")
dataMojo::dt_group_by(dt_groups, 
            group_by_cols = c("group1", "group2"), 
            summarize_at = "A1", 
            operation = "mean")

Now we want to group by group1 and group2, then fetch the first within each group, we can use get_row_group_by() function.

data("dt_groups")
dataMojo::get_row_group_by(dt_groups, 
                 group_by_cols = c("group1", "group2"), 
                 fetch_row = "first")

or last row with same example.

data("dt_groups")
dataMojo::get_row_group_by(dt_groups, 
                 group_by_cols = c("group1", "group2"), 
                 fetch_row = "last")

Reshape `long to wide` or `wide to long`

Here is an example of reshaping a data table from wide to long.

data("dt_dates")
print(head(dt_dates))
dataMojo::reshape_longer(dt_dates, 
               keep_cols = "Full_name", 
               label_cols = c("Date_Type"), 
               value_cols = "Exact_date")

Here is an example of reshaping a data table from long to wide.

data("dt_long")
print(head(dt_long))
dataMojo::reshape_wider(dt_long, 
              keep_cols = c("Full_name"), 
              col_label = c("Date_Type"), 
              col_value = "Exact_date")

Advanced Topic: expand row based on pattern

row_expand_pattern() is to expand rows based on a desired column.

data("starwars_simple")
starwars_simple[]
row_expand_pattern(starwars_simple, "films", ", ", "film")[]

Advanced Topic: expand row given start and end dates

row_expand_dates() is to expand rows to each date given start and end dates.

dt_dates_simple <- data.table(
  Start_Date = as.Date(c("2020-02-03", "2020-03-01") ),
  End_Date = as.Date(c("2020-02-05", "2020-03-02") ),
  group = c("A", "B")
)
dt_dates_simple[]
row_expand_dates(dt_dates_simple, "Start_Date", "End_Date", "Date")[]

Any scripts or data that you put into this service are public.

dataMojo documentation built on March 7, 2023, 8:10 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

dataMojo
Reshape Data Table

Introduction to 'dataMojo
In dataMojo: Reshape Data Table

Calculate the row wise percentage

Calculate the survey type percentage table for single question

Calcuate the survey type percentage table for multiple question

Calculate the column wise percentage with desired numerator and denominator

Select columns

Split a column

Filter cases based on values

Fill missing values

Group by and summarize

Reshape `long to wide` or `wide to long`

Advanced Topic: expand row based on pattern

Advanced Topic: expand row given start and end dates

Try the dataMojo package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

dataMojo Reshape Data Table

Introduction to 'dataMojo In dataMojo: Reshape Data Table

Calculate the row wise percentage

Calculate the survey type percentage table for single question

Calcuate the survey type percentage table for multiple question

Calculate the column wise percentage with desired numerator and denominator

Select columns

Split a column

Filter cases based on values

Fill missing values

Group by and summarize

Reshape long to wide or wide to long

Advanced Topic: expand row based on pattern

Advanced Topic: expand row given start and end dates

Try the dataMojo package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

dataMojo
Reshape Data Table

Introduction to 'dataMojo
In dataMojo: Reshape Data Table

Reshape `long to wide` or `wide to long`