df_merge_rows: Merge rows in data.frame.

df_merge_rows    R Documentation

Merge rows in data.frame.

Description

Find blocks of rows with a matching key or rowname, and merge them using a given function.

Usage

df_merge_rows(
  data,
  key = NULL,
  names = NULL,
  new_name = NULL,
  func = purrr::partial(sum, na.rm = T),
  numeric = T,
  ...
)

Arguments

data

(data.frame or matrix) The data object.

key

(character scalar) The name of the key variable, i.e. the variable to merge rows by. If set to ".rownames", the rownames are used as the key.

names

(character vector) The rownames to merge.

new_name

(character scalar) The rowname to use for the merged row. Defaults to the first element of names.

func

(function) The function used to combine the values of the merged rows. Note that if you set numeric = FALSE, the function must be able to handle non-numeric data.

numeric

(logical scalar) Whether to apply the function only to the numeric columns.

...

Other parameters passed to func.
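To make the roles of names, new_name and func more concrete, here is a minimal base-R sketch of merging rows selected by rowname. It is not the package's implementation: the helper merge_named_rows() is hypothetical, it assumes every column is numeric, and it places the merged row at the position of the first matched row, which may differ from what df_merge_rows() does.

```r
# Hypothetical helper for illustration only; NOT df_merge_rows() itself.
# Assumes every column of `data` is numeric.
merge_named_rows <- function(data, names, new_name = names[1],
                             func = function(x) sum(x, na.rm = TRUE), ...) {
  idx <- which(rownames(data) %in% names)
  if (length(idx) == 0) stop("none of `names` were found in rownames(data)")
  # Collapse the selected rows column by column; extra arguments go to func
  merged <- vapply(data[idx, , drop = FALSE], func, numeric(1), ...)
  # Keep the first matched row as the target and drop the other matches
  keep <- setdiff(seq_len(nrow(data)), idx[-1])
  out <- data[keep, , drop = FALSE]
  pos <- match(rownames(data)[idx[1]], rownames(out))
  out[pos, ] <- as.list(merged)
  rownames(out)[pos] <- new_name
  out
}

d <- data.frame(X = c(1, 2, 3, NA), Y = c(1, 2, NA, 3))
rownames(d) <- LETTERS[1:4]
merge_named_rows(d, names = c("C", "D"))  # row C now holds X = 3, Y = 3; row D is gone
```

Extra arguments are forwarded to func, so merge_named_rows(d, c("C", "D"), func = mean, na.rm = TRUE) averages while ignoring the missing values.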

Details

In a variety of circumstances it is useful to merge several rows of data into a single row. For instance, if two datasets cover the same data but one uses a smaller unit than the other, one may want to merge the smaller units so that they correspond to the larger units. Alternatively, if data for one unit have accidentally been saved under two different names, one wants to merge these two (or more) rows without losing data.
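The first scenario, collapsing smaller units into larger ones via a key column, can be sketched in base R as follows. The helper merge_rows_by_key() is hypothetical and only approximates the documented behaviour with numeric = TRUE: it applies func to each numeric non-key column within each block of rows sharing a key value, returns the key as character, and keeps blocks in order of first appearance.

```r
# Hypothetical helper for illustration only; NOT df_merge_rows() itself.
merge_rows_by_key <- function(data, key,
                              func = function(x) sum(x, na.rm = TRUE), ...) {
  # Aggregate every numeric column except the key (mirrors numeric = TRUE)
  is_num <- vapply(data, is.numeric, logical(1))
  value_cols <- setdiff(names(data)[is_num], key)
  # Group row indices by key value, keeping first-appearance order
  groups <- split(seq_len(nrow(data)),
                  factor(data[[key]], levels = unique(data[[key]])))
  # Apply func column-wise within each block; extra arguments go to func
  merged <- lapply(groups, function(idx, ...) {
    vapply(data[idx, value_cols, drop = FALSE], func, numeric(1), ...)
  }, ...)
  out <- as.data.frame(do.call(rbind, merged))
  out[[key]] <- names(groups)  # key comes back as character
  rownames(out) <- NULL
  out[c(key, value_cols)]
}

d <- data.frame(large_unit = c("a", "a", "b", "b", "c"), value = 1:5)
merge_rows_by_key(d, "large_unit")               # value: 3, 7, 5
merge_rows_by_key(d, "large_unit", func = mean)  # value: 1.5, 3.5, 5
```

For plain sums, base R's aggregate(value ~ large_unit, data = d, FUN = sum) gives the same totals for this example; the explicit loop only makes the per-block, per-column application of func visible.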

Examples

#suppose you have a data.frame with data for multiple variables,
#but one observation was accidentally given two names, "C" and "D",
#so its data have been dispersed among two rows.
#we can move all the data into one row without data loss.
t = data.frame(X = c(1, 2, 3, NA), Y = c(1, 2, NA, 3));rownames(t) = LETTERS[1:4]
t
#here the real values for observation C are both 3, but part of its data was accidentally recorded under the name "D".
df_merge_rows(t, names = c("C", "D"), func = mean)
#suppose instead that the names to match are stored in a column; then we can use the key parameter.
t = data.frame(large_unit = c("a", "a", "b", "b", "c"), value = 1:5)
t
df_merge_rows(t, "large_unit") #rows merged by sum by default
df_merge_rows(t, "large_unit", func = mean) #rows merged by mean

Deleetdk/kirkegaard documentation built on April 1, 2024, 2:23 a.m.