row_sums: Row sums and means for data frames

View source: R/row_sums.R

row_sums    R Documentation

Row sums and means for data frames

Description

row_sums() and row_means() compute row sums or row means for rows that have at least n valid (non-missing) values. The functions are designed to work well in a pipe workflow and accept select-helpers for selecting variables.

Usage

row_sums(x, ...)

## Default S3 method:
row_sums(x, ..., n, var = "rowsums", append = TRUE)

## S3 method for class 'mids'
row_sums(x, ..., var = "rowsums", append = TRUE)

row_means(x, ...)

total_mean(x, ...)

## Default S3 method:
row_means(x, ..., n, var = "rowmeans", append = TRUE)

## S3 method for class 'mids'
row_means(x, ..., var = "rowmeans", append = TRUE)

Arguments

x

A vector or data frame.

...

Optional, unquoted names of variables that should be selected for further processing. Required if x is a data frame (rather than a vector) and only selected variables from x should be processed. You may also use functions like : or tidyselect's select-helpers. See 'Examples' or the package vignette.

n

May be one of:

  • a numeric value that indicates the number of valid values required per row to calculate the row mean or sum;

  • a value between 0 and 1, indicating the proportion of valid values required per row to calculate the row mean or sum (see 'Details');

  • or Inf. If n = Inf, all values per row must be non-missing to compute the row mean or sum.

If a row's number of valid (i.e. non-NA) values is less than n, NA is returned as the value of the row mean or sum.

var

Name of the new variable with the row sums or means.

append

Logical, if TRUE (the default) and x is a data frame, x including the new variables as additional columns is returned; if FALSE, only the new variables are returned.
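The effect of var and append can be approximated in base R. This is only a sketch on a small hypothetical data frame, not the package's implementation, and it leaves the n threshold aside.

```r
# Hypothetical data, used only for illustration.
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Plain row sums; the n threshold is not part of this sketch.
sums <- rowSums(df)

# Analogue of append = TRUE with var = "rowsums":
# the original columns plus the new variable.
appended <- cbind(df, rowsums = sums)

# Analogue of append = FALSE: only the new variable, as a data frame.
only_new <- data.frame(rowsums = sums)

names(appended)
names(only_new)
```

In both cases the new column takes its name from var, so choosing a different var avoids clashing with an existing column of the same name.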

Details

n must be a numeric value from 0 to ncol(x), or Inf (see 'Arguments'). If a row in x has at least n non-missing values, the row mean or sum is returned. If n is a non-integer value between 0 and 1, it is interpreted as the proportion of non-missing values required per row. E.g., if n = .75, a row must have at least ncol(x) * n non-missing values for the row mean or sum to be calculated. See 'Examples'.
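The threshold rule can be sketched in base R. The helper row_means_sketch() below is hypothetical and not part of the package. It assumes that n = Inf means every selected column must be non-missing, and that a proportion is compared directly against ncol(x) * n, without any rounding the package itself may apply.

```r
# Hypothetical helper, not part of sjmisc: a base-R sketch of the n rule.
row_means_sketch <- function(df, n) {
  k <- ncol(df)
  # Assumption: Inf requires all k values in a row to be non-missing.
  if (is.infinite(n)) n <- k
  # Assumption: a value strictly between 0 and 1 is a proportion of columns.
  if (n > 0 && n < 1) n <- k * n
  valid <- rowSums(!is.na(df))       # number of non-NA values per row
  out <- rowMeans(df, na.rm = TRUE)  # mean of the non-NA values per row
  out[valid < n] <- NA               # too few valid values: return NA
  unname(out)
}

dat <- data.frame(
  c1 = c(1, 2, NA, 4),
  c2 = c(NA, 2, NA, 5),
  c3 = c(NA, 4, NA, NA),
  c4 = c(2, 3, 7, 8),
  c5 = c(1, 7, 5, 3)
)

# Count threshold: rows 1 and 3 have fewer than 4 valid values.
res_count <- row_means_sketch(dat, n = 4)

# Proportion threshold on c1 to c4: 4 * 0.4 = 1.6, so 2 valid values are needed.
res_prop <- row_means_sketch(dat[, c("c1", "c2", "c3", "c4")], n = 0.4)

# All values must be present: only row 2 is complete.
res_all <- row_means_sketch(dat, n = Inf)
```

With these data, res_count is NA for rows 1 and 3, and res_prop is NA only for row 3, which has a single valid value among c1 to c4.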

Value

For row_sums(), a data frame with a new variable containing the row sums from x; for row_means(), a data frame with a new variable containing the row means from x. If append = FALSE, only the new variable with the row sums or row means is returned. total_mean() returns the mean of all values from all specified columns in a data frame.
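The difference between a total mean and a mean of row means can be illustrated in base R. This sketch assumes that missing values are ignored, which is not stated above, and it does not call total_mean() itself.

```r
dat <- data.frame(
  c1 = c(1, 2, NA, 4),
  c2 = c(NA, 2, NA, 5),
  c3 = c(NA, 4, NA, NA),
  c4 = c(2, 3, 7, 8),
  c5 = c(1, 7, 5, 3)
)

# Mean over all non-missing values in all columns (assumption: NAs are dropped).
total <- mean(unlist(dat), na.rm = TRUE)

# Mean of the per-row means, which weights every row equally
# regardless of how many valid values it has.
mean_of_row_means <- mean(rowMeans(dat, na.rm = TRUE))

total
mean_of_row_means
```

The two quantities differ whenever rows have different numbers of valid values, as they do here.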

Examples

library(sjmisc)
library(dplyr)

data(efc)
efc %>% row_sums(c82cop1:c90cop9, n = 3, append = FALSE)

row_sums(efc, contains("cop"), n = 2, append = FALSE)

dat <- data.frame(
  c1 = c(1,2,NA,4),
  c2 = c(NA,2,NA,5),
  c3 = c(NA,4,NA,NA),
  c4 = c(2,3,7,8),
  c5 = c(1,7,5,3)
)
dat

row_means(dat, n = 4)
row_sums(dat, n = 4)

row_means(dat, c1:c4, n = 4)
# at least 40% non-missing
row_means(dat, c1:c4, n = .4)
row_sums(dat, c1:c4, n = .4)

# total mean of all values in the data frame
total_mean(dat)

# create sum-score of COPE-Index, and append to data
efc %>%
  select(c82cop1:c90cop9) %>%
  row_sums(n = 1)

# if data frame has only one column, this column is returned
row_sums(dat[, 1, drop = FALSE], n = 0)


strengejacke/sjmisc documentation built on June 29, 2023, 4:28 p.m.