epi_clean_add_rep_num: Add a replicate number to rows with repeated IDs

epi_clean_add_rep_num    R Documentation

Add a replicate number to rows with repeated IDs

Description

Adds a column with a replicate count for rows that share an ID (e.g. repeated screening, replicate measurements). Assumes the dataframe passed is already sorted so that repeated IDs are next to each other and, if a second ordering criterion such as date is used, earlier dates come first. Useful for data with repeated measurements that lack a column/variable clearly identifying them as such.
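The idea described above can be illustrated with a minimal base R sketch. This is not the package's implementation; the data, the column names `id` and `visit_date`, and the use of `ave()` are assumptions made purely for illustration.

```r
# Hypothetical toy data: three IDs, some of them measured more than once
df <- data.frame(
  id = c("a", "a", "b", "c", "c", "c"),
  visit_date = as.Date(c("2020-01-01", "2020-06-01", "2020-01-15",
                         "2020-02-01", "2020-03-01", "2020-04-01"))
)

# Sort by ID, then by date, so that repeated IDs are adjacent
# and earlier dates come first (the ordering the function assumes)
df <- df[order(df$id, df$visit_date), ]

# Number the rows within each ID: 1 for the first occurrence, 2 for the second, ...
df$rep_num <- ave(seq_along(df$id), df$id, FUN = seq_along)
df
```

Sorting first matters: the replicate number reflects row order within each ID, so an unsorted input would number the replicates in whatever order they happen to appear.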

Usage

epi_clean_add_rep_num(df = NULL, var_id = NULL, var_to_rep = "")

Arguments

df

a dataframe object as input

var_id

Column to use as ID, read as a dataframe vector; can be an index or a string. It is used to test whether a row is a duplicate; if it is, a replicate number is added.

var_to_rep

Column variable that can distinguish replicates (e.g. date, 'baseline' vs 'treated').

Value

Returns a dataframe with one column which can be merged with existing dataframe.

Note

Facilitates, for example, spreading a dataframe and extracting baseline vs repeated measurement rows.
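As a rough sketch of what this enables, the base R snippet below extracts baseline rows and spreads a long table to wide format once a replicate number is available. The data frame `long` and its columns are hypothetical, and `reshape()` is only one of several ways to spread data.

```r
# Hypothetical long-format data that already carries a replicate number
long <- data.frame(
  id = c("a", "a", "b", "c", "c"),
  rep_num = c(1L, 2L, 1L, 1L, 2L),
  value = c(5.1, 4.8, 6.0, 3.9, 4.2)
)

# Baseline rows are the first occurrence of each ID
baseline <- long[long$rep_num == 1, ]

# Repeated measurements are everything after the first occurrence
repeats <- long[long$rep_num > 1, ]

# Spread to one row per ID, with one value column per replicate
wide <- reshape(long, idvar = "id", timevar = "rep_num", direction = "wide")
wide
```

IDs without a second measurement (here `"b"`) get `NA` in the corresponding wide column.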

Author(s)

Antonio Berlanga-Taylor <https://github.com/AntonioJBT/episcout>

Examples


## Not run: 
n <- 20
df <- data.frame(
  var_id = rep(1:(n / 2), each = 2),
  var_to_rep = rep(c('Pre', 'Post'), n / 2),
  x = rnorm(n),
  y = rbinom(n, 1, 0.50),
  z = rpois(n, 2)
)
var_id <- 'var_id'
var_to_rep <- 'var_to_rep'
reps <- epi_clean_add_rep_num(df, 'var_id', 'var_to_rep')
reps
# Sanity check: IDs must be in the same order in both objects
identical(as.character(reps[[var_id]]),
          as.character(df[[var_id]])) # should be TRUE
# Bind:
# merge() adds all rows from both data frames as there are duplicates,
# so use cbind() after making sure the row order is identical.
# as.tibble() is deprecated; use tibble::as_tibble() instead
df2 <- tibble::as_tibble(cbind(df, 'rep_num' = reps$rep_num))
epi_head_and_tail(df2, rows = 3)
epi_head_and_tail(df2, rows = 3, last_cols = TRUE)
df2

## End(Not run)


AntonioJBT/episcout documentation built on Dec. 1, 2024, 4:07 a.m.