find_starts: Find start positions of groups in data
In LudvigOlsen/R-splitters: Creating Groups from Data

find_starts

R Documentation

Description

Lifecycle: maturing

Finds values or indices of values that are not the same as the previous value.

Useful, for example, for supplying start positions to group() with the "l_starts" method.

Wraps differs_from_previous().
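To make "not the same as the previous value" concrete, here is a minimal base R sketch of the idea. `starts_sketch()` is a hypothetical helper written for illustration, not the package's implementation, and it assumes the first element always counts as a start.

```r
# Hypothetical helper, not part of groupdata2: keeps every element that
# differs from the one before it. The first element is assumed to be a start.
starts_sketch <- function(x, return_index = FALSE) {
  # Guard against empty input, where there is nothing to compare
  if (length(x) == 0) {
    return(if (return_index) integer(0) else x)
  }
  # Compare each element (from the second on) with its predecessor
  is_start <- c(TRUE, x[-1] != x[-length(x)])
  if (return_index) which(is_start) else x[is_start]
}

x <- c("a", "a", "b", "b", "c", "c")
starts_sketch(x)                       # "a" "b" "c"
starts_sketch(x, return_index = TRUE)  # 1 3 5
```

The sketch does not handle `NA`s, factors, or grouped data frames, which the real function covers through its arguments.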

Usage

find_starts(
  data,
  col = NULL,
  return_index = FALSE,
  handle_na = "ignore",
  factor_conversion_warning = TRUE
)

Arguments

`data`	`data.frame` or `vector`. N.B. If checking a `factor`, it is converted to a `character vector`. Conversion will generate a warning, which can be turned off by setting `factor_conversion_warning` to `FALSE`. N.B. If `data` is a grouped `data.frame`, the function is applied group-wise and the output is a `list` of `vector`s. The names are based on the group indices (see `dplyr::group_indices()`).
`col`	Name of column to find starts in. Used when `data` is a `data.frame`. (Character)
`return_index`	Whether to return indices of starts. (Logical)
`handle_na`	How to handle `NA`s in the column. One of: `"ignore"`: removes the `NA`s before finding the differing values, so that the first value after an `NA` is correctly identified as new if it differs from the value before the `NA`(s). `"as_element"`: treats all `NA`s as the string `"NA"`. This means that `threshold` (an argument of the wrapped differs_from_previous()) must be `NULL` when using this method. A numeric scalar: a numeric value to replace `NA`s with.
`factor_conversion_warning`	Throw warning when converting `factor` to `character`. (Logical)
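The three `handle_na` options can be compared with a small base R sketch. `starts_with_na()` is a hypothetical helper that only mimics the behavior described above on a plain vector. It returns start values only and makes no claim about how the package computes indices when `NA`s are present.

```r
# Hypothetical helper, not groupdata2's code: applies one of the three
# documented NA strategies, then keeps values that differ from the previous one.
starts_with_na <- function(x, handle_na = "ignore") {
  if (is.numeric(handle_na)) {
    x[is.na(x)] <- handle_na   # numeric scalar: replace NAs with that number
  } else if (handle_na == "as_element") {
    x <- as.character(x)
    x[is.na(x)] <- "NA"        # treat NAs as the string "NA"
  } else {
    x <- x[!is.na(x)]          # "ignore": drop NAs before comparing
  }
  if (length(x) == 0) {
    return(x)
  }
  x[c(TRUE, x[-1] != x[-length(x)])]
}

x <- c(1, NA, 1, 2)
starts_with_na(x, "ignore")      # 1 2
starts_with_na(x, "as_element")  # "1" "NA" "1" "2"
starts_with_na(x, 0)             # 1 0 1 2
```

With "ignore", the two 1s on either side of the `NA` are treated as one run, whereas the other two options make the `NA` a value of its own and so split the run.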

Value

A `vector` with either the start values or the indices of the start values.

N.B. If `data` is a grouped data.frame, the output is a list of vectors. The names are based on the group indices (see dplyr::group_indices()).

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

Examples

# Attach packages
library(groupdata2)

# Create a data frame
df <- data.frame(
  "a" = c("a", "a", "b", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# Get start values for new groups in column 'a'
find_starts(df, col = "a")

# Get indices of start values for new groups
# in column 'a'
find_starts(df,
  col = "a",
  return_index = TRUE
)

## Use found starts with l_starts method
# Notice: This is equivalent to n = 'auto'
# with l_starts method

# Get start values for new groups in column 'a'
starts <- find_starts(df, col = "a")

# Use starts in group() with 'l_starts' method
group(df,
  n = starts, method = "l_starts",
  starts_col = "a"
)

# Similar but with indices instead of values

# Get indices of start values for new groups
# in column 'a'
starts_ind <- find_starts(df,
  col = "a",
  return_index = TRUE
)

# Use starts in group() with 'l_starts' method
group(df,
  n = starts_ind, method = "l_starts",
  starts_col = "index"
)

LudvigOlsen/R-splitters documentation built on Dec. 21, 2024, 1:19 a.m.