time_match: find matches for specific units on variables in a given time...

View source: R/time_match.r

time_match R Documentation

find matches for specific units on variables in a given time period

Description

in a dataset with one or more variables (typically containing text) associated with a date, find matches on those variables for specific individuals within specified time frames

Usage

time_match(
  pattern,
  x = NULL,
  data,
  id = "id",
  date = "date",
  units = NULL,
  units.id = id,
  begin = "begin",
  end = "end",
  ...,
  long = TRUE,
  stack = TRUE,
  verbose = TRUE
)

Arguments

pattern

a vector of search strings (regular expressions) (the names attribute will be used if it exists)

x

names of variables to search in (given in order of importance); if missing, all variables except id and date are chosen

data

a data frame

id

name of id variable (in 'data')

date

name of associated date variable (in 'data')

units

a vector of ids, or a data frame containing ids and, optionally, 'begin' and 'end' variables

units.id

variable name in 'units' to use as id (by default the same as 'id')

begin

variable name in 'units' to use as begin; if missing, it will be set to the earliest date in data

end

variable name in 'units' to use as end; if missing, it will be set to the latest date in data

...

arguments passed to grepl

long

if TRUE all matches will get a row, else the first match gets details and information on all other matches is condensed. N.B. long = FALSE will be slow for large datasets!

stack

if TRUE results are stacked. Not stacking is only possible when long = FALSE.

verbose

if TRUE the function will give helpful and/or annoying messages
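
As an illustration of how these arguments fit together, the following is a minimal, untested sketch. It assumes the ucR package is installed and that time_match is exported from it; the toy data and the names records, cohort, dx1 and dx2 are hypothetical and not part of the package.

```r
# Hypothetical toy data: one row per record, with an id, a date and
# two text variables to search in.
records <- data.frame(
  id   = c(1, 1, 2, 3),
  date = as.Date(c("2020-01-10", "2020-03-01", "2020-02-15", "2020-05-20")),
  dx1  = c("I21", "E11", "I25", "J18"),
  dx2  = c("E10", "I21", NA, "I21"),
  stringsAsFactors = FALSE
)

# Hypothetical units of interest, each with its own time window.
cohort <- data.frame(
  id    = c(1, 2),
  begin = as.Date(c("2020-01-01", "2020-01-01")),
  end   = as.Date(c("2020-12-31", "2020-06-30"))
)

# A named pattern vector; per the documentation the names are used as aliases.
pattern <- c(mi = "^I21", diabetes = "^E1[01]")

# Only attempt the call if ucR is available (an assumption, see above).
if (requireNamespace("ucR", quietly = TRUE)) {
  res <- ucR::time_match(
    pattern = pattern,
    x       = c("dx1", "dx2"),  # search order: dx1 before dx2
    data    = records,
    id      = "id",
    date    = "date",
    units   = cohort,
    begin   = "begin",
    end     = "end",
    long    = TRUE,
    verbose = FALSE
  )
  str(res)
}
```

Arguments given through ... (for example fixed = TRUE) would be passed on to grepl.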

Value

The basic 'long' output is a data frame with

  • id the id variable

  • begin the begin date (could be individual)

  • end the end date (could be individual)

  • date the date of associated match

  • event indicator for a match

  • time days from 'begin' to 'date'

  • match the match found

  • match.in the variable the match was found in

  • pattern the pattern searched for

  • alias the name of pattern searched for (else p1, p2, etc)

  • first.id indicator for first occurrence of associated id

  • first.id_date indicator for first occurrence of associated id and date

  • ... all variables in data that are not id, date or search variables. These will be renamed if they are in conflict with output names. These will only be included in output when long = TRUE.

Note that any individual can have more than one match.
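
To make the 'long' format concrete, here is a base R sketch of the idea for a single pattern. It is not the package's implementation, only an approximation of the description above; the data and the names data, units, dx1 and dx2 are hypothetical, and only some of the output columns are reproduced.

```r
# Hypothetical toy data: records with a date and two text variables.
data <- data.frame(
  id   = c(1, 1, 2, 3),
  date = as.Date(c("2020-01-10", "2020-03-01", "2020-02-15", "2020-05-20")),
  dx1  = c("I21", "E11", "I25", "J18"),
  dx2  = c("E10", "I21", NA, "I21"),
  stringsAsFactors = FALSE
)

# Units of interest with individual time windows.
units <- data.frame(
  id    = c(1, 2, 3),
  begin = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  end   = as.Date(c("2020-12-31", "2020-12-31", "2020-03-31"))
)

pattern <- c(mi = "^I21")
vars <- c("dx1", "dx2")

# Keep only records that fall inside each unit's [begin, end] window.
m <- merge(units, data, by = "id")
m <- m[m$date >= m$begin & m$date <= m$end, ]

# One output row per (record, search variable) where the pattern matches.
hits <- lapply(vars, function(v) {
  found <- grepl(pattern[["mi"]], m[[v]])
  data.frame(
    m[found, c("id", "begin", "end", "date")],
    event    = rep(1L, sum(found)),
    time     = as.numeric(m$date[found] - m$begin[found]),  # days since begin
    match    = m[[v]][found],
    match.in = rep(v, sum(found)),
    stringsAsFactors = FALSE
  )
})
long <- do.call(rbind, hits)
long <- long[order(long$id, long$date), ]
print(long)
```

In this toy example id 1 matches twice (once in each variable), id 2 has no match, and the only record for id 3 falls after its end date, so it is not searched.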

The basic output when long = FALSE is one line per individual where the first match (by date and order of search variables, i.e. filtered on first.id == 1) is specified with some detail, and information on all subsequent matches is condensed.

  • id, begin, end, date, event, time, match, match.in, pattern, alias as before, but only relevant for the first match.

  • events the total number of matches found

  • matches all (unique) matches found, separated by a space

  • matches.info all (unique) match/matching-variable/date-combinations separated by a space
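
The condensing step can be sketched in base R as follows. This is an illustration of the description above, not the package's code; the hand-written long data frame is hypothetical, and matches.info is left out because its exact format is not specified here.

```r
# A hand-written 'long' result: one row per match.
long <- data.frame(
  id       = c(1, 1, 2),
  date     = as.Date(c("2020-01-10", "2020-03-01", "2020-02-15")),
  match    = c("I21", "I21", "I25"),
  match.in = c("dx1", "dx2", "dx1"),
  stringsAsFactors = FALSE
)
long <- long[order(long$id, long$date), ]

# Details are kept for the first match of each id only.
short <- long[!duplicated(long$id), ]
key <- as.character(short$id)

# events: total number of matches per id.
short$events <- as.vector(table(long$id)[key])

# matches: all unique matches per id, separated by a space.
short$matches <- unname(vapply(
  split(long$match, long$id)[key],
  function(z) paste(unique(z), collapse = " "),
  character(1)
))
print(short)
```

Each id ends up on a single row: the first match supplies date, match and match.in, while events and matches summarise all of that id's matches.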

If pattern is a vector, the results can be stacked or not. If they are not stacked, the format must necessarily be one line per individual (i.e. long = FALSE). If the output is unstacked, all pattern specific output (i.e. all but id, begin and end) variables will get a suffix, either the names of the pattern vector, or a created one.

Author(s)

Henrik Renlund


renlund/ucR documentation built on March 25, 2023, 10:10 a.m.