clean_ID: Converts messy names and IDs to tidy, clean ones.
In Eiriksen/Genotools: Genotools

View source: R/script - base genotools.R

clean_ID

R Documentation

Description

Cleans up a vector of long, complicated identifiers or row names in which the true ID of each row is buried inside a longer string.
E.g. a "dirty" ID like "A0006_3911_BT-F1_GTCGTCTA_run20190930N" becomes a "clean" ID like "3911_BT".
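
To make the idea concrete, here is a minimal base-R sketch of pulling the number and its identifier out of a longer string. It is an illustration only, not the Genotools implementation: the variable names and the two extra sample strings are assumptions, and the identifier "_BT" is assumed to contain no regex metacharacters.

```r
# Illustrative sketch only; clean_ID() itself may work differently.
dirty <- c("A0006_3911_BT-F1_GTCGTCTA_run20190930N",
           "A0007_0042_BT-F1_AACCGGTT_run20190930N",  # made-up sample
           "no_match_here")                           # made-up sample

identifier <- "_BT"  # text that directly follows the ID number

# Match a run of digits immediately followed by the identifier.
pattern <- paste0("[0-9]+", identifier)
m <- regexpr(pattern, dirty)

# Entries without a match stay NA; matched entries get the extracted ID.
clean <- rep(NA_character_, length(dirty))
clean[m != -1] <- regmatches(dirty, m)

print(clean)
```

The digits in "A0006" are skipped because they are not directly followed by "_BT", so only the number next to the identifier is kept.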

Usage

clean_ID(
  vector,
  identifier = "",
  identifier_left = F,
  numLength = 4,
  prefix,
  na_remove = F,
  numeric = F
)

Arguments

`vector`	A vector of "dirty" IDs
`identifier`	IDs need to be formatted as a number followed by an identifier (or preceded by one, see `identifier_left`), e.g. "34_individuals2019", where "_individuals2019" is the identifier. Entries that do not match this format are removed or set to NA, depending on `na_remove`.
`identifier_left`	Whether the identifier is on the left-hand (T) or right-hand (F) side of the number
`numLength`	If you want leading zeroes, use this parameter to specify the total length of the number, e.g. 8 for "00000342"
`prefix`	A prefix for the new, cleaned ID. E.g. "individuals2019_" will give you "individuals2019_0034". If not specified, the old identifier is used instead. Set to NA if you only want the number.
`na_remove`	Whether to remove entries that don't follow your pattern (otherwise, they turn into NA)
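
The effect of `numLength`, `prefix` and `na_remove` can be sketched in base R as follows. This is a hedged illustration of the documented behaviour, not the package source: it assumes the numbers have already been extracted, and the sample values and the variable names `num`, `clean` and `kept` are made up for the example.

```r
# Illustrative sketch only; not taken from the Genotools source.
num       <- c(34L, 342L, NA)    # numbers already pulled from the dirty IDs
numLength <- 4                   # total digits after zero-padding
prefix    <- "individuals2019_"  # text placed in front of the padded number

ok <- !is.na(num)

# Zero-pad the valid numbers and prepend the prefix; invalid entries stay NA.
clean <- rep(NA_character_, length(num))
clean[ok] <- paste0(prefix,
                    formatC(num[ok], width = numLength, format = "d", flag = "0"))

# Equivalent of na_remove = T: drop the entries that did not match.
kept <- clean[!is.na(clean)]

print(clean)
print(kept)
```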