clean_ID: Converts messy names and IDs to tidy, clean ones.

View source: R/script - base genotools.R

clean_ID    R Documentation

Converts messy names and IDs to tidy, clean ones.

Description

Cleans up a vector of long, complicated identifiers or row names in which the true ID of each row is hidden inside a longer string.
E.g.: turn "dirty" IDs like "A0006_3911_BT-F1_GTCGTCTA_run20190930N" into "clean" IDs like "3911_BT".
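
The idea of pulling the number that sits next to an identifier out of a longer string can be sketched in base R. This is only an illustration under stated assumptions, not the Genotools implementation: `extract_id_number` is a hypothetical helper, and it assumes the identifier sits to the right of the number and does not itself contain the sequence `\E`.

```r
# Hypothetical helper (not part of Genotools): return the run of digits that
# directly precedes `identifier` in each string, or NA where there is none.
extract_id_number <- function(x, identifier) {
  # \Q...\E makes PCRE treat the identifier literally; (?=...) is a lookahead,
  # so the identifier must follow the digits but is not part of the match.
  pattern <- paste0("[0-9]+(?=\\Q", identifier, "\\E)")
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)  # regmatches() returns the matched entries only
  out
}

dirty <- c("A0006_3911_BT-F1_GTCGTCTA_run20190930N", "no_id_in_here")
extract_id_number(dirty, "_BT")
```

The first entry yields the number in front of "_BT"; the second has no such pattern and comes back as NA.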

Usage

clean_ID(
  vector,
  identifier = "",
  identifier_left = F,
  numLength = 4,
  prefix,
  na_remove = F,
  numeric = F
)

Arguments

vector

A vector of "dirty" IDs

identifier

IDs need to be formatted as a number followed by an identifier, e.g. "34_individuals2019", where "_individuals2019" is the identifier. Entries not matching this format are turned into NA, or removed if na_remove is set.

identifier_left

Whether the identifier is on the left-hand (T) or right-hand (F) side of the number.

numLength

If you want leading zeroes, use this parameter to specify the total length of the number, e.g. 8 for 00000342.

prefix

Prefix to use in the new cleaned ID, e.g. "individuals2019_" will give you "individuals2019_0034". If not specified, the old identifier is used instead. Set to NA if you only want the number.

na_remove

Whether to remove any entries that don't follow your pattern (otherwise, they are turned into NA).
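
How numLength, prefix and na_remove interact, as described above, can be illustrated with a small base R sketch. `format_clean_id` is a hypothetical stand-in written for this example. It is not the package's code, and it only covers the case where a prefix is given explicitly or set to NA.

```r
# Hypothetical helper (not part of Genotools): zero-pad extracted numbers and
# optionally attach a prefix, mirroring the documented argument behaviour.
format_clean_id <- function(number, numLength = 4, prefix = NA, na_remove = FALSE) {
  # Entries that are not plain numbers become NA instead of raising a warning
  num <- suppressWarnings(as.integer(number))
  # Build a format such as "%04d" so the number is padded with leading zeroes
  padded <- sprintf(paste0("%0", numLength, "d"), num)
  padded[is.na(num)] <- NA_character_
  out <- if (is.na(prefix)) {
    padded
  } else {
    ifelse(is.na(padded), NA_character_, paste0(prefix, padded))
  }
  # Drop non-matching entries only when asked to; otherwise keep them as NA
  if (na_remove) out <- out[!is.na(out)]
  out
}

format_clean_id(c("34", "342", "x"), prefix = "individuals2019_")
format_clean_id("342", numLength = 8)
format_clean_id(c("34", "x"), na_remove = TRUE)
```

Keeping non-matching entries as NA by default preserves the length and order of the input, which matters when the cleaned IDs are written back as a column or as row names.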


Eiriksen/Genotools documentation built on Oct. 1, 2022, 1:40 a.m.