
clean_ID_df    R Documentation

Converts messy names and IDs in a data frame into tidy, clean ones.

Description

For tidying up a column of long, complicated identifiers or row names, where the true ID of a row is hidden inside a longer string.
E.g.: make "dirty" IDs like "A0006_3911_BT-F1_GTCGTCTA_run20190930N" turn into "clean" IDs like 3911_BT.
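The extraction described above can be sketched in base R as a regular-expression match: find a run of digits immediately followed by the identifier and keep only that substring. This is a minimal sketch, not necessarily how clean_ID_df works internally; the helper extract_clean_id is hypothetical, and it assumes the identifier contains no regex metacharacters.

```r
# Hypothetical helper (not part of Genotools): pull "<digits><identifier>"
# out of each messy string. Entries without a match (or NA entries) become NA.
# Assumes `identifier` contains no regex metacharacters.
extract_clean_id <- function(x, identifier) {
  pattern <- paste0("[0-9]+", identifier)
  m <- regexpr(pattern, x)              # start of first match, -1 if none
  len <- attr(m, "match.length")        # length of each match
  hit <- !is.na(m) & m > 0
  out <- rep(NA_character_, length(x))
  out[hit] <- substring(x[hit], m[hit], m[hit] + len[hit] - 1)
  out
}

extract_clean_id(c("A0006_3911_BT-F1_GTCGTCTA_run20190930N", "blank_well"), "_BT")
# returns c("3911_BT", NA)
```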

Usage

clean_ID_df(
  df,
  column_name,
  identifier = "",
  identifier_left = F,
  numLength = F,
  prefix,
  na_remove = T,
  keep_name = F,
  numeric = F
)

Arguments

df

The data frame containing the IDs to clean.

identifier

IDs must consist of a number followed by the identifier, e.g. "34_individuals2019", where "_individuals2019" is the identifier (see identifier_left for the reverse order). Entries not matching this format are removed (or set to NA; see na_remove).

identifier_left

Whether the identifier is on the left-hand (T) or right-hand (F) side of the number. Default is F.
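A sketch of how identifier and identifier_left might combine into a single match pattern. The helper id_pattern is hypothetical, and it again assumes the identifier contains no regex metacharacters.

```r
# Hypothetical helper: regex for "<digits><identifier>" or, when
# identifier_left = TRUE, "<identifier><digits>".
id_pattern <- function(identifier, identifier_left = FALSE) {
  if (identifier_left) {
    paste0(identifier, "[0-9]+")
  } else {
    paste0("[0-9]+", identifier)
  }
}

ids <- c("34_individuals2019", "individuals2019_34", "no_number_here")
grepl(id_pattern("_individuals2019"), ids)                          # TRUE FALSE FALSE
grepl(id_pattern("individuals2019_", identifier_left = TRUE), ids)  # FALSE TRUE FALSE
```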

numLength

If you want leading zeroes, use this parameter to specify the total length of the number, e.g. 8 turns 342 into 00000342.
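Zero-padding of this kind is available in base R through formatC. A sketch with a hypothetical pad_number helper; note that formatC treats width as a minimum, so numbers longer than numLength come back unchanged rather than truncated.

```r
# Hypothetical helper: left-pad the numeric part with zeroes to `numLength`
# characters. Assumes the numbers fit in an R integer.
pad_number <- function(num, numLength) {
  formatC(as.integer(num), width = numLength, format = "d", flag = "0")
}

pad_number("342", 8)              # "00000342"
pad_number(c("7", "123456"), 4)   # "0007" "123456"
```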

prefix

A prefix for the new cleaned ID, e.g. "individuals2019_" gives "individuals2019_0034" (here with numLength = 4).
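Assuming the prefix is simply pasted in front of the zero-padded number (an assumption based on the example above, not confirmed by the package source), the result can be reproduced with paste0 and formatC:

```r
# Assumption: clean ID = prefix + zero-padded number.
number    <- "34"
prefix    <- "individuals2019_"
numLength <- 4

clean_id <- paste0(prefix, formatC(as.integer(number), width = numLength,
                                   format = "d", flag = "0"))
clean_id   # "individuals2019_0034"
```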

na_remove

Whether to remove rows that don't follow your pattern (otherwise, their IDs are set to NA). Default is T.
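A sketch of the two behaviors, assuming non-matching rows first get an NA ID and are then dropped when na_remove is true. The data frame, its column names, and the "_BT" identifier are made up for illustration.

```r
# Made-up example data; stringsAsFactors = FALSE keeps the IDs as character.
df <- data.frame(sample = c("A0006_3911_BT-F1", "blank_well", "A0007_4020_BT-F2"),
                 reads  = c(120, 5, 98),
                 stringsAsFactors = FALSE)

pattern <- "[0-9]+_BT"
hit <- grepl(pattern, df$sample)

# Rows without a match keep an NA ID.
df$ID <- NA_character_
df$ID[hit] <- regmatches(df$sample, regexpr(pattern, df$sample))

na_remove <- TRUE
if (na_remove) {
  df <- df[!is.na(df$ID), , drop = FALSE]   # drop the rows that didn't match
}
df$ID   # "3911_BT" "4020_BT"
```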

column_name

The name of the column containing the dirty IDs.
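Because the column is given by name as a character string, it has to be looked up with [[ rather than $. A minimal sketch, assuming for illustration that the cleaned IDs simply overwrite the original column and that every entry matches the pattern:

```r
df <- data.frame(sample = c("A0006_3911_BT-F1", "A0007_4020_BT-F2"),
                 stringsAsFactors = FALSE)
column_name <- "sample"

# df$column_name would look for a column literally named "column_name".
dirty <- df[[column_name]]
df[[column_name]] <- regmatches(dirty, regexpr("[0-9]+_BT", dirty))
df$sample   # "3911_BT" "4020_BT"
```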


Eiriksen/Genotools documentation built on Oct. 1, 2022, 1:40 a.m.