In Eiriksen/Genotools: Genotools

clean_ID_df

R Documentation

In a data frame, converts messy names and IDs to tidy, clean ones.

Description

For tidying a column of long, complicated identifiers or row names, where the true ID of each row is buried inside a longer string.
E.g. turns "dirty" IDs like "A0006_3911_BT-F1_GTCGTCTA_run20190930N" into "clean" IDs like 3911_BT.
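As a rough illustration of the idea (not the Genotools implementation), the base-R sketch below pulls "a number followed by an identifier" out of each dirty string with `regexpr()` and `regmatches()`. The helper name `extract_id` is hypothetical, and it assumes the identifier can be used directly as a regular expression; strings with no match become NA.

```r
# Hypothetical helper for illustration only; not part of Genotools.
# Assumes `identifier` is a valid regular expression (e.g. "_BT").
extract_id <- function(x, identifier) {
  pattern <- paste0("[0-9]+", identifier)  # digits immediately followed by the identifier
  m <- regexpr(pattern, x)                 # position of the first match, -1 if none
  out <- rep(NA_character_, length(x))     # non-matching strings stay NA
  out[m != -1] <- regmatches(x, m)         # substrings for the matching entries only
  out
}

dirty <- c("A0006_3911_BT-F1_GTCGTCTA_run20190930N", "no_match_here")
extract_id(dirty, "_BT")
# [1] "3911_BT" NA
```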

Usage

clean_ID_df(
  df,
  column_name,
  identifier = "",
  identifier_left = F,
  numLength = F,
  prefix,
  na_remove = T,
  keep_name = F,
  numeric = F
)

Arguments

`df`	The data frame
`identifier`	IDs must consist of a number next to an identifier, e.g. "34_individuals2019", where "_individuals2019" is the identifier. Entries that don't match this format are removed (see `na_remove`).
`identifier_left`	Whether the identifier is on the left-hand (T) or right-hand (F) side of the number. Default is F.
`numLength`	If you want leading zeros, the total length to pad the number to, e.g. 8 turns 342 into 00000342. Default is F (no padding).
`prefix`	An optional prefix for the new cleaned ID, e.g. "individuals2019_" gives "individuals2019_0034" (here combined with numLength = 4).
`na_remove`	Whether to remove rows whose IDs don't match the pattern; if F, their cleaned IDs become NA instead. Default is T.
`column_name`	The name of the column containing the dirty IDs.
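To show how these arguments could interact, here is a minimal base-R sketch. It is a hypothetical re-implementation, not the Genotools code: the name `clean_id_sketch` is invented, `identifier` is assumed to be a regular expression, `prefix` is given a default of "" for convenience, and `keep_name` and `numeric` are left out because their behaviour is not documented above. Zero-padding uses `formatC()`.

```r
# Hypothetical sketch for illustration only; not Genotools' clean_ID_df.
# Assumptions: `identifier` is a regex; `prefix` defaults to "";
# keep_name and numeric are not modelled.
clean_id_sketch <- function(df, column_name, identifier = "",
                            identifier_left = FALSE, numLength = FALSE,
                            prefix = "", na_remove = TRUE) {
  ids <- as.character(df[[column_name]])
  # One capture group for the number; the identifier sits on the
  # left or right of it depending on identifier_left.
  pattern <- if (identifier_left) {
    paste0(identifier, "([0-9]+)")
  } else {
    paste0("([0-9]+)", identifier)
  }
  m <- regexec(pattern, ids)
  # regmatches() gives c(full_match, number) on a match, character(0) otherwise
  num <- vapply(regmatches(ids, m),
                function(x) if (length(x) == 2) x[2] else NA_character_,
                character(1))
  # Optional zero-padding to a fixed total length
  if (!isFALSE(numLength)) {
    ok <- !is.na(num)
    num[ok] <- formatC(as.integer(num[ok]), width = numLength, flag = "0")
  }
  df[[column_name]] <- ifelse(is.na(num), NA_character_, paste0(prefix, num))
  # Drop non-matching rows unless the caller wants to keep them as NA
  if (na_remove) df <- df[!is.na(df[[column_name]]), , drop = FALSE]
  df
}

df <- data.frame(sample = c("34_individuals2019", "7_individuals2019", "junk"),
                 value = 1:3)
clean_id_sketch(df, "sample", identifier = "_individuals2019",
                numLength = 4, prefix = "individuals2019_")
```

With `na_remove = TRUE` the "junk" row is dropped, and the remaining samples become "individuals2019_0034" and "individuals2019_0007".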