A function to clean karyotype data by deleting known comments that do not adhere to the ISCN standard.
preclean(x, targetColumns, dirt)
x: A data frame containing at least one column of karyotype data.
targetColumns: Either a numeric vector of column indices or a character vector of column names.
dirt: A character vector containing items to delete from all karyotypes.
The core input data worked on by the
RCytoGPS package are karyotypes,
which are text strings written to conform to the ISCN standard. At
many institutions, the cytogeneticists have developed idiosyncratic
conventions that they use to add comments into the string. In most
cases, these karyotypes are simply stored as text strings in a local
database. In particular, they are not checked for syntax or grammar
errors. By contrast, the implementation of the CytoGPS algorithm at
the web site http://cytogps.org uses a formal approach with a
lexer and parser. As a result, many karyotypes are rejected by the
parser. The preclean function uses gsub to delete a list
of known (local) comments from all karyotypes, making it more likely
that they will be successfully processed by the lexer and parser. We
provide an example list derived from experience at our own institution.
Implementation Note: The preclean function removes
strings in the order in which they appear in the dirt
vector, so you have to be careful not to delete part of a longer
phrase before trying to delete the whole phrase. For example, you
should not remove "clonal" before removing "nonclonal".
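The ordering issue can be illustrated with plain gsub calls. This is a minimal sketch, not the package's code: the helper strip_in_order and the sample karyotype are hypothetical, and the use of fixed = TRUE (literal matching rather than regular expressions) is an assumption about how the phrases are matched.

```r
# Hypothetical karyotype carrying a local comment; not taken from real data.
karyotype <- "46,XX,t(9;22)(q34;q11.2)[20] nonclonal"

# Hypothetical helper: delete each phrase in turn, in vector order.
# fixed = TRUE treats each phrase as a literal string (an assumption).
strip_in_order <- function(text, dirt) {
  for (phrase in dirt) {
    text <- gsub(phrase, "", text, fixed = TRUE)
  }
  text
}

# Removing "clonal" first destroys "nonclonal", leaving a stray "non".
wrong_order <- strip_in_order(karyotype, c("clonal", "nonclonal"))

# Removing the longer phrase first deletes the whole comment.
right_order <- strip_in_order(karyotype, c("nonclonal", "clonal"))

print(wrong_order)
print(right_order)
```

Listing longer phrases before any of their substrings in the dirt vector avoids the leftover fragment.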
A data frame of the same size and with the same number of columns as the input data frame.
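The described behavior, repeated gsub deletions over the chosen columns that return a data frame of unchanged shape, might be sketched as follows. The function preclean_sketch is a hypothetical stand-in written for illustration only; it is not the RCytoGPS implementation, the sample data are invented, and literal matching via fixed = TRUE is an assumption.

```r
# Sketch only: NOT the RCytoGPS implementation of preclean.
preclean_sketch <- function(x, targetColumns, dirt) {
  # targetColumns may hold column indices or column names; [[ ]] accepts both.
  for (column in targetColumns) {
    values <- as.character(x[[column]])
    # Delete each phrase in the order given by the dirt vector.
    for (phrase in dirt) {
      values <- gsub(phrase, "", values, fixed = TRUE)
    }
    x[[column]] <- values
  }
  x
}

# Invented example data with local comments appended to the karyotypes.
samples <- data.frame(
  id = c("S1", "S2"),
  karyotype = c("46,XY[20] nonclonal", "47,XX,+8[5] see note"),
  stringsAsFactors = FALSE
)

cleaned <- preclean_sketch(samples, "karyotype", c("nonclonal", "see note"))

# Only the targeted column changes; the dimensions are preserved.
print(dim(cleaned))
print(cleaned$karyotype)
```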
Kevin R. Coombes email@example.com
Abrams ZB, Zhang L, Abruzzo LV, Heerema NA, Li S, Dillon T, Rodriguez R, Coombes KR, Payne PRO. CytoGPS: a web-enabled karyotype analysis tool for cytogenetics. Bioinformatics. 2019 Dec 15;35(24):5365-5366.
Abrams ZB, Tally DG, Zhang L, Coombes CE, Payne PRO, Abruzzo LV, Coombes KR. Pattern recognition in lymphoid malignancies using CytoGPS and Mercator. In Press.
Abrams ZB, Tally DG, Coombes KR. RCytoGPS: An R Package for Analyzing and Visualizing Cytogenetic Data. In preparation.