Modern biological experiments are increasingly producing interesting binary matrices. These may represent the presence or absence of specific gene mutations, copy number variants, microRNAs, or other molecular or clinical phenomena. We recently developed a tool, CytoGPS [^Abrams and colleagues], that converts conventional karyotypes from the standard text-based notation (the International Standard for Human Cytogenetic/Cytogenomic Nomenclature; ISCN) into a binary vector with three bits (loss, gain, or fusion) per cytoband, which we call the "LGF model".
The CytoGPS tool is available at the web site http://cytogps.org, where the LGF results of processing karyotype data are returned in JSON format. To complement the web site, we have developed RCytoGPS, an R package to extract, format, and visualize genetic data at the resolution of cytobands. RCytoGPS can parse any JSON file (or set of files) produced by CytoGPS.org.
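To make the "three bits per cytoband" idea concrete, here is a minimal base R sketch of an LGF-style encoding for a single clone. The band names and events are invented for illustration, and the ordering of bits is an assumption; the layout actually produced by CytoGPS may differ.

```r
# Toy illustration of an LGF-style encoding. The band names and events are
# made up for illustration and are not output from CytoGPS.
bands  <- c("1p36", "1p35", "1q21", "8q24")
events <- c("Loss", "Gain", "Fusion")

# One clone is described by three bits (loss, gain, fusion) per cytoband.
lgf <- matrix(0L, nrow = length(bands), ncol = length(events),
              dimnames = list(bands, events))
lgf["1p36", "Loss"]   <- 1L  # hypothetical loss involving 1p36
lgf["8q24", "Gain"]   <- 1L  # hypothetical gain involving 8q24
lgf["8q24", "Fusion"] <- 1L  # hypothetical breakpoint at 8q24

# Flatten to one binary vector, keeping the three bits of each band adjacent.
# (This ordering is an assumption made for the sketch.)
clone <- as.vector(t(lgf))
names(clone) <- paste(rep(bands, each = length(events)), events, sep = ".")
length(clone)  # 4 bands x 3 events
```

Stacking one such vector per clone yields the kind of binary matrix that the rest of this document works with.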
In order to extract LGF data from JSON files, you must first load the package.
library(RCytoGPS)
We have included a pair of JSON files produced at CytoGPS.org as examples in the package. These are found in the following directory:
wd <- system.file("Examples/JSONfiles", package = "RCytoGPS")
dir(wd)
The two text files contain the inputs that were uploaded to the web site; the two JSON files contain the outputs. You can specify the files and the folder that you want to read. The simplest approach is to omit the files argument and read all files in the specified folder (which defaults to the current working directory).
temp <- readLGF(folder = wd)
rm(wd)
The return value is a list of five elements.
The source element documents which JSON file(s) were read.
The size element lists the number of rows returned from each file; each row represents a distinct clone.
The CL element is a data frame describing the chromosomal locations of each cytoband.
The raw element is itself a list, containing the binary LGF data for each JSON file processed. Each file produces a "Status" output along with the LGF data. The Status includes both the input karyotype (in ISCN format) and an indicator of whether CytoGPS could successfully process it. In this example, the first karyotype contained an error. As a result, the LGF component does not contain any rows derived from that karyotype. It does, however, contain three rows derived from the second karyotype, since the "forward slashes" separate the descriptions of three different clones that were detected in that sample.
names(temp$raw)
R <- temp$raw[[1]]
names(R)
R$Status
dim(R$LGF)
rownames(R$LGF)
rm(R)
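The role of the forward slashes can be illustrated with a small base R sketch. The karyotype string below is invented for illustration (it is not taken from the example files), and a plain string split is only a rough stand-in for the full ISCN parsing that CytoGPS performs.

```r
# Made-up ISCN-style karyotype describing three clones, for illustration only.
karyotype <- "46,XX,t(9;22)(q34;q11.2)[12]/47,XX,+8[5]/46,XX[3]"

# Forward slashes separate the clone descriptions within one karyotype.
clones <- strsplit(karyotype, "/", fixed = TRUE)[[1]]
length(clones)  # one LGF row would be expected per clone
clones
```

Under this simplified view, a karyotype with two slashes gives three clone descriptions, which is why a single input line can contribute several rows to the LGF matrix.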
Finally, the frequency element contains summary data from each file read. These summaries consist of the frequencies of loss, gain, and fusion events. Each row of this data frame represents a cytoband. There are three columns from each JSON file, one each for loss, gain, and fusion.
F <- temp$frequency
class(F)
dim(F)
colnames(F)
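As a rough intuition for what such a frequency summary contains, the sketch below computes per-event frequencies from a toy binary matrix in base R. It assumes that a frequency is simply the fraction of clones in which an event is present; this is an assumption made for illustration, not a description of how readLGF computes its summaries, and all values and column names are invented.

```r
# Toy binary matrix: rows are clones, columns are band-by-event indicators.
# All values and names are invented for illustration.
toy <- matrix(c(1, 0, 0,
                1, 1, 0,
                0, 1, 0,
                1, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("clone", 1:4),
                              c("8q24.Loss", "8q24.Gain", "8q24.Fusion")))

# Assumed definition: frequency = fraction of clones carrying the event.
freq <- colMeans(toy)
freq
```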
In order to be able to work with the cytoband-level frequency data, we must combine it with the cytoband location data. Here we assemble them into a single data frame.
cytoData <- data.frame(temp[["CL"]], temp[["frequency"]])
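One point worth keeping in mind is that data.frame() combines its arguments column-wise by position, so the two inputs must have one row per cytoband in the same order. The toy example below shows this with hypothetical column names and values; the real CL and frequency data frames have their own columns.

```r
# Hypothetical location and frequency tables with matching row order.
loc <- data.frame(Chromosome = c("chr1", "chr1", "chr2"),
                  Band       = c("p36.33", "p36.32", "p25.3"))
frq <- data.frame(Sample.Loss = c(0.10, 0.20, 0.00),
                  Sample.Gain = c(0.00, 0.05, 0.30))

# Rows are matched by position, not by any key column.
stopifnot(nrow(loc) == nrow(frq))
combined <- data.frame(loc, frq)
dim(combined)
colnames(combined)
```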
Next, we transform the cytoData data frame into an S4 object using the function CytobandData.
bandData <- CytobandData(cytoData)
The first graph (using barplot) summarizes the frequency data from one data column along the genome. This provides a broad overview of the changes, and can be used to visually contrast the locations of changes in different data sets. Here we use barplot twice, showing losses and gains from the first file.
opar <- par(mfrow = c(2, 1))
barplot(bandData, what = "CytoGPS_Result1.Loss", col = "forestgreen")
barplot(bandData, what = "CytoGPS_Result1.Gain", col = "orange")
par(opar)
The next graph allows you to simultaneously compare multiple cytogenetic events one chromosome at a time.
datacolumns <- names(temp[["frequency"]])
datacolumns
image(bandData, what = datacolumns[1:3], chr = 2, labels = TRUE)
By adding the parameter horiz = TRUE, you can rotate this graph 90 degrees. For more details about the parameters of the image method, see the manual pages and the "gallery" vignette.
We can assemble all of the single-chromosome plots into a single "idiogram" graph that shows all chromosomes at once.
The purpose of this graph is to visualize the chromosomes alongside a barplot of the cytogenetic abnormalities in order to observe and possibly identify patterns.
image(bandData, what = datacolumns, chr = "all", pal = "orange")
This graph allows the user to compare and contrast two or more cytogenetic events simultaneously. Here we show loss (orange), gain (green), and fusion (purple) events from the Type 1 samples.
image(bandData, what = datacolumns[1:3], chr = "all", pal=c("orange", "forestgreen", "purple"), horiz=TRUE)
To see all of the available plots, please see the images in our gallery.