RCytoGPS: Working With LGF-Models of Karyotype in R

knitr::opts_chunk$set(fig.width=8, fig.height=5)
cat('
<style type="text/css">
.figure { text-align: center; }
.caption { font-weight: bold; }
</style>
')

Introduction

Modern biological experiments are increasingly producing interesting binary matrices. These may represent the presence or absence of specific gene mutations, copy number variants, microRNAs, or other molecular or clinical phenomena. We recently developed a tool, CytoGPS (Abrams and colleagues), that converts conventional karyotypes from the standard text-based notation (the International Standard for Human Cytogenetic/Cytogenomic Nomenclature; ISCN) into a binary vector with three bits (loss, gain, or fusion) per cytoband, which we call the "LGF model".
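To make the idea of "three bits per cytoband" concrete, here is a minimal base R sketch of an LGF-style vector for a single clone. The cytoband names, the `Event_band` naming scheme, and the ordering of the bits are assumptions chosen for illustration; they are not necessarily the layout that CytoGPS itself uses.

```r
# Hypothetical cytoband names, for illustration only.
bands <- c("1p36", "8q24", "9q34", "22q11")
events <- c("Loss", "Gain", "Fusion")

# One bit per (event, cytoband) pair, all initially 0.
lgf <- integer(length(bands) * length(events))
names(lgf) <- paste(rep(events, each = length(bands)), bands, sep = "_")

# A toy clone with a gain at 8q24 and a fusion joining 9q34 and 22q11.
lgf[c("Gain_8q24", "Fusion_9q34", "Fusion_22q11")] <- 1L
lgf
```

Stacking one such vector per clone gives the kind of binary matrix described above.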

The CytoGPS tool is available at the web site http://cytogps.org, where the LGF results of processing karyotype data are returned in JSON format. To complement the web site, we have developed RCytoGPS, an R package to extract, format, and visualize genetic data at the resolution of cytobands. RCytoGPS can parse any JSON file (or set of files) produced by CytoGPS.org.

Setup

In order to extract LGF data from JSON files, you must first load the package.

library(RCytoGPS)

Extracting JSON data and formatting to LGF model

We have included a pair of JSON files produced at CytoGPS.org as examples in the package. These are found in the following directory:

wd <-  system.file("Examples/JSONfiles", package = "RCytoGPS")
dir(wd)

The two text files contain the inputs that were uploaded to the web site; the two JSON files contain the outputs. You can specify both the files and the folder that you want to read. The simplest approach is to omit the files argument and read all files in the specified folder (which defaults to the current working directory).

temp <- readLGF(folder = wd)
rm(wd)
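If you only want some of the files, you can pass their names explicitly. The following is a sketch, not a definitive recipe: it assumes that the `files` argument accepts file names (including the `.json` extension) relative to `folder`, and it looks the names up with `dir` rather than hard-coding them. The variable names `jsonFiles` and `first` are ours.

```r
# Locate the example folder again, since wd was removed above.
wd <- system.file("Examples/JSONfiles", package = "RCytoGPS")

# Keep only the JSON outputs, skipping the text inputs.
jsonFiles <- dir(wd, pattern = "\\.json$")

# Read just the first JSON file (assumed usage of the 'files' argument).
if (requireNamespace("RCytoGPS", quietly = TRUE) && length(jsonFiles) > 0) {
  first <- RCytoGPS::readLGF(files = jsonFiles[1], folder = wd)
  print(first$source)
}
```

Check the manual page for readLGF to confirm how file names are interpreted before relying on this.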

The return value is a list of five elements.

class(temp)
names(temp)

The source element documents which JSON file(s) were read.

temp$source

The size element lists the number of rows returned from each file; each row represents a distinct clone.

temp$size

The CL element is a data frame describing the chromosomal locations of each cytoband.

summary(temp$CL)

The raw element is itself a list, containing the binary LGF data for each JSON file processed. Each file produces a "Status" output along with the LGF data. The Status includes both the input karyotype (in ISCN format) and an indicator of whether CytoGPS could successfully process it. In this example, the first karyotype contained an error. As a result, the LGF component does not contain any rows derived from that karyotype. It does, however, contain three rows derived from the second karyotype, since the forward slashes separate the descriptions of three different clones that were detected in that sample.

names(temp$raw)
R <- temp$raw[[2]]
names(R)
R$Status
dim(R$LGF)
rownames(R$LGF)
rm(R)
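The way a single karyotype can yield several rows is easy to see with a small base R sketch. The ISCN string below is a made-up example, not one of the karyotypes shipped with the package, and splitting on "/" is only a simplification of what CytoGPS does when it parses clones.

```r
# Hypothetical ISCN string describing three clones; not taken from the package.
karyotype <- "46,XX,t(9;22)(q34;q11.2)[12]/47,XX,t(9;22)(q34;q11.2),+8[5]/46,XX[3]"

# Each forward slash separates the description of one clone.
clones <- strsplit(karyotype, "/", fixed = TRUE)[[1]]
length(clones)
clones
```

Each element of `clones` would correspond to one row of the LGF matrix for that sample.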

Finally, the frequency element contains summary data from each file read. These summaries consist of the frequencies of loss, gain, and fusion events. Each row of this data frame represents a cytoband. There are three columns from each JSON file, one each for loss, gain, and fusion.

F <- temp$frequency
class(F)
dim(F)
colnames(F)
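As a rough illustration of how such a summary relates to the binary data, the sketch below computes per-column frequencies from a toy LGF matrix. It assumes that "frequency" means the fraction of clones carrying each event; whether RCytoGPS reports fractions or counts, and how it names the columns, should be checked against the package documentation. All names and values here are invented.

```r
# Toy LGF matrix: 4 clones (rows) by 3 event-band bits (columns).
# Column names are hypothetical.
lgf <- matrix(c(1, 0, 0,
                1, 1, 0,
                0, 1, 0,
                1, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("clone", 1:4),
                              c("Loss_5q31", "Gain_8q24", "Fusion_9q34")))

# Fraction of clones in which each bit is set.
freq <- colMeans(lgf)
freq
```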

Extracting the cytoband locations, and the frequency data

In order to be able to work with the cytoband-level frequency data, we must combine it with the cytoband location data. Here we assemble them into a single data frame.

cytoData <- data.frame(temp[["CL"]], temp[["frequency"]])
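Because data.frame joins its arguments column by column, the combination only makes sense when both pieces have one row per cytoband in the same order. The toy sketch below shows the same pattern with a row-count check; the data frames and their column names are hypothetical stand-ins, not the actual contents of the CL and frequency elements.

```r
# Toy stand-ins; the column names here are hypothetical.
CL <- data.frame(Chromosome = c("chr1", "chr1", "chr2"),
                 Band = c("p36", "p35", "p25"))
frequency <- data.frame(Demo.Loss = c(0.10, 0.00, 0.25),
                        Demo.Gain = c(0.05, 0.20, 0.00),
                        Demo.Fusion = c(0.00, 0.00, 0.10))

# data.frame() binds columns by position, so both inputs need the same
# number of rows in the same cytoband order.
stopifnot(nrow(CL) == nrow(frequency))
toy <- data.frame(CL, frequency)
colnames(toy)
```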

Turning CytoData into an S4 Object

Next, we transform the cytoData data frame into an S4 object using the function CytobandData. The resulting object will then be used to generate plots and will be available for further analyses.

bandData <- CytobandData(cytoData)

Generating Graphs

Plotting Cytoband Data Along the Genome

The first graph (using barplot) summarizes the frequency data from one data column along the genome. This provides a broad overview of the changes, and can be used to visually contrast the locations of changes in different data sets. Here we use barplot twice, showing losses and gains from the first file.

opar <- par(mfrow=c(2,1))
barplot(bandData, what = "CytoGPS_Result1.Loss", col = "forestgreen")
barplot(bandData, what = "CytoGPS_Result1.Gain", col = "orange")
par(opar)

Plotting Cytoband-Level Data Along One Chromosome

The next graph allows you to simultaneously compare multiple cytogenetic events one chromosome at a time.

datacolumns <- names(temp[["frequency"]])
datacolumns
image(bandData, what = datacolumns[1:3], chr = 2, labels = TRUE)

By adding the parameter horiz=TRUE, you can rotate this graph 90 degrees. For more details about the parameters of the image method, see the manual pages and the "gallery" vignette.
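For reference, a rotated version of the single-chromosome plot might look like the sketch below. It repeats the earlier steps so that it can be run on its own, and it spells the argument `horiz`, as in the idiogram call later in this vignette. The guard variable `hasRCytoGPS` is ours and simply skips the plot when the package is not installed.

```r
hasRCytoGPS <- requireNamespace("RCytoGPS", quietly = TRUE)
if (hasRCytoGPS) {
  library(RCytoGPS)
  # Rebuild the objects used in the sections above.
  wd <- system.file("Examples/JSONfiles", package = "RCytoGPS")
  temp <- readLGF(folder = wd)
  bandData <- CytobandData(data.frame(temp[["CL"]], temp[["frequency"]]))
  datacolumns <- names(temp[["frequency"]])
  # Same plot as above, rotated 90 degrees.
  image(bandData, what = datacolumns[1:3], chr = 2, labels = TRUE, horiz = TRUE)
}
```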

Idiograms

We can assemble all of the single-chromosome plots into a single "idiogram" graph that shows all chromosomes at once.

One Data Column

The purpose of this graph is to visualize the chromosomes alongside a barplot of the cytogenetic abnormalities, in order to observe and possibly identify patterns.

image(bandData, what = datacolumns[1], chr = "all", pal = "orange")

More Data Columns

This graph allows the user to compare and contrast two or more cytogenetic events simultaneously. Here we show loss (orange), gain (green), and fusion (purple) events from the first JSON file.

image(bandData, what = datacolumns[1:3], chr = "all", 
      pal=c("orange", "forestgreen", "purple"), horiz=TRUE)

Gallery

To see the full range of available plots, please see the "gallery" vignette.

Appendix

sessionInfo()


RCytoGPS documentation built on Feb. 12, 2024, 3 p.m.