knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)

Progenetix is an open data resource that provides curated individual cancer copy number variation (CNV) profiles along with associated metadata sourced from published oncogenomic studies and various data repositories. Progenetix uses the ".pgxseg" data format to store variant data, covering both CNVs and single nucleotide variants (SNVs), together with the metadata of the associated samples. This vignette describes how to work with local ".pgxseg" files using this package. For more details about the ".pgxseg" file format, please refer to the documentation.

Load library

library(pgxRpi)
library(GenomicRanges) # for pgxfreq object

pgxSegprocess function

This function extracts segment variants, CNV frequency, and metadata from local ".pgxseg" files. It can also visualize survival data when such data is available in the file.

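This tutorial uses the following parameters of the function: file, group_id, show_KM_plot, return_seg, return_metadata, return_frequency, and ... (additional arguments that modify the KM plot).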

Extract segment data

# specify the location of the example file
file_name <- system.file("extdata", "example.pgxseg",package = 'pgxRpi')

# extract segment data
seg <- pgxSegprocess(file=file_name,return_seg = TRUE)

The segment data looks like this

head(seg)
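To make the idea of a segment table more concrete, here is a minimal sketch of reading a pgxseg-like file with base R. It assumes a tab-separated table preceded by "#"-prefixed header lines; the toy file contents and column names below are illustrative assumptions rather than the format specification, and this is not how pgxSegprocess itself is implemented.

```r
# Toy stand-in for a ".pgxseg" file; the layout and column names are assumptions.
toy_lines <- c(
  "#sample=>biosample_id=s1;group_id=g1",
  "#sample=>biosample_id=s2;group_id=g1",
  "biosample_id\treference_name\tstart\tend\tlog2\tvariant_type",
  "s1\t1\t100\t500\t0.6\tDUP",
  "s1\t8\t200\t900\t-0.7\tDEL",
  "s2\t1\t150\t450\t0.5\tDUP"
)
toy_file <- tempfile(fileext = ".pgxseg")
writeLines(toy_lines, toy_file)

# Header lines start with "#"; the remaining lines form the segment table.
all_lines <- readLines(toy_file)
header_lines <- all_lines[startsWith(all_lines, "#")]
toy_seg <- read.delim(toy_file, comment.char = "#", stringsAsFactors = FALSE)

head(toy_seg)
```

For real files, pgxSegprocess with return_seg = TRUE remains the intended route; the sketch only shows the general shape of the data.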

Extract metadata

meta <- pgxSegprocess(file=file_name,return_metadata = TRUE)

The metadata looks like this

head(meta)

Visualize survival data in metadata

The Kaplan-Meier (KM) plot is drawn from samples with available follow-up state and follow-up time. By default, samples are grouped by the "group_id" column in the metadata.

pgxSegprocess(file=file_name,show_KM_plot = TRUE)

You can try a different grouping with the group_id parameter

pgxSegprocess(file=file_name,show_KM_plot = TRUE,group_id = 'histological_diagnosis_id')

You can pass additional parameters to modify this plot (see the ... parameter in the documentation)

pgxSegprocess(file=file_name,show_KM_plot = TRUE,pval=TRUE,palette='npg')
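For readers unfamiliar with what a KM plot displays, the sketch below computes the Kaplan-Meier survival estimate by hand in base R on made-up follow-up data. The data frame, its column names, the coding of the follow-up state (1 = death observed, 0 = censored), and the helper km_estimate are all hypothetical and only illustrate the estimator; pgxSegprocess draws the actual plot for you.

```r
# Made-up follow-up data: time in months; state 1 = death observed, 0 = censored.
toy_followup <- data.frame(
  followup_time  = c(5, 8, 8, 12, 15, 20),
  followup_state = c(1, 1, 0, 1, 0, 1)
)

# Hypothetical helper: Kaplan-Meier estimate S(t) = prod(1 - d_i / n_i),
# taken over the distinct event times t_i up to t.
km_estimate <- function(time, state) {
  event_times <- sort(unique(time[state == 1]))
  # n_i: samples still under observation at each event time
  n_at_risk <- vapply(event_times, function(t) sum(time >= t), integer(1))
  # d_i: events observed exactly at each event time
  n_events <- vapply(event_times, function(t) sum(time == t & state == 1), integer(1))
  data.frame(
    time     = event_times,
    n_risk   = n_at_risk,
    n_event  = n_events,
    survival = cumprod(1 - n_events / n_at_risk)
  )
}

km <- km_estimate(toy_followup$followup_time, toy_followup$followup_state)
km
```

Censored samples (state 0) never count as events, but they stay in the at-risk set until their follow-up time, which is why samples need both a follow-up state and a follow-up time to be included.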

Calculate CNV frequency

The CNV frequency is calculated from the segments of samples that share the same group id, which is specified by the group_id parameter. For more details about CNV frequency, see the vignette Introduction_3_loadfrequency.

# Default is "group_id" in metadata
frequency <- pgxSegprocess(file=file_name,return_frequency = TRUE) 
# Use different ids for grouping
frequency_2 <- pgxSegprocess(file=file_name,return_frequency = TRUE, 
                             group_id ='icdo_morphology_id')
frequency

The returned object has the same "pgxfreq" format as the CNV frequency object returned by the pgxLoader function. The CNV frequency is calculated only for groups that exist in both the metadata and the segment data. Note that not every group in the metadata necessarily exists in the segment data (e.g. some samples don't have CNV calls).

head(frequency[["pgx:icdot-C16.9"]])
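As a rough illustration of the grouping logic described above, the sketch below computes gain and loss percentages for a single genomic bin from toy data in base R. The tables, column names, bin coordinates, and the helper pct_with are hypothetical, and using the samples with segments as the denominator is an assumption made for this example; the package's actual binning and calculation may differ.

```r
# Toy metadata: three groups, but the only sample of "g3" has no CNV calls.
toy_meta <- data.frame(
  biosample_id = c("s1", "s2", "s3", "s4"),
  group_id     = c("g1", "g1", "g2", "g3"),
  stringsAsFactors = FALSE
)

# Toy segments on one chromosome; "DUP" marks a gain and "DEL" a loss.
toy_seg <- data.frame(
  biosample_id = c("s1", "s2", "s2", "s3"),
  start        = c(100, 450, 250, 120),
  end          = c(500, 550, 350, 300),
  variant_type = c("DUP", "DUP", "DEL", "DEL"),
  stringsAsFactors = FALSE
)

# One bin of interest (hypothetical coordinates).
bin_start <- 200
bin_end   <- 400

# Attach group ids; merge() keeps only samples present in both tables.
seg_grouped <- merge(toy_seg, toy_meta, by = "biosample_id")
overlaps <- seg_grouped$start < bin_end & seg_grouped$end > bin_start
hits <- seg_grouped[overlaps, ]

# Percentage of a group's samples with a variant of the given type in the bin.
pct_with <- function(g, type) {
  in_group <- unique(seg_grouped$biosample_id[seg_grouped$group_id == g])
  affected <- unique(hits$biosample_id[hits$group_id == g & hits$variant_type == type])
  100 * length(affected) / length(in_group)
}

groups <- sort(unique(seg_grouped$group_id))
toy_freq <- data.frame(
  group_id = groups,
  gain_pct = vapply(groups, pct_with, numeric(1), type = "DUP", USE.NAMES = FALSE),
  loss_pct = vapply(groups, pct_with, numeric(1), type = "DEL", USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
toy_freq
```

Group "g3" appears in the toy metadata but not in the result, because none of its samples has a segment. This mirrors the point that only groups present in both the metadata and the segment data contribute to the frequency.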

The associated metadata in CNV frequency objects looks like this

mcols(frequency)
mcols(frequency_2)

You can visualize the CNV frequency of a group of interest using the pgxFreqplot function. For more details on this function, see the vignette Introduction_3_loadfrequency.

pgxFreqplot(frequency, filters="pgx:icdot-C16.9")
pgxFreqplot(frequency, filters="pgx:icdot-C16.9",chrom = c(1,8,14), layout = c(3,1))
pgxFreqplot(frequency, filters=c("pgx:icdot-C16.9","pgx:icdot-C73.9"),circos = TRUE)

Extract all data

If you want to extract several types of data at once, such as segment variants, metadata, and CNV frequency, and visualize the survival data at the same time, set the corresponding parameters to TRUE. The returned object will contain all the requested data. Note that in this case the CNV frequency and the KM plot use the same group_id.

info <- pgxSegprocess(file=file_name,show_KM_plot = TRUE, return_seg = TRUE, 
                      return_metadata = TRUE, return_frequency = TRUE)
names(info)

Session Info

sessionInfo()


progenetix/pgxRpi documentation built on Sept. 14, 2024, 2:21 p.m.