knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)

Progenetix is an open data resource that provides curated individual cancer copy number variation (CNV) profiles, along with associated metadata, sourced from published oncogenomic studies and various data repositories. This vignette shows how to access and use the metadata of samples and of their corresponding individuals in the Progenetix database.

If you are mainly interested in cancer cell lines, you can access data from cancercelllines.org by setting the domain parameter of the pgxLoader function to "cancercelllines.org". This repository grew out of the cell line CNV profiles originally collected as part of Progenetix and now also includes additional types of genomic mutations.

Load library

library(pgxRpi)

pgxLoader function

This function loads various types of data from the Progenetix database via the Beacon v2 API with some extensions (BeaconPlus).
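For orientation, pgxLoader talks to a Beacon v2 endpoint over HTTP. The base R sketch below only assembles a GET request URL of the general form https://<domain>/<entry_point>/<type>/?filters=...&skip=...&limit=... and sends nothing. The helper beacon_url() is hypothetical (not part of pgxRpi), and the exact requests pgxRpi sends, including its BeaconPlus extensions, may differ.

```r
# Hypothetical helper, not part of pgxRpi: assemble a Beacon v2 GET request URL.
# Multiple filters are joined with commas, following the Beacon v2 convention.
beacon_url <- function(type, filters = NULL, domain = "progenetix.org",
                       entry_point = "beacon", skip = NULL, limit = NULL,
                       use_https = TRUE) {
  scheme <- if (use_https) "https" else "http"
  base <- sprintf("%s://%s/%s/%s/", scheme, domain, entry_point, type)
  # Collect query parameters; unset ones are NA and dropped below
  params <- c(
    filters = if (length(filters) > 0) paste(filters, collapse = ",") else NA,
    skip = if (!is.null(skip)) format(skip, scientific = FALSE) else NA,
    limit = if (!is.null(limit)) format(limit, scientific = FALSE) else NA
  )
  params <- params[!is.na(params)]
  if (length(params) == 0) return(base)
  paste0(base, "?", paste(names(params), params, sep = "=", collapse = "&"))
}

beacon_url("biosamples", filters = "NCIT:C7541", skip = 0, limit = 10)
```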

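The parameters of this function used in this tutorial are type, filters, filter_pattern, biosample_id, individual_id, limit, skip, codematches, domain, entry_point, num_cores, and use_https.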

Retrieve biosamples information

Search by filters

Filters are a significant enhancement to the Beacon query API, providing a mechanism for specifying rules to select records based on their field values. To learn more about using filters in Progenetix, please refer to the documentation.

The following example demonstrates how to access all available filters in Progenetix:

all_filters <- pgxLoader(type="filtering_terms")
head(all_filters)

If you're interested in filters related to a specific disease or phenotype, you can use the filter_pattern argument to narrow down the list. For example, to search for filters related to retinoblastoma:

query_filter <- pgxLoader(type = "filtering_terms", filter_pattern = "retinoblastoma")
query_filter

To retrieve biosamples associated with a specific disease, use appropriate filter terms. In this example, we use an NCIt code corresponding to retinoblastoma (NCIT:C7541):

biosamples <- pgxLoader(type = "biosamples", filters = "NCIT:C7541")
# data looks like this
biosamples[1:5,]

The returned data frame contains many columns describing different aspects of each sample.

Search by biosample id and individual id

In the Beacon v2 specification, biosample id and individual id are unique identifiers for biosamples and their corresponding individuals, respectively. These identifiers can be obtained through metadata searches with filters, as described above, or through the Progenetix search interface, which shows the IDs used in the Progenetix database.

biosamples_2 <- pgxLoader(type = "biosamples", biosample_id = "pgxbs-kftvki7h",
                          individual_id = "pgxind-kftx6ltu")

biosamples_2

It's also possible to query by a combination of filters, biosample id, and individual id.

Access a subset of samples

By default, pgxLoader returns all matching samples (limit = 0). You can retrieve a subset of them with the limit and skip parameters. For example, to retrieve the first 10 samples, set limit = 10 and skip = 0.

biosamples_3 <- pgxLoader(type = "biosamples", filters = "NCIT:C7541", skip = 0, limit = 10)
# Dimension: Number of samples * features
print(dim(biosamples))
print(dim(biosamples_3))
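If a query matches many samples, you can also collect them in chunks. The sketch below defines a hypothetical helper fetch_in_pages() that keeps requesting pages until a short or empty page comes back. It assumes the Beacon v2 pagination model in which skip counts pages of size limit (so skip = 1, limit = 10 would return records 11 to 20); check ?pgxLoader for the semantics in your pgxRpi version. An offline stand-in replaces the network call so the example runs anywhere.

```r
# Hypothetical helper: collect all records page by page.
# `fetch_page(skip, limit)` must return a data frame; `skip` is assumed to
# count pages of size `limit` (Beacon v2 pagination model).
fetch_in_pages <- function(fetch_page, page_size = 10, max_pages = Inf) {
  pages <- list()
  page <- 0
  while (page < max_pages) {
    chunk <- fetch_page(skip = page, limit = page_size)
    if (is.null(chunk) || nrow(chunk) == 0) break
    pages[[length(pages) + 1]] <- chunk
    if (nrow(chunk) < page_size) break  # last, partially filled page
    page <- page + 1
  }
  do.call(rbind, pages)
}

# Offline stand-in for a server holding 25 records (illustrative only)
toy_server <- data.frame(biosample_id = sprintf("bs-%02d", 1:25))
fake_fetch <- function(skip, limit) {
  start <- skip * limit + 1
  if (start > nrow(toy_server)) return(toy_server[0, , drop = FALSE])
  toy_server[start:min(start + limit - 1, nrow(toy_server)), , drop = FALSE]
}
all_records <- fetch_in_pages(fake_fetch, page_size = 10)
nrow(all_records)

# With pgxRpi and network access, the fetcher could instead be:
# function(skip, limit) pgxLoader(type = "biosamples", filters = "NCIT:C7541",
#                                 skip = skip, limit = limit)
```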

Using the codematches parameter

Some filters, such as NCIt codes, are hierarchical. As a result, the retrieved samples may be annotated not only with the specified term but also with its child terms.

unique(biosamples$histological_diagnosis_id)

Setting codematches = TRUE makes pgxLoader return only the biosamples that exactly match the specified filter, excluding those annotated with its child terms.

biosamples_4 <- pgxLoader(type = "biosamples", filters = "NCIT:C7541", codematches = TRUE)
unique(biosamples_4$histological_diagnosis_id)
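The effect of codematches = TRUE can be mimicked locally on a table you have already retrieved. The sketch below keeps only rows whose histological_diagnosis_id equals one of the requested codes. The table is toy data, the child-term IDs are made up, and it is an assumption that pgxLoader's own check is equivalent (it may, for example, inspect more columns than this one).

```r
# Toy biosample table; "CHILD:..." IDs are hypothetical child terms of NCIT:C7541.
toy_biosamples <- data.frame(
  biosample_id = c("bs-1", "bs-2", "bs-3", "bs-4"),
  histological_diagnosis_id = c("NCIT:C7541", "CHILD:0001", "NCIT:C7541", "CHILD:0002"),
  stringsAsFactors = FALSE
)

# Keep only rows whose code in `column` equals one of the requested filters
exact_match <- function(df, filters, column = "histological_diagnosis_id") {
  df[df[[column]] %in% filters, , drop = FALSE]
}

exact <- exact_match(toy_biosamples, "NCIT:C7541")
exact$biosample_id
```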

Retrieve individuals information

To query details of the individuals from whom the samples of interest were derived (e.g. clinical data), set the type parameter to "individuals" and follow the same steps as above.

individuals <- pgxLoader(type = "individuals", individual_id = "pgxind-kftx26ml", filters = "NCIT:C7541")
# data looks like this
tail(individuals, 2)
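Biosample and individual records can be linked through the individual identifier. The sketch below joins two toy tables with base R merge(). It assumes that both tables returned by pgxLoader expose an individual_id column; the sex column and its values are illustrative only.

```r
# Toy tables (illustrative only); column names are assumptions.
toy_samples <- data.frame(
  biosample_id = c("bs-1", "bs-2", "bs-3"),
  individual_id = c("ind-A", "ind-A", "ind-B"),
  stringsAsFactors = FALSE
)
toy_individuals <- data.frame(
  individual_id = c("ind-A", "ind-B"),
  sex = c("female", "male"),
  stringsAsFactors = FALSE
)

# One row per biosample, with the attributes of its individual attached;
# all.x = TRUE keeps samples whose individual is missing from the second table
linked <- merge(toy_samples, toy_individuals, by = "individual_id", all.x = TRUE)
linked[, c("biosample_id", "individual_id", "sex")]
```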

Retrieve analyses information

To retrieve details about data analyses, set the type parameter to "analyses". The other steps are the same, except that codematches is not available: analysis records do not contain filter information, even though they can be searched by filters.

analyses <- pgxLoader(type = "analyses", biosample_id = c("pgxbs-kftvik5i", "pgxbs-kftvik96"))

analyses

Retrieve the number of results for a specific filter

To retrieve the number of results for a specific filter in Progenetix, set the type parameter to "counts". You can query different Beacon v2 resources by setting the domain and entry_point parameters accordingly.

pgxLoader(type = "counts", filters = "NCIT:C7541")

Query from multiple data resources

You can query data from multiple resources via the Beacon v2 API by setting the domain and entry_point parameters accordingly. To speed up the process, use the num_cores parameter to enable parallel processing across different domains. For resources that only support http (e.g., local or internal network instances), set use_https = FALSE to avoid connection issues.

record_counts <- pgxLoader(type = "counts", filters = "NCIT:C9245",
                           domain = c("progenetix.org", "cancercelllines.org"),
                           entry_point = c("beacon", "beacon"))

record_counts
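The num_cores parameter spreads the per-domain requests over several processes. The sketch below shows the general pattern with mclapply() from R's bundled parallel package, pairing each domain with the entry_point at the same position. query_one() is a hypothetical stand-in that only builds a URL instead of sending a request, and this is not necessarily how pgxRpi implements num_cores. Because mclapply() relies on forking, the sketch falls back to a single core on Windows.

```r
library(parallel)

domains <- c("progenetix.org", "cancercelllines.org")
entry_points <- c("beacon", "beacon")  # paired with `domains` by position

# Hypothetical stand-in for one Beacon request; a real version would call the
# domain's API (over http if use_https = FALSE) and parse the response.
query_one <- function(domain, entry_point, use_https = TRUE) {
  scheme <- if (use_https) "https" else "http"
  data.frame(domain = domain,
             url = sprintf("%s://%s/%s/", scheme, domain, entry_point),
             stringsAsFactors = FALSE)
}

# One task per domain; results come back in the order of `domains`
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(seq_along(domains), function(i) {
  query_one(domains[i], entry_points[i])
}, mc.cores = n_cores)
combined <- do.call(rbind, results)
combined
```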

Visualization of survival data

Suppose you want to investigate whether survival differs between groups of patients with a particular disease, for example between younger and older patients, or according to other variables. You can query the relevant metadata with pgxLoader and visualize it with the pgxMetaplot function.

pgxMetaplot function

This function generates a survival plot using metadata of individuals obtained by the pgxLoader function.

The parameters of this function used in this section are data, group_id, condition, pval, and return_data.

Example usage

# query metadata of individuals with lung adenocarcinoma
luad_inds <- pgxLoader(type = "individuals", filters = "NCIT:C3512")
# split individuals by age, using 70 years (ISO 8601 duration "P70Y") as the threshold
pgxMetaplot(data = luad_inds, group_id = "age_iso", condition = "P70Y", pval = TRUE)
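The condition "P70Y" is an ISO 8601 duration meaning 70 years, the format of the age_iso field. To reproduce such an age split outside pgxMetaplot, the hypothetical helper iso_age_to_years() below converts durations such as "P65Y6M" to years, ignoring weeks, days, and time components. Which group an age of exactly 70 years falls into is an assumption here and may differ from pgxMetaplot.

```r
# Extract the number preceding `unit` ("Y" or "M") from the date part of an
# ISO 8601 duration; the time part after "T" (where M means minutes) is dropped.
iso_component <- function(x, unit) {
  date_part <- sub("T.*$", "", x)
  has <- grepl(paste0("[0-9]+", unit), date_part)
  out <- numeric(length(x))
  out[has] <- as.numeric(sub(paste0("^.*?([0-9]+)", unit, ".*$"), "\\1",
                             date_part[has], perl = TRUE))
  out
}

# Hypothetical helper: ISO 8601 age duration -> approximate age in years
iso_age_to_years <- function(x) {
  years <- iso_component(x, "Y") + iso_component(x, "M") / 12
  years[is.na(x) | !grepl("^P", x)] <- NA_real_
  years
}

ages <- iso_age_to_years(c("P45Y", "P70Y", "P82Y3M"))
age_group <- ifelse(ages >= 70, ">= 70 years", "< 70 years")
age_group
```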

Note that not all individuals have survival data available. If you set return_data = TRUE, the function returns the metadata of the individuals used for the plot.
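The returned metadata can be analysed further. As a sketch, the code below fits Kaplan-Meier curves and runs a log-rank test with the survival package (a recommended package shipped with R) on a toy table. The column names time, status, and age_group are placeholders, so map them to the follow-up columns in the data returned by pgxMetaplot. This document does not state which test pgxMetaplot uses when pval = TRUE.

```r
library(survival)

# Toy follow-up data (months; status 1 = event, 0 = censored); illustrative only
toy_surv <- data.frame(
  time = c(5, 12, 20, 33, 40, 8, 15, 22, 27, 50),
  status = c(1, 1, 0, 1, 0, 1, 1, 1, 0, 0),
  age_group = rep(c("< 70 years", ">= 70 years"), each = 5)
)

# Kaplan-Meier estimate per age group
fit <- survfit(Surv(time, status) ~ age_group, data = toy_surv)

# Log-rank test for a difference between the curves
lr <- survdiff(Surv(time, status) ~ age_group, data = toy_surv)
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

summary(fit)$table[, "median"]
p_value
```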

Session Info

sessionInfo()


progenetix/pgxRpi documentation built on June 1, 2025, 1:06 p.m.