CELLector.Build_Search_Space: CELLector search space construction


CELLector.Build_Search_Space    R Documentation

CELLector search space construction

Description

This function assembles a user-defined CELLector search space by analysing genomic data from a large cohort of cancer patients (specified in input). It identifies recurrent subtypes with matched genomic signatures (as combinations of cancer functional events (CFEs), defined in [1]), linking them into a hierarchical structure shaped as a binary tree with a corresponding navigable table, as detailed in [2].

Usage

CELLector.Build_Search_Space(ctumours,
                             cancerType,
                             minlen=1,
                             verbose=TRUE,
                             mutOnly=FALSE,
                             cnaOnly=FALSE,
                             includeHMS=FALSE,
                             minGlobSupp=0.01,
                             FeatureToExclude=NULL,
                             pathway_CFEs=NULL,
                             pathwayFocused=NULL,
                             subCohortDefinition=NULL,
                             NegativeDefinition=FALSE,
                             cnaIdMap=NULL,
                             cnaIdDecode=NULL,
                             hmsIdDecode=NULL,
                             cdg=NULL,
                             UD_genomics=FALSE)

Arguments

ctumours

A binary event matrix (BEM) modeling a cohort of cancer patients, with cancer functional events (CFEs) on the columns and sample identifiers on the rows. See CELLector.PrimTum.BEMs for further details

cancerType

The cancer type under consideration (specified via a TCGA label): currently available types = BLCA, BRCA, COREAD, GBM, HNSC, KIRC, LAML, LGG, LUAD, LUSC, OV, PRAD, SKCM, STAD, THCA, UCEC

minlen

The minimal length of a genomic signature (i.e. the number of individual CFEs it is made of) for it to be considered in the analysis (1 by default)

verbose

A boolean argument specifying whether step-by-step information on the algorithm progression should be displayed at run time

mutOnly

A boolean argument specifying whether only CFEs involving somatic mutations should be considered in the analysis. If the cnaOnly argument is equal to TRUE then this must be FALSE (default value)

cnaOnly

A boolean argument specifying whether only CFEs involving copy number alterations (CNAs) of chromosomal segments that are recurrently CN altered should be considered in the analysis. If the mutOnly argument is equal to TRUE then this must be FALSE (default value)

includeHMS

A boolean argument specifying whether methylation data should be considered while building the search space (FALSE by default).

minGlobSupp

Minimal size of the outputted subtypes, as a ratio of the number of patients included in the whole cohort (0.01, i.e. 1%, by default).

FeatureToExclude

A string (or a vector of strings) with identifiers of CFEs that should be ignored

pathway_CFEs

A named list of string vectors, whose elements are CFEs involving genes in a biological pathway (specified by the name of the corresponding entry). A list for 14 key cancer pathways is contained in the CELLector.Pathway_CFEs data object (see the corresponding help page for further details)

pathwayFocused

If different from NULL (default value), it should be a vector of strings. In this case the analysis will consider only CFEs involving genes in a set of pathways, whose names are contained in this argument and must be present as names of the pathway_CFEs argument

subCohortDefinition

If different from NULL (default value), it should be a string containing the identifier of a CFE. In this case the analysis will consider only the primary tumour samples harbouring (or not harbouring, depending on the NegativeDefinition argument) the specified CFE

NegativeDefinition

If the subCohortDefinition argument is not NULL then this parameter determines whether to consider primary tumour samples that harbour (if equal to FALSE, default value) or do not harbour (if equal to TRUE) the specified CFE

cnaIdMap

A data frame mapping chromosomal regions of recurrent copy number amplifications/deletions in cancer (RACSs, as defined in [1]) identified via ADMIRE [3] in the context of specific cancer types to PanCancer RACSs. The built-in object CELLector.CFEs.CNAid_mapping (or an alternative data frame with the same format) should be used.

cnaIdDecode

A table with identifiers of cancer functional events (CFEs) involving chromosomal regions of recurrent copy number alterations (RACSs, as defined by [1], i.e. identified through ADMIRE [3]) and their annotation. The built-in object CELLector.CFEs.CNAid_decode (or an alternative data frame with the same format) should be used.

hmsIdDecode

A data frame containing annotation for the hypermethylated gene promoter CFEs. The format should be the same as that of the CELLector.CFEs.HMSid_decode object.

UD_genomics

A boolean argument specifying whether the analysis is performed on user-defined genomic data (TRUE) or CELLector built-in genomic data (FALSE, default value).

cdg

A list of genes that are used when decoding the identifiers of cancer functional events (CFEs) involving chromosomal regions of recurrent copy number alterations (RACSs, as defined by [1]). These will be visualised in the signatures containing the RACSs including them. A predefined list of high confidence cancer driver genes (from [1]) is provided as built-in data object (CELLector.HCCancerDrivers)
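To make the interplay of pathway_CFEs, pathwayFocused, subCohortDefinition and NegativeDefinition more concrete, here is a minimal base R sketch that applies the same kind of restriction to a toy BEM. The matrix, the CFE names and the pathway names are all invented for illustration, and the code only mimics the documented behaviour; it is not the package's internal implementation.

```r
# Toy BEM (made-up values): samples on rows, CFEs on columns
bem <- matrix(c(1, 0, 1, 0,
                0, 1, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("S", 1:4),
                              c("KRAS", "PIK3CA", "APC", "TP53")))

# Hypothetical named list in the shape described for pathway_CFEs
toy_pathway_CFEs <- list("Pathway A" = c("KRAS"),
                         "Pathway B" = c("APC"),
                         "Pathway C" = c("PIK3CA"))

# Pathways of interest must be names of the pathway list
pathwayFocused <- c("Pathway A", "Pathway B")
stopifnot(all(pathwayFocused %in% names(toy_pathway_CFEs)))

# Sub-cohort: samples NOT harbouring TP53 (NegativeDefinition = TRUE)
subCohortDefinition <- "TP53"
NegativeDefinition <- TRUE
has_cfe <- bem[, subCohortDefinition] == 1
keep_samples <- if (NegativeDefinition) !has_cfe else has_cfe

# Keep only the CFEs belonging to the selected pathways
keep_cfes <- intersect(colnames(bem),
                       unlist(toy_pathway_CFEs[pathwayFocused], use.names = FALSE))

sub_bem <- bem[keep_samples, keep_cfes, drop = FALSE]
print(sub_bem)
```

In this toy case the restricted matrix contains the two TP53 wild-type samples (S1, S2) and the two CFEs from the selected pathways (KRAS, APC).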

Details

Starting from an initial cohort of patients affected by a given cancer type and modeled by the inputted binary event matrix (BEM), the most frequent alteration or set of molecular alterations (depending on the minlen argument) with the largest support (the subpopulation of patients in which these alterations occur simultaneously) is identified using the eclat function of the arules R package.
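The notion of support used above can be illustrated with a brute-force base R sketch on a toy BEM. The function relies on eclat from arules for this step; the code below is only a stand-in that enumerates candidate sets directly, and the helper names (itemset_support, most_supported) and the data are assumptions made for the example.

```r
# Toy BEM (made-up values): samples on rows, CFEs on columns
bem <- matrix(c(1, 1, 0,
                1, 1, 0,
                1, 1, 1,
                1, 0, 0,
                0, 0, 1,
                0, 0, 0),
              nrow = 6, byrow = TRUE,
              dimnames = list(paste0("S", 1:6), c("A", "B", "C")))

# Support of a set of CFEs: fraction of samples harbouring all of them
itemset_support <- function(bem, items) {
  mean(rowSums(bem[, items, drop = FALSE]) == length(items))
}

# Most supported set of exactly `minlen` CFEs. Adding a CFE to a set can
# never increase its support, so sets longer than minlen cannot do better.
most_supported <- function(bem, minlen = 1) {
  sets <- combn(colnames(bem), minlen, simplify = FALSE)
  supp <- vapply(sets, function(s) itemset_support(bem, s), numeric(1))
  best <- which.max(supp)
  list(items = sets[[best]], support = supp[best])
}

best1 <- most_supported(bem, minlen = 1)  # single CFE "A", support 4/6
best2 <- most_supported(bem, minlen = 2)  # pair "A", "B", support 3/6
```

Exhaustive enumeration is only viable for a handful of CFEs; a frequent itemset miner such as eclat avoids visiting every combination.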

Based on this, the cohort of patients is split into two subpopulations depending on the collective presence or absence of the identified alterations. This process is then executed recursively on the two resulting subpopulations and it continues until all the alteration sets (with a support of minimal size, as specified in the minGlobSupp argument) are identified.
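The recursive splitting can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions: each node holds a single CFE (as with minlen = 1), the tree is returned as a nested list rather than a data.tree object, and the function name build_tree and the toy data are invented for the example.

```r
# Toy BEM (made-up values): samples on rows, CFEs on columns
bem <- matrix(c(1, 1, 0,
                1, 1, 0,
                1, 1, 1,
                1, 0, 0,
                0, 0, 1,
                0, 0, 0),
              nrow = 6, byrow = TRUE,
              dimnames = list(paste0("S", 1:6), c("A", "B", "C")))

build_tree <- function(bem, n_total = nrow(bem), min_glob_supp = 0.01) {
  if (nrow(bem) == 0 || ncol(bem) == 0) return(NULL)
  counts <- colSums(bem)
  best <- names(which.max(counts))
  # Stop when the best remaining CFE is rarer than the global threshold
  if (counts[[best]] / n_total < min_glob_supp) return(NULL)
  pos <- bem[, best] == 1
  rest <- setdiff(colnames(bem), best)
  list(item = best,
       abs_support = counts[[best]],
       # left: refine the samples harbouring the CFE
       left = build_tree(bem[pos, rest, drop = FALSE], n_total, min_glob_supp),
       # right: analyse the complementary samples
       right = build_tree(bem[!pos, rest, drop = FALSE], n_total, min_glob_supp))
}

tree <- build_tree(bem, min_glob_supp = 0.3)
str(tree)
```

With a threshold of 0.3 on six samples, a node needs at least two supporting samples, so the toy tree consists of a root for A (4 samples) with a single left child for B (3 samples).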

Each of the alteration sets identified through this recursive process is stored in a tree node. Linking nodes identified in adjacent recursions yields a binary tree: the CELLector search space. Each individual path (from the root to a node) of this tree defines a rule (signature), represented as a logic AND of multiple terms (or their negation), one for each node in the path. If the genome of a given patient in the analysed cohort satisfies the rule then that patient is contained in the subpopulation represented by the terminal node of that path. Collectively, all the paths in the search space provide a representation of the spectrum of combinations of molecular alterations observed in a given cancer type, and their clinical prevalence in the analysed patient population.
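As a small illustration of a rule being a logic AND of terms or their negations, the sketch below evaluates such a rule against a toy BEM. The representation of a rule as a named logical vector, the helper name satisfies and the data are assumptions made for the example; each term here is a single CFE, whereas a node in the search space may hold a set of CFEs.

```r
# Toy BEM (made-up values): samples on rows, CFEs on columns
bem <- matrix(c(1, 1, 0,
                1, 1, 0,
                1, 1, 1,
                1, 0, 0,
                0, 0, 1,
                0, 0, 0),
              nrow = 6, byrow = TRUE,
              dimnames = list(paste0("S", 1:6), c("A", "B", "C")))

# rule: named logical vector, TRUE = CFE must be present, FALSE = must be absent
satisfies <- function(bem, rule) {
  observed <- bem[, names(rule), drop = FALSE] == 1
  apply(observed, 1, function(x) all(x == rule))
}

# "A AND B": samples refined along a left branch
in_AB <- names(which(satisfies(bem, c(A = TRUE, B = TRUE))))

# "NOT A AND C": samples reached through a right branch
in_notA_C <- names(which(satisfies(bem, c(A = FALSE, C = TRUE))))
```

The samples returned for a rule play the role of the subpopulation represented by the terminal node of the corresponding path.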

Value

A named list with the CELLector search space stored as a data.tree object in the TreeRoot field and as a navigable table in the navTable field: a data frame with a row for each node of the tree and the following columns

Idx

A numerical index for the node

Item

The most supported CFE (or combination of CFEs), identified at the iteration in which the node has been added to the tree, (i) in the whole cohort of patients (for the Root), (ii) in the subpopulation that satisfies the parent node rule (for Left.Child nodes) or (iii) in its complement (for Right.Child nodes)

ItemsDecoded

Same as Item but with identifiers of RACSs decoded, i.e. with loci and included driver genes (inputted in the cdg argument) indicated in brackets

Type

The node type: Root (first node added), Right.Child (a node resulting from the analysis of the complementary population of patients with respect to that satisfying the Parent node rule), Left.Child (a node resulting from refining the population of patients satisfying the Parent node rule)

Parent.Idx

The numerical index of the parent node (0 for the Root)

AbsSupport

The number of patients satisfying the node rule

CurrentTotal

The number of patients included in the population under consideration at the iteration time of the node inclusion in the tree; this is the same as the parent's AbsSupport for Left.Child nodes

PercSupport

The ratio of patients collectively harbouring the combination of CFEs specified in Item within the subpopulation under consideration at the iteration time of the node inclusion in the tree (whose size is specified in CurrentTotal)

GlobalSupport

The ratio of patients satisfying the node rule with respect to the total number of patients in the whole cohort

Left.Child.Index

Numerical index of the left child node (0 indicates absence of a left child node)

Right.Child.Index

Numerical index of the right child node (0 indicates absence of a right child node)

currentPoints

The identifiers of the patients in the sub-population under consideration at the iteration time of the node inclusion in the tree

currentFeatures

The CFEs considered at the iteration time of the node inclusion in the tree

positivePoints

The identifiers of the patients satisfying the node rule

COLORS

A vector of strings containing hexadecimal color identifiers: one for each node. These are used by the visualisation functions (CELLector.visualiseSearchingSpace and CELLector.visualiseSearchingSpace_sunBurst), and can be changed using the CELLector.changeSScolors function.
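The navigable table can be walked through its Idx, Type and Parent.Idx columns. The sketch below rebuilds a textual rule for a node from a hand-made table with only those columns plus Item. It reflects one reading of the column descriptions above (a Left.Child inherits its parent's Item, a Right.Child inherits its negation); the "~" notation, the helper name node_rule and the table contents are assumptions for illustration, not output of the package.

```r
# Hand-made stand-in for a navigable table (values are invented)
nav <- data.frame(Idx        = 1:4,
                  Item       = c("A", "B", "C", "D"),
                  Type       = c("Root", "Left.Child", "Right.Child", "Right.Child"),
                  Parent.Idx = c(0, 1, 1, 2),
                  stringsAsFactors = FALSE)

node_rule <- function(nav, idx) {
  terms <- nav$Item[nav$Idx == idx]
  repeat {
    row <- nav[nav$Idx == idx, ]
    if (row$Parent.Idx == 0) break            # reached the Root
    parent_item <- nav$Item[nav$Idx == row$Parent.Idx]
    # Right children live in the complement of the parent's population
    prefix <- if (row$Type == "Right.Child") paste0("~", parent_item) else parent_item
    terms <- c(prefix, terms)
    idx <- row$Parent.Idx
  }
  paste(terms, collapse = ", ")
}

rule2 <- node_rule(nav, 2)  # "A, B"
rule3 <- node_rule(nav, 3)  # "~A, C"
rule4 <- node_rule(nav, 4)  # "A, ~B, D"
```

On a real search space the same walk would be applied to the navTable field of the returned list.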

Author(s)

Hanna Najgebauer and Francesco Iorio

References

[1] Iorio, F. et al. A Landscape of Pharmacogenomic Interactions in Cancer. Cell 166, 740–754 (2016).

[2] Najgebauer, H. et al. Genomics Guided Selection of Cancer in vitro Models. https://doi.org/10.1101/275032

[3] van Dyk, E., Reinders, M. J. T. & Wessels, L. F. A. A scale-space method for detecting recurrent DNA copy number changes with analytical false discovery rate control. Nucleic Acids Res. 41, e100 (2013).

See Also

CELLector.PrimTum.BEMs,

CELLector.Pathway_CFEs,

CELLector.CFEs.CNAid_mapping,

CELLector.CFEs.CNAid_decode,

CELLector.HCCancerDrivers,

CELLector.visualiseSearchingSpace,

CELLector.visualiseSearchingSpace_sunBurst,

CELLector.changeSScolors

Examples

data(CELLector.PrimTum.BEMs)
data(CELLector.Pathway_CFEs)
data(CELLector.CFEs.CNAid_mapping)
data(CELLector.CFEs.CNAid_decode)
data(CELLector.HCCancerDrivers)

### Change the following line (and the cancerType argument below) to work with a different cancer type
tumours_BEM<-CELLector.PrimTum.BEMs$COREAD

### unicize the sample identifiers for the tumour data
tumours_BEM<-CELLector.unicizeSamples(tumours_BEM)

### building a CELLector searching space focusing on three pathways
### and TP53 wild-type patients only
CSS<-CELLector.Build_Search_Space(ctumours = t(tumours_BEM),
                                  verbose = FALSE,
                                  minGlobSupp = 0.05,
                                  cancerType = 'COREAD',
                                  pathwayFocused = c("RAS-RAF-MEK-ERK / JNK signaling",
                                                     "PI3K-AKT-MTOR signaling",
                                                     "WNT signaling"),
                                  pathway_CFEs = CELLector.Pathway_CFEs,
                                  cnaIdMap = CELLector.CFEs.CNAid_mapping,
                                  cnaIdDecode = CELLector.CFEs.CNAid_decode,
                                  cdg = CELLector.HCCancerDrivers,
                                  subCohortDefinition='TP53',
                                  NegativeDefinition=TRUE)

### visualising the CELLector searching space as a binary tree
CSS$TreeRoot

### visualising the first attributes of the tree nodes
CSS$navTable[,1:11]

### visualising the sub-cohort of patients whose genome satisfies the rule of the 4th node
stringr::str_split(CSS$navTable$positivePoints[4],',')


######################################################################
### Rebuilding the search space but considering also methylation data

### important!!!: the second version of the primary tumours' genomic dataset
### (including methylation data) should be loaded
data(CELLector.PrimTum.BEMs_v2)

### Change the following line (and the cancerType argument below) to work with a different cancer type
tumours_BEM<-CELLector.PrimTum.BEMs_v2$COREAD

### unicize the sample identifiers for the tumour data
tumours_BEM<-CELLector.unicizeSamples(tumours_BEM)

### loading decoding table for hypermethylation CFE identifiers
data(CELLector.CFEs.HMSid_decode)

### building a CELLector searching space
CSS<-CELLector.Build_Search_Space(ctumours = t(tumours_BEM),
                                  verbose = FALSE,
                                  minGlobSupp = 0.05,
                                  cancerType = 'COREAD',
                                  pathway_CFEs = CELLector.Pathway_CFEs,
                                  cnaIdMap = CELLector.CFEs.CNAid_mapping,
                                  cnaIdDecode = CELLector.CFEs.CNAid_decode,
                                  hmsIdDecode = CELLector.CFEs.HMSid_decode,
                                  cdg = CELLector.HCCancerDrivers,
                                  subCohortDefinition='TP53',
                                  NegativeDefinition=TRUE,
                                  includeHMS = TRUE)

### visualising the CELLector searching space as a binary tree
CSS$TreeRoot

### visualising the first attributes of the tree nodes
CSS$navTable[,1:11]

### visualising the sub-cohort of patients whose genome satisfies the rule of the 4th node
stringr::str_split(CSS$navTable$positivePoints[4],',')


najha/CELLector documentation built on Feb. 8, 2023, 5:35 a.m.