CELLector.Build_Search_Space | R Documentation |
This function assembles a user-defined CELLector search space by analysing genomic data from a large cohort of cancer patients (specified in input). It identifies recurrent subtypes with matched genomic signatures (as combinations of cancer functional events (CFEs), defined in [1]), linking them into a hierarchical structure shaped as a binary tree with a corresponding navigable table, as detailed in [2].
CELLector.Build_Search_Space<-function(ctumours, cancerType, minlen=1, verbose=TRUE, mutOnly=FALSE, cnaOnly=FALSE, includeHMS=FALSE, minGlobSupp=0.01, FeatureToExclude=NULL, pathway_CFEs = NULL, pathwayFocused=NULL, subCohortDefinition=NULL, NegativeDefinition=FALSE, cnaIdMap=NULL, cnaIdDecode=NULL, hmsIdDecode=NULL, cdg=NULL, UD_genomics=FALSE)
ctumours |
A binary event matrix (BEM) modeling a cohort of cancer patients, with cancer functional events (CFEs) on the columns and sample identifiers on the rows. See |
cancerType |
The cancer type under consideration (specified via a TCGA label): currently available types = BLCA, BRCA, COREAD, GBM, HNSC, KIRC, LAML, LGG, LUAD, LUSC, OV, PRAD, SKCM, STAD, THCA, UCEC |
minlen |
The minimal length of a genomic signature (how many individual CFEs it is made of) for it to be considered in the analysis (1 by default) |
verbose |
A boolean argument specifying whether step-by-step information on the algorithm progression should be displayed at run time |
mutOnly |
A boolean argument specifying whether only CFEs involving somatic mutations should be considered in the analysis. If the |
cnaOnly |
A boolean argument specifying whether only CFEs involving copy number alterations (CNAs) of chromosomal segments that are recurrently CN altered should be considered in the analysis. If the |
includeHMS |
A boolean argument specifying whether methylation data should be considered while building the search space ( |
minGlobSupp |
Minimal size of the outputted subtypes, as a ratio of the number of patients included in the whole cohort (1% by default). |
FeatureToExclude |
A string (or a vector of strings) with identifiers of CFEs that should be ignored |
pathway_CFEs |
A named list of string vectors, whose elements are CFEs involving genes in a biological pathway (specified by the name of the corresponding entry). A list for 14 key cancer pathways is contained in the |
pathwayFocused |
If different from |
subCohortDefinition |
If different from |
NegativeDefinition |
If the |
cnaIdMap |
A data frame mapping chromosomal regions of recurrent copy number amplifications/deletions in cancer (RACSs, as defined in [1]) identified via ADMIRE [3] in the context of specific cancer types to PanCancer RACSs. The built-in object |
cnaIdDecode |
A table with identifiers of cancer functional events (CFEs) involving chromosomal regions of recurrent copy number alterations (RACSs, as defined by [1], i.e. identified through ADMIRE [3]) and their annotation. The built-in object |
hmsIdDecode |
Data frame containing annotation for the hypermethylated gene promoter CFEs. The format should be the same as that of the |
UD_genomics |
A boolean argument specifying whether the analysis is performed on user defined genomic data ( |
cdg |
A list of genes that are used when decoding the identifiers of cancer functional events (CFEs) involving chromosomal regions of recurrent copy number alterations (RACSs, as defined by [1]). These will be visualised in the signatures containing the RACSs including them. A predefined list of high confidence cancer driver genes (from [1]) is provided as built-in data object ( |
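As a minimal sketch of the layout expected for ctumours, the toy matrix below places samples on the rows and CFEs on the columns, and shows how a minGlobSupp ratio would translate into a minimal number of patients. The sample and CFE names are invented for illustration (they are not CELLector data), and rounding the threshold up is an assumption of this sketch, not something stated above.

```r
# Toy binary event matrix (BEM): samples on rows, CFEs on columns.
# Sample identifiers and events are made up for illustration.
toy_BEM <- matrix(
  c(1, 0, 1,
    1, 1, 0,
    0, 0, 1,
    1, 1, 0,
    0, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("S", 1:5), c("KRAS", "TP53", "APC"))
)

# A BEM should contain only 0/1 entries
stopifnot(all(toy_BEM %in% c(0, 1)))

# Number of patients harbouring each CFE
cfe_counts <- colSums(toy_BEM)

# minGlobSupp is a ratio of the whole cohort; the matching minimal
# subtype size is obtained here by rounding up (assumption of this sketch)
minGlobSupp <- 0.5
min_patients <- ceiling(minGlobSupp * nrow(toy_BEM))
```

If a BEM is stored with CFEs on the rows and samples on the columns, it can be transposed with t() before being passed as ctumours.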
Starting from an initial cohort of patients affected by a given cancer type and modeled by the inputted binary event matrix (BEM), the most frequent alteration or set of molecular alterations (depending on the minlen argument) with the largest support (the subpopulation of patients in which these alterations occur simultaneously) is identified using the eclat function of the arules R package.
Based on this, the cohort of patients is split into two subpopulations depending on the collective presence or absence of the identified alterations. This process is then executed recursively on the two resulting subpopulations and it continues until all the alteration sets (with a support of minimal size, as specified in the minGlobSupp argument) are identified.
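The recursive splitting described above can be sketched in base R as follows. This is a deliberately simplified illustration, not the package implementation: it considers only single CFEs (as with minlen = 1) and picks the most supported one with colSums instead of mining itemsets with eclat. The function name split_cohort, its arguments, and the toy data are all hypothetical.

```r
# Toy BEM: samples on rows, CFEs on columns (invented data)
bem <- matrix(
  c(1, 1, 0,
    1, 0, 0,
    1, 1, 0,
    0, 0, 1,
    0, 0, 1,
    0, 1, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("S", 1:6), c("A", "B", "C"))
)

# Recursively split a cohort on its most supported single CFE.
# Simplification: single CFEs only, no itemset mining.
split_cohort <- function(bem, samples, features, min_size, depth = 0) {
  if (length(samples) == 0 || length(features) == 0) return(NULL)
  supp <- colSums(bem[samples, features, drop = FALSE])
  best <- names(supp)[which.max(supp)]
  # Stop when the best candidate is below the minimal support size
  if (supp[[best]] < min_size) return(NULL)

  positive <- samples[bem[samples, best] == 1]
  negative <- setdiff(samples, positive)
  rest <- setdiff(features, best)

  node <- data.frame(depth = depth, item = best,
                     support = length(positive),
                     total = length(samples),
                     stringsAsFactors = FALSE)
  rbind(
    node,
    # refine the patients harbouring the alteration
    split_cohort(bem, positive, rest, min_size, depth + 1),
    # analyse the complementary patients
    split_cohort(bem, negative, rest, min_size, depth + 1)
  )
}

# Minimal size derived from a ratio of the whole cohort,
# analogous to minGlobSupp = 0.3 (rounding up is an assumption)
min_size <- ceiling(0.3 * nrow(bem))
nodes <- split_cohort(bem, rownames(bem), colnames(bem), min_size)
nodes
```

Each row of nodes corresponds to one split: the chosen CFE, the number of patients harbouring it, and the size of the subpopulation in which it was searched.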
Each of the alteration sets identified through this recursive process is stored in a tree node. Linking nodes identified in adjacent recursions yields a binary tree: the CELLector search space. Each individual path (from the root to a node) of this tree defines a rule (signature), represented as a logical AND of multiple terms (or their negations), one for each node in the path. If the genome of a given patient in the analysed cohort satisfies the rule, then that patient is contained in the subpopulation represented by the terminal node of that path. Collectively, all the paths in the search space provide a representation of the spectrum of combinations of molecular alterations observed in a given cancer type, and their clinical prevalence in the analysed patient population.
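To illustrate how a rule of this kind selects patients, the sketch below evaluates a logical AND of terms against a toy BEM. The helper satisfies_rule, the "~" prefix used to mark a negated term, and the data are assumptions made for this example only.

```r
# Toy BEM: samples on rows, CFEs on columns (invented data)
bem <- matrix(
  c(1, 0,
    1, 1,
    0, 1,
    0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("S", 1:4), c("KRAS", "TP53"))
)

# Evaluate a rule given as a vector of terms joined by a logical AND.
# A leading "~" marks a negated term (notation chosen for this sketch).
satisfies_rule <- function(bem, terms) {
  keep <- rep(TRUE, nrow(bem))
  for (term in terms) {
    negated <- startsWith(term, "~")
    cfe <- sub("^~", "", term)
    present <- bem[, cfe] == 1
    keep <- keep & (if (negated) !present else present)
  }
  rownames(bem)[keep]
}

# Patients with a KRAS event and no TP53 event
kras_only <- satisfies_rule(bem, c("KRAS", "~TP53"))
kras_only
```

The patients returned for a rule play the role of the subpopulation represented by the terminal node of the corresponding path.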
A named list with the CELLector search space stored as a data.tree object in the TreeRoot field, and as a navigable table (the navTable field): a data frame with a row for each node of the tree and the following columns:
Idx
A numerical index for the node
Item
The most supported CFE (or combination of CFEs), identified at the iteration in which the node was added to the tree, (i) in the whole cohort of patients (for the Root), (ii) in the subpopulation that satisfies the parent node rule (for Left.Child nodes), or (iii) in its complement (for Right.Child nodes)
ItemsDecoded
Same as Item but with identifiers of RACSs decoded, i.e. with loci and included driver genes (inputted in the cdg argument) indicated in brackets
Type
The node type: Root (first node added), Right.Child (a node resulting from the analysis of the population of patients complementary to that satisfying the Parent node rule), Left.Child (a node resulting from refining the population of patients satisfying the Parent node rule)
Parent.Idx
The numerical index of the parent node (0 for the Root)
AbsSupport
The number of patients satisfying the node rule
CurrentTotal
The number of patients included in the population under consideration at the iteration in which the node was included in the tree; for Left.Child nodes this is the same as the parent's AbsSupport
PercSupport
The ratio of patients collectively harbouring the combination of CFEs specified in Item within the subpopulation under consideration at the iteration in which the node was included in the tree (whose size is specified in CurrentTotal)
GlobalSupport
The ratio of patients satisfying the node rule with respect to the total number of patients in the whole cohort
Left.Child.Index
Numerical index of the left child node (0 indicates absence of a left child node)
Right.Child.Index
Numerical index of the right child node (0 indicates absence of a right child node)
currentPoints
The identifiers of the patients in the sub-population under consideration at the iteration time of the node inclusion in the tree
currentFeatures
The CFEs considered at the iteration in which the node was included in the tree
positivePoints
The identifiers of the patients satisfying the node rule
COLORS
A vector of strings containing hexadecimal color identifiers: one for each node. These are used by the visualisation functions (CELLector.visualiseSearchingSpace and CELLector.visualiseSearchingSpace_sunBurst) and can be changed using the CELLector.changeSScolors function.
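The relations among the navigable table columns can be sketched on a toy data frame. The values below are invented and only mimic the documented layout. The helper node_signature, the "~" notation for negated terms, and the comma-separated format of positivePoints (suggested by the str_split call in the examples) are assumptions of this sketch. Whether a real table stores the supports as fractions or as percentages should be checked on actual output.

```r
# Toy navigable table with a subset of the documented columns
# (invented values for a cohort of 10 patients)
navTable <- data.frame(
  Idx = 1:3,
  Item = c("APC", "KRAS", "BRAF"),
  Type = c("Root", "Left.Child", "Right.Child"),
  Parent.Idx = c(0, 1, 1),
  AbsSupport = c(6, 3, 2),
  CurrentTotal = c(10, 6, 4),
  positivePoints = c("P1,P2,P3,P4,P5,P6", "P1,P2,P3", "P7,P8"),
  stringsAsFactors = FALSE
)
cohort_size <- 10

# Supports as ratios, following the column descriptions
PercSupport <- navTable$AbsSupport / navTable$CurrentTotal
GlobalSupport <- navTable$AbsSupport / cohort_size

# Rebuild the rule of a node by walking up to the root via Parent.Idx.
# A Left.Child adds its parent's item as is; a Right.Child adds it negated.
node_signature <- function(navTable, idx) {
  terms <- navTable$Item[navTable$Idx == idx]
  cur <- idx
  while (navTable$Parent.Idx[navTable$Idx == cur] != 0) {
    row <- navTable[navTable$Idx == cur, ]
    parent_item <- navTable$Item[navTable$Idx == row$Parent.Idx]
    if (row$Type == "Left.Child") {
      terms <- c(parent_item, terms)
    } else {
      terms <- c(paste0("~", parent_item), terms)
    }
    cur <- row$Parent.Idx
  }
  paste(terms, collapse = ", ")
}

sig_left <- node_signature(navTable, 2)
sig_right <- node_signature(navTable, 3)

# Patients satisfying the rule of the 3rd node
patients_node3 <- strsplit(navTable$positivePoints[3], ",")[[1]]
```

In this toy table the CurrentTotal of the Left.Child node equals the AbsSupport of its parent, while the CurrentTotal of the Right.Child node equals the parent's CurrentTotal minus its AbsSupport.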
Hanna Najgebauer and Francesco Iorio
[1] Iorio, F. et al. A Landscape of Pharmacogenomic Interactions in Cancer. Cell 166, 740–754 (2016).
[2] Najgebauer, H. et al. Genomics Guided Selection of Cancer in vitro Models. https://doi.org/10.1101/275032
[3] van Dyk, E., Reinders, M. J. T. & Wessels, L. F. A. A scale-space method for detecting recurrent DNA copy number changes with analytical false discovery rate control. Nucleic Acids Res. 41, e100 (2013).
CELLector.PrimTum.BEMs, CELLector.Pathway_CFEs, CELLector.CFEs.CNAid_mapping, CELLector.CFEs.CNAid_decode, CELLector.HCCancerDrivers, CELLector.visualiseSearchingSpace, CELLector.visualiseSearchingSpace_sunBurst, CELLector.changeSScolors
library(CELLector)
library(stringr)  # provides str_split(), used below

data(CELLector.PrimTum.BEMs)
data(CELLector.Pathway_CFEs)
data(CELLector.CFEs.CNAid_mapping)
data(CELLector.CFEs.CNAid_decode)
data(CELLector.HCCancerDrivers)

### Change the following line (and the cancerType argument below)
### to work with a different cancer type
tumours_BEM <- CELLector.PrimTum.BEMs$COREAD

### unicize the sample identifiers for the tumour data
tumours_BEM <- CELLector.unicizeSamples(tumours_BEM)

### building a CELLector searching space focusing on three pathways
### and TP53 wild-type patients only
CSS <- CELLector.Build_Search_Space(ctumours = t(tumours_BEM),
                                    verbose = FALSE,
                                    minGlobSupp = 0.05,
                                    cancerType = 'COREAD',
                                    pathwayFocused = c("RAS-RAF-MEK-ERK / JNK signaling",
                                                       "PI3K-AKT-MTOR signaling",
                                                       "WNT signaling"),
                                    pathway_CFEs = CELLector.Pathway_CFEs,
                                    cnaIdMap = CELLector.CFEs.CNAid_mapping,
                                    cnaIdDecode = CELLector.CFEs.CNAid_decode,
                                    cdg = CELLector.HCCancerDrivers,
                                    subCohortDefinition = 'TP53',
                                    NegativeDefinition = TRUE)

### visualising the CELLector searching space as a binary tree
CSS$TreeRoot

### visualising the first attributes of the tree nodes
CSS$navTable[, 1:11]

### visualising the sub-cohort of patients whose genome satisfies
### the rule of the 4th node
str_split(CSS$navTable$positivePoints[4], ',')

######################################################################
### Rebuilding the search space but considering also methylation data

### important!!!: the second version of the primary tumours' genomic
### dataset (including methylation data) should be loaded
data(CELLector.PrimTum.BEMs_v2)

### Change the following line (and the cancerType argument below)
### to work with a different cancer type
tumours_BEM <- CELLector.PrimTum.BEMs_v2$COREAD

### unicize the sample identifiers for the tumour data
tumours_BEM <- CELLector.unicizeSamples(tumours_BEM)

### loading decoding table for hypermethylation CFE identifiers
data(CELLector.CFEs.HMSid_decode)

### building a CELLector searching space
CSS <- CELLector.Build_Search_Space(ctumours = t(tumours_BEM),
                                    verbose = FALSE,
                                    minGlobSupp = 0.05,
                                    cancerType = 'COREAD',
                                    pathway_CFEs = CELLector.Pathway_CFEs,
                                    cnaIdMap = CELLector.CFEs.CNAid_mapping,
                                    cnaIdDecode = CELLector.CFEs.CNAid_decode,
                                    hmsIdDecode = CELLector.CFEs.HMSid_decode,
                                    cdg = CELLector.HCCancerDrivers,
                                    subCohortDefinition = 'TP53',
                                    NegativeDefinition = TRUE,
                                    includeHMS = TRUE)

### visualising the CELLector searching space as a binary tree
CSS$TreeRoot

### visualising the first attributes of the tree nodes
CSS$navTable[, 1:11]

### visualising the sub-cohort of patients whose genome satisfies
### the rule of the 4th node
str_split(CSS$navTable$positivePoints[4], ',')