CELLector.Build_Search_Space_Partitioned | R Documentation |
This function assembles a user-defined CELLector search space by analysing genomic data from a large cohort of cancer patients (specified in input). It identifies recurrent subtypes with matched genomic signatures (as combinations of cancer functional events (CFEs), defined in [1]) and links them into a hierarchical structure shaped as a binary tree with a corresponding navigable table, as detailed in [2]. It then converts this structure into K non-overlapping groups, referred to as the partitioned version (see details).
CELLector.Build_Search_Space_Partitioned(ctumours, cancerType, minlen=1, verbose=TRUE, mutOnly=FALSE, cnaOnly=FALSE, includeHMS=FALSE, minGlobSupp=0.01, FeatureToExclude=NULL, pathway_CFEs = NULL, pathwayFocused=NULL, subCohortDefinition=NULL, NegativeDefinition=FALSE, cnaIdMap=NULL, cnaIdDecode=NULL, hmsIdDecode=NULL, cdg=NULL, UD_genomics=FALSE)
ctumours |
A binary event matrix (BEM) modeling a cohort of cancer patients, with cancer functional events (CFEs) on the columns and sample identifiers on the rows. See |
cancerType |
The cancer type under consideration (specified via a TCGA label): currently available types = BLCA, BRCA, COREAD, GBM, HNSC, KIRC, LAML, LGG, LUAD, LUSC, OV, PRAD, SKCM, STAD, THCA, UCEC |
minlen |
The minimal length of a genomic signature (i.e. the number of individual CFEs it is made of) required for it to be considered in the analysis (1 by default) |
verbose |
A boolean argument specifying whether step-by-step information on the algorithm progression should be displayed at run time |
mutOnly |
A boolean argument specifying whether only CFEs involving somatic mutations should be considered in the analysis. If the |
cnaOnly |
A boolean argument specifying whether only CFEs involving copy number alterations (CNAs) of chromosomal segments that are recurrently CN altered should be considered in the analysis. If the |
includeHMS |
A boolean argument specifying whether methylation data should be considered while building the search space ( |
minGlobSupp |
Minimal size of the outputted subtypes, as a fraction of the number of patients included in the whole cohort (1% by default). |
FeatureToExclude |
A string (or a vector of strings) with identifiers of CFEs that should be ignored |
pathway_CFEs |
A named list of string vectors, whose elements are CFEs involving genes in a biological pathway (specified by the name of the corresponding entry). A list for 14 key cancer pathways is contained in the |
pathwayFocused |
If different from |
subCohortDefinition |
If different from |
NegativeDefinition |
If the |
cnaIdMap |
A data frame mapping chromosomal regions of recurrent copy number amplifications/deletions in cancer (RACSs, as defined in [1]) identified via ADMIRE [3] in the context of specific cancer types to PanCancer RACSs. The built-in object |
cnaIdDecode |
A table with identifiers of cancer functional events (CFEs) involving chromosomal regions of recurrent copy number alterations (RACSs, as defined by [1], i.e. identified through ADMIRE [3]) and their annotation. The built-in object |
hmsIdDecode |
A data frame containing annotation for the hypermethylated gene promoter CFEs. The format should be the same as that of the |
UD_genomics |
A boolean argument specifying whether the analysis is performed on user defined genomic data ( |
cdg |
A list of genes that are used when decoding the identifiers of cancer functional events (CFEs) involving chromosomal regions of recurrent copy number alterations (RACSs, as defined by [1]). These will be visualised in the signatures containing the RACSs that include them. A predefined list of high confidence cancer driver genes (from [1]) is provided as a built-in data object ( |
The function builds a partition of the cancer patients, creating K non-overlapping groups from the hierarchical division. It first constructs the CELLector search space (CSS) in the form of a binary tree and navigable table for an assigned value of minGlobSupp. Each node of the binary tree, starting from the root, can have a left child, i.e. the subset of samples that have the feature described in the parent node AND the feature described in the left child node, and a right child, i.e. the subset of samples that DO NOT have the feature described in the parent node AND have the feature described in the right child node (considered as complementary).
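The left/right child rule can be illustrated with a minimal sketch in base R. The toy BEM below and its feature names (A, B, C) are made up for illustration and are not CELLector data; here A is assumed to be the feature of a parent node, B the feature of its left child and C the feature of its right child.

```r
# Toy BEM (hypothetical values): samples on rows, CFEs on columns
bem <- matrix(c(1, 1, 0,
                1, 0, 1,
                1, 0, 0,
                0, 0, 1,
                0, 0, 0),
              ncol = 3, byrow = TRUE,
              dimnames = list(paste0("s", 1:5), c("A", "B", "C")))

# Parent node: samples harbouring feature A
parent <- rownames(bem)[bem[, "A"] == 1]

# Left child: samples with the parent feature AND the left child feature
left <- rownames(bem)[bem[, "A"] == 1 & bem[, "B"] == 1]

# Right child: samples WITHOUT the parent feature AND with the right child feature
right <- rownames(bem)[bem[, "A"] == 0 & bem[, "C"] == 1]
```

In this toy case sample s2 carries C but also A, so it stays within the parent node and is not assigned to the right child.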
The partitioned structure is obtained from the hierarchical binary tree as follows. For each node, expressed as a row in the navigable table, let U be the set of samples in the considered node and S its signature.
If the node has a left child U_l (that includes feature F_l), the function first defines the set of samples U_rm to be removed from U as U_rm <- U_l. If U_l has a right child U_r (with feature F_r), then U_rm is updated as the union of U_rm and U_r. If U_r has in turn another right child, this procedure is repeated and U_rm is updated as described above until the considered node does not have a right child. Finally, the new set of samples is defined as U_new = U \ U_rm and the associated signature is S_new = S, ~ F_l, ~ F_r, ....
If instead the node U does not have a left child, U and its associated signature S are kept as they are and the iterative procedure moves to the next node. Finally, a last node is created from the remaining samples, i.e. those that are not described by any CELLector signature detected in the hierarchical version. The corresponding signature of this group is created from the negation of the root node and of all its right children (recursively as before).
Note that the newly created groups can be composed of a fraction of samples lower than the predefined minGlobSupp value.
In this way, K non-overlapping groups are created, with K equal to the number of signatures defined in the hierarchical version + 1.
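The procedure described above can be sketched in base R on a toy tree. This is a minimal illustration under stated assumptions, not the package implementation: the `nodes` list layout, the fields `sig`, `feat`, `pts`, `left`, `right`, the helper `right_chain` and the sample names are all hypothetical and do not reflect the actual format of the CELLector navigable table.

```r
# Toy hierarchical search space (hypothetical layout, 4 signatures, 10 samples)
nodes <- list(
  list(sig = "A",        feat = "A", pts = paste0("s", 1:5), left = 2L, right = 3L),
  list(sig = "A, B",     feat = "B", pts = c("s1", "s2"),    left = NA, right = 4L),
  list(sig = "~A, D",    feat = "D", pts = paste0("s", 6:8), left = NA, right = NA),
  list(sig = "A, ~B, C", feat = "C", pts = "s3",             left = NA, right = NA)
)
cohort <- paste0("s", 1:10)

# Indices of node i followed by its right child, that child's right child, ...
right_chain <- function(i) {
  chain <- integer(0)
  while (!is.na(i)) {
    chain <- c(chain, i)
    i <- nodes[[i]]$right
  }
  chain
}

partition <- lapply(nodes, function(nd) {
  # No left child: the node and its signature are kept as they are
  if (is.na(nd$left)) return(list(sig = nd$sig, pts = nd$pts))
  # Otherwise remove the left child and its chain of right children (U_rm)
  chain <- right_chain(nd$left)
  u_rm  <- unlist(lapply(nodes[chain], `[[`, "pts"))
  neg   <- paste0("~", vapply(nodes[chain], `[[`, character(1), "feat"))
  list(sig = paste(c(nd$sig, neg), collapse = ", "),
       pts = setdiff(nd$pts, u_rm))
})

# Last group: samples not covered by any signature, labelled with the
# negation of the root feature and of its chain of right children
root_chain <- right_chain(1L)
covered    <- unlist(lapply(nodes, `[[`, "pts"))
partition[[length(partition) + 1L]] <- list(
  sig = paste0("~", vapply(nodes[root_chain], `[[`, character(1), "feat"),
               collapse = ", "),
  pts = setdiff(cohort, covered)
)
```

With 4 signatures in the toy hierarchy, the sketch yields K = 5 groups that do not overlap and together cover the whole toy cohort. The root group shrinks from 5 to 2 samples, which shows how a partitioned group can fall below the support threshold used to build the hierarchy.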
A named list with the CELLector search space in both its hierarchical and partitioned versions. The first is stored in the hierarchical field and is the output of the CELLector.Build_Search_Space function. The second is stored in the partitioned field and is a navigable table in the form of a data frame, output of the CELLector.from_Hierarchical_to_Partition function. Each row represents a group of patients, with the following columns:
Idx
A numerical index for the group
Signature
The combination of presence or absence of CFEs, identified from the hierarchical structure as described in the details
SignatureDecoded
Same as Signature but with identifiers of RACSs decoded, i.e. with loci and included driver genes (inputted in the cdg argument) indicated in brackets
Points
The identifiers of the patients in the group satisfying the signature rule
Total
Number of patients satisfying the signature rule
Support
Fraction of patients satisfying the signature rule compared to the total cohort
COLORS
A vector of strings containing hexadecimal color identifiers: one for each node. These are used by the visualisation functions (CELLector.visualiseSearchingSpace and CELLector.visualiseSearchingSpace_sunBurst) and can be changed using the CELLector.changeSScolors function.
Lucia Trastulla and Francesco Iorio
[1] Iorio, F. et al. A Landscape of Pharmacogenomic Interactions in Cancer. Cell 166, 740–754 (2016).
[2] Najgebauer, H. et al. Genomics Guided Selection of Cancer in vitro Models. https://doi.org/10.1016/j.cels.2020.04.007
CELLector.Build_Search_Space
CELLector.from_Hierarchical_to_Partition
data(CELLector.PrimTum.BEMs_v2)

### Change the following two lines to work with a different cancer type
tumours_BEM<-CELLector.PrimTum.BEMs_v2$COREAD

### unicize the sample identifiers for the tumour data
tumours_BEM<-CELLector.unicizeSamples(tumours_BEM)

### loading decoding table for hypermethylation CFE identifiers
data(CELLector.CFEs.HMSid_decode)

### building a CELLector searching space
CSS_p <- CELLector.Build_Search_Space_Partitioned(ctumours = t(tumours_BEM),
                                                  verbose = FALSE,
                                                  minGlobSupp = 0.05,
                                                  cancerType = 'COREAD',
                                                  pathway_CFEs = CELLector.Pathway_CFEs,
                                                  cnaIdMap = CELLector.CFEs.CNAid_mapping,
                                                  cnaIdDecode = CELLector.CFEs.CNAid_decode,
                                                  hmsIdDecode = CELLector.CFEs.HMSid_decode,
                                                  cdg = CELLector.HCCancerDrivers,
                                                  subCohortDefinition='TP53',
                                                  NegativeDefinition=TRUE,
                                                  includeHMS = TRUE)

### visualising partitioned patients and group-specific feature
CSS_p$partitioned