processOneStudy: Check Expression/methylation Profile for various subgroups of...

Description Usage Arguments Details Value Author(s) Examples

View source: R/cbaf-processOneStudy.R


This function obtains the requested data for the given genes across multiple subgroups of a cancer. It can check whether all genes are included in the subgroups of a cancer study and, if not, look for alternative gene names. It then calculates the frequency percentage, frequency ratio, mean value and median value of samples greater than a specific value in the selected subgroups of the cancer. Furthermore, it looks for the five genes that have the highest values in each cancer study subgroup.


processOneStudy(genesList, submissionName, studyName, desiredTechnique
  , desiredCaseList = FALSE, validateGenes = TRUE, calculate =
  c("frequencyPercentage", "frequencyRatio", "meanValue"), cutoff=NULL,
  round=TRUE, topGenes = TRUE, shortenStudyNames = TRUE, geneLimit = FALSE,
  rankingMethod = "variation", heatmapFileFormat = "TIFF", resolution = 600,
  RowCex = "auto", ColCex = "auto", heatmapMargines = "auto",
  rowLabelsAngle = 0, columnLabelsAngle = 45, heatmapColor = "RdBu",
  reverseColor = TRUE, transposedHeatmap = FALSE, simplifyBy = FALSE,
  genesToDrop = FALSE, transposeResults = FALSE)



genesList: a list that contains at least one gene group.


submissionName: a character string containing the name of interest. It is used for naming the process.


studyName: a character string giving the desired cancer name. It is a standard cancer study name as listed on cBioPortal, such as "Acute Myeloid Leukemia (TCGA, NEJM 2013)".


desiredTechnique: a character string that is one of the following techniques: "RNA-Seq", "microRNA-Seq", "microarray.mRNA", "microarray.microRNA" or "methylation".


desiredCaseList: a numeric vector that contains the indices of the desired cancer subgroups, assuming the user already knows them. If desiredCaseList is left at its default, the function shows the available subgroups and asks the user to enter the desired ones during the process. The default value in the usage signature is FALSE.


validateGenes: a logical value that, if set to TRUE, causes the function to check each cancer study to find whether each gene has a record. If a cancer study doesn't have a record for a specific gene, the function looks for alternative gene names that cbioportal might use instead of the given gene name.


calculate: a character vector that contains the statistical procedures the user wants the function to compute. The complete results can be obtained with c("frequencyPercentage", "frequencyRatio", "meanValue", "medianValue"). The options are: "frequencyPercentage", the percentage of samples with a value greater than the specified cutoff, relative to the total sample size, for every study / study subgroup; "frequencyRatio", the number of selected samples and the total number of samples from which the frequency percentage is derived, for every study / study subgroup; "meanValue", the mean value of the selected samples for each study; "medianValue", the median value of the selected samples for each study. The default input is calculate = c("frequencyPercentage", "frequencyRatio", "meanValue").
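The four statistics can be illustrated in base R on a single gene in a single subgroup. This is only a sketch of the definitions given above, not the package's internal code: the vector `values`, the strict `>` comparison and the text format chosen for the ratio are assumptions made for illustration.

```r
# Hypothetical z-scores of one gene across ten samples of one subgroup
values <- c(0.5, 2.4, 3.1, -1.2, 2.0, 4.5, 1.9, 2.6, 0.0, 3.4)
cutoff <- 2  # documented default for expression data

# Samples strictly greater than the cutoff (assumed comparison)
selected <- values[values > cutoff]

# Share of selected samples, as a percentage of all samples
frequencyPercentage <- 100 * length(selected) / length(values)

# Selected and total sample sizes side by side (format is an assumption)
frequencyRatio <- paste(length(selected), "out of", length(values))

# Mean and median of the selected samples only
meanValue <- mean(selected)
medianValue <- median(selected)
```

Here five of the ten samples exceed the cutoff, so the mean and median are computed over those five values only.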


cutoff: a number used to limit samples to those that are greater than a specific value (the cutoff). The default value for methylation data is 0.6, while gene expression studies use a default value of 2. For methylation studies, the value is the average of relevant locations; for the rest, it is the log z-score. To change the cutoff to any desired number, set cutoff = desiredNumber, where desiredNumber is the number of interest.
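One way to read the interaction between `cutoff = NULL` in the usage and the documented defaults is sketched below. The helper `defaultCutoff` is hypothetical and is not part of cbaf; it only assumes that a NULL cutoff means "use the documented default for the chosen technique".

```r
# Hypothetical helper (not a cbaf function): resolve the effective cutoff
defaultCutoff <- function(desiredTechnique, cutoff = NULL) {
  # A user-supplied number always wins
  if (!is.null(cutoff)) {
    return(cutoff)
  }
  # Documented defaults: 0.6 for methylation, 2 for expression data
  if (desiredTechnique == "methylation") 0.6 else 2
}

methylationCutoff <- defaultCutoff("methylation")
expressionCutoff <- defaultCutoff("RNA-Seq")
customCutoff <- defaultCutoff("RNA-Seq", cutoff = 1.5)
```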


round: a logical value that, if set to TRUE, forces the function to round all calculated values to two decimal places. The default value is TRUE.


topGenes: a logical value that, if set to TRUE, causes the function to create three data frames that contain the top five genes for each cancer. To get all three data frames, "frequencyPercentage", "meanValue" and "medianValue" must be included in calculate.
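The idea of a "top five genes" table can be sketched in base R as follows. The matrix `freq` holds made-up frequency percentages, and the layout (subgroups in rows, genes in columns) is an assumption for illustration rather than the package's actual output structure.

```r
# Hypothetical frequency percentages: rows are subgroups, columns are genes
freq <- matrix(
  c(10, 55, 30, 80,  5, 42, 67,
    90, 12, 48, 33, 71,  8, 26),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("subgroupA", "subgroupB"),
    c("KDM1A", "KDM1B", "KDM2A", "KDM2B", "KDM3A", "KDM3B", "JMJD1C")
  )
)

# For each subgroup, keep the names of the five genes with the highest values
topFive <- t(apply(freq, 1, function(x) {
  names(sort(x, decreasing = TRUE))[1:5]
}))
```

Each row of `topFive` lists the five highest-ranking genes of one subgroup, from highest to lowest.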


shortenStudyNames: a logical value. If set to TRUE, the function will try to remove the last part of the cancer names in order to shorten them. The removed segment usually contains the name of the scientific group that conducted the experiment.
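The effect of shortening can be approximated with a regular expression. This sketch assumes that the segment to drop is the trailing parenthesised part of the study name; the rule actually applied by the package may differ.

```r
studyName <- "Acute Myeloid Leukemia (TCGA, NEJM 2013)"

# Assumption: the removable segment is the final "( ... )" group,
# together with the whitespace in front of it
shortName <- sub("[[:space:]]*\\([^()]*\\)$", "", studyName)
```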


geneLimit: if a large number of genes exist in at least one gene group, this option can be used to limit the number of genes that are shown on the heatmap. For instance, geneLimit=50 will limit the heatmap to the 50 genes showing the most variation across multiple studies / study subgroups. The default value in the usage signature is FALSE, meaning no limit.


rankingMethod: a character value that determines how genes are ranked prior to drawing the heatmap. "variation" orders the genes based on unique values in one or a few cancer studies, while "highValue" ranks the genes by whether they contain high values in many cancer studies. This option is useful when there are so many genes that the user has to limit the number shown on the heatmap with geneLimit.
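The difference between the two ranking methods can be sketched on a small matrix. The measures used here are assumptions chosen for illustration: variance across subgroups stands in for "variation" and the mean across subgroups stands in for "highValue". The package may use different statistics.

```r
# Hypothetical values: rows are genes, columns are study subgroups
m <- rbind(
  geneA = c(5, 5, 5, 5),      # uniformly moderate
  geneB = c(0, 0, 90, 0),     # high in a single subgroup
  geneC = c(60, 70, 65, 75),  # high in every subgroup
  geneD = c(1, 2, 1, 2)       # uniformly low
)
geneLimit <- 2

# Assumption: "variation" approximated by the variance across subgroups
byVariation <- names(sort(apply(m, 1, var), decreasing = TRUE))[seq_len(geneLimit)]

# Assumption: "highValue" approximated by the mean across subgroups
byHighValue <- names(sort(rowMeans(m), decreasing = TRUE))[seq_len(geneLimit)]
```

With these stand-in measures, the gene that is high in only one subgroup ranks first under "variation", while the gene that is high everywhere ranks first under "highValue".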


heatmapFileFormat: this option enables the user to select the desired image file format of the heatmaps. The default value is "TIFF". Other supported formats include "PNG", "BMP", and "JPG".


resolution: a number. This option can be used to adjust the resolution of the output heatmaps in dots per inch. The default value is 600.


RowCex: a number that specifies letter size in heatmap row names, which ranges from 0 to 2. If RowCex = "auto", the function will automatically determine the best RowCex.


ColCex: a number that specifies letter size in heatmap column names, which ranges from 0 to 2. If ColCex = "auto", the function will automatically determine the best ColCex.


heatmapMargines: a numeric vector that is used to set heatmap margins. If heatmapMargines = "auto", the function will automatically determine the best possible margins. Otherwise, enter the desired margins as, e.g., c(10, 10).


rowLabelsAngle: a number that determines the angle at which the gene names are shown in heatmaps. The default value is 0 degrees.


columnLabelsAngle: a number that determines the angle at which the study / study subgroup names are shown in heatmaps. The default value is 45 degrees.


heatmapColor: a character string that defines the heatmap color. The default value is 'RdBu'. 'RdGr' is also a popular color in genomic studies. To see the remaining colors, type library(RColorBrewer) and then display.brewer.all().


reverseColor: a logical value that reverses the color gradient for heatmap(s).


transposedHeatmap: a logical value that transposes heatmap rows to columns and vice versa.


simplifyBy: a number that tells the function to change values smaller than it to zero. The purpose of this option is to make candidate genes easier to recognize. Therefore, it is not suited for publications. It has the same unit as cutoff.
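The thresholding described here amounts to the following base R operation. The named vector `values` is made up for illustration, and the sketch makes no claim about where in the package's pipeline this step is applied.

```r
# Hypothetical frequency percentages for four genes in one subgroup
values <- c(KDM1A = 12.5, KDM1B = 3.2, KDM2A = 48.0, KDM2B = 9.9)
simplifyBy <- 10

# Set every value smaller than simplifyBy to zero, keep the rest unchanged
simplified <- values
simplified[simplified < simplifyBy] <- 0
```

Only KDM1A and KDM2A keep their values, which makes them stand out on a heatmap.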


genesToDrop: a character vector. Gene names within this vector will be omitted from the heatmap. The default value is FALSE.


transposeResults: a logical value that enables the function to swap the columns and rows of the data.


Package: cbaf
Type: Package
Version: 1.12.1
Date: 2020-12-07
License: Artistic-2.0


a BiocFileCache object that contains some or all of the following groups, depending on what the user has chosen: ObtainedData, validationResults, frequencyPercentage, Top.Genes.of.Frequency.Percentage, frequencyRatio, meanValue, Top.Genes.of.Mean.Value, medianValue, Top.Genes.of.Median.Value. It also saves these results in one Excel file for convenience. Depending on preference, three heatmaps for frequency percentage, mean value and median value can be generated. If more than one group of genes is entered, the output for each group will be stored in a separate sub-directory.


Arman Shahrisa [maintainer, copyright holder]

Maryam Tahmasebi Birgani


genes <- list(K.demethylases = c("KDM1A", "KDM1B", "KDM2A", "KDM2B", "KDM3A",
 "KDM3B", "JMJD1C", "KDM4A"), K.methyltransferases = c("SUV39H1", "SUV39H2",
 "EHMT1", "EHMT2", "SETDB1", "SETDB2", "KMT2A", "KMT2A"))

processOneStudy(genes, "test", "Breast Invasive Carcinoma (TCGA, Cell 2015)",
"RNA-Seq", desiredCaseList = c(2,3,4,5), calculate = c("frequencyPercentage",
"frequencyRatio"), heatmapMargines = c(16, 10), RowCex = 1, ColCex = 1)

cbaf documentation built on Dec. 9, 2020, 2:02 a.m.