processMultipleStudies: Check Expression/methylation Profile for various cancer...

View source: R/cbaf-processMultipleStudies.R

processMultipleStudies  R Documentation

Check Expression/methylation Profile for various cancer studies.

Description

This function obtains the requested data for the given genes across multiple cancer studies. It can check whether all genes are included in the cancer studies and, if not, look for alternative gene names. It then calculates the frequency percentage, frequency ratio, mean value and median value of samples greater than a specific value in the selected cancer studies. Furthermore, it looks for the five genes that comprise the highest values in each cancer study.

Usage

processMultipleStudies(genesList, submissionName, studiesNames,
  desiredTechnique, cancerCode = FALSE, validateGenes = TRUE, calculate =
  c("frequencyPercentage", "frequencyRatio", "meanValue"), cutoff=NULL,
  round=TRUE, topGenes = TRUE, shortenStudyNames = TRUE, geneLimit = 50,
  rankingMethod = "variation", heatmapFileFormat = "TIFF", resolution = 600,
  RowCex = "auto", ColCex = "auto", heatmapMargines = "auto",
  rowLabelsAngle = 0, columnLabelsAngle = 45, heatmapColor = "RdBu",
  reverseColor = TRUE, transposedHeatmap = FALSE, simplifyBy = FALSE,
  genesToDrop = FALSE, transposeResults = FALSE)

Arguments

genesList

a list that contains at least one gene group

submissionName

a character string containing the name of interest. It is used for naming the process.

studiesNames

a character vector or a matrix that contains the desired cancer names. The character vector contains standard names of cancer studies that can be found on cbioportal.org, such as "Acute Myeloid Leukemia (TCGA, NEJM 2013)". Alternatively, a matrix can be used if users prefer user-defined cancer names. In this case, the first column of the matrix comprises the standard cancer names while the second column must contain the desired cancer names.
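As a minimal sketch of the matrix form described above, the two-column input could be built as follows. The short names "AML" and "Breast" are hypothetical user-defined labels chosen for illustration; the standard names are taken from elsewhere on this page.

```r
# Column 1: standard study names as listed on cbioportal.org.
standard_names <- c("Acute Myeloid Leukemia (TCGA, NEJM 2013)",
                    "Breast Invasive Carcinoma (TCGA, Provisional)")

# Column 2: user-defined names (hypothetical labels, for illustration only).
custom_names <- c("AML", "Breast")

# One row per study: standard name first, desired name second.
studies_matrix <- matrix(c(standard_names, custom_names), ncol = 2)
print(studies_matrix)
```

A matrix built this way would then be passed as studiesNames in place of the plain character vector.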

desiredTechnique

a character string that is one of the following techniques: "RNA-Seq", "RNA-SeqRTN", "microRNA-Seq", "microarray.mRNA", "microarray.microRNA" or "methylation".

cancerCode

a logical value that, if set to TRUE, tells the function to use cbioportal abbreviated cancer names instead of complete cancer names. For example, "laml_tcga_pub" is the abbreviated name for "Acute Myeloid Leukemia (TCGA, NEJM 2013)".

validateGenes

a logical value that, if set to TRUE, causes the function to check each cancer study to find whether each gene has a record. If a cancer study doesn't have a record for a specific gene, the function looks for alternative gene names that cbioportal might use instead of the given gene name.

calculate

a character vector that contains the statistical procedures the function should compute. The complete results can be obtained with c("frequencyPercentage", "frequencyRatio", "meanValue", "medianValue"). These options are: "frequencyPercentage", the percentage of samples with a value greater than the cutoff, relative to the total sample size, for every study / study subgroup; "frequencyRatio", the number of selected samples and the total number of samples that give the frequency percentage for every study / study subgroup, i.e. the selected and total sample sizes; "meanValue", the mean value of the selected samples for each study; "medianValue", the median value of the selected samples for each study. The default input is calculate = c("frequencyPercentage", "frequencyRatio", "meanValue").
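The four statistics can be illustrated for a single gene in a single study with base R. This is only a sketch of the definitions given above, not the package's internal code: the sample values are made up, and the "x out of y" text used for the frequency ratio is an assumed format.

```r
# Hypothetical z-scores for one gene in one study (illustrative values only).
values <- c(2.5, 0.3, 3.1, -1.2, 2.0, 4.4, 1.9, 2.2)
cutoff <- 2

# "Selected" samples are those strictly greater than the cutoff.
selected <- values[!is.na(values) & values > cutoff]
total <- sum(!is.na(values))

# Share of selected samples, expressed as a percentage.
frequencyPercentage <- 100 * length(selected) / total

# Selected and total sample sizes (the text format is an assumption).
frequencyRatio <- paste(length(selected), "out of", total)

# Mean and median are computed over the selected samples only.
meanValue <- mean(selected)
medianValue <- median(selected)

print(c(frequencyPercentage = frequencyPercentage,
        meanValue = meanValue, medianValue = medianValue))
print(frequencyRatio)
```

Note that a sample exactly equal to the cutoff (2.0 here) is not selected, because the comparison is "greater than".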

cutoff

a number used to limit samples to those that are greater than this number (cutoff). The default value for methylation data is 0.8, while gene expression studies use a default value of 2. For methylation studies, the value is the average of relevant locations; for the rest, it is "log z-score". To change the cutoff, set cutoff = desiredNumber, where desiredNumber is the number of interest.

round

a logical value that, if set to be TRUE, will force the function to round all the calculated values to two decimal places. The default value is TRUE.

topGenes

a logical value that, if set to TRUE, causes the function to create three data.frames that contain the five top genes for each cancer. To get all three data.frames, "frequencyPercentage", "meanValue" and "medianValue" must be included in calculate.
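Selecting the five top genes per study can be sketched in base R as below. The matrix layout (genes as rows, studies as columns), the study labels and the numbers are all assumptions made for illustration; the package's own data structures may differ.

```r
# Hypothetical mean values: one row per gene, one column per study.
genes <- c("KDM1A", "KDM1B", "KDM2A", "KDM2B",
           "KDM3A", "KDM3B", "JMJD1C", "KDM4A")
mean_values <- matrix(
  c(3.1, 2.4, 0.0, 2.9, 4.2, 2.2, 3.6, 2.7,   # Study A
    2.1, 5.0, 2.6, 0.0, 3.3, 2.8, 2.3, 4.1),  # Study B
  nrow = length(genes),
  dimnames = list(genes, c("Study A", "Study B"))
)

# For each study (column), sort genes by value and keep the first five names.
top_n <- 5
top_genes <- apply(mean_values, 2, function(column) {
  names(sort(column, decreasing = TRUE))[seq_len(top_n)]
})
print(top_genes)
```

The result has one column per study and one row per rank, with the highest-valued gene in the first row.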

shortenStudyNames

a logical value. If set to TRUE, the function will try to remove the last part of the cancer names in order to shorten them. The removed segment usually contains the name of the scientific group that conducted the experiment.

geneLimit

if a large number of genes exists in at least one gene group, this option can be used to limit the number of genes that are shown on the heatmap. For instance, geneLimit=50 will limit the heatmap to the 50 genes showing the most variation across multiple studies / study subgroups. The default value is 50.

rankingMethod

a character value that determines how genes will be ranked prior to drawing the heatmap. "variation" orders the genes based on unique values in one or a few cancer studies, while "highValue" ranks the genes by whether they contain high values in many cancer studies. This option is useful when there are so many genes that the user has to limit the number shown on the heatmap with geneLimit.
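The difference between the two ranking ideas can be sketched with simple stand-ins. This is not the package's ranking code: row variance is used here as an assumed proxy for "variation" and row mean as an assumed proxy for "highValue", and the gene names, study names and values are made up.

```r
# Hypothetical frequency percentages: genes as rows, studies as columns.
values <- matrix(
  c(90,  5,  4,   # GeneA: high in one study only
    60, 62, 58,   # GeneB: high in every study
    10, 12, 11,   # GeneC: low in every study
     2, 70,  3),  # GeneD: high in one study only
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                  c("Study1", "Study2", "Study3"))
)
geneLimit <- 2

# Stand-in for "variation": genes whose values differ most between studies.
by_variation <- names(sort(apply(values, 1, var), decreasing = TRUE))

# Stand-in for "highValue": genes that are high on average across studies.
by_high_value <- names(sort(rowMeans(values), decreasing = TRUE))

# Keep only the first geneLimit genes of each ranking.
kept_variation <- by_variation[seq_len(geneLimit)]
kept_high_value <- by_high_value[seq_len(geneLimit)]
print(kept_variation)
print(kept_high_value)
```

With these numbers, the variance-based ranking keeps the two genes that stand out in a single study, while the mean-based ranking favours the gene that is high everywhere.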

heatmapFileFormat

This option enables the user to select the desired image file format of the heatmaps. The default value is "TIFF". Other supported formats include "JPG", "BMP", "PNG", and "PDF".

resolution

a number. This option can be used to adjust the resolution of the output heatmaps in dots per inch. The default value is 600.

RowCex

a number that specifies letter size in heatmap row names, which ranges from 0 to 2. If RowCex = "auto", the function will automatically determine the best RowCex.

ColCex

a number that specifies letter size in heatmap column names, which ranges from 0 to 2. If ColCex = "auto", the function will automatically determine the best ColCex.

heatmapMargines

a numeric vector that is used to set heatmap margins. If heatmapMargines = "auto", the function will automatically determine the best possible margins. Otherwise, enter the desired margins as e.g. c(10,10).

rowLabelsAngle

a number that determines the angle at which the gene names are shown in heatmaps. The default value is 0 degrees.

columnLabelsAngle

a number that determines the angle at which the study / study subgroup names are shown in heatmaps. The default value is 45 degrees.

heatmapColor

a character string that defines the heatmap color. The default value is 'RdBu'. 'RdGr' is also a popular color in genomic studies. To see the rest of the colors, run library(RColorBrewer) and then display.brewer.all().

reverseColor

a logical value that reverses the color gradient for heatmap(s).

transposedHeatmap

a logical value that transposes heatmap rows to columns and vice versa.

simplifyBy

a number that tells the function to change values smaller than it to zero. The purpose of this option is to make candidate genes easier to recognize. Therefore, it is not suited for publications. It has the same unit as cutoff.
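The effect of this option can be illustrated on a small named vector. This is a sketch of the described behavior only; the gene values and the threshold of 2 are assumptions chosen for illustration.

```r
# Hypothetical mean values for four genes in one study.
mean_values <- c(KDM1A = 3.1, KDM1B = 0.4, KDM2A = 2.6, KDM2B = 1.2)
simplifyBy <- 2

# Values smaller than simplifyBy are replaced with zero; others are kept.
simplified <- mean_values
simplified[simplified < simplifyBy] <- 0
print(simplified)
```

Only the genes at or above the threshold keep their values, which makes them stand out on a heatmap at the cost of hiding the lower values.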

genesToDrop

a character vector. Gene names within this vector will be omitted from the heatmap. The default value is FALSE.

transposeResults

a logical value that enables the function to swap the columns and rows of the data.

Details

Package: cbaf
Type: Package
Version: 1.19.5
Date: 2022-07-19
License: Artistic-2.0

Value

a BiocFileCache object that contains some or all of the following groups, based on what the user has chosen: obtainedData, validationResults, frequencyPercentage, Top.Genes.of.Frequency.Percentage, frequencyRatio, meanValue, Top.Genes.of.Mean.Value, medianValue, Top.Genes.of.Median.Value. It also saves these results in one Excel file for convenience. Based on preference, three heatmaps for frequency percentage, mean value and median value can be generated. If more than one group of genes is entered, the output for each group will be stored in a separate sub-directory.

Author(s)

Arman Shahrisa, shahrisa.arman@hotmail.com [maintainer, copyright holder]

Maryam Tahmasebi Birgani, tahmasebi-ma@ajums.ac.ir

Examples

genes <- list(K.demethylases = c("KDM1A", "KDM1B", "KDM2A", "KDM2B", "KDM3A",
 "KDM3B", "JMJD1C", "KDM4A"), K.methyltransferases = c("SUV39H1", "SUV39H2",
 "EHMT1", "EHMT2", "SETDB1", "SETDB2", "KMT2A", "KMT2A"))

studies <- c("Acute Myeloid Leukemia (TCGA, Provisional)",
"Adrenocortical Carcinoma (TCGA, Provisional)",
"Bladder Urothelial Carcinoma (TCGA, Provisional)",
"Brain Lower Grade Glioma (TCGA, Provisional)",
"Breast Invasive Carcinoma (TCGA, Provisional)")

processMultipleStudies(genes, "test2", studies, "RNA-Seq",
  calculate = c("frequencyPercentage", "frequencyRatio"),
  heatmapMargines = c(16,10), RowCex = 1, ColCex = 1)


armanshahrisa/cbaf documentation built on Nov. 5, 2022, 3:21 a.m.