groups_identification_mclust: Separate patients into groups

View source: R/function_noMclust.R

Description

groups_identification_mclust separates patients into groups using model-based clustering from the mclust package.

Usage

groups_identification_mclust(
  data_type,
  group_number = "",
  modelName = NULL,
  uncertainty_cutoff = 0.05,
  rerun_plots = FALSE,
  n_breaks = 55,
  width = 2000,
  height = 1500,
  res = 300,
  unit = "px",
  image_format = "png",
  save_data = TRUE,
  env,
  tumor,
  data_base,
  work_dir = "~/Desktop",
  name
)

Arguments

data_type

Type of data. It can be "methylation", "mutation", "clinical_supplement", "biospecimen", "gene", or "clinical" (biotab).

  • Only present in the "Legacy" database: "protein", "Exon quantification", "miRNA gene quantification", "miRNA isoform quantification", "isoform", and "image".

  • Only present in the "GDC" database: "miRNA Expression Quantification" and "Isoform Expression Quantification" (miRNA).

group_number

Numerical value indicating how many groups should be generated.

modelName

A character string indicating which mclust model name will be used. For more details, see the mclustModelNames help file.
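To illustrate what a model name and a group count mean in mclust itself, here is a minimal sketch that calls Mclust directly on simulated values. It assumes that group_number and modelName correspond to the G and modelNames arguments of Mclust, which this page does not state; the vector x is invented for illustration and is not TCGA data.

```r
# Runs only when the mclust package is installed.
if (requireNamespace("mclust", quietly = TRUE)) {
  suppressPackageStartupMessages(library(mclust))
  set.seed(42)

  # Simulated expression values for 100 hypothetical patients
  x <- c(rnorm(50, mean = 2, sd = 0.5), rnorm(50, mean = 8, sd = 0.5))

  # "E" is the univariate, equal-variance model; G is the number of groups
  # (assumed to play the roles of modelName and group_number)
  fit <- Mclust(x, G = 2, modelNames = "E", verbose = FALSE)

  # Number of patients assigned to each group
  print(table(fit$classification))
}
```

For a single numeric variable such as one gene's expression, only the univariate models "E" (equal variance) and "V" (variable variance) apply.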

uncertainty_cutoff

Numerical value giving the maximum classification uncertainty tolerated when separating patients. Patients whose uncertainty exceeds this threshold are removed from the analysis.
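The effect of the cutoff can be sketched in base R. In mclust, the uncertainty of an observation is one minus its largest posterior group probability. The example below assumes a two-group Gaussian mixture with known, made-up parameters (mclust would normally estimate them), and the values in expression_values are hypothetical.

```r
# Hypothetical expression values for eight patients
expression_values <- c(1.8, 2.1, 2.4, 4.9, 5.1, 7.7, 8.0, 8.3)

# Assumed mixture parameters (illustrative, not estimated from data)
means <- c(2, 8)
sd_common <- 1
weights <- c(0.5, 0.5)

# Weighted density of each patient under each group (8 x 2 matrix)
dens <- sapply(seq_along(means), function(k) {
  weights[k] * dnorm(expression_values, mean = means[k], sd = sd_common)
})

# Posterior probability of each group for every patient
posterior <- dens / rowSums(dens)

# Most likely group and the uncertainty of that assignment
group <- max.col(posterior, ties.method = "first")
uncertainty <- 1 - apply(posterior, 1, max)

# Patients over the cutoff are dropped
uncertainty_cutoff <- 0.05
keep <- uncertainty <= uncertainty_cutoff

print(data.frame(expression_values, group,
                 uncertainty = round(uncertainty, 4), keep))
```

The two patients near the midpoint between the groups (4.9 and 5.1) have an uncertainty of about 0.35 and are filtered out; the other six are kept.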

rerun_plots

Logical value where TRUE indicates that the function should rerun the group generation step, using the uncertainty_cutoff parameter to filter the data. The default is FALSE.

n_breaks

Numerical value giving the number of cells for the histogram (see the breaks argument of hist). The default is n_breaks = 55.

width, height, res, unit

Graphical parameters. See par for more details. The defaults are width = 2000, height = 1500, res = 300 and unit = "px".

image_format

A character string indicating which image format will be used. It can be "png" or "svg". The only unit available for "svg" is inches ("in"). The default is "png".
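The interaction between image_format and unit can be sketched with the base grDevices functions. This is an illustration only: open_plot_device and px_to_in are hypothetical helpers, not part of DOAGDC, and the package may open its devices differently.

```r
# Hypothetical helper: open a graphics device from the documented arguments.
open_plot_device <- function(file, image_format = "png",
                             width = 2000, height = 1500,
                             res = 300, unit = "px") {
  if (image_format == "png") {
    # png() accepts "px", "in", "cm" or "mm" through its units argument
    grDevices::png(file, width = width, height = height,
                   units = unit, res = res)
  } else if (image_format == "svg") {
    # svg() takes width and height in inches only
    if (unit != "in") stop("svg output requires unit = \"in\"")
    grDevices::svg(file, width = width, height = height)
  } else {
    stop("image_format must be \"png\" or \"svg\"")
  }
}

# Hypothetical helper: convert a size in pixels to inches at a resolution
px_to_in <- function(px, res) px / res

# The default 2000 x 1500 px at 300 ppi, expressed in inches
print(px_to_in(c(2000, 1500), res = 300))
```

When switching from "png" to "svg", the pixel defaults need to be converted this way, since passing 2000 and 1500 as inches would produce an extremely large image.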

save_data

Logical value where TRUE indicates that the concatenated and filtered matrix should be saved to local storage. The default is TRUE.

env

A character string containing the name of the environment that should be used. If none has been set yet, the function will create one in the global environment following the standard naming pattern:

  • 'tumor_data_base_data_type_tumor_data' or

  • 'tumor_data_base_data_type_both_data' (for tumor and non-tumor data in separate matrices).
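The naming pattern can be reproduced with a few lines of base R. This sketch assumes that the data_base value is upper-cased, which is inferred from the environment name CHOL_LEGACY_gene_tumor_data used in the Examples section rather than stated in this page.

```r
tumor <- "CHOL"
data_base <- "legacy"
data_type <- "gene"

# Build the name following 'tumor_data_base_data_type_tumor_data'
# (upper-casing data_base is an assumption based on the Examples section)
env_name <- paste(tumor, toupper(data_base), data_type, "tumor_data",
                  sep = "_")
print(env_name)

# Create the environment in the global environment if it is not there yet
if (!exists(env_name, envir = globalenv())) {
  assign(env_name, new.env(), envir = globalenv())
}
```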

tumor

A character string containing one of the 33 tumors available in the TCGA project. For instance, "BRCA" stands for breast cancer.

data_base

A character string specifying "GDC" for GDC Data Portal or "legacy" for GDC Legacy Archive.

work_dir

A character string specifying the path to the working directory.

name

A character string indicating the values to be used in the next analysis. For instance, "HIF3A" in the legacy gene expression matrix, "mir-1307" in the miRNA quantification matrix, or "HER2" in the protein quantification matrix.

Value

The groups generated by the mclust analysis.

Examples

# data already downloaded using the 'download_gdc' function
concatenate_expression("gene",
   name = "HIF3A",
   data_base = "legacy",
   tumor = "CHOL",
   work_dir = "~/Desktop"
)

# separating patients into two groups by HIF3A gene expression
groups_identification_mclust("gene", 2,
   name = "HIF3A",
   modelName = "E",
   env = CHOL_LEGACY_gene_tumor_data,
   tumor = "CHOL"
)

Facottons/DOAGDC documentation built on April 7, 2020, 3:17 a.m.