get_mutation_tables: Produce Training, Validation and Test Matrices


View source: R/maf_to_tables.R

Description

This function i) splits a mutation dataset into training, validation and test components, and ii) converts each component from annotated mutation format into a sparse mutation matrix, as described for the function get_table_from_maf().
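The splitting step can be pictured with the base-R sketch below. It is not ICBioMark's implementation: the helper name split_samples(), the normalisation of the split vector and the use of round() to size each component are assumptions made for illustration only.

```r
# Hypothetical helper (not part of ICBioMark): a seeded, proportional split
# of sample IDs into train/val/test, mirroring the split and seed_id arguments.
split_samples <- function(samples,
                          split = c(train = 0.7, val = 0.15, test = 0.15),
                          seed_id = 1234) {
  stopifnot(all(split > 0), setequal(names(split), c("train", "val", "test")))
  set.seed(seed_id)
  shuffled <- sample(samples)                        # random permutation
  # Assumption: proportions are rescaled so that they sum to 1.
  props <- split[c("train", "val", "test")] / sum(split)
  n_train <- round(props[["train"]] * length(shuffled))
  n_val <- round(props[["val"]] * length(shuffled))
  list(
    train = shuffled[seq_len(n_train)],
    val = shuffled[n_train + seq_len(n_val)],
    # Whatever remains after train and val goes to the test component.
    test = tail(shuffled, length(shuffled) - n_train - n_val)
  )
}

parts <- split_samples(paste0("SAMPLE_", 1:100))
lengths(parts)
```

Because set.seed() is called with the same seed_id each time, repeated calls allocate the samples identically, which is what makes the split reproducible.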

Usage

get_mutation_tables(
  maf,
  split = c(train = 0.7, val = 0.15, test = 0.15),
  sample_list = NULL,
  gene_list = NULL,
  acceptable_genes = NULL,
  for_biomarker = "TIB",
  include_synonymous = TRUE,
  dictionary = NULL,
  seed_id = 1234
)

Arguments

maf

(dataframe) A table of annotated mutations containing the columns 'Tumor_Sample_Barcode', 'Hugo_Symbol', and 'Variant_Classification'.

split

(double) A named vector of three positive values, with names 'train', 'val' and 'test', giving the proportions of the dataset assigned to the training, validation and test components respectively.

sample_list

(character) Optional parameter specifying the set of samples to include in the mutation matrices.

gene_list

(character) Optional parameter specifying the set of genes to include in the mutation matrices.

acceptable_genes

(character) Optional parameter specifying a set of acceptable genes, for example those present in an Ensembl database.

for_biomarker

(character) Used for defining a dictionary of mutations. See the function get_mutation_dictionary() for details.

include_synonymous

(logical) Optional parameter specifying whether to include synonymous mutations in the mutation matrices.

dictionary

(character) Optional parameter directly specifying the mutation dictionary to use. See the function get_mutation_dictionary() for details.

seed_id

(numeric) Value passed to the function set.seed(), so that the random allocation of samples to components is reproducible.
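The optional filtering arguments can be read as row filters on the input table. The sketch below is a hypothetical helper, not ICBioMark code; in particular, treating 'Silent' as the Variant_Classification value for synonymous mutations follows the usual MAF convention and is an assumption about how include_synonymous behaves. acceptable_genes is left out because its exact interaction with gene_list is not specified here.

```r
# Hypothetical helper (not part of ICBioMark): apply sample, gene and
# synonymous-mutation filters to a MAF-like data frame.
filter_maf <- function(maf, sample_list = NULL, gene_list = NULL,
                       include_synonymous = TRUE) {
  if (!is.null(sample_list)) {
    maf <- maf[maf$Tumor_Sample_Barcode %in% sample_list, , drop = FALSE]
  }
  if (!is.null(gene_list)) {
    maf <- maf[maf$Hugo_Symbol %in% gene_list, , drop = FALSE]
  }
  if (!include_synonymous) {
    # Assumption: synonymous mutations are labelled "Silent", as in standard MAF files.
    maf <- maf[maf$Variant_Classification != "Silent", , drop = FALSE]
  }
  maf
}

toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2"),
  Hugo_Symbol = c("TP53", "KRAS", "TP53"),
  Variant_Classification = c("Missense_Mutation", "Silent", "Nonsense_Mutation")
)
filter_maf(toy_maf, include_synonymous = FALSE)
```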

Value

A list of three items with names 'train', 'val' and 'test'. Each element contains a sparse mutation matrix for the samples in that component, alongside the other information described in the output of the function get_table_from_maf().
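To make "sparse mutation matrix" concrete, the sketch below builds a samples-by-genes count matrix from a MAF-like table with Matrix::sparseMatrix(). It is an illustration under stated assumptions, not the package's own conversion: the helper maf_to_sparse() is hypothetical, and get_table_from_maf() may define its columns differently (for example, per mutation type via the dictionary argument).

```r
library(Matrix)

# Hypothetical helper (not part of ICBioMark): count mutations per
# (sample, gene) pair and store them in a sparse matrix.
maf_to_sparse <- function(maf) {
  samples <- factor(maf$Tumor_Sample_Barcode)
  genes <- factor(maf$Hugo_Symbol)
  sparseMatrix(
    i = as.integer(samples),
    j = as.integer(genes),
    x = 1,  # one count per MAF row; repeated (i, j) pairs are summed
    dims = c(nlevels(samples), nlevels(genes)),
    dimnames = list(levels(samples), levels(genes))
  )
}

toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S1"),
  Hugo_Symbol = c("TP53", "KRAS", "TP53", "TP53")
)
maf_to_sparse(toy_maf)
```

A sparse representation suits this data because most sample and gene pairs carry no mutation, so only the non-zero counts are stored.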

Examples

library(ICBioMark)

tables <- get_mutation_tables(
  example_maf_data$maf,
  sample_list = paste0("SAMPLE_", 1:100)
)

print(names(tables))
print(names(tables$train))

ICBioMark documentation built on Nov. 15, 2021, 5:09 p.m.