get_table_from_maf: Produce a Mutation Matrix from a MAF

View source: R/maf_to_tables.R

get_table_from_maf    R Documentation

Produce a Mutation Matrix from a MAF

Description

Given a mutation annotation dataset with columns for sample barcode, gene name and mutation type, this function reformulates it as a mutation matrix, with rows denoting samples, columns denoting gene/mutation type combinations, and individual entries giving the number of mutations observed. The matrix will likely be very sparse, so it is stored as a sparse matrix for efficiency.
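The reshaping described above can be sketched on toy data with the Matrix package. This is only an illustration of the idea, not the package's actual implementation: the toy values, the two-entry dictionary and the column ordering are all assumptions made for the example.

```r
library(Matrix)

# Toy MAF-like data frame (hypothetical values) with the three required columns.
maf <- data.frame(
  Tumor_Sample_Barcode = c("SAMPLE_1", "SAMPLE_1", "SAMPLE_2", "SAMPLE_2", "SAMPLE_3"),
  Hugo_Symbol = c("GENE1", "GENE1", "GENE1", "GENE2", "GENE2"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation", "Silent",
                             "Missense_Mutation", "Silent"),
  stringsAsFactors = FALSE
)

# Hypothetical dictionary grouping variant classifications into mutation types.
dictionary <- c(Missense_Mutation = "NS", Silent = "S")

sample_list <- unique(maf$Tumor_Sample_Barcode)
gene_list <- unique(maf$Hugo_Symbol)
mut_types_list <- unique(unname(dictionary))

# One column per gene/mutation type combination (the ordering here is an assumption).
col_names <- as.vector(outer(gene_list, mut_types_list, paste, sep = "_"))

# Map every mutation record to its row (sample) and column (gene_type).
row_idx <- match(maf$Tumor_Sample_Barcode, sample_list)
col_idx <- match(
  paste(maf$Hugo_Symbol, dictionary[maf$Variant_Classification], sep = "_"),
  col_names
)

# sparseMatrix() sums the x values of duplicated (i, j) pairs by default,
# so repeated mutations in the same sample and column are counted.
mutation_matrix <- sparseMatrix(
  i = row_idx,
  j = col_idx,
  x = rep(1, length(row_idx)),
  dims = c(length(sample_list), length(col_names)),
  dimnames = list(sample_list, col_names)
)

print(mutation_matrix)
```

Only the non-zero counts are stored, which is what makes the sparse representation worthwhile when most sample/gene/type combinations have no mutations.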

Usage

get_table_from_maf(
  maf,
  sample_list = NULL,
  gene_list = NULL,
  acceptable_genes = NULL,
  for_biomarker = "TIB",
  include_synonymous = TRUE,
  dictionary = NULL
)

Arguments

maf

(dataframe) A table of annotated mutations containing the columns 'Tumor_Sample_Barcode', 'Hugo_Symbol', and 'Variant_Classification'.

sample_list

(character) Optional parameter specifying the set of samples to include in the mutation matrix.

gene_list

(character) Optional parameter specifying the set of genes to include in the mutation matrix.

acceptable_genes

(character) Optional parameter specifying a set of acceptable genes, for example those which are in an Ensembl database.

for_biomarker

(character) Used for defining a dictionary of mutations. See the function get_mutation_dictionary() for details.

include_synonymous

(logical) Optional parameter specifying whether to include synonymous mutations in the mutation matrix.

dictionary

(character) Optional parameter directly specifying the mutation dictionary to use. See the function get_mutation_dictionary() for details.

Value

A list with the following entries:

  • matrix: A mutation matrix, a sparse matrix showing the number of mutations present for each combination of sample, gene and mutation type.

  • sample_list: A vector of characters specifying the samples included in the matrix: the rows of the mutation matrix correspond to each of these.

  • gene_list: A vector of characters specifying the genes included in the matrix.

  • mut_types_list: A vector of characters specifying the mutation types (as grouped into an appropriate dictionary) to be included in the matrix.

  • col_names: A vector of characters identifying the columns of the mutation matrix. Each entry consists of two parts separated by the character '_', the first identifying the gene in question and the second identifying the mutation type. E.g. 'GENE1_NS', where 'GENE1' is an element of gene_list and 'NS' is an element of the dictionary vector.
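As a small illustration of the col_names convention, the sketch below splits such labels back into their gene and mutation type parts using base R. The example labels are made up, and splitting at the last '_' assumes that mutation type labels themselves contain no underscore.

```r
# Hypothetical column labels following the 'GENE_TYPE' convention.
col_names <- c("GENE1_NS", "GENE1_S", "GENE2_NS")

# Everything before the last underscore is taken as the gene name.
gene <- sub("_[^_]*$", "", col_names)

# Everything after the last underscore is taken as the mutation type.
mut_type <- sub("^.*_", "", col_names)

print(data.frame(col_name = col_names, gene = gene, mut_type = mut_type))
```

A lookup like this can help when aggregating columns of the mutation matrix by gene or by mutation type.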

Examples

# We use the preloaded maf file example_maf_data
# Now we make a mutation matrix
table <- get_table_from_maf(example_maf_data$maf, sample_list = paste0("SAMPLE_", 1:100))

print(names(table))
print(table$matrix[1:10,1:10])
print(table$col_names[1:10])

cobrbra/ICBioMark documentation built on May 4, 2023, 2:16 a.m.