get_flagged_alleles: Get Flagged Alleles

View source: R/get_flagged_alleles.R

Description

Flag alleles that are present in too many samples at high variant allele frequencies as potential errors.
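
As a rough illustration of the idea in the description, the base-R sketch below counts how many samples carry each allele at or above a VAF threshold and flags alleles seen in too many samples. It is a hypothetical sketch, not ECSI's implementation. The data-frame layout (columns `sample`, `chr`, `pos`, `ref`, `alt`, `vaf`), the function name `flag_recurrent_high_vaf()` and its parameters `vaf_threshold` and `max_samples` are all assumptions. The fixed threshold also stands in for the percentile-based VAF scan controlled by `starting_percentile` and `interval`.

```r
# Hypothetical sketch (not ECSI code): flag alleles present at high VAF in too many samples.
# `calls` is assumed to be a data frame with columns sample, chr, pos, ref, alt, vaf.
flag_recurrent_high_vaf <- function(calls, vaf_threshold = 0.01, max_samples = 2) {
  # Keep only calls at or above the VAF threshold
  high <- calls[calls$vaf >= vaf_threshold, , drop = FALSE]
  # Build one key per allele: chr:pos:ref>alt
  key <- paste0(high$chr, ":", high$pos, ":", high$ref, ">", high$alt)
  # Count distinct samples carrying each allele at high VAF
  n_samples <- tapply(high$sample, key, function(s) length(unique(s)))
  # Return the alleles seen in more samples than allowed
  names(n_samples)[n_samples > max_samples]
}

# Toy input: one allele recurs in three samples, another passes the threshold only once
calls <- data.frame(
  sample = c("s1", "s2", "s3", "s1", "s2"),
  chr    = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  pos    = c(100, 100, 100, 500, 500),
  ref    = c("C", "C", "C", "G", "G"),
  alt    = c("T", "T", "T", "A", "A"),
  vaf    = c(0.02, 0.03, 0.05, 0.002, 0.04)
)

flagged <- flag_recurrent_high_vaf(calls, vaf_threshold = 0.01, max_samples = 2)
print(flagged)
# [1] "chr1:100:C>T"
```

Here only chr1:100 C>T is flagged: it reaches VAF >= 0.01 in three samples, while the chr2 allele does so in just one.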

Usage

get_flagged_alleles(
  sample_names,
  sample_paths,
  genome,
  recurrent_mutations = NA,
  memory_saving = FALSE,
  starting_percentile = 99,
  interval = 0.001,
  MAPQcutoff = 59
)

Arguments

sample_names

Character vector with the names of the samples

sample_paths

Character vector with the paths of the samples

genome

Reference genome build of the samples, e.g. "hg19"

recurrent_mutations

VRanges object with the chr, pos, ref and alt of frequently mutated alleles to remove from the model input. GRanges objects are also accepted, in which case filtering is done by position only.

memory_saving

Logical. If TRUE, reduces memory use when processing many samples (e.g. more than 500 samples on a machine with 16 GB of RAM), but takes about twice as long to run.

starting_percentile

Lower VAF percentile at which to start looking for alleles to flag. Default is 99; use a lower value such as 95 to flag more alleles (a more conservative setting).

interval

VAF increment used when iterating over VAF values to flag alleles. Default is 0.001.

MAPQcutoff

Minimum acceptable MAPQ score; positions below this cutoff will be excluded. Default is 59
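
The filtering and scanning arguments can be illustrated with a small base-R sketch. It is hypothetical and not the package code. It assumes a data frame of calls with columns `chr`, `pos`, `ref`, `alt`, `vaf` and a per-call `mapq` column, and it represents `recurrent_mutations` as a plain data frame rather than a VRanges or GRanges object. It also assumes that the VAF scan starts at the `starting_percentile` quantile of the observed VAFs and steps upward by `interval`. The helper names `filter_mapq()`, `remove_recurrent()` and `vaf_scan_grid()` are made up for illustration.

```r
# Hypothetical sketch (not ECSI code) of the MAPQ filter, recurrent-mutation
# removal and VAF scan grid described by the arguments above.

# Drop calls below the MAPQ cutoff (assumes a per-call `mapq` column).
filter_mapq <- function(calls, MAPQcutoff = 59) {
  calls[calls$mapq >= MAPQcutoff, , drop = FALSE]
}

# Remove recurrent mutations. by_position = TRUE mimics GRanges input
# (match on chr and pos only); FALSE mimics VRanges input (match chr, pos, ref, alt).
remove_recurrent <- function(calls, recurrent, by_position = FALSE) {
  cols <- if (by_position) c("chr", "pos") else c("chr", "pos", "ref", "alt")
  call_key <- do.call(paste, c(calls[cols], sep = ":"))
  rec_key  <- do.call(paste, c(recurrent[cols], sep = ":"))
  calls[!call_key %in% rec_key, , drop = FALSE]
}

# VAF thresholds to scan: start at the starting_percentile quantile of the
# observed VAFs and step up by `interval` (assumed scan direction).
vaf_scan_grid <- function(vaf, starting_percentile = 99, interval = 0.001) {
  start <- unname(quantile(vaf, starting_percentile / 100))
  seq(start, max(vaf), by = interval)
}

# Toy input
calls <- data.frame(
  chr  = c("chr1", "chr1", "chr2", "chr3"),
  pos  = c(100, 100, 200, 300),
  ref  = c("C", "C", "G", "A"),
  alt  = c("T", "A", "A", "G"),
  vaf  = c(0.010, 0.004, 0.020, 0.003),
  mapq = c(60, 60, 60, 20)
)
recurrent <- data.frame(chr = "chr1", pos = 100, ref = "C", alt = "T")

kept      <- filter_mapq(calls, MAPQcutoff = 59)                    # drops the chr3 call (MAPQ 20)
by_allele <- remove_recurrent(kept, recurrent)                      # drops chr1:100 C>T only
by_pos    <- remove_recurrent(kept, recurrent, by_position = TRUE)  # drops both chr1:100 calls
grid      <- vaf_scan_grid(kept$vaf, starting_percentile = 50, interval = 0.005)
```

The by-position variant shows why passing a GRanges object is the broader filter: every allele at a listed position is removed, not just the listed ref/alt pair.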

Value

This function returns a VRanges object with the following information:

Examples

## Not run: 
# get list of file names
file_names <- list.files(path = "./data/", pattern = "sample")
hemeCOSMIC_3 <- load_recurrent_mutations("example_data/COSMIC_heme_freq3.txt", genome = "hg19")

# sample names are first 10 characters of file name
all_sample_names <- substr(file_names, 1, 10)

# file paths are dir/file_name
all_sample_paths <- paste0("./data/", file_names)

# get flagged alleles
flagged_alleles <- get_flagged_alleles(all_sample_names, all_sample_paths,
    genome = "hg19", recurrent_mutations = hemeCOSMIC_3, memory_saving = FALSE)

## End(Not run)

andygxzeng/ECSI documentation built on Feb. 6, 2021, 8:53 a.m.