extend_positions: Add nucleotide positions and extract sequence contexts


View source: R/extend_positions.R

Description

Add genomic positions to generate a dataset of mutated and non-mutated nucleotide positions. Supply known variants either through a data.frame using the 'mut_df' argument, or by supplying the genomic positions and variants through the three arguments 'chrom', 'pos' and 'variant'. Currently this function only allows positions where all contextual nucleotides are non-ambiguous.

Usage

extend_positions(
  genome,
  N = NULL,
  N_factor = NULL,
  mut_df = NULL,
  chrom = NULL,
  pos = NULL,
  variant = NULL,
  regions.gr = NULL,
  n_max_muts = NULL,
  flank_size = 10,
  num_retries = 10,
  is_pyrbased = FALSE,
  olap = TRUE,
  remove_ambiguous = TRUE
)
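
A minimal call might look like the sketch below. It assumes that the contextendR package and a BSgenome data package are installed (BSgenome.Hsapiens.UCSC.hg19 is used here only for illustration). The coordinates and alleles are made-up values and should be replaced with real variant calls. The return value is not shown because this page does not describe it.

```r
# Toy mutation table with the three columns described for 'mut_df'.
# The coordinates and alleles are invented for illustration only.
mut_df <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  pos   = c(1000000L, 2500000L, 3000000L),
  alt   = c("T", "A", "G"),
  stringsAsFactors = FALSE
)

# Run the call only if both packages are available.
# Assumption: hg19 is the reference that the positions refer to.
have_pkgs <- requireNamespace("contextendR", quietly = TRUE) &&
  requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)

if (have_pkgs) {
  genome <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
  res <- contextendR::extend_positions(
    genome     = genome,
    mut_df     = mut_df,
    N_factor   = 2,    # target size relative to the number of mutations
    flank_size = 10    # bases of context on each side
  )
}
```

The same variants could presumably be passed as separate vectors through 'chrom', 'pos' and 'variant' instead of 'mut_df'.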

Arguments

genome

BSgenome object. The genome which the regions and annotated positions refer to

N

Integer. Number of positions to include in final dataset. Can be used instead of 'N_factor'

N_factor

Integer. Multiplied by the number of mutations (after subsetting) to reach a target size. Can be used instead of specifying the size directly with 'N'
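
The relationship between 'N_factor' and the target size can be sketched as plain arithmetic. The numbers below are hypothetical, and whether the target counts the mutated positions themselves is not stated on this page.

```r
# Hypothetical numbers, for illustration only.
n_mutations <- 250L   # mutations left after any subsetting
N_factor    <- 4L

# Target dataset size implied by 'N_factor'.
target_size <- N_factor * n_mutations
target_size
```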

mut_df

data.frame. A data frame containing positions and variant annotations in three columns: 'chrom', 'pos' and 'alt'

chrom

Character. A vector of chromosomes. Not necessary if 'mut_df' is supplied

pos

Integer. A vector of positions. Not necessary if 'mut_df' is supplied

variant

Character. A vector of variants. Not necessary if 'mut_df' is supplied

regions.gr

Optional GRanges object. Genomic regions from which to draw random positions

n_max_muts

Optional Integer. Filter mutation dataset to only include 'n_max_muts' number of mutations

flank_size

Integer. Size of region on each side of mutated and sampled positions to include as contextual regions
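
As a rough illustration of what a flank of this size means, the base-R sketch below cuts a window around one position of a toy string. It assumes the context is symmetric and includes the central position, which gives a width of 2 * flank_size + 1. The package itself works on a BSgenome object rather than a character string.

```r
# Toy sequence and position, invented for illustration.
seq_string <- "ACGTACGTACGTACGTACGTACGTACGTACG"
pos        <- 16L
flank_size <- 10L

# Window of 'flank_size' bases on each side, plus the central base.
context <- substr(seq_string, pos - flank_size, pos + flank_size)

# The central base of the window is the base at 'pos'.
centre <- substr(context, flank_size + 1L, flank_size + 1L)

nchar(context) == 2L * flank_size + 1L
```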

num_retries

Integer.

is_pyrbased

Bool.

olap

Bool. Overlap the current mutations with regions?

remove_ambiguous

Bool. Remove positions where contextual sequences contain ambiguous nucleotides? If so, new positions will be sampled to reach 'N' positions in total
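
The kind of filter this argument describes can be sketched in base R. The example assumes that "ambiguous" means any character other than A, C, G or T (for instance the IUPAC code N). The package's exact definition is not given on this page.

```r
# Toy context sequences; the second contains the ambiguity code "N".
contexts <- c("ACGTACGTACG", "ACGTNCGTACG", "TTGACCATGCA")

# Assumption: "ambiguous" means any character other than A, C, G or T.
is_ambiguous <- grepl("[^ACGT]", contexts)

# Keep only the unambiguous contexts.
kept <- contexts[!is_ambiguous]
```

In the documented behaviour, removed positions are replaced by newly sampled ones until 'N' positions are reached. That resampling step is not reproduced here.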


lindberg-m/contextendR documentation built on Jan. 8, 2022, 3:16 a.m.