seqkat: SeqKat

Description

Kataegis detection from SNV BED files

Usage

seqkat(sigcutoff = 5, mutdistance = 3.2, segnum = 4, ref.dir = NULL,
  bed.file = "./", output.dir = "./", chromosome = "all",
  chromosome.length.file = NULL, trinucleotide.count.file = NULL)

Arguments

sigcutoff

The minimum hypermutation score a window in the sliding binomial test must reach to be classified as significant. The score is calculated per window as -log10(binomial test p-value). Recommended value: 5
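As a rough illustration of how such a score behaves, the sketch below computes -log10 of a binomial test p-value for a single made-up window using base R's binom.test(). The window size, SNV count, background rate, and the one-sided alternative are all assumptions chosen for the example; they are not taken from SeqKat's implementation.

```r
# Hypothetical window: 6 SNVs observed in 1,000 bases, with an assumed
# background per-base mutation probability of 1e-5 (illustrative values only).
observed.snvs <- 6;
window.size <- 1000;
background.rate <- 1e-5;

# One-sided exact binomial test for an excess of mutations in the window
# (the choice of alternative is an assumption of this sketch).
p.value <- binom.test(
	x = observed.snvs,
	n = window.size,
	p = background.rate,
	alternative = "greater"
	)$p.value;

# Score as described above: -log10 of the binomial test p-value.
hypermutation.score <- -log10(p.value);

# A window is treated as significant when its score reaches sigcutoff.
sigcutoff <- 5;
is.significant <- hypermutation.score >= sigcutoff;
```

With a cutoff of 5, a window is kept only when its p-value is at most 1e-5, so raising sigcutoff makes the test stricter.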

mutdistance

The maximum intermutational distance allowed for SNVs to be grouped in the same kataegic event. Recommended value: 3.2

segnum

The minimum number of mutations required within a cluster for it to be identified as kataegic. Recommended value: 4
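The interplay between a maximum intermutational distance and a minimum mutation count can be sketched in a few lines of base R. This is only a toy illustration, not SeqKat's algorithm: the positions are invented, max.gap is a hypothetical threshold expressed directly in base pairs (this page does not state the units of mutdistance), and min.count plays the role of segnum.

```r
# Hypothetical SNV positions on one chromosome (base pairs, sorted).
snv.positions <- c(1000, 1200, 1350, 1500, 90000, 250000, 250400);

# Illustrative thresholds: max.gap is a made-up distance in base pairs;
# min.count plays the role of segnum.
max.gap <- 1000;
min.count <- 4;

# Distance from each SNV to the previous one.
intermutation.distance <- diff(snv.positions);

# Start a new cluster whenever the gap to the previous SNV exceeds max.gap.
cluster.id <- cumsum(c(TRUE, intermutation.distance > max.gap));

# Keep only clusters that contain at least min.count SNVs.
cluster.size <- table(cluster.id);
kept.clusters <- names(cluster.size)[cluster.size >= min.count];
candidate.snvs <- snv.positions[cluster.id %in% kept.clusters];
```

Here only the first four positions survive: the isolated SNV and the pair near 250 kb fall below the minimum count.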

ref.dir

Path to a directory containing the reference genome. Each chromosome should have its own .fa file, with chromosomes X and Y named chr23 and chr24. The FASTA files should not contain a header.

bed.file

Path to the SNV BED file. The BED file should contain the following information: Chromosome, Position, Reference allele, Alternate allele
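For orientation, the sketch below writes a small four-column file in the order listed above and reads it back. Everything about the layout beyond the column order is an assumption of this example (tab separation, no header row, the "chr4" naming, and the invented positions and alleles); the test file bundled with the package is the authoritative reference for the expected format.

```r
# Hypothetical SNVs; the column order follows the description above.
snvs <- data.frame(
	chromosome = c("chr4", "chr4", "chr4"),
	position = c(10500L, 10732L, 11020L),
	reference = c("C", "C", "G"),
	alternate = c("T", "G", "A"),
	stringsAsFactors = FALSE
	);

# Write a tab-separated file without a header row or row names
# (both choices are assumptions of this sketch).
bed.path <- file.path(tempdir(), "example_snvs.bed");
write.table(
	snvs,
	file = bed.path,
	sep = "\t",
	quote = FALSE,
	row.names = FALSE,
	col.names = FALSE
	);

# Read the file back to check that the four columns survive intact.
round.trip <- read.table(
	bed.path,
	sep = "\t",
	header = FALSE,
	colClasses = c("character", "integer", "character", "character")
	);
```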

output.dir

Path to a directory where output will be created.

chromosome

The chromosome to be analysed. This can be (1, 2, ..., 23, 24) or "all" to run sequentially on all chromosomes.

chromosome.length.file

A tab-separated file containing the lengths of all chromosomes in the reference genome.

trinucleotide.count.file

A tab-separated file containing a count of all trinucleotides present in the reference genome. This can be generated with the get.trinucleotide.counts() function in this package.

Details

The default parameters in SeqKat have been optimized using the dataset from Alexandrov et al., "Signatures of mutational processes in human cancer". SeqKat accepts a BED file and writes its results in TXT format. One file per chromosome is generated when a kataegic event is detected on that chromosome; otherwise no file is generated. SeqKat reports two scores per kataegic event: a hypermutation score and an APOBEC-mediated kataegic score.

Author(s)

Fouad Yousif

Fan Fan

Christopher Lalansingh

Examples

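# Locate the test SNV BED file shipped with the package
example.bed.file <- paste0(
	path.package("SeqKat"),
	"/extdata/test/PD4120a-chr4-1-2000000_test_snvs.bed"
	);
# Directory holding the test reference FASTA files
example.ref.dir <- paste0(
	path.package("SeqKat"),
	"/extdata/test/ref/"
	);
# Tab-separated chromosome lengths for the test reference
example.chromosome.length.file <- paste0(
	path.package("SeqKat"),
	"/extdata/test/length_hg19_chr_test.txt"
	);
# Run kataegis detection on chromosome 4 only, writing results to a temporary directory
seqkat(
	5,   # sigcutoff
	3.2, # mutdistance
	2,   # segnum
	bed.file = example.bed.file,
	output.dir = tempdir(),
	chromosome = "4",
	ref.dir = example.ref.dir,
	chromosome.length.file = example.chromosome.length.file
	);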

Example output

Loading required package: foreach
Loading required package: doParallel
Loading required package: iterators
Loading required package: parallel
[1] "/usr/local/lib/R/site-library/SeqKat/extdata/test/PD4120a-chr4-1-2000000_test_snvs.bed"
Testing Chromosome 4
Warning messages:
1: In seqkat(5, 3.2, 2, bed.file = example.bed.file, output.dir = tempdir(),  :
  No trinucleotide.count.file provided, using hg19 counts by default. This file can be generated using the get.trinucleotide.counts() function in this package.
2: In ref.mut[, 2]/ref.genome[, 2] :
  longer object length is not a multiple of shorter object length

SeqKat documentation built on March 13, 2020, 1:59 a.m.