run_kmer_frequency_normalization: Provide normalized correction factors for kmer content


View source: R/interact_foreign.R

Description

This function is analogous to normalizeMotifs. If an analysis of mutational signatures is performed on data from a targeted assay, e.g. Whole Exome Sequencing (WES), the signatures and exposures have to be adapted to the potentially different kmer (e.g. trinucleotide) content of the target capture. The function takes as arguments the paths to the reference genome and to the target capture file. It then extracts the sequence of the target capture by calling bedtools getfasta through a system command. run_kmer_frequency_normalization then calls a custom Perl script, kmer_frequencies.pl, which is also included in this package, to count the occurrences of the kmers (e.g. triplets) in both the whole reference genome and the extracted target capture sequence. These counts are used for normalization as in normalizeMotifs. Note that kmerFrequency offers an alternative that approximates kmer frequencies by random sampling. In contrast, the function described here deterministically counts all occurrences of the kmers in the respective sequences.
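
The counting and normalization idea described above can be illustrated with a small base R sketch on toy sequences. It is not the package's implementation: count_kmers is a hypothetical helper standing in for kmer_frequencies.pl, the two strings stand in for the reference genome and the target capture sequence, and the correction factor is assumed here to be the ratio of relative kmer frequencies (genome over target capture). The exact definition used by normalizeMotifs should be checked in its documentation.

```r
# Toy sketch of deterministic kmer counting and assumed correction factors.
# count_kmers() is a hypothetical helper, not a YAPSA function.
count_kmers <- function(in_seq, in_word_length) {
  in_seq <- toupper(in_seq)
  n_start <- nchar(in_seq) - in_word_length + 1
  if (n_start < 1) return(table(character(0)))
  starts <- seq_len(n_start)
  # Slide a window of width in_word_length over every start position.
  kmers <- substring(in_seq, starts, starts + in_word_length - 1)
  # Skip windows containing anything other than A, C, G, T (e.g. N).
  kmers <- kmers[!grepl("[^ACGT]", kmers)]
  table(kmers)
}

genome_seq <- "ACGTACGTTTGACGTAACGT"  # stand-in for the reference genome
target_seq <- "ACGTACGT"              # stand-in for the target capture

genome_counts <- count_kmers(genome_seq, 3)
target_counts <- count_kmers(target_seq, 3)

# Restrict to kmers observed in both sequences to avoid division by zero.
shared <- intersect(names(genome_counts), names(target_counts))
genome_freq <- as.numeric(genome_counts[shared]) / sum(genome_counts)
target_freq <- as.numeric(target_counts[shared]) / sum(target_counts)

# ASSUMPTION: correction factor = relative frequency in the genome divided
# by relative frequency in the target capture.
correction_factors <- genome_freq / target_freq
names(correction_factors) <- shared
print(round(correction_factors, 3))
```

Because every window is visited exactly once, repeated runs on the same input give identical counts, which is the difference from a sampling-based approximation.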

Usage

run_kmer_frequency_normalization(
  in_ref_genome_fasta,
  in_target_capture_bed,
  in_word_length,
  project_folder,
  in_verbose = 1
)

Arguments

in_ref_genome_fasta

Path to the reference genome fasta file used.

in_target_capture_bed

Path to a bed file containing the information on the used target capture. May also be a compressed bed.

in_word_length

Integer defining the length of the features or motifs, e.g. 3 for triplets or 5 for pentamers.

project_folder

Path where the created files, especially the fasta file with the sequence of the target capture and the count matrices, can be stored.

in_verbose

Verbose output if in_verbose = 1 (the default).
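
To show how the path arguments relate to the external extraction step, the following sketch assembles a bedtools getfasta command line in R. The input paths and the output file name target_capture.fa are hypothetical, and the command the package actually builds internally may differ. Only the standard bedtools getfasta options -fi, -bed and -fo are used.

```r
# Hypothetical inputs; replace with real files before running the command.
in_ref_genome_fasta <- "reference_genome.fa"
in_target_capture_bed <- "target_capture.bed"
project_folder <- tempdir()

# Hypothetical output name for the extracted target capture sequence.
out_fasta <- file.path(project_folder, "target_capture.fa")

# shQuote() protects paths that contain spaces or shell metacharacters.
bedtools_cmd <- paste(
  "bedtools getfasta",
  "-fi", shQuote(in_ref_genome_fasta),
  "-bed", shQuote(in_target_capture_bed),
  "-fo", shQuote(out_fasta)
)
print(bedtools_cmd)

# Run only if bedtools is on the PATH and both input files exist.
inputs_ok <- file.exists(in_ref_genome_fasta) &&
  file.exists(in_target_capture_bed)
if (nzchar(Sys.which("bedtools")) && inputs_ok) {
  system(bedtools_cmd)
}
```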

Value

A numeric vector with correction factors

See Also

normalizeMotifs

Examples

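
A guarded call might look like the sketch below. The file paths are hypothetical placeholders, and the call is skipped unless YAPSA is installed and both input files exist. Since the function relies on bedtools and Perl, those tools are assumed to be available on the system as well.

```r
# Hypothetical paths; point these at a real reference genome and bed file.
ref_fasta <- "reference_genome.fa"
capture_bed <- "target_capture.bed"
out_dir <- tempdir()

# Only attempt the call when the package and both inputs are available.
can_run <- requireNamespace("YAPSA", quietly = TRUE) &&
  file.exists(ref_fasta) && file.exists(capture_bed)

if (can_run) {
  correction_factors <- YAPSA::run_kmer_frequency_normalization(
    in_ref_genome_fasta = ref_fasta,
    in_target_capture_bed = capture_bed,
    in_word_length = 3,        # triplets
    project_folder = out_dir,
    in_verbose = 1
  )
  print(head(correction_factors))
} else {
  message("Skipping: YAPSA or the input files are not available.")
}
```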

YAPSA documentation built on Nov. 8, 2020, 4:59 p.m.