This function is analogous to
normalizeMotifs. If an analysis of
mutational signatures is performed on e.g. Whole Exome Sequencing (WES)
data, the signatures and exposures have to be adapted to the potentially
different kmer (trinucleotide) content of the target capture. The present
function takes as arguments the paths to the reference genome and the target
capture file that were used. It then extracts the sequence of the target capture by calling
bedtools getfasta on the system command prompt.
run_kmer_frequency_normalization then calls a custom Perl script,
kmer_frequencies.pl, also included in this package, to count the
occurrences of the triplets in both the whole reference genome and the
created target capture sequence. These counts are used for normalization as in
normalizeMotifs. Note that
kmerFrequency provides a solution to
approximate kmer frequencies by random sampling. As opposed to that
approach, the function described here deterministically counts all
occurrences of the kmers in the respective genome.
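The deterministic counting described above can be illustrated with a minimal base R sketch. It does not reproduce the package's Perl script; the helper names (count_kmers, to_vector) and the toy sequences are hypothetical, and the ratio of relative frequencies used as a "correction factor" is an assumed form, since the exact normalization formula is not stated here.

```r
# Count every overlapping k-mer in a sequence with a sliding window.
# count_kmers is a hypothetical helper, not part of the package.
count_kmers <- function(sequence, k) {
  sequence <- toupper(sequence)
  n <- nchar(sequence)
  if (n < k) return(table(character(0)))
  starts <- seq_len(n - k + 1)
  kmers <- substring(sequence, starts, starts + k - 1)
  # Drop windows containing anything other than A, C, G, T (e.g. N).
  kmers <- kmers[grepl("^[ACGT]+$", kmers)]
  table(kmers)
}

# Toy stand-ins (hypothetical) for the reference genome and the
# extracted target capture sequence.
genome_seq <- "ACGTACGTTTGACCA"
target_seq <- "ACGTACG"
k <- 3

genome_counts <- count_kmers(genome_seq, k)
target_counts <- count_kmers(target_seq, k)

# Put both count tables on the same set of k-mers, filling gaps with 0.
to_vector <- function(counts, keys) {
  out <- setNames(numeric(length(keys)), keys)
  out[names(counts)] <- as.numeric(counts)
  out
}
all_kmers <- sort(union(names(genome_counts), names(target_counts)))
genome_vec <- to_vector(genome_counts, all_kmers)
target_vec <- to_vector(target_counts, all_kmers)

# Assumed form of a correction factor: relative k-mer frequency in the
# genome divided by the relative frequency in the target capture.
genome_freq <- genome_vec / sum(genome_vec)
target_freq <- target_vec / sum(target_vec)
correction <- genome_freq / target_freq

print(round(correction, 3))
```

K-mers that never occur in the target sequence yield Inf in this sketch; a real analysis would need to decide how to treat such motifs.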
run_kmer_frequency_correction(in_ref_genome_fasta, in_target_capture_bed, in_word_length, project_folder, target_capture_fasta = "targetCapture.fa", in_verbose = 1)
in_ref_genome_fasta: Path to the reference genome fasta file used.
in_target_capture_bed: Path to a bed file containing the information on the used target capture. May also be a compressed bed.
in_word_length: Integer number defining the length of the features or motifs, e.g. 3 for triplets or 5 for pentamers.
project_folder: Path where the created files, especially the fasta file with the sequence of the target capture and the count matrices, can be stored.
target_capture_fasta: Name of the fasta file of the target capture to be created if not yet existent.
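As a rough illustration of how the path arguments could map onto the bedtools getfasta step, the sketch below assembles the command-line arguments in R. The helper build_getfasta_args and the example paths are hypothetical, and the package's actual invocation may differ; only the bedtools getfasta flags -fi, -bed and -fo are taken from the bedtools command-line interface.

```r
# Assemble the arguments for `bedtools getfasta`.
# build_getfasta_args is a hypothetical helper, not part of the package.
build_getfasta_args <- function(in_ref_genome_fasta, in_target_capture_bed,
                                project_folder,
                                target_capture_fasta = "targetCapture.fa") {
  out_fasta <- file.path(project_folder, target_capture_fasta)
  c("getfasta",
    "-fi", in_ref_genome_fasta,     # input reference genome (fasta)
    "-bed", in_target_capture_bed,  # target capture regions (bed)
    "-fo", out_fasta)               # output fasta of the target capture
}

# Hypothetical example paths.
getfasta_args <- build_getfasta_args("hg19.fa", "capture.bed", "results")
out_fasta <- getfasta_args[length(getfasta_args)]

# Run only if bedtools is installed, the inputs exist, and the target
# capture fasta has not been created yet.
if (nzchar(Sys.which("bedtools")) &&
    file.exists("hg19.fa") && file.exists("capture.bed") &&
    !file.exists(out_fasta)) {
  # shQuote protects paths that contain spaces.
  system2("bedtools", shQuote(getfasta_args))
}
```

Skipping the call when the output file already exists mirrors the note that the fasta file is created only if it does not yet exist.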
A list with 2 entries:
The correction factors after normalization as in
The correction factors without normalization.