Description Usage Arguments Value Methods (by class) Examples
This function takes a set of sequences and a k-mer of interest, map a k-mer of interest in these sequences, computes all the pairwise distances (distogram), normalize it for distance decay, and computes the resulting power spectral density of the normalized distogram.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | getPeriodicity(x, motif, ...)
## S3 method for class 'DNAStringSet'
getPeriodicity(
x,
motif,
range_spectrum = seq(1, 200),
BPPARAM = setUpBPPARAM(1),
roll = 3,
verbose = TRUE,
sample = 0,
n_shuffling = 0,
cores_shuffling = 1,
cores_computing = 1,
order = 1,
...
)
## S3 method for class 'GRanges'
getPeriodicity(x, motif, genome = "BSgenome.Celegans.UCSC.ce11", ...)
## S3 method for class 'DNAString'
getPeriodicity(x, motif, ...)
|
x |
a DNAString, DNAStringSet or GRanges object. |
motif |
a k-mer of interest |
... |
Arguments passed to S3 methods |
range_spectrum |
Numeric vector Range of the distogram to use to run the Fast Fourier Transform on (default: 1:200, i.e. all pairs of k-mers at a maximum of 200 bp from each other). |
BPPARAM |
split the workload over several processors using BiocParallel |
roll |
Integer Window to smooth the distribution of pairwise distances (default: 3, to discard the 3-bp periodicity of dinucleotides which can be very strong in vertebrate genomes) |
verbose |
Boolean |
sample |
Integer if > 0, will randomly sample this many integers from the dists vector before normalization. This ensures consistency when looking at periodicity in different genomes, since different genomes will have different GC percent |
n_shuffling |
Integer, how many times should the sequences be shuffled? (default = 0) |
cores_shuffling |
integer, Number of cores used for shuffling (used if n_shuffling > 0) |
cores_computing |
integer, split the workload over several processors using BiocParallel (used if n_shuffling > 0) |
order |
Integer, which order to take into consideration for shuffling (ushuffle python library must be installed for orders > 1) (used if n_shuffling > 0) |
genome |
genome ID, BSgenome or DNAStringSet object (optional, if x is a GRanges) |
A list containing the results of getPeriodicity function.
The dists vector is the raw vector of all distances between any possible k-mer.
The hist data.frame is the distribution of distances over range_spectrum.
The normalized_hist is the raw hist, normalized for decay over increasing distances.
The spectra object is the output of the FFT applied over normalized_hist.
The PSD data frame is the power spectral density scores over given frequencies.
The motif object is the k-mer being analysed.
The final periodicity metrics computed by getPeriodicity()
If getPeriodicity() is ran with n_shuffling > 0, the resulting list also contains PSD values computed when iterating through shuffled sequences.
DNAStringSet
: S3 method for DNAStringSet
GRanges
: S3 method for GRanges
DNAString
: S3 method for DNAString
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | data(ce11_proms_seqs)
periodicity_result <- getPeriodicity(
ce11_proms_seqs[1:100],
motif = 'TT'
)
head(periodicity_result$PSD)
plotPeriodicityResults(periodicity_result)
#
data(ce11_TSSs)
periodicity_result <- getPeriodicity(
ce11_TSSs[['Ubiq.']][1:10],
motif = 'TT',
genome = 'BSgenome.Celegans.UCSC.ce11'
)
head(periodicity_result$PSD)
plotPeriodicityResults(periodicity_result)
#
data(ce11_TSSs)
periodicity_result <- getPeriodicity(
ce11_TSSs[['Ubiq.']][1:10],
motif = 'TT',
genome = 'BSgenome.Celegans.UCSC.ce11',
n_shuffling = 10
)
head(periodicity_result$PSD)
plotPeriodicityResults(periodicity_result)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.