Oligo Profiles and Oligo Profile Correlation Plots of Nucleotide Sequences

Description

Construct a k-mer oligo profile of a nucleotide sequence and print such a profile or its reverse complement. There is also a plot function for producing plots of the profile or its reverse complement and for comparing primary and complementary strand profiles.

Usage

oligoProfile(x, k, content=c("dna", "rna"), 
case=c("lower", "upper", "as is"), circular=TRUE, disambiguate=TRUE, 
plot=TRUE, ...)
## S3 method for class 'OligoProfile'
plot(x, which=1L, units=c("percentage", "count", "proportion"),
 main=NULL, xlab=NULL, ylab=NULL, ...)
## S3 method for class 'OligoProfile'
print(x, which=1L, units=c("percentage", "count", "proportion"), 
digits=switch(units, percentage=3L, count=NULL, proportion=3L), ...)

Arguments

x

a character vector or an object that can be coerced to a character vector.

k

the word length k of the k-mers (oligos) to count when producing the profile.

content

The content type (“dna” or “rna”) of the input sequence. oligoProfile can often detect this automatically based on the presence/absence of t's or u's, but if neither is present, the content argument is consulted. The default value is “dna”.

case

determines how labels for the array should be generated: in lowercase, in uppercase or left as is, in which case labels such as “b” and “B” will be seen as distinct symbols and counted separately.

circular

Determines if the vector should be treated as circular or not. The default is TRUE, meaning that the start and end of the sequence will be joined together for the purpose of counting.

disambiguate

if set to the default of TRUE, makes the input sequence unambiguous before generating the profile. Otherwise, ambiguous symbols are treated like any other symbols and k-mer counts including them will be computed.

plot

should a plot of the profile be produced? The default is TRUE.

which

For print, specifies whether to display the profile for the sequence used to generate the OligoProfile object (1) or the profile of its reverse complement (2).

For the plot method, which determines what should be plotted. Values 1 and 2 cause the profile for the original sequence (primary strand) or its reverse complement (complementary strand) to be plotted, respectively. Specifying which=3 will plot a comparison of the two profiles which can be used to assess compliance with Chargaff's second parity rule.

The which argument may also be specified when calling oligoProfile, in which case it will be passed on to the plot method if the plot argument is set to TRUE.

units

The oligo profiles can be scaled according to three different units for presentation on plots: “percentage”, “count” or “proportion”. The default is “percentage”.

main

The title of the plot. See plot.default. If not specified, an appropriate title is automatically generated.

xlab

a label for the x-axis of the plot. See plot.default. If not specified, an appropriate label is automatically generated.

ylab

a label for the y-axis of the plot. See plot.default. If not specified, an appropriate label is automatically generated.

digits

The number of significant digits to print. The default is NULL when units is set to “count” and 3L otherwise.

...

arguments to be passed from or to other functions

Details

This function returns the oligo profile for a sequence in an OligoProfile object, which is printed on screen if the plot parameter is FALSE. An oligo profile is simply the counts of all k-mers in a sequence for some specified value of k.
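To make the definition concrete, here is a minimal base R sketch of k-mer counting with optional circular wrap-around. The function name kmerCounts and its alphabet argument are hypothetical and serve only as illustration; this is not the package's implementation and it does not handle ambiguous symbols.

```r
# Illustrative k-mer counter in base R (hypothetical helper, not part of the package).
kmerCounts <- function(s, k, circular = TRUE, alphabet = c("a", "c", "g", "t")) {
  s <- tolower(s)
  # For a circular sequence, append the first k-1 symbols so that k-mers
  # spanning the join between the end and the start are counted too.
  if (circular && k > 1) s <- paste0(s, substr(s, 1, k - 1))
  starts <- seq_len(nchar(s) - k + 1)
  words <- substring(s, starts, starts + k - 1)
  # Enumerate every possible k-mer so that absent ones receive a zero count.
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  all.words <- unname(apply(grid[, k:1, drop = FALSE], 1, paste0, collapse = ""))
  # Words containing symbols outside the alphabet become NA and are dropped.
  counts <- as.integer(table(factor(words, levels = all.words)))
  names(counts) <- all.words
  counts
}

# The wrap-around 2-mer "ta" is only counted when the sequence is circular.
kmerCounts("acgt", 2)["ta"]
kmerCounts("acgt", 2, circular = FALSE)["ta"]
```

With circular = TRUE a sequence of length n always yields n k-mers, whereas a linear sequence yields n - k + 1.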

By default, oligoProfile produces a plot of the oligo profile expressed in terms of percentages. The plot argument determines if the plot should be generated or not and plotting parameters such as main, sub, etc., may be passed as arguments to the function when plot is TRUE.
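The three units are related by simple rescaling. The sketch below assumes that proportions are counts divided by the total count and that percentages are proportions multiplied by 100; the counts are made up for illustration and the package's internal code may differ.

```r
# Made-up 2-mer counts used only to illustrate the three units.
counts <- c(aa = 3L, ac = 1L, ca = 1L, cc = 0L)

# Assumed scaling: proportion = count / total, percentage = 100 * proportion.
proportion <- counts / sum(counts)
percentage <- 100 * proportion

# Round to three significant digits, mirroring the documented default for digits.
signif(proportion, 3)
signif(percentage, 3)
```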

The plot method, either called directly or indirectly via the oligoProfile function, can produce either the oligo profile of x (which = 1), the oligo profile of its reverse complement (which = 2), or an interstrand k-mer correlation plot comparing the k-oligo profile of x with that of its reverse complement (which = 3). Such correlation plots effectively show the relationship between k-mers on the primary and complementary strands in a DNA duplex and can be used to assess compliance with Chargaff's second parity rule (CSPR). More precisely, one would conclude that a genomic sequence complies with CSPR if all the plotted points lie on a diagonal line running from the bottom-left corner to the top-right corner of the graph.
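The idea behind the interstrand comparison can be sketched in base R as follows. The helpers revComp, kmerWords and countKmers are hypothetical names introduced here for illustration, the toy sequences are made up, and the code is not the package's implementation.

```r
# Reverse complement of a lowercase DNA string (hypothetical helper).
revComp <- function(s) {
  bases <- rev(strsplit(s, "")[[1]])
  chartr("acgt", "tgca", paste(bases, collapse = ""))
}

# All k-mers over a, c, g, t in lexicographic order (hypothetical helper).
kmerWords <- function(k, alphabet = c("a", "c", "g", "t")) {
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  unname(apply(grid[, k:1, drop = FALSE], 1, paste0, collapse = ""))
}

# Circular k-mer counts of s, aligned to the given word list (hypothetical helper).
countKmers <- function(s, k, words) {
  ext <- paste0(s, substr(s, 1, k - 1))   # wrap around the end
  starts <- seq_len(nchar(s))
  as.integer(table(factor(substring(ext, starts, starts + k - 1), levels = words)))
}

k <- 2
words <- kmerWords(k)

# "aacgtt" equals its own reverse complement, so both strands share one profile.
primary <- countKmers("aacgtt", k, words)
complementary <- countKmers(revComp("aacgtt"), k, words)
all(primary == complementary)

# "aaaacc" does not, so its points would fall off the diagonal.
skewed.primary <- countKmers("aaaacc", k, words)
skewed.complementary <- countKmers(revComp("aaaacc"), k, words)

if (interactive()) {
  plot(skewed.primary, skewed.complementary,
       xlab = "primary strand count", ylab = "complementary strand count")
  abline(0, 1)  # points on this line indicate equal counts on both strands
}
```

For real genomes compliance is approximate rather than exact, so in practice one looks at how closely the points cluster around the diagonal.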

Value

A list with class “OligoProfile” containing the following components:

name

a name to identify the source of the profile.

wordLength

the value of k used to derive the k-mer profile.

content

indicates if the profile pertains to a DNA or RNA sequence.

case

indicates how the case of letters was processed before producing the profile.

circular

indicates whether or not the sequence was considered circular for the purpose of producing the profile.

disambiguate

indicates if the sequence was made unambiguous before producing the profile.

profile

a vector containing the raw counts (frequencies) of all k-mers.

Author(s)

Andrew Hart and Servet Martínez

References

Albrecht-Buehler, G. (2006) Asymptotically increasing compliance of genomes with Chargaff's second parity rules through inversions and inverted transpositions. PNAS 103(47), 17828–17833.

See Also

pair.counts, triple.counts, quadruple.counts, cylinder.counts, array2vector, table2vector, disambiguate

Examples

data(nanoarchaeum)
#Get the 3-oligo profile of Nanoarchaeum without plotting it
nano.prof <- oligoProfile(nanoarchaeum, 3, plot=FALSE)
nano.prof #print oligo profile as percentages
print(nano.prof, units="count") #print oligo profile as counts
plot(nano.prof) #oligo profile plotted as percentages
plot(nano.prof, units="count") #plot it as counts

#plot the 3-oligo profile of Nanoarchaeum as proportions
oligoProfile(nanoarchaeum, k=3, units="proportion")
