Methods for comparing and ordering the elements in one or more XStringSet objects.
Element-wise (aka "parallel") comparison of 2 XStringSet objects is based on the lexicographic order between 2 BString, DNAString, RNAString, or AAString objects.
For DNAStringSet and RNAStringSet objects, the letters in the respective alphabets (i.e. DNA_ALPHABET and RNA_ALPHABET) are ordered based on a predefined code assigned to each letter. The code assigned to each letter can be retrieved with:
dna_codes <- as.integer(DNAString(paste(DNA_ALPHABET, collapse="")))
names(dna_codes) <- DNA_ALPHABET
rna_codes <- as.integer(RNAString(paste(RNA_ALPHABET, collapse="")))
names(rna_codes) <- RNA_ALPHABET
Note that this order does NOT depend on the locale in use. Also note that comparing DNA sequences with RNA sequences is supported and in that case T and U are considered to be the same letter.
For BStringSet and AAStringSet objects, the alphabetical order is defined by the C collation. Note that, at the moment, AAStringSet objects are treated like BStringSet objects i.e. the alphabetical order is NOT defined by the order of the letters in AA_ALPHABET. This might change at some point.
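As an illustration of the ordering rules above, the following minimal sketch sorts a few single-letter DNA sequences and compares a DNA sequence with its RNA counterpart. It assumes the Biostrings package is installed; the variable names are illustrative only.

```r
library(Biostrings)

# Codes assigned to the letters of the DNA alphabet (as described above).
dna_codes <- as.integer(DNAString(paste(DNA_ALPHABET, collapse="")))
names(dna_codes) <- DNA_ALPHABET

# Sorting follows the letter codes, not the locale's alphabetical order.
letters3 <- c("G", "M", "C")
sorted_by_sort <- as.character(sort(DNAStringSet(letters3)))
sorted_by_code <- names(sort(dna_codes[letters3]))
identical(sorted_by_sort, sorted_by_code)

# T and U are treated as the same letter when comparing DNA with RNA.
DNAStringSet("ACGT") == RNAStringSet("ACGU")
```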
pcompare() and related methods
In the code snippets below, x and y are XStringSet objects.
pcompare(x, y):
Performs element-wise (aka "parallel") comparison of x and y, that is, returns an integer vector where the i-th element is less than, equal to, or greater than zero if the i-th element in x is considered to be respectively less than, equal to, or greater than the i-th element in y. If x and y don't have the same length, then the shorter is recycled to the length of the longer (the standard recycling rules apply).
x == y, x != y, x <= y, x >= y, x < y, x > y:
Equivalent to pcompare(x, y) == 0, pcompare(x, y) != 0, pcompare(x, y) <= 0, pcompare(x, y) >= 0, pcompare(x, y) < 0, and pcompare(x, y) > 0, respectively.
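The relationship between pcompare() and the comparison operators can be sketched on a small example. This assumes the Biostrings package is installed; x, y and cmp are illustrative names only.

```r
library(Biostrings)

x <- DNAStringSet(c("AAA", "TC", "G"))
y <- DNAStringSet(c("AAA", "CAAC", "A"))

# Only the sign of each value is meaningful: negative, zero, or positive.
cmp <- pcompare(x, y)
sign(cmp)

# The comparison operators agree with the sign of pcompare().
all((x == y) == (cmp == 0))
all((x < y) == (cmp < 0))

# A length-1 object is recycled to the length of the longer one.
x > DNAStringSet("C")
```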
order() and related methods
In the code snippets below, x is an XStringSet object.
is.unsorted(x, strictly=FALSE):
Return a logical value specifying whether x is unsorted. The strictly argument takes a logical value indicating whether the check should be for strictly increasing values.
order(x, decreasing=FALSE):
Return a permutation which rearranges x into ascending or descending order.
rank(x, ties.method=c("first", "min")):
Rank x in ascending order.
sort(x, decreasing=FALSE):
Sort x into ascending or descending order.
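A minimal sketch of how these methods fit together, assuming the Biostrings package is installed (the input sequences are made up for illustration):

```r
library(Biostrings)

x <- DNAStringSet(c("TC", "AAA", "", "TC", "G"))

# Is x out of order?
is.unsorted(x)

# order() gives the permutation that sort() applies.
o <- order(x)
identical(as.character(x[o]), as.character(sort(x)))

# Duplicated elements make a sorted object fail the strict check.
is.unsorted(sort(x))
is.unsorted(sort(x), strictly=TRUE)

# With ties.method="min", tied elements share the smallest rank.
rank(x, ties.method="min")
```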
duplicated() and unique()
In the code snippets below, x is an XStringSet object.
duplicated(x):
Return a logical vector whose elements denote duplicates in x.
unique(x):
Return the subset of x made of its unique elements.
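For instance, the following sketch (assuming the Biostrings package is installed; the sequences are illustrative) shows that unique(x) keeps exactly the elements that duplicated(x) does not flag:

```r
library(Biostrings)

x <- DNAStringSet(c("AAA", "TC", "", "TC", "AAA", "CAAC", "G"))

# TRUE marks an element that already occurred earlier in x.
dup <- duplicated(x)
dup

# unique() keeps the first occurrence of each distinct sequence.
unique(x)
identical(as.character(unique(x)), as.character(x[!dup]))
```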
match() and %in%
In the code snippets below, x and table are XStringSet objects.
match(x, table, nomatch=NA_integer_):
Returns an integer vector containing the first positions of an identical match in table for the elements in x.
x %in% table:
Returns a logical vector indicating which elements in x match identically with an element in table.
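A short sketch of both methods, assuming the Biostrings package is installed. The name tbl is used here only to avoid masking base R's table() function; it plays the role of the table argument.

```r
library(Biostrings)

tbl <- DNAStringSet(c("AAA", "TC", "", "TC", "AAA", "CAAC", "G"))
x <- DNAStringSet(c("", "G", "AA", "TC"))

# Position of the first identical match in tbl, or NA when there is none.
m <- match(x, tbl)
m

# %in% reports only whether a match exists.
x %in% tbl
identical(x %in% tbl, !is.na(m))
```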
is.na() and related methods
In the code snippets below, x is an XStringSet object. An XStringSet object never contains missing values (these methods exist for compatibility).
is.na(x):
Returns FALSE for every element.
anyNA(x):
Returns FALSE.
Author(s)
H. Pagès
See Also
XStringSet-class, ==, is.unsorted, order, rank, sort, duplicated, unique, match, %in%
Examples
## ---------------------------------------------------------------------
## A. SIMPLE EXAMPLES
## ---------------------------------------------------------------------
library(Biostrings)
dna <- DNAStringSet(c("AAA", "TC", "", "TC", "AAA", "CAAC", "G"))
match(c("", "G", "AA", "TC"), dna)
library(drosophila2probe)
fly_probes <- DNAStringSet(drosophila2probe)
sum(duplicated(fly_probes)) # 481 duplicated probes
is.unsorted(fly_probes) # TRUE
fly_probes <- sort(fly_probes)
is.unsorted(fly_probes) # FALSE
is.unsorted(fly_probes, strictly=TRUE) # TRUE, because of duplicates
is.unsorted(unique(fly_probes), strictly=TRUE) # FALSE
## Nb of probes that are the reverse complement of another probe:
nb1 <- sum(reverseComplement(fly_probes) %in% fly_probes)
stopifnot(identical(nb1, 455L)) # 455 probes
## Probes shared between drosophila2probe and hgu95av2probe:
library(hgu95av2probe)
human_probes <- DNAStringSet(hgu95av2probe)
m <- match(fly_probes, human_probes)
stopifnot(identical(sum(!is.na(m)), 493L)) # 493 shared probes
## ---------------------------------------------------------------------
## B. AN ADVANCED EXAMPLE
## ---------------------------------------------------------------------
## We want to compare the first 5 bases with the 5 last bases of each
## probe in drosophila2probe. More precisely, we want to compute the
## percentage of probes for which the first 5 bases are the reverse
## complement of the 5 last bases.
library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe)
first5 <- narrow(probes, end=5)
last5 <- narrow(probes, start=-5)
nb2 <- sum(first5 == reverseComplement(last5))
stopifnot(identical(nb2, 17L))
## Percentage:
100 * nb2 / length(probes) # 0.0064 %
## If the probes were random DNA sequences, a probe would have 1 chance
## out of 4^5 to have this property so the percentage would be:
100 / 4^5 # 0.098 %
## With randomly generated probes:
set.seed(33)
random_dna <- sample(DNAString(paste(DNA_BASES, collapse="")),
                     sum(width(probes)), replace=TRUE)
random_probes <- successiveViews(random_dna, width(probes))
random_probes
random_probes <- as(random_probes, "XStringSet")
random_probes
random_first5 <- narrow(random_probes, end=5)
random_last5 <- narrow(random_probes, start=-5)
nb3 <- sum(random_first5 == reverseComplement(random_last5))
100 * nb3 / length(random_probes) # 0.099 %