XStringSet-comparison: Comparing and ordering the elements in one or more XStringSet...
In Bioconductor/Biostrings: Efficient manipulation of biological strings

XStringSet-comparison

R Documentation

Comparing and ordering the elements in one or more XStringSet objects

Description

Methods for comparing and ordering the elements in one or more XStringSet objects.

Details

Element-wise (aka "parallel") comparison of 2 XStringSet objects is based on the lexicographic order between 2 BString, DNAString, RNAString, or AAString objects.

For DNAStringSet and RNAStringSet objects, the letters in the respective alphabets (i.e. DNA_ALPHABET and RNA_ALPHABET) are ordered based on a predefined code assigned to each letter. The code assigned to each letter can be retrieved with:

  dna_codes <- as.integer(DNAString(paste(DNA_ALPHABET, collapse="")))
  names(dna_codes) <- DNA_ALPHABET

  rna_codes <- as.integer(RNAString(paste(RNA_ALPHABET, collapse="")))
  names(rna_codes) <- RNA_ALPHABET

Note that this order does NOT depend on the locale in use. Also note that comparing DNA sequences with RNA sequences is supported and in that case T and U are considered to be the same letter.

For BStringSet and AAStringSet objects, the alphabetical order is defined by the C collation. Note that, at the moment, AAStringSet objects are treated like BStringSet objects, i.e. the alphabetical order is NOT defined by the order of the letters in AA_ALPHABET. This might change at some point.
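The ordering rules above can be illustrated with a minimal sketch. It assumes Biostrings is installed; the toy sequences and the variable names `dna` and `rna` are made up for illustration only.

```r
library(Biostrings)

## Single-letter sequences given in a deliberately scrambled order.
dna <- DNAStringSet(c("T", "G", "C", "A"))

## Sorting relies on the predefined letter codes, not on the locale.
sorted_dna <- sort(dna)
as.character(sorted_dna)

## DNA vs RNA comparison: T and U are considered to be the same letter.
rna <- RNAStringSet("ACGU")
DNAStringSet("ACGT") == rna
```

For the four unambiguous bases the code-based order happens to coincide with the usual alphabetical order; the full list of codes is given by `dna_codes` and `rna_codes` as computed above.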

`pcompare()` and related methods

In the code snippets below, x and y are XStringSet objects.

pcompare(x, y):: Performs element-wise (aka "parallel") comparison of x and y, that is, returns an integer vector where the i-th element is less than, equal to, or greater than zero if the i-th element in x is considered to be respectively less than, equal to, or greater than the i-th element in y. If x and y don't have the same length, then the shorter is recycled to the length of the longer (the standard recycling rules apply).
x == y, x != y, x <= y, x >= y, x < y, x > y:: Equivalent to pcompare(x, y) == 0, pcompare(x, y) != 0, pcompare(x, y) <= 0, pcompare(x, y) >= 0, pcompare(x, y) < 0, and pcompare(x, y) > 0, respectively.
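As a small sketch of how `pcompare()` relates to the binary comparison operators (assuming Biostrings is installed; the sequences are arbitrary toy data):

```r
library(Biostrings)

x <- DNAStringSet(c("AAA", "ACG", "TT"))
y <- DNAStringSet("ACG")  # length 1, so it is recycled along x

## Only the sign of each value is meaningful:
## negative, zero, or positive.
cmp <- pcompare(x, y)
sign(cmp)

## The operators are shortcuts for tests on the sign of pcompare().
x == y
x < y
stopifnot(identical(x == y, cmp == 0L))
stopifnot(identical(x < y, cmp < 0L))
```

Here "AAA" sorts before "ACG" and "TT" sorts after it, so the three signs are negative, zero, and positive, in that order.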

`order()` and related methods

In the code snippets below, x is an XStringSet object.

is.unsorted(x, strictly=FALSE):: Return a logical value specifying whether x is unsorted. The strictly argument takes a logical value indicating whether the check should be for _strictly_ increasing values.
order(x, decreasing=FALSE):: Return a permutation which rearranges x into ascending or descending order.
rank(x, ties.method=c("first", "min")):: Rank x in ascending order.
sort(x, decreasing=FALSE):: Sort x into ascending or descending order.
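The methods listed above can be tried on a toy object. This is a minimal sketch, assuming Biostrings is installed; the input sequences are invented for illustration.

```r
library(Biostrings)

x <- DNAStringSet(c("TC", "AAA", "", "TC", "G"))

is.unsorted(x)  # the elements are not in ascending order

## order() returns a permutation; subsetting with it sorts x.
o <- order(x)
sorted_x <- x[o]
as.character(sorted_x)

## sort() is a shortcut for the two steps above.
stopifnot(identical(as.character(sorted_x), as.character(sort(x))))

## Descending order, and ranks with tied elements sharing the
## smallest rank of their group.
sort(x, decreasing=TRUE)
r <- rank(x, ties.method="min")
r
```

Note that the empty sequence sorts first, and that the two copies of "TC" receive the same rank with `ties.method="min"`.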

`duplicated()` and `unique()`

In the code snippets below, x is an XStringSet object.

duplicated(x):: Return a logical vector whose elements denote duplicates in x.
unique(x):: Return the subset of x made of its unique elements.
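A short sketch of both methods on a toy object (assuming Biostrings is installed; the variable names `dup` and `ux` are illustrative only):

```r
library(Biostrings)

x <- DNAStringSet(c("AAA", "TC", "", "TC", "AAA", "CAAC", "G"))

## TRUE marks an element that already appeared earlier in x.
dup <- duplicated(x)
dup

## unique(x) keeps the first occurrence of each distinct sequence.
ux <- unique(x)
ux
stopifnot(identical(as.character(ux), as.character(x[!dup])))
```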

`match()` and `%in%`

In the code snippets below, x and table are XStringSet objects.

match(x, table, nomatch=NA_integer_):: Returns an integer vector containing the first positions of an identical match in table for the elements in x.
x %in% table:: Returns a logical vector indicating which elements in x match identically with an element in table.
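The following sketch shows both calls on toy data. It assumes Biostrings is installed; the name `tbl` stands in for the table argument and is chosen only to avoid masking `base::table`.

```r
library(Biostrings)

x <- DNAStringSet(c("", "G", "AA", "TC"))
tbl <- DNAStringSet(c("AAA", "TC", "", "TC", "AAA", "CAAC", "G"))

## Position of the first identical element in tbl, or NA if none.
m <- match(x, tbl)
m

## %in% reports only whether an identical element exists.
found <- x %in% tbl
found
stopifnot(identical(found, !is.na(m)))
```

"AA" has no identical element in `tbl` (it is only a prefix of "AAA"), so it yields NA from `match()` and FALSE from `%in%`.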

`is.na()` and related methods

In the code snippets below, x is an XStringSet object. An XStringSet object never contains missing values (these methods exist for compatibility).

is.na(x):: Returns FALSE for every element.
anyNA(x):: Returns FALSE.

Author(s)

H. Pagès

Examples

## ---------------------------------------------------------------------
## A. SIMPLE EXAMPLES
## ---------------------------------------------------------------------

dna <- DNAStringSet(c("AAA", "TC", "", "TC", "AAA", "CAAC", "G"))
match(c("", "G", "AA", "TC"), dna)

library(drosophila2probe)
fly_probes <- DNAStringSet(drosophila2probe)
sum(duplicated(fly_probes))  # 481 duplicated probes

is.unsorted(fly_probes)  # TRUE
fly_probes <- sort(fly_probes)
is.unsorted(fly_probes)  # FALSE
is.unsorted(fly_probes, strictly=TRUE)  # TRUE, because of duplicates
is.unsorted(unique(fly_probes), strictly=TRUE)  # FALSE

## Nb of probes that are the reverse complement of another probe:
nb1 <- sum(reverseComplement(fly_probes) %in% fly_probes)
stopifnot(identical(nb1, 455L))  # 455 probes

## Probes shared between drosophila2probe and hgu95av2probe:
library(hgu95av2probe)
human_probes <- DNAStringSet(hgu95av2probe)
m <- match(fly_probes, human_probes)
stopifnot(identical(sum(!is.na(m)), 493L))  # 493 shared probes

## ---------------------------------------------------------------------
## B. AN ADVANCED EXAMPLE
## ---------------------------------------------------------------------
## We want to compare the first 5 bases with the last 5 bases of each
## probe in drosophila2probe. More precisely, we want to compute the
## percentage of probes for which the first 5 bases are the reverse
## complement of the last 5 bases.

library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe)

first5 <- narrow(probes, end=5)
last5 <- narrow(probes, start=-5)
nb2 <- sum(first5 == reverseComplement(last5))
stopifnot(identical(nb2, 17L))

## Percentage:
100 * nb2 / length(probes)  # 0.0064 %

## If the probes were random DNA sequences, a probe would have 1 chance
## out of 4^5 to have this property so the percentage would be:
100 / 4^5  # 0.098 %

## With randomly generated probes:
set.seed(33)
random_dna <- sample(DNAString(paste(DNA_BASES, collapse="")),
                     sum(width(probes)), replace=TRUE)
random_probes <- successiveViews(random_dna, width(probes))
random_probes
random_probes <- as(random_probes, "XStringSet")
random_probes

random_first5 <- narrow(random_probes, end=5)
random_last5 <- narrow(random_probes, start=-5)

nb3 <- sum(random_first5 == reverseComplement(random_last5))
100 * nb3 / length(random_probes)  # 0.099 %

Bioconductor/Biostrings documentation built on June 10, 2025, 1:14 p.m.
