XStringSet-comparison: Comparing and ordering the elements in one or more XStringSet...
In Biostrings: Efficient manipulation of biological strings


Methods for comparing and ordering the elements in one or more XStringSet objects.

Element-wise (aka "parallel") comparison of 2 XStringSet objects is based on the lexicographic order between 2 BString, DNAString, RNAString, or AAString objects.

For DNAStringSet and RNAStringSet objects, the letters in the respective alphabets (i.e. DNA_ALPHABET and RNA_ALPHABET) are ordered based on a predefined code assigned to each letter. The code assigned to each letter can be retrieved with:

  dna_codes <- as.integer(DNAString(paste(DNA_ALPHABET, collapse="")))
  names(dna_codes) <- DNA_ALPHABET

  rna_codes <- as.integer(RNAString(paste(RNA_ALPHABET, collapse="")))
  names(rna_codes) <- RNA_ALPHABET

Note that this order does NOT depend on the locale in use. Also note that comparing DNA sequences with RNA sequences is supported and in that case T and U are considered to be the same letter.
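As a small illustration of the two notes above, the sketch below lists the DNA letters from lowest to highest code and compares a DNAStringSet with an RNAStringSet. It assumes only that Biostrings is installed; the objects dna and rna are made up for the example.

```r
library(Biostrings)

## Codes assigned to the DNA letters, as described above.
dna_codes <- as.integer(DNAString(paste(DNA_ALPHABET, collapse="")))
names(dna_codes) <- DNA_ALPHABET

## Letters listed from lowest to highest code. This is the order used
## when comparing DNA sequences, whatever the locale.
names(sort(dna_codes))

## Comparing DNA with RNA: T and U count as the same letter.
dna <- DNAStringSet(c("ACGT", "TTTT"))
rna <- RNAStringSet(c("ACGU", "UUUA"))
dna == rna  # TRUE for the first pair, FALSE for the second
```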

For BStringSet and AAStringSet objects, the alphabetical order is defined by the C collation. Note that, at the moment, AAStringSet objects are treated like BStringSet objects i.e. the alphabetical order is NOT defined by the order of the letters in AA_ALPHABET. This might change at some point.
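A minimal sketch of what the C collation means in practice for a BStringSet: upper-case ASCII letters sort before lower-case ones, which may differ from what base R's sort() gives on a character vector in a non-C locale. The object b is made up for the example.

```r
library(Biostrings)

b <- BStringSet(c("b", "B", "a", "A"))

## C collation: letters are ordered by their byte values, so all the
## upper-case letters come before the lower-case ones.
as.character(sort(b))  # "A" "B" "a" "b"
```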

`pcompare()` and related methods

In the code snippets below, x and y are XStringSet objects.

: pcompare(x, y): Performs element-wise (aka "parallel") comparison of x and y, that is, returns an integer vector where the i-th element is less than, equal to, or greater than zero if the i-th element in x is considered to be respectively less than, equal to, or greater than the i-th element in y. If x and y don't have the same length, then the shortest is recycled to the length of the longest (the standard recycling rules apply).
: x == y, x != y, x <= y, x >= y, x < y, x > y: Equivalent to pcompare(x, y) == 0, pcompare(x, y) != 0, pcompare(x, y) <= 0, pcompare(x, y) >= 0, pcompare(x, y) < 0, and pcompare(x, y) > 0, respectively.
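The following sketch shows pcompare() and the comparison operators on two small, made-up DNAStringSet objects. Only the sign of the values returned by pcompare() is relied upon here.

```r
library(Biostrings)

x <- DNAStringSet(c("AAA", "TC", "G"))
y <- DNAStringSet(c("AAC", "TC", "CAAC"))

## Negative, zero, or positive depending on whether x[i] is less than,
## equal to, or greater than y[i].
pcompare(x, y)
sign(pcompare(x, y))  # -1  0  1

## The operators are shortcuts for tests on the pcompare() result.
x < y   # TRUE FALSE FALSE
x == y  # FALSE TRUE FALSE

## Recycling: the length-1 object is compared with every element of x.
x == DNAStringSet("TC")  # FALSE TRUE FALSE
```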

`order()` and related methods

In the code snippets below, x is an XStringSet object.

: is.unsorted(x, strictly=FALSE): Return a logical value specifying if x is unsorted. The strictly argument takes a logical value indicating if the check should be for _strictly_ increasing values.
: order(x, decreasing=FALSE): Return a permutation which rearranges x into ascending or descending order.
: rank(x, ties.method=c("first", "min")): Rank x in ascending order.
: sort(x, decreasing=FALSE): Sort x into ascending or descending order.
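A short sketch of the ordering methods on a made-up DNAStringSet that contains a duplicate, so that the effect of strictly=TRUE and of the ties.method argument is visible.

```r
library(Biostrings)

x <- DNAStringSet(c("TC", "AAA", "G", "AAA"))

is.unsorted(x)                 # TRUE
sorted_x <- sort(x)
as.character(sorted_x)         # "AAA" "AAA" "G" "TC"
is.unsorted(sorted_x)          # FALSE
is.unsorted(sorted_x, strictly=TRUE)  # TRUE, because "AAA" is repeated

## order() returns the permutation that sort() applies.
o <- order(x)
identical(as.character(x[o]), as.character(sorted_x))  # TRUE

## With ties.method="min", tied elements share the lowest rank.
rank(x, ties.method="min")     # 4 1 3 1
```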

`duplicated()` and `unique()`

In the code snippets below, x is an XStringSet object.

: duplicated(x): Return a logical vector whose elements denote duplicates in x. An element is flagged as a duplicate if it is identical to an element that occurs earlier in x.
: unique(x): Return the subset of x made of its unique elements.
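For illustration, here are duplicated() and unique() applied to the small DNAStringSet that is also used in the examples of this page.

```r
library(Biostrings)

dna <- DNAStringSet(c("AAA", "TC", "", "TC", "AAA", "CAAC", "G"))

## The second "TC" and the second "AAA" are flagged as duplicates.
duplicated(dna)  # FALSE FALSE FALSE TRUE TRUE FALSE FALSE

## unique() keeps the first occurrence of each sequence.
as.character(unique(dna))  # "AAA" "TC" "" "CAAC" "G"

## Same result as dropping the duplicates by hand.
identical(as.character(unique(dna)), as.character(dna[!duplicated(dna)]))
```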

`match()` and `%in%`

In the code snippets below, x and table are XStringSet objects.

: match(x, table, nomatch=NA_integer_): Returns an integer vector containing the first positions of an identical match in table for the elements in x.
: x %in% table: Returns a logical vector indicating which elements in x match identically with an element in table.
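A minimal sketch of match() and %in% with two made-up DNAStringSet objects; query and table are names chosen for the example.

```r
library(Biostrings)

table <- DNAStringSet(c("AAA", "TC", "", "TC", "AAA", "CAAC", "G"))
query <- DNAStringSet(c("", "G", "AA", "TC"))

## Position of the first identical element in table, or NA if none.
## "AA" is not in table (only "AAA" is), and "TC" first occurs at 2.
m <- match(query, table)
m  # 3 7 NA 2

## %in% reports whether a match exists at all.
query %in% table  # TRUE TRUE FALSE TRUE
```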

`is.na()` and related methods

In the code snippets below, x is an XStringSet object. An XStringSet object never contains missing values (these methods exist for compatibility).

: is.na(x): Returns FALSE for every element.
: anyNA(x): Returns FALSE.
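A brief sketch of the behavior described above, on a made-up DNAStringSet. Note that an empty sequence is a valid element, not a missing value.

```r
library(Biostrings)

dna <- DNAStringSet(c("AAA", "", "G"))

is.na(dna)  # FALSE FALSE FALSE, even for the empty sequence
anyNA(dna)  # FALSE
```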

H. Pagès

XStringSet-class, ==, is.unsorted, order, rank, sort, duplicated, unique, match, %in%

## ---------------------------------------------------------------------
## A. SIMPLE EXAMPLES
## ---------------------------------------------------------------------

dna <- DNAStringSet(c("AAA", "TC", "", "TC", "AAA", "CAAC", "G"))
match(c("", "G", "AA", "TC"), dna)

library(drosophila2probe)
fly_probes <- DNAStringSet(drosophila2probe)
sum(duplicated(fly_probes))  # 481 duplicated probes

is.unsorted(fly_probes)  # TRUE
fly_probes <- sort(fly_probes)
is.unsorted(fly_probes)  # FALSE
is.unsorted(fly_probes, strictly=TRUE)  # TRUE, because of duplicates
is.unsorted(unique(fly_probes), strictly=TRUE)  # FALSE

## Nb of probes that are the reverse complement of another probe:
nb1 <- sum(reverseComplement(fly_probes) %in% fly_probes)
stopifnot(identical(nb1, 455L))  # 455 probes

## Probes shared between drosophila2probe and hgu95av2probe:
library(hgu95av2probe)
human_probes <- DNAStringSet(hgu95av2probe)
m <- match(fly_probes, human_probes)
stopifnot(identical(sum(!is.na(m)), 493L))  # 493 shared probes

## ---------------------------------------------------------------------
## B. AN ADVANCED EXAMPLE
## ---------------------------------------------------------------------
## We want to compare the first 5 bases with the 5 last bases of each
## probe in drosophila2probe. More precisely, we want to compute the
## percentage of probes for which the first 5 bases are the reverse
## complement of the 5 last bases.

library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe)

first5 <- narrow(probes, end=5)
last5 <- narrow(probes, start=-5)
nb2 <- sum(first5 == reverseComplement(last5))
stopifnot(identical(nb2, 17L))

## Percentage:
100 * nb2 / length(probes)  # 0.0064 %

## If the probes were random DNA sequences, a probe would have 1 chance
## out of 4^5 to have this property so the percentage would be:
100 / 4^5  # 0.098 %

## With randomly generated probes:
set.seed(33)
random_dna <- sample(DNAString(paste(DNA_BASES, collapse="")),
                     sum(width(probes)), replace=TRUE)
random_probes <- successiveViews(random_dna, width(probes))
random_probes
random_probes <- as(random_probes, "XStringSet")
random_probes

random_first5 <- narrow(random_probes, end=5)
random_last5 <- narrow(random_probes, start=-5)

nb3 <- sum(random_first5 == reverseComplement(random_last5))
100 * nb3 / length(random_probes)  # 0.099 %

Biostrings documentation built on Nov. 8, 2020, 11:12 p.m.
