encodeSequences (R Documentation)
Encode short nucleotide sequences into integers with a 2-bit encoding.
encodeSequences(sequences)
sequences: A character vector of short nucleotide sequences, e.g., UMIs or cell barcodes.
Each pair of bits encodes a nucleotide: 00 is A, 01 is C, 10 is G and 11 is T. The least significant bits hold the 3'-most nucleotide, and any unused higher-order bits are set to zero. Thus, the sequence “CGGACT” is converted to the binary form:
01 10 10 00 01 11
... which corresponds to the integer 1671.
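The scheme above can be sketched in plain R. This is only an illustration of the 2-bit encoding as described here, not the package's own implementation; the function name `encode_2bit` and its input checks are assumptions made for this example.

```r
# Hypothetical pure-R illustration of the 2-bit encoding described above.
# encode_2bit() is not part of the package; it only mirrors the documented scheme.
encode_2bit <- function(sequences) {
    codes <- c(A = 0L, C = 1L, G = 2L, T = 3L)
    vapply(sequences, function(s) {
        bases <- strsplit(s, "", fixed = TRUE)[[1]]
        # Assumed checks: at most 15 nt, and only upper-case A/C/G/T.
        stopifnot(length(bases) <= 15L, all(bases %in% names(codes)))
        value <- 0L
        for (b in bases) {
            # Multiplying by 4 shifts left by two bits; the new base
            # then occupies the two least significant bits.
            value <- value * 4L + codes[[b]]
        }
        value
    }, integer(1), USE.NAMES = FALSE)
}

encode_2bit("CGGACT")  # 1671
```

Because each base is appended at the low end, the last (3'-most) base of the sequence ends up in the two least significant bits, which matches the binary form shown above.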
Because R uses 32-bit signed integers, no element of sequences can be more than 15 nt long. Longer sequences will cause integer overflow.
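The 15 nt limit can be checked with a little arithmetic: a sequence of n nucleotides needs 2n bits, so its largest possible code is 4^n - 1. The sketch below assumes a signed 32-bit integer type, as reported by `.Machine$integer.max`; the variable names are chosen for this example only.

```r
# Largest encoded values for 15 nt and 16 nt sequences (all T's),
# computed as doubles so that the comparison itself cannot overflow.
max_15nt <- 4^15 - 1   # 30 bits
max_16nt <- 4^16 - 1   # 32 bits

max_15nt <= .Machine$integer.max  # TRUE
max_16nt <= .Machine$integer.max  # FALSE
```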
An integer vector containing the encoded sequences.
Aaron Lun
10X Genomics (2017). Molecule info. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/molecule_info
encodeSequences("CGGACT")