| twobit_seqstats | R Documentation |
Extract the lengths and letter counts of the DNA sequences stored
in a .2bit file.
twobit_seqstats(filepath)
twobit_seqlengths(filepath)
filepath |
A single string (character vector of length 1) containing a path
to a .2bit file. |
twobit_seqlengths(filepath) is a shortcut for
twobit_seqstats(filepath)[ , "seqlengths"]. It is also a much more
efficient way to get the sequence lengths, because it does not need
to load the sequence data into memory.
For twobit_seqstats(): An integer matrix with one row per sequence
in the .2bit file and 6 columns. The rownames on the matrix are the
sequence names and the colnames are: seqlengths, A, C,
G, T, N. Column seqlengths contains the length of each sequence.
Columns A, C, G, T, and N contain the letter counts for each sequence.
For twobit_seqlengths(): A named integer vector where the names
are the sequence names and the values are the corresponding lengths.
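To make the shape of these return values concrete, the following base-R sketch builds a matrix with the same layout from a small named character vector. It is only an illustration: seqstats_sketch() and the toy sequences are hypothetical, are not part of Rtwobitlib, and do not read a .2bit file or reflect how the package computes its results.

```r
# Hypothetical helper, NOT part of Rtwobitlib: tabulate the length and the
# A/C/G/T/N counts of DNA strings held in a named character vector.
seqstats_sketch <- function(dna) {
    dna_letters <- c("A", "C", "G", "T", "N")
    # One column of 5 counts per sequence (a 5 x length(dna) matrix).
    counts <- vapply(dna, function(s) {
        chars <- strsplit(s, "", fixed = TRUE)[[1]]
        as.integer(table(factor(chars, levels = dna_letters)))
    }, integer(5))
    # Transpose so that each sequence becomes a row, then prepend the lengths.
    ans <- cbind(nchar(dna), t(counts))
    dimnames(ans) <- list(names(dna), c("seqlengths", dna_letters))
    storage.mode(ans) <- "integer"
    ans
}

# Made-up toy sequences, used for illustration only.
toy <- c(chrI = "ACGTNNAC", chrII = "GGGTTA")
toy_stats <- seqstats_sketch(toy)
toy_stats

# The first column plays the role of the twobit_seqlengths() result, and
# the letter counts in each row add up to that sequence's length.
toy_lengths <- toy_stats[ , "seqlengths"]
stopifnot(all(rowSums(toy_stats[ , -1]) == toy_lengths))
```

Because a .2bit file stores only the letters A, C, G, T, and N, the five letter-count columns account for every position in a sequence, which is why the row sums match the seqlengths column.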
A quick overview of the 2bit format: https://genome.ucsc.edu/FAQ/FAQformat.html#format7
twobit_read and twobit_write to read/write a
character vector representing DNA sequences from/to a file in 2bit
format.
filepath <- system.file(package="Rtwobitlib", "extdata", "sacCer2.2bit")
twobit_seqstats(filepath)
twobit_seqlengths(filepath)
## Sanity checks:
sacCer2_seqstats <- twobit_seqstats(filepath)
stopifnot(
    identical(sacCer2_seqstats[ , 1], twobit_seqlengths(filepath)),
    all.equal(rowSums(sacCer2_seqstats[ , -1]), sacCer2_seqstats[ , 1])
)