BSgenomeForge: The BSgenomeForge functions
In BSgenome: Software infrastructure for efficient representation of full genomes and their SNPs

Description Usage Arguments Details Author(s) Examples

A set of functions for making a BSgenome data package.

## Top-level BSgenomeForge function:

forgeBSgenomeDataPkg(x, seqs_srcdir=".", destdir=".", verbose=TRUE)

## Low-level BSgenomeForge functions:

forgeSeqlengthsRdsFile(seqnames, prefix="", suffix=".fa",
                       seqs_srcdir=".", seqs_destdir=".",
                       genome=NA_character_, verbose=TRUE)

forgeSeqlengthsRdaFile(seqnames, prefix="", suffix=".fa",
                       seqs_srcdir=".", seqs_destdir=".",
                       genome=NA_character_, verbose=TRUE)

forgeSeqFiles(provider, genome,
              seqnames, mseqnames=NULL,
              seqfile_name=NA, prefix="", suffix=".fa",
              seqs_srcdir=".", seqs_destdir=".",
              ondisk_seq_format=c("2bit", "rds", "rda", "fa.rz", "fa"),
              verbose=TRUE)

forgeMasksFiles(seqnames, nmask_per_seq,
                seqs_destdir=".",
                ondisk_seq_format=c("2bit", "rda", "fa.rz", "fa"),
                masks_srcdir=".", masks_destdir=".",
                AGAPSfiles_type="gap", AGAPSfiles_name=NA,
                AGAPSfiles_prefix="", AGAPSfiles_suffix="_gap.txt",
                RMfiles_name=NA, RMfiles_prefix="", RMfiles_suffix=".fa.out",
                TRFfiles_name=NA, TRFfiles_prefix="", TRFfiles_suffix=".bed",
                verbose=TRUE)

`x`	A BSgenomeDataPkgSeed object or the name of a BSgenome data package seed file. See the BSgenomeForge vignette in this package for more information.
`seqs_srcdir, masks_srcdir`	Single strings indicating the path to the source directories i.e. to the directories containing the source data files. Only read access to these directories is needed. See the BSgenomeForge vignette in this package for more information.
`destdir`	A single string indicating the path to the directory where the source tree of the target package should be created. This directory must already exist. See the BSgenomeForge vignette in this package for more information.
`verbose`	`TRUE` or `FALSE`.
`provider`	The provider of the sequence data files e.g. `"UCSC"`, `"NCBI"`, `"BDGP"`, `"FlyBase"`, etc...
`genome`	The name of the genome. Typically the name of an NCBI assembly (e.g. `"GRCh38.p12"`, `"WBcel235"`, `"TAIR10.1"`, `"ARS-UCD1.2"`, etc...) or UCSC genome (e.g. `"hg38"`, `"bosTau9"`, `"galGal6"`, `"ce11"`, etc...).
`seqnames, mseqnames`	Character vectors containing the names of the single sequences (`seqnames`) and the multiple sequences (`mseqnames`) to forge. See the BSgenomeForge vignette in this package for more information.
`seqfile_name, prefix, suffix`	See the BSgenomeForge vignette in this package for more information, in particular the description of the `seqfile_name`, `seqfiles_prefix` and `seqfiles_suffix` fields of a BSgenome data package seed file.
`seqs_destdir, masks_destdir`	During the forging process the source data files are converted into serialized Biostrings objects. `seqs_destdir` and `masks_destdir` must be single strings indicating the path to the directories where these serialized objects should be saved. These directories must already exist. Both `forgeSeqlengthsRdsFile` and `forgeSeqlengthsRdaFile` will produce a single `.rds` or `.rda` file. Both `forgeSeqFiles` and `forgeMasksFiles` will produce one file per sequence (all files being either `.rds` or `.rda` files).
`ondisk_seq_format`	Specifies how the single sequences should be stored in the forged package. Can be `"2bit"`, `"rds"`, `"rda"`, `"fa.rz"`, or `"fa"`. If `"2bit"` (the default), then all the single sequences are stored in a single twoBit file. If `"rds"` or `"rda"`, then each single sequence is stored in its own serialized XString derivative. If `"fa.rz"` or `"fa"`, then all the single sequences are stored in a single FASTA file (compressed in the RAZip format if `"fa.rz"`).
`nmask_per_seq`	A single integer indicating the desired number of masks per sequence. See the BSgenomeForge vignette in this package for more information.
`AGAPSfiles_type, AGAPSfiles_name, AGAPSfiles_prefix, AGAPSfiles_suffix, RMfiles_name, RMfiles_prefix, RMfiles_suffix, TRFfiles_name, TRFfiles_prefix, TRFfiles_suffix`	These arguments are named after the corresponding fields of a BSgenome data package seed file. See the BSgenomeForge vignette in this package for more information.
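The arguments above require the destination directories to exist already and the source files to be readable. The following is a minimal base R sketch of a pre-flight check that could be run before calling `forgeSeqFiles`. `check_forge_inputs` is a hypothetical helper, not part of BSgenome, and it assumes that each source file is named `<prefix><seqname><suffix>` inside `seqs_srcdir`, which is how the Examples section appears to use `prefix="ce2"` and `suffix=".fa.gz"`; see the vignette for the authoritative naming rules.

```r
## Hypothetical helper (not part of BSgenome): validate forge inputs up front.
## Assumption: each source file is named <prefix><seqname><suffix>.
check_forge_inputs <- function(seqnames, prefix = "", suffix = ".fa",
                               seqs_srcdir = ".", seqs_destdir = ".") {
    stopifnot(is.character(seqnames), length(seqnames) >= 1L,
              !anyNA(seqnames), !anyDuplicated(seqnames))
    ## The destination directory must already exist.
    if (!dir.exists(seqs_destdir))
        stop("destination directory does not exist: ", seqs_destdir)
    ## Build the expected source file paths, named by sequence.
    src_files <- file.path(seqs_srcdir, paste0(prefix, seqnames, suffix))
    names(src_files) <- seqnames
    not_found <- src_files[!file.exists(src_files)]
    if (length(not_found) > 0L)
        stop("missing source file(s): ", paste(not_found, collapse = ", "))
    src_files
}

## Demo on empty placeholder files created in a throwaway directory.
demo_srcdir <- file.path(tempdir(), "forge_demo_src")
dir.create(demo_srcdir, showWarnings = FALSE)
file.create(file.path(demo_srcdir, c("ce2chrX.fa.gz", "ce2chrM.fa.gz")))

src_files <- check_forge_inputs(c("chrX", "chrM"),
                                prefix = "ce2", suffix = ".fa.gz",
                                seqs_srcdir = demo_srcdir,
                                seqs_destdir = tempdir())
print(src_files)
```

Failing early with a clear message is a design choice of this sketch; the forge functions perform their own checks, and this helper only makes missing inputs easier to spot before a long run.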

These functions are intended for Bioconductor users who want to make a new BSgenome data package, not for regular users of these packages. See the BSgenomeForge vignette in this package (`vignette("BSgenomeForge")`) for extensive coverage of this topic.

H. Pagès

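## Load the package that provides the forge functions:
library(BSgenome)

## Source FASTA files shipped with BSgenome, and the sequences to forge:
seqs_srcdir <- system.file("extdata", package="BSgenome")
seqnames <- c("chrX", "chrM")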

## Forge .2bit sequence files:
forgeSeqFiles("UCSC", "ce2",
              seqnames, prefix="ce2", suffix=".fa.gz",
              seqs_srcdir=seqs_srcdir,
              seqs_destdir=tempdir(), ondisk_seq_format="2bit")

## Forge .rds sequence files:
forgeSeqFiles("UCSC", "ce2",
              seqnames, prefix="ce2", suffix=".fa.gz",
              seqs_srcdir=seqs_srcdir,
              seqs_destdir=tempdir(), ondisk_seq_format="rds")

## Sanity checks:
library(BSgenome.Celegans.UCSC.ce2)
genome <- BSgenome.Celegans.UCSC.ce2

ce2_sequences <- import(file.path(tempdir(), "single_sequences.2bit"))
ce2_sequences0 <- DNAStringSet(list(chrX=genome$chrX, chrM=genome$chrM))
stopifnot(identical(names(ce2_sequences0), names(ce2_sequences)),
          all(ce2_sequences0 == ce2_sequences))

chrX <- readRDS(file.path(tempdir(), "chrX.rds"))
stopifnot(genome$chrX == chrX)
chrM <- readRDS(file.path(tempdir(), "chrM.rds"))
stopifnot(genome$chrM == chrM)

BSgenome documentation built on Nov. 8, 2020, 7:48 p.m.
