gdb.create: Creates a new Genomic Database
In misha: Toolkit for Analysis of Genomic Data

gdb.create

R Documentation

Creates a new Genomic Database

Description

Creates a new Genomic Database.

Usage

gdb.create(
  groot = NULL,
  fasta = NULL,
  genes.file = NULL,
  annots.file = NULL,
  annots.names = NULL,
  format = NULL,
  verbose = FALSE
)

Arguments

`groot`	path to the newly created database
`fasta`	an array of names or URLs of FASTA files. Can contain wildcards for multiple files
`genes.file`	name or URL of the file that contains genes. If 'NULL', no genes are imported
`annots.file`	name or URL of the file that contains annotations. If 'NULL', no annotations are imported
`annots.names`	annotation names
`format`	database format: "indexed" (default, single genome.seq + genome.idx) or "per-chromosome" (separate .seq file per contig). If NULL, uses the value from `getOption("gmulticontig.indexed_format", TRUE)`
`verbose`	if TRUE, prints verbose messages

Details

This function creates a new Genomic Database at the location specified by 'groot'. FASTA files are converted to 'Seq' format and an appropriate 'chrom_sizes.txt' file is generated (see the "User Manual" for more details).
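
To make the 'chrom_sizes.txt' step more concrete, the following base R sketch computes contig lengths from a small FASTA file. It is an illustration only: the helper name `fasta_contig_sizes` and the two-column name/length layout are assumptions and do not describe misha's actual implementation or file specification.

```r
# Hypothetical helper (not part of misha): compute the length of every
# sequence in a FASTA file using base R only.
fasta_contig_sizes <- function(path) {
    lines <- readLines(path)
    lines <- lines[nzchar(lines)]                 # drop empty lines
    is_header <- startsWith(lines, ">")
    # Contig name = first whitespace-delimited token after '>'
    contigs <- sub("^>\\s*(\\S+).*$", "\\1", lines[is_header])
    # cumsum() over the header flags assigns a record id to every line
    record <- factor(cumsum(is_header)[!is_header], levels = seq_along(contigs))
    sizes <- tapply(nchar(lines[!is_header]), record, sum)
    sizes[is.na(sizes)] <- 0                      # headers with no sequence
    data.frame(chrom = contigs, size = as.numeric(sizes), stringsAsFactors = FALSE)
}

# Tiny demonstration file with two contigs
fa <- tempfile(fileext = ".fa")
writeLines(c(">chrA demo", "ACGTACGT", "ACG", ">chrB", "NNNN"), fa)
sizes <- fasta_contig_sizes(fa)
print(sizes)
```

A helper like this can be handy for sanity-checking input FASTA files before building a database, since it needs nothing beyond base R.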

Two database formats are supported:

indexed: Single genome.seq + genome.idx (default). Recommended for genomes with many contigs. Provides better performance and scalability.
per-chromosome: Separate .seq file per contig.
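
The way 'format' falls back to an option when it is NULL can be sketched in base R. The helper `resolve_db_format` below is hypothetical and not part of misha; it also assumes that a TRUE value of the 'gmulticontig.indexed_format' option selects the "indexed" format and any other value selects "per-chromosome".

```r
# Hypothetical helper (not part of misha) mirroring the documented
# behaviour of the 'format' argument.
resolve_db_format <- function(format = NULL) {
    if (is.null(format)) {
        # Assumption: TRUE means "indexed", anything else "per-chromosome"
        indexed <- getOption("gmulticontig.indexed_format", TRUE)
        return(if (isTRUE(indexed)) "indexed" else "per-chromosome")
    }
    match.arg(format, c("indexed", "per-chromosome"))
}

# Unset the option temporarily so the built-in default applies
old <- options(gmulticontig.indexed_format = NULL)
default_fmt <- resolve_db_format()

# Opt out of the indexed format through the option
options(gmulticontig.indexed_format = FALSE)
optout_fmt <- resolve_db_format()

# An explicit argument takes precedence over the option
explicit_fmt <- resolve_db_format("indexed")

options(old)  # restore the previous option value
print(c(default_fmt, optout_fmt, explicit_fmt))
```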

If 'genes.file' is not 'NULL', four sets of intervals are created in the database: tss, exons, utr3 and utr5. See gintervals.import_genes for more details about importing gene intervals.

'fasta', 'genes.file' and 'annots.file' can each be either a file path or a URL of the form 'ftp://[address]/[file]'. 'fasta' can also contain wildcards to indicate multiple files. The files these arguments point to can be zipped or unzipped.
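
As a rough illustration of the wildcard behaviour, the sketch below shows how a pattern might expand to several local FASTA files. It assumes shell-style glob semantics and uses base R's Sys.glob() for the expansion; how gdb.create resolves wildcards internally is not specified here. The file names and the `is_ftp_url` helper are hypothetical.

```r
# Scratch directory with a few empty placeholder files
fasta_dir <- tempfile("fasta_demo_")
dir.create(fasta_dir)
invisible(file.create(file.path(
    fasta_dir, c("chr1.fa.gz", "chr2.fa.gz", "notes.txt")
)))

# A wildcard pattern of the kind that could be supplied as 'fasta'
pattern <- file.path(fasta_dir, "chr*.fa.gz")

# Assumption: shell-style glob expansion, shown with base R's Sys.glob()
matched <- sort(basename(Sys.glob(pattern)))
print(matched)

# Hypothetical check that tells FTP URLs apart from local paths
is_ftp_url <- function(x) grepl("^ftp://", x)
print(is_ftp_url(c(pattern, "ftp://example.org/genome/chr1.fa.gz")))
```

Previewing the expansion this way can help confirm that a pattern picks up only the intended files before it is passed as 'fasta'.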

See the 'Genomes' vignette for details on how to create a database from common genome sources.

Value

None.

Examples


# ftp <- "ftp://hgdownload.soe.ucsc.edu/goldenPath/mm10"
# mm10_dir <- file.path(tempdir(), "mm10")
# # only a single chromosome is loaded in this example
# # see the "Genomes" vignette for how to download all of them
# # and how to download other genomes
# gdb.create(
#     mm10_dir,
#     paste(ftp, "chromosomes", paste0(
#         "chr", c("X"),
#         ".fa.gz"
#     ), sep = "/"),
#     paste(ftp, "database/knownGene.txt.gz", sep = "/"),
#     paste(ftp, "database/kgXref.txt.gz", sep = "/"),
#     c(
#         "kgID", "mRNA", "spID", "spDisplayID", "geneSymbol",
#         "refseq", "protAcc", "description", "rfamAcc",
#         "tRnaName"
#     )
# )
# gdb.init(mm10_dir)
# gintervals.ls()
# gintervals.all()

misha documentation built on Feb. 20, 2026, 5:08 p.m.