NCBI-utils: Utility functions to access NCBI resources

NCBI-utils    R Documentation

Utility functions to access NCBI resources

Description

Low-level utility functions to access NCBI resources. Not intended to be used directly by the end user.

Usage

find_NCBI_assembly_ftp_dir(assembly_accession, assembly_name=NA)

fetch_assembly_report(assembly_accession, assembly_name=NA,
                      AssemblyUnits=NULL)

Arguments

assembly_accession

A single string containing either a GenBank assembly accession (e.g. "GCA_000001405.15") or a RefSeq assembly accession (e.g. "GCF_000001405.26").

Alternatively, for fetch_assembly_report(), assembly_accession can be the URL of the assembly report (a.k.a. "Full sequence report").

assembly_name

A single string containing the name of the assembly, or NA (the default).

AssemblyUnits

By default (AssemblyUnits=NULL), all the assembly units are included in the data frame returned by fetch_assembly_report(). To include only a subset, set AssemblyUnits to a character vector containing the names of the assembly units to include.
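As a rough illustration of what restricting the assembly units means, the sketch below filters a toy data frame on its AssemblyUnit column. This is an assumption about the observable effect of the AssemblyUnits argument, not a description of how fetch_assembly_report() is implemented. The names toy_report and keep_units, and all the values in the data frame, are made up for the example.

```r
## Toy stand-in for the data frame returned by fetch_assembly_report().
## Only 3 of the 10 columns are shown, and all values are illustrative.
toy_report <- data.frame(
    SequenceName=c("seqA", "seqB", "seqC"),
    AssemblyUnit=c("Primary Assembly", "non-nuclear", "ALT_REF_LOCI_1"),
    SequenceLength=c(1000L, 200L, 50L),
    stringsAsFactors=FALSE
)

## Hypothetical selection of assembly units to keep.
keep_units <- c("Primary Assembly", "non-nuclear")

## Keep only the rows that belong to the selected assembly units.
keep <- toy_report$AssemblyUnit %in% keep_units
subset_report <- toy_report[keep, , drop=FALSE]
subset_report

## With network access, the corresponding real call would look like
## this (not run):
##   fetch_assembly_report("GCA_000001405.15",
##                         AssemblyUnits="Primary Assembly")
```

Calling unique() on the AssemblyUnit column of a full report is one way to find out which assembly unit names are available for a given assembly.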

Value

For find_NCBI_assembly_ftp_dir(): A length-2 character vector:

  • The 1st element in the vector is the URL to the FTP dir, without the trailing slash.

  • The 2nd element in the vector is the prefix used in the names of most of the files in the FTP dir.

For fetch_assembly_report(): A data frame with 1 row per sequence in the assembly and 10 columns:

  1. SequenceName

  2. SequenceRole

  3. AssignedMolecule

  4. AssignedMoleculeLocationOrType

  5. GenBankAccn

  6. Relationship

  7. RefSeqAccn

  8. AssemblyUnit

  9. SequenceLength

  10. UCSCStyleName
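To make the layout of this data frame concrete, here is a minimal sketch that reads a small hand-written excerpt in the style of an NCBI assembly report (tab-separated fields, "#" comment lines) into a data frame with the 10 columns listed above. It assumes the file layout shown in the excerpt and uses base R's read.table(). It is not necessarily how fetch_assembly_report() parses the file, and report_lines, report_colnames, and toy_report are hypothetical names.

```r
## Hand-written excerpt in the style of an assembly report.
## Fields are tab-separated; lines starting with "#" are comments.
report_lines <- c(
    "# Assembly name:  toy example",
    paste("1", "assembled-molecule", "1", "Chromosome", "CM000663.2",
          "=", "NC_000001.11", "Primary Assembly", "248956422", "chr1",
          sep="\t"),
    paste("MT", "assembled-molecule", "MT", "Mitochondrion", "J01415.2",
          "=", "NC_012920.1", "non-nuclear", "16569", "chrM",
          sep="\t")
)

## The 10 column names documented above.
report_colnames <- c("SequenceName", "SequenceRole", "AssignedMolecule",
                     "AssignedMoleculeLocationOrType", "GenBankAccn",
                     "Relationship", "RefSeqAccn", "AssemblyUnit",
                     "SequenceLength", "UCSCStyleName")

## Read the excerpt: 9 character columns plus an integer SequenceLength.
## "na" is treated as a missing value (an assumption about the format).
toy_report <- read.table(
    text=report_lines, sep="\t", quote="", comment.char="#",
    col.names=report_colnames, na.strings="na",
    colClasses=c(rep("character", 8), "integer", "character"),
    stringsAsFactors=FALSE
)

dim(toy_report)
toy_report[ , c("SequenceName", "SequenceLength", "UCSCStyleName")]
```

Forcing the character columns via colClasses keeps sequence names such as "1" from being converted to numbers.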

Note

fetch_assembly_report() is the workhorse behind the higher-level and more user-friendly getChromInfoFromNCBI().

Author(s)

H. Pagès

See Also

getChromInfoFromNCBI for a higher-level and more user-friendly version of fetch_assembly_report.

Examples

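library(GenomeInfoDb)

## Find the FTP dir for assembly GCA_000001405.15:
ftp_dir <- find_NCBI_assembly_ftp_dir("GCA_000001405.15")
ftp_dir

url <- ftp_dir[[1]]     # URL to the FTP dir
prefix <- ftp_dir[[2]]  # prefix used in names of most files

## List the content of the FTP dir:
list_ftp_dir(url)

## Build the URL to the assembly report:
assembly_report_url <- paste0(url, "/", prefix, "_assembly_report.txt")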

## To fetch the assembly report for assembly GCA_000001405.15, you can
## call fetch_assembly_report() on the assembly accession or directly
## on the URL to the assembly report:
assembly_report <- fetch_assembly_report("GCA_000001405.15")
dim(assembly_report)
head(assembly_report)

## Sanity check:
assembly_report2 <- fetch_assembly_report(assembly_report_url)
stopifnot(identical(assembly_report, assembly_report2))

Bioconductor/GenomeInfoDb documentation built on Dec. 2, 2024, 1:41 a.m.