
Retrieval of UCSC RepeatMasker annotations through AnnotationHub resources

The UCSCRepeatMasker package provides metadata for AnnotationHub resources associated with UCSC RepeatMasker annotations. The original data can be downloaded from UCSC URLs of the form https://hgdownload.soe.ucsc.edu/goldenPath/XXXX/database/rmsk.txt.gz, where XXXX is the code of a UCSC genome version (e.g., hg38). Details on how those original data were processed into AnnotationHub resources can be found in the source file:

UCSCRepeatMasker/scripts/make-data_UCSCRepeatMasker.R

while details on how the metadata for those resources has been generated can be found in the source file:

UCSCRepeatMasker/scripts/make-metadata_UCSCRepeatMasker.R

UCSC RepeatMasker annotations can be retrieved using the AnnotationHub package, which gives access to a web resource that provides a central location where genomic files (e.g., VCF, BED, WIG) and other resources from standard (e.g., UCSC, Ensembl) and distributed sites can be found. AnnotationHub creates and manages a local cache of the files retrieved by the user, which helps with quick and reproducible access.

For example, to list the available UCSC RepeatMasker annotations for the human genome, we first load the AnnotationHub package:

library(AnnotationHub)

and then query the annotation hub as follows:

ah <- AnnotationHub()
query(ah, c("UCSC", "RepeatMasker", "Homo sapiens"))

We can retrieve the desired resource, e.g., UCSC RepeatMasker annotations for hg38, using the following syntax:

rmskhg38 <- ah[["AH99003"]]
rmskhg38

Note that the data are returned as a GRanges object. Please consult the vignettes of the GenomicRanges package for details on how to manipulate this type of object. The contents of the 11 metadata columns are described on the UCSC Genome Browser web page for the RepeatMasker database schema. Please consult the credits and references sections on that page for information on how to cite these data.
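As a minimal sketch of this kind of manipulation, the following example builds a small toy GRanges object and then tabulates and subsets it by repeat class. The ranges and annotation values are made up for illustration, and the column names repName and repClass are taken from the UCSC rmsk schema under the assumption that the retrieved object uses the same names. The same calls should apply to rmskhg38 if that assumption holds.

```r
# Toy stand-in for the object returned by ah[["AH99003"]]; no download needed.
library(GenomicRanges)

# Hypothetical ranges and annotation values, made up for illustration only.
toy <- GRanges(
  seqnames = c("chr1", "chr1", "chr2", "chr2"),
  ranges = IRanges(start = c(100, 500, 200, 900),
                   width = c(50, 300, 120, 80)),
  strand = c("+", "-", "+", "-"),
  repName = c("AluY", "L1PA2", "L1HS", "MER5A"),
  repClass = c("SINE", "LINE", "LINE", "DNA")
)

# Count the number of annotations per repeat class.
classCounts <- table(toy$repClass)

# Keep only the ranges annotated as LINE elements and get their widths.
lineRanges <- toy[toy$repClass == "LINE"]
lineWidths <- width(lineRanges)
```

Metadata columns are accessed with the $ operator (or with mcols()), and logical subsetting with [ returns another GRanges object, so further range-based operations can be chained on the result.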

The GRanges object contains further metadata accessible with the metadata() method as follows:

metadata(rmskhg38)

Session information

sessionInfo()


functionalgenomics/UCSCRepeatMasker documentation built on April 6, 2023, 11:18 a.m.