
wbData

Easily download and use WormBase data. wbData can download and read the “geneIDs” table and the GTF files from WormBase, and it provides functions to convert between WormBase gene IDs and gene symbols.

The three advantages are:

finding the files for any given WormBase release (even though the actual paths change between releases)
keeping all these files in a single place
facilitating common operations.

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("AlexWeinreb/wbData")

To load the geneIDs table for WormBase release WS273 and convert between gene IDs and gene symbols:

library(wbData)

gids <- wb_load_gene_ids("WS273")
s2i("unc-10", gids)
#> [1] "WBGene00006750"
i2s(c("WBGene00006752", "WBGene00004412"), gids)
#> [1] "unc-13" "rpl-1"
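
Conceptually, these conversions are lookups in the geneIDs table. The sketch below reproduces the idea on a toy table with base R's match(). The column names gene_id and symbol and the helper names symbol_to_id() and id_to_symbol() are assumptions made for illustration, not necessarily what wbData uses internally.

```r
# Toy stand-in for the geneIDs table (column names are assumed for illustration)
toy_gids <- data.frame(
  gene_id = c("WBGene00006750", "WBGene00006752", "WBGene00004412"),
  symbol  = c("unc-10", "unc-13", "rpl-1"),
  stringsAsFactors = FALSE
)

# Hypothetical helpers: look up one column from the other, preserving input order
symbol_to_id <- function(symbols, table) {
  table$gene_id[match(symbols, table$symbol)]
}
id_to_symbol <- function(ids, table) {
  table$symbol[match(ids, table$gene_id)]
}

symbol_to_id("unc-10", toy_gids)
id_to_symbol(c("WBGene00006752", "WBGene00004412"), toy_gids)

# Names absent from the table come back as NA rather than raising an error
symbol_to_id("not-a-gene", toy_gids)
```

Because match() returns one position per input element, the result always has the same length and order as the input, which makes it convenient for adding a symbol column to an existing table.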

To load the gene coordinates for the same release:

gene_coords <- wbData::wb_load_gene_coords(273)
head(gene_coords)
#> # A tibble: 6 × 7
#>   gene_id        chr   start   end strand position                  gene_biotype
#>   <chr>          <chr> <int> <int> <chr>  <chr>                     <chr>       
#> 1 WBGene00014450 MtDNA     1    55 +      MtDNA:         1-       … tRNA        
#> 2 WBGene00014451 MtDNA    58   111 +      MtDNA:        58-       … tRNA        
#> 3 WBGene00010957 MtDNA   113   549 +      MtDNA:       113-       … protein_cod…
#> 4 WBGene00010958 MtDNA   549   783 +      MtDNA:       549-       … protein_cod…
#> 5 WBGene00014452 MtDNA   785   840 +      MtDNA:       785-       … tRNA        
#> 6 WBGene00014453 MtDNA   842   896 +      MtDNA:       842-       … tRNA

These coordinates can be combined with other packages, for example to copy a gene's position to the clipboard and look up its gene model in IGV or JBrowse:


gene_coords |>
  dplyr::filter(gene_id == s2i("unc-10", gids)) |>
  dplyr::pull(position) |>
  clipr::write_clip()

clipr::read_clip()
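
If a compact locus string without padding is preferred, one can be assembled from the chr, start and end columns. This is a minimal sketch on toy rows copied from the output above; make_locus() is a hypothetical helper, not a wbData function.

```r
# Toy rows copied from the gene_coords output shown earlier
toy_coords <- data.frame(
  gene_id = c("WBGene00014450", "WBGene00010957"),
  chr     = c("MtDNA", "MtDNA"),
  start   = c(1L, 113L),
  end     = c(55L, 549L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a "chr:start-end" locus string
make_locus <- function(chr, start, end) {
  paste0(chr, ":", start, "-", end)
}

toy_coords$locus <- make_locus(toy_coords$chr, toy_coords$start, toy_coords$end)
toy_coords$locus[toy_coords$gene_id == "WBGene00010957"]
```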

You can load the list of strains available at the CGC (Caenorhabditis Genetics Center). The downloaded list is saved in the cache and, by default, is downloaded again only if the cached file is more than 2 days old. Use the refresh argument to force a new download.

strain_list <- wb_load_cgc_list()
nrow(strain_list)
#> [1] 24536

For example, to look up the genotype of a specific strain:

strain_list$Genotype[strain_list$Strain == "NC902"]
#> [1] "unc-119(ed3) III; wdEx381."

You can also search for strains with particular characteristics, for example by using a regular expression to find strains in which hbl-1 is potentially associated with a red fluorescent protein:


fluo_keywords <- c("cherry", "tomato", "scarlet", "rfp", " red ")
my_pattern <- paste0("hbl-1", ".*::.*", fluo_keywords, collapse = "|")
with_fluo <- which(
  grepl(my_pattern, tolower(strain_list$Genotype)) |
    grepl(my_pattern, tolower(strain_list$Description))
)
strain_list[with_fluo, ]
#> # A tibble: 3 × 8
#>   Strain Species      Genotype Description Mutagen Outcrossed `Made by` Received
#>   <chr>  <chr>        <chr>    <chr>       <chr>   <chr>      <chr>     <chr>   
#> 1 VT3751 C. elegans   maIs105… maIs105 [c… Crispr… x2         Orkan Il… 06/28/1…
#> 2 VT3869 <em>Caenorh… wIs51 V… wIs51 [SCM… Crispr… x2         Orkan Il… 03/01/2…
#> 3 VT3922 <em>Caenorh… lin-28(… Precocious… Crispr… x2         Orkan Il… 03/01/2…
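
To see what the search pattern does, it can help to print it and try it on a few strings. The genotypes below are made up for illustration and are not real CGC entries.

```r
fluo_keywords <- c("cherry", "tomato", "scarlet", "rfp", " red ")
my_pattern <- paste("hbl-1", ".*::.*", fluo_keywords, sep = "", collapse = "|")

# One alternative per keyword, joined by "|"
my_pattern

# Made-up genotype strings, for illustration only (not real CGC entries)
toy_genotypes <- c(
  "exIs1 [hbl-1p::mCherry]",
  "exIs2 [hbl-1::GFP]",
  "exIs3 [myo-3p::tagRFP]"
)

# Lower-case the text first, since the keywords are written in lower case
grepl(my_pattern, tolower(toy_genotypes))
```

Only the first toy genotype matches: the second has no red keyword and the third does not mention hbl-1. Since ".*" matches any text, the keyword may belong to a different transgene later in the same string, which is why the matches are only potentially hbl-1 fusions and are worth checking by eye.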

Downloaded files are stored on the local computer to avoid re-downloading them every time. That way, wbData can also be used to obtain WormBase files for use with other software.
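
The freshness rule mentioned earlier for the CGC list (download again only when the cached copy is more than 2 days old, unless a refresh is forced) can be sketched in base R as follows. The helper needs_download() and its arguments are hypothetical and only illustrate the logic; this is not wbData's actual implementation.

```r
# Hypothetical helper: TRUE if the file should be downloaded (again)
needs_download <- function(path, max_age_days = 2, refresh = FALSE) {
  if (refresh || !file.exists(path)) {
    return(TRUE)
  }
  # Age of the cached file, in days, from its modification time
  age_days <- as.numeric(difftime(Sys.time(), file.mtime(path), units = "days"))
  age_days > max_age_days
}

# Demonstration with a temporary file standing in for the cached list
cached <- tempfile(fileext = ".txt")
needs_download(cached)                  # file does not exist yet
writeLines("placeholder", cached)
needs_download(cached)                  # freshly written
needs_download(cached, refresh = TRUE)  # forced refresh

# Pretend the file is three days old by moving its modification time back
Sys.setFileTime(cached, Sys.time() - 3 * 24 * 60 * 60)
needs_download(cached)
```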

The cache directory can be specified in three ways:

Explicitly, by passing a path as the dir_cache argument of the package functions.
Through the wb_dir_cache option, set with options(). In that case, leave the dir_cache argument as NULL.
If the dir_cache argument is NULL and the option is not set, a user-specific cache directory is chosen by rappdirs based on the operating system.
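
The order of precedence described above can be sketched as a small function. resolve_cache_dir() is a hypothetical helper written for illustration, and the application name passed to rappdirs is an assumption; the last step needs the rappdirs package to be installed.

```r
# Hypothetical helper mirroring the three-step lookup described above
resolve_cache_dir <- function(dir_cache = NULL) {
  # 1. An explicit argument wins
  if (!is.null(dir_cache)) {
    return(dir_cache)
  }
  # 2. Otherwise use the wb_dir_cache option, if it is set
  from_option <- getOption("wb_dir_cache")
  if (!is.null(from_option)) {
    return(from_option)
  }
  # 3. Otherwise fall back to an OS-specific user cache directory
  #    (requires rappdirs; the app name "wbData" is an assumption)
  rappdirs::user_cache_dir("wbData")
}

old <- options(wb_dir_cache = file.path(tempdir(), "wb_cache"))
resolve_cache_dir()                       # taken from the option
resolve_cache_dir(dir_cache = tempdir())  # explicit argument takes precedence
options(old)                              # restore the previous setting
```

Setting the option once, for example in a project's .Rprofile, avoids having to pass a path to every call.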

To list the files in the cache without deleting them:

wb_clean_cache(273, delete = FALSE)
#> [1] "C:\\Users\\ALEXIS~1\\AppData\\Local/wbData/wbData/Cache/c_elegans.PRJNA13758.WS273.canonical_geneset.gtf.gz"
#> [2] "C:\\Users\\ALEXIS~1\\AppData\\Local/wbData/wbData/Cache/c_elegans.PRJNA13758.WS273.geneIDs.txt.gz"
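
The release can be given either as a number (273) or as a string ("WS273") in the calls shown in this README. The sketch below shows one way such input could be normalized and mapped to the file names seen in the listing above. Both helpers are hypothetical, and the assumption that the same naming pattern holds for other releases is untested here.

```r
# Hypothetical helper: accept 273 or "WS273" and return "WS273"
normalize_release <- function(release) {
  release <- toupper(as.character(release))
  if (grepl("^[0-9]+$", release)) {
    release <- paste0("WS", release)
  }
  # Stop early on anything that does not look like a release name
  stopifnot(grepl("^WS[0-9]+$", release))
  release
}

# Hypothetical helper: file names following the pattern seen for WS273
cached_file_names <- function(release) {
  ws <- normalize_release(release)
  paste0(
    "c_elegans.PRJNA13758.", ws,
    c(".canonical_geneset.gtf.gz", ".geneIDs.txt.gz")
  )
}

cached_file_names(273)
cached_file_names("WS273")
```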

The cache can be emptied with: