```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
# load BCB420.2019.ESA itself for knitr:
pkgName <- trimws(gsub("^Package:", "", readLines("../DESCRIPTION")[1]))
library(pkgName, character.only = TRUE)
```
This vignette describes the workflow that was used to prepare the BioGRID dataset for the package. The source data is protein-protein and genetic interaction data from BioGRID [@pmid30476227].

BioGRID is a curated database of protein-protein and genetic interactions. BioGRID data is licensed under the MIT license. This document describes work with BioGRID 3.5.170 (2019-02-25) [@pmid30476227].
BioGRID interaction data is available in several formats common to protein-protein interaction databases. Because we work with HGNC symbols, the BioGRID TAB 2.0 format is the most convenient: it includes the official gene symbols of both interactors. The format is described in the BioGRID documentation.
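The file BIOGRID-ALL-3.5.170.tab2.zip contains the following columns of interest to us (positions refer to the 24-column TAB 2.0 layout):

- column 8: Official Symbol Interactor A
- column 9: Official Symbol Interactor B
- column 13: Experimental System Type ("physical" or "genetic")
- column 16: Organism Interactor A (NCBI taxonomy ID)
- column 17: Organism Interactor B (NCBI taxonomy ID)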
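The compact `col_types` string used in the processing code marks each of the 24 TAB 2.0 columns with one character: `"-"` skips the column, `"c"` reads it as character and `"i"` as integer. The sketch below rebuilds that string from column positions. It assumes the interactor symbols are in columns 8 and 9, the experimental system type is in column 13, and the organism IDs are in columns 16 and 17. `makeColTypes()` is a hypothetical helper and is not part of readr or of this package.

```r
# Hypothetical helper (not part of readr or BCB420.2019.ESA):
# build a compact readr col_types string that skips every column
# except those at the given positions.
makeColTypes <- function(nCols, pos, types) {
  stopifnot(length(pos) == length(types), all(pos >= 1 & pos <= nCols))
  codes <- rep("-", nCols)   # "-" = skip this column
  codes[pos] <- types        # e.g. "c" = character, "i" = integer
  paste(codes, collapse = "")
}

# Assumed TAB 2.0 positions: symbols A/B, system type, organism IDs A/B
makeColTypes(24,
             pos   = c(8, 9, 13, 16, 17),
             types = c("c", "c", "c", "i", "i"))
#> [1] "-------cc---c--ii-------"
```

Passing the result as `col_types` to `readr::read_tsv()` reads only those five columns. This is cheaper than reading all 24 columns and subsetting afterwards.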
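To obtain the data, download BIOGRID-ALL-3.5.170.tab2.zip (70.7 MB) from the BioGRID download pages and unzip it. Place the extracted file in a BioGRID subdirectory of a directory named data that sits next to your working directory. (It should be reachable with file.path("..", "data", "BioGRID").) Warning: the unzipped file ../data/BioGRID/BIOGRID-ALL-3.5.170.tab2.txt is 618.9 MB!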
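If you prefer to unzip from within R, here is a minimal sketch. It assumes the archive holds a single .txt file with the same base name at its top level. `bioGRIDfile()` is a hypothetical helper and is not part of BCB420.2019.ESA. It uses only `utils::unzip()` and base file functions.

```r
# Hypothetical helper (not part of BCB420.2019.ESA): return the path to the
# unzipped TAB 2.0 file, unzipping the downloaded archive first if needed.
# Assumes the archive contains the .txt file at its top level.
bioGRIDfile <- function(dataDir = file.path("..", "data", "BioGRID"),
                        version = "3.5.170") {
  txt <- file.path(dataDir, sprintf("BIOGRID-ALL-%s.tab2.txt", version))
  zip <- file.path(dataDir, sprintf("BIOGRID-ALL-%s.tab2.zip", version))
  if (!file.exists(txt)) {
    if (!file.exists(zip)) {
      stop("Download ", basename(zip), " into ", dataDir, " first.")
    }
    utils::unzip(zip, exdir = dataDir)   # extract next to the archive
    if (!file.exists(txt)) {
      stop("The archive did not contain ", basename(txt))
    }
  }
  txt
}
```

With the directory layout described above, `FN <- bioGRIDfile()` yields the same path that the processing code builds by hand.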
To begin processing, we need to make sure the required packages are installed:

- readr provides functions to read data that are particularly suitable for large datasets. They are much faster than the built-in read.csv() and related functions. But caution: these functions return "tibbles", not data frames, and the two behave differently. For example, selecting a single column of a tibble with [ returns a tibble, not a vector.
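To see the tibble/data frame difference concretely, here is a small sketch on made-up rows. It assumes the tibble package is available; readr installs it as a dependency.

```r
# Made-up example rows, not BioGRID data
df <- data.frame(A = c("TP53", "BRCA1"), B = c("MDM2", "BARD1"),
                 stringsAsFactors = FALSE)
tb <- tibble::as_tibble(df)

class(df[, "A"])                 # data frame: drops to a character vector
class(tb[, "A"])                 # tibble: stays a one-column tibble
class(as.data.frame(tb)[, "A"])  # converted back: character vector again
```

The processing code wraps `readr::read_tsv()` in `as.data.frame()` for this reason, so that the subsetting that follows behaves as it does for ordinary data frames.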
```r
if (! requireNamespace("readr", quietly = TRUE)) {
  install.packages("readr")
}
```
```r
# Read only the five columns we need. In the compact col_types string,
# "-" skips a column, "c" reads it as character and "i" as integer.
FN <- file.path("..", "data", "BioGRID", "BIOGRID-ALL-3.5.170.tab2.txt")
BioGRID <- as.data.frame(readr::read_tsv(FN,
                                         skip = 1,  # skip the header line
                                         col_names = c("A", "B", "type",
                                                       "taxID_A", "taxID_B"),
                                         col_types = "-------cc---c--ii-------"))
nrow(BioGRID)  # 1647089

# Remove all interactions for which taxID_A and taxID_B are not both 9606 (human)
BioGRID <- BioGRID[BioGRID$taxID_A == 9606 & BioGRID$taxID_B == 9606,
                   c("A", "B", "type")]
nrow(BioGRID)  # 442753

# Remove all interactions for which A and B are not both HGNC symbols
myURL <- paste0("https://github.com/hyginn/",
                "BCB420-2019-resources/blob/master/HGNC.RData?raw=true")
load(url(myURL))  # loads the HGNC data frame
BioGRID <- BioGRID[BioGRID$A %in% HGNC$sym & BioGRID$B %in% HGNC$sym,
                   c("A", "B", "type")]
# Note: the counts below come from a run whose filter tested column A twice;
# with column B tested as well, they may be smaller.
nrow(BioGRID)  # 433505

# How many genetic (rather than physical) interactions?
sum(BioGRID$type == "genetic")  # 4702

# Save dataset:
saveRDS(BioGRID,
        file = file.path("..", "data", "BioGRID", "BioGRID.3.5.170.rds"))  # 2.8 MB

# The dataset was uploaded to the assets server and is available with:
BioGRID <- fetchData("BioGRID")
```
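You can express the two filtering steps as one function and check it on a toy table before running it on the full file. This is a sketch only: `filterBioGRID()` and the toy rows are hypothetical. It uses `%in%` instead of `==`, so a missing taxonomy ID drops the row rather than producing an all-NA row.

```r
# Hypothetical helper: keep interactions in which both partners are human
# (taxonomy ID 9606) and both symbols are approved HGNC symbols.
filterBioGRID <- function(x, hgncSym, taxID = 9606) {
  isHuman <- x$taxID_A %in% taxID & x$taxID_B %in% taxID  # NA-safe
  isHGNC  <- x$A %in% hgncSym & x$B %in% hgncSym          # test BOTH partners
  x[isHuman & isHGNC, c("A", "B", "type")]
}

# Toy input, invented for illustration (not BioGRID data)
toy <- data.frame(A       = c("TP53", "TP53", "Trp53", "BRCA1"),
                  B       = c("MDM2", "NOTASYMBOL", "Mdm2", "BARD1"),
                  type    = c("physical", "physical", "physical", "genetic"),
                  taxID_A = c(9606L, 9606L, 10090L, 9606L),
                  taxID_B = c(9606L, 9606L, 10090L, 9606L),
                  stringsAsFactors = FALSE)

res <- filterBioGRID(toy, hgncSym = c("TP53", "MDM2", "BRCA1", "BARD1"))
res
sum(res$type == "genetic")
```

On the toy table, only the TP53-MDM2 and BRCA1-BARD1 rows survive. The second row fails the HGNC check on partner B, and the mouse row (taxonomy ID 10090) fails the human check.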
This release of the BCB420.2019.ESA package was produced in the following context of supporting packages:

```r
sessionInfo()
```