README.md
In hyginn/BCB420.2019.ESA: Exploratory Systems Analysis Tools

`BCB420.2019.ESA`

(Exploratory Systems Analysis Tools for BCB420-2019)

Boris Steipe,

Department of Biochemistry and Department of Molecular Genetics,

University of Toronto

Canada

< boris.steipe@utoronto.ca >

This README describes this package.

If any of this information is ambiguous, inaccurate, outdated, or incomplete, please check the most recent version of the package on GitHub and file an issue.

About this package:
Data ...
Functions ...
Notes
References and Further reading
Acknowledgements

This package is a joint development platform for student designed tools for Exploratory Systems Analysis, part of the University of Toronto course BCB420H1S (Computational Systems Biology) in the 2018/2019 academic year.

Suporting resources include curated systems data and other data resources:

2.1 The HGNC symbol reference

Load the HGNC object in the following way:

HGNC <- fetchData("HGNCreference")                # assigns the HGNC data object

str(HGNC)
# 'data.frame': 27087 obs. of  14 variables:
#  $ sym      : chr  "A1BG" "A1BG-AS1" "A1CF" "A2M" ...
#  $ name     : chr  "alpha-1-B glycoprotein" "A1BG antisense RNA 1"  ...
#  $ UniProtID: chr  "P04217" NA "Q9NQ94" "P01023" ...
#  $ RefSeqID : chr  "NM_130786" "NR_015380" "NM_001198818" "NM_000014" ...
#  $ EnsID    : chr  "ENSG00000121410" "ENSG00000268895" "ENSG00000148584" ...
#  $ UCSCID   : chr  "uc002qsd.5" "uc002qse.3" "uc057tgv.1" "uc001qvk.2" ...
#  $ GeneID   : num  1 503538 29974 2 144571 ...
#  $ OMIMID   : chr  "138670" NA "618199" "103950" ...
#  $ acc      : chr  NA "BC040926" "AF271790" "BX647329, X68728, M11313" ...
#  $ chr      : chr  "19q13.43" "19q13.43" "10q11.23" "12p13.31" ...
#  $ type     : chr  "protein" "lncRNA" "protein" "protein" ...
#  $ prev     : chr  NA "NCRNA00181, A1BGAS, A1BG-AS" NA NA ...
#  $ synonym  : chr  NA "FLJ23569" "ACF, ASP, ACF64, ACF65, APOBEC1CF"  ...
#  $ RefSeqOld: chr  "NM_130786" "NR_015380" "NM_014576" "NM_000014" ...

2.2 Transcription factors from GTRD:

The GTRDgeneTFs list contains data which transcription factors (TF) that have been found to bind to the upstream regulatory regions of genes in ChIP-seq experiments bind to what genes; the data is compiled by the GTRD database. This is a list in which list-elements are named with gene symbols, and the elements are a vector of TF gene symbols.

Load geneTFs in the following way:

geneTFs <- fetchData("GTRDgeneTFs")       # assigns genes and their binding TFs

str(geneTFs)
# List of 17864
#  $ HES4        : chr [1:82] "AR" "ATF2" "ATF3" "ATF4" ...
#  $ PUSL1       : chr [1:64] "AR" "BHLHE40" "CEBPA" "CEBPB" ...
#  $ ACAP3       : chr [1:69] "BHLHE40" "CEBPA" "CEBPB" "CEBPD" ...
#  $ ATAD3B      : chr [1:110] "AR" "ASCL2" "ATF2" "ATF3" ...
#  $ ATAD3A      : chr [1:128] "ARID4B" "ASCL2" "ATF1" "ATF3" ...
#   [list output truncated]

The GTRDTFgenes data object is the inverse of GTRDgeneTFs - list-elements are named with TF gene symbols, and the elements are a vector of gene symbols which the TFs bind to.

Load TFgenes in the following way:

TFgenes <- fetchData("GTRDTFgenes")       # assigns genes and their binding TFs

str(TFgenes)
List of 635
#  $ E2F8    : chr [1:758] "AATF" "ABCB6" "ABT1" "ACAP3" ...
#  $ FEZF1   : chr [1:1009] "AAR2" "AARS2" "ABRAXAS2" "ABTB2" ...
#  $ ZNF320  : chr [1:183] "AASDHPPT" "ABHD14A" "AFMID" "ALDH16A1" ...
#  $ ZNF316  : chr [1:209] "AKAP13" "AKR1B10" "AMBRA1" "ANO5" ...
#  $ ZSCAN5B : chr [1:33] "AGBL5" "ALOXE3" "ANG" "B4GAT1" ...
#   [list output truncated]

2.3 STRING edges:

STRINGedges contains edges of the STRING database mapped to HGNC symbols. Load STRINGedges in the following way:


STRINGedges <- fetchData("STRINGedges0.9") # assigns the STRINGedges data object

str(STRINGedges)
#  Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    319997 obs. of  3 variables:
#   $ a    : chr  "ARF5" "ARF5" "ARF5" "ARF5" ...
#   $ b    : chr  "SPTBN2" "KIF13B" "KIF21A" "TMED7" ...
#   $ score: num  909 910 910 906 971 915 905 927 914 902 ...

2.4 Expression profiles:

Expression profiles were compiled from 52 microarray experiments downloaded from GEO, and quantile normalized. Details to follow. Here is code to load and use the profiles.


# Load the expression profiles:
myQNXP <- fetchData("GEOprofiles")  # loads quantile-normalized expression data

str(myQNXP)
#  num [1:27087, 1:52] 29.4 199.9 34.3 947.4 2249.2 ...
#  - attr(*, "dimnames")=List of 2
#   ..$ : chr [1:27087] "A1BG" "A1BG-AS1" "A1CF" "A2M" ...
#   ..$ : chr [1:52] "GSE35330.ctrl.4h" "GSE35330.cond.4h" "GSE35330.ctrl.16h" ...


# Some statistics:
sum( ! is.na(myQNXP)) # Number of measurements: 923361
mean(  myQNXP, na.rm = TRUE) # 502.7972
median(myQNXP, na.rm = TRUE) # 127.43
hist(log10(myQNXP), breaks = 100)

# how many HGNC genes have at least one measurement recorded?
sum( ! is.na(rowMeans(myQNXP, na.rm = TRUE))) # 21063

# how many HGNC genes have measurements recorded an all experiments?
sum( ! is.na(rowMeans(myQNXP, na.rm = FALSE))) # 10812

# colnames describe experiments (averaged over replicates)
colnames(myQNXP)

# how many unique experiments
myExp <- gsub("^([^\\.]+).+", "\\1", colnames(myQNXP))
length(unique(myExp)) # 15

#Access one profile by name
myQNXP["VAMP8", ]

# plot expression values
plot(myQNXP["VAMP8", ], ylab = "quantile normalized expression (AU)")

# add lines that separate the experiments
abline(v = cumsum(rle(myExp)$lengths) + 0.5, lwd = 0.5, col = "#0000CC44")

Gene / Gene correlations

corGenes <- function(A, B, prf) {
  # Calculate pearson correlation between gene expression
  # profiles A and B in prf identified by the gene symbol.
  # A and B can be either gene symbol or index.

  r <- cor(prf[A, ], prf[B, ], use = "pairwise.complete.obs")
  return(r)
}

corGenes("MRPL18", "RPF2", myQNXP)
corGenes(100, 200, myQNXP)

plotCorGenes <- function(A, B, prf) {
  # Plot correlation between gene expression
  # profiles A and B in prf identified by the gene symbol.
  # A and B can be either gene symbol or index.

  xMin <- min(c(0, prf[A, ], prf[B, ]), na.rm = TRUE)
  xMax <- max(c(   prf[A, ], prf[B, ]), na.rm = TRUE)

  plot(prf[A, ], prf[B, ],
       xlim = c(xMin, xMax),
       ylim = c(xMin, xMax),
       main = sprintf("%s vs. %s (r = %5.2f)", A, B, corGenes(A, B, prf)),
       xlab = (sprintf("%s expresion (AU)", A)),
       ylab = (sprintf("%s expresion (AU)", B)),
       asp = 1.0)
  abline(lm(prf[B, ] ~ prf[A, ]), col = "#AA0000", lwd = 0.67)

  return(invisible(NULL))
}

plotCorGenes(A = "SLC9A4", B = "NLGN2", prf = myQNXP)

Example plots for uncorrelated and highly correlated gene expression profiles.

Distribution of expression correlations between 10,000 randomly chosen gene pairs, compared to the distribution of shuffled expression values for the two genes. The correlations are not symmetric around zero. There are more negative correlations than expected, and more excess negative than positive correlations, but if two genes are positively correlated, the correlation trends to be higher.

2.5 InterPro domain annotations:

Interpro domain annotations were parsed from the 50 GB annotation source of InterPro v. 73.0 and mapped to HGNC symbols (details to follow).

# load the gene-annotations list:
myURL <- paste0("http://steipe.biochemistry.utoronto.ca/abc/assets/",
                "genesIPR.rds")
genesIPR <- readRDS(url(myURL))  # reads and assigns genesIPR list

str(genesIPR, list.len = 5)
# List of 19174
#  $ NUDT4B     : chr [1:3] "IPR000086" "IPR015797" "IPR020084"
#  $ IGLV4-69   : chr [1:5] "IPR003599" "IPR007110" "IPR013106" "IPR013783" ...
#  $ IGLV8-61   : chr [1:5] "IPR003599" "IPR007110" "IPR013106" "IPR013783" ...
#  $ IGLV4-60   : chr [1:5] "IPR003599" "IPR007110" "IPR013106" "IPR013783" ...
#  $ IGLV10-54  : chr [1:4] "IPR007110" "IPR013106" "IPR013783" "IPR036179"
#   [list output truncated]


# load the IPR_domain-locations list:
myURL <- paste0("http://steipe.biochemistry.utoronto.ca/abc/assets/",
                "IPRgenes.rds")
IPRgenes <- readRDS(url(myURL))  # reads and assigns IPRgenes list

str(IPRgenes, list.len = 5)
# List of 16178
#  $ IPR000086: chr [1:27] "NUDT4B" "NUDT19" "NUDT21" "TRPM2" ...
#  $ IPR015797: chr [1:28] "NUDT4B" "NUDT19" "NUDT21" "TRPM2" ...
#  $ IPR020084: chr [1:13] "NUDT4B" "NUDT3" "NUDT1" "NUDT2" ...
#  $ IPR003599: chr [1:456] "IGLV4-69" "IGLV8-61" "IGLV4-60" "IGLV7-46" ...
#  $ IPR007110: chr [1:674] "IGLV4-69" "IGLV8-61" "IGLV4-60" "IGLV10-54" ...
#   [list output truncated]

2.6 Systems:

A systems database (of currently four systems) can be loaded with fetchData(). Several utility functions have been added to the package (in ./R/SyDButils.R). Use SyDBgetSysSymbols(<database>, <sys>)[[1]] to access all gene symbols for a single system (or subsystem). Use unlist(SyDBgetSysSymbols(myDB, SyDBgetRootSysIDs(myDB)), use.names = FALSE) to access all genes in the database.

myDB <- fetchData("SysDB")


SyDBgetRootSysIDs(myDB)
#                                        PHALY                                       SLIGR 
#  "give.jams-1d8-648b-1e12-6a9f-65421424affe" "cast.rear-5f1-56ef-1cd2-1ae3-54a5556d59ff" 
#                                        NLRIN                                       HVGCR 
#  "scar.blur-9bc-29cf-31f2-1981-4d92edf4d0e6" "help.mink-f96-e98b-9e12-ab41-8217a3ecb0cd" 


names(SyDBgetRootSysIDs(myDB))
# [1] "PHALY" "SLIGR" "NLRIN" "HVGCR"


SyDBgetSysSymbols(myDB, "HVGCR")  # Note: returns a list.
# $HVGCR
#  [1] "ADCY9"    "ADRB2"    "AKAP7"    "CACNA1C"  "CACNA2D1" "CACNA2D3" "CACNB1"   "CACNB3"  
#  [9] "CACNB4"   "CACNG1"   "CACNG6"   "CALM1"    "CALM2"    "CALM3"    "GNAS"     "PRKACA"  
# [17] "PRKACB"   "PRKAR1A"  "PRKAR1B"  "PRKAR2A"  "PRKAR2B"

The old function stub fetchComponents() still works but is deprecated:

> fetchComponents("PHALY")
Note: fetchComponents(...) is deprecated. Use SyDBgetSysSymbols(<database>, <system code>) instead.

 [1] "AMBRA1"    "ATG14"     "ATP2A1"    "ATP2A2"    "ATP2A3"    "BECN1"     "BECN2"     "BIRC6"    
 [9] "BLOC1S1"   "BLOC1S2"   "BORCS5"    "BORCS6"    "BORCS7"    "BORCS8"    "CACNA1A"   "CALCOCO2" 
 ...

3.1 Fetching data

See:

?fetchData 
fetchData()

3.2 Systems database utilities

'./R/SyDButils.R' contains a number of exported functions to support work with a systems database:

SyDBgetRootSysIDs() - returns a named vector with the IDs of all root systems in a systems database;
SyDBgetSysSymbols() - returns a list of the same length as the input vector with HGNC symbols of each system in the input vector;
SyDBgetIDforKey() - returns a vector of IDs for key(s) in a column of a table;
SyDBgetValforID() - returns a vector of values in a column of a table where the ID matches the input;
SyDBinvalid() - checks whether its database argument is a valid, current, system database;
SyDBTree() - returns a tree representation of the hierarchical structure of a system or systems in the database.

Examples:

mySDB <- fetchData("SysDB")

SyDBgetRootSysIDs(mySDB)
#                                       PHALY                                       SLIGR 
# "give.jams-1d8-648b-1e12-6a9f-65421424affe" "cast.rear-5f1-56ef-1cd2-1ae3-54a5556d59ff" 
#                                       NLRIN                                       HVGCR 
# "scar.blur-9bc-29cf-31f2-1981-4d92edf4d0e6" "help.mink-f96-e98b-9e12-ab41-8217a3ecb0cd" 


names(SyDBgetRootSysIDs(mySDB))
# [1] "PHALY" "SLIGR" "NLRIN" "HVGCR"


SyDBgetIDforKey("HOPS complex", "code", "component", mySDB)
# [1] "flip.face-782-d729-9622-299c-073fb8f2ec94"


SyDBgetSysSymbols(mySDB, "HOPS complex")
# $`HOPS complex`
# [1] "VPS11"  "VPS16"  "VPS18"  "VPS33A" "VPS39"  "VPS41" 


cat(SyDBTree("NLRIN", mySDB, MAX = 3), sep = "\n")
# 
#   --NLRIN
#     |___NLRP3 activation signal
#         |___Common activating signal
#         |___NLRP3
#         |___NLRP3 activation
#     |___NLRIN regulation
#         |___NLRIN positive regulation
#         |___NLRIN negative regulation
#     |___NLRP3 priming signal
#         |___IL-beta priming signal
#         |___TNF priming signal
#         |___TLR priming signal
#     |___NOT IN NLRIN
#         |___NLRP7
#         |___NLRC4
#         |___AIM2
#         |___NLRP14
#         |___NLRP6
#         |___NLRP12
#         |___NLRP2
#         |___NLRP9b
#         |___NLRP1b

... in progress

hyginn/BCB420.2019.ESA documentation built on May 29, 2019, 1:23 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

hyginn/BCB420.2019.ESA
Exploratory Systems Analysis Tools

README.md
In hyginn/BCB420.2019.ESA: Exploratory Systems Analysis Tools

`BCB420.2019.ESA`

(Exploratory Systems Analysis Tools for BCB420-2019)

Boris Steipe,

Department of Biochemistry and Department of Molecular Genetics,

University of Toronto

Canada

< boris.steipe@utoronto.ca >

1 About this package:

2 Data ...

2.1 The HGNC symbol reference

2.2 Transcription factors from GTRD:

2.3 STRING edges:

2.4 Expression profiles:

2.5 InterPro domain annotations:

2.6 Systems:

3 Functions

3.1 Fetching data

3.2 Systems database utilities

4 Notes

4 References and Further Reading

5 Acknowledgements

R Package Documentation

Browse R Packages

We want your feedback!

hyginn/BCB420.2019.ESA Exploratory Systems Analysis Tools

README.md In hyginn/BCB420.2019.ESA: Exploratory Systems Analysis Tools

BCB420.2019.ESA

(Exploratory Systems Analysis Tools for BCB420-2019)

Boris Steipe,

Department of Biochemistry and Department of Molecular Genetics,

University of Toronto

Canada

< boris.steipe@utoronto.ca >

1 About this package:

2 Data ...

2.1 The HGNC symbol reference

2.2 Transcription factors from GTRD:

2.3 STRING edges:

2.4 Expression profiles:

2.5 InterPro domain annotations:

2.6 Systems:

3 Functions

3.1 Fetching data

3.2 Systems database utilities

4 Notes

4 References and Further Reading

5 Acknowledgements

R Package Documentation

Browse R Packages

We want your feedback!

hyginn/BCB420.2019.ESA
Exploratory Systems Analysis Tools

README.md
In hyginn/BCB420.2019.ESA: Exploratory Systems Analysis Tools

`BCB420.2019.ESA`