1. About this package:

This package describes a workflow for downloading gene annotation data from Gene Ontology Annotation (GOA), shows how to annotate an example gene set, and shows how to compute some database statistics (Ashburner et al., 2000).

The package passes all checks without errors or warnings.

2. About Gene Ontology Annotation (GOA):

2.1 What is it?

Defining the function of a gene is more difficult than it might appear. Unlike sequence and localization, there is no natural set of terms for describing a gene's function. Gene Ontology (GO) fills this gap with a shared vocabulary of defined terms (Ashburner et al., 2000).

2.2 Why is it useful?

Gene Ontology is useful because it provides a common language with defined meaning to discuss gene function. This common language allows us to compare, categorize and collect gene functions.

2.3 The Data

Gene Ontology contains "GO terms", which describe the biological functions of genes, as well as the relationships between them. GO terms are the nodes of the gene ontology graph and are divided into three categories:
* molecular function
* cellular component
* biological process

The edges of the graph are relationships between terms, such as "is_a", "part_of" and "has_part"; these relationship types are not tied to a particular category.

Gene Ontology Annotation (GOA) is part of GO's biocuration project, which aims to associate the UniProt IDs of genes with GO term annotations (Ashburner et al., 2000) [Source](http://steipe.biochemistry.utoronto.ca/abc/index.php/BIN-FUNC-GO).

A Gene Ontology annotation is a link between a gene and the GO term that describes its function. All Gene Ontology Annotation data is available under a CC-BY 4.0 license. This document is based on the version of GOA updated on 2018/04/11.

2.4 Data Semantics:

Ontology (OBO) files contain GO terms.
Each term contains:

1. unique identifier (the GO ID), in the format GO: followed by a 7-digit number
2. namespace - one of the three sub-ontologies (molecular function, cellular component, biological process)
3. definition - a description of what the GO term represents (with references)
4. relationships to other terms - an 'is_a', 'part_of' or 'has_part' relationship to one or more other GO terms
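
The four parts of a term listed above can be picked out of an OBO stanza with a few lines of base R. The following is only a sketch: the stanza is an invented placeholder (the IDs, names and references are not real GO terms), and `parse_stanza` is a hypothetical helper, not a function of this package.

```r
# An invented [Term] stanza in OBO layout (placeholder values, not real GO terms)
stanza <- c(
  "[Term]",
  "id: GO:9999991",
  "name: example process",
  "namespace: biological_process",
  "def: \"An invented definition.\" [REF:0000001]",
  "is_a: GO:9999990 ! example parent process",
  "is_a: GO:9999989 ! another example parent"
)

# Hypothetical helper: split "tag: value" lines into a named list
parse_stanza <- function(lines) {
  lines <- lines[grepl(": ", lines, fixed = TRUE)]  # drops the "[Term]" header
  tag   <- sub(": .*$", "", lines)                  # text before the first ": "
  value <- sub("^[^:]+: ", "", lines)               # text after the first ": "
  value <- sub(" ! .*$", "", value)                 # strip trailing "! comment"
  split(value, tag)                                 # repeated tags become vectors
}

term <- parse_stanza(stanza)
term$id         # the GO ID
term$namespace  # the sub-ontology
term$is_a       # IDs of the parent terms
```

For real files, a dedicated parser such as the ontologyIndex package listed in the sources is likely the better choice.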

Gene Ontology Annotation (GAF) files: This package focuses on genes in Homo sapiens.
These files are tab-delimited and have 17 fields of information.


The fields necessary for this package are:
* the database from which the object identifier is drawn
* the UniProt identifier of the object
* the gene symbol (i.e. the HGNC gene symbol)
* the GO_ID associated with the gene (i.e. the GO term annotation)
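
As an illustration of how these four fields sit among the 17 GAF columns, the sketch below reads two invented GAF rows from memory and keeps only the needed columns. The rows are placeholders (not real annotations), and the column names are labels chosen here to follow the GAF 2.1 field order; they are an assumption, not necessarily the names used inside this package.

```r
# Labels for the 17 GAF 2.1 fields, in file order (names are an assumption)
gaf_cols <- c("DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
              "DB_Reference", "Evidence_Code", "With_From", "Aspect",
              "DB_Object_Name", "DB_Object_Synonym", "DB_Object_Type",
              "Taxon", "Date", "Assigned_By", "Annotation_Extension",
              "Gene_Product_Form_ID")

# Two invented annotation rows plus a metadata line starting with "!"
sample_lines <- c(
  "!gaf-version: 2.1",
  paste("UniProtKB", "X00001", "GENE1", "", "GO:9999991", "REF:0000001",
        "IEA", "", "P", "Example protein 1", "", "protein",
        "taxon:9606", "20180411", "UniProt", "", "", sep = "\t"),
  paste("UniProtKB", "X00002", "GENE2", "", "GO:9999992", "REF:0000002",
        "EXP", "", "F", "Example protein 2", "", "protein",
        "taxon:9606", "20180411", "UniProt", "", "", sep = "\t")
)

gaf <- read.delim(text = sample_lines,
                  header = FALSE,
                  quote = "",
                  comment.char = "!",        # skip metadata lines
                  col.names = gaf_cols,
                  stringsAsFactors = FALSE)

# Keep only the four fields this package needs
needed <- gaf[, c("DB", "DB_Object_ID", "DB_Object_Symbol", "GO_ID")]
needed
```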

Annotations describe the function of genes using terms from Gene Ontology. Each GOA record describes an association between a gene and a GO term from one of the three sub-ontologies.
These annotations are derived from three sources of evidence, represented by an Evidence Code for each GOA record together with either a reference or a description of the annotation:
* Experimentally-supported annotations (EXP)
* Phylogenetically-inferred annotations (IBA)
* Computationally-inferred annotations (IEA)
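
The evidence codes above can be used to label or filter annotations. The sketch below works on a small invented data frame; the column names and values are placeholders. Only the three codes named above are mapped, so any other evidence code found in a real GAF file would come out as `NA` here.

```r
# Map the three evidence codes named in the text to their source of evidence
evidence_class <- c(EXP = "experimental",
                    IBA = "phylogenetic",
                    IEA = "computational")

# Invented annotations (placeholder symbols and GO IDs)
annotations <- data.frame(
  symbol   = c("GENE1", "GENE1", "GENE2", "GENE3"),
  go_id    = c("GO:9999991", "GO:9999992", "GO:9999991", "GO:9999993"),
  evidence = c("EXP", "IEA", "IBA", "IEA"),
  stringsAsFactors = FALSE
)

# Label each annotation with its source of evidence
annotations$evidence_class <- unname(evidence_class[annotations$evidence])

# Count annotations per source of evidence
table(annotations$evidence_class)

# Keep only annotations that are not computationally inferred
curated <- annotations[annotations$evidence != "IEA", ]
curated
```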

3. Prerequisites

Install the package bcb420goa from the GitHub repository, then load it:

```r
library(bcb420goa)
```

4. Data Download

This package requires two data files to be stored in a subfolder named data in your working directory. You can download the files from the links provided.
GO Annotation File: (http://geneontology.org/gene-associations/goa_human.gaf.gz)
Ontology file: (http://purl.obolibrary.org/obo/go/go-basic.obo)
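
The manual download can also be scripted with base R. This is a minimal sketch, assuming the two URLs above are still reachable; `fetch_goa_files` and its `dry_run` argument are hypothetical names introduced here and are not part of bcb420goa.

```r
# The two source files named above
goa_urls <- c(
  gaf = "http://geneontology.org/gene-associations/goa_human.gaf.gz",
  obo = "http://purl.obolibrary.org/obo/go/go-basic.obo"
)

# Hypothetical helper: download each file into data_dir unless it already exists.
# With dry_run = TRUE nothing is downloaded; only the target paths are returned.
fetch_goa_files <- function(data_dir = file.path(getwd(), "data"),
                            urls = goa_urls,
                            dry_run = FALSE) {
  dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(data_dir, basename(urls))
  names(dest) <- names(urls)
  for (i in seq_along(urls)) {
    if (!dry_run && !file.exists(dest[[i]])) {
      # mode = "wb" keeps the gzip file intact on Windows
      download.file(urls[[i]], destfile = dest[[i]], mode = "wb")
    }
  }
  dest
}

# Usage (left commented out because it needs network access):
# paths <- fetch_goa_files()
```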

Alternatively, you can call the following function which will download the files automatically and store them in the appropriate directory:

```r
bcb420goa::download_default()
```

Note: This function only needs to be called once, the first time the package is used, as long as the files are not manually deleted.

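Once downloaded, the annotation file can be read directly to inspect its contents:

```r
# Names chosen here for the 17 GAF 2.1 fields, in file order
column.names <- c("DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
                  "DB_Reference", "Evidence_Code", "With_From", "Aspect", "DB_Object_Name",
                  "DB_Object_Synonym", "DB_Object_Type", "Taxon", "Date", "Assigned_By",
                  "Annotation_Extension", "Gene_Product_Form_ID")
human.gaf <- read.csv(file.path(getwd(), "data/goa_human.gaf.gz"),
                      header = FALSE,
                      sep = "\t", quote = "",
                      comment.char = "!",  # skip the metadata rows
                      col.names = column.names)
head(human.gaf)
```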

5. Initialization

Once we have downloaded the two files above to our local directory, we need to load, parse and join them. The result is stored in a data frame (here called df). This only needs to be done once each time you load the library, e.g.

```r
df <- bcb420goa::init()
```
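
To make the "join" step concrete, the sketch below merges a small annotation table with a small term table on the GO ID. It is not the implementation of `init()`, whose internals are not shown in this document; all data and column names are invented placeholders.

```r
# Invented annotation rows (gene symbol -> GO ID), as parsed from a GAF file
annotations <- data.frame(
  symbol = c("GENE1", "GENE1", "GENE2", "GENE3"),
  go_id  = c("GO:9999991", "GO:9999992", "GO:9999991", "GO:9999999"),
  stringsAsFactors = FALSE
)

# Invented term table (GO ID -> name and namespace), as parsed from an OBO file
terms <- data.frame(
  go_id     = c("GO:9999991", "GO:9999992"),
  name      = c("example process", "example function"),
  namespace = c("biological_process", "molecular_function"),
  stringsAsFactors = FALSE
)

# Left join: keep every annotation, add term details where the GO ID is known
joined <- merge(annotations, terms, by = "go_id", all.x = TRUE)
joined

# Annotations whose GO ID has no entry in the term table show NA for the name
joined[is.na(joined$name), ]
```

A left join (`all.x = TRUE`) is used so that annotations pointing at a term missing from the ontology file stay visible instead of being silently dropped.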

6. Annotation

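To annotate genes, pass a vector of gene symbols and the initialized data frame to getGOAnnotation:

```r
(result <- getGOAnnotation(c("BLOC1S1", "BLOC1S2", "BORCS5"), df))
```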

7. Example Gene Set Annotation

```r
exampleGeneSet <- c("AMBRA1", "ATG14", "ATP2A1", "ATP2A2", "ATP2A3", "BECN1", "BECN2",
          "BIRC6", "BLOC1S1", "BLOC1S2", "BORCS5", "BORCS6", "BORCS7",
          "BORCS8", "CACNA1A", "CALCOCO2", "CTTN", "DCTN1", "EPG5", "GABARAP",
          "GABARAPL1", "GABARAPL2", "HDAC6", "HSPB8", "INPP5E", "IRGM",
          "KXD1", "LAMP1", "LAMP2", "LAMP3", "LAMP5", "MAP1LC3A", "MAP1LC3B",
          "MAP1LC3C", "MGRN1", "MYO1C", "MYO6", "NAPA", "NSF", "OPTN",
          "OSBPL1A", "PI4K2A", "PIK3C3", "PLEKHM1", "PSEN1", "RAB20", "RAB21",
          "RAB29", "RAB34", "RAB39A", "RAB7A", "RAB7B", "RPTOR", "RUBCN",
          "RUBCNL", "SNAP29", "SNAP47", "SNAPIN", "SPG11", "STX17", "STX6",
          "SYT7", "TARDBP", "TFEB", "TGM2", "TIFA", "TMEM175", "TOM1",
          "TPCN1", "TPCN2", "TPPP", "TXNIP", "UVRAG", "VAMP3", "VAMP7",
          "VAMP8", "VAPA", "VPS11", "VPS16", "VPS18", "VPS33A", "VPS39",
          "VPS41", "VTI1B", "YKT6")
(res <- getGOAnnotation(exampleGeneSet, df))
```

8. Statistics
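
The database statistics mentioned in section 1 could be computed along the following lines. This is only a sketch on invented data: the column names are placeholders, the aspect codes P, F and C stand for biological process, molecular function and cellular component, and the actual statistics produced by this package may differ.

```r
# Invented annotation table (placeholder symbols and GO IDs)
goa <- data.frame(
  symbol = c("GENE1", "GENE1", "GENE1", "GENE2", "GENE3", "GENE3"),
  go_id  = c("GO:9999991", "GO:9999992", "GO:9999993",
             "GO:9999991", "GO:9999992", "GO:9999992"),
  aspect = c("P", "F", "C", "P", "F", "F"),
  stringsAsFactors = FALSE
)

# The same gene/term pair can occur more than once; count each pair once
goa <- unique(goa)

# Number of annotations per sub-ontology
per_aspect <- table(goa$aspect)

# Number of distinct GO terms annotated to each gene
terms_per_gene <- tapply(goa$go_id, goa$symbol,
                         function(x) length(unique(x)))

# Number of distinct genes annotated with each GO term
genes_per_term <- tapply(goa$symbol, goa$go_id,
                         function(x) length(unique(x)))

per_aspect
terms_per_gene
genes_per_term
```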

9. Sources

The example and template for this package were taken from the rpt package.
Gene Ontology Annotation data were taken from Gene Ontology Annotation.
Ontology data were taken from Gene Ontology.
Wickham, H., et al. 2018. Package 'dplyr'. (https://dplyr.tidyverse.org)
Greene, D., et al. 2019. Package 'ontologyIndex'. (https://cran.r-project.org/web/packages/ontologyIndex/ontologyIndex.pdf)
Wickham, H., et al. 2018. Package 'readr'. (https://cran.r-project.org/web/packages/readr/readr.pdf)
Ashburner, M., et al. 2000. Gene ontology: tool for the unification of biology. Nat Genet 25(1):25-9.
GO Consortium, Nucleic Acids Res., 2017
http://geneontology.org/page/ontology-documentation
http://geneontology.org/page/go-annotations
http://steipe.biochemistry.utoronto.ca/abc/assets/BIN-FUNC-GO.pdf



Deni678/bcb420goa documentation built on May 16, 2019, 9:12 a.m.