CfData: A set of characterized protein coding genes from the Canis...

CfData R Documentation

A set of characterized protein coding genes from the Canis familiaris organism, annotated to a target GO subgraph considering both experimental and electronic evidence.

Description

The CfData dataset consists of a list containing the following:

$dxCf: characterizations of 6962 protein coding genes in terms of 72 physico-chemical properties of their amino acid sequences. These sequences, obtained from the UniProt database, are annotated to 36 GO terms of the GO Molecular Function (GO-MF) ontology subdomain.

$tableCfGO: a set of 6962 protein coding genes annotated to GO-MF target classes. Genes are identified by their UniProt ID mappings, which are obtained with the org.Cf.eg.db annotation package set to work with both experimental and electronic evidence. Only GO-MF terms with at least 500 annotated genes were kept.

$graphCfGO: the target GO-MF subgraph obtained with the org.Cf.eg.db annotation package set to work with the set of GO-MF target classes.

$indexGO: two arrays of UniProt ID mappings defining the train-test partition of the set of 6962 protein coding genes annotated to GO-MF terms.

$nodesGO: labels of the GO-MF subgraph.

$varianceGOs: a vector labeled with the variance of each GO-MF term.
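
The train-test partition in $indexGO can be applied to the feature matrix and the annotation matrix by row subsetting. The following is a minimal sketch on a small toy list that only mimics the documented layout; the toy values, the identifiers P1 to P10, the term labels, and the helper split_by_index are hypothetical and not part of the package. Row subsetting with `[` accepts either numeric positions or row names, so the same helper should apply whichever form the stored indices take.

```r
# Toy stand-in with the documented layout (NOT the real CfData values).
ids <- paste0("P", 1:10)  # hypothetical protein identifiers
toy <- list(
  dxCf = matrix(seq_len(30), nrow = 10,
                dimnames = list(ids, paste0("prop", 1:3))),
  tableCfGO = matrix(rep(c(1, 0), 10), nrow = 10,
                     dimnames = list(ids, c("termA", "termB"))),
  indexGO = list(indexTrain = 1:7, indexTest = 8:10)
)

# Hypothetical helper: split the rows of a matrix using an indexGO-like list.
split_by_index <- function(x, index) {
  list(
    train = x[index$indexTrain, , drop = FALSE],
    test  = x[index$indexTest, , drop = FALSE]
  )
}

features <- split_by_index(toy$dxCf, toy$indexGO)
labels   <- split_by_index(toy$tableCfGO, toy$indexGO)

dim(features$train)
dim(features$test)

# With the real data (requires the fgga package to be installed):
# data("CfData", package = "fgga")
# features <- split_by_index(CfData$dxCf, CfData$indexGO)
```

Keeping drop = FALSE ensures the result stays a matrix even if a partition contains a single protein.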

Usage

data("CfData")

Format

A list with five named entries containing:

dxCf

A matrix (6962 rows x 72 columns) containing the characterized proteins.

graphCfGO

An adjacency binary matrix (36 rows x 36 columns) corresponding to the GO-MF subgraph.

indexGO

A list with two named entries: indexTrain and indexTest each containing a numeric vector.

tableCfGO

A binary matrix (6962 rows x 36 columns) indicating the GO-MF terms associated with each protein.

nodesGO

A numerical vector containing the nodes of the GO-MF subgraph.
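
The dimensions listed above imply a few consistency relations between the entries: the feature and annotation matrices share their rows, and the annotation matrix has one column per node of the subgraph. A minimal sketch of such a check is given below, run on a small toy list; the helper check_cfdata_layout and the toy values are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: verify the layout relations implied by the Format section.
check_cfdata_layout <- function(x) {
  c(
    same_proteins     = nrow(x$dxCf) == nrow(x$tableCfGO),
    square_graph      = nrow(x$graphCfGO) == ncol(x$graphCfGO),
    terms_match_graph = ncol(x$tableCfGO) == nrow(x$graphCfGO),
    binary_table      = all(x$tableCfGO %in% c(0, 1)),
    binary_graph      = all(x$graphCfGO %in% c(0, 1))
  )
}

# Toy stand-in: 4 proteins, 2 properties, 3 GO terms (NOT the real data).
toy <- list(
  dxCf      = matrix(seq_len(8), nrow = 4),
  tableCfGO = matrix(c(1, 0, 1, 1,
                       0, 1, 0, 1,
                       1, 1, 0, 0), nrow = 4),
  graphCfGO = matrix(c(0, 0, 0,
                       1, 0, 0,
                       1, 0, 0), nrow = 3)
)

checks <- check_cfdata_layout(toy)
print(checks)

# With the real data (requires the fgga package to be installed):
# data("CfData", package = "fgga")
# check_cfdata_layout(CfData)
```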

Source

UniProt Taxonomy: 9615

https://www.uniprot.org/uniprot/?query=taxonomy:9615

Package: org.Cf.eg.db - Version: 3.8.2

https://bioconductor.org/packages/org.Cf.eg.db/

Examples

data(CfData)

## list objects included
ls(CfData)
# [1] "dxCf"  "graphCfGO" "indexGO"   "nodesGO"   "tableCfGO"

# Physico-chemical properties of each protein
head(CfData[["dxCf"]])

# GO-MF terms (node labels) annotated to each protein
head(CfData[["tableCfGO"]])
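
Because tableCfGO is a binary protein-by-term matrix, the number of proteins annotated to each GO-MF term is simply its column sum. The sketch below illustrates this on a toy matrix; the helper count_annotations, the term labels, and the toy values are hypothetical. On the real data, the same call could be used to inspect the annotation counts against the 500-gene threshold mentioned in the Description.

```r
# Hypothetical helper: proteins annotated per term, largest first.
count_annotations <- function(go_table) {
  sort(colSums(go_table), decreasing = TRUE)
}

# Toy binary annotation matrix: 5 proteins x 3 terms (NOT the real data).
toy_table <- cbind(
  termA = c(1, 1, 1, 0, 1),
  termB = c(0, 1, 0, 0, 0),
  termC = c(1, 0, 1, 0, 0)
)

counts <- count_annotations(toy_table)
print(counts)

# Terms reaching a minimum number of annotated proteins (2 in this toy case).
kept_terms <- names(counts[counts >= 2])
print(kept_terms)

# With the real data (requires the fgga package to be installed):
# data("CfData", package = "fgga")
# counts <- count_annotations(CfData[["tableCfGO"]])
# summary(counts)
```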


fspetale/fgga documentation built on Jan. 29, 2024, 6:53 p.m.