yeast: Yeast protein localisations.


Description

A dataset containing subcellular protein localisations, along with several amino acid sequence based metrics used to classify the localisation. 'yeast' contains the full dataset, while 'yeast_classes' is simply the 'class' column from the 'yeast' dataset as a vector, for convenience in the practical exercise.
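The relationship between the two objects can be checked with a short sketch like the one below. It assumes that the jrIntroBio package is installed and that both objects can be loaded with `data()`; neither is confirmed by this page. `classes_match` is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper (not part of jrIntroBio): TRUE when a class vector
# matches the 'class' column of a data frame, ignoring factor/character type.
classes_match <- function(df, classes) {
  identical(as.character(df$class), as.character(classes))
}

# Assumption: jrIntroBio is installed and exposes both objects via data().
if (requireNamespace("jrIntroBio", quietly = TRUE)) {
  env <- new.env()
  utils::data(list = c("yeast", "yeast_classes"),
              package = "jrIntroBio", envir = env)
  if (exists("yeast", envir = env) && exists("yeast_classes", envir = env)) {
    str(env$yeast)  # structure of the full data frame
    print(classes_match(env$yeast, env$yeast_classes))
  }
}
```

The guard around the package call means the snippet does nothing if the package or either object is unavailable, rather than failing.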

Usage

1

Format

A data frame with 1484 rows and 10 variables:

seq

Accession number for the SWISS-PROT database

mcg

McGeoch's method for signal sequence recognition

gvh

von Heijne's method for signal sequence recognition

alm

Score of the ALOM membrane spanning region prediction program

mit

Score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial proteins

erl

Presence of the "HDEL" substring (thought to act as a signal for retention in the endoplasmic reticulum lumen). Binary attribute.

pox

Peroxisomal targeting signal in the C-terminus

vac

Score of discriminant analysis of the amino acid content of vacuolar and extracellular proteins

nuc

Score of discriminant analysis of nuclear localization signals of nuclear and non-nuclear proteins

class

Experimentally observed subcellular localisations.

Class Distribution. The class is the localization site.

CYT (cytosolic or cytoskeletal) 463

NUC (nuclear) 429

MIT (mitochondrial) 244

ME3 (membrane protein, no N-terminal signal) 163

ME2 (membrane protein, uncleaved signal) 51

ME1 (membrane protein, cleaved signal) 44

EXC (extracellular) 37

VAC (vacuolar) 30

POX (peroxisomal) 20

ERL (endoplasmic reticulum lumen) 5

...
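Because the classes are heavily imbalanced, it can help to look at the class proportions and the accuracy of always predicting the most common class before fitting a classifier. The sketch below uses only the counts listed above, typed in by hand, so it does not need the package. The listed counts sum to 1486, slightly more than the 1484 rows stated for the data frame, so the proportions should be treated as approximate.

```r
# Counts copied by hand from the class distribution listed above.
class_counts <- c(CYT = 463, NUC = 429, MIT = 244, ME3 = 163, ME2 = 51,
                  ME1 = 44, EXC = 37, VAC = 30, POX = 20, ERL = 5)

total <- sum(class_counts)

# Share of each localisation class, rounded for display.
proportions <- round(class_counts / total, 3)
print(proportions)

# Baseline: accuracy obtained by always predicting the most common class.
majority <- names(which.max(class_counts))
baseline <- max(class_counts) / total
cat(sprintf("Always predicting %s gives a baseline accuracy of %.1f%%\n",
            majority, 100 * baseline))
```

Any classifier built in the practical would need to beat this baseline to be useful, and the rarest classes (for example ERL with 5 proteins) will be hard to evaluate reliably.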

Source

Paul Horton & Kenta Nakai, ["A Probabilistic Classification System for Predicting the Cellular Localization Sites of Proteins"](https://www.aaai.org/Papers/ISMB/1996/ISMB96-012.pdf), Intelligent Systems in Molecular Biology, 109-115. St. Louis, USA 1996.


jr-packages/jrIntroBio documentation built on Dec. 24, 2019, 8:03 a.m.