In jr-packages/jrIntroBio: Jumping Rivers: Introduction to R for Biologists

A suggestion

This is just a suggestion to try and help get you into "good habits" early. If I was sitting to take this practical now, I would start a new R script file. That way all of the work that you have done associated with today's course is in one place, and the code for each of the practicals is separate from one another. This might feel a bit tedious right now, but as the amount of code you write and number of projects you take part in increases it will pay off to have a structured workflow.

Data frames

For this set of questions we will use a dataset from a 1996 paper on yeast protein localisations within the cell[^1]. Each row is a protein locus. For each locus the data contains the various features of the amino acid sequence to identify the subcellular locations and then finally the experimentally observed subcellular localisation.

To load the data into your current session you can use.

[^1]: Paul Horton & Kenta Nakai, "A Probablistic Classification System for Predicting the Cellular Localization Sites of Proteins", Intelligent Systems in Molecular Biology, 109-115. St. Louis, USA 1996.

data(yeast, package = "jrIntroBio")

(1) Use the head() function to inspect the top of the data, this can help give you a feel for what the data looks like and what variables are contained within the data? r head(yeast)

(1) How many proteins and how many variables are in this data set? r dim(yeast)

(1) Recall that if I want to pull out a single column, or variable, from a data frame we can use $. To extract the protein id's from this data set we write yeast$seq.^[If you can't remember what the names of the columns are, you can use colnames(yeast) to find out. Since this dataset is part of a package, you could also type ?yeast to get some documentation on the dataset]

(1) Using mean() and median() calculate these summary statistics for the measure on nucleus concensus patterns nuc. r mean(yeast$nuc) median(yeast$nuc)

(1) What is the lowest value for discriminant analysis[^2] on the amino acid content of vacuolar and extracellular proteins vac in the data.

[^2]: Discriminant analysis in this context is a comparison of the amino acid composition of an unknown protein in question to the amino acid composition of proteins of a known localisation. It doesn't matter exactly how this measure was derived for now, just know that a higher value means a higher degree of similarity in amino acid composition to the known proteins of that class. r min(yeast$vac)

(2) What is the maximum value for the result of discriminant analysis on the amino acid content of the 20-residue N-terminal region of mitochondrial and non-mitochondrial proteins mit? r max(yeast$mit)

(3) What is the standard deviation for vacuolar and extracellular measure vac? r sd(yeast$vac)

(4) How many proteins of each class are in the dataset? r table(yeast$class)

Solutions

Solutions to the practical questions are contained within the package

vignette("solutions2", package = "jrIntroBio")

jr-packages/jrIntroBio documentation built on Dec. 24, 2019, 8:03 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jr-packages/jrIntroBio
Jumping Rivers: Introduction to R for Biologists

In jr-packages/jrIntroBio: Jumping Rivers: Introduction to R for Biologists

A suggestion

Data frames

Solutions

R Package Documentation

Browse R Packages

We want your feedback!

jr-packages/jrIntroBio Jumping Rivers: Introduction to R for Biologists

In jr-packages/jrIntroBio: Jumping Rivers: Introduction to R for Biologists

A suggestion

Data frames

Solutions

R Package Documentation

Browse R Packages

We want your feedback!

jr-packages/jrIntroBio
Jumping Rivers: Introduction to R for Biologists