knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(onekp) library(knitr) library(magrittr)
All project with the onekp
R package start at the same place:
onekp <- retrieve_onekp() class(onekp)
The retrieve_onekp
function scrapes the metadata associated with each
transcriptome project from the 1KP public data
page. It also links each species to its
NCBI taxonomy ID (which is used later to filter by clade).
The only part of the OneKP object that you will need to interact with directly
is the @table
slot, a data.frame with the form:
onekp@table %>% head(3) %>% knitr::kable()
To get sequence, first subset the onekp@table
until it contains only the
species you want. There are several ways to do this.
You can use all the normal tools for subsetting the table directly, e.g.
onekp@table <- subset(onekp@table, family == 'Nymphaeaceae')
onekp
also has a few builtin tools for taxonomic selection
# filter by species name ('species' column of onekp@table) filter_by_species(onekp, 'Pinus radiata') # filter by species NCBI taxon ID ('tax_id' column of onekp@table) filter_by_species(onekp, 3347) # filter by clade name scientific name (get all data for the Brassicaceae family) filter_by_clade(onekp, 'Brassicaceae') # filter by clade NCBI taxon ID filter_by_clade(onekp, 3700)
Once you have chosen the studies you want, you can retrieve the protein or transcript FASTA files:
download_peptides(filter_by_clade(onekp, 'Brassicaceae')) download_nucleotides(filter_by_clade(onekp, 'Brassicaceae'))
This will download the files into a temporary directory. Alternatively, you may
set your own directory with the dir
argument. The downloaded protein FASTA
files have the extension .faa
and the DNA files the extension .fna
. The
basename for each file is the 1KP 4-letter code.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.