prepare_data: Converts the human proteome, as downloaded from UniProt, into...

View source: R/prepare-data.R

prepare_data    R Documentation

Converts the human proteome, as downloaded from UniProt, into a more convenient data structure to work with: a named list of strings.

Description

Also creates a few other files that are helpful for the analysis.
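
As a rough illustration of the data structure named above, a named list of strings can be built and queried in base R as sketched below. The protein names and sequences are invented for the example, and computing lengths with nchar is an assumption about how such a list might be used, not necessarily what prepare_data does internally.

```r
# Hypothetical toy proteome: names are protein identifiers,
# values are amino acid sequences (both invented for illustration).
proteome <- list(
  protein_a = "MKTAYIAKQR",
  protein_b = "MSLLG"
)

# Look up one sequence by its name
proteome[["protein_b"]]

# Length of each protein, as a named integer vector
protein_lengths <- vapply(proteome, nchar, integer(1))
```

Keeping the sequences in a named list allows lookup by protein name without re-parsing the FASTA file each time.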

Usage

prepare_data(
  fasta_filename,
  trans_membrane_analysis_filename,
  protein_lengths_filename,
  proteome_as_data_filename,
  tmh_9mers_as_data_filename
)

Arguments

fasta_filename

filename of the proteome as a FASTA file, for example 'proteome/UP000005640_9606.fasta.gz'

trans_membrane_analysis_filename

filename of the trans-membrane analysis, for example 'tmh-predictions/trans-membrane-analysis-shortened.txt'

protein_lengths_filename

filename to store the length of all proteins in proteome, for example 'work/protein-lengths.txt'

proteome_as_data_filename

filename to store the proteome in R data format, for example 'work/proteome.Rdata'

tmh_9mers_as_data_filename

filename to store, per TMH protein, the indices at which it is TMH, in R data format, for example 'work/tmh.9mers.Rdata'
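
Putting the example values from the argument descriptions together, a call might look like the sketch below. It assumes that the bbbq package is installed and exports prepare_data, and that the FASTA file and the trans-membrane analysis file are both inputs that already exist at these relative paths. The can_run guard is only there so the snippet does nothing when those assumptions do not hold.

```r
# Example values taken from the argument descriptions
fasta_filename <- "proteome/UP000005640_9606.fasta.gz"
trans_membrane_analysis_filename <-
  "tmh-predictions/trans-membrane-analysis-shortened.txt"

# Assumption: bbbq is installed and both files are present as inputs
can_run <- requireNamespace("bbbq", quietly = TRUE) &&
  file.exists(fasta_filename) &&
  file.exists(trans_membrane_analysis_filename)

if (can_run) {
  # Make sure the folder for the output files exists
  dir.create("work", showWarnings = FALSE)
  bbbq::prepare_data(
    fasta_filename = fasta_filename,
    trans_membrane_analysis_filename = trans_membrane_analysis_filename,
    protein_lengths_filename = "work/protein-lengths.txt",
    proteome_as_data_filename = "work/proteome.Rdata",
    tmh_9mers_as_data_filename = "work/tmh.9mers.Rdata"
  )
}
```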

Details

Proteome is downloaded from:

ftp://ftp.ebi.ac.uk/pub/databases/reference_proteomes/QfO/Eukaryota/UP000005640_9606.fasta.gz

Proteome used in Bianchi et al., 2017:

https://github.com/richelbilderbeek/bianchi_et_al_2017/raw/master/proteome_2017/UP000005640_9606.fasta.gz
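
Fetching one of these files could be sketched as follows. download_if_missing is a hypothetical helper written for this example and is not part of bbbq; the destination path reuses the example value of fasta_filename.

```r
# Fetch a file only if it is not already present locally.
# download_if_missing is a hypothetical helper, not part of bbbq.
download_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    dir.create(dirname(destfile), showWarnings = FALSE, recursive = TRUE)
    # mode = "wb" keeps the gzip archive intact on Windows
    utils::download.file(url, destfile, mode = "wb")
  }
  invisible(destfile)
}

# URL of the proteome used in Bianchi et al., 2017, as listed in Details
proteome_2017_url <- paste0(
  "https://github.com/richelbilderbeek/bianchi_et_al_2017/",
  "raw/master/proteome_2017/UP000005640_9606.fasta.gz"
)

# Not run here, as it needs network access:
# download_if_missing(proteome_2017_url, "proteome/UP000005640_9606.fasta.gz")
```

Skipping the download when the file already exists avoids repeated network traffic when the analysis is re-run.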

Author(s)

Richèl J.C. Bilderbeek, adapted from Johannes Textor


richelbilderbeek/bbbq documentation built on July 27, 2023, 2:15 a.m.