process_tmhmm: Create a data frame with the membrane locations of each amino...

View source: R/process_tmhmm.R

process_tmhmm R Documentation

Create a data frame with the membrane locations of each amino acid in a sequence

Description

This function creates a data frame that pairs transcript IDs with the corresponding output from TMHMM. The TMHMM output assigns a location to each amino acid: O and o denote extracellular, M denotes transmembrane, and i denotes intracellular. The data frame has columns for the transcript ID, membrane location, gene name, starting amino acid, and ending amino acid of each transcript. The first row for each transcript contains the overall length of the amino acid sequence.

Usage

process_tmhmm(topo, AA_seq)

Arguments

topo

A data frame containing each transcript ID and, formatted as a single string, the membrane location of each amino acid in its sequence.

AA_seq

A data frame containing the gene names, transcript IDs, APPRIS annotations, and protein sequences for each transcript.

Value

A data frame containing, for each transcript, the overall length of the corresponding surface protein and the individual length of each of its sections.
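
Examples

The following is a minimal sketch, not the package's implementation, of how a per-residue topology string could be collapsed into sections with start and end positions using base R's rle(). The helper segment_topology(), the transcript ID "TX_EXAMPLE", the toy topology string, and the output column names are all hypothetical and chosen for illustration only; the actual columns returned by process_tmhmm may differ.

```r
# Hypothetical helper (not part of surfaltr): collapse a per-residue
# topology string into one row per contiguous run of the same location.
segment_topology <- function(transcript_id, topo_string) {
  # Split the string into single characters, one per amino acid.
  residues <- strsplit(topo_string, "")[[1]]
  # Run-length encode consecutive identical location codes.
  # Note: "O" and "o" are kept as distinct codes here.
  runs <- rle(residues)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  data.frame(
    transcript_id = transcript_id,
    location = runs$values,
    start = starts,
    end = ends,
    length = runs$lengths,
    stringsAsFactors = FALSE
  )
}

# Toy input: 18 residues, two transmembrane stretches (assumed data).
toy_topology <- "ooooMMMMMiiiMMMMoo"
segments <- segment_topology("TX_EXAMPLE", toy_topology)
total_length <- nchar(toy_topology)

print(total_length)
print(segments)
```

For the toy string above, the sketch yields five sections (o, M, i, M, o) whose lengths sum to the overall sequence length of 18.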


EliLillyCo/surfaltr documentation built on May 3, 2022, 10:12 a.m.