build_fastas: Create a FASTA file with sequence fragments

build_fastas    R Documentation

Create a FASTA file with sequence fragments

Description

From (potential) sites and their surrounding amino acids, create a FASTA-conforming file. Requires a column of unique header values for each site.

Usage

build_fastas(
  data,
  path,
  name_col,
  seq_col,
  header_pattern = "{.data[[name_col]]}|{get_middle_fragment(.data[[seq_col]], 7)}"
)

Arguments

data

Data frame in long format, containing unique names and sequence windows.

path

Path to write the file to, including file name and extension.

name_col

Name of the column that contains metadata about the sequence. Used as part of header_pattern by default. If missing, header_pattern alone is used to construct unique IDs.

seq_col

Name of the column that contains the sequence windows.

header_pattern

Pattern passed to glue::glue(), used to construct a new unique header column from existing columns. The default combines name_col with the middle fragment of the sequence window around the site (get_middle_fragment(..., 7), as shown in Usage), so that read_netphorest() can identify each site uniquely. Any whitespace is replaced by underscores. If set to NULL, the header is just the value of name_col.
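As a rough illustration of how a header of the default shape could be assembled, the sketch below rebuilds the "name|middle fragment" pattern in base R. It is not the package's implementation: middle_fragment() is a hypothetical stand-in for get_middle_fragment() (assumed here to return the central n residues of an odd-length window), paste0() is used in place of glue::glue(), the sites data frame is made-up example data, and the two-lines-per-record layout is an assumption about the written file.

```r
# Hypothetical stand-in for get_middle_fragment(): assumes it returns the
# central n residues of an odd-length sequence window.
middle_fragment <- function(seq, n) {
  mid <- (nchar(seq) + 1) %/% 2   # position of the central residue
  half <- (n - 1) %/% 2           # residues kept on each side
  substr(seq, mid - half, mid + half)
}

# Made-up example data: unique names and 15-residue windows.
sites <- data.frame(
  unique_id = c("P12345 S10", "Q67890 T42"),
  fragment_15 = c("AAAAAAASBBBBBBB", "CCCCCCCTDDDDDDD"),
  stringsAsFactors = FALSE
)

# Same shape as the default pattern: "<name>|<middle fragment>".
headers <- paste0(sites$unique_id, "|", middle_fragment(sites$fragment_15, 7))

# Replace every whitespace character with an underscore.
headers <- gsub("[[:space:]]", "_", headers)

# Assumed FASTA layout: a ">" header line followed by the sequence line.
fasta_lines <- as.vector(rbind(paste0(">", headers), sites$fragment_15))
print(fasta_lines)
```

Embedding part of the sequence in the header keeps IDs distinct even when two rows share the same name, which is the purpose the argument description gives for the default pattern.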

Value

Invisibly returns the input data, with the new headers added as fasta_id.

Examples

kinsub_path <- system.file('extdata', 'kinase_substrate_dataset_head', package = 'phosphocie')
kinsub <- read_kinsub(kinsub_path)
tmp <- tempfile()

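# Default header: the unique_id column plus the middle fragment of the window
build_fastas(kinsub, tmp, name_col = 'unique_id', seq_col = 'fragment_15')

# Custom header built from existing columns; name_col is omitted
build_fastas(kinsub, tmp, seq_col = 'fragment_15', header_pattern = '{acc_id}|{gene}|{substrate}|{residue}{position}')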

casblaauw/phosphocie documentation built on March 30, 2022, 8:28 p.m.