split_fasta: Split a fasta formatted file.

View source: R/split_fasta.R

split_fasta R Documentation

Split a fasta formatted file.

Description

The function splits a FASTA-formatted file into smaller FASTA files, each containing at most num_seq sequences, for further processing.
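The splitting idea can be illustrated with a minimal base-R sketch. The helper split_fasta_sketch() is hypothetical and is not the surfaltr implementation. It assumes records are assigned to output files in input order, num_seq per file, and it reuses the naming scheme described under path_out (prefix, then "_n", then ".fa"). It does not handle trim, trunc or id.

```r
# Hypothetical base-R sketch, not the surfaltr implementation: split a FASTA
# file into files of at most `num_seq` records named <path_out>_<n>.fa.
split_fasta_sketch <- function(path_in, path_out, num_seq = 20000) {
  lines <- readLines(path_in)
  lines <- lines[nzchar(lines)]            # drop empty lines
  is_header <- startsWith(lines, ">")      # FASTA headers start with ">"
  if (!any(is_header)) return(character(0))
  record <- cumsum(is_header)              # record number for every line
  chunk <- ceiling(record / num_seq)       # output file number for every line
                                           # (lines before the first header get 0 and are dropped)
  n_files <- max(chunk)
  files <- paste0(path_out, "_", seq_len(n_files), ".fa")
  for (k in seq_len(n_files)) {
    writeLines(lines[chunk == k], files[k])
  }
  files
}

# Usage on a small temporary file with three records and num_seq = 2.
fa <- tempfile(fileext = ".fasta")
writeLines(c(">p1", "MKT", ">p2", "MAAV", "LLK", ">p3", "MSS"), fa)
out <- split_fasta_sketch(fa, file.path(tempdir(), "chunk"), num_seq = 2)
length(out)          # 2
readLines(out[2])    # ">p3" "MSS"
```

Counting header lines keeps multi-line sequences (such as p2 above) together in one output file, and the last file holds the remainder.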

Usage

split_fasta(
  path_in,
  path_out,
  num_seq = 20000,
  trim = FALSE,
  trunc = NULL,
  id = FALSE
)

Arguments

path_in

A path to the FASTA-formatted file to be processed.

path_out

A path where the resulting FASTA files should be stored, including the prefix for their file names. The suffix _n (an integer from 1 to the number of files generated) and the extension ".fa" are appended to this prefix.

num_seq

Integer defining the maximum number of sequences in each resulting FASTA file; the last file may contain fewer. Defaults to 20000.

trim

Logical; should sequences be trimmed to 4000 amino acids to comply with the CBS server's length restriction? Defaults to FALSE.

trunc

Integer; truncate the sequences to this length, keeping amino acids 1 to trunc. Defaults to NULL.

id

Logical; should the protein IDs be returned? Defaults to FALSE.
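The effect of the trim and trunc arguments on individual sequences can be sketched with base R's substr(). The helper shorten_seqs() is hypothetical and is not surfaltr's code. It assumes each protein sequence is held as a single character string, and the order in which the two options are applied when both are set is an assumption made only for illustration.

```r
# Hypothetical helper, not surfaltr's code: apply the `trim` and `trunc`
# options to sequences held as single character strings (one per protein).
# Applying trim before trunc is an assumption made for illustration.
shorten_seqs <- function(seqs, trim = FALSE, trunc = NULL) {
  if (trim) {
    seqs <- substr(seqs, 1, 4000)           # cap every sequence at 4000 residues
  }
  if (!is.null(trunc)) {
    seqs <- substr(seqs, 1, trunc)          # keep residues 1 to trunc
  }
  seqs
}

long_seq <- strrep("M", 5000)
nchar(shorten_seqs(long_seq, trim = TRUE))  # 4000
shorten_seqs("MKTAYIAKQR", trunc = 4)       # "MKTA"
shorten_seqs("MKT", trunc = 10)             # "MKT" (shorter sequences are unchanged)
```

substr() returns the whole string when it is shorter than the requested end position, so sequences already within the limit pass through untouched.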

Value

If id = FALSE, a character vector of paths to the resulting FASTA files.

If id = TRUE, a list with two elements:

id

Character, protein identifiers.

file_list

Character, paths to the resulting .FASTA formatted files.
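Because the return type depends on id, downstream code may want to extract the file paths in either case. The helper fasta_paths() below is a hypothetical sketch, and the values passed to it are illustrative placeholders, not real output of split_fasta().

```r
# Hypothetical helper: return the FASTA paths whether split_fasta() was called
# with id = FALSE (character vector) or id = TRUE (list with id and file_list).
fasta_paths <- function(res) {
  if (is.list(res)) res$file_list else res
}

# Illustrative values only, not real output.
res_false <- c("out/prot_1.fa", "out/prot_2.fa")
res_true  <- list(id = c("P1", "P2"), file_list = res_false)
identical(fasta_paths(res_false), fasta_paths(res_true))  # TRUE
```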


EliLillyCo/surfaltr documentation built on May 3, 2022, 10:12 a.m.