split_fasta: Split a fasta formatted file.


Description

The function splits a FASTA-formatted file into smaller FASTA files, each containing a defined number of sequences (num_seq), for further processing.

Usage

split_fasta(
  path_in,
  path_out,
  num_seq = 20000,
  trim = FALSE,
  trunc = NULL,
  id = FALSE
)

Arguments

path_in

Path to the FASTA-formatted file to be processed.

path_out

Path where the resulting FASTA files should be stored. It should include the file-name prefix, to which _n (an integer from 1 to the number of files generated) and the extension ".fa" are appended.

num_seq

Integer defining the number of sequences to be in each resulting .fasta file. Defaults to 20000.

trim

Logical. Should the sequences be trimmed to 4000 amino acids to bypass the CBS server restrictions? Defaults to FALSE.

trunc

Integer. Truncate the sequences to this length; the first trunc amino acids (positions 1:trunc) are kept. Defaults to NULL.

id

Logical. Should the protein IDs be returned? Defaults to FALSE.
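
The interplay of num_seq, path_out and trunc can be illustrated with a small base R sketch. It is not the package's implementation: expected_outputs() is a hypothetical helper, and it assumes that the last file simply holds the remaining sequences when the total is not a multiple of num_seq.

```r
# Hypothetical helper (not part of ragp): predict the output file names
# from the number of records, the path_out prefix and num_seq.
expected_outputs <- function(n_records, path_out, num_seq = 20000) {
  # Assumption: the last file holds whatever sequences remain
  n_files <- ceiling(n_records / num_seq)
  if (n_files == 0) return(character(0))
  # "_n" plus the ".fa" extension is appended to the prefix
  paste0(path_out, "_", seq_len(n_files), ".fa")
}

expected_outputs(1200, "at_nsp_split", num_seq = 500)

# Truncation as described for trunc: keep positions 1:trunc of each sequence
seqs <- c("MKTAYIAKQR", "MSD")
substr(seqs, 1, 5)
```

Sequences shorter than the requested length are left unchanged by substr().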

Value

If id = FALSE, a character vector of the paths to the resulting FASTA files.

If id = TRUE, a list with two elements:

id

Character, protein identifiers.

file_list

Character, paths to the resulting .FASTA formatted files.
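
Because the return type depends on id, downstream code may want to extract the file paths the same way in both cases. A minimal sketch follows; get_file_list() is a hypothetical helper, and the two inputs are hand-written stand-ins for the documented return values rather than real output.

```r
# Hypothetical helper (not part of ragp): return the file paths whether
# split_fasta() was called with id = FALSE or id = TRUE.
get_file_list <- function(res) {
  if (is.list(res)) res$file_list else res
}

# Mock return values shaped like the two documented forms
res_plain <- c("at_nsp_split_1.fa", "at_nsp_split_2.fa")
res_with_id <- list(id = c("P1", "P2", "P3"),
                    file_list = c("at_nsp_split_1.fa", "at_nsp_split_2.fa"))

get_file_list(res_plain)
get_file_list(res_with_id)
```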

Examples

## Not run: 
library(ragp)
# create a FASTA file to be processed; not needed if the input file already exists
data(at_nsp)
library(seqinr)
write.fasta(sequence = strsplit(at_nsp$sequence, ""),
            name = at_nsp$Transcript.id,
            file = "at_nsp.fasta")

# assumes the input and output files are in the working directory:
file_paths <- split_fasta(path_in = "at_nsp.fasta",
                          path_out = "at_nsp_split",
                          num_seq = 500)

## End(Not run)
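
One way to sanity-check the result of a split is to count the header lines (those starting with ">") in each output file. The sketch below uses only base R; count_fasta_records() is a hypothetical helper, and the temporary file stands in for a real output file.

```r
# Hypothetical helper (not part of ragp): count FASTA records in a file
# by counting lines that start with the ">" header marker.
count_fasta_records <- function(path) {
  sum(startsWith(readLines(path), ">"))
}

# Small stand-in file with two records
tmp <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "MKT", ">seq2", "MSD", "AAA"), tmp)
count_fasta_records(tmp)
```

Applied to the paths returned by split_fasta(), for example with vapply(file_paths, count_fasta_records, integer(1)), this should report no more than num_seq records per file if the split behaves as documented.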

missuse/ragp documentation built on Jan. 4, 2022, 10:49 a.m.