read_fasta: Reads a file in fasta format

read_fasta R Documentation

Reads a file in fasta format

Description

Reads a file in fasta format line by line (used by both load_fasta and load_fasta2).

Usage

read_fasta(file, acc_pattern = ">([^ ]+?) .*", comment_char = "")

Arguments

file

A character string giving the name of a protein fasta file.

acc_pattern

A regular expression describing how to parse the header lines of fasta entries. The default splits each header at the first space and keeps the character string before it, which is then used as the name of the entry. The character ">" at the beginning of the header is also removed.

comment_char

Character: a character or an empty string. Use "" to turn off the interpretation of comment lines.
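
As a rough illustration of how acc_pattern shapes entry names, the patterns used in the Examples can be tried on a single header with base R's sub(). The header below is made up in UniProt style, and applying the capture group through sub() with "\\1" is an assumption for demonstration, not a statement about the internals of read_fasta.

```r
# Hypothetical UniProt-style header, for illustration only
hdr <- ">sp|P12345|TEST_HUMAN Example protein OS=Homo sapiens OX=9606"

# Default pattern: keep the string before the first space, drop ">"
name_default <- sub(">([^ ]+?) .*", "\\1", hdr)
name_default
# "sp|P12345|TEST_HUMAN"

# Keep only the accession between the first two "|"
name_acc <- sub(">..\\|([^\\|]+)\\|.*", "\\1", hdr)
name_acc
# "P12345"

# Keep everything after ">"
name_all <- sub(">(.*)", "\\1", hdr)
name_all
```

In each pattern, the part in parentheses is what survives as the name, so a custom acc_pattern only needs one capture group around the desired text.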

See Also

write_fasta

Examples


# assumes that "uniprot_hs_2020_05.fasta" exists at the location below
fasta <- read_fasta("~/proteoQ/dbs/fasta/uniprot/uniprot_hs_2020_05.fasta")
head(names(fasta))

# use the first fifty characters
fasta <- read_fasta("~/proteoQ/dbs/fasta/uniprot/uniprot_hs_2020_05.fasta",
                    ">(.{50}).*")
head(names(fasta))

# use the UniProt accession
fasta <- read_fasta("~/proteoQ/dbs/fasta/uniprot/uniprot_hs_2020_05.fasta",
                    ">..\\|([^\\|]+)\\|.*")
head(names(fasta))

# use everything in the header
fasta <- read_fasta("~/proteoQ/dbs/fasta/uniprot/uniprot_hs_2020_05.fasta",
                    ">(.*)")
head(names(fasta))
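
The examples above need a local UniProt download. For trying the arguments without one, the following is a minimal base-R sketch that writes a tiny made-up fasta file to a temporary path and parses it. The helper sketch_read_fasta is hypothetical and not part of proteoQ; its handling of comment_char (dropping lines that start with it) and its return shape (a named list of sequence strings) are assumptions.

```r
# Write a tiny, made-up fasta file to a temporary location
tmp <- tempfile(fileext = ".fasta")
writeLines(c(
  "# a comment line",
  ">sp|P00001|AAA_HUMAN First made-up protein",
  "MKTAYIAK",
  "QRQISFVK",
  ">sp|P00002|BBB_HUMAN Second made-up protein",
  "MSDNE"
), tmp)

# Hypothetical helper (not the proteoQ implementation): read by line,
# drop comment lines, then join the sequence lines under each header
sketch_read_fasta <- function(file, acc_pattern = ">([^ ]+?) .*",
                              comment_char = "") {
  lines <- readLines(file)

  # Assumed behavior: a non-empty comment_char removes lines starting with it
  if (nzchar(comment_char)) {
    lines <- lines[!startsWith(lines, comment_char)]
  }

  is_header <- startsWith(lines, ">")

  # cumsum() assigns each sequence line to the most recent header;
  # lines before the first header get no group and are dropped by split()
  grp <- factor(cumsum(is_header)[!is_header],
                levels = seq_len(sum(is_header)))
  seqs <- lapply(split(lines[!is_header], grp), paste, collapse = "")

  # Name each entry by the capture group of acc_pattern
  names(seqs) <- sub(acc_pattern, "\\1", lines[is_header])
  seqs
}

fasta <- sketch_read_fasta(tmp, comment_char = "#")
names(fasta)
fasta[["sp|P00001|AAA_HUMAN"]]
```

Passing ">..\\|([^\\|]+)\\|.*" as acc_pattern to the same helper would name the two entries by accession instead.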



qzhang503/proteoQ documentation built on Dec. 14, 2024, 12:27 p.m.