df_to_fasta | R Documentation |
Converts a data frame containing sequence IDs and sequences into a FASTA-formatted file, optionally compressed as gzip.
df_to_fasta(df, id_col, seq_col, output_file = NULL, gzip = TRUE, verbose = TRUE)
df |
A data frame with at least two columns: one for sequence IDs and one for sequences. |
id_col |
A character string specifying the column name containing sequence IDs. |
seq_col |
A character string specifying the column name containing sequence data. |
output_file |
A character string specifying the output file path. If NULL, the function will stop with an informative message. |
gzip |
A logical indicating whether to compress the output as a gzip file. Defaults to |
verbose |
A logical indicating whether to print progress messages. Defaults to |
This function efficiently writes large sequence datasets to FASTA format, handling compression and progress reporting. It validates input columns and manages memory by processing data in chunks.
No return value. Writes a FASTA file to the specified path.
temp_dir <- tempdir()
temp_output <- file.path(temp_dir, "output.fa.gz")
seq_data <- data.frame(
transcript_id = c("ENST0001", "ENST0002"),
sequence = c("ATGCTAGCTAG", "GCTAGCTAGCT")
)
df_to_fasta(seq_data, "transcript_id", "sequence", temp_output)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.