View source: R/parse_methylation_from_fastq.R
| write_modified_fastq | R Documentation |
This function takes a dataframe containing DNA modification information
(e.g. produced by read_modified_fastq()) and writes it back to modified
FASTQ, equivalent to what would be produced via samtools fastq -T MM,ML.
Arguments give the names of columns within the dataframe from which to read.
If multiple types of modification have been assessed (e.g. both methylation
and hydroxymethylation), then multiple colnames must be provided for locations
and probabilities, and multiple prefixes (e.g. "C+h?") must be provided.
IMPORTANT: These three vectors must all be the same length, and the modification
types must appear in the same order in each (e.g. if writing hydroxymethylation then
methylation, list H before M in all three vectors and never the reverse).
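The length and ordering requirement can be checked before calling the function. The following is a minimal sketch in base R, not part of the package; the vector values mirror the defaults shown in the usage, and argument_check is a hypothetical helper name used only for illustration.

```r
# Argument values mirroring the documented defaults
locations_colnames <- c("hydroxymethylation_locations", "methylation_locations")
probabilities_colnames <- c("hydroxymethylation_probabilities",
                            "methylation_probabilities")
modification_prefixes <- c("C+h?", "C+m?")

# All three vectors must have the same length
stopifnot(
  length(locations_colnames) == length(probabilities_colnames),
  length(locations_colnames) == length(modification_prefixes)
)

# Lay the vectors side by side so the ordering of each row can be checked by eye
# (argument_check is a hypothetical name, not something the package provides)
argument_check <- data.frame(
  prefix = modification_prefixes,
  locations = locations_colnames,
  probabilities = probabilities_colnames
)
print(argument_check)
```

Each row of the printed table should describe a single modification type; a row that pairs "C+h?" with a methylation column would indicate the vectors are out of order.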
If quality isn't known (e.g. there was a FASTA step at some point in the pipeline),
the quality argument can be set to NA to fill in quality scores with "B". This
is the same behaviour as SAMtools v1.21 when converting FASTA to SAM/BAM then FASTQ.
I don't really know why SAMtools decided the default quality should be "B" but there
was probably a reason so I have stuck with that.
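As a rough illustration of what the placeholder looks like, a FASTQ quality string must be the same length as its sequence, so filling with "B" amounts to repeating "B" once per base. This is a base R sketch under that assumption, not the package's internal code; the sequences and variable names are made up for the example.

```r
# Hypothetical example sequences (not taken from example_many_sequences)
sequence <- c("GGCGGC", "ACGT")

# One "B" per base, so each quality string matches its sequence length
placeholder_quality <- strrep("B", nchar(sequence))

stopifnot(all(nchar(placeholder_quality) == nchar(sequence)))
print(placeholder_quality)
```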
Default arguments are set up to work with the included example_many_sequences data.
write_modified_fastq(
  dataframe,
  filename = NA,
  read_id_colname = "read",
  sequence_colname = "sequence",
  quality_colname = "quality",
  locations_colnames = c("hydroxymethylation_locations", "methylation_locations"),
  probabilities_colnames = c("hydroxymethylation_probabilities",
                             "methylation_probabilities"),
  modification_prefixes = c("C+h?", "C+m?"),
  include_blank_tags = TRUE,
  return = FALSE
)
dataframe |
|
filename |
|
read_id_colname |
|
sequence_colname |
|
quality_colname |
|
locations_colnames |
|
probabilities_colnames |
|
modification_prefixes |
|
include_blank_tags |
|
return |
|
character vector. The resulting modified FASTQ file as a character vector of its constituent lines (or invisible(NULL) if return is FALSE). This is probably mostly useful for debugging, as setting filename within this function directly writes to FASTQ via writeLines(). Therefore, defaults to returning invisible(NULL).
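If the returned character vector is kept (return = TRUE), it can be written out later with writeLines(), which is what the description above says the function does when filename is set. A minimal base R sketch follows; the record is a hand-written placeholder rather than real output of write_modified_fastq(), and its header omits the MM and ML tags a real record would carry.

```r
# Hand-written placeholder FASTQ record (header, sequence, separator, quality);
# a real header line would also carry MM and ML tags, omitted here
fastq_lines <- c("@read_1", "GGCGGC", "+", "BBBBBB")

# Write one element per line to a temporary file, then read it back
output_path <- tempfile(fileext = ".fastq")
writeLines(fastq_lines, output_path)
stopifnot(identical(readLines(output_path), fastq_lines))

# Clean up the temporary file
unlink(output_path)
```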
## Write to FASTQ (using filename = NA, return = TRUE
## to view as char vector rather than writing to file)
write_modified_fastq(
  example_many_sequences,
  filename = NA,
  read_id_colname = "read",
  sequence_colname = "sequence",
  quality_colname = "quality",
  locations_colnames = c("hydroxymethylation_locations",
                         "methylation_locations"),
  probabilities_colnames = c("hydroxymethylation_probabilities",
                             "methylation_probabilities"),
  modification_prefixes = c("C+h?", "C+m?"),
  return = TRUE
)
## Write methylation only, and fill in qualities with "B"
write_modified_fastq(
  example_many_sequences,
  filename = NA,
  read_id_colname = "read",
  sequence_colname = "sequence",
  quality_colname = NA,
  locations_colnames = c("methylation_locations"),
  probabilities_colnames = c("methylation_probabilities"),
  modification_prefixes = c("C+m?"),
  return = TRUE
)