View source: R/merge_with_metadata.R
| merge_methylation_with_metadata | R Documentation |
Merge a dataframe of methylation/modification data (as produced by
read_modified_fastq()) with a dataframe of metadata, reversing
sequence and modification information where required so that all information
is in the forward direction.
merge_fastq_with_metadata() is the equivalent function for working with
unmodified FASTQs (sequence and quality only).
Methylation/modification dataframe must contain columns of "read" (unique read ID),
"sequence" (DNA sequence), "quality" (FASTQ quality score), "sequence_length"
(read length), "modification_types" (a comma-separated string of SAMtools modification
headers produced via vector_to_string() e.g. "C+h?,C+m?"), and,
for each modification type, a column of comma-separated strings of modification
locations (e.g. "3,6,9,12") and a column of comma-separated strings of
modification probabilities (e.g. "255,0,64,128"). See read_modified_fastq()
for more information on how this dataframe is formatted and produced.
Other columns are allowed but not required, and will be preserved unaltered
in the merged data.
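As a rough illustration of the format described above, the following sketch builds a tiny hand-written dataframe with the required columns and checks that they are present. The values are invented, and the per-modification column names ("C+m?_locations", "C+m?_probabilities") are an assumption about how read_modified_fastq() names its output; check the real output before relying on them.

```r
# Hand-built stand-in for read_modified_fastq() output (illustrative values only).
# ASSUMPTION: per-modification columns are named "<type>_locations" and
# "<type>_probabilities"; verify against real read_modified_fastq() output.
methylation_example <- data.frame(
  read = c("read_1", "read_2"),
  sequence = c("GGCGGCGGC", "GGCGGCGGC"),
  quality = c("IIIIIIIII", "HHHHHHHHH"),
  sequence_length = c(9L, 9L),
  modification_types = c("C+m?", "C+m?"),
  `C+m?_locations` = c("3,6", "3,6"),
  `C+m?_probabilities` = c("255,0", "64,128"),
  check.names = FALSE,        # keep "C+m?" intact in the column names
  stringsAsFactors = FALSE
)

# Columns the documentation lists as mandatory
required_columns <- c("read", "sequence", "quality",
                      "sequence_length", "modification_types")
missing_columns <- setdiff(required_columns, colnames(methylation_example))

# Each listed modification type should have its own pair of columns
modification_types <- unique(unlist(strsplit(methylation_example$modification_types, ",")))
expected_columns <- c(paste0(modification_types, "_locations"),
                      paste0(modification_types, "_probabilities"))
missing_modification_columns <- setdiff(expected_columns, colnames(methylation_example))
```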
Metadata dataframe must contain "read" (unique read ID) and "direction"
(read direction, either "forward" or "reverse" for each read) columns,
and can contain any other columns with arbitrary information for each read.
Columns that might be useful include participant ID and family designations
so that each read can be associated with its participant and family.
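A minimal sketch of a metadata dataframe that satisfies these requirements, with a few base R sanity checks that could be run before merging. The participant_id and family columns and all values are hypothetical examples of the optional extra columns.

```r
# Hypothetical metadata: "read" and "direction" are required,
# participant_id and family are optional extras (invented values).
metadata_example <- data.frame(
  read = c("read_1", "read_2", "read_3"),
  direction = c("forward", "reverse", "forward"),
  participant_id = c("P1", "P1", "P2"),
  family = c("F1", "F1", "F2"),
  stringsAsFactors = FALSE
)

# Checks mirroring the documented requirements
has_required_columns <- all(c("read", "direction") %in% colnames(metadata_example))
reads_are_unique <- !anyDuplicated(metadata_example$read)
directions_are_valid <- all(metadata_example$direction %in% c("forward", "reverse"))
```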
Important: A key feature of this function is that it uses the direction
column from the metadata to identify which rows are reverse reads. These reverse
reads will then be reverse-complemented and have modification information reversed
such that all reads are in the forward direction, ideal for consistent analysis or
visualisation. The output columns are "forward_sequence", "forward_quality",
"forward_<modification_type>_locations", and "forward_<modification_type>_probabilities".
Calls reverse_sequence_if_needed(), reverse_quality_if_needed(),
reverse_locations_if_needed(), and reverse_probabilities_if_needed()
to implement the reversing - see documentation for these functions for more details.
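To make the reversal concrete, here is a base R sketch of what reversing a single reverse read involves. The helper names are hypothetical and this is not the package's implementation; the reverse_*_if_needed() functions remain the authoritative reference.

```r
# Hypothetical helpers for illustration only; not ggDNAvis functions.

# Reverse the characters of a single string
reverse_string <- function(x) {
  paste(rev(strsplit(x, "")[[1]]), collapse = "")
}

# Complement each DNA base, then reverse the order
reverse_complement_dna <- function(sequence) {
  reverse_string(chartr("ACGT", "TGCA", sequence))
}

# Reverse the order of values in a comma-separated string
reverse_csv <- function(x) {
  paste(rev(strsplit(x, ",")[[1]]), collapse = ",")
}

forward_sequence <- reverse_complement_dna("GGCGGC")   # sequence is reverse-complemented
forward_quality <- reverse_string("IIHH##")            # quality is reversed only
forward_probabilities <- reverse_csv("255,0,64,128")   # probabilities are reordered
```

Qualities and probabilities only change order, because each value stays attached to the same base; only the sequence itself is complemented.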
To write reversed sequences to FASTQ via write_modified_fastq(), modification locations
must be symmetric (e.g. CpG) and reversed_location_offset must be set to 1. Asymmetric locations
cannot be written to modified FASTQ once reversed, because e.g. cytosine methylation would then be
reported at guanines, which SAMtools can't account for. Symmetrically reversing CpGs via
reversed_location_offset = 1 is the only way to avoid this.
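The arithmetic behind the offset can be sketched in base R. The helpers below are hypothetical, and the sign convention (subtracting the offset after mirroring the position) is an assumption chosen so that the mark lands on the C of the reversed CpG; see reverse_locations_if_needed() for how ggDNAvis actually applies it.

```r
# Hypothetical helpers for illustration only; not ggDNAvis functions.
reverse_complement_dna <- function(sequence) {
  complemented <- chartr("ACGT", "TGCA", sequence)
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

# A 1-indexed position p on the original read pairs with position
# (sequence_length - p + 1) on the reverse complement.
# ASSUMPTION: the offset is subtracted, shifting each mark one base
# towards the 5' end of the forward sequence.
map_locations <- function(locations, sequence_length, offset = 0) {
  sort(sequence_length - locations + 1 - offset)
}

# Look up the bases found at a set of positions
bases_at <- function(sequence, locations) {
  strsplit(sequence, "")[[1]][locations]
}

read_sequence <- "TTCGACG"   # reverse read; CpG cytosines at positions 3 and 6
cpg_c_locations <- c(3, 6)

forward_sequence <- reverse_complement_dna(read_sequence)
locations_offset_0 <- map_locations(cpg_c_locations, nchar(read_sequence), offset = 0)
locations_offset_1 <- map_locations(cpg_c_locations, nchar(read_sequence), offset = 1)

bases_offset_0 <- bases_at(forward_sequence, locations_offset_0)  # marks fall on G
bases_offset_1 <- bases_at(forward_sequence, locations_offset_1)  # marks fall on C
```

With no offset the mirrored positions point at guanines, which is why a cytosine modification header no longer matches; with an offset of 1 each mark sits on the cytosine of the same CpG site.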
merge_methylation_with_metadata(
  methylation_data,
  metadata,
  reversed_location_offset = 0,
  reverse_complement_mode = "DNA"
)
methylation_data |
|
metadata |
|
reversed_location_offset |
|
reverse_complement_mode |
|
dataframe. A merged dataframe containing all columns from the input dataframes, as well as forward versions of sequences, qualities, modification locations, and modification probabilities (with separate locations and probabilities columns created for each modification type in the modification data).
## Locate files
modified_fastq_file <- system.file("extdata",
                                   "example_many_sequences_raw_modified.fastq",
                                   package = "ggDNAvis")
metadata_file <- system.file("extdata",
                             "example_many_sequences_metadata.csv",
                             package = "ggDNAvis")
## Read files
methylation_data <- read_modified_fastq(modified_fastq_file)
metadata <- read.csv(metadata_file)
## Merge data (including reversing if needed)
merge_methylation_with_metadata(methylation_data, metadata, reversed_location_offset = 0)
## Merge data with offset = 1
merge_methylation_with_metadata(methylation_data, metadata, reversed_location_offset = 1)