View source: R/merge_with_metadata.R
| reverse_locations_if_needed | R Documentation |
merge_methylation_with_metadata() helper)This function takes a vector of condensed modification locations/indices (e.g.
c("3,6,9,12", "1,4,7,10")), a vector of directions (which must all be either
"forward" or "reverse", not case-sensitive), and a vector of sequence lengths
(integers).
Returns a vector of condensed locations where reads that were originally forward
are unchanged, and reads that were originally reverse are flipped to now be forward.
Optionally, a numerical offset can be set. If this is left at 0 (the default value),
then a CpG assessed for methylation would be reverse-complemented to a CG with the
modification information ascribed to the G (as the G is at the location where the actual
modified C was on the other strand). However, setting the offset to 1 would shift all
of the modification indices by 1 such that the modification is now ascribed to the C of the
reverse-strand CG. This is beneficial for visualising the modifications as it ensures consistency
between originally-forward and originally-reverse strands by making the modification score associated
with each CpG site always be located at the C, but may be misleading for quantitative analysis.
Setting the offset to anything other than 0 or 1 should work but may be biologically misleading,
so produces a warning.
Called by merge_methylation_with_metadata() to create a forward dataset, alongside
reverse_sequence_if_needed(), reverse_quality_if_needed(), and reverse_probabilities_if_needed().
Example:
Forward sequence, with indices of Cs in CpGs numbered:
C C C A G G C G G C G G C G A C C G A
7 10 13 17
(length = 19, locations = "7,10,13,17", CpGs = 7-8, 10-11, 13-14, 17-18)
Reverse sequence, with indices of C in CpGs numbered:
T C G G T C G C C G C C G C C T G G G 2 6 9 12
(length = 19, locations = "2,6,9,12", CpGs = 2-3, 6-7, 9-10, 12-13)
As CG reverse-complements to itself, each CpG site has a 1:1 correspondence with
a CpG site in the reverse strand. Many methylation calling models assess C-methylation
at the C of each CpG. To map the locations from C to C, we take 19 - <index> such that
"7,10,13,17" becomes "12,9,6,2" and "2,6,9,12" becomes "17,13,10,7".
The symmetry of CpGs means mapping from C to C is also symmetric.
This is achieved by setting offset = 1, as mapping C to C involves shifting position by 1.
Conversely, to map the locations from C to G (i.e. preserving the actual location of each
modification, which is required if assessed locations are non-symmetric/don't reverse-complement
to themselves like CpGs do), we take 20 - <index> such that
"7,10,13,17" becomes "13,10,7,3" i.e. the indices of the Gs in CpGs in the reverse
sequence. Likewise "2,6,9,12" becomes "18,14,11,8" i.e. the indices of the Gs in CpGs in
the forward sequence.
This is achieved by setting offset = 0, as mapping C to G preserves the actual original position
at which each modification was assessed, but changes the base to its complement.
In general, new locations are calculated as (<length> + 1 - <offset>) - <index>.
Of course, output locations are reversed before returning so that they all
return in ascending order, as is standard for all location vectors/strings.
If wanting to write reversed sequences to FASTQ via write_modified_fastq(), locations
must be symmetric (e.g. CpG) and offset must be set to 1. Asymmetric locations are impossible
to write to modified FASTQ once reversed because then e.g. cytosine methylation will be assessed
at guanines, which SAMtools can't account for. Symmetrically reversing CpGs via offset = 1 is
the only way to fix this.
reverse_locations_if_needed(
locations_vector,
direction_vector,
length_vector,
offset = 0
)
locations_vector |
|
direction_vector |
|
length_vector |
|
offset |
|
character vector. A vector of all forward versions of the input locations vector.
reverse_locations_if_needed(
locations_vector = c("7,10,13,17", "2,6,9,12"),
direction_vector = c("forward", "reverse"),
length_vector = c(19, 19),
offset = 0
)
reverse_locations_if_needed(
locations_vector = c("7,10,13,17", "2,6,9,12"),
direction_vector = c("forward", "reverse"),
length_vector = c(19, 19),
offset = 1
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.