findPatternPos: Function to find positions of the nucleotide patterns in the...
In BUMHMM: Computational pipeline for computing probability of modification from structure probing experiment data


This function finds all occurrences of a nucleotide pattern in a sequence. For each occurrence, it returns the index of the middle nucleotide; the position of the middle nucleotide within the pattern is computed as ceiling(length(pattern) / 2). Both the plus and minus DNA strands are supported; for the minus strand, each pattern is replaced by its complementary sequence.
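The middle-index arithmetic described above can be illustrated with a small base R sketch. The helpers `middle_index` and `complement_pattern` and the toy sequence are hypothetical and not part of BUMHMM; the sketch assumes that overlapping occurrences are counted and that patterns contain only the letters A, C, G and T.

```r
# Hypothetical helper, not part of BUMHMM: indices of the middle nucleotide
# of every (possibly overlapping) occurrence of `pattern` in `sequence`.
middle_index <- function(pattern, sequence) {
    # A zero-width lookahead lets gregexpr() report overlapping matches.
    regex <- paste0("(?=", pattern, ")")
    starts <- as.integer(gregexpr(regex, sequence, perl = TRUE)[[1]])
    starts <- starts[starts > 0]  # gregexpr() returns -1 when nothing matches
    # Shift each start by the within-pattern position of the middle nucleotide.
    starts + ceiling(nchar(pattern) / 2) - 1
}

# Hypothetical helper: complementary sequence of a pattern (minus strand).
complement_pattern <- function(pattern) chartr("ACGT", "TGCA", pattern)

toy_sequence <- "ACGTACGA"
middle_index("ACG", toy_sequence)   # 2 6
complement_pattern("ACG")           # "TGC"
```

Here "ACG" starts at positions 1 and 5, and ceiling(3 / 2) = 2, so the middle nucleotides sit at positions 2 and 6.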

findPatternPos(patterns, sequence, strand)

`patterns`	A list of nucleotide permutations of length `n`, as returned by `nuclPerm`.
`sequence`	A `DNAString` object storing the reference genomic sequence in which to search for the patterns. The sequence corresponding to the plus strand is expected.
`strand`	A character indicating the plus (`+`) or minus (`-`) strand. For the minus strand, the occurrences found for a particular pattern are attributed to the pattern with the complementary sequence.

This function uses stringi::stri_locate_all_fixed().

This function aims to assist with addressing sequence bias in structure probing data. The sequence in the neighbourhood of a nucleotide is assumed to have an effect on its structural state. By considering sequence patterns of a certain length (specified by the user), this function finds indices of the middle nucleotide of each pattern's occurrences within the sequence. We then separately analyse the nucleotides occurring in the middle of each pattern, taking into account sequence dependency.
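The search described above might look roughly like the following sketch, which builds a named list of middle-nucleotide indices with `stringi::stri_locate_all_fixed()`. This is not the BUMHMM source: the function `middle_positions` is hypothetical, it takes a plain character string rather than a `DNAString`, and both the use of `overlap = TRUE` and the way results are relabelled for the minus strand are assumptions based on the description.

```r
library(stringi)

# Hypothetical re-implementation for illustration; not the BUMHMM source.
middle_positions <- function(patterns, sequence, strand = "+") {
    positions <- lapply(patterns, function(p) {
        # Locate every occurrence; a 0-row matrix is returned when none is found.
        hits <- stri_locate_all_fixed(sequence, p,
                                      omit_no_match = TRUE,
                                      overlap = TRUE)[[1]]
        # Shift each start to the middle nucleotide of the pattern.
        hits[, "start"] + ceiling(nchar(p) / 2) - 1
    })
    labels <- unlist(patterns)
    # Assumption: on the minus strand, hits are attributed to the
    # complementary pattern.
    if (strand == "-") labels <- chartr("ACGT", "TGCA", labels)
    names(positions) <- labels
    positions
}

toy <- middle_positions(list("ACG", "CGT", "TTT"), "ACGTACGA", "+")
toy_minus <- middle_positions(list("ACG"), "ACGTACGA", "-")
```

Each component of `toy` can then be analysed separately, which mirrors the per-pattern treatment of the middle nucleotides described above.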

This function returns a list where each component corresponds to a pattern (indicated by the field names) and contains indices of the middle nucleotides of that pattern's occurrences within the sequence.

The function raises an error in the following cases:

"Strand should be either plus or minus, specified with a sign.": `strand` is not specified as "+" or "-";

"The sequence should be non-empty.": the provided sequence is empty;

"The list of patterns should be non-empty.": the list of patterns to search for in the sequence is empty.
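Callers may want to catch these errors rather than let them stop a pipeline. The sketch below uses a hypothetical stand-in, `check_inputs`, that raises the three messages listed above; the order of the checks and the use of a plain character string for the sequence are assumptions, not taken from the package.

```r
# Hypothetical validator mimicking the documented error messages;
# the order of the checks is an assumption.
check_inputs <- function(patterns, sequence, strand) {
    if (!(strand %in% c("+", "-"))) {
        stop("Strand should be either plus or minus, specified with a sign.")
    }
    if (nchar(sequence) == 0) {
        stop("The sequence should be non-empty.")
    }
    if (length(patterns) == 0) {
        stop("The list of patterns should be non-empty.")
    }
    invisible(TRUE)
}

# Capture the error message instead of aborting.
msg <- tryCatch(
    check_inputs(list("AAA"), "ACGT", "plus"),
    error = function(e) conditionMessage(e)
)
```

The same `tryCatch()` wrapper could be placed around a real call to `findPatternPos`.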

Alina Selega, Sander Granneman, Guido Sanguinetti

Selega et al. "Robust statistical modeling improves sensitivity of high-throughput RNA structure probing experiments", Nature Methods (2016).

See also nuclPerm.

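    ## Load BUMHMM, which provides findPatternPos() and nuclPerm()
    library(BUMHMM)
    library(SummarizedExperiment)

    ## Extract the DNA sequence from se (assumed here to be the example
    ## data set shipped with BUMHMM)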
    sequence <- subject(rowData(se)$nucl)

    ## Generate patterns of length 3
    n <- 3
    patterns <- nuclPerm(n)

    ## Find positions of pattern occurrences
    nuclPosition <- findPatternPos(patterns, sequence, '+')