join2PAM: Predict Protospacer Adjacent Motifs from Joined Spacer and...

View source: R/join2PAM.R

join2PAM R Documentation

Predict Protospacer Adjacent Motifs from Joined Spacer and Alignment Data

Description

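join2PAM filters alignment data and then predicts protospacer adjacent motifs (PAMs) from the DNA flanking the aligned protospacers. Each argument ending in Range accepts a vector of values, and one prediction is made for every combination of the supplied values. Predicted motifs are scored based on the number of alignments and on the amount of information the predicted PAM encodes relative to the maximum possible information that could be encoded.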
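
The information part of the score can be illustrated with a minimal R sketch. It assumes that the information a candidate PAM encodes is measured as per-position information content in bits (2 minus the Shannon entropy of the A/C/G/T frequencies), summed over the flank and divided by the maximum of 2 bits per position. The helper pamInfoFraction is hypothetical, is not part of Spacer2PAM, and does not reproduce how join2PAM combines this value with the number of alignments.

```r
# Hypothetical helper, not part of Spacer2PAM.
# Assumes equal-length sequences containing only A, C, G, T.
pamInfoFraction <- function(seqs) {
  stopifnot(length(unique(nchar(seqs))) == 1)
  flankLength <- nchar(seqs[1])
  # Matrix with one row per sequence and one column per flank position
  chars <- do.call(rbind, strsplit(toupper(seqs), ""))
  bases <- c("A", "C", "G", "T")
  infoPerPos <- apply(chars, 2, function(col) {
    freqs <- table(factor(col, levels = bases)) / length(col)
    freqs <- freqs[freqs > 0]
    # Information content = maximum entropy (2 bits) minus observed entropy
    2 - (-sum(freqs * log2(freqs)))
  })
  # Fraction of the maximum possible information (2 bits per position)
  sum(infoPerPos) / (2 * flankLength)
}
```

For example, pamInfoFraction(c("TTCAGG", "TTCAGG")) returns 1 because every position is fully conserved, whereas sequences with equal base frequencies at every position return 0. With few sequences, this uncorrected estimate tends to overstate conservation.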

Usage

join2PAM(
  joinedData,
  uniqueAlignsRange = T,
  excludeSelfRange = T,
  numGapsRange = 0,
  e.valueRange = 1,
  nucleotidesShorterThanProtospacerRange = 1,
  queryStartRange = 2,
  prophageOnlyRange = F,
  flankLength = 10,
  RangeStart = 1,
  saveLogo = T,
  savePAMSeqs = F,
  removeFASTA = T,
  collectionFrameExist = F
)

Arguments

joinedData

A data frame containing both CRISPR array spacer data and alignment data, often the output of the joinSpacerDFandAlignmentDF function.

uniqueAlignsRange

A vector setting the range of values for the unique alignment filter. Because the abundance of certain sequences can be biased by their representation in the database, a TRUE input keeps only one alignment for each combination of organism and spacer. Values can be TRUE or FALSE. Defaults to TRUE.
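
A minimal sketch of the unique-alignment filter, assuming the joined data frame has columns named spacer and subjectOrganism. Both column names and the function keepUniqueAligns are hypothetical; the actual columns produced by joinSpacerDFandAlignmentDF may differ.

```r
# Hypothetical helper; "spacer" and "subjectOrganism" are assumed column names.
keepUniqueAligns <- function(df) {
  # duplicated() on a data frame compares whole rows, so this keeps the
  # first alignment for each spacer/organism pair and drops the rest
  df[!duplicated(df[, c("spacer", "subjectOrganism")]), , drop = FALSE]
}
```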

excludeSelfRange

A vector setting the range of values that determine whether alignments to the organism encoding the CRISPR array are excluded. The most abundant alignments will likely be to the organism that encodes the CRISPR array being analyzed. Given a TRUE input, this filter removes all alignments to that organism. Values can be TRUE or FALSE. Defaults to TRUE.
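
The self-exclusion filter can be sketched as a row filter, assuming (hypothetically) that each row records the aligned organism in subjectOrganism and the array-encoding organism in arrayOrganism. How join2PAM actually identifies the self organism is not described here.

```r
# Hypothetical helper; "subjectOrganism" and "arrayOrganism" are assumed columns.
excludeSelf <- function(df) {
  # Compare as character so factor columns with different levels do not error
  keep <- as.character(df$subjectOrganism) != as.character(df$arrayOrganism)
  # Rows with a missing organism are dropped rather than returned as NA rows
  df[!is.na(keep) & keep, , drop = FALSE]
}
```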

numGapsRange

A vector setting the range of values for the maximum number of alignment gaps permissible in an alignment. Alignments with a number of gaps less than or equal to the input value pass through the filter. Values can range from 0 to the length of the CRISPR array. Defaults to 0.

e.valueRange

A vector setting the range of values for the maximum e-value permissible for an alignment. Alignments with an equal or lower e-value pass through the filter. Values must be positive. Defaults to 1.

nucleotidesShorterThanProtospacerRange

A vector setting the range of values for the maximum number of nucleotides by which an alignment can be shorter than the spacer. Alignments shorter than the spacer length minus this value are filtered out. Values can range from 0 to the length of the spacer. Defaults to 1.

queryStartRange

A vector setting the range of values for the maximum nucleotide position in the spacer at which the alignment can start. Alignments that start 3' of the indicated position are filtered out. Values can range from 1 to the length of the spacer. Defaults to 2.
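
The four numeric filters (numGapsRange, e.valueRange, nucleotidesShorterThanProtospacerRange and queryStartRange) can be sketched together as a single row filter for one filter combination. The function applyThresholds and the column names gaps, evalue, alignLength, spacerLength and queryStart are hypothetical; the default values mirror the Usage section.

```r
# Hypothetical helper; column names are assumptions, not Spacer2PAM's.
applyThresholds <- function(df, numGaps = 0, e.value = 1,
                            nucleotidesShorterThanProtospacer = 1,
                            queryStart = 2) {
  keep <- df$gaps <= numGaps &       # no more than numGaps alignment gaps
    df$evalue <= e.value &           # e-value at or below the cut-off
    # alignment no more than N nucleotides shorter than the spacer
    df$alignLength >= df$spacerLength - nucleotidesShorterThanProtospacer &
    df$queryStart <= queryStart      # alignment starts early enough in the spacer
  df[!is.na(keep) & keep, , drop = FALSE]
}
```

Relaxing a threshold can only add rows, so, for example, raising nucleotidesShorterThanProtospacer from 1 to 2 keeps every alignment that passed before plus any that are exactly two nucleotides short.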

prophageOnlyRange

A vector setting the range of values that determine whether prophage content is used as a filter criterion. Given a TRUE input, only alignments located within predicted prophage regions pass through the filter. Values can be TRUE or FALSE. Defaults to FALSE.
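
A minimal sketch of a "located within a prophage region" test, assuming alignment coordinates and predicted prophage regions are on the same target sequence and that "within" means fully contained. The function inProphage and the columns regionStart and regionEnd are hypothetical; this page does not say how join2PAM obtains or matches prophage predictions.

```r
# Hypothetical helper; regions is a data frame with regionStart/regionEnd.
inProphage <- function(alignStart, alignEnd, regions) {
  lo <- pmin(alignStart, alignEnd)  # minus-strand hits may report start > end
  hi <- pmax(alignStart, alignEnd)
  vapply(seq_along(lo), function(i) {
    # TRUE if the alignment lies entirely inside at least one region
    any(regions$regionStart <= lo[i] & hi[i] <= regions$regionEnd)
  }, logical(1))
}
```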

flankLength

A number indicating the length, in nucleotides, of the DNA sequence flanking the protospacer that is searched for a PAM. This value influences the PAM score calculation. Defaults to 10.
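
The sketch below shows one way to cut upstream and downstream flanks of flankLength nucleotides around a protospacer, assuming 1-based inclusive coordinates on the plus strand. The function getFlanks is hypothetical; minus-strand hits would additionally need reverse complementing, which this sketch does not do.

```r
# Hypothetical helper; plus-strand, 1-based inclusive coordinates assumed.
getFlanks <- function(target, start, end, flankLength = 10) {
  # Clamp to the ends of the target so short contigs do not fail
  upstream <- substr(target, max(1, start - flankLength), start - 1)
  downstream <- substr(target, end + 1, min(nchar(target), end + flankLength))
  list(upstream = upstream, downstream = downstream)
}
```

For a target "AAAAACCCCCGGGGGTTTTT" with the protospacer at positions 6 to 15, getFlanks(target, 6, 15, flankLength = 3) gives "AAA" upstream and "TTT" downstream.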

RangeStart

A number indicating the filter combination in the filter criteria space at which the function should start. Defaults to 1.
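
One way to picture the filter criteria space is as the Cartesian product of the Range vectors, with RangeStart selecting the row to begin from. This is an assumption consistent with the Examples (3 e-values and 2 prophage settings give 6 predictions); the row order join2PAM actually uses may differ, and the loop only prints each combination rather than running a prediction.

```r
# Assumed representation of the filter criteria space, not Spacer2PAM code.
filterSpace <- expand.grid(
  uniqueAligns = TRUE,
  excludeSelf = TRUE,
  numGaps = 0,
  e.value = c(0.01, 0.05, 0.1),
  nucleotidesShorterThanProtospacer = 1,
  queryStart = 2,
  prophageOnly = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)
nrow(filterSpace)  # 3 e-values x 2 prophage settings = 6 combinations

RangeStart <- 4
rows <- seq_len(nrow(filterSpace))
# Skip the combinations before RangeStart, e.g. after an interrupted run
for (i in rows[rows >= RangeStart]) {
  criteria <- filterSpace[i, ]
  message("Combination ", i, ": e.value = ", criteria$e.value,
          ", prophageOnly = ", criteria$prophageOnly)
}
```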

saveLogo

A value that determines whether the sequence logos generated are saved. Given a TRUE input, a PDF of the sequence logo is saved to the working directory. Defaults to TRUE.

savePAMSeqs

A value that determines whether the vector of sequences used to build the sequence logo are saved. Given a TRUE input, the vectors of upstream and downstream sequences are assigned to the global environment as upstreamPAMSeqs and downstreamPAMSeqs, respectively. Defaults to FALSE.

removeFASTA

A value that determines whether the temporary FASTA file from eFetch is deleted from the working directory. Given a TRUE input, the FASTA file is deleted. Defaults to TRUE.

collectionFrameExist

A value that determines whether a new collectionFrame object is generated or an existing one is used. Given a TRUE input, data are added to the existing collectionFrame and may overwrite existing data. This is useful when a run over a set of filter criteria is interrupted and the user wants to continue from the point of interruption. Defaults to FALSE.

Examples

If you wanted to search the prediction space over the e-value cut-offs 0.01, 0.05, and 0.1, keeping the rest of the values at their defaults and resulting in 3 total predictions, you would enter:
join2PAM(joinedData = nameOfJoinedDataframe, e.valueRange = c(0.01, 0.05, 0.1))

If you wanted to search the prediction space with and without the prophage filter, keeping the rest of the values at their defaults and resulting in 2 total predictions, you would enter:
join2PAM(joinedData = nameOfJoinedDataframe, prophageOnlyRange = c(TRUE, FALSE))

If you wanted to combine the two searches above, resulting in 6 (3 x 2) total predictions, you would enter:
join2PAM(joinedData = nameOfJoinedDataframe, e.valueRange = c(0.01, 0.05, 0.1), prophageOnlyRange = c(TRUE, FALSE))

grybnicky/Spacer2PAM documentation built on Jan. 30, 2023, 2:55 a.m.