get.original.loci | R Documentation |
(read.CFML+)
Get original sequence positions of polymorphic loci.
If you ran read.CFML on ClonalFrameML output before running treeWAS, this function can be used to identify the original sequence positions of your polymorphic loci. E.g., if treeWAS identified loci "1417.a" and "2017.g" as significant, get.original.loci can identify the corresponding sequence positions "1165743" and "1741392" and return flanking sequence segments.
get.original.loci(
seqs,
dat,
sig.snps.names,
n.bp = 50,
suff.length = 2,
csv = TRUE,
csv.prefix = NULL,
NA.thresh = 0.2
)
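seqs |
An object containing the original DNA sequences (the ClonalFrameML input), e.g. as read in with read.dna (see details). |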
dat |
An object containing the output of the read.CFML function. |
sig.snps.names |
A character vector containing the names of polymorphic loci whose original sequence positions you desire (see details). |
n.bp |
An integer specifying the desired length of the flanking sequence to be returned; by default, 50 (see details). |
suff.length |
An integer specifying the suffix length of the names in sig.snps.names; by default, 2 (see details). |
csv |
A logical indicating whether to save the results as a CSV file. |
csv.prefix |
An optional character vector specifying a directory and filename prefix for the CSV file (if csv = TRUE). |
NA.thresh |
A number between 0 and 1 indicating the maximum allowable proportion of NAs that the output sequence fragments can contain; by default, 0.2. If a sequence fragment from row 1 exceeds this threshold, a sufficiently complete sequence fragment will be sought in subsequent rows. |
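The NA.thresh rule can be illustrated with a minimal sketch. The helper first_complete_fragment below is hypothetical and is not the treeWAS implementation; it assumes the sequences are held in a character matrix with missing bases coded as NA, and it returns the first row whose fragment stays within the threshold.

```r
# Hypothetical helper (not part of treeWAS): return the first row of a
# character matrix whose fragment between columns `from` and `to` has a
# proportion of NAs no greater than `NA.thresh`.
first_complete_fragment <- function(seq_mat, from, to, NA.thresh = 0.2) {
  for (i in seq_len(nrow(seq_mat))) {
    frag <- seq_mat[i, from:to]
    if (mean(is.na(frag)) <= NA.thresh) {
      return(list(row = i, fragment = frag))
    }
  }
  NULL  # no row met the threshold
}

# Toy alignment: row 1 is mostly missing, row 2 is complete.
seq_mat <- rbind(
  c("a", NA, NA, NA, "t"),
  c("a", "c", "g", "g", "t")
)
res <- first_complete_fragment(seq_mat, from = 1, to = 5, NA.thresh = 0.2)
```

Here row 1 has 3 of 5 bases missing, so the search moves on to row 2.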
seqs must contain the sequences given to ClonalFrameML as input, which can be read in from fasta with read.dna("FILENAME.fasta", format="fasta"). Do not use the ClonalFrameML output file "ML_sequence.fasta" or the seqs element of the read.CFML output.
sig.snps.names can contain any set of colnames(snps), for example, the set of significant loci identified by treeWAS (out$treeWAS.combined$treeWAS.combined).
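Because sig.snps.names must be drawn from colnames(snps), it may be worth checking this before calling the function. The following is a small sketch on a toy matrix; the object names and values are illustrative assumptions, not real treeWAS output.

```r
# Toy SNP matrix with treeWAS-style column names (locus index + allele suffix).
snps <- matrix(0L, nrow = 3, ncol = 4,
               dimnames = list(NULL, c("1417.a", "1417.c", "2017.g", "2017.t")))

# Names whose original positions we want to look up.
sig.snps.names <- c("1417.a", "2017.g")

# Any requested name that is not a column of snps is reported up front.
unknown <- setdiff(sig.snps.names, colnames(snps))
if (length(unknown) > 0) {
  stop("Not found in colnames(snps): ", paste(unknown, collapse = ", "))
}
```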
n.bp specifies the total length of flanking sequence (drawn from the first row of seqs only), half of which will be on either side of each locus in sig.snps.names. Each such sequence will be of total length n.bp+1, arranged (e.g., with n.bp = 50) as:
<--25bp--><locus.i><--25bp-->.
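The window arithmetic described above can be sketched as follows. The helper flank is hypothetical and is not the treeWAS implementation; in particular, clipping the window at the ends of the sequence is an assumption made here for illustration.

```r
# Hypothetical helper (not the treeWAS implementation): extract n.bp/2 bases
# on either side of position `pos`, clipped to the ends of the sequence.
flank <- function(seq_chars, pos, n.bp = 50) {
  half  <- n.bp %/% 2
  start <- max(1, pos - half)
  end   <- min(length(seq_chars), pos + half)
  paste(seq_chars[start:end], collapse = "")
}

# Toy sequence of 12 bases; take 2 bases on each side of position 6.
seq_chars <- strsplit("acgtacgtacgt", "")[[1]]
frag <- flank(seq_chars, pos = 6, n.bp = 4)
```

With n.bp = 4 the fragment has n.bp + 1 = 5 characters: two flanking bases, the locus itself, and two more flanking bases.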
suff.length tells the removeLastN function how many characters are used to specify the allele in sig.snps.names and colnames(snps). For names of the form "1234.a", suff.length = 2 (note that the decimal counts as a character). If snps names are purely numeric with no alleles indicated (i.e., they already match names in seqs), then set suff.length = 0.
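To illustrate what removing a suffix of length suff.length does to locus names, here is a minimal base-R sketch. The helper strip_suffix is hypothetical; it stands in for removeLastN only for the purpose of this example and makes no claim about how that function is written.

```r
# Hypothetical stand-in for suffix removal: drop the last `suff.length`
# characters from each name (e.g. the ".a" in "1417.a").
strip_suffix <- function(x, suff.length = 2) {
  if (suff.length == 0) return(x)  # names are already purely numeric
  substr(x, 1, nchar(x) - suff.length)
}

with_alleles <- strip_suffix(c("1417.a", "2017.g"), suff.length = 2)
numeric_only <- strip_suffix(c("1417", "2017"), suff.length = 0)
```

Both calls yield the same bare locus names, which is why suff.length = 0 is the right setting when no allele is encoded in the names.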
get.original.loci returns a list containing:
loci: The original sequence positions for all polymorphic loci in seqs.
loci.sig: The original sequence positions for all polymorphic loci in sig.snps.names.
seq.sig: A list with one element per locus in sig.snps.names, containing the sequence fragments (n.bp bases of flanking sequence plus the locus itself; see details).
Caitlin Collins caitiecollins@gmail.com
## Example ##
## Not run:
library(ape)  # provides read.dna
fasta <- "./filename.fas"
prefix <- "/filename.fas.out"
## read in original fasta sequence:
seqs <- read.dna(fasta, format = "fasta")
## load saved read.CFML output
dat <- get(load(sprintf("%s.read.CFML_dat.Rdata", prefix)))
## get sig snps from treeWAS results (here, out is the output of treeWAS)
sig.snps.names <- out$treeWAS.combined$treeWAS.combined
## store the result under a new name so the treeWAS output in out is not overwritten
loci <- get.original.loci(seqs, dat, sig.snps.names, n.bp = 40, csv = TRUE, csv.prefix = "/filename")
## End(Not run)