get.original.loci: '(read.CFML+)' Get original sequence positions of polymorphic...

View source: R/readCFML.R

get.original.loci    R Documentation

(read.CFML+) Get original sequence positions of polymorphic loci.

Description

If you ran read.CFML on ClonalFrameML output before running treeWAS, this function identifies the original sequence positions of your polymorphic loci. For example, if treeWAS identified loci "1417.a" and "2017.g" as significant, get.original.loci can identify the corresponding sequence positions "1165743" and "1741392" and return the flanking sequence segments.

Usage

get.original.loci(
  seqs,
  dat,
  sig.snps.names,
  n.bp = 50,
  suff.length = 2,
  csv = TRUE,
  csv.prefix = NULL,
  NA.thresh = 0.2
)

Arguments

seqs

A DNAbin object containing the original sequences input into ClonalFrameML (see details).

dat

An object containing the output of the read.CFML function.

sig.snps.names

A character vector containing the names of polymorphic loci whose original sequence positions you desire (see details).

n.bp

An integer specifying the desired length of the flanking sequence to be returned; by default, 50 (see details).

suff.length

An integer specifying the length of the allele suffix in the column names of snps; by default, 2 (see details).

csv

A logical indicating whether to save the results as a CSV file.

csv.prefix

An optional character string specifying a directory and filename prefix for the CSV file (used if csv = TRUE). The default filename suffix is "sig_loci.csv". Be careful: any existing file of that name will be overwritten!

NA.thresh

A number between 0 and 1 indicating the maximum allowable proportion of NAs that the output sequence fragments can contain; by default, 0.2. If a sequence fragment from row 1 exceeds this threshold, a sufficiently complete sequence fragment will be sought in subsequent rows.
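
The row-by-row fallback described for NA.thresh can be sketched in base R as follows. This is only an illustration, not the package's internal code: the helper first.complete.row and the toy matrix seqs.toy are hypothetical, missing bases are assumed to be coded as NA in a plain character matrix (a real DNAbin object may code them differently), and a fragment exactly at the threshold is assumed to be acceptable.

```r
## Toy alignment: 3 sequences (rows) x 10 sites (columns).
## Assumption: missing bases are coded as NA.
seqs.toy <- rbind(
  s1 = c("a", NA,  NA,  NA,  "g", "t", "a", "c", "g", "t"),
  s2 = c("a", "c", NA,  "t", "g", "t", "a", "c", "g", "t"),
  s3 = c("a", "c", "g", "t", "g", "t", "a", "c", "g", "t")
)

## Hypothetical helper: return the index of the first row whose
## fragment (columns `cols`) has a proportion of NAs <= NA.thresh.
first.complete.row <- function(x, cols, NA.thresh = 0.2) {
  for (i in seq_len(nrow(x))) {
    prop.na <- mean(is.na(x[i, cols]))
    if (prop.na <= NA.thresh) return(i)
  }
  NA_integer_  # no row is complete enough
}

## Row 1 has 3/10 NAs (too many), row 2 has 1/10, so row 2 is used.
row.used <- first.complete.row(seqs.toy, cols = 1:10, NA.thresh = 0.2)
```

Lowering NA.thresh makes the search stricter, so a later row (or none at all) may be selected.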

Details

seqs must contain ClonalFrameML input*, which can be read in from fasta with read.dna("FILENAME.fasta", format="fasta") (*not the ClonalFrameML output file "ML_sequence.fasta" or the seqs element of read.CFML output).

sig.snps.names can contain any set of colnames(snps), for example, the set of significant loci identified by treeWAS (out$treeWAS.combined$treeWAS.combined).
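
Before calling get.original.loci, it may help to confirm that every requested name really is a column name of snps. A minimal base R check, using a toy snps matrix invented for illustration (the names mirror the "1417.a" style used above):

```r
## Toy snps matrix; only the column names matter for this check.
snps <- matrix(0, nrow = 3, ncol = 4,
               dimnames = list(paste0("s", 1:3),
                               c("1417.a", "1417.g", "2017.g", "2017.t")))

sig.snps.names <- c("1417.a", "2017.g")

## Any requested names that are not columns of snps:
missing.names <- setdiff(sig.snps.names, colnames(snps))
if (length(missing.names) > 0) {
  stop("Not found in colnames(snps): ",
       paste(missing.names, collapse = ", "))
}
```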

n.bp specifies the total length of flanking sequence (drawn from the first row of seqs only), half of which will be on either side of each locus in sig.snps.names. Each such sequence will be of total length n.bp+1, arranged (e.g., with n.bp = 50) as:
<—25bp—><locus.i><—25bp—>.
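
The window arithmetic above can be sketched in base R. The helper flank.window and the toy sequence seq.toy are hypothetical and are not part of treeWAS; clipping windows that run past either end of the sequence is an assumption made here for illustration.

```r
## Hypothetical helper: indices of the fragment centred on `pos`,
## with n.bp/2 bases on either side.
## Assumption: windows are clipped at the ends of the sequence.
flank.window <- function(pos, n.bp = 50, seq.length) {
  half  <- n.bp %/% 2
  start <- max(1, pos - half)
  end   <- min(seq.length, pos + half)
  start:end
}

## Toy sequence standing in for the first row of seqs.
set.seed(1)
seq.toy <- sample(c("a", "c", "g", "t"), 200, replace = TRUE)

idx      <- flank.window(pos = 100, n.bp = 50, seq.length = length(seq.toy))
fragment <- paste(seq.toy[idx], collapse = "")
nchar(fragment)  # 51: 25 bp + the locus + 25 bp
```

Under this clipping assumption, a locus near the start of the sequence (for example pos = 10) yields a shorter fragment.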

suff.length tells the removeLastN function how many characters are used to specify the allele in sig.snps.names and colnames(snps). For names of the form: "1234.a", suff.length = 2 (note that the decimal counts as a character). If snps names are purely numeric with no alleles indicated (i.e., they already match names in seqs), then set suff.length = 0.
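
The effect of suff.length can be illustrated with a base R stand-in for removeLastN. The function strip.suffix below is hypothetical and written only for this example; it is not the package's implementation.

```r
## Hypothetical stand-in for removeLastN: drop the last
## `suff.length` characters from each name.
strip.suffix <- function(x, suff.length = 2) {
  substr(x, 1, nchar(x) - suff.length)
}

## Names with an allele suffix (".a", ".g"): two characters are removed.
with.alleles <- strip.suffix(c("1417.a", "2017.g"), suff.length = 2)

## Purely numeric names: nothing is removed.
numeric.only <- strip.suffix(c("1417", "2017"), suff.length = 0)
```

Both calls return "1417" and "2017". In the example from the description, these are the locus names that get.original.loci maps back to the original sequence positions.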

Value

get.original.loci returns a list containing:

  1. loci: The original sequence positions for all polymorphic loci in seqs.

  2. loci.sig: The original sequence positions for all polymorphic loci in sig.snps.names.

  3. seq.sig: A list with one element per locus in sig.snps.names, each containing the sequence fragment for that locus (the locus plus n.bp flanking bases; see details).

Author(s)

Caitlin Collins caitiecollins@gmail.com

Examples

## Example ##
## Not run: 
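library(ape)      # provides read.dna
library(treeWAS)  # provides get.original.loci

fasta <- "./filename.fas"
prefix <- "./filename.fas.out"

## read in original fasta sequence:
seqs <- read.dna(fasta, format="fasta")

## load saved read.CFML output
dat <- get(load(sprintf('%s.read.CFML_dat.Rdata', prefix)))

## get sig snps from treeWAS results
## (out is the object returned by an earlier treeWAS run)
sig.snps.names <- out$treeWAS.combined$treeWAS.combined

## store the result under a new name so the treeWAS output in out is not overwritten
loci.out <- get.original.loci(seqs, dat, sig.snps.names, n.bp=40, csv=TRUE, csv.prefix="./filename")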

## End(Not run)


caitiecollins/treeWAS documentation built on March 9, 2024, 3:15 p.m.