pedfac is an R-based front-end for a C-based computational program called pedigraph.
Currently, pedfac is available from GitHub. Installing the package also installs the pedigraph binary for Mac OS X.
A compiled version of pedigraph is not currently available for Windows.
To install pedfac, you can use the install_github() function from the devtools package:
install.packages("devtools") # if you don't already have the devtools package
devtools::install_github("ngthomas/pedfac")
runPedFac takes the user's space-separated genotype file, "geno.txt", and generates all the secondary files that the C-based pedigree sampler needs to run. Once the sampling iterations are complete, it returns a summary of the MCMC output along with all sampled pedigrees.
The genotype file is a space-separated file that contains genotype and metadata information
from the individuals whose pedigree is to be determined.
Each line in the file holds information about a single individual ordered as follows:
For example, for an observed male individual, "Tom", who
the entry would look like this:
10 1 1 1986.5 1 0 3 3
A more detailed breakdown of each column (field) in an individual's row is given here:
N N. (NOT CLEAR THOMAS, IS THIS REALLY FOR HAPLOTYPES? DO WE USE JUST ONE COLUMN FOR ALL MARKERS? NEED TO TALK THIS OVER).
0 represents a homozygote for the major allele, 2 represents a homozygote for the minor allele, and 1 represents a heterozygote.
There are still two space-separated columns for each locus. The first is a string of comma-separated genotype classes (i.e. 0,1,2) and the second is a comma-separated string with their respective genotype likelihoods. For example:
0,1,2 0.9,0.3,0.2
AA|AT,AC|AT,AA|AC,AT|AT 0.5,0.1,0.1,0.1
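As a rough illustration of the two-column layout, the sketch below splits one locus entry into a named vector of likelihoods. The helper `parse_geno_lik()` is hypothetical and not part of pedfac; it only shows how the class string and the likelihood string line up position by position.

```r
# Hypothetical helper (not part of pedfac): turn one locus entry, given as a
# comma-separated class string and a comma-separated likelihood string, into
# a numeric vector of likelihoods named by genotype class.
parse_geno_lik <- function(classes, liks) {
  cls <- strsplit(classes, ",", fixed = TRUE)[[1]]
  lik <- suppressWarnings(as.numeric(strsplit(liks, ",", fixed = TRUE)[[1]]))
  if (length(cls) != length(lik)) {
    stop("each genotype class needs exactly one likelihood")
  }
  if (anyNA(lik) || any(lik < 0)) {
    stop("likelihoods must be non-negative numbers")
  }
  stats::setNames(lik, cls)
}

# The two example entries from the text above.
snp <- parse_geno_lik("0,1,2", "0.9,0.3,0.2")
hap <- parse_geno_lik("AA|AT,AC|AT,AA|AC,AT|AT", "0.5,0.1,0.1,0.1")

# Rescaling to sum to 1 is an illustrative choice only; the text does not say
# whether pedfac normalizes these values internally.
snp_scaled <- snp / sum(snp)
```

The length check catches the most likely formatting mistake, a class list and a likelihood list of different lengths.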
The optional space-separated input file "marker_info.txt" can only be supplied for biallelic marker data. It holds metadata about the genotype markers.
The first line of the file begins with the tag "name" followed by a space-separated list of the names of the loci.
In subsequent lines, you can supply genotyping error and allele frequency information by starting each line with the appropriate tag, as listed below:
gerror: genotyping error rate at each locus. A value from 0 to 1. It describes the rate of genotyping error under a very simple model, the "alpha" model: with probability (1-a/n) the true genotype is the observed genotype, and with probability a/n the genotype is drawn randomly from the population genotype frequencies. If genotype posterior values are given, the gerror information is ignored. If gerror is not specified, the genotyping error rate defaults to 0.02.
afreq: allele frequency of the "1" allele (as opposed to the "0" allele). Must be a float value from 0 to 1. If it is not specified, the population frequency is estimated from all observed genotypes.
For example, the file could look like this:
name SNP_1 SNP_2 SNP_3 SNP_4
gerror 0.2 0.2 0.2 0.2
afreq 0.01 0.04 0.1 0.23
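The tagged layout can be produced and checked from R before running the sampler. The sketch below writes the example file to a temporary directory and reads it back into one row per locus. The reader `read_marker_info()` is a hypothetical helper, not a pedfac function, and its range check simply mirrors the 0 to 1 bounds stated for gerror and afreq.

```r
# Write a small "marker_info.txt" using the values from the example above.
marker_file <- file.path(tempdir(), "marker_info.txt")
writeLines(c(
  "name SNP_1 SNP_2 SNP_3 SNP_4",
  "gerror 0.2 0.2 0.2 0.2",
  "afreq 0.01 0.04 0.1 0.23"
), marker_file)

# Hypothetical reader (not part of pedfac): returns a data frame with one row
# per locus and one column per tag found in the file.
read_marker_info <- function(path) {
  fields <- strsplit(trimws(readLines(path)), "[[:space:]]+")
  fields <- fields[lengths(fields) > 0]            # drop blank lines
  tags <- vapply(fields, function(f) f[1], character(1))
  if (length(tags) == 0 || tags[1] != "name") {
    stop("the first line must start with the 'name' tag")
  }
  loci <- fields[[1]][-1]
  info <- data.frame(name = loci, stringsAsFactors = FALSE)
  for (i in seq_along(fields)[-1]) {
    values <- suppressWarnings(as.numeric(fields[[i]][-1]))
    if (length(values) != length(loci)) {
      stop("tag '", tags[i], "' needs one value per locus")
    }
    if (anyNA(values) || any(values < 0 | values > 1)) {
      stop("tag '", tags[i], "' needs values from 0 to 1")
    }
    info[[tags[i]]] <- values
  }
  info
}

info <- read_marker_info(marker_file)
```

Reading the file back this way makes it easy to confirm that every tag line carries exactly one value per locus named on the first line.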
-i/--inputPath: path to the directory that contains the input "geno.txt" and the optional "marker_info.txt". String. Required.
-o/--outputPath: path to the directory in which to store intermediate and final output files. String. Optional. If not specified, the input path is used.

Regarding sampling:
-r/--randomSeed: random seed to pass to the sampler. Positive integer; a randomly generated value by default.
-n/--nIter: number of sampling iterations. Positive integer; 1 by default.
-c/--cyclicChoice: how loops are handled. 0 (default), 1, or 2. 0: loops are not allowed; 1: throttle method; 2: decimation method.
-f/--observeFrac: assumed sampling fraction. Float value from 0 to 1; 0.8 by default. To avoid imposing any prior knowledge about the sampling fraction, use -1.
-u/--maxUnobs: maximum number of unobserved individuals allowed between any two individuals. Nonnegative integer; 1 by default.
-m/--maxGen: number of predecessor generations considered beyond the earliest observed generation. Nonnegative integer; 0 by default. A value of 0 means that individuals of the earliest observed generation are treated as founders.

Regarding species life history:
-s/--minAge: minimum age of sexual maturation or fecundity (in years). Positive float value; 1 by default.
-a/--maxAge: maximum age of sexual maturation or fecundity (in years). Positive float value; 1 by default.

Regarding genotype markers: (need to generate intermediate summary files of how markers are compressed/collapsed)
-hm/--haploMethod: method used to handle multiallelic markers. Integer from 0 to 2; 0 by default.
  0: take the most informative allele, i.e. the one whose frequency is closest to 0.5
  1: (not available) deconstruct the haplotype into a set of nucleotide units
  2: (not available) reduce the multiallelic basis into n classes of binomial switches
-g/--genoerr: assumed background genotyping error rate, in the form of epsilon. Float value from 0 to 1; 0.02 by default. If the 'gerror' row of marker_info.txt is provided, it overrides this parameter.
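The defaults and ranges listed above can be collected in one place and checked before a run. The sketch below is a hypothetical helper, `pedfac_options()`, that is not part of pedfac. It borrows the long option names as argument names for illustration only, and it makes no claim about how runPedFac itself accepts these settings.

```r
# Hypothetical helper (not part of pedfac): fill in the documented defaults
# and check the documented ranges. Argument names mirror the long option
# names above; this is an assumption, not the package's actual interface.
pedfac_options <- function(inputPath, outputPath = inputPath,
                           randomSeed = sample.int(.Machine$integer.max, 1),
                           nIter = 1, cyclicChoice = 0, observeFrac = 0.8,
                           maxUnobs = 1, maxGen = 0, minAge = 1, maxAge = 1,
                           haploMethod = 0, genoerr = 0.02) {
  stopifnot(
    is.character(inputPath), length(inputPath) == 1,
    randomSeed >= 1, nIter >= 1,
    cyclicChoice %in% 0:2,
    # -1 means "no prior knowledge about the sampling fraction"
    observeFrac == -1 || (observeFrac >= 0 && observeFrac <= 1),
    maxUnobs >= 0, maxGen >= 0,
    minAge > 0, maxAge > 0,
    haploMethod == 0,  # methods 1 and 2 are listed as not available
    genoerr >= 0, genoerr <= 1
  )
  list(inputPath = inputPath, outputPath = outputPath,
       randomSeed = randomSeed, nIter = nIter, cyclicChoice = cyclicChoice,
       observeFrac = observeFrac, maxUnobs = maxUnobs, maxGen = maxGen,
       minAge = minAge, maxAge = maxAge, haploMethod = haploMethod,
       genoerr = genoerr)
}

# Placeholder path; 100 iterations with no prior on the sampling fraction.
opts <- pedfac_options(inputPath = "path/to/input", nIter = 100,
                       observeFrac = -1)
```

Because outputPath defaults to inputPath, leaving it out reproduces the documented behavior of writing output next to the input files.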