
# pedfac — factor-graph based Bayesian pedigree inference

03 October, 2018

## Requirements and Installation

This package requires Python 3.0 or later on either Linux or macOS. See conda.io/miniconda.html to install or update your local Python.

Once Python 3.0+ is installed, make sure the Python package numpy is installed as well. Do so with

conda install numpy

Once Python and its requisite packages are installed, you can install the pedfac software by cloning this GitHub repository, like so:

git clone https://github.com/ngthomas/pedfac.git

You can perform a test run as follows:

cd pedfac
python bin/run-pedfac -i example/case_0/ -n 5

Doing so will create an output file in:…

## About run-pedfac

This Python script expects the user to provide the path to a valid space-separated genotype file named “genotype.txt”. From that file it generates all the secondary files that the pedigree sampler (written in C) needs to run. Once the sampling iterations are complete, it returns a summary of the MCMC output along with details of the sampled pedigrees.

### The genotype file:

The genotype file is a space-separated file that contains the genotypes and metadata of the individuals whose pedigree is to be determined. Each line in the file holds the information for a single individual, ordered as follows:

For example, for an observed male individual who

the entry would look like this: 10 1 1 1986.5 1 0 3 3
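As a minimal sketch, a row such as the one above can be split into typed tokens before any field-specific handling. The function name `parse_genotype_row` is hypothetical and not part of pedfac, and the sketch deliberately assigns no meaning to individual columns:

```python
def parse_genotype_row(line):
    """Split one space-separated row into tokens.

    Hypothetical helper, not part of pedfac. Each token becomes an int
    if possible, otherwise a float, otherwise it is kept as a string.
    """
    tokens = []
    for field in line.split():
        try:
            tokens.append(int(field))
        except ValueError:
            try:
                tokens.append(float(field))
            except ValueError:
                tokens.append(field)
    return tokens


# The example entry from the text above
row = parse_genotype_row("10 1 1 1986.5 1 0 3 3")
print(len(row), row)
```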

A more detailed breakdown for each column/field in each individual row in the file is given here:

### Optional marker input file: “marker_info.txt”:

This space-separated input file, “marker_info.txt”, can only be given for biallelic marker data. It holds metadata about the genotype markers.

The first line of the file begins with the tag “name” followed by a space-separated list of the names of the loci.

In subsequent lines, one can choose to put genotyping error and allele frequency information by starting the lines with the appropriate tags, as listed below:

e.g., the file could look like this:

name SNP_1 SNP_2 SNP_3 SNP_4  
gerror 0.2 0.2 0.2 0.2  
afreq 0.01 0.04 0.1 0.23  
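A file laid out like this can be read with a few lines of Python. The sketch below is illustrative only: `parse_marker_info` is a hypothetical name, not pedfac's own reader, and it assumes that every tagged row carries exactly one value per locus.

```python
def parse_marker_info(text):
    """Parse marker_info.txt content into {tag: list of values}.

    Hypothetical helper, not part of pedfac. The "name" row is kept as
    strings; every other row is assumed to hold one float per locus.
    """
    info = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        tag, values = fields[0], fields[1:]
        if tag == "name":
            info[tag] = values
        else:
            info[tag] = [float(v) for v in values]
    if "name" not in info:
        raise ValueError("missing the 'name' row")
    n_loci = len(info["name"])
    for tag, values in info.items():
        if len(values) != n_loci:
            raise ValueError(
                f"row '{tag}' has {len(values)} values, expected {n_loci}"
            )
    return info


example = """name SNP_1 SNP_2 SNP_3 SNP_4
gerror 0.2 0.2 0.2 0.2
afreq 0.01 0.04 0.1 0.23
"""
marker_info = parse_marker_info(example)
print(marker_info["afreq"])
```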

### About the wrapper parameters:

-i/--inputPath: directory path that contains the input "geno.txt" and optional "marker_info.txt". String. Required.
-o/--outputPath: directory path to store intermediate and final output files. String. Optional. If not specified, the input path is used.

Regarding sampling:
-r/--randomSeed: random seed to pass on to the sampler. Positive integer; a randomly generated value as default
-n/--nIter: number of sampling iterations. Positive integer; 1 as default
-c/--cyclicChoice: choice of how loops are handled. [0 (default), 1, or 2].
    0 - not allowing loops;
    1 - throttle method;
    2 - decimation method

-f/--observeFrac: assumed sampling fraction. Float value from 0 to 1; 0.8 as default. If you don't want to impose any prior knowledge about the sampling fraction, use the value -1.

-u/--maxUnobs: maximum number of unobserved individuals allowed in between any two individuals. Nonnegative integer; 1 as default
-m/--maxGen: number of predecessor generation(s) considered beyond the earliest observed generation. Nonnegative integer; 0 as default. Setting it as 0 means that individuals of the earliest observed generation are treated as founders.
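When calling the wrapper from another script, the input/output and sampling options above can be assembled and range-checked before launching it. This is a minimal sketch under stated assumptions: `build_run_pedfac_args` is a hypothetical helper, not part of pedfac; it uses only the short flags and defaults documented above, and the checks mirror the documented value ranges rather than the wrapper's actual validation.

```python
def build_run_pedfac_args(input_path, output_path=None, random_seed=None,
                          n_iter=1, cyclic_choice=0, observe_frac=0.8,
                          max_unobs=1, max_gen=0):
    """Build an argument list for bin/run-pedfac (hypothetical helper)."""
    if n_iter < 1:
        raise ValueError("nIter must be a positive integer")
    if cyclic_choice not in (0, 1, 2):
        raise ValueError("cyclicChoice must be 0, 1, or 2")
    # -1 means "no prior on the sampling fraction"
    if observe_frac != -1 and not 0 <= observe_frac <= 1:
        raise ValueError("observeFrac must be in [0, 1], or -1")
    if max_unobs < 0 or max_gen < 0:
        raise ValueError("maxUnobs and maxGen must be nonnegative")

    args = ["python", "bin/run-pedfac", "-i", input_path]
    if output_path is not None:
        args += ["-o", output_path]
    if random_seed is not None:
        args += ["-r", str(random_seed)]
    args += ["-n", str(n_iter), "-c", str(cyclic_choice),
             "-f", str(observe_frac), "-u", str(max_unobs),
             "-m", str(max_gen)]
    return args


cmd = build_run_pedfac_args("example/case_0/", n_iter=5)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from inside the cloned pedfac directory.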

Regarding species life history:
-s/--minAge: minimum age of sexual maturation or fecundity (in years). Positive float value; 1 as default
-a/--maxAge: maximum age of sexual maturation or fecundity (in years). Positive float value; 1 as default

Regarding genotype markers: (need to generate intermediate summary files of how markers are compressed/collapsed)
-hm/--haploMethod: method used to handle multiallelic markers. Integer from 0 to 2; 0 as default.
    0 - taking the most informative allele, i.e. the one whose frequency is closest to 0.5
    1 - (not avail) deconstructing the haplotype into a set of nucleotide units
    2 - (not avail) reducing the multiallelic basis into n classes of binomial switches
-g/--genoerr: assumed background genotype error rate in the form of epsilon. Float value from 0 to 1; 0.02 as default. If the genotype error row 'gerror' of marker_info.txt is provided, it overrides this parameter.
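The selection rule behind haploMethod 0 (pick the allele whose frequency is closest to 0.5) can be illustrated in a few lines. This is a sketch of the rule as stated above, not pedfac's implementation; the function name, the dictionary input, and the example frequencies are all made up for illustration.

```python
def most_informative_allele(allele_freqs):
    """Return the allele whose frequency is closest to 0.5.

    Hypothetical helper, not part of pedfac. `allele_freqs` maps allele
    labels to frequencies; on ties the first allele in the mapping wins.
    """
    if not allele_freqs:
        raise ValueError("no alleles given")
    return min(allele_freqs, key=lambda a: abs(allele_freqs[a] - 0.5))


# Made-up frequencies for one multiallelic marker
freqs = {"A": 0.15, "C": 0.45, "G": 0.30, "T": 0.10}
chosen = most_informative_allele(freqs)
print(chosen)
```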

## Version of pedigraph

The latest executable file is version b369e18d555901f716437d626f5ff75cf5a9a719



ngthomas/pedfac documentation built on Dec. 10, 2020, 6:28 p.m.