Perform pre-analysis data processing of PLINK-formatted unphased haplotype data,
including removal of SNPs and samples with high proportions of missing data, SNPs with low minor
allele frequencies and SNPs in high linkage disequilibrium (LD, based on R^2 if model=2).
Also, calculate population allele frequencies as well as haplotype frequencies between pairs of
SNPs (based on R^2 if model=2).
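The filtering steps described above can be pictured with a small base R sketch. It is only an illustration under stated assumptions, not the package's implementation: the toy matrix `geno`, its 0/1/NA coding, the order in which the filters are applied, and the use of inclusive thresholds are all assumptions made for this example. The thresholds are also set far looser than the function defaults so that the toy data is not filtered away entirely.

```r
# Hypothetical toy data: rows are SNPs, columns are samples.
# Assumed coding for this sketch: 0 or 1 for the allele carried, NA for missing.
geno <- matrix(
  c(0,  1,  0, 1, NA,
    0,  0,  0, 0,  0,
    1, NA, NA, 1,  0,
    1,  1,  0, 0, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("snp", 1:4), paste0("s", 1:5))
)

# Illustrative thresholds (deliberately looser than the defaults of 0.1).
snp.max.missing    <- 0.3
sample.max.missing <- 0.5
maf                <- 0.01

# 1. Drop SNPs with too much missing data.
snp_missing <- rowMeans(is.na(geno))
geno <- geno[snp_missing <= snp.max.missing, , drop = FALSE]

# 2. Drop samples with too much missing data among the remaining SNPs.
sample_missing <- colMeans(is.na(geno))
geno <- geno[, sample_missing <= sample.max.missing, drop = FALSE]

# 3. Drop SNPs whose minor allele frequency falls below the threshold.
freq  <- rowMeans(geno, na.rm = TRUE)   # frequency of the allele coded 1
minor <- pmin(freq, 1 - freq)           # minor allele frequency
geno  <- geno[minor >= maf, , drop = FALSE]

dim(geno)
```

In this toy case one SNP is removed for missingness, one sample is removed for missingness, and one monomorphic SNP is removed by the allele frequency filter. The LD-based filter is not shown here.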
getGenotypes(ped.map, reference.ped.map = NULL, snp.ld = NULL, model = 1,
  maf = 0.01, sample.max.missing = 0.1, snp.max.missing = 0.1,
  maximum.ld.r2 = 0.99, chromosomes = NULL, input.map.distance = "M",
  reference.map.distance = "M")
ped.map |
a list with 2 objects:
|
reference.ped.map |
a list containing reference data used to calculate population allele
frequencies and haplotype frequencies, in the same format as ped.map. |
snp.ld |
optional for
where each row contains the LD information for a single pair of SNPs. The data frame should contain the header
|
model |
an integer of either 1 or 2 denoting which of the two models should be run.
|
maf |
the smallest minor allele frequency allowed in the analysis. The default value is 0.01. |
sample.max.missing |
the maximum proportion of missing data allowed for each sample. The default value is 0.1. |
snp.max.missing |
the maximum proportion of missing data allowed for each SNP. The default value is 0.1. |
maximum.ld.r2 |
the maximum linkage disequilibrium R^2 value allowed between pairs of SNPs. The default value is 0.99. |
chromosomes |
a numeric vector containing a subset of chromosomes to perform genotype filtering on. The
default is NULL. |
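input.map.distance |
either "M" or "cM" denoting whether the genetic map distances in
ped.map are in Morgans (M) or centimorgans (cM). The default is "M". |
reference.map.distance |
either "M" or "cM" denoting whether the genetic map distances in
reference.ped.map are in Morgans (M) or centimorgans (cM). The default is "M". |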
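Because the returned marker table reports genetic map distance in Morgans, distances supplied in centimorgans have to be rescaled at some point (1 M = 100 cM). The helper below is a hypothetical sketch of that conversion; `to_morgans` is not a function from the package and is named here purely for illustration.

```r
# Hypothetical helper, not part of the package: express map distances in Morgans.
to_morgans <- function(pos, unit = c("M", "cM")) {
  unit <- match.arg(unit)                  # accept only "M" or "cM"
  if (unit == "cM") pos / 100 else pos     # 1 Morgan = 100 centimorgans
}

to_morgans(c(0, 50, 150), unit = "cM")
```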
A named list of three objects:
A pedigree containing the samples that remain after filtering. The pedigree is the first six columns
of the PED file and these columns are headed fid, iid, pid, mid, sex and aff, respectively.
A data frame with the first five columns:
Chromosome (type "character", "numeric" or "integer")
SNP identifiers (type "character")
Genetic map distance (Morgans, M) (type "numeric")
Base-pair position (type "numeric" or "integer")
Population allele frequency (type "numeric")
where each row describes a single marker. These columns are headed chr, snp_id, pos_M, pos_bp and freq, respectively.
If model=2 then the following columns are also included:
Numeric ID of condition SNP (type "numeric" or "integer")
Haplotype probability: pba (type "numeric")
Haplotype probability: pbA (type "numeric")
Haplotype probability: pBa (type "numeric")
Haplotype probability: pBA (type "numeric")
Population allele frequency on the condition SNP (type "numeric")
with the headers condition_snp, pba, pbA, pBa, pBA and freq_condition_snp.
The remaining columns contain the genotype data for each sample, where a single column corresponds to a single sample. These columns are
labeled with merged family IDs and individual IDs separated by a slash symbol (/).
The model selected.
The list is named pedigree, genotypes and model, respectively.
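For readers wondering how the four haplotype probabilities reported under model 2 relate to the R^2 measure of LD, the sketch below applies the standard textbook definition of R^2 for two biallelic SNPs. It is not taken from the package source: `ld_r2` is a hypothetical function, and it assumes that the four probabilities sum to one and that the upper-case and lower-case letters distinguish the two alleles at each SNP.

```r
# Hypothetical helper, not part of the package: R^2 from four haplotype probabilities.
ld_r2 <- function(pba, pbA, pBa, pBA) {
  # The four haplotype probabilities are assumed to sum to one.
  stopifnot(isTRUE(all.equal(pba + pbA + pBa + pBA, 1)))
  pB <- pBA + pBa            # marginal frequency of allele B
  pA <- pBA + pbA            # marginal frequency of allele A
  D  <- pBA - pA * pB        # linkage disequilibrium coefficient
  D^2 / (pA * (1 - pA) * pB * (1 - pB))
}

ld_r2(pba = 0.5,  pbA = 0,    pBa = 0,    pBA = 0.5)   # alleles always co-occur
ld_r2(pba = 0.25, pbA = 0.25, pBa = 0.25, pBA = 0.25)  # alleles independent
```

The first call gives an R^2 of 1 (complete LD) and the second gives 0 (no LD). The value is undefined when either SNP is monomorphic, since the denominator is then zero.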
# look at the simulated data
str(example_pedmap)
# format and filter the example data using model 2 and reference data
my_genotypes <- getGenotypes(ped.map = example_pedmap,
                             reference.ped.map = example_reference_pedmap,
                             snp.ld = example_reference_ld,
                             model = 2,
                             maf = 0.01,
                             sample.max.missing = 0.1,
                             snp.max.missing = 0.1,
                             maximum.ld.r2 = 0.99,
                             chromosomes = NULL,
                             input.map.distance = "M",
                             reference.map.distance = "M")