harmonize_genotypes: Genotype harmonization


Source: R/harmonize_genotypes.R

Description

Harmonization of genotype data stored in different file formats and on different, potentially unknown strands. Linkage disequilibrium (LD) patterns are used to determine the correct strand of A/T and C/G SNPs. This is a simple wrapper for GenotypeHarmonizer.

Usage

harmonize_genotypes(input, ref, output, input.type, ref.type, output.type,
  input.prob, force.chr, call.rate.filter, chr.filter, hwe.filter,
  maf.filter, sample.filter.list, variant.filter.list, mach.r2.filter,
  variant.pos.filter.list, ambiguous.snp.filter = FALSE,
  update.id = FALSE, min.ld, min.variants, variants, check.ld = FALSE,
  maf.align, update.reference.allele = FALSE, keep = FALSE,
  exec = "GenotypeHarmonizer")
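A hedged usage sketch. The base paths are hypothetical, the package name `imbs` is taken from the footer of this page, `harmonize_genotypes` is assumed to be exported, and `PLINK_BED` and `VCF` are assumed to be valid GenotypeHarmonizer data type names. The call only runs when the package and a `GenotypeHarmonizer` command on the `PATH` are both available.

```r
# Hypothetical base paths; adjust to your own data.
args <- list(
  input       = "data/study",             # study data to align
  input.type  = "PLINK_BED",              # assumed GenotypeHarmonizer type name
  ref         = "data/reference",         # reference panel base path
  ref.type    = "VCF",                    # assumed GenotypeHarmonizer type name
  output      = "data/study_harmonized",  # base path for the aligned output
  output.type = "PLINK_BED",
  update.id   = TRUE,                     # take variant IDs from the reference
  update.reference.allele = TRUE          # use the reference data's reference allele
)

# Only run when both the R package and the executable are available.
if (requireNamespace("imbs", quietly = TRUE) &&
    nzchar(Sys.which("GenotypeHarmonizer"))) {
  out <- do.call(imbs::harmonize_genotypes, args)
  writeLines(tail(out, 5))  # last lines of the captured tool output
}
```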

Arguments

input

[string]
The base path of the data to align. The extensions are determined based on the input data type.

ref

[string]
The base path of the reference data used for alignment. The extensions are determined based on the reference data type. If not specified, the input data are simply converted to the specified output type.

output

[string]
The base path of the output data.

input.type

[string]
The input data type. If not defined, the first matching dataset on the specified path is selected automatically.

ref.type

[string]
The reference data type. If not defined, the first matching dataset on the specified path is selected automatically.

output.type

[string]
The output data type. Defaults to input.type.

input.prob

[number]
The minimum posterior probability to call genotypes in the input data. Defaults to 0.4.
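To make the threshold concrete, here is a minimal sketch of one common calling rule: take the most probable genotype and set the call to missing when its posterior probability is below the threshold. This illustrates the general idea under that assumption and is not GenotypeHarmonizer's code; `call_genotypes` is a hypothetical helper.

```r
# Hypothetical helper: call genotypes (0, 1, 2 copies of the second allele)
# from posterior probabilities, one row per sample, columns P(AA), P(AB), P(BB).
call_genotypes <- function(probs, min_prob = 0.4) {
  best <- max.col(probs, ties.method = "first")          # most probable class per row
  best_prob <- probs[cbind(seq_len(nrow(probs)), best)]  # its probability
  calls <- best - 1L                                     # map class 1/2/3 to 0/1/2
  calls[best_prob < min_prob] <- NA_integer_             # too uncertain: missing
  calls
}

probs <- rbind(
  c(0.90, 0.08, 0.02),
  c(0.35, 0.33, 0.32),
  c(0.05, 0.15, 0.80)
)
call_genotypes(probs)  # returns c(0L, NA, 2L)
```

Lowering `min_prob` to 0.3 would turn the uncertain second sample into a call of 0.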

force.chr

[int or X, Y, MT]
SHAPEIT2 does not output the sequence name in the first column of the haplotype file, and GEN files may also lack it. Use this option to force the chromosome for all variants. This option is only valid in combination with input.type SHAPEIT2 or GEN.

call.rate.filter

[number]
The minimum call rate required to include a variant from the input data.

chr.filter

[int or X, Y, MT]
Filter input data on chromosome.

hwe.filter

[number]
The minimum Hardy-Weinberg equilibrium (HWE) p-value required to include a variant from the input data.

maf.filter

[number]
The minimum minor allele frequency (MAF) required to include a variant from the input data.
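For orientation, the sketch below computes the standard per-variant call rate and minor allele frequency from a sample-by-variant genotype matrix (0, 1, 2 allele copies, NA for missing) and applies thresholds in the spirit of call.rate.filter and maf.filter. The helper name and the use of `>=` at the threshold are assumptions; GenotypeHarmonizer's own computation may differ in detail, and the HWE test is not covered.

```r
# Hypothetical helper: per-variant QC on a sample-by-variant genotype matrix.
variant_qc <- function(geno, min_call_rate = 0.95, min_maf = 0.01) {
  call_rate <- colMeans(!is.na(geno))          # fraction of samples with a call
  freq <- colMeans(geno, na.rm = TRUE) / 2     # frequency of the counted allele
  maf <- pmin(freq, 1 - freq)                  # minor allele frequency
  data.frame(call_rate = call_rate, maf = maf,
             keep = call_rate >= min_call_rate & maf >= min_maf)
}

geno <- cbind(
  rs1 = c(0, 1, 2, 1),   # common variant, fully called
  rs2 = c(0, 0, 0, 0),   # monomorphic: MAF 0
  rs3 = c(1, NA, 2, 1)   # call rate 0.75
)
variant_qc(geno, min_call_rate = 0.9, min_maf = 0.05)
```

Only rs1 passes both thresholds in this example.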

sample.filter.list

[string]
Path to a file with the IDs of samples to include from the input data. For PLINK data and Oxford sample files, only the sample ID (column 2) is used.
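A minimal sketch of preparing such a list from a PLINK .fam file, assuming the list file holds one sample ID per line (the format is not spelled out above). The family and sample IDs are made up for illustration.

```r
# Example .fam content: FID, IID, father, mother, sex, phenotype.
fam_lines <- c("F1 S1 0 0 1 -9",
               "F1 S2 0 0 2 -9",
               "F2 S3 0 0 1 -9")
fam <- read.table(text = fam_lines, header = FALSE, colClasses = "character")

# Keep the individuals of family F1; only column 2 (the sample ID) is written.
keep_ids <- fam$V2[fam$V1 == "F1"]
sample_file <- tempfile(fileext = ".txt")
writeLines(keep_ids, sample_file)   # assumed format: one ID per line

readLines(sample_file)  # "S1" "S2"
```

The resulting path would then be passed as `sample.filter.list = sample_file`.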

variant.filter.list

[string]
Path to a file with the IDs of variants to include from the input data.

mach.r2.filter

[number]
The minimum MACH R2 measure to include SNPs.

variant.pos.filter.list

[string]
Path to a file with the positions (CHR\tPOS or CHR:POS) of variants to include from the input data.
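A minimal sketch that writes such a file in the CHR:POS form, assuming one entry per line. The positions are hypothetical examples; the commented `write.table` call shows the tab-separated CHR\tPOS alternative.

```r
# Hypothetical positions to keep.
positions <- data.frame(chr = c("1", "1", "X"),
                        pos = c(752566L, 776546L, 2699555L))

# CHR:POS form, assumed one entry per line.
pos_file <- tempfile(fileext = ".txt")
writeLines(paste(positions$chr, positions$pos, sep = ":"), pos_file)

# Tab-separated CHR<TAB>POS form instead:
# write.table(positions, pos_file, sep = "\t", quote = FALSE,
#             row.names = FALSE, col.names = FALSE)

readLines(pos_file)  # "1:752566" "1:776546" "X:2699555"
```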

ambiguous.snp.filter

[flag]
Filter out ambiguous (A/T and C/G) SNPs.
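A SNP is strand-ambiguous when its two alleles are complementary, so a strand flip yields the same allele pair. The hypothetical helper below flags such SNPs; it only illustrates the definition and is not the filter GenotypeHarmonizer applies internally.

```r
# Hypothetical helper: TRUE for A/T and C/G SNPs, whose alleles are
# complementary and therefore look identical on both strands.
is_ambiguous_snp <- function(a1, a2) {
  complement <- c(A = "T", T = "A", C = "G", G = "C")
  comp <- unname(complement[toupper(a1)])   # NA for non-ACGT alleles (e.g. indels)
  !is.na(comp) & comp == toupper(a2)
}

is_ambiguous_snp(c("A", "C", "A", "G", "AT"),
                 c("T", "G", "G", "A", "A"))
# TRUE TRUE FALSE FALSE FALSE
```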

update.id

[flag]
Update the variant identifiers using the reference data. The identifiers of the output data will be the same as those of the reference data.

min.ld

[number]
The minimum LD (r^2) between the variant to align and potential supporting variants. Defaults to 0.3.

min.variants

[int]
The minimum number of supporting variants required before an alignment can be done. Defaults to 3.

variants

[int]
Number of flanking variants to consider. Defaults to 100.
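Read together, min.ld, min.variants and variants describe a counting rule: among the flanking variants considered, those with r^2 of at least min.ld with the variant being aligned count as support, and at least min.variants of them are needed. The sketch below counts such supporting variants only. It assumes r^2 is the squared Pearson correlation of genotype dosages and that the window is the `variants` nearest columns by index; it is a simplified illustration, not GenotypeHarmonizer's strand-alignment algorithm, and `count_supporting` is hypothetical.

```r
# Hypothetical helper: count potential supporting variants for column j of a
# sample-by-variant dosage matrix whose columns are ordered by position.
count_supporting <- function(geno, j, min_ld = 0.3, variants = 100) {
  others <- setdiff(seq_len(ncol(geno)), j)
  # Flanking window: the `variants` columns closest to j by index.
  window <- others[order(abs(others - j))][seq_len(min(variants, length(others)))]
  r2 <- cor(geno[, j], geno[, window, drop = FALSE],
            use = "pairwise.complete.obs")^2
  sum(r2 >= min_ld, na.rm = TRUE)
}

base <- c(0, 1, 2, 0, 1, 2)
v    <- c(0, 2, 0, 2, 0, 2)       # uncorrelated with base (r = 0)
geno <- cbind(base, base, 2 - base, v)
count_supporting(geno, j = 1)     # 2
```

With the default min.variants of 3, two supporting variants would not be enough for an LD-based alignment in this toy example.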

check.ld

[flag]
Also check the LD structure of variants that are not A/T or C/G SNPs. Variants that do not pass the check are excluded.

maf.align

[number]
If there are not enough variants in LD, and the minor allele frequency (MAF) of a variant is less than or equal to the specified value in both the study and the reference data, the minor allele can be used as a backup for alignment. Defaults to 0.
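The condition for the minor-allele backup can be written out directly. The hypothetical helper below encodes the rule as stated above (too few LD-supporting variants, and MAF at or below maf.align in both data sets); it does not perform any alignment itself.

```r
# Hypothetical helper: may the minor allele be used as a backup for alignment?
can_use_maf_backup <- function(maf_study, maf_ref, n_supporting,
                               min_variants = 3, maf_align = 0) {
  n_supporting < min_variants &   # not enough variants in LD
    maf_study <= maf_align &      # MAF threshold met in the study data
    maf_ref <= maf_align          # ... and in the reference data
}

can_use_maf_backup(0.12, 0.10, n_supporting = 1, maf_align = 0.2)  # TRUE
can_use_maf_backup(0.12, 0.10, n_supporting = 1)                   # FALSE (default 0)
can_use_maf_backup(0.12, 0.10, n_supporting = 5, maf_align = 0.2)  # FALSE (enough LD support)
```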

update.reference.allele

[flag]
Make sure the output data uses the same reference allele as the reference data set.
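Conceptually, matching the reference allele means that, for variants whose two alleles are listed in the opposite order to the reference, the allele order is exchanged and allele counts are recoded as 2 minus the count. A minimal sketch of that recoding, assuming strand flips have already been resolved; `match_ref_allele` and its inputs are hypothetical.

```r
# Hypothetical helper: recode genotypes (0/1/2 copies of allele a2) so that
# each variant uses the same allele order as the reference.
match_ref_allele <- function(geno, a1, a2, ref_a1, ref_a2) {
  swapped <- a1 == ref_a2 & a2 == ref_a1                       # same alleles, opposite order
  geno[, swapped] <- 2 - geno[, swapped, drop = FALSE]         # count the other allele
  list(geno = geno,
       a1 = ifelse(swapped, a2, a1),
       a2 = ifelse(swapped, a1, a2))
}

geno <- cbind(rs1 = c(0, 1, 2), rs2 = c(2, 1, 0))
res <- match_ref_allele(geno,
                        a1 = c("A", "G"), a2 = c("G", "T"),
                        ref_a1 = c("A", "T"), ref_a2 = c("G", "G"))
res$geno[, "rs2"]  # 0 1 2 (rs2 was swapped)
res$a1             # "A" "T"
```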

keep

[flag]
Keep variants from the input data that are not present in the reference data.

exec

[string]
Path to the GenotypeHarmonizer executable. A Java call can also be given, e.g. java -Xmx5g -jar <path/to/GenotypeHarmonizer.jar>.
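One way to choose the value of exec, sketched under assumptions: the jar location is hypothetical (and should contain no spaces, since the string is passed as a single command), and a command named `GenotypeHarmonizer` is preferred when it is found on the `PATH`. The `-Xmx5g` option sets the Java maximum heap size to 5 GB.

```r
# Hypothetical location of the jar; adjust to your installation.
jar <- "/opt/GenotypeHarmonizer/GenotypeHarmonizer.jar"

exec <- if (nzchar(Sys.which("GenotypeHarmonizer"))) {
  "GenotypeHarmonizer"               # command found on the PATH
} else {
  paste("java -Xmx5g -jar", jar)     # explicit Java call with a 5 GB heap
}
exec
```

The result would then be passed as `exec = exec`.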

Details

See the Genotype Harmonizer documentation for details on the underlying tool: https://github.com/molgenis/systemsgenetics/wiki/Genotype-Harmonizer

Value

The captured system output as a character vector.


imbs-hl/imbs documentation built on Sept. 6, 2019, 11:05 p.m.