estimate_ethnicity: Format VCF files and compute the genomic components (and some...

View source: R/estimate_ethnicity.R

estimate_ethnicity    R Documentation

Format VCF files and compute the genomic components (and some figures) for ethnicity.

Description

Format VCF files and compute the genomic components (and some figures) for ethnicity.

Usage

estimate_ethnicity(
  cohort_name,
  input_vcfs,
  input_type,
  output_directory,
  ref1kg_vcfs,
  ref1kg_population,
  ref1kg_maf = 0.05,
  splitted_by_chr = TRUE,
  quality_tag = NULL,
  quality_threshold = 0.9,
  recode = "all",
  vcf_half_call = "missing",
  n_comp = 10,
  n_cores = 6,
  bin_path = list(
    vcftools = "/usr/bin/vcftools",
    bcftools = "/usr/bin/bcftools",
    bgzip = "/usr/bin/bgzip",
    tabix = "/usr/bin/tabix",
    plink = "/usr/bin/plink1.9"
  )
)
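
As an illustration of the signature above, the following is a minimal sketch of a call. All paths, the cohort name and the chosen values are hypothetical placeholders, and the package name rain is taken from the repository name (mcanouil/rain). The call is guarded so that nothing runs unless the package and the input files are actually present.

```r
# Hypothetical arguments; replace the paths with the locations of your own files.
args <- list(
  cohort_name = "MY_COHORT",
  input_vcfs = "/path/to/cohort/vcfs",
  input_type = "array",
  output_directory = file.path(tempdir(), "ethnicity"),
  ref1kg_vcfs = "/path/to/1kg/vcfs",
  ref1kg_population = "/path/to/1kg/samples_description.tsv",
  quality_tag = "R2",
  quality_threshold = 0.9,
  n_comp = 10,
  n_cores = 2
)

# Only run when the package and the (placeholder) input files are available.
can_run <- requireNamespace("rain", quietly = TRUE) &&
  all(file.exists(args$input_vcfs, args$ref1kg_vcfs, args$ref1kg_population))

if (can_run) {
  res <- do.call(rain::estimate_ethnicity, args)
  str(res)  # documented return value is a data.frame
}
```

Arguments left out of the list (for example ref1kg_maf or bin_path) fall back to the defaults shown in the usage.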

Arguments

cohort_name

A character. A name describing the studied population that is compared to 1,000 Genomes.

input_vcfs

A character. A path to one or several VCF files.

input_type

A character. Either "array" or "sequencing".

output_directory

A character. The path where the data and figures are written.

ref1kg_vcfs

A character. A path to the reference VCF files (i.e., 1,000 Genomes sequencing data).

ref1kg_population

A character. A file describing the samples and their ethnicity.

ref1kg_maf

A numeric. The minor allele frequency (MAF) threshold for SNPs in 1,000 Genomes. Default is 0.05.
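
To make the threshold concrete, here is a small sketch of how a minor allele frequency can be computed from allele counts and compared to ref1kg_maf. The genotype matrix is toy data, and keeping SNPs at or above the threshold is an assumption about the filter direction, not something stated on this page.

```r
# Toy genotype matrix: rows are 20 samples, columns are SNPs,
# values are alternate-allele counts (0, 1 or 2).
genotypes <- cbind(
  snp1 = c(1, rep(0, 19)),           # rare alternate allele
  snp2 = rep(c(0, 1, 1, 2), 5),      # common variant
  snp3 = c(rep(1, 4), rep(2, 16))    # alternate allele is the major allele
)

# Alternate-allele frequency, then fold it to get the minor allele frequency.
alt_freq <- colMeans(genotypes, na.rm = TRUE) / 2
maf <- pmin(alt_freq, 1 - alt_freq)

ref1kg_maf <- 0.05
# Assumption: SNPs with a MAF at or above the threshold are kept.
kept <- names(maf)[maf >= ref1kg_maf]
kept
```

Here snp1 has a MAF of 0.025 and is dropped, while snp2 and snp3 pass the 0.05 threshold.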

splitted_by_chr

A logical. Are the VCF files split by chromosome?

quality_tag

A character. Name of the imputation quality tag for "array", e.g., "INFO" or "R2". Default is NULL.

quality_threshold

A numeric. The threshold to keep/discard SNPs based on their imputation quality.
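
The sketch below shows what such a filter amounts to on a toy table of per-SNP quality values. The data frame and SNP identifiers are invented for illustration, and treating values at or above the threshold as "kept" is an assumption.

```r
# Toy per-SNP imputation quality values, as could be extracted from a VCF INFO field.
snps <- data.frame(
  id = c("rs1", "rs2", "rs3", "rs4"),
  R2 = c(0.99, 0.42, 0.91, 0.75),
  stringsAsFactors = FALSE
)

quality_tag <- "R2"
quality_threshold <- 0.9

# Assumption: SNPs at or above the threshold are kept, the others are discarded.
snps_kept <- snps[snps[[quality_tag]] >= quality_threshold, ]
snps_kept$id
```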

recode

A character. Which VCFs should be filtered and recoded, either "all" or "input".

vcf_half_call

A character. The mode used to handle half-calls.
+ 'haploid'/'h': Treat half-calls as haploid/homozygous (the PLINK 1 file format does not distinguish between the two). This maximizes similarity between the VCF and BCF2 parsers.
+ 'missing'/'m': Treat half-calls as missing (default).
+ 'reference'/'r': Treat the missing part as reference.
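
A half-call is a genotype where only one of the two alleles is known, such as "0/." or "./1". The following sketch mimics the three modes on a single genotype string. The helper resolve_half_call is hypothetical and only illustrates the semantics described above; in practice the handling is presumably delegated to PLINK, which exposes the same modes through its --vcf-half-call option.

```r
# Hypothetical helper: resolve a half-call genotype ("0/.", "./1") under each mode.
resolve_half_call <- function(gt, mode = c("missing", "haploid", "reference")) {
  # match.arg() allows partial matching, so "h", "m" and "r" also work.
  mode <- match.arg(mode)
  alleles <- strsplit(gt, "[/|]")[[1]]
  is_missing <- alleles == "."
  # Only diploid genotypes with exactly one missing allele are half-calls.
  if (length(alleles) != 2L || sum(is_missing) != 1L) {
    return(gt)
  }
  if (mode == "missing") {
    return("./.")
  }
  # haploid: duplicate the known allele; reference: fill in the reference allele "0".
  alleles[is_missing] <- if (mode == "haploid") alleles[!is_missing] else "0"
  paste(alleles, collapse = "/")
}

resolve_half_call("./1")               # default mode, treated as missing
resolve_half_call("./1", "haploid")    # known allele duplicated
resolve_half_call("./1", "reference")  # missing part set to reference
```

Complete genotypes such as "0/1" and fully missing ones such as "./." are returned unchanged by this sketch.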

n_comp

A numeric. The number of principal components to be computed.
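
As a rough illustration of what n_comp controls, the sketch below runs a principal component analysis on simulated genotypes with base R's prcomp() and keeps the first n_comp components. The simulated data and the use of prcomp() are assumptions made for the example; the function itself may rely on a different PCA implementation.

```r
set.seed(42)
n_samples <- 20
n_snps <- 50
n_comp <- 10

# Simulated alternate-allele counts (0, 1 or 2); rows are samples, columns are SNPs.
genotypes <- matrix(
  rbinom(n_samples * n_snps, size = 2, prob = 0.3),
  nrow = n_samples,
  ncol = n_snps
)

# Centre each SNP and keep only the first n_comp principal components.
pca <- prcomp(genotypes, center = TRUE, scale. = FALSE, rank. = n_comp)
pcs <- pca$x
dim(pcs)  # one row per sample, one column per retained component
```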

n_cores

An integer. The number of CPUs to use to estimate the ethnicity.

bin_path

A list(character). A named list giving the paths to the vcftools, bcftools, bgzip, tabix and plink binaries.
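
Because the defaults point to /usr/bin, it can help to check that each binary exists before launching a long run. The following is a minimal sketch of such a check using base R only; it is not part of the package.

```r
# Default paths, copied from the usage above.
bin_path <- list(
  vcftools = "/usr/bin/vcftools",
  bcftools = "/usr/bin/bcftools",
  bgzip = "/usr/bin/bgzip",
  tabix = "/usr/bin/tabix",
  plink = "/usr/bin/plink1.9"
)

# Check each path and report the tools that cannot be found.
found <- vapply(bin_path, file.exists, logical(1))
missing_tools <- names(found)[!found]

if (length(missing_tools) > 0) {
  message("Binaries not found: ", paste(missing_tools, collapse = ", "))
}
```

When a tool is installed elsewhere, Sys.which("bcftools") (and likewise for the other tools) returns its location on the PATH, which can then be supplied in bin_path.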

Value

A data.frame.


mcanouil/rain documentation built on Nov. 28, 2022, 10:40 a.m.