run_ancestry_format: Running functions to format data for ancestry prediction

View source: R/ancestry.R

run_ancestry_format    R Documentation

Running functions to format data for ancestry prediction

Description

This function runs convert_to_plink2 and rename_variant_identifiers to format the data for ancestry identification with superpop_classification.

Usage

run_ancestry_format(
  indir,
  name,
  qcdir = indir,
  verbose = FALSE,
  path2plink2 = NULL,
  keep_individuals = NULL,
  remove_individuals = NULL,
  exclude_markers = NULL,
  extract_markers = NULL,
  showPlinkOutput = TRUE,
  format = "@:#[hg38]",
  plink2format = FALSE,
  var_format = FALSE,
  path2load_mat
)

Arguments

indir

[character] /path/to/directory containing the basic PLINK 1.9 data files name.bim, name.fam, name.bed.

name

[character] Prefix of PLINK 1.9 files, i.e. name.bim, name.fam, name.bed
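
As a rough sketch of how indir and name combine into the three expected input files, the snippet below builds the paths and checks whether they exist. plink_input_files is a hypothetical helper written only for this illustration (it is not part of plinkQC), and the directory and prefix are placeholders.

```r
# Hypothetical helper: build the .bim/.fam/.bed paths from indir and name.
plink_input_files <- function(indir, name) {
  file.path(indir, paste0(name, c(".bim", ".fam", ".bed")))
}

# Placeholder directory and prefix; substitute your own values.
files <- plink_input_files("/path/to/directory", "data")

# TRUE/FALSE per file, in the order .bim, .fam, .bed.
file.exists(files)
```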

qcdir

[character] /path/to/directory where the PLINK 2 files returned by plink2 --make-pgen will be saved. The user needs write permission for qcdir. Default: qcdir = indir.

verbose

[logical] If TRUE, progress info is printed to standard out.

path2plink2

[character] Absolute path to the PLINK 2 executable (https://www.cog-genomics.org/plink/2.0/), i.e. PLINK 2 should be accessible as path2plink2 -h. The full name of the executable should be specified: for Windows, this means path/plink.exe; for Unix platforms, this is path/plink. If not provided, it is assumed that the PATH set-up works and PLINK will be found by exec('plink').

keep_individuals

[character] Path to file with individuals to be retained in the analysis. The file has to be a space/tab-delimited text file with family IDs in the first column and within-family IDs in the second column. All samples not listed in this file will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#indiv. Default: NULL, i.e. no filtering on individuals.

remove_individuals

[character] Path to file with individuals to be removed from the analysis. The file has to be a space/tab-delimited text file with family IDs in the first column and within-family IDs in the second column. All samples listed in this file will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#indiv. Default: NULL, i.e. no filtering on individuals.
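
The file layout expected by keep_individuals and remove_individuals can be sketched as follows. The sample IDs and the file name are made up for illustration; in practice the IDs would come from your own name.fam file.

```r
# Hypothetical sample IDs: family ID (FID) and within-family ID (IID).
samples <- data.frame(
  FID = c("FAM1", "FAM2"),
  IID = c("ID1", "ID2")
)

# Write FID in column 1 and IID in column 2, space-delimited,
# without header, row names or quotes.
keep_file <- file.path(tempdir(), "keep_individuals.txt")
write.table(samples, file = keep_file, sep = " ",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

The resulting path (here keep_file) is what would be passed as keep_individuals or remove_individuals.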

exclude_markers

[character] Path to file with markers to be removed from the analysis. The file has to be a text file with a list of variant IDs (usually one per line, but it's okay for them to just be separated by spaces). All listed variants will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#snp. Default: NULL, i.e. no filtering on markers.

extract_markers

[character] Path to file with markers to be included in the analysis. The file has to be a text file with a list of variant IDs (usually one per line, but it's okay for them to just be separated by spaces). All unlisted variants will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#snp. Default: NULL, i.e. no filtering on markers.
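
A marker list for exclude_markers or extract_markers is a plain text file of variant IDs. A minimal sketch, assuming made-up variant IDs and an arbitrary file name:

```r
# Hypothetical variant IDs; they must match the IDs in your name.bim file.
variants <- c("rs0001", "rs0002", "rs0003")

# Write one variant ID per line.
marker_file <- file.path(tempdir(), "extract_markers.txt")
writeLines(variants, con = marker_file)
```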

showPlinkOutput

[logical] If TRUE, plink log and error messages are printed to standard out.

format

[character] This gives the template to rewrite the variant identifier. A '@' represents the chromosome code, and a '#' represents the base-pair position.
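
To illustrate how the default template is read, the sketch below expands "@:#[hg38]" for two made-up chromosome/position pairs in plain R. expand_template is a hypothetical helper written only for this illustration; it is not part of plinkQC, and the actual renaming is carried out by PLINK 2. The sketch assumes that any text other than '@' and '#' (here "[hg38]") is copied into the identifier unchanged.

```r
# Hypothetical helper: substitute chromosome for '@' and position for '#'.
expand_template <- function(template, chr, pos) {
  mapply(function(ch, bp) {
    id <- sub("@", ch, template, fixed = TRUE)
    sub("#", bp, id, fixed = TRUE)
  }, as.character(chr), as.character(pos), USE.NAMES = FALSE)
}

# Made-up chromosome codes and base-pair positions.
ids <- expand_template("@:#[hg38]",
                       chr = c("1", "22"),
                       pos = c(10583L, 16050075L))
print(ids)
```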

plink2format

[logical] If TRUE, the data is already in plink2 format and convert_to_plink2 will not be run.

var_format

[logical] If TRUE, the variant identifiers are already in the correct format and rename_variant_identifiers will not be run.

path2load_mat

[character] /path/to/directory where loading matrices are kept. This can be downloaded from the github repo. Note that the name of the file before the .eigenvec.allele or .acount must be included in file path.
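
Because path2load_mat includes the file name up to (but not including) the suffix, a quick sanity check before calling the function can be sketched as below. The prefix is a placeholder, and the check assumes the companion files differ only in the .eigenvec.allele and .acount suffixes named above.

```r
# Placeholder prefix: directory plus the file name without its suffix.
path2load_mat <- "/path/to/load_mat/prefix"

# Build the expected file names from the prefix.
load_mat_files <- paste0(path2load_mat, c(".eigenvec.allele", ".acount"))

# Report which of the expected files are missing, if any.
found <- file.exists(load_mat_files)
if (!all(found)) {
  message("Missing loading matrix files: ",
          paste(load_mat_files[!found], collapse = ", "))
}
```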

Value

Name of file with correct format

Examples

indir <- system.file("extdata", package="plinkQC")
qcdir <- tempdir()
name <- "data"
path2plink2 <- '/path/to/plink'
## Not run: 
# the following code is not run on package build, as the path2plink2 on the
# user system is not known.
run_ancestry_format(indir=indir, qcdir=qcdir,
  name=name, path2plink2 = path2plink2)

## End(Not run)

plinkQC documentation built on Nov. 26, 2025, 1:07 a.m.