process.input: Process input

View source: R/input_processing.R

process.input    R Documentation

Process input

Description

Processes summary statistics and related information (e.g. sample overlap, case/control ratios), and ensures alignment of SNPs across data sets.

Usage

process.input(
  input.info.file,
  sample.overlap.file,
  ref.prefix,
  phenos = NULL,
  input.dir = NULL
)

Arguments

input.info.file

Name of the info file containing columns for phenotype IDs ('phenotype'), number of cases ('cases'), number of controls ('controls'), and summary statistics file name ('filename'). For continuous phenotypes, set the number of controls to 0 and the number of cases to 1; these values are only used to compute the proportion of cases (cases / N), which should be 1 for continuous phenotypes.

sample.overlap.file

Name of the file with sample overlap information. Can be set to NULL if there is no overlap.

ref.prefix

Prefix of the reference genotype data in PLINK format (*.bim, *.bed, *.fam).

phenos

A vector of phenotype IDs, which can be provided if only a subset of phenotypes is desired (if NULL, all phenotypes in the input info file will be processed). This is convenient when analysing a subset of a larger set of phenotypes, as only a single input info file and sample overlap file need to be created.

input.dir

Directory containing the files specified in the info file.
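
As a rough sketch of how the arguments fit together, the base R snippet below writes a small info file with the four columns described above and then passes it to process.input. The phenotype IDs, file names, sample sizes, and reference prefix are hypothetical placeholders, and the tab-delimited layout with a header row is an assumption rather than something this page specifies. The call is guarded so that it only runs if the LAVA package and all placeholder files actually exist.

```r
# Hypothetical phenotypes: one binary (cases and controls) and one continuous
# (cases = 1, controls = 0, as described for input.info.file).
info <- data.frame(
  phenotype = c("pheno_binary", "pheno_continuous"),
  cases     = c(5000, 1),
  controls  = c(15000, 0),
  filename  = c("pheno_binary.sumstats.txt", "pheno_continuous.sumstats.txt"),
  stringsAsFactors = FALSE
)

# Assumption: a tab-delimited text file with a header row.
input.dir <- tempdir()
info.file <- file.path(input.dir, "input.info.txt")
write.table(info, info.file, sep = "\t", quote = FALSE, row.names = FALSE)

# Placeholder prefix for the PLINK reference files (ref.bim, ref.bed, ref.fam).
ref.prefix <- file.path(input.dir, "ref")

# Only attempt the call when LAVA, the reference data, and the summary
# statistics files named in the info file are all present.
files.present <- file.exists(paste0(ref.prefix, ".bim")) &&
  all(file.exists(file.path(input.dir, info$filename)))

if (requireNamespace("LAVA", quietly = TRUE) && files.present) {
  input <- LAVA::process.input(
    input.info.file     = info.file,
    sample.overlap.file = NULL,   # assuming no sample overlap
    ref.prefix          = ref.prefix,
    phenos              = c("pheno_binary", "pheno_continuous"),
    input.dir           = input.dir
  )
}
```

Passing phenos explicitly is optional here; leaving it as NULL would process every phenotype listed in the info file.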

Value

An object containing the processed input data and related info:

  • info - the processed input info file. Columns added during processing:

    • N = cases + controls

    • prop_cases = cases / N

    • binary = !is.na(prop_cases) & (prop_cases != 1)

  • P - number of phenotypes

  • sample.overlap - sample overlap matrix

  • sum.stats - processed summary statistics (SNP-aligned effect sizes, subsetted to common SNPs across data sets, effect sizes converted to Z, etc.)

  • ref.prefix - genotype reference data prefix

  • analysis.snps - subset of SNPs that are shared across all data sets and were not removed during alignment

  • unalignable.snps - SNPs removed during alignment (e.g. for being strand ambiguous)

  • ref - environment containing the genotype reference data bim file (ref$bim)
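
The columns added to info can be reproduced by hand to see how the binary flag separates the two phenotype types. The snippet below is a minimal illustration in base R that applies the three formulas listed above to a hypothetical two-row info table; it is not the package's internal code.

```r
# Hypothetical info table: one binary and one continuous phenotype.
info <- data.frame(
  phenotype = c("pheno_binary", "pheno_continuous"),
  cases     = c(5000, 1),
  controls  = c(15000, 0)
)

# Derived columns, following the definitions listed above.
info$N          <- info$cases + info$controls
info$prop_cases <- info$cases / info$N
info$binary     <- !is.na(info$prop_cases) & (info$prop_cases != 1)

print(info)
```

With these values, the continuous phenotype (cases = 1, controls = 0) gets prop_cases = 1 and is therefore flagged as non-binary, while the binary phenotype gets prop_cases = 0.25 and binary = TRUE.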


josefin-werme/LAVA documentation built on July 4, 2024, 8:11 p.m.