format_data: format data

View source: R/format_data.R

format_data    R Documentation

format data

Description

Get the trait summary data ready for the QC checks.

Usage

format_data(
  dat = NULL,
  trait = NA,
  population = NA,
  ncase = NA,
  ncontrol = NA,
  rsid = NA,
  effect_allele = NA,
  other_allele = NA,
  beta = NA,
  se = NA,
  lnor = NA,
  lnor_se = NA,
  eaf = NA,
  p = NA,
  or = NA,
  or_lci = NA,
  or_uci = NA,
  chr = NA,
  pos = NA,
  z_score = NA,
  drop_duplicate_rsids = TRUE
)

Arguments

dat

the dataset to be formatted

trait

the name of the trait.

population

description of the population ancestry of the dataset

ncase

number of cases or name of the column specifying the number of cases

ncontrol

number of controls or name of the column specifying the number of controls. If your summary data was generated in a linear model of a continuous trait, use ncontrol to indicate the total sample size.

rsid

name of the column containing the rs number or identifiers for the genetic variants

effect_allele

name of the effect allele column

other_allele

name of the non-effect allele column

beta

name of the column containing the SNP effect sizes. Use this argument if your summary data was generated in a linear model of a continuous trait.

se

standard error for the beta. Use this argument if your summary data was generated in a linear model of a continuous trait.

lnor

name of the column containing the log odds ratio. If missing, the function tries to infer it from the odds ratio.

lnor_se

name of the column containing the standard error for the log odds ratio. If missing, the function tries to infer it from the 95% confidence intervals or p-values.

eaf

name of the effect allele frequency column

p

name of the p-value column

or

name of the column containing the odds ratio

or_lci

name of the column containing the lower bound of the 95% confidence interval for the odds ratio

or_uci

name of the column containing the upper bound of the 95% confidence interval for the odds ratio

chr

name of the column containing the chromosome number for each genetic variant

pos

genomic position for the genetic variant in base pairs

z_score

effect size estimate divided by its standard error

drop_duplicate_rsids

logical; should duplicate rsids be dropped? Default is TRUE. Duplicate rsids may, for example, correspond to triallelic SNPs.

Value

data frame
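
Examples

The Arguments section says that lnor can be inferred from the odds ratio, that lnor_se can be inferred from the 95% confidence interval or the p-value, and that z_score is the effect estimate divided by its standard error. The base R sketch below illustrates the usual formulas for those quantities on a made-up data frame. It does not call format_data, and it is an assumption that the package uses these exact formulas or this de-duplication rule. The toy values and the helper names se_from_p and dedup are hypothetical.

```r
# Toy summary statistics; all values are invented for illustration.
dat <- data.frame(
  rsid   = c("rs1", "rs2", "rs2", "rs3"),
  or     = c(1.20, 0.85, 0.90, 1.05),
  or_lci = c(1.05, 0.75, 0.78, 0.95),
  or_uci = c(1.37, 0.96, 1.04, 1.16),
  stringsAsFactors = FALSE
)

# Log odds ratio from the odds ratio.
dat$lnor <- log(dat$or)

# Standard error from the 95% confidence interval: on the log scale the
# interval spans 2 * qnorm(0.975) standard errors.
dat$lnor_se <- (log(dat$or_uci) - log(dat$or_lci)) / (2 * qnorm(0.975))

# Z score: effect estimate divided by its standard error.
dat$z_score <- dat$lnor / dat$lnor_se

# Two-sided p-value implied by the z score (normal approximation).
dat$p <- 2 * pnorm(abs(dat$z_score), lower.tail = FALSE)

# Alternative route to the standard error when only a p-value is available:
# invert the two-sided normal test to recover |z|, then divide.
se_from_p <- abs(dat$lnor) / qnorm(dat$p / 2, lower.tail = FALSE)

# One possible way to drop duplicate rsids: keep the first occurrence.
# (Assumption: the package may handle duplicates differently.)
dedup <- dat[!duplicated(dat$rsid), ]
```

In this sketch se_from_p reproduces lnor_se because the p-values were themselves derived from the z scores. With real data the two routes can differ slightly because of rounding in the reported confidence limits and p-values.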


MRCIEU/mrQC documentation built on May 6, 2023, 1:40 p.m.