This function provides an extensive range of summary measures for your SNP data, allowing you to perform in-depth quality control of your genotyping results and to explore your data before analysis. Summary measures include allele and genotype frequencies and counts, missingness rate, Hardy-Weinberg equilibrium p-values and more, computed for the whole data set or stratified by another variable, such as case-control status. It can also test for differences in missingness between groups.
an object of class "formula" (or one that can be coerced to that class). The terms on the right side of ~ must be combined additively and must refer to variables in 'data' of class character or factor whose levels are the genotypes, written as pairs of alleles (e.g. A/A, A/T and T/T). The left side of ~ must contain the name of the grouping variable or can be left blank (in which case summary data are provided for the whole sample and no missingness test is performed).
an optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If they are not found in 'data', the variables are taken from 'environment(formula)'.
an optional vector specifying a subset of individuals to be used in the computation process (applied to all genetic variables).
a function which indicates what should happen when the data contain NAs. The default is NULL, and that is equivalent to
character string indicating the separator between alleles (e.g. when genotypes are coded as A/A, A/T and T/T, 'sep' should be set to '/'). The default value is '' (the empty string), indicating that genotypes are coded as AA, AT and TT.
logical, print results from
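As a rough illustration of the genotype coding that the 'formula' and 'sep' arguments describe, the base R sketch below builds a small data frame and splits genotype labels into alleles. The data frame `toy` and the helper `split_alleles` are hypothetical names introduced for this example; they are not part of the package, and the package's own parsing may differ.

```r
# Hypothetical toy data: a grouping variable and two SNPs coded with "/".
toy <- data.frame(
  casco = factor(c("case", "control", "case", "control", "case")),
  snp1  = factor(c("A/A", "A/T", "T/T", NA, "A/T")),
  snp2  = factor(c("C/C", "C/G", "C/C", "C/C", NA))
)

# Hypothetical helper (not a package function): split genotype labels
# into their two alleles using the allele separator.
split_alleles <- function(genotype, sep = "/") {
  parts <- strsplit(as.character(genotype), split = sep, fixed = TRUE)
  # Anything that does not split into exactly two alleles (e.g. NA)
  # is returned as a pair of missing alleles.
  t(vapply(
    parts,
    function(p) if (length(p) == 2) p else c(NA_character_, NA_character_),
    character(2)
  ))
}

# One row per individual, one column per allele.
alleles <- split_alleles(toy$snp1, sep = "/")
alleles

# With the package loaded, the matching call would presumably be
# (shown as a comment because it needs compareGroups installed):
# compareSNPs(casco ~ snp1 + snp2, data = toy, sep = "/")
```

Passing `sep = ""` to the same helper splits labels such as AA, AT and TT into single characters, which mirrors the default coding described above.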
An object of class 'compareSNPs', which is a data.frame (when no groups are specified on the left of the '~' in the 'formula' argument) or a list of data.frames otherwise. Each data.frame contains the following fields:
- Ntotal: Total number of samples for which genotyping was attempted
- Ntyped: Number of genotypes called
- Typed.p: Percentage genotyped
- Miss.t: Number of missing genotypes
- Miss.p: Proportion of missing genotypes
- Minor: Minor Allele
- MAF: Minor allele frequency
- A1: Allele 1
- A2: Allele 2
- A1.ct: Count Allele 1
- A2.ct: Count Allele 2
- A1.p: Frequency of Allele 1
- A2.p: Frequency of Allele 2
- Hom1: Allele 1 Homozygote
- Het: Heterozygote
- Hom2: Allele 2 Homozygote
- Hom1.ct: Allele 1 Homozygote count
- Het.ct: Heterozygote Count
- Hom2.ct: Allele 2 Homozygote count
- Hom1.p: Frequency of Allele 1 Homozygote
- Het.p: Heterozygote frequency
- Hom2.p: Frequency of Allele 2 Homozygote
- HWE.p: Hardy-Weinberg equilibrium p-value
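To make the fields above more concrete, here is a minimal base R sketch that derives a few of them by hand for a single SNP. The genotype vector is made up, and heterozygotes are assumed to be written consistently as AT. The Hardy-Weinberg step uses a plain chi-square goodness-of-fit test with one degree of freedom purely as a stand-in; it is not necessarily the test the package applies.

```r
# Hypothetical genotypes for one SNP, coded without a separator.
geno <- c("AA", "AT", "AT", "TT", "AA", NA, "AT", "AA", "AA", "AT")

n_total <- length(geno)        # corresponds to Ntotal
n_typed <- sum(!is.na(geno))   # corresponds to Ntyped
n_miss  <- sum(is.na(geno))    # corresponds to Miss.t
miss_p  <- n_miss / n_total    # missing genotypes as a proportion

# Allele counts: split every called genotype into single characters.
called    <- geno[!is.na(geno)]
allele_ct <- table(unlist(strsplit(called, split = "")))
allele_p  <- prop.table(allele_ct)

# Minor allele and its frequency.
minor <- names(allele_p)[which.min(allele_p)]
maf   <- min(allele_p)

# Genotype counts, ordered as Hom1, Het, Hom2.
observed <- c(AA = sum(called == "AA"),
              AT = sum(called == "AT"),
              TT = sum(called == "TT"))

# Stand-in HWE test (assumption): chi-square goodness of fit, 1 df.
p        <- as.numeric(allele_p["A"])
expected <- n_typed * c(p^2, 2 * p * (1 - p), (1 - p)^2)
chisq    <- sum((observed - expected)^2 / expected)
hwe_p    <- pchisq(chisq, df = 1, lower.tail = FALSE)
```

For small samples or rare alleles an exact test is usually preferred over the chi-square approximation, so treat `hwe_p` here as illustrative only.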
Additionally, when the analysis is stratified by groups, the last component of the list is a data.frame containing the p-values of the missingness comparison among groups.
'print' returns a nicely formatted table for each group with the main results for each SNP (Ntotal, Ntyped, Minor, MAF, A1, A2, HWE.p), plus the missingness test when a grouping variable is specified.
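The missingness comparison among groups can be pictured as a test on a group-by-missing contingency table. The sketch below uses made-up data and Fisher's exact test from base R; this page does not state which test the package uses, so the choice of `fisher.test` is an assumption made for illustration.

```r
# Hypothetical data: 10 cases and 10 controls genotyped for one SNP.
group <- factor(c(rep("case", 10), rep("control", 10)))
geno  <- c(rep("AA", 6), rep(NA, 4),   # cases: 4 missing genotypes
           rep("AA", 9), NA)           # controls: 1 missing genotype

# Cross-tabulate group membership against missingness.
miss_tab <- table(group = group, missing = is.na(geno))
miss_tab

# Assumed test (not confirmed by the documentation): Fisher's exact test.
miss_p_value <- fisher.test(miss_tab)$p.value
```

A small p-value would suggest that genotyping failure rates differ between groups, which is a common warning sign in case-control quality control.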
It uses some functions taken from SNPassoc, created by Juan Ramón González et al.
Hardy-Weinberg equilibrium test is performed using the
Gavin Lucas (gavin.lucas<at>cleargenetics.com)
Isaac Subirana (isubirana<at>imim.es)
library(compareGroups)

# load example data (taken from SNPassoc)
data(SNPs)

# visualize first rows
head(SNPs)

# select casco and all SNPs
myDat <- SNPs[, c(2, 6:40)]

# QC of all selected SNPs by groups of cases and controls
res <- compareSNPs(casco ~ . - casco, myDat)
res

# QC of all selected SNPs in the whole data set
res <- compareSNPs(~ . - casco, myDat)
res