merge_snpRdata: Merge two snpRdata objects

View source: R/utility_functions.R

merge_snpRdata    R Documentation

Merge two snpRdata objects

Description

Merge two snpRdata objects using sample and SNP metadata. Works much like base R's merge function, except that the 'by' and 'all' options can be specified separately at the SNP and sample levels.

Usage

merge_snpRdata(
  x,
  y,
  by.sample = intersect(names(sample.meta(x)), names(sample.meta(y))),
  by.sample.x = by.sample,
  by.sample.y = by.sample,
  by.snp = intersect(names(snp.meta(x)), names(snp.meta(y))),
  by.snp.x = by.snp,
  by.snp.y = by.snp,
  all = TRUE,
  all.x.snps = all,
  all.y.snps = all,
  all.x.samples = all,
  all.y.samples = all,
  resolve_conflicts = "error"
)

Arguments

x, y

snpRdata objects to merge

by.sample, by.sample.x, by.sample.y

Columns of sample metadata by which to merge across samples. These function identically to the by, by.x, and by.y arguments to merge; see the documentation there for details.

by.snp, by.snp.x, by.snp.y

Columns of SNP metadata by which to merge across SNPs. These function identically to the by, by.x, and by.y arguments to merge; see the documentation there for details.
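Because these arguments are described as following base R merge() semantics, their behaviour can be previewed on plain data frames. The minimal sketch below uses two hypothetical sample-metadata tables (meta_x, meta_y) whose ID columns have different names, which is the situation by.x and by.y exist for. It exercises base merge() only and makes no claim about snpR's internals.

```r
# Hypothetical sample metadata from two data sets; the ID column has a
# different name in each table.
meta_x <- data.frame(sampID = c("1", "2", "3"), pop = c("ASP", "ASP", "PAL"),
                     stringsAsFactors = FALSE)
meta_y <- data.frame(ID = c("2", "3", "A1"), fam = c("B", "C", "T"),
                     stringsAsFactors = FALSE)

# by.x and by.y name the key column in each table; by.sample.x and
# by.sample.y are documented as behaving the same way for sample metadata.
matched <- merge(meta_x, meta_y, by.x = "sampID", by.y = "ID")
print(matched)  # only samples "2" and "3" appear in both tables
```

With base merge()'s default all = FALSE, only keys found in both tables survive, and the key column keeps the by.x name.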

all

logical, default TRUE. If TRUE, all samples and SNPs are kept in the output snpRdata object, and any sample/SNP combination genotyped in neither x nor y is filled with missing data in the missing-data format of x.

all.x.snps, all.y.snps

logical, default all. Keep SNPs in the data even if they are only present in x or y, respectively.

all.x.samples, all.y.samples

logical, default all. Keep samples in the data even if they are only present in x or y, respectively.
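To picture what the all family keeps and how gaps are filled, the sketch below merges two hypothetical per-SNP tables (gx, gy) with base merge() and then replaces the resulting NA cells with "NN". Treating "NN" as x's missing-data code is an assumption for illustration, chosen because the Examples on this page use "NN" for missing genotypes; this mimics the documented behaviour and is not snpR's implementation.

```r
# Hypothetical SNP metadata plus one sample's genotypes per table.
gx <- data.frame(chr = c("groupVI", "groupVI"), position = c(100, 212436),
                 s1 = c("GG", "CT"), stringsAsFactors = FALSE)
gy <- data.frame(chr = c("groupVI", "test_chr"), position = c(212436, 10),
                 s2 = c("GG", "TG"), stringsAsFactors = FALSE)

# Keep SNPs found in either table, analogous to all.x.snps = all.y.snps = TRUE.
kept <- merge(gx, gy, by = c("chr", "position"), all.x = TRUE, all.y = TRUE)

# Cells with no genotype in either input come back as NA; fill them with the
# assumed missing-data code.
mdat <- "NN"
geno_cols <- c("s1", "s2")
kept[geno_cols][is.na(kept[geno_cols])] <- mdat
print(kept)

# Turning off the y side keeps only SNPs present in gx.
x_only <- merge(gx, gy, by = c("chr", "position"), all.x = TRUE, all.y = FALSE)
```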

resolve_conflicts

character, default 'error'. Controls how conflicting genotypic information in x and y is handled. See 'Details' for options and explanation.

Details

Although this function can be used much like base R's merge function, there are a few differences to note.

First, samples that are genotyped at the same SNPs in both data sets, and so may carry conflicting genotypes, can be handled in several ways, controlled by the resolve_conflicts argument:

  • warning: Return a harsh warning along with a data frame giving more information on the sample/SNP combinations at which genotypes differ between x and y.

  • error: The default; return an error when conflicts are detected.

  • x: Use genotypes from x to resolve conflicts.

  • y: Use genotypes from y to resolve conflicts.

  • random: Randomly sample (non-missing) genotypes from x and y to resolve conflicts.

Note that called genotypes always take precedence over uncalled genotypes when there are merge conflicts, so missing data in one data set but not the other will not trigger an error or a warning even when those options are selected.
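The rules above can be sketched for a single sample/SNP pair. resolve_pair() below is a hypothetical helper, not part of snpR, and it assumes "NN" marks an uncalled genotype. It applies the called-over-uncalled rule first, then the selected mode. The data frame that the real warning option returns is not modelled here; this sketch simply returns NA for a conflicting call in that mode.

```r
# Hypothetical helper (not snpR code): resolve one pair of genotype calls for
# the same sample and SNP, following the resolve_conflicts options above.
resolve_pair <- function(gx, gy,
                         mode = c("error", "warning", "x", "y", "random"),
                         missing = "NN") {
  mode <- match.arg(mode)
  # Called genotypes always win over uncalled ones, so these never conflict.
  if (gx == missing) return(gy)
  if (gy == missing) return(gx)
  if (gx == gy) return(gx)  # identical calls: nothing to resolve
  switch(mode,
         error   = stop("conflicting genotypes: ", gx, " vs ", gy),
         warning = {
           # Assumption: report via warning() and return NA for this pair.
           warning("conflicting genotypes: ", gx, " vs ", gy)
           NA_character_
         },
         x       = gx,
         y       = gy,
         random  = sample(c(gx, gy), 1))
}

resolve_pair("GG", "NN")       # "GG": missing in y, so no conflict
resolve_pair("GA", "GG", "x")  # "GA"
resolve_pair("GA", "GG", "y")  # "GG"
```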

Second, the by and all argument families from merge are extended to refer separately to samples or SNPs, so that, for example, all samples can be kept but not all SNPs.
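One way to picture this split is as two independent merge() calls, one on sample metadata and one on SNP metadata, each with its own all setting. The tables below are hypothetical, and the code illustrates only that bookkeeping, not how snpR assembles the merged genotypes.

```r
# Hypothetical sample and SNP metadata for two data sets.
samp_x <- data.frame(ID = c("1", "2"), pop = "ASP", stringsAsFactors = FALSE)
samp_y <- data.frame(ID = c("2", "A1"), pop = c("ASP", "test1"),
                     stringsAsFactors = FALSE)
snp_x <- data.frame(chr = "groupVI", position = c(100, 212436),
                    stringsAsFactors = FALSE)
snp_y <- data.frame(chr = c("groupVI", "test_chr"), position = c(212436, 10),
                    stringsAsFactors = FALSE)

# Sample level: keep every sample from either data set
# (like all.x.samples = all.y.samples = TRUE).
samples <- merge(samp_x, samp_y, by = c("ID", "pop"), all = TRUE)

# SNP level: keep only SNPs present in both
# (like all.x.snps = all.y.snps = FALSE).
snps <- merge(snp_x, snp_y, by = c("chr", "position"), all = FALSE)
```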

Lastly, all of the all family of arguments default to TRUE instead of FALSE, since keeping only overlapping samples and SNPs is unlikely to be desired. As in merge, a FALSE value supplied to any specific all argument will still override all = TRUE.
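The override works through R's lazily evaluated default arguments: in the Usage above, each specific flag defaults to the value of all, but an explicitly supplied value replaces that default. all_flags() below is a hypothetical stand-in with the same default chain, used only to show the precedence.

```r
# Hypothetical stand-in copying the default chain of merge_snpRdata's
# all arguments; it just reports the resolved flags.
all_flags <- function(all = TRUE,
                      all.x.snps = all, all.y.snps = all,
                      all.x.samples = all, all.y.samples = all) {
  c(all.x.snps = all.x.snps, all.y.snps = all.y.snps,
    all.x.samples = all.x.samples, all.y.samples = all.y.samples)
}

all_flags()                                   # every flag TRUE
all_flags(all.y.snps = FALSE)                 # only all.y.snps is FALSE
all_flags(all = FALSE, all.x.samples = TRUE)  # only all.x.samples is TRUE
```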

At present, merge_snpRdata is not maximally efficient: it removes all tabulated statistics and re-tabulates all internal summaries. Improvements are in development.

Value

A merged snpRdata object.

Author(s)

William Hemstrom

Examples

# create data to merge in
y <- data.frame(s1 = c("GG", "NN"),
                s2 = c("GG", "TG"),
                s3 = c("GG", "TT"),
                s4 = c("GA", "TT"),
                s5 = c("GG", "GT"),
                s6 = c("NN", "GG"))
                
snp.y <- data.frame(chr = c("groupVI", "test_chr"),
                    position = c(212436, 10))
                   
samp.y <- data.frame(pop = c("ASP", "ASP", "ASP", "test1", "test2", "test3"),
                     ID = c(1, 2, 3, "A1", "A2", "A3"),
                     fam = c("A", "B", "C", "T", "T", "T"))
y <- import.snpR.data(y, snp.y, samp.y)

x <- stickSNPs
sample.meta(x)$ID <- 1:ncol(x)

## Not run: 
# Not run, will error due to conflicts
z <- merge_snpRdata(x, y)

# Not run, will return a warning and report mismatches
z <- merge_snpRdata(x, y, resolve_conflicts = "warning")

## End(Not run)

# take a random genotype in the case of conflicts
z <- merge_snpRdata(x, y, resolve_conflicts = "random")
z


hemstrow/snpR documentation built on March 20, 2024, 7:03 a.m.