merge_snpRdata: Merge two snpRdata objects
In hemstrow/snpR: Whole-Genome Analysis Tools for Use with Single Nucleotide Polymorphism Data

merge_snpRdata

R Documentation

Merge two snpRdata objects

Description

Merge two snpRdata objects using sample and SNP metadata. Functions much like base R's merge function, but the 'by' and 'all' options can be specified at the SNP and sample level.

Usage

merge_snpRdata(
  x,
  y,
  by.sample = intersect(names(sample.meta(x)), names(sample.meta(y))),
  by.sample.x = by.sample,
  by.sample.y = by.sample,
  by.snp = intersect(names(snp.meta(x)), names(snp.meta(y))),
  by.snp.x = by.snp,
  by.snp.y = by.snp,
  all = TRUE,
  all.x.snps = all,
  all.y.snps = all,
  all.x.samples = all,
  all.y.samples = all,
  resolve_conflicts = "error"
)

Arguments

`x`, `y`	`snpRdata` objects to merge
`by.sample`, `by.sample.x`, `by.sample.y`	Columns of sample metadata by which to merge across samples. These function identically to the `by`, `by.x`, and `by.y` arguments to `merge`; see the documentation there for details.
`by.snp`, `by.snp.x`, `by.snp.y`	Columns of SNP metadata by which to merge across SNPs. These function identically to the `by`, `by.x`, and `by.y` arguments to `merge`; see the documentation there for details.
`all`	logical, default TRUE. If TRUE, all samples and SNPs will be maintained in the output `snpRdata` object. Missing data, in the missing data format of `x`, is added for any sample/SNP combination that is not genotyped in either `x` or `y`.
`all.x.snps`, `all.y.snps`	logical, default `all`. Keep SNPs in the data even if they are only present in `x` or `y`, respectively.
`all.x.samples`, `all.y.samples`	logical, default `all`. Keep samples in the data even if they are only present in `x` or `y`, respectively.
`resolve_conflicts`	character, default 'error'. Controls how conflicting genotypic information in `x` and `y` is handled. See 'Details' for options and explanation.

Details

While this function can be used essentially identically to how one might use base R's merge function, there are a few differences to note.

First, samples that are genotyped at identical loci in both data sets can be handled several ways, controlled by the resolve_conflicts argument:

warning: Return a harsh warning and a data frame with more information on the genotypes at identical samples/SNPs that differ between x and y.
error: The default. Return an error when conflicts are detected.
x: Use genotypes from x to resolve conflicts.
y: Use genotypes from y to resolve conflicts.
random: Randomly sample (non-missing) genotypes from x and y to resolve conflicts.

Note that called genotypes are always taken over un-called genotypes when there are merge conflicts, and missing data in one but not the other data set will not trigger an error or a warning if those options are selected.
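
The precedence rule above can be sketched in base R. This is a minimal illustration only, not snpR's implementation: resolve_genotypes is a hypothetical helper, and it assumes genotypes are stored as character strings with "NN" as the missing code, as in the Examples below. It covers only the error, x, and y options.

```r
# Hypothetical helper for illustration only; not part of snpR.
# Assumes genotypes are character strings with "NN" as the missing code.
resolve_genotypes <- function(gx, gy,
                              resolve_conflicts = c("error", "x", "y"),
                              missing = "NN") {
  resolve_conflicts <- match.arg(resolve_conflicts)
  x_called <- gx != missing
  y_called <- gy != missing

  # A conflict needs a called genotype in both inputs, and the calls must differ.
  conflict <- x_called & y_called & gx != gy
  if (resolve_conflicts == "error" && any(conflict)) {
    stop(sum(conflict), " conflicting genotype(s) detected.")
  }

  # Start from x, then fill in calls from y wherever x is missing.
  out <- gx
  out[!x_called & y_called] <- gy[!x_called & y_called]

  # Where both are called and differ, take y only if requested.
  if (resolve_conflicts == "y") {
    out[conflict] <- gy[conflict]
  }
  out
}

gx <- c("GG", "NN", "GA", "NN")
gy <- c("GG", "TT", "GG", "NN")

res_x <- resolve_genotypes(gx, gy, resolve_conflicts = "x")
res_y <- resolve_genotypes(gx, gy, resolve_conflicts = "y")
print(res_x)
print(res_y)
```

In this toy input only the third genotype is a conflict. The second is called in gy but missing in gx, so the called genotype is used under either option and would not trigger the error.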

Secondly, the by and all argument families from merge are extended to refer to either samples or SNPs, such that all samples can be maintained but not all SNPs, for example.

Lastly, all of the all family of arguments default to TRUE instead of FALSE, since purely overlapping genotypes/SNPs are unlikely to be desired. FALSE values provided to any specific all argument will still override all = TRUE, as in merge.
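
For readers less familiar with how the all family behaves in base R's merge, the sketch below uses two small data frames. The column names and values are made up for illustration, and only merge itself is called. By the description above, all.x.samples and all.y.samples (and their SNP counterparts) are expected to behave analogously for each dimension of the merged snpRdata object.

```r
# Toy sample metadata; column names and values are made up for illustration.
meta_x <- data.frame(ID = c(1, 2, 3), pop = c("ASP", "ASP", "CLF"))
meta_y <- data.frame(ID = c(2, 3, 4), fam = c("A", "B", "C"))

# merge() defaults to all = FALSE: only IDs present in both tables are kept.
inner <- merge(meta_x, meta_y, by = "ID")

# all = TRUE keeps every ID, filling unmatched columns with NA.
full <- merge(meta_x, meta_y, by = "ID", all = TRUE)

# A specific FALSE overrides all = TRUE: rows found only in meta_y are dropped.
left <- merge(meta_x, meta_y, by = "ID", all = TRUE, all.y = FALSE)

print(inner)
print(full)
print(left)
```

The last call is the base R analogue of keeping everything from x while dropping anything found only in y.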

At present, merge_snpRdata is not maximally efficient in that it will remove all tabulated statistics and re-tabulate all internal summaries. Improvements are in development.

Value

A merged snpRdata object.

Author(s)

William Hemstrom

Examples

# create data to merge in
y <- data.frame(s1 = c("GG", "NN"),
                s2 = c("GG", "TG"),
                s3 = c("GG", "TT"),
                s4 = c("GA", "TT"),
                s5 = c("GG", "GT"),
                s6 = c("NN", "GG"))
                
snp.y <- data.frame(chr = c("groupVI", "test_chr"),
                    position = c(212436, 10))
                   
samp.y <- data.frame(pop = c("ASP", "ASP", "ASP", "test1", "test2", "test3"),
                     ID = c(1, 2, 3, "A1", "A2", "A3"),
                     fam = c("A", "B", "C", "T", "T", "T"))
y <- import.snpR.data(y, snp.y, samp.y)

x <- stickSNPs
sample.meta(x)$ID <- 1:ncol(x)

## Not run: 
# Not run, will error due to conflicts
z <- merge_snpRdata(x, y)

# Not run, will return a warning and report mismatches
z <- merge_snpRdata(x, y, resolve_conflicts = "warning")

## End(Not run)

# take a random genotype in the case of conflicts
z <- merge_snpRdata(x, y, resolve_conflicts = "random")
z

hemstrow/snpR documentation built on July 5, 2025, 4:38 a.m.
