View source: R/utility_functions.R
merge_snpRdata | R Documentation |
Merge two snpRdata objects using sample and SNP metadata. Functions much like base R's merge function, but the 'by' and 'all' options can be specified at the SNP and sample level.
merge_snpRdata(
x,
y,
by.sample = intersect(names(sample.meta(x)), names(sample.meta(y))),
by.sample.x = by.sample,
by.sample.y = by.sample,
by.snp = intersect(names(snp.meta(x)), names(snp.meta(y))),
by.snp.x = by.snp,
by.snp.y = by.snp,
all = TRUE,
all.x.snps = all,
all.y.snps = all,
all.x.samples = all,
all.y.samples = all,
resolve_conflicts = "error"
)
x , y |
snpRdata objects to merge. |
by.sample , by.sample.x , by.sample.y |
Columns of sample metadata by which
to merge across samples–function identically to the |
by.snp , by.snp.x , by.snp.y |
Columns of SNP metadata by which to merge
across SNPs–function identically to the |
all |
logical, default TRUE. If TRUE, all samples and SNPs will be
maintained in the output |
all.x.snps , all.y.snps |
logical, default |
all.x.samples , all.y.samples |
logical, default |
resolve_conflicts |
character, default 'error'. Controls how
conflicting genotypic information in |
While this function can be used essentially identically to how one might use base R's merge function, there are a few differences to note.
First, samples that are genotyped at identical loci in both data sets can be handled several ways, controlled by the resolve_conflicts argument:
warning: Return a harsh warning and a data frame with more information on the genotypes at identical samples/SNPs that differ between x and y.
error: The default, return an error when conflicts are detected.
x: Use genotypes from x to resolve conflicts.
y: Use genotypes from y to resolve conflicts.
random: Randomly sample (non-missing) genotypes from x and y to resolve conflicts.
Note that called genotypes are always taken over un-called genotypes when there are merge conflicts, and missing data in one but not the other data set will not trigger an error or a warning if those options are selected.
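The precedence rule above can be illustrated with a minimal base R sketch. This is not snpR's internal code: resolve_sketch is a hypothetical helper written only for illustration, and it assumes that un-called genotypes are coded as "NN", as in the example data for this function.

```r
# Hypothetical helper, NOT part of snpR: resolve two genotype vectors that
# describe the same sample/SNP pairs. "NN" is assumed to mark an un-called
# genotype.
resolve_sketch <- function(gx, gy, prefer = c("x", "y"), missing = "NN") {
  prefer <- match.arg(prefer)
  out <- if (prefer == "x") gx else gy
  other <- if (prefer == "x") gy else gx
  # called genotypes always win over un-called ones
  fill <- out == missing & other != missing
  out[fill] <- other[fill]
  out
}

gx <- c("GG", "NN", "GT", "NN")
gy <- c("GG", "TT", "TT", "NN")

# a true conflict needs both genotypes called and different (position 3 only)
conflicts <- which(gx != "NN" & gy != "NN" & gx != gy)

res_x <- resolve_sketch(gx, gy, prefer = "x")  # "GG" "TT" "GT" "NN"
res_y <- resolve_sketch(gx, gy, prefer = "y")  # "GG" "TT" "TT" "NN"
```

In this sketch, position 2 is missing only in gx, so it is filled from gy without counting as a conflict, which mirrors the behavior described in the note.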
Secondly, the by and all argument families from merge are extended to refer to either samples or SNPs, such that all samples can be maintained but not all SNPs, for example.
Lastly, all of the all family of arguments default to TRUE instead of FALSE, since keeping only the overlapping genotypes/SNPs is unlikely to be desired. FALSE values provided to any specific all argument will still override all = TRUE, as in merge.
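For reference, the override behavior being mirrored can be seen with base R's merge on two small data frames. This sketch uses only base R; the data frames a and b are made up for illustration and have nothing to do with snpRdata objects.

```r
# Two toy data frames sharing an "id" column (illustrative data only)
a <- data.frame(id = c(1, 2, 3), va = c("a1", "a2", "a3"))
b <- data.frame(id = c(2, 3, 4), vb = c("b2", "b3", "b4"))

# all = TRUE keeps every row from both inputs (ids 1 to 4)
full <- merge(a, b, by = "id", all = TRUE)

# a specific FALSE overrides the general all = TRUE:
# rows found only in b (id 4) are dropped, rows found only in a (id 1) are kept
left <- merge(a, b, by = "id", all = TRUE, all.y = FALSE)

full$id  # 1 2 3 4
left$id  # 1 2 3
```

The same logic applies to merge_snpRdata, except that the general all argument defaults to TRUE and the specific arguments are split by SNPs and samples.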
At present, merge_snpRdata is not maximally efficient in that it will remove all tabulated statistics and re-tabulate all internal summaries. Improvements are in development.
A merged snpRdata object.
William Hemstrom
# create data to merge in
y <- data.frame(s1 = c("GG", "NN"),
s2 = c("GG", "TG"),
s3 = c("GG", "TT"),
s4 = c("GA", "TT"),
s5 = c("GG", "GT"),
s6 = c("NN", "GG"))
snp.y <- data.frame(chr = c("groupVI", "test_chr"),
position = c(212436, 10))
samp.y <- data.frame(pop = c("ASP", "ASP", "ASP", "test1", "test2", "test3"),
ID = c(1, 2, 3, "A1", "A2", "A3"),
fam = c("A", "B", "C", "T", "T", "T"))
y <- import.snpR.data(y, snp.y, samp.y)
x <- stickSNPs
sample.meta(x)$ID <- 1:ncol(x)
## Not run:
# Not run, will error due to conflicts
z <- merge_snpRdata(x, y)
# Not run, will return a warning and report mismatches
z <- merge_snpRdata(x, y, resolve_conflicts = "warning")
## End(Not run)
# take a random genotype in the case of conflicts
z <- merge_snpRdata(x, y, resolve_conflicts = "random")
z