knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.retina = 2, fig.width = 480/72, fig.asp = 0.7, dev.args = list(type = "cairo-png") )
First, let's load the package. If you haven't installed it yet, you can do so with the install.packages("xadmix")
command to get the latest stable release from CRAN. Alternatively, you can install the latest development release from GitHub: devtools::install_github("SpaceCowboy-71/xadmix")
. This, however, requires the package devtools
.
library(xadmix)
The xadmix
package provides a dummy dataset ("xadmixture") containing simulated genetic admixture data. Each observation has an accession identifier, the country where the plant material was collected, an entry for the species and five admixture coefficients, K. The values of these ancestries sum up to 1.
data("xadmixture") str(xadmixture) # number of observations per country table(xadmixture$country) # number of observations per species table(xadmixture$species)
xadmix
comes with an optimized data subsetting function, admix_subset()
. It enables filtering of the data for ancestry (K) percentages greater or less than a certain value. Additionally, any number of columns can be filtered for values supplied in a vector. By default, it also prints the progress after each subsetting step.
Multiple ancestries along with their percentages can be passed as argument vectors. Subsetting is then done pairwise, filtering the first ancestry by the first pecentage, then the second ones and so on. Some examples:
# keep only observations with K1 > 0.15 and K2 > 0.01 subset1 <- admix_subset(xadmixture, anc = c("K1", "K2"), pct = c(0.15, 0.01)) # keep only observations with K2 < 0.1 and K3 < 0.1 subset2 <- admix_subset(xadmixture, anc = c("K2", "K3"), pct = c(0.1, 0.1), comparison = "less") # filtering for countries and species subset3 <- admix_subset(xadmixture, country = c("GBR", "FRA"), species = c("lorem", "dolor"))
Subsets can be chained using the pipe operator (%>%
) from package magrittr
. This way, filtering for percentages greater and less than certain values is possible.
By setting the argument quiet
to TRUE
, no subset progress will be printed.
library(magrittr) # keep only observations with K1 > 0.1 and K4 < 0.3, # without printing subset progress subset4 <- admix_subset(xadmixture, anc = "K1", pct = 0.1, quiet = TRUE) %>% admix_subset(anc = "K1", pct = 0.3, comparison = "less", quiet = TRUE) # print number of observations for comparison nrow(xadmixture) nrow(subset4)
xadmix
also comes with a user-friendly function for generating pretty stacked barplots, optimized to use with admixture data.
If the dataset were to contain only the accession numbers in the first column and ancestry percentages in subsequent columns, the function could be used without any further arguments.
However, the xadmixture
dataset contains additional columns. Therefore, the columns for all K`s have to be selected by passing an argument. An example plotting the whole dataset:
# ancestries (K) are in the fourth to last column, # and plotted without showing bar labels admix_barplot(xadmixture, K = 4:ncol(xadmixture), names = FALSE )
More information can be gained by looking at the subsets generated above:
# grouping data by column "country", # and sorting each group by ancestry column "K1" admix_barplot(subset1, K = 4:ncol(xadmixture), grouping = "country", sortkey = "K1" ) # changing color palette to "turbo" from package 'viridis', admix_barplot(subset2, K = 4:ncol(xadmixture), palette = "turbo", grouping = "species", sortkey = "K4" )
# removing title and changing axis labels text admix_barplot(subset3, K = 4:ncol(xadmixture), main = "", xlab = "Accessions", ylab = "Ancestry [%]", palette = "alternating", sortkey = "K1", names = FALSE )
Sometimes the group labels can get cut off. When there are not enough observations in the group, the width of the grouped bars can be smaller than the width of the group label. ggplot2
does not yet offer a flag to avoid this problem. In xadmix
, however, there is an option to remove clipping altogether:
# directly output grouped plot with clipping removed from elements # (useful if there are groups with a low number of observations) subset5 <- admix_subset(xadmixture, anc = c("K3", "K4"), pct = c(0.3, 0.2), quiet = TRUE)
# noclip set to "TRUE" admix_barplot(subset5, K = 4:ncol(xadmixture), sortkey = "K5", grouping = "country", palette = "viridis", names = FALSE, main = "Noclip on", noclip = TRUE) # noclip set to "FALSE" admix_barplot(subset5, K = 4:ncol(xadmixture), sortkey = "K5", grouping = "country", palette = "viridis", names = FALSE, main = "Noclip off", noclip = FALSE)
♦
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.