knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "running-admixture-figs/" )
The genoscapeRtools
has a few functions that simplify the process of running
the program admixture
multiple times and multiple K values. This is designed to
run on a Mac with a decent Unix-like system. Hasn't been tested elsewhere.
The first hurdle is to get your data into a proper PLINK "bed" file that admixture
can
read. This is pretty easy from the VCF file, but we are going to do it straight from the
object that read_012
creates.
So, read that thing in:
library(genoscapeRtools) wifl_clean <- read_012("../cleaned_indv175_pos105000")
Then, if you want to make a plink bed file out of that you can do it like this:
convert_012_to_bed(wifl_clean, chromo_override = TRUE, prefix = "plinky")
That creates all these files:
dir(pattern = "plinky.*")
And now you can pass the plinky.bed
file to our function that runs admixture a lot of times.
That function will look for the .fam and the .bim files in the same place, so they have to be there
too.
This function sets up a directory and runs admixture in parallel in all of them. We use the defaults, which puts all the results in a directory called "admixture_runs" in the current working directory.
We are going to just do one rep and only at K=3 and K=4
run_admixture(bed = "plinky.bed", Reps = 1, Kvals = 3:4)
The above can take a long time and it is hard to kill the process once you start it. So, just let 'er go.
To see how things are coming along you can look at the files inside the r_x_K_y
directories in the admixture_runs
directories where all the action is happening.
When it is all done, the output is in the directories like this:
2016-11-25 06:34 /tutorials/--% (master) ls admixture_runs/* admixture_runs/data: README.txt input.bed input.bim input.fam admixture_runs/r_1_K_3: admixture.stderr admixture.stdout input.3.P input.3.Q admixture_runs/r_1_K_4: admixture.stderr admixture.stdout input.4.P input.4.Q
When that is done, read it into a data frame:
admixQ <- slurp_admixture()
And then plot it quickly.
ggplot_the_Qs(Qs = admixQ)
And, I need to add a cross-validation flag if we want to do cross-validation.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.