knitr::opts_chunk$set(echo = TRUE)
Make sure this is done:
install.packages(c("devtools", "roxygen2", "testthat", "knitr", "styler"))
This is mandatory reading: https://style.tidyverse.org/, at least, for now, Chapters 1--7.
Make sure that the configurations are done on their projects. See and
follow directions at:
https://r-pkgs.org/man.html#man-workflow-2
Note that we have:
Roxygen: list(markdown = TRUE)
in the DESCRIPTION
file.
Let's follow through the packages book.
git config --global merge.conflictstyle diff3
Also set up upstream:
git remote add upstream https://github.com/eriqande/HatcheryPedAgree.git
Then, any time you want to get new commits from the upstream repo, you do:
git fetch upstream
That "fetches" the objects.
And, to merge them into your current branch, once they are fetched, you do:
git merge upstream/master
To be able to pull upstream into the current branch, you can set that up with:
git branch -u upstream/master
Then, to pull it into master you can:
git checkout master git pull
To make it easier to work with branches, we installed
.git-prompt.sh
and updates PS1
to show the branch on our
bash prompt.
Be sure to get upstream
set up. Then run through the pull request procedure.
What format are these currently in? I want to to have both of the steelhead data sets in a canonical form to be in this project, but have them gitignored. Then I will put the coho SNPs up on GitHub for running through examples, etc.
I will send a specification of what I want, once I have figured that out, and then Anne and Laura can email those to me and I can put them together in a directory that we can send out to everyone. That way we can all test new functions on these private data sets.
DON'T COMMIT THESE DATA SETS TO THE REPO!
Laura and Anne, here is the format we want the main, initial datasets to be in:
A tibble of genotypes that must have at least these columns:
indiv
: the individual ID (could be, for example, the NMFS_DNA_IDlocus
: the name of the locusgene_copy
: a column of 1's and 2'sallele_int
: a column of alleles, 1-4, with NA for missing data.These columns can be in any order, but ordered as above is natural. Note that the genotypes can contain more columns. For example, it could have a full microhaplotype allele column, etc. But for the SNP data sets, we don't really need another one.
A tibble of metadata relevant to the parentage. At this point I am thinking that this includes:
indiv
sex
a column of Male, Female, or NAspawner_group
: typically the data of collection/spawningyear
: the year the individual was sampled/spawnedAnne and Laura, can you think of anything else that belongs in here?
An example of these two data files can be found now by loading the package
which gives access to coho_genotypes
and coho_metadata
. (You will have to pull
from upstream to master and load that version).
data
folder and data.R
extdata
too.We need to do a few things here:
Along the way, we will want to investigate the results. So, we will break this down into different functions.
Here are some things to read about that we will use:
I need you guys to get your steelhead data sets into that format so we can have those to test on.
Weird stuff, but it will let us convert to rubias with microhaps, when we have those.
Two main points here:
Both Anne and Laura have sent me their genotypes and metadata file. They did a lot of the backend hard work, but I have a few things to do here to clean them up.
Genotypes: These all looked great. I checked them, ungrouped the tibble and then reordered the column names. I also arranged things so that all the genotypes for an individual are together in one place. This is not a necessity, but it makes it easier to look at.
Metadata. This looks good, but I caught a few issues:
# read in the original cm <- read_rds("../private/as_sent/cvsh_meta.rds") # It turns out that the hatchery names are not consistent. So we # need to fix those. Also, I don't want spaces in those names. # Note, we need to put spaces in hatchery names into a check() function. cm2 <- cm %>% mutate( hatchery = recode( hatchery, `Coleman National Fish Hatchery` = "Coleman Hatchery", `Mokelumne Hatchery` = "Mokelumne River Hatchery", `Nimbus River Hatchery` = "Nimbus Hatchery" ), hatchery = str_replace_all(hatchery, " +", "_") ) # then i resaved that write_rds(cm2, path = "../private/cvsh_metadata.rds", compress = "xz")
# having a look rg <- read_rds("../private/as_sent/RR_steelhead_geno.rds") # minor tweaks rg %>% select(indiv, locus, gene_copy, allele_int) %>% arrange(indiv, locus, gene_copy) %>% write_rds("../private/rrsh_genotypes.rds", compress = "xz") # how about metadata? # It needs ungrouping... # hatchery names and sex look good read_rds("../private/as_sent/RR_steelhead_meta.rds") %>% ungroup() %>% write_rds(path = "../private/rrsh_metadata.rds", compress = "xz")
After looking over Anne and Laura's data about the frequency of iteroparous
fish and also the occurrence of individuals that are spawned on multiple
days within a year (sometimes under different names), I realized that it might work
better to finish implementing snppit
support for individuals to belong to multiple
spawner groups. Looking over the code I see that it already has support for fish
to spawn in multiple years, and it should not take too much to allow individuals to
belong to multiple different spawner groups.
I have done this with a few changes to the code. I am doing it in a new branch
of the snppit repo called allow-multiple-spawner-groups
. Here are the changes that
I made: https://github.com/eriqande/snppit/commit/5ccad7b2e619e14436df38a0d4d2086fbbe62d48.
Still working on testing it, but now the drill for repeat spawners and "matching samples" is to:
So, I must throw down some R code to do that...
Some things we have dealt with:
?
instead of NA.
But, in the user-supplied meta data we should still use NA if there is nothing known about it.
Only in cases where user wants to add the unknown spawner to within a comma-separated string
does the user code missing as ?. (For example 4/11/204,?,3/12/2013). However it is hard to imagine
that this is something anyone would ever need to do.Stuff to discuss:
system
and system2
commands. For now we are just going to keep it simple.inst
directory and binaries (They don't fly with CRAN!).Add a new_id
column to the cross_hatchery_matches and to cross_sex_matches as well,
if possible.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.