knitr::opts_chunk$set(echo = TRUE)

Topics for discussion during meetings

Thursday, May 14, 2020

1. Installing packages

Make sure this is done:

install.packages(c("devtools", "roxygen2", "testthat", "knitr", "styler"))

2. Code style

Mandatory reading: https://style.tidyverse.org/ (for now, at least Chapters 1--7).

3. Roxygen

Make sure that Roxygen is configured in your projects. See and follow the directions at:
https://r-pkgs.org/man.html#man-workflow-2

Note that we have:

Roxygen: list(markdown = TRUE)

in the DESCRIPTION file.
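
With markdown enabled, roxygen comments can use backticks for code instead of `\code{}`. A minimal sketch of what a documented function might look like; `count_alleles()` is a hypothetical example and not a function in this package, though the column names follow the genotype format used later in these notes:

```r
#' Count distinct alleles at each locus
#'
#' Tallies the number of distinct values of `allele_int` within each `locus`.
#' Missing alleles (`NA`) are ignored.
#'
#' @param genos A data frame with columns `locus` and `allele_int`.
#' @return A named integer vector, one element per locus.
#' @export
count_alleles <- function(genos) {
  # one vector of alleles per locus
  alleles_by_locus <- split(genos$allele_int, genos$locus)
  vapply(
    alleles_by_locus,
    function(x) length(unique(x[!is.na(x)])),
    integer(1)
  )
}
```

Running `devtools::document()` turns comments like these into the corresponding `.Rd` file.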

4. Forking and Pull requests

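Let's work through the packages book. Along the way, set the merge conflict style to diff3, which also shows the common-ancestor version in conflict markers: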

git config --global merge.conflictstyle diff3

Also set up upstream:

git remote add upstream https://github.com/eriqande/HatcheryPedAgree.git

Then, any time you want to get new commits from the upstream repo, you do:

git fetch upstream

That "fetches" the objects.

And, to merge them into your current branch, once they are fetched, you do:

git merge upstream/master

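To be able to pull from upstream directly into the current branch, set that branch to track upstream/master with:

git branch -u upstream/master

Then, to pull upstream into master (assuming master is the branch you set to track upstream/master), you can do: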

git checkout master
git pull

To make it easier to work with branches, we installed .git-prompt.sh and updated PS1 to show the branch on our bash prompt.

Be sure to get upstream set up. Then run through the pull request procedure.

5. Data sets

What format are these currently in? I want to have both of the steelhead data sets in a canonical form in this project, but have them gitignored. Then I will put the coho SNPs up on GitHub for running through examples, etc.

I will send a specification of what I want, once I have figured that out, and then Anne and Laura can email those to me and I can put them together in a directory that we can send out to everyone. That way we can all test new functions on these private data sets.

DON'T COMMIT THESE DATA SETS TO THE REPO!

For: Wednesday, May 20, 2020

1. Data set format

Laura and Anne, here is the format we want the main, initial datasets to be in:

Examples of these two data files are now available by loading the package, which gives access to coho_genotypes and coho_metadata. (You will have to pull from upstream into master and load that version.)

2. Let's talk about the data folder and data.R

3. Matching samples

We need to do a few things here:

  1. Write a function to convert to rubias format.
  2. Run rubias to find matching samples.
  3. Find connected components.
  4. Figure out which version of each individual will be kept as the canonical genotype.

Along the way, we will want to investigate the results. So, we will break this down into different functions.
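
To make the "connected components" step concrete: if samples A and B match, and B and C match, then A, B, and C should all be treated as one individual. Here is a minimal base-R sketch using a simple union-find. The function name, the column names `indiv_1` and `indiv_2`, and the toy data are all assumptions for illustration; this is not the package's implementation, and a graph package could be used instead.

```r
# Hypothetical input: one row per pair of samples flagged as matching.
pairs <- data.frame(
  indiv_1 = c("A", "B", "D"),
  indiv_2 = c("B", "C", "E"),
  stringsAsFactors = FALSE
)

# Group samples into connected components with a simple union-find.
find_components <- function(pairs) {
  ids <- unique(c(pairs$indiv_1, pairs$indiv_2))
  parent <- setNames(ids, ids)  # every sample starts as its own root

  # Follow parent pointers until reaching a root.
  find_root <- function(x) {
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }

  # Merge the two components joined by each matching pair.
  for (i in seq_len(nrow(pairs))) {
    r1 <- find_root(pairs$indiv_1[i])
    r2 <- find_root(pairs$indiv_2[i])
    if (r1 != r2) parent[[r2]] <- r1
  }

  roots <- vapply(ids, find_root, character(1))
  data.frame(
    indiv = ids,
    component = match(roots, unique(roots)),
    stringsAsFactors = FALSE
  )
}

components <- find_components(pairs)
```

In this toy example A, B, and C land in one component and D and E in another. Choosing the canonical genotype within each component is a separate step.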

Here are some things to read about that we will use:

I need you guys to get your steelhead data sets into that format so we can have those to test on.

4. A little bit about NSE and rlang::enquo()

Weird stuff, but it will let us convert to rubias with microhaps, when we have those.
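
As a reminder of what `enquo()` does, here is a minimal sketch: it captures the expression a user typed for an argument, rather than its value, so the function can evaluate it later inside a data frame. The helper `pull_column()` and the toy data are hypothetical; this is not the rubias conversion function itself.

```r
library(rlang)

# Hypothetical helper: pull a column by its bare (unquoted) name.
pull_column <- function(data, column) {
  column_quo <- enquo(column)         # capture the expression, not its value
  eval_tidy(column_quo, data = data)  # evaluate it with the columns in scope
}

genos <- data.frame(
  indiv = c("a", "b"),
  allele_int = c(1L, 2L),
  stringsAsFactors = FALSE
)

# allele_int is not a variable in the calling environment; it is looked up
# as a column of genos when the captured expression is evaluated.
alleles <- pull_column(genos, allele_int)
```

The same pattern lets a function accept the name of whichever column holds the alleles, without hard-coding it.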

5. R CMD Check

Two main points here:

For: Thursday, May 21, 2020

1. Eric munges the RR and CV steelhead data sets

Both Anne and Laura have sent me their genotype and metadata files. They did a lot of the hard backend work, but I have a few things to do here to clean them up.

CVSH

Genotypes: These all looked great. I checked them, ungrouped the tibble and then reordered the column names. I also arranged things so that all the genotypes for an individual are together in one place. This is not a necessity, but it makes it easier to look at.

Metadata: This looks good, but I caught a few issues:

# load the tidyverse (readr, dplyr, stringr) for the functions used below
library(tidyverse)

# read in the original
cm <- read_rds("../private/as_sent/cvsh_meta.rds")

# It turns out that the hatchery names are not consistent, so we
# need to fix those. Also, I don't want spaces in those names.

# Note: we need to add a test for spaces in hatchery names to a check() function.
cm2 <- cm %>%
  mutate(
    hatchery = recode(
      hatchery,
      `Coleman National Fish Hatchery` = "Coleman Hatchery",
      `Mokelumne Hatchery` = "Mokelumne River Hatchery",
      `Nimbus River Hatchery` = "Nimbus Hatchery"
    ),
    hatchery = str_replace_all(hatchery, " +", "_")
  )

# then I resaved that (recent versions of readr use `file` rather than the deprecated `path`)
write_rds(cm2, file = "../private/cvsh_metadata.rds", compress = "xz")
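
The note in the code comments above mentions wanting a check for spaces in hatchery names. A minimal base-R sketch of what such a check might look like; the function name `check_hatchery_names()` is hypothetical, and it assumes only that the metadata has a `hatchery` column:

```r
# Hypothetical check: stop with an informative error if any hatchery name
# contains a space; otherwise return the metadata invisibly.
check_hatchery_names <- function(meta) {
  has_space <- grepl(" ", meta$hatchery, fixed = TRUE)
  bad <- unique(meta$hatchery[has_space])
  if (length(bad) > 0) {
    stop(
      "Hatchery names contain spaces: ",
      paste(bad, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(meta)
}

# Toy metadata with names that already use underscores, so the check passes.
meta_ok <- data.frame(
  indiv = c("a", "b"),
  hatchery = c("Coleman_Hatchery", "Nimbus_Hatchery"),
  stringsAsFactors = FALSE
)
check_hatchery_names(meta_ok)
```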

Russian River

# having a look
rg <- read_rds("../private/as_sent/RR_steelhead_geno.rds")

# minor tweaks
rg %>% 
  select(indiv, locus, gene_copy, allele_int) %>%
  arrange(indiv, locus, gene_copy) %>%
  write_rds("../private/rrsh_genotypes.rds", compress = "xz")

# how about metadata?
# It needs ungrouping...
# hatchery names and sex look good
read_rds("../private/as_sent/RR_steelhead_meta.rds") %>%
  ungroup() %>%
  write_rds(file = "../private/rrsh_metadata.rds", compress = "xz")

Week of Memorial Day (May 26--29)

A Minor Revamp of snppit

After looking over Anne and Laura's data about the frequency of iteroparous fish and also the occurrence of individuals that are spawned on multiple days within a year (sometimes under different names), I realized that it might work better to finish implementing snppit support for individuals to belong to multiple spawner groups. Looking over the code I see that it already has support for fish to spawn in multiple years, and it should not take too much to allow individuals to belong to multiple different spawner groups.

I have done this with a few changes to the code. I am doing it in a new branch of the snppit repo called allow-multiple-spawner-groups. Here are the changes that I made: https://github.com/eriqande/snppit/commit/5ccad7b2e619e14436df38a0d4d2086fbbe62d48.

Now that is done...

Still working on testing it, but now the drill for repeat spawners and "matching samples" is to:

So, I must throw down some R code to do that...

Friday, May 29, 2020

Some things we have dealt with:

Stuff to discuss:

  1. Multiple spawner groups and years with matchers.
  2. Cross-hatchery and cross-sex matchers. Note cross-hatchery stuff is tough to fix, so we make extra copies of individuals. There is potential for self-assignment in those cases, so we have named those copies so that it is easy to find them in the output and vet them.
  3. The system and system2 commands. For now we are just going to keep it simple.
  4. Temporary files and directories.
  5. The inst directory and binaries (They don't fly with CRAN!).
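
For items 3 and 4, here is a minimal sketch of how `system2()` and a temporary directory can work together: the command's standard output is written to a file in a fresh temporary directory, read back in, and the directory is removed when the function exits. The helper `run_with_temp_output()` is hypothetical, and the example calls the `Rscript` binary that ships with R only so that it does not depend on any other program being installed.

```r
# Hypothetical helper: run an external command and capture its standard
# output via a file in a temporary directory that is removed afterwards.
run_with_temp_output <- function(command, args = character()) {
  work_dir <- tempfile(pattern = "run-")
  dir.create(work_dir)
  on.exit(unlink(work_dir, recursive = TRUE), add = TRUE)

  out_file <- file.path(work_dir, "stdout.txt")

  # system2() returns the exit status when stdout is redirected to a file
  status <- system2(command, args = args, stdout = out_file, stderr = FALSE)

  list(status = status, output = readLines(out_file, warn = FALSE))
}

# Usage: ask Rscript to evaluate a small expression.
rscript <- file.path(R.home("bin"), "Rscript")
res <- run_with_temp_output(rscript, c("-e", shQuote("cat(1 + 1)")))
```

An exit status of 0 in `res$status` indicates that the command succeeded.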

To Do:



eriqande/HatcheryPedAgree documentation built on Sept. 21, 2023, 7:24 p.m.