library(knitr)
library(kableExtra)
knitr::opts_chunk$set(echo = TRUE)
library(stringi)
library(rmsutilityr)
library(nprcgenekeepr)
Provided in reverse chronological order
nprcgenekeepr
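Simple metrics for the Original Version (20160816) and Current Version 1.0.3 (20200526).

| Source         | Files | Lines | Code | Blank Lines | Document Lines | Comment Lines |
|----------------|-------|-------|------|-------------|----------------|---------------|
| Original Files | 8     | 3621  | 1927 | 531         | 0              | 1163          |
| Version 1.0.3  | 293   | 13776 | 7775 | 441         | 4613           | 947           |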
Simple metrics for the Original Version (20160816) and Current Version 1.0.3 (20200526) and current version groups of functions.

| Source           | Files | Lines | Code | Blank Lines | Document Lines | Comment Lines |
|------------------|-------|-------|------|-------------|----------------|---------------|
| Original Files   | 8     | 3621  | 1927 | 531         | 0              | 1163          |
| Version 1.0.3    | 293   | 13776 | 7775 | 441         | 4613           | 947           |
| 1.0.3: functions | 163   | 8272  | 3007 | 226         | 4406           | 633           |
| 1.0.3: ui        | 11    | 1782  | 1506 | 98          | 2              | 176           |
| 1.0.3: tests     | 119   | 3722  | 3262 | 117         | 205            | 138           |
Is the tutorial for CRAN or the Consortium users? Need an overview of the tab structure and order (left to right).
I am going to write a separate tutorial explicitly for the consortium members as typical R users would want conventional use of terms and descriptors. For example, Shiny apps are always described as Shiny apps.
# The location of the package source depends on when the report is built.
if (Sys.Date() > as.Date("2020-08-01")) {
  path <- "/Users/msharp/Documents/Development/R/r_workspace/"
} else {
  path <- "/Users/msharp/Documents/Development/R/r_workspace/library/"
}
originalFiles <- stri_c("/Users/msharp/Documents/Projects/Active_Projects/",
                        "nprcgenekeepr_project/20160816_GeneticManagementTools")
functions <- stri_c(path, "nprcgenekeepr/R")
tests <- stri_c(path, "nprcgenekeepr/tests/testthat")
ui <- stri_c(path, "nprcgenekeepr/inst/application")
paths <- c(originalFiles = originalFiles, functions = functions, ui = ui,
           tests = tests)

# Count the lines of each type for every group of source files.
codeCounts <- data.frame()
for (path in paths) {
  counts <- classify_code_lines(files = ".", path = path)
  codeCounts <- rbind(codeCounts, data.frame(
    section = names(paths)[paths == path],
    files = counts["files"],
    lines = counts["lines"],
    code = counts["code"],
    blank_lines = counts["blank_lines"],
    roxygen_doc_lines = counts["roxygen_doc_lines"],
    comments = counts["comments"],
    check.names = FALSE, stringsAsFactors = FALSE))
}

# Compare the original files with the sum over all current version groups.
codeChanges <- data.frame(
  Source = c("Original Files", paste0("Version ", getVersion(date = FALSE))),
  Files = c(codeCounts$files[names(paths) == "originalFiles"],
            sum(codeCounts$files[names(paths) != "originalFiles"])),
  Lines = c(codeCounts$lines[names(paths) == "originalFiles"],
            sum(codeCounts$lines[names(paths) != "originalFiles"])),
  Code = c(codeCounts$code[names(paths) == "originalFiles"],
           sum(codeCounts$code[names(paths) != "originalFiles"])),
  `Blank Lines` = c(codeCounts$blank_lines[names(paths) == "originalFiles"],
                    sum(codeCounts$blank_lines[
                      names(paths) != "originalFiles"])),
  `Document Lines` = c(
    codeCounts$roxygen_doc_lines[names(paths) == "originalFiles"],
    sum(codeCounts$roxygen_doc_lines[names(paths) != "originalFiles"])
  ),
  `Comment Lines` = c(codeCounts$comments[names(paths) == "originalFiles"],
                      sum(codeCounts$comments[names(paths) != "originalFiles"])),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Append the per-group rows (all but the original files) to the summary rows.
save_names <- names(codeChanges)
names(codeChanges) <- names(codeCounts)
sectionCodeChanges <- rbind(codeChanges, codeCounts[-1, ],
                            make.row.names = FALSE)
sectionCodeChanges$section[is.element(sectionCodeChanges$section,
                                      c("functions", "ui", "tests"))] <-
  paste0(getVersion(date = FALSE), ": ",
         sectionCodeChanges$section[is.element(sectionCodeChanges$section,
                                               c("functions", "ui", "tests"))])
names(sectionCodeChanges) <- save_names
names(codeChanges) <- save_names
kable(codeChanges,
      caption = stri_c("Simple metrics for the Original Version (",
                       "20160816) and Current Version ",
                       getVersion(date = TRUE), ".")) %>%
  kable_styling(full_width = FALSE, position = "left") %>%
  collapse_rows(latex_hline = "none") %>%
  column_spec(3, width = "5em")
## version was 0.5.37 (20191108) at the time of writing.
kable(sectionCodeChanges,
      caption = stri_c("Simple metrics for the Original Version (",
                       "20160816) and Current Version ",
                       getVersion(date = TRUE), " and current version groups ",
                       "of functions.")) %>%
  kable_styling(full_width = FALSE, position = "left") %>%
  column_spec(3, width = "5em") %>%
  row_spec(2, hline_after = TRUE)
Progress Since Last Meeting
a. Code changes
    - Added filter to pedigree IDs available for breeding group formation so that only animals at the institution (exit == NA) and animals with recorded birth dates (birth != NA) are potential breeding group members.
b. Documentation Updates
    - Shiny application tutorial first draft is nearly complete. Breeding group formation remains.
c. CRAN submission preparation
    - Have begun using RHUB tools to prepare for the CRAN submission.
    - library(rhub)
    - library(usethis)
    - cran_prep <- check_for_cran()
    - cran_prep$cran_summary()
    - usethis::use_cran_comments()
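The filter in item a can be sketched in a few lines of base R. This is only an illustration, not the package code: the toy pedigree and the helper name `get_breeding_candidates` are hypothetical, and the column names `id`, `birth`, and `exit` are assumed. Note that the notes' shorthand `exit == NA` has to be written with `is.na()` in R, because any comparison with `NA` evaluates to `NA`.

```r
# Hypothetical toy pedigree; the column names id, birth, and exit are assumed.
ped <- data.frame(
  id = c("A1", "A2", "A3", "A4"),
  birth = as.Date(c("2010-05-01", NA, "2012-03-15", "2015-07-04")),
  exit = as.Date(c(NA, NA, "2018-01-10", NA)),
  stringsAsFactors = FALSE
)

# Keep animals still at the institution (no exit date) that have a recorded
# birth date. is.na() is required because exit == NA is always NA in R.
get_breeding_candidates <- function(ped) {
  ped$id[is.na(ped$exit) & !is.na(ped$birth)]
}

candidates <- get_breeding_candidates(ped)
print(candidates)
```

In this toy pedigree A2 is dropped for lacking a birth date and A3 for having an exit date.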
Meeting Notes
a. Mark will fly up on the 19th and leave on the 21st.
b. Need to ensure access to WiFi and video.
c. Finish Shiny tutorial.
d. Recheck all files to ensure animals are deidentified via obfuscation of IDs, birth dates, and exit dates.
e. Plan to have a CRAN submission prior to the November 20, 2019, meeting.
f. Want to develop a better name. This should be done prior to CRAN submission.
g. Matt is to test the LabKey connection.
h. Amanda and Mark will see if Wayne can move forward on Mark's access to PRIMe.
i. Mark will work on Genetic Production, Genetic Value Analysis, and Founders as described in the meeting notes for 20190916.
Production Calculation
Amanda corrected and clarified how to calculate Production using the
a. The Production Status is calculated on September 09, 2019.
b. Births = count of all animals in group born from January 1, 2017, through December 31, 2018, that lived at least 30 days. Animals born after December 31, 2018, are not counted.
c. Dams = count of all females in group that have a birth date on or prior to September 09, 2016.
d. Production = Births / Dams
e. Production Status (color)
    1. Shelter and pens
        1. Red -- < 0.51
        1. Yellow -- >= 0.51 & < 0.54
        1. Green -- >= 0.54
    1. Corrals
        1. Red -- < 0.61
        1. Yellow -- >= 0.61 & < 0.65
        1. Green -- >= 0.65

2. Genetic Production
Percent of females >= 3.5 at the start of the 2 calendar year period that have not produced offspring in the past 2 calendar years. Filter out animals over ?? (start with 18) years old.
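The Production ratio and status colors described under Production Calculation (items d and e) can be sketched as follows. This is a minimal illustration, not the package implementation: the function name `production_status` and the housing labels `shelter_pen` and `corral` are hypothetical, and the counts of births and dams are assumed to have been made already according to items b and c.

```r
# Hypothetical helper: Production = Births / Dams, colored by housing type.
production_status <- function(births, dams,
                              housing = c("shelter_pen", "corral")) {
  housing <- match.arg(housing)
  # Lower bounds of the Yellow and Green bands from the meeting notes.
  cutoffs <- switch(housing,
                    shelter_pen = c(0.51, 0.54),
                    corral = c(0.61, 0.65))
  production <- births / dams
  # right = FALSE makes each band closed on the left: Yellow is
  # >= lower cutoff and < upper cutoff, Green is >= upper cutoff.
  status <- cut(production, breaks = c(-Inf, cutoffs, Inf),
                labels = c("Red", "Yellow", "Green"), right = FALSE)
  data.frame(production = production, status = as.character(status),
             stringsAsFactors = FALSE)
}

production_status(26, 50, housing = "shelter_pen")
production_status(31, 50, housing = "corral")
```

Using `cut()` with explicit breaks keeps the band boundaries in one place, so a change to the thresholds only touches the `cutoffs` vector.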
Genetic Value Analysis

Have Mark look at the algorithm and develop simple options for handling progeny that are missing parental information.
One option is to not calculate genetic value of animals < minParentAge.
Founders

We need to get a better understanding of the issues surrounding the identification of founders and how founders affect the genetic value analysis. As a first step, Mark will develop code to list the founders and the count of male and female founders. See how they affect the genetic value analysis.
In the very short table below x123 has a founder as a dam while x122 does not have a founder for a sire because of the respective values of dam_from_center and sire_from_center.
| id   | sire | sire_from_center | dam  | dam_from_center | birth      |
|------|------|------------------|------|-----------------|------------|
| x123 | s123 | Y                |      | N               | 1988/04/21 |
| x122 |      | Y                | d123 | Y               | 1989/08/18 |
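One way to read the table is sketched below. The rule used here is an assumption drawn only from the example: an unrecorded parent is treated as a founder when the matching `*_from_center` flag is "N". The derived column names `dam_is_founder` and `sire_is_founder` are hypothetical.

```r
# The two example rows from the table; blank parents are stored as NA.
ped <- data.frame(
  id = c("x123", "x122"),
  sire = c("s123", NA),
  sire_from_center = c("Y", "Y"),
  dam = c(NA, "d123"),
  dam_from_center = c("N", "Y"),
  birth = as.Date(c("1988-04-21", "1989-08-18")),
  stringsAsFactors = FALSE
)

# Assumed rule: an unknown parent counts as a founder only when that parent
# did not come from the center (flag is "N").
ped$dam_is_founder <- is.na(ped$dam) & ped$dam_from_center == "N"
ped$sire_is_founder <- is.na(ped$sire) & ped$sire_from_center == "N"

ped[, c("id", "sire_is_founder", "dam_is_founder")]
```

Under this reading x123 has a founder dam, while the unknown sire of x122 is not a founder because `sire_from_center` is "Y".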
Matt Schultz and R Setup for nprcmanager
Mark is to contact Matt to arrange a time to work with him to make sure he gets nprcmanager set up as a repository on his computer.
Mark will also send the interactive tutorial to him and Amanda. Done 20190916.
This was an ad hoc meeting to get clarification on heat map development for genetic diversity reporting. This meeting was a response to questions in a July 18, 2019, email from Mark in response to an email from Amanda with the subject of "genetic diversity reporting questions".
As a result of this meeting, the columns are defined as follows:
1. Are all members of the breeding group Indian-origin rhesus macaques?
2. >= 10% and <= 15% Chinese: proportion of Chinese ancestry is borderline (Yellow); Red > 15%. This information is found under animal history under the genetics main tab -- genetic ancestry.
3. Among all females in the group age >= 3 years, what percentage have a kinship coefficient <= 0.0156 with at least 1 male >= 5 years old in the group?
    1. Denominator = count of females >= 3 years of age
    1. Numerator = count of females with kinship coefficient <= 0.0156 with at least 1 male >= 5 years old in the group.
    1. Thresholds
        1. Red -- < 0.6
        1. Yellow -- >= 0.6 & <= 0.9
        1. Green -- > 0.9
4. Fecundity
    1. Denominator = count of females >= 3 years of age
    1. Numerator = count of births that live > 30 days
    1. Thresholds
        1. Shelter and pens
            1. Red -- < 0.6
            1. Yellow -- >= 0.6 & <= 0.63
            1. Green -- > 0.63
        1. Corrals
            1. Red -- < 0.5
            1. Yellow -- >= 0.5 & <= 0.53
            1. Green -- > 0.53
5. Flagged for genotype or phenotype
    1. Thresholds
        1. Red -- >= 3 group members flagged
        1. Yellow -- < 3 and >= 1 group members flagged
        1. Green -- 0 group members flagged
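The threshold rules for the kinship, fecundity, and flagged columns can be sketched as small helper functions. All function names, the housing labels, and the inputs (a kinship matrix plus `sex` and `age` vectors in the same order) are hypothetical; this is an illustration of the definitions above rather than the heat map code itself.

```r
# Proportion of females (age >= 3) having at least one male (age >= 5)
# in the group with kinship at or below the cutoff.
prop_females_with_unrelated_male <- function(kin, sex, age,
                                             max_kinship = 0.0156) {
  females <- which(sex == "F" & age >= 3)
  males <- which(sex == "M" & age >= 5)
  if (length(females) == 0) return(NA_real_)
  has_male <- vapply(females, function(f) {
    any(kin[f, males] <= max_kinship)
  }, logical(1))
  mean(has_male)
}

# Red < 0.6; Yellow >= 0.6 and <= 0.9; Green > 0.9
kinship_status <- function(p) {
  ifelse(p < 0.6, "Red", ifelse(p <= 0.9, "Yellow", "Green"))
}

# Fecundity bands differ by housing type.
fecundity_status <- function(rate, housing = c("shelter_pen", "corral")) {
  housing <- match.arg(housing)
  cutoffs <- switch(housing,
                    shelter_pen = c(0.6, 0.63),
                    corral = c(0.5, 0.53))
  ifelse(rate < cutoffs[1], "Red",
         ifelse(rate <= cutoffs[2], "Yellow", "Green"))
}

# Red >= 3 flagged; Yellow 1 or 2 flagged; Green 0 flagged
flag_status <- function(n_flagged) {
  ifelse(n_flagged >= 3, "Red", ifelse(n_flagged >= 1, "Yellow", "Green"))
}

# Toy group: two adult females, one adult male, one male too young to count.
ids <- c("f1", "f2", "m1", "m2")
kin <- matrix(0.25, 4, 4, dimnames = list(ids, ids))
kin["f2", "m1"] <- kin["m1", "f2"] <- 0
sex <- c("F", "F", "M", "M")
age <- c(4, 6, 7, 3)
p <- prop_females_with_unrelated_male(kin, sex, age)
kinship_status(p)
```

In the toy group only f2 has an eligible male below the kinship cutoff, so the proportion is 0.5 and the column is Red.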
Joint Working Group Meeting week before Thanksgiving. We will meet with the Breeding Colony Managers Group. November 19-21, 2019.
Work on loop code: have list of animals involved in loops with counts, number of loops, number of animals in loops. Not high priority.
Ballou & Lacy describe genome uniqueness as "the proportion of simulations in which an individual receives the only copy of a founder allele." We have interpreted this as meaning that genome uniqueness should only be calculated for living, non-founder animals. Alleles possessed by living founders are not considered when calculating genome uniqueness.
We have a differing view on this, since a living founder can still contribute to the population. The function below calculates genome uniqueness for all living animals and considers all alleles. It does not ignore living founders and their alleles.
Our results for genome uniqueness will, therefore, differ slightly from those returned by Pedscope. Pedscope calculates genome uniqueness only for non-founders and ignores the contribution of any founders in the population. As a result, Pedscope's genome uniqueness estimates for non-founders may be slightly higher than what this function calculates.
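The quoted definition can be illustrated with a small sketch. This is not the package function referred to above: the name `genome_uniqueness_sketch` and the input layout (one two-column allele matrix per simulation, with one row per living animal, founders included) are assumptions made for the example.

```r
# Hypothetical sketch: proportion of simulations in which an animal holds
# the only copy of an allele among all living animals (founders included).
genome_uniqueness_sketch <- function(sim_alleles) {
  # sim_alleles: list with one element per simulation; each element is a
  # two-column matrix of allele labels with animal IDs as row names.
  ids <- rownames(sim_alleles[[1]])
  holds_sole_copy <- vapply(sim_alleles, function(m) {
    counts <- table(as.vector(m))
    sole_copies <- names(counts)[counts == 1]
    # TRUE for animals carrying at least one allele present exactly once.
    unname(apply(m, 1, function(a) any(as.character(a) %in% sole_copies)))
  }, logical(length(ids)))
  holds_sole_copy <- matrix(holds_sole_copy, nrow = length(ids))
  setNames(rowMeans(holds_sole_copy), ids)
}

# Two toy simulations for three living animals.
animal_ids <- c("a", "b", "c")
sim1 <- matrix(c(1L, 2L,
                 2L, 3L,
                 3L, 3L), ncol = 2, byrow = TRUE,
               dimnames = list(animal_ids, NULL))
sim2 <- matrix(c(2L, 2L,
                 2L, 4L,
                 5L, 5L), ncol = 2, byrow = TRUE,
               dimnames = list(animal_ids, NULL))
gu <- genome_uniqueness_sketch(list(sim1, sim2))
print(gu)
```

Because every living animal's alleles enter the counts, a living founder can both receive a score and reduce the scores of its descendants, which is the behavior described in the paragraphs above.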
Have it in its own gray box
Pull downs to right in one column: Done 20190309
a. Number of groups desired
b. Animals will be grouped with mother that are below age (years): [] Use minimum parent age. <-- default -- Have not figured out how to have minParentAge from Input tab to show up in the BreedingGroupFormation tab. Have dropdown disappear if checkbox above is selected.
c. Animals with kinship above ....
d. Dropdown with
Amanda provided a new logo to use, which represents all of the National Primate Research Centers. This is necessary because the ONPRC parent organization does not allow them to have a logo. Mark will incorporate that logo into the application and add some comments regarding development supported by ONPRC and SNPRC. Added 20190119.
Discussed harem formation and formation of groups with a set sex ratio. Done 20190103.
Discussed what to do about overlapping group formation. Decided to instruct the user to run the Make Groups command again to get a new set of groups. The new set will overlap with the prior results, but will be mutually exclusive within a run. Added to things to do for 20190225 meeting 20190224
Discussed layout of breeding group formation tab by using a mockup tool. Decided to have six workflows selectable in a radio button configuration. The user interface will redraw based on the radio button selected to reflect the user interface elements needed within the selected workflow. Done 20190224
Need to test group formation with a candidate animal not in the pedigree or not in the genetic analysis. Found that this causes an application crash. Done 20190224
Need to test group formation with a candidate animal not in the pedigree or not in the genetic analysis. Found that groups are not formed, but an error is not displayed. The user must remove the false Id to get group formation to work. Done 20190224.
Make an error reporting function that informs the user what has occurred when an animal not in the pedigree or not in the genetic analysis is entered as a seed animal or a candidate animal.
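A minimal sketch of such a check is shown below. The function name `check_animals_known` and its arguments are hypothetical; it only builds the messages, and how they are displayed in the Shiny application is left open.

```r
# Hypothetical helper: report seed or candidate animals that are missing
# from the pedigree or from the genetic value analysis.
check_animals_known <- function(ids, ped_ids, gva_ids, role = "candidate") {
  ids <- unique(ids[!is.na(ids) & nzchar(ids)])
  not_in_ped <- setdiff(ids, ped_ids)
  # Report an animal as missing from the analysis only if it is in the
  # pedigree, so each unknown ID is listed once.
  not_in_gva <- setdiff(setdiff(ids, gva_ids), not_in_ped)
  msgs <- character(0)
  if (length(not_in_ped) > 0) {
    msgs <- c(msgs, paste0("The following ", role,
                           " animals are not in the pedigree: ",
                           paste(not_in_ped, collapse = ", "), "."))
  }
  if (length(not_in_gva) > 0) {
    msgs <- c(msgs, paste0("The following ", role,
                           " animals are not in the genetic value analysis: ",
                           paste(not_in_gva, collapse = ", "), "."))
  }
  msgs
}

msgs <- check_animals_known(c("A", "D", "Z"),
                            ped_ids = c("A", "B", "C", "D"),
                            gva_ids = c("A", "B", "C"),
                            role = "seed")
cat(msgs, sep = "\n")
```

An empty result means every ID is known, so the caller can proceed with group formation; otherwise the messages can be shown to the user instead of failing silently.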
Change "Animals will be ignored below this age" to "Animals will be grouped with the mother below age". Done 20190224.
Change to Animals with kinship above this value will be excluded. Done 20190224.
Change to Include kinship in display of groups. Done 20190224.
Disable breeding group formation tab until genetic value analysis has occurred. Have disabled tab display text explaining that genetic value analysis must be performed first.
Have 6 boxes for seed animals. User can provide seed animals for up to six groups. Done 20190224
The next meeting will be 20190211 at 4 PM Pacific time.
After some discussion, all agreed that the features described in items 4 and 5 are sufficiently important to delay the preparation of the tutorials and subsequent technical paper a few weeks. Thus, Mark will work on getting these implemented in December with plans on working on the tutorials in January.
Ability to form harems. Done 20181230.
Ability to form breeding groups with specified sex ratios. Done 20190103.
One group type is what is currently done by the software (no overlap of animals among any of the groups formed.) These groups will be made up of all of the animals that can be used in groups based on kinship and sex ratio criteria such that none of the groups have any individuals in common.
The second type of group is a collection of sub-groups where each sub-group is a group of the first type. Thus, there is no overlap of animals within any one sub-group of groups, but there is potential overlap among the various sub-groups. This is complicated and hard to follow, so it is illustrated with the list below where each letter represents a specific animal. Overlapping Groups 1-4 have within them sets of animals that have no animals in common with other unique sets within the group.

For example, animal H appears in set 1 of each overlapping group, but it does not appear more than once in any one group. Also, animal E appears in groups 1, 2, and 4 and not in group 3.
library(stringi)

overlapping_groups <- list()
for (i in 1:4) {
  # Draw 15 distinct animals and split them into three non-overlapping sets.
  animals <- sample(LETTERS, 15, replace = FALSE)
  group <- stri_c("Overlap_Group_", i)
  overlapping_groups[[group]] <- data.frame(
    unique_set_1 = animals[1:5],
    unique_set_2 = animals[6:10],
    unique_set_3 = animals[11:15],
    stringsAsFactors = FALSE
  )
}
# Format the three unique sets of each overlapping group for display.
overlapping_str <- vapply(overlapping_groups, function(x) {
  stri_c("Unique Set 1: ", stri_c(x[[1]], collapse = ","), "\n",
         "Unique Set 2: ", stri_c(x[[2]], collapse = ","), "\n",
         "Unique Set 3: ", stri_c(x[[3]], collapse = ","), "\n")
}, character(1))
cat(overlapping_str)
Overlapping Group 4
Amanda and Mark are to develop a better descriptor for the button that causes the pedigree information to be read in, tested, and sent to the pedigree browser function. Mark has changed it to Read and Check Pedigree for now. Done 20181212.
Make a combined logo for Oregon and SNPRC. Have ONPRC on top using blue and green. Check on the University of Oregon website for a color palette. Have ONPRC in the lighter color similar to the SNPRC color scheme. Have the macaque and oval in the same blue color. Clean up the resolution of the Oregon logo as it is currently fuzzy. Some research found the following link OHSU Color Palette.
Alert #C34D36
Below are some relevant articles found from a simple web search with the Google search engine using the search elements: "genetic association studies in pedigrees"

* https://www.stats.ox.ac.uk/~mcvean/gwa4.pdf
* https://dx.doi.org/10.1186%2F1753-6561-8-S1-S26
* https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1459209/
* https://www.ncbi.nlm.nih.gov/pubmed/12687644

11. Not mentioned during the meeting but taken from other correspondence. Mark will contact Daniel Nicolalde (fdnicolalde@primate.wisc.edu; (608) 890-4592; INFORMATICS AND DATA SERVICES) to see if he will assist in getting the LabKey interface working with the Wisconsin National Primate Research Center system.
12. The next meeting is scheduled for November 5, 2018 at 4 PM Pacific Time.
Meeting was canceled.
Topics I wanted to bring up.
"A population must be defined before proceeding to the Genetic Value Analysis".
b. Check box above __Export__ button on _Pedigree Browser_ tab.
"Trim pedigree based on specified population".
Its intent is to reduce the number of animals being examined.
It does reduce the number by removing all individuals not related to the focal animals. Does it change the genetic analysis?
It does not change the genetic analysis for the animals left.
Obviously for animals removed, it does. Done 2018-08-16

I am assuming that this is fine and stated only to keep users aware that if they have parental information it has to be entered.

Will this be a problem that needs to be addressed?
Make sure this is true. I may also want to check for internal inconsistencies.
This should be explained more fully and the program behavior may need to be changed. Currently the program assumes the columns first and second are there and are correctly formed. Currently there is no check to ensure that first_name and second_name data elements are consistent with first and second data elements.
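A possible form for the missing check is sketched below. It assumes, as an interpretation not stated in the notes, that `first` and `second` hold allele codes, that `first_name` and `second_name` hold the matching allele names, and that "consistent" means each code always pairs with the same name and each name with the same code. The function name `check_genotype_consistency` is hypothetical.

```r
# Hypothetical check that allele codes and allele names pair one-to-one.
check_genotype_consistency <- function(genotype) {
  pairs <- unique(data.frame(
    code = c(genotype$first, genotype$second),
    name = c(genotype$first_name, genotype$second_name),
    stringsAsFactors = FALSE
  ))
  # Ignore rows where both the code and the name are missing.
  pairs <- pairs[!(is.na(pairs$code) & is.na(pairs$name)), ]
  # After removing duplicate pairs, a repeated code means one code is tied
  # to several names, and a repeated name means the reverse.
  dup_codes <- unique(pairs$code[duplicated(pairs$code)])
  dup_names <- unique(pairs$name[duplicated(pairs$name)])
  list(consistent = length(dup_codes) == 0 && length(dup_names) == 0,
       codes_with_multiple_names = dup_codes,
       names_with_multiple_codes = dup_names)
}

good <- data.frame(id = c("x1", "x2"),
                   first = c(10001, 10002), second = c(10002, 10003),
                   first_name = c("A", "B"), second_name = c("B", "C"),
                   stringsAsFactors = FALSE)
bad <- good
bad$second_name[1] <- "B2"  # code 10002 now has two different names

check_genotype_consistency(good)$consistent
check_genotype_consistency(bad)$codes_with_multiple_names
```

Returning the offending codes and names, rather than only a logical flag, lets the application tell the user which entries to correct.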
Should we provide a summary of changes made to the input data such as changes in column names, sex identifiers, etc., removal of duplicated rows and columns that are added?
Need name for genotype (first_name, second_name). Need new name for breeders only file type. Reword "Animals without birth dates are not considered." When minimum parent age is included, animals without birth dates are not rejected.
Change update breeding colony (pedigree browser) by adding explanatory text.
These should be stored elsewhere
Test coverage is 86.32%.
Unit test coverage is 84.79%.
Unit test coverage is about 15%.
We have successfully installed and used nprcmanager on Microsoft (MS) Windows 7, MS Windows 10, and MacOS 10.12.6 running R 3.4.1.
I discovered why the plot of genetic uniqueness scores was not as expected. It was from an analysis where 0 was the threshold setting, which I think we should consider removing as an option. I have set the default to 3, which is what Amanda Vinson's paper indicated that ONPRC typically used.
I got the logging system integrated into the package. I am using the package futile.logger (funny name). Note the check box at the bottom of the side panel on the two attached images. When the Debug on check box is checked (it is not checked by default), the application writes to a file in the user's home directory named nprcmanager.log. Currently I am only logging events occurring in the server.R file, because that is where I tend to have most of my problems show up. I have attached an example file in which I turned on logging at the debug level and then read in a couple of pedigree files.
I added code coverage reports to the automated build system running on Travis-CI.org. Currently we only have 5.68% of the code being tested with the unit tests I created thus far. I do not know the code well enough to know what percentage we will have when I feel comfortable, but I know most of the functions have no tests at all.
Can now assign known genotypes to individuals and incorporate that information into the gene dropping routine. This has been done at the function level in the geneDrop function. The pedigree submitted via the UI can optionally contain genetic information.
Added a genetic uniqueness plot to the Summary Statistics tab.
I have connected the Travis CI (continuous integration) tool to my github.com/rmsharp/nprcmanager so that when I check in code it automatically tries to build it from scratch using packages from CRAN. Travis CI is now automatically building nprcmanager without errors or warnings. This puts us considerably closer to being able to offer the package via CRAN if we want.
I met with Jack Kent, Deborah Newman, and Charles Peterson about how to use genetic data in a gene dropping simulation.
From: R. Mark Sharp msharp@TxBiomed.org
Subject: Re: Genetic management of colonies
Date: March 23, 2017 at 5:01:07 PM CDT
To: Jack Kent jkent@txbiomed.org
Cc: Deborah Newman dnewman@txbiomedgenetics.org, Charles Peterson charlesp@txbiomed.org
Jack, Debbie, and Charles,
Thank you for meeting with me. I appreciate your help in clarifying the various stages of the issues associated with providing genetic management guidance via simulation with partially known MHC data.
I will look more closely into what will be needed to add genetic information and gene frequency information into the gene dropping routines in the kinship2 package. The initial implementation will use only a single locus, which should be sufficient for MHC data.
A. First step will be to ensure we get expected proportions of genes in children of individuals when we provide genetic data to all founders using a gene frequency based algorithm.
B. Second step will be to ensure we get expected proportions of genes in children when all parental generations have fully known genotypes.
C. Third step will be to ensure we get expected proportions of genes in children when parental generations have partially known genotypes and all other genotypes are uninformative.
D. Fourth step will be to ensure we get expected proportions of genes in children when parental generations have partially known genotypes and remaining parental genes are determined by gene frequency.
We will not worry about getting rid of genes in the pedigrees since colony managers need only not breed animals that carry unwanted alleles.
I will produce a small number of pedigree drawings using the data Debbie has provided to get feedback on preferences and additional requirements.
I will not deal with paternity and maternity issues within the pedigrees at this time.
Thanks again.
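Step A of the plan in the email above (founder alleles drawn by gene frequency, then passed to offspring by Mendelian inheritance at a single locus) can be sketched in base R. This is an illustration only and does not use or represent the kinship2 or geneDrop code: the function name `gene_drop_once`, the toy pedigree, and the allele frequencies are all hypothetical, and the pedigree is assumed to list parents before their offspring.

```r
# Hypothetical single-locus gene drop. Unknown parents (NA) are treated as
# founders whose contributed allele is drawn from the allele frequencies.
gene_drop_once <- function(ped, allele_freq) {
  n <- nrow(ped)
  alleles <- matrix(NA_character_, nrow = n, ncol = 2,
                    dimnames = list(ped$id, c("paternal", "maternal")))
  for (i in seq_len(n)) {
    for (side in 1:2) {
      parent <- if (side == 1) ped$sire[i] else ped$dam[i]
      if (is.na(parent)) {
        # Founder allele: sample by population gene frequency.
        alleles[i, side] <- sample(names(allele_freq), 1, prob = allele_freq)
      } else {
        # Mendelian transmission: either parental allele with probability 1/2.
        alleles[i, side] <- alleles[parent, sample(2, 1)]
      }
    }
  }
  alleles
}

ped <- data.frame(id = c("sire1", "dam1", "kid"),
                  sire = c(NA, NA, "sire1"),
                  dam = c(NA, NA, "dam1"),
                  stringsAsFactors = FALSE)
allele_freq <- c(A = 0.5, B = 0.3, C = 0.2)

set.seed(42)
one_drop <- gene_drop_once(ped, allele_freq)

# Repeat the drop many times and tabulate the offspring's alleles; the
# proportions should approach the founder gene frequencies.
kid_alleles <- replicate(1000, gene_drop_once(ped, allele_freq)["kid", ])
prop.table(table(kid_alleles))
```

Comparing the tabulated proportions with `allele_freq` is the kind of check step A describes; steps B through D would replace some or all of the sampled founder alleles with known genotypes.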