wzxhzdk:0

## Background and Links:

- @POG_LRC / @GUcancersci / @bloodwise_uk
    - I'm a postdoc bioinformatician at The Paul O'Gorman (POG) Leukaemia Research Centre (University of Glasgow)
    - ... working for Prof. Mhairi Copland (POG) and Dr. David Vetrie (Wolfson-Wohl Cancer Research Centre)
    - ... on a Bloodwise-funded grant
    - ... into chronic-myeloid leukaemia
- @haematobot
    - Personal mumblings about code / analysis / bioinformatics and seemingly very little else ...
- https://biolearnr.blogspot.com/
    - Even more mumblings

## Preamble

See `https://github.com/russHyde/polyply`

wzxhzdk:1

# Data-Modelling

## Tidy Data and the Normal-Forms {.build}

In tidy data:

- TD1 - Each variable forms a column.
- TD2 - Each observation forms a row.
- TD3 - Each type of observational unit forms a table.
- [TD4 - A key permitting table-joins is present]

See also, Boyce-Codd Normal-Forms and relational-database-design.

- ?? TD5 - A tidy way of encapsulating your nicely decomposed tables
- ?? TD6 - An explicit workflow for combining your tables back together

## Common _Untidy_ Data Structures

Tidy-data / normal-forms in R

- $\downarrow$ duplication
- play nicely with some important things (`ggplot2` etc)

But untidy data-structures are useful if they:

- $\uparrow$ access efficiency
- $\downarrow$ code complexity
- play nicely with other important things

## `Biobase::ExpressionSet`

wzxhzdk:2

wzxhzdk:3

Figure made with `DiagrammeR`

## `Biobase::ExpressionSet` (cont.)

Conversion of the `assayData` to meet tidy-data standards:

wzxhzdk:4

wzxhzdk:5

Doesn't meet tidy-data standards:

- rows correspond to features, columns to samples
- not all variables are in columns (since row-IDs are meaningful)
- entries are the same 'type' of variable

----

Easy fix:

wzxhzdk:6

## But ...

- Matrix representation was more dense
- Lost all encapsulation
- (After modifying featureData / phenoData to match)
- Have to join rather than index
- Have to keep track of multiple data-frames, rather than one data-structure

## That multi-data-frame _thing_

For a reasonably complex project:

- tidy-data / normal-forms mean more data-frames

Wanted:

- a lightweight approach to working with multiple 'conceptually-related' data-frames
- that plays nicely with `tidyverse` verbs
- that feeds into `ggplot2`
- that plays nicely with untidy data-structures I use _all the time_

# `tidygraph` already (sort of) does this

## Graph theory

wzxhzdk:7

## Basics of 'graph theory' speak

A graph is made up of two sets:

- _V_, a set of vertices:
    - aka nodes, actors, ...
- _E_, a set of edges:
    - pairwise relationships between vertices
    - aka interactions, lines, arcs, ...
- Need to store attributes for both nodes and edges

## `tbl_graph` data structure

`tidygraph` is really a wrapper around the package `igraph`

wzxhzdk:8

## `tbl_graph` data structure

wzxhzdk:9

## The `activate` verb

Think of the `tbl_graph` as `list[nodes, edges]`

To modify the contents of a given data-frame, `activate` it:

wzxhzdk:10

# `polyply` and multiple, linked data-frames

## `polyply` {.build}

Aim:

- multiple data-frames in one data-structure
    - $\rightarrow$ class `poly_frame`: extends `list`
    - `poly_frame`: [list[data-frame], merge_fn]
- mutation / filtering
- merging

## Exported functions

- `as_poly_frame`
    - convert a data-structure into a `poly_frame`
- `activate`
    - choose a data-frame from within the `poly_frame`
- `filter`
    - modify the contents of the active data-frame
- `merge`
    - user defined data-frame combiner (default: `reduce(inner_join)(df_list)`)
- others to be added (mutate / select etc)

# Examples

## ExpressionSet Example

wzxhzdk:11

wzxhzdk:12

wzxhzdk:13

## Construct a poly-frame from an ExpressionSet

wzxhzdk:14

## What did we just make?

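As a rough illustration of the idea only: a `poly_frame` was described earlier as a list of data-frames plus a merge function. The sketch below mimics that shape in base R. It is not `polyply`'s implementation or API; `toy_poly`, `frames`, `merge_fn` and all of the toy columns and values are hypothetical names made up for this example, and base `merge()` stands in for an inner join.

```r
# Hypothetical stand-in for a poly_frame: a list of data-frames plus a merge function.
# Illustration in base R only; this is NOT polyply's actual implementation.
toy_poly <- list(
  frames = list(
    # One row per (feature, sample) measurement, i.e. a tidied assay matrix
    assay = data.frame(
      feature_id = c("g1", "g1", "g2", "g2"),
      sample_id  = c("s1", "s2", "s1", "s2"),
      value      = c(5.1, 6.3, 2.2, 2.9),
      stringsAsFactors = FALSE
    ),
    # One row per feature
    features = data.frame(
      feature_id = c("g1", "g2"),
      symbol     = c("ABL1", "BCR"),
      stringsAsFactors = FALSE
    ),
    # One row per sample
    samples = data.frame(
      sample_id = c("s1", "s2"),
      treatment = c("control", "drug"),
      stringsAsFactors = FALSE
    )
  ),
  # Combine all frames pairwise; base merge() is an inner join on shared columns
  merge_fn = function(df_list) Reduce(function(x, y) merge(x, y), df_list)
)

# Merge everything back into a single data-frame
merged <- toy_poly$merge_fn(toy_poly$frames)

# Filter one frame, then merge: only rows for the retained sample survive the join
toy_poly$frames$samples <- subset(toy_poly$frames$samples, treatment == "drug")
merged_drug <- toy_poly$merge_fn(toy_poly$frames)
```

Keeping the frames separate until the final merge means a filter on the small `samples` table is applied once, and the join propagates it to the measurements.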
wzxhzdk:15

## Filter and plot:

wzxhzdk:16

## Filter and plot (cont.)

wzxhzdk:17

## Taxonomy and brains

wzxhzdk:18

## Taxonomies (cont.)

wzxhzdk:19

## Taxonomies (cont.)

wzxhzdk:20

## Taxonomies & brains (cont.)

wzxhzdk:21

# Thanks

russHyde/polyply documentation built on July 13, 2019, 11:05 a.m.