knitr::opts_chunk$set(echo = TRUE,
                      collapse = TRUE,
                      comment = "#>")
BiocStyle::markdown()

Introduction

This vignette describes the core code of Prostar and is addressed to developers. It explains how Prostar works as an information system and how it is built. The document is written as a tutorial, to be followed step by step in order to understand the different parts of the Shiny app.

Prostar is designed to be as generic as possible. The great advantage is that it is easier for a developer to implement new processing modules and new pipelines. The counterpart of this genericity is that the core code may be a bit complex to understand, due to the heavy use of variables. This is why the nomenclature is extremely important when coding Prostar.

This is also why this tutorial proceeds step by step.

Introducing Prostar 2.0

A collection of modules

Workflows as Prostar core

Introduction, key advantages

Prostar processes objects structured as a list of items, each of which is a dataset. Generally, when a data processing module is run on such an object, it runs on the last dataset and returns an object with a new dataset appended at the end of the list. This new dataset is the result of the data processing module; it can affect either the quantitative data in the dataset or just its metadata.

Thus, any data processing module developed for Prostar is an analysis unit whose input is a list of n datasets and whose output is a list of n+1 datasets. This is a mandatory requirement.

In practice, Prostar uses the class [QFeatures] as its main object, and each dataset it is composed of is an object of class [SummarizedExperiment].
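
As a quick illustration, the following sketch shows this structure using the example data shipped with QFeatures (the assay names are arbitrary):

library(QFeatures)

# Build a QFeatures object from the example PSM data shipped with the package
data(hlpsms)
obj <- readQFeatures(hlpsms, ecol = 1:10, name = "original")

# Each item of the list is a SummarizedExperiment
class(obj[["original"]])

# A data processing module appends its result as a new dataset at the end
obj <- addAssay(obj, logTransform(obj[["original"]]), name = "processed")
names(obj)  # "original" "processed"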

Data processing modules are very similar to general workflows, as they are composed of a series of steps, each one working on the result of the previous one. To implement this kind of sequential structure in the user interface, one uses 'timelines'.

Timelines to navigate through workflows

The timelines implemented in Prostar have the following characteristics:

The effects of each button will be discussed in the sections related to workflows and processing modules.

Data processing modules

As seen previously, a data processing module is a unit of treatment which takes an object (a list of n datasets) as input and returns an object (a list of n+1 datasets). It is composed of one or more steps, each step using the result of the previous one. Thus, the basic behaviour of a processing module is very similar to that of a workflow: data processing proceeds forward. Steps are also tagged as mandatory or not, and can be skipped when they are not.

At the end of each processing module, the user has to validate it. This action creates the final dataset, adds it to the object received as input and returns the result.

Timeline features

The behaviour of the timeline used in the data processing modules is similar in many respects to the one used in the general workflow.

The 'next' and 'previous' buttons can be enabled or disabled with respect to the current position. Let i in [1, n] be the current position:

| Current position i | Previous button | Next button |
| :------------- | :----------: | -----------: |
| i = 1 (start of the timeline) | Disabled | Enabled, because there is at least one further processing module |
| 1 < i < n | Enabled | Enabled if there are further processing modules and if module i is not mandatory |
| i = n (end of the timeline) | Enabled | Disabled |

Note that the 'Reset' button is always enabled.

Behaviour

The 'prev' button needs two conditions to be TRUE in order to be enabled: there is at least one step backward, and the process is not yet validated (otherwise, the user must use the 'undo' button).

Symmetrically, the 'next' button is enabled only if there is at least one step forward and if the current step is either already done or not mandatory.

The 'reset' button is only enabled while the user fills in the UI and has not yet clicked on the 'validate' button. Its effect is to set all the inputs of the process back to their default values.

If the user goes back to a previous step and validates this step, then the following steps are automatically set to 'undone' and have to be rerun. This guarantees that the steps are always executed in the same order. This feature must be implemented in each module's source code; it cannot be coded in the navigation module (it would create a recursive loop on the listener of the isDone vector).

The 'undo' button is enabled only if the user is on the last step and has validated it. Its action is the same as 'reset' but, in addition, the previous dataset is restored.
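
These rules can be summarized by a small piece of logic. The sketch below is hypothetical (the helper and its arguments are not part of Prostar's code); pos is the current step, and isDone and mandatory are logical vectors like those of the r.nav structure shown later in this vignette:

# Hypothetical helper computing the state of the navigation buttons
buttonStates <- function(pos, isDone, mandatory) {
    n <- length(isDone)
    validated <- isDone[n]  # the process is validated once its last step is done
    list(prev  = pos > 1 && !validated,  # at least one step backward, not yet validated
         nxt   = pos < n && (isDone[pos] || !mandatory[pos]),
         reset = !validated,             # only before clicking 'validate'
         undo  = pos == n && validated)  # last step, already validated
}

buttonStates(pos = 2, isDone = c(TRUE, FALSE, FALSE), mandatory = c(FALSE, TRUE, FALSE))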

Writing your own processing module

One of the main features of Prostar 2.0 is the ability given to a developer to easily write their own data processing module.

Generalities

A process module takes as parameter a dataset of n items and returns a dataset of n + 1 items. The return value of the process p is the last item of the dataset. A process cannot be run several times on its own result.
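
A minimal sketch of this contract, under the QFeatures structure described above (the function and assay names are illustrative, not Prostar's actual API):

library(QFeatures)

# Illustrative skeleton of a process: n items in, n + 1 items out
myProcess <- function(obj) {
    stopifnot(inherits(obj, "QFeatures"))
    last <- obj[[length(obj)]]                           # the process runs on the last item
    result <- normalize(last, method = "center.median")  # any treatment of the data
    addAssay(obj, result, name = "myProcess")            # the result becomes the new last item
}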

Workflows

Prostar implements workflows, which are composed of one or more data processing units called 'data processing modules'. The different workflows available in Prostar are defined in a configuration file. Each workflow is specific to a particular type of data analysis (proteomics, peptidomics, p2p, etc.).

The design of each workflow is the result of xxx. Thus, in a classic run, each data processing module must be run in the right order, from the first one to the last one; the default direction for running a workflow is forward. Data processing modules in a workflow are tagged as mandatory or not, which allows a user to bypass a particular processing module when it is not mandatory.

A data processing module is mandatory in two cases:

Another option is to go back in the workflow timeline to rerun a processing module or to run a skipped one. In both cases, in order to satisfy the general paradigm of Prostar's workflows, when the user goes back and runs a previous process, all the datasets further down the list are deleted and the last one becomes the result of the process just run.

In a more formal way, let:

If no process has been skipped, then n = m and the dataset obj(i) is the result of the process W(i). This is no longer true when processes have been skipped: the only thing one can be sure of is that obj(1) is the result of W(1). So, in the general case, one has:

The first position in the timeline is always validated and not mandatory, as it is a description of the workflow. Thus, there is always at least one processing module further on.

Timeline features

The timeline used in the general workflow behaves like the one described above for the data processing modules: the same rules apply to the 'previous', 'next', 'reset' and 'undo' buttons.

Example

Understanding the source code of workflows

Initialization of a navigation module instance

At start, the code for the UI is composed of a list of several elements used in the navigation module (mod_navigation.R).

r.nav <- reactiveValues(
    name = "test",
    stepsNames = c("Screen 1", "Screen 2", "Screen 3"),  # one label per step
    ll.UI = list(screenStep1 = uiOutput("screen1"),      # one UI per step
                 screenStep2 = uiOutput("screen2"),
                 screenStep3 = uiOutput("screen3")),
    isDone = c(FALSE, FALSE, FALSE),     # whether each step has been validated
    mandatory = c(FALSE, TRUE, FALSE),   # whether each step can be skipped
    reset = FALSE,
    skip = NULL,
    undo = NULL
)

UI

Each process UI is composed of one or more screens, which correspond to the different steps of the process. The navigation between the screens is done by means of buttons placed on the right and on the left of the navigation UI (the 'timeline').

The timeline may take different appearances depending on the style applied. The principle is to have several buttons available:

* the 'Prev' button (on the left) and the 'Next' button (on the right) are always part of the timeline. At the initialization of the process, only the 'Next' button is visible, and it is disabled.

Logics for processes and datasets

Here are the different rules implemented in Prostar's core to deal with the transactions between the data processing modules and the dataset. The UI of Prostar allows the user to navigate between these two lists, for example to rerun a process or to fix a mistake in its parameters. But, to guarantee that the workflow stays consistent with the philosophy of Prostar, some rules are enforced.

The data processing modules and the dataset can be seen as two lists of, respectively, P and N items. Let i and p be the indices of, respectively, the current item in the dataset list and the current process in the workflow.

The different combinations of values for i and p (their relative positions) determine the behaviour of the datasets and processes with regard to the general philosophy of Prostar.

Without any action of the user on the current item or process, Prostar follows the standard workflow:

The user may change either the current item in the dataset or the current process to be run. This leads to several situations.

Default (standard) behaviour

But the user may navigate differently in the process and dataset lists, so different situations may occur. Suppose we are at the end of a standard run; then p = P, i = N and i = p + 1.

Reprocess a previous item of the dataset

Suppose i = N and p = P (the end of the standard workflow). The user can change the current item of the dataset.

In both cases, all the items of the dataset from i+1 to N are deleted, then the new item (produced by the process) is appended to the dataset list.

At the end of this sequence, the current indices must return to their normal values (i = N, p = P). This means that the actual workflow has been modified by the user, but the final result is the same as for a standard run.
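
Assuming the QFeatures structure described earlier, this rule could be sketched as follows (the helper rerunProcess is hypothetical):

library(QFeatures)

# Hypothetical sketch: rerun a process on item i of an object of N items;
# items i+1 .. N are deleted, then the new result is appended.
rerunProcess <- function(obj, i, process, name) {
    obj <- obj[, , seq_len(i)]           # delete the items i+1 .. N
    result <- process(obj[[i]])          # rerun the process on item i
    addAssay(obj, result, name = name)   # the new item becomes the last one
}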

Nomenclature

All the files containing code are stored in the R directory. The module files are prefixed with 'mod_'.

Test modules

Each module in the directory 'R' has a test file, located in 'dev/test_dev'.

Description of the package Prostar

Prostar is built with the [golem] package.

Step 1. A simple Shiny App

Prostar is built with the shiny framework and uses the 'navbar page menu' layout. Previous versions of Prostar used other layouts (such as xxx), but the free space on the screen was too small to hold all the outputs.

First, let us write a simple app.
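
As a starting point, here is a minimal application using this layout (generic shiny code, not Prostar's actual UI):

library(shiny)

# A minimal 'navbar page' application
ui <- navbarPage(
    title = "Prostar",
    tabPanel("Data manager", p("Open or import a dataset here")),
    navbarMenu("Help",
               tabPanel("About", p("About this app")))
)

server <- function(input, output, session) {
    # nothing reactive yet
}

shinyApp(ui, server)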

Workflow files

In each workflow directory, there are several types of files (module source code, watch-module code, miscellaneous code). As these files are dynamically loaded by Prostar (and not all at the same time), it is necessary to identify each file and its function in a unique manner. Two approaches are possible: building a complex hierarchy of directories in which each directory corresponds to a specific function, or naming each source code file with a composite name.

Actually, Prostar's file structure uses the second approach. In the directory of a workflow, the module file names are composed of several strings separated by '_':

* 'mod', the prefix used by the [golem] package to identify modules,
* the name of the workflow (this seems redundant since we are already in the directory of that workflow, but it helps the developer and makes the file names easier to read),
* the name of the process itself.

The files containing the code that launches the server part of each module are prefixed with 'watch_'.
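
For example, in the directory of a hypothetical 'protein' workflow, the files of a 'Normalization' module would be named as follows (illustrative names):

# mod_protein_Normalization.R    source code of the Normalization module
# watch_protein_Normalization.R  code launching the server part of that module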

UI side

The code for the UI side of Prostar is in the file 'R/app_ui.R'. It consists of two main parts: a loading page (div id = 'loading_page') and the UI of Prostar itself ('main_content').

It is based on the navbarPage layout (see xxx) from the shiny package. There are several static menus (and submenus) that are present even if no dataset is loaded:

All of these submenus are implemented as modules: their UI parts are called here and their server parts are called from the server side of Prostar (R/app_server.R).

Server side

The code for the server side of Prostar is in the file 'R/app_server.R'.

Declaration of global reactive variables

Call to server modules

The server parts of all the modules whose UI is declared in the app_ui.R file are called in this file.
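
With golem modules, such calls typically look like the following sketch (the module names are illustrative):

app_server <- function(input, output, session) {
    # Server parts of the static modules whose UI is declared in app_ui.R
    mod_homepage_server("homepage")
    mod_settings_server("settings")
}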

Load a dataset and corresponding modules

Once a dataset is loaded in Prostar (the variable xxx is instantiated), Prostar launches the different modules that are part of the pipeline previously chosen by the user. The list of modules in that pipeline is stored in the file 'config.R'.

Config.R

This file contains the definition of the pipelines, i.e. the list of processing modules included in each pipeline. For example, here is the definition of the protein pipeline.

pipeline.defs <- list(
  protein = c('Filtering',
              'Normalization',
              'Imputation')
)

If a developer wants to implement a new pipeline, they have to complete this definition while respecting the nomenclature of the items of the list. This is very important since Prostar maps these names to source code files.
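
For instance, a hypothetical 'peptide' pipeline could be added as follows (the module names are illustrative and must match existing source files):

pipeline.defs <- list(
  protein = c('Filtering',
              'Normalization',
              'Imputation'),
  peptide = c('Filtering',
              'Normalization')
)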


