R/help.R

Defines functions getQAText getComparativeProfText getCompCellularText getCompProfilingText getDataAssesmentText getIntroText getDprofilerText

Documented in getComparativeProfText getCompCellularText getCompProfilingText getDataAssesmentText getDprofilerText getIntroText getQAText

#' getDprofilerText
#' 
#' Dprofiler help text
#'
getDprofilerText<-function(){
  
  list(
           column(6,
                  h2("What is Dprofiler ?"),
                  br(),
                  p(style="text-align: justify;",
                    strong("Dprofiler"), " is a novel platform to analyze and ", strong("score"), " the phenotypic information of samples by integrating bulk RNA-Seq and scRNA-Seq data."),
                  br(),
                    p(style="text-align: justify;",
                      "Biospecimens collected under different experimental and technical conditions often exhibit ", 
                      strong("diverse molecular profiles and patterns."), " Each individual sample is an additional source of information for elucidating biological mechanisms, 
                      and analyzing samples individually may even be more informative than mainstream differential expression analysis of the phenotypic conditions of interest."),
                  p(style="text-align: justify;",
                    "Understanding complex patterns of expression profiles is essential to elucidating diseases, and such knowledge can only be generated by using advanced statistical measures and 
                    methods that are capable of computationally modeling and ", strong("separately investigating the molecular profile of each sample"), 
                    ", preferably using external sources of information on the phenotypic behaviour of interest."),
                  p(style="text-align: justify;",
                    "To decipher these complex sample-specific phenotypic profiles within gene expression datasets, we have developed ", strong("Dprofiler.")),
                  p(style="text-align: justify;",
                    "Dprofiler computationally profiles a set of targeted samples by connecting them to reference expression datasets with phenotypic profiles of interest. Building on these reference 
                    profiles of phenotypic groups, Dprofiler evaluates bulk RNA-Seq samples, detects possible anomalies and heterogeneities, and further explores the causes and sources of distinct 
                    phenotypic patterns with the aid of single cell maps and other reference gene expression datasets."),
                  p(style="text-align: justify;",
                    "Users can choose among a variety of algorithms to calculate a ", strong("Membership Score"), " for each sample with respect to the phenotypic profiles of interest. Dprofiler derives reference 
                    phenotypic profiles from submitted datasets, cell types of single cell maps, and conditions from external gene expression datasets. Membership scores are universally interpretable and 
                    indicate the similarity of a sample to these reference profiles."),
                  br(),
                  p("Dprofiler offers its capabilities through three components: "),
                    tags$ul(
                      tags$li(strong("Computational Phenotypic Profiling: "), "Profiling and scoring submitted samples using homogeneous (or pure) subpopulations of phenotypic reference profiles within the same dataset."),
                      tags$li(strong("Cellular Compositional Profiling: "), "Inferring the cellular composition of each sample with a reference scRNA dataset and estimating the fractions of cellular populations of interest."),
                      tags$li(strong("Comparative Phenotypic Profiling: "), "Profiling and scoring samples using phenotypic profiles of a reference bulk expression dataset.")
                    ),
                  p("Dprofiler is based on DEBrowser, a recently developed Shiny (R) application for interactive DE analysis and visualization. DEBrowser couples DESeq2, EdgeR, and Limma 
                    with shiny to update your plot queries in real time and to allow interactive browsing of your DE results. Like DEBrowser, Dprofiler 
                    presents your analysis results through interactive plots."),
                  p("For more information, please visit the Dprofiler documentation at ",a("this link", href = "https://dprofiler-docs.readthedocs.io/en/latest/"),".")
                  ),
           column(4,
                  br(),
                  br(),
                  br(),
                  tags$img(src = "www/images/DprofilerWorkflow.png", width="100%", height= "100%")
                  )
  )
  
}

#' getIntroText
#' 
#' Dprofiler help text
#'
getIntroText<-function(){
  
  list(
    column(8,
      
      ####
      ## Data input section ####
      ####
      
      h3("1. Data input"),
      p("There are three types of input data in Dprofiler: "),
      tags$ul(
        tags$li(strong("Bulk Expression Data: "), "used for profiling the submitted samples and establishing homogeneous reference profiles. Count data is required; uploading metadata is optional."),
        tags$li(strong("scRNA Expression Data Object: "), "used for deconvoluting bulk RNA data and inferring the cellular compositions of the bulk samples. Both the single cell count data and the marker table are optional."),
        tags$li(strong("Reference Bulk Expression Data: "), "used for comparative profiling of the bulk dataset against the phenotypic profiles of reference bulk dataset(s). 
                Both the reference count data and the metadata are optional."),
      ),
      p(style="text-align: justify;",
      "For both submitted bulk and reference bulk expression data, there are two distinct inputs, given in '.txt', '.csv' or '.tsv' format: 
        a Count Data File and a Metadata File, where uploading metadata is optional."),
      p("For the scRNA dataset, however, the user should provide an ", 
        a("ExpressionSet",href = "https://www.bioconductor.org/packages/devel/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf"), 
      " object."),
      p("In addition, users may connect to DolphinMeta using their credentials to import reference bulk expression data from any Dmeta project."),
      h4("1.1 Count Data File"),            
      p(style="text-align: justify;",
      "This input file can be provided for both submitted and reference bulk datasets. It should contain the summarized count 
      results of all samples in the experiment. An example of the expected input format is presented below:"),
      p("Note: the count file of the external reference data should typically include only the genes that are correlated with the phenotypes of interest."),
      tableOutput("countFile"),
      p(style="text-align: justify;",
      "Columns are samples, and rows are the mapped genomic features/genes."),
      h4("1.2 Metadata File"),
      p(style="text-align: justify;",
      "In addition to the count data file, you may also upload a metadata file to correct for 
  batch effects or any other conditions that might affect your results and that you want 
  to account for. To handle these conditions, simply create a metadata file 
  following the example table below:"),
      tableOutput("metaFile"),
      p(style="text-align: justify;",
        "Please note that the metadata file is optional; if your data is not complex, you do not need to upload one. For submitted bulk expression data, only the count data is mandatory. 
        For reference bulk expression data, both count data and metadata are optional."),
      h4("1.3 scRNA Data File"),
      p(style="text-align: justify;",
        "For the scRNA dataset, the user should provide an .rds file containing an ",
        a("ExpressionSet",href = "https://www.bioconductor.org/packages/devel/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf"),
        " whose metadata (pData(`Your ExpressionSet Object`)) should contain the following variables or columns:"),
        tags$ul(
          tags$li(strong(" (i) "), "the sample/donor associated with each barcode"),
          tags$li(strong(" (ii) "), "the total UMI counts of each barcode"),
          tags$li(strong(" (iii) "), "one or more cell annotations or labels for each barcode"),
          tags$li(strong(" (iv) "), "other categorical and numerical variables relevant to barcodes"),
        ),
      p(style="text-align: justify;",
        "In addition, users should provide a marker table associated with the scRNA ExpressionSet object. Typically, the marker table should include the following columns:"),
      tags$ul(
        tags$li(strong("gene: "), "a column with gene names"),
        tags$li(strong("cluster: "), "a column with cluster names"),
        tags$li(strong("Level: "), "a column naming the metadata column of the scRNA ExpressionSet object that contains the cluster names"),
      ),
      p(style="text-align: justify;",
        "Users can employ the ", strong("getReferenceSingleCellRNA"), " function in Dprofiler to generate all files necessary for Compositional Profiling. 
        The input should be a Seurat Object."),
      
      h4("1.4 Start using Dprofiler"),
      
      p("Please click the 'Upload' menu on the left to start using Dprofiler and upload files as instructed above."),
      
      p("If you do not have a dataset to upload, you can use the built-in Psoriasis demo data by clicking the ", strong("Load Demo Psoriasis"), " button, which loads a case study. To view the entire demo data, you can download the files."),
      tags$ol(
        tags$li(a("GSE107871",href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE107871"),": bulk RNA-Seq count data of lesional and non-lesional psoriasis skin samples as well as healthy control samples, processed by the RNA-Seq pipeline of ", a("DolphinNext", href = "https://github.com/UMMS-Biocore/dolphinnext"),"."),
        tags$li(a("E-MTAB-8142",href="https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-8142/"),": reference scRNA-Seq count data of lesional and non-lesional Psoriasis skin samples."),
        tags$li(a("Human Skin RNA Atlas",href="https://dmeta-skin.dolphinnext.com/"),": an external reference dataset of integrated bulk RNA-Seq data from multiple Psoriasis studies across four conditions: healthy, lesional, non-lesional.")
      ),
      
      p(style="text-align: justify;",
      "After successfully uploading your data, you should see a summary of your data in the ", strong('Upload Summary'), " section. 
        To move on to the filtering section, please click ",  strong("Filter"), ". ")
    )
  )
}
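
# Illustrative sketch only (not part of the Dprofiler API): section 1 above
# describes a count table with genes in rows and samples in columns, plus an
# optional metadata table. The helper below checks such inputs before upload.
# The name `checkBulkInputs` and the metadata column name `sample` are
# assumptions made for this example; it also assumes that the first column of
# the count table holds the gene names.
checkBulkInputs <- function(counts, meta = NULL, sample_col = "sample") {
  stopifnot(is.data.frame(counts), ncol(counts) >= 2)
  count_cols <- counts[, -1, drop = FALSE]
  problems <- character(0)
  # Count columns must be purely numeric (no stray spaces or characters)
  if (!all(vapply(count_cols, is.numeric, logical(1)))) {
    problems <- c(problems, "non-numeric values in count columns")
  }
  # Every gene name should appear only once
  if (anyDuplicated(counts[[1]]) > 0) {
    problems <- c(problems, "duplicated gene names")
  }
  # If metadata is supplied, every sample column should be described in it
  if (!is.null(meta)) {
    missing_samples <- setdiff(colnames(count_cols), meta[[sample_col]])
    if (length(missing_samples) > 0) {
      problems <- c(problems, paste("samples missing from metadata:",
                                    paste(missing_samples, collapse = ", ")))
    }
  }
  problems
}
# Usage (hypothetical file names):
#   counts <- read.delim("counts.tsv")
#   meta   <- read.delim("metadata.tsv")
#   checkBulkInputs(counts, meta)   # character(0) when no problem is found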

#' getDataAssesmentText
#' 
#' Dprofiler help text
#'
getDataAssesmentText<-function(){
  list(
    column(8,
      h3("2. Data Processing"),
      p(style="text-align: justify;", "Dprofiler implements data processing routines of DEBrowser."),
      h4("2.1 Filtering"),    
      p(style="text-align: justify;",
        "In this section, you can visualize the changes in your submitted dataset 
      while filtering out low count genes. Choose your filtering criterion from the ", 
      strong("Filtering Methods"), " box located at the center of the screen. 
      Three methods are available:"),
      tags$ul(
        tags$li(strong("Max:"), "Filters out genes whose maximum count across all samples is less 
     than the defined threshold."),
        tags$li(strong("Mean:"), "Filters out genes whose mean count across all samples is less than the defined threshold."),
        tags$li(style="text-align: justify;",
        strong("CPM:"), "First, counts per million (CPM) are calculated as the raw counts divided by the 
      library sizes and multiplied by one million. Then genes are filtered out if their CPM is below the defined 
      threshold in at least the defined number of samples.")
      ),
      withMathJax(),
      p(style="text-align: justify;",
        "The expression cutoff value is determined according to the library size 
      and normalization factors with the formula $$\\text{CPM} = \\frac{\\text{raw counts}}{\\text{library size} \\times \\text{normalization factors}} \\times 10^{6}$$ 
      For example, if the cutoff CPM value is 10 and
      the library size and normalization factors are approximately \\(3 \\times 10^6\\) and 1 for at least 4 samples, 
      then the 10 CPM expression cutoff corresponds to about 30 read counts. 
      Therefore, in this example, features with fewer than 
      30 read counts (10 CPM) in at least 4 samples are considered low expression features
      and are removed before batch effect correction and DE analysis."),
      p("To filter out the low expression counts, please press ", strong("Filter"), "."), 
      
      h4("2.2 Batch Effect Correction and Normalization"),
      p("After filtering low count features, you may continue your analysis with Batch Effect 
      Detection & Correction or jump directly to Computational Profiling."),
      p(style="text-align: justify;",
       "If your metadata file contains batch correction fields, 
      you have the option to conduct ",strong("batch effect correction"), " prior to 
      your analysis. By adjusting the parameters of the ", strong("Options") ," box, you can investigate 
      your dataset further. The parameters of the ", strong("Options"), " box are 
      explained as follows:"),
      tags$ol(
        tags$li(style="text-align: justify;",
        strong("Normalization Method:"), "Dprofiler allows performing normalization 
      prior to batch effect correction. You may choose your normalization method 
      (among MRN, TMM, RLE, upperquartile), or if you don't want to normalize your 
      data you can select none for this item."),
        tags$li(style="text-align: justify;",
          strong("Correction Method:"), "Dprofiler uses ",
          a("ComBat", href = "https://bioconductor.org/packages/release/bioc/html/sva.html"),
          " (part of the SVA Bioconductor package) or ",
          a("Harman", href = "https://www.bioconductor.org/packages/release/bioc/vignettes/Harman/inst/doc/IntroductionToHarman.html"),
          " to adjust for possible batch effects or conditional biases."),
        tags$li(style="text-align: justify;",
         strong("Treatment:"), "Please select the metadata column 
      that specifies the comparison, such as cancer vs control. It is named 
      condition in our sample metadata."),
        tags$li(style="text-align: justify;",
        strong("Batch:"), "Please select the metadata column 
      that differentiates the batches. In our sample metadata, it is called batch.
      Upon clicking the submit button, comparison tables and plots will be created on the right 
      part of the screen."),
      ),
      p("You can investigate the changes in the data by comparing the following features:"),
      tags$ul(
        tags$li("Read counts for each sample"),
        tags$li("PCA, IQR and Density plot of the dataset"),
        tags$li("Gene/region vs samples data")
      ),
      p(style="text-align: justify;",
        "After batch effect correction, the user can choose from three types of analysis by clicking: "),
      tags$ul(
        tags$li(strong("Go To Computational Phenotypic Profiling")),
        tags$li(strong("Go To Cellular Compositional Profiling")),
        tags$li(strong("Go To Comparative Phenotypic Profiling"))
      ),
    )
  )
}
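
# Illustrative sketch only (not the Dprofiler or DEBrowser implementation) of
# the three low-count filters described in section 2.1. The function names
# `computeCPM` and `filterLowCounts` and their arguments are assumptions made
# for this example. The CPM rule follows the wording of the help text: a gene
# is removed when at least `n_samples` samples fall below the CPM threshold.
computeCPM <- function(counts, norm_factors = rep(1, ncol(counts))) {
  counts <- as.matrix(counts)
  lib_sizes <- colSums(counts)
  # CPM = raw counts / (library size * normalization factor) * 10^6
  sweep(counts, 2, lib_sizes * norm_factors, "/") * 1e6
}

filterLowCounts <- function(counts, method = c("Max", "Mean", "CPM"),
                            threshold = 10, n_samples = 1) {
  method <- match.arg(method)
  counts <- as.matrix(counts)
  keep <- switch(method,
    Max  = apply(counts, 1, max) >= threshold,
    Mean = rowMeans(counts) >= threshold,
    CPM  = rowSums(computeCPM(counts) < threshold) < n_samples
  )
  counts[keep, , drop = FALSE]
}
# Worked example from the help text: with a library size of 3 x 10^6 and a
# normalization factor of 1, a raw count of 30 corresponds to
# 30 / (3 x 10^6) x 10^6 = 10 CPM.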

#' getCompProfilingText
#' 
#' Dprofiler help text
#'
getCompProfilingText <-function(){
  list(
    column(6,
      h3("3. An Iterative Algorithm for Computational Phenotypic Profiling of Bulk RNA-Seq Samples"),
      p(style="text-align: justify;",
        "Dprofiler provides methods for calculating membership scores, which are used to profile samples with respect to the phenotypic profiles of interest."),
      p(style="text-align: justify;", 
        "By iteratively removing samples with low scores and repeatedly testing for differentially expressed genes, Dprofiler converges when no low-scored samples are left in the dataset. The final set of samples with high membership scores establishes the 
         homogeneous (pure) reference profiles of these phenotypic groups, which are used to calculate the final score that establishes the profile of 
         each individual sample. In each iteration, Dprofiler"),
      tags$ul(
        tags$li("analyzes the Cook's distances of all genes within each sample,"),
        tags$li(style="text-align: justify;",
                "estimates the ", strong("intra-condition membership scores"), " given the summarized Cook's distances of all genes within each sample,"),
        tags$li("conducts a DE analysis (DESeq2, EdgeR or Limma) on the remaining samples,"),
        tags$li(style="text-align: justify;",
                "estimates the ", strong("cross-condition membership scores"), " (based on either Silhouette or NNLS) of all samples given expression profiles 
                limited to differentially expressed genes,"),
        tags$li("removes samples with low membership scores, and"),
        tags$li("repeats until no more samples have to be removed from the expression dataset.")
      ),
      p("Parameters that are used to conduct Computational Profiling are:"),
      tags$ul(
        tags$li(strong("Score Method:"), " The algorithm to determine membership scores of each individual sample:",
                tags$ul(
                  tags$li(style="text-align: justify;",
                          strong("Silhouette "), "estimates the score via the normalized difference between 
                          average distances to the samples of one condition and average distance to the ones of the other condition."),
                  tags$li(style="text-align: justify;",
                          strong("NNLS "), "estimates the score with a non-negative least squares regression model where each 
                          sample is modeled against the mean expression profiles of samples from each condition.")
                )
        ),
        tags$li(style="text-align: justify;",
                strong("Min. Score:"), " A threshold on membership scores used to identify samples that do not belong to their condition. 
                Samples whose scores are smaller than the threshold are eliminated in each iteration. Reference profiles are calculated from the 
                remaining samples, and in the final step the membership scores of all samples are calculated against these reference profiles. If ", strong('auto'), 
                " is entered instead of a threshold between 0 and 1, the threshold for each condition is calculated from the fraction of samples within 
                each condition."
        ),
        tags$li(strong("DE Selection Method:"), " The protocol for selecting DE genes on each iteration:",
                tags$ul(
                  tags$li(strong("log2FC+Padj:"), "selects DE genes whose log fold changes are higher and adjusted p-values are lower 
                          than predetermined thresholds."),
                  tags$li(strong("Top n Stat:"), "selects the n DE genes with the highest test statistics among all genes."),
                )
        )
      ),
      p(style="text-align: justify;", 
        "Users who wish to analyze the data further should click ", strong("Go to Compositional Profiling"), " button to initiate an RNA deconvolution using 
        the reference scRNA data set, or ", strong("Go to Comparative Profiling"), "to calculate Membership Scores given an external bulk gene expression data set.")
    ),
    column(4,
           br(),
           br(),
           tags$img(src = "www/images/ComputationalProfiling.png", width="100%", height= "100%")
    ),
    column(9,
      
      h4("3.2 DE analysis"),
      p("The goal of differential gene expression analysis is to find genes
  or transcripts whose difference in expression, when accounting for the
  variance within condition, is higher than expected by chance."),
      tags$ul(
        tags$li(
          a("DESeq2", href="https://bioconductor.org/packages/release/bioc/html/DESeq2.html"),
          "is an R package available via Bioconductor, designed to normalize count 
          data from high-throughput sequencing assays such as RNA-Seq and to test for
          differential expression (Love et al., 2014). With multiple parameters such as
          adjusted p-values, log fold changes, and plot styles, altering plots
          created with your DE data can be a hassle as well as time consuming. The
          Differential Expression Browser (DEBrowser) uses DESeq2 (Love et al., 2014)."
        ),
        tags$li(
          a("EdgeR",href="https://bioconductor.org/packages/release/bioc/html/edgeR.html"),
          "(Robinson et al., 2010) and ",
          a("Limma", href="https://bioconductor.org/packages/release/bioc/html/limma.html"),
          "(Ritchie et al., 2015) are the alternative DE methods. These methods are coupled with shiny (Chang, W. et al., 2016)
          to produce real-time changes within your
          plot queries and to allow interactive browsing of your DE results.
          In addition to DE analysis, Dprofiler also offers a variety of other plots
          and analysis tools to help visualize your data even further."),
      ),
      p("Parameters that are used to conduct DE analysis are:"),
      tags$ul(
        tags$li(strong("DESeq2"),
                tags$ol(
                  tags$li(strong("fitType:"), " either 'parametric', 'local', or 'mean' for the type of fitting of 
                          dispersions to the mean intensity. See estimateDispersions for description."),
                  tags$li(strong("betaPrior:"), " whether or not to put a zero-mean normal prior on the non-intercept 
                          coefficients. See nbinomWaldTest for a description of the calculation of 
                          the beta prior. By default, the beta prior is used only for the Wald test, 
                          but can also be specified for the likelihood ratio test."),
                  tags$li(strong("testType:"), " either 'Wald' or 'LRT', which will then use either Wald significance tests 
                          (defined by nbinomWaldTest), or the likelihood ratio test on the difference in 
                          deviance between a full and reduced model formula (defined by nbinomLRT).")
                )
          
        )
      ),
      tags$ul(
        tags$li(strong("EdgeR"),
                tags$ol(
                  tags$li(strong("Normalization:"), " calculate normalization factors to scale the raw library sizes. Values 
                          can be 'TMM','RLE','upperquartile','none'."),
                  tags$li(strong("Dispersion:"), " either a numeric vector of dispersions or a character string indicating 
                          that dispersions should be taken from the data object."),
                  tags$li(strong("testType:"), " exactTest or glmLRT.",
                          tags$ul(
                            tags$li(strong("exactTest:")," computes p-values for differential 
                                    abundance for each gene between two groups, conditioning on the total 
                                    count for each gene. The counts in each group are assumed to follow a 
                                    negative binomial distribution. "),
                            tags$li(strong("glmLRT:")," fits a negative binomial generalized 
                                    log-linear model to the read counts for each gene and conducts 
                                    genewise statistical tests.")
                          )
                  ),
                )
                
        )
      ),
      tags$ul(
        tags$li(strong("Limma"),
                tags$ol(
                  tags$li(strong("Normalization:"), " calculate normalization factors to scale the raw library sizes. Values 
                          can be 'TMM','RLE','upperquartile','none'."),
                  tags$li(strong("Fit Type:"), " fitting method; 'ls' for least squares or 'robust' for robust regression."),
                  tags$li(strong("Norm. Bet. Arrays:"), " normalization Between Arrays; Normalizes expression intensities so that the 
                          intensities or log-ratios have similar distributions across a set of arrays.")
                )
                
        )
      )
    )
  )
}
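
# Illustrative sketch only (not Dprofiler's implementation) of a
# Silhouette-style membership score as described under "Score Method" above:
# the normalized difference between a sample's average distance to the samples
# of its own condition (a) and to the samples of the other condition (b),
# i.e. (b - a) / max(a, b). The function name `silhouetteScores`, the use of
# Euclidean distance, and the raw [-1, 1] range are assumptions; any rescaling
# applied by Dprofiler is not covered here.
silhouetteScores <- function(expr, condition) {
  # expr: genes x samples matrix; condition: one label per sample (column)
  condition <- as.character(condition)
  stopifnot(ncol(expr) == length(condition), length(unique(condition)) >= 2)
  d <- as.matrix(dist(t(expr)))  # Euclidean distances between samples
  n <- ncol(expr)
  scores <- numeric(n)
  for (i in seq_len(n)) {
    same <- condition == condition[i]
    same[i] <- FALSE             # exclude the sample itself
    a <- if (any(same)) mean(d[i, same]) else 0
    others <- setdiff(unique(condition), condition[i])
    # With more than two conditions, use the closest other condition
    b <- min(vapply(others, function(k) mean(d[i, condition == k]), numeric(1)))
    scores[i] <- if (max(a, b) > 0) (b - a) / max(a, b) else 0
  }
  names(scores) <- colnames(expr)
  scores
}
# A sample that sits close to its own condition scores near 1, whereas a
# sample that resembles the other condition scores below 0 and would be a
# candidate for removal in the iterative procedure.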

#' getCompCellularText
#' 
#' Dprofiler help text
#'
getCompCellularText <-function(){
  list(
    column(8,
           h3("4. Cellular Composition Analysis using Reference scRNA Datasets"),
           
           p(style="text-align: justify;",
             "Dprofiler provides methods for deconvolution of bulk RNA expression datasets using annotated reference single cell RNA datasets, where
              the percentage of each cell type or state (i.e. cellular composition) in each bulk RNA sample is estimated. Dprofiler incorporates three
              distinct RNA deconvolution methods to elucidate the cellular compositions of bulk samples:"), 
             tags$ul(
               tags$li(a("MuSiC", href="https://xuranw.github.io/MuSiC/articles/MuSiC.html")),
               tags$li(a("BisqueRNA", href="https://rdrr.io/cran/BisqueRNA/f/vignettes/bisque.Rmd")),
               tags$li(a("SCDC", href="https://meichendong.github.io/SCDC/index.html"))
             ),
           p("Parameters that are used to conduct Cellular Composition Analysis are:"),
           tags$ul(
             tags$li(strong("Select Annotation:"), " the annotation type that specifies the reference profiles, such as cell states, cell types, or sub cell types."),
             tags$li(strong("Identifications:"), " a list of reference cellular profiles specified in Select Annotation (e.g. Keratinocytes, Melanocytes etc.)"),
             tags$li(strong("Samples:"), " the metadata field that specifies samples associated with barcodes/cells (e.g. Patients, Donors etc.)"),
             tags$li(strong("Use All genes ?: "), " whether to use all genes existing in both the bulk and the reference single cell dataset. 
                     If 'No' is selected, additional parameters should be provided:",
                     tags$ul(
                       tags$li(strong("Top N Markers: "), "The top N cell type/state markers (sorted by ", strong("log fold change"), ") selected from the marker table"),
                       tags$li(strong("LogFC: "), "Threshold for the log fold change of markers. Only markers with a log fold change ", strong("larger"), " than the threshold are picked"),
                       tags$li(strong("Padj-value: "), "Threshold for the adjusted p-value of markers. Only markers with an adjusted p-value ", strong("smaller"), " than the threshold are picked"),
                       tags$li(strong("Pct.1: "), "Threshold for the fraction of non-zero counts of each marker within its cell type/state. Only markers with a fraction ", strong("larger"), " than the threshold are picked"),
                       tags$li(strong("Pct.2: "), "Threshold for the fraction of non-zero counts of each marker within the remaining cell types/states. Only markers with a fraction ", strong("smaller"), " than the threshold are picked"),
                     )
             )
           ),
           h4("MuSiC"),
           p(style="text-align: justify;",
             "The MuSiC algorithm employs single cell gene expression profiles to acquire non-negative 
           least squares estimates (Wang et al.). A specific feature of MuSiC allows the proportions of closely related cell types to be correctly estimated. 
           To deal with collinearity, MuSiC employs a tree-guided procedure that recursively zooms in on closely related cell types. Rather than pre-selecting 
           marker genes from scRNA-seq based only on mean expression, MuSiC gives a weight to each gene, allowing for the use of a larger set of genes in 
           deconvolution. The weighting scheme prioritizes consistent genes across subjects by:"), 
           tags$ul(
             tags$li(strong("(i) "), "up-weighting genes with low cross-subject variance (informative genes) and"),
             tags$li(strong("(ii) "), "down-weighting genes with high cross-subject variance (non-informative genes).")),
           p("This requirement on cross-subject consistency is critical for transferring cell type-specific gene expression information from one 
             dataset to another."),
           h4("BisqueRNA"),
           p(style="text-align: justify;",
             "The BisqueRNA algorithm incorporates single cell gene expression profiles learned from multiple donors/samples to train non-negative 
             least squares regression models (Jew et al.). BisqueRNA employs a unique normalization strategy where the algorithm learns the baseline 
             mean and variance of the expression of each gene from the reference single cell data and uses these estimates to normalize the bulk dataset. 
             This methodology ultimately accounts for technical and experimental biases and allows a more accurate estimation of the cellular fractions 
             across submitted bulk RNA-Seq samples. The authors of Bisque have demonstrated that its decomposition accuracy is robust to increasing variation 
             between the generation of the reference profile and bulk expression, which is a significant issue, for example, when comparing snRNA-seq 
             and bulk RNA-seq datasets."),
           h4("SCDC"),
           p(style="text-align: justify;",
             "The SCDC algorithm implements a weighted non-negative least squares regression model with subject-specific maximal variance weights 
             (Dong et al.). SCDC first captures the cross-cell variation for each gene and cell type of every individual. Within-subject variance 
             is calculated from these cross-cell variations and is then used to normalize and scale the single cell read counts before RNA deconvolution. ")
    )
    # column(4,
    #        br(),
    #        br(),
    #        tags$img(src = "www/images/rnadeconvolution.png", width="100%", height= "100%")
    # )
  )
}
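
# Illustrative sketch only (not Dprofiler's implementation) of the marker
# selection parameters described in section 4 (Top N Markers, LogFC,
# Padj-value, Pct.1, Pct.2). The function name `selectMarkers`, the default
# thresholds, and the column names `avg_log2FC`, `p_val_adj`, `pct.1` and
# `pct.2` are assumptions; only `gene` and `cluster` are named in the help text.
selectMarkers <- function(markers, top_n = 10, logfc = 0.25, padj = 0.05,
                          pct1 = 0.1, pct2 = 1) {
  # Keep markers above the LogFC and Pct.1 thresholds and below the
  # Padj-value and Pct.2 thresholds
  keep <- markers$avg_log2FC > logfc & markers$p_val_adj < padj &
    markers$pct.1 > pct1 & markers$pct.2 < pct2
  markers <- markers[keep, , drop = FALSE]
  if (nrow(markers) == 0) return(markers)
  # Sort by decreasing log fold change within each cluster
  markers <- markers[order(as.character(markers$cluster), -markers$avg_log2FC), ,
                     drop = FALSE]
  # Rank markers within their cluster and keep the top N of every cluster
  rank_in_cluster <- ave(seq_len(nrow(markers)), as.character(markers$cluster),
                         FUN = seq_along)
  markers[rank_in_cluster <= top_n, , drop = FALSE]
}
# The genes returned here would then be intersected with the genes of the bulk
# dataset before deconvolution, which corresponds to selecting 'No' for
# "Use All genes ?".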

#' getComparativeProfText
#' 
#' Dprofiler help text
#'
getComparativeProfText <-function(){
  list(
    column(6,
           h3("5. Comparative Profiling using Reference Datasets of Bulk Samples"),
           p(style="text-align: justify;",
             "Dprofiler provides methods for calculating membership scores of the submitted bulk samples given the phenotypic profiles of 
             external reference bulk samples."),
           p("The gene expression profiles of the external reference bulk samples are often limited to genes of interest. Dprofiler therefore
             uses the overlap between the differentially expressed genes of the submitted dataset and the genes profiled in the reference bulk datasets to 
             compute membership scores."),
           p("Parameters that are used to conduct Comparative Profiling are:"),
           tags$ul(
             tags$li(strong("Select Series:"), " name of the dataset (or Series). If the external data is imported from Dmeta, multiple datasets might be available
                     to the users."),
             tags$li(strong("Select Meta:"), " metadata variable for conditions/phenotypes."),
             tags$li(strong("Score Method:"), " the algorithm to determine membership scores of each individual sample:",
                     tags$ul(
                       tags$li(style="text-align: justify;",
                               strong("Silhouette "), "estimates the score via the normalized difference between 
                          average distances to the samples of one condition and average distance to the ones of the other condition."),
                       tags$li(style="text-align: justify;",
                               strong("NNLS "), "estimates the score with a non-negative least squares regression model in which each 
                          sample is modeled against the mean expression profiles of the samples from each condition.")
                     )
             )
           )
    ),
    column(3,
           br(),
           br(),
           br(),
           tags$img(src = "www/images/ComparativeSilhouette.png", width="100%", height= "100%")
    )
  )
}
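
# Illustrative sketch only, not part of Dprofiler's scoring code: the Silhouette
# score described in getComparativeProfText() can be pictured as below.
# Assumptions (not taken from the package): the helper name, Euclidean distance,
# and genes-by-samples reference matrices restricted to the overlapping genes.
exampleSilhouetteScore <- function(sample, refA, refB) {
  # Distance from the query sample to every reference sample (one per column)
  distTo <- function(ref) sqrt(colSums((ref - sample)^2))
  a <- mean(distTo(refA))  # average distance to samples of condition A
  b <- mean(distTo(refB))  # average distance to samples of condition B
  denom <- max(a, b)
  if (denom == 0) return(0)  # sample coincides with every reference sample
  # Normalized difference in [-1, 1]: positive values favour condition A
  (b - a) / denom
}
# Usage (hypothetical data): exampleSilhouetteScore(c(1, 1), refA, refB), where
# refA and refB are numeric matrices with two rows (genes).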

#' getQAText
#' 
#' Dprofiler help text
#'
getQAText<-function(){
  list(
    h3("5. Frequently asked questions (FAQ)"),
    h4("5.1 Why un-normalized counts?"),
    p("DESeq2 requires count data as input, obtained from 
          RNA-Seq or another high-throughput sequencing experiment, 
          in the form of a matrix of integer values. Here, non-integer 
          values are converted to integers so that DESeq2 can be run. The matrix values 
          should be un-normalized, since the DESeq2 model internally corrects for 
          library size. Transformed or normalized values, such as counts 
          scaled by library size, should therefore not be used as input. Please use edgeR 
          or limma for normalized counts."),
    h4("5.2 Why am I getting an error while uploading files?"),
    p("* Dprofiler supports tab-, comma- or semicolon-separated files. However, spaces or non-numeric characters in numeric fields are not supported and cause an error during upload. It is crucial to remove such instances from the files before uploading them."),
    p("* Another cause of errors is the same gene name appearing multiple times. This may occur after opening files in programs such as Excel, which tends to automatically convert some gene names to dates (e.g. SEP9 to SEP.09.2018). This leads to numerous problems, so you need to disable this kind of automatic conversion before opening files in such programs."),
    p("* Some files contain both tabs and spaces as delimiters, which leads to an error. Such files must be cleaned to use a single delimiter before loading."),
    h4("5.3 Why did some columns not show up after upload?"),
    p("If one of your columns contains a space or a non-numeric character in a numeric field, either the column will be dropped or you will get an error. It is therefore crucial to remove such instances from your files before uploading."),
    h4("5.4 Why am I getting an error while uploading CSV/TSV files exported from Excel?"),
    p("* You might be getting an error because the same gene name appears multiple times. This may occur after opening files in programs such as Excel, which tends to automatically convert some gene names to dates (e.g. SEP9 to SEP.09.2018). You therefore need to disable this kind of automatic conversion before opening files in such programs."),
    h4("5.5 Why can't I see all the background data in Main Plots?"),
    p("To improve performance, by default only 10% of the non-significant (NS) genes are used to generate plots. We strongly suggest using all of the NS genes in your plots when publishing your results. You can change this parameter by clicking the ", strong("Main Options"), " button on the left sidebar and setting ", strong("Background Data(%)"), " to 100%."),
    h4("5.6 Why am I getting an error when I click on DE Genes in Go Term Analysis?"),
    p("To start ", strong("Go Term"), " analysis, it is important to select the correct organism in the ", strong("Choose an organism"), " field. After selecting the other desired parameters, click the ", strong("Submit")," button to run the Go Term analysis. You will then be able to see the", strong(" categories")," for your selected gene list in the ", strong("Table")," tab. Once you select a category, you can click the DE Genes button to see the gene list for that category."),
    h4("5.7 How to download selected data from Main plots/QC Plots/Heatmaps?"),
    p("First, set the ", strong("Choose dataset"), " field to ",strong("selected")," under ",strong("Data Options")," in the left sidebar. When you select this option, a new field, ",strong("The plot used in selection"),", will appear under the ", strong("Choose dataset")," field. Specify the plot you are interested in from the following options: Main plot, Main Heatmap, QC Heatmap. Finally, click the ", strong("Download Data"), " button to download the data, or, if you wish to view the selected data, click the ",strong("Tables")," tab.")
  )
}
UMMS-Biocore/dprofiler documentation built on Oct. 16, 2022, 11:37 a.m.