knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

Creating a configuration for a file in wide data format

Frequently, the data are available in tables where the rows represent proteins and the columns represent samples. The example below shows such a table. The first column contains the protein id, while the other columns store the intensities for samples A, B, C, and D.

df <- data.frame(protein_Id = c("tr|A|HUMAN","tr|B|HUMAN","tr|C|HUMAN","tr|D|HUMAN"),
                 Intensity_A = c(100,10000,10,NA),
                 Intensity_B = c(NA, 9000, 20, 100),
                 Intensity_C = c(200,8000,NA,150),
                 Intensity_D = c(130,11000, 50, 50))
df

This table can be converted into a table in the long format using:

table_long <- tidyr::pivot_longer(df, starts_with("Intensity_"),names_to = "Sample", values_to = "Intensity")
table_long
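To make the reshaping step concrete, the following minimal sketch performs the same kind of wide-to-long conversion with base R only, on a smaller two-protein table. The names df_small and table_long_base are introduced here for illustration. The rows come out in a different order than with tidyr::pivot_longer, but the columns and values should match.

```r
# Small wide table: one row per protein, one column per sample.
df_small <- data.frame(
  protein_Id  = c("tr|A|HUMAN", "tr|B|HUMAN"),
  Intensity_A = c(100, 10000),
  Intensity_B = c(NA, 9000)
)

# Columns that hold intensities (same selection as starts_with("Intensity_")).
intensity_cols <- grep("^Intensity_", names(df_small), value = TRUE)

# Stack the intensity columns: one row per protein and sample.
table_long_base <- data.frame(
  protein_Id = rep(df_small$protein_Id, times = length(intensity_cols)),
  Sample     = rep(intensity_cols, each = nrow(df_small)),
  Intensity  = unlist(df_small[intensity_cols], use.names = FALSE)
)

# The long table has (number of proteins) x (number of samples) rows.
stopifnot(nrow(table_long_base) == nrow(df_small) * length(intensity_cols))
table_long_base
```

Missing intensities stay in the long table as NA rows, which keeps the table complete for every protein and sample combination.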

In addition, you will need a table with the sample annotations. In this example, the four samples are assigned to three groups: A, B, and C.

annot <- data.frame(Sample = c("Intensity_A", "Intensity_B", "Intensity_C", "Intensity_D"), Group = c("A","A","B","C"))

Now you can annotate the samples by joining the annotation table to the long table with the intensities.

# Join on the shared column explicitly instead of relying on the default.
table_long <- dplyr::inner_join(annot, table_long, by = "Sample")
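Because an inner join silently drops rows that have no match, it can help to compare the sample names in both tables before joining. The following is a minimal sketch with base R that uses a deliberately incomplete annotation table; the variable names missing_annotation and unused_annotation are illustrative and not part of prolfqua.

```r
# Annotation that deliberately lacks an entry for "Intensity_C".
annot_small <- data.frame(
  Sample = c("Intensity_A", "Intensity_B"),
  Group  = c("A", "B")
)

long_small <- data.frame(
  protein_Id = "tr|A|HUMAN",
  Sample     = c("Intensity_A", "Intensity_B", "Intensity_C"),
  Intensity  = c(100, NA, 200)
)

# Samples that would be dropped by an inner join.
missing_annotation <- setdiff(unique(long_small$Sample), annot_small$Sample)

# Annotated samples that never occur in the data.
unused_annotation <- setdiff(annot_small$Sample, unique(long_small$Sample))

missing_annotation
unused_annotation
```

If missing_annotation is not empty, the annotation table should be completed before the join, otherwise the affected measurements disappear from the analysis.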

We create an AnalysisTableAnnotation and start annotating the data frame, that is, specifying which column contains which type of information.

atable <- prolfqua::AnalysisTableAnnotation$new()
atable$fileName = "Sample"
atable$workIntensity = "Intensity"

The columns identifying the measured features, which are proteins, peptides or precursors, are described using the named list hierarchy. The values of the list are the column names, while the names are arbitrary as long as they are valid R column names. Here we use the same names as the column names.

The list factors is used to point to the columns containing the factors of your analysis (Group).

atable$hierarchy[["protein_Id"]] <- "protein_Id"
atable$factors[["Group"]] <- "Group"
config <- prolfqua::AnalysisConfiguration$new(atable)
analysis_data <- prolfqua::setup_analysis(table_long, config)
lfqdata <- prolfqua::LFQData$new(analysis_data, config)
lfqdata$hierarchy_counts()
smrz <- lfqdata$get_Summariser()
smrz$plot_hierarchy_counts_sample()
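A configuration only works if every column name it refers to exists in the data frame. The sketch below checks this with base R before any prolfqua object is created. The helper missing_columns is hypothetical and not part of prolfqua; its arguments simply mirror the fields set above.

```r
# Hypothetical helper (not part of prolfqua): list the configured column
# names that are absent from the data frame.
missing_columns <- function(data, fileName, workIntensity, hierarchy, factors) {
  configured <- c(fileName, workIntensity, unlist(hierarchy), unlist(factors))
  setdiff(unname(configured), colnames(data))
}

long_small <- data.frame(
  Sample     = c("Intensity_A", "Intensity_B"),
  Group      = c("A", "A"),
  protein_Id = "tr|A|HUMAN",
  Intensity  = c(100, NA)
)

# All configured columns exist, so nothing is reported.
ok <- missing_columns(long_small, "Sample", "Intensity",
                      hierarchy = list(protein_Id = "protein_Id"),
                      factors = list(Group = "Group"))

# A misspelled column name is reported.
bad <- missing_columns(long_small, "Sample", "intensity",
                       hierarchy = list(protein_Id = "protein_Id"),
                       factors = list(Group = "Group"))
ok
bad
```

Only the values of hierarchy and factors are checked against the data, because the list names are labels chosen by the user.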

Creating a configuration for a file in long data format

Given, for example, a Peptide Quantification Report generated by Spectronaut (a table in long format), we demonstrate how to create the configuration required to use it with prolfqua. To do this, an AnalysisTableAnnotation has to be configured, and some fields (fileName, hierarchy, factors, workIntensity) need to be defined. The configuration object describes the columns in the long table so that prolfqua functions know which columns to use. The code below uses a simulated peptide-level table from sim_lfq_data as a stand-in for such a report.

dataLongFormat <- prolfqua::sim_lfq_data(Nprot = 20, PEPTIDE = TRUE)
head(dataLongFormat)

We create a table annotation object and start annotating the data. If you later want to filter on identification quality, you can also set ident_qValue in this AnalysisTableAnnotation to the column holding the q-values; the code below does not set it.

atable <- prolfqua::AnalysisTableAnnotation$new()
atable$fileName = "sample"
atable$workIntensity = "abundance"

The columns identifying the measured features, which are proteins, peptides or precursors, are described using the named list hierarchy. The values of the list are the column names, while the names are arbitrary as long as they are valid R column names. Here we use the same names as the column names.

The list factors is used to point to the columns containing the factors of your analysis (group). The name of a list entry is the name used in figures and legends generated by prolfqua, so a column can be renamed here: for a Spectronaut report, assigning the column "R.Condition" to the entry "Marker" makes "Marker" appear instead of "R.Condition". In this example we keep the column name group. The data.frame can also contain more than one factor.

atable$hierarchy[["proteinID"]] <- "proteinID"
atable$hierarchy[["peptideID"]] <- "peptideID"
atable$factors[["group"]] <- "group"
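The name/value convention of these lists can be illustrated without prolfqua. The sketch below is a base R illustration of the idea, not prolfqua's implementation: a named list maps a display name to an existing column, and the column is relabelled accordingly. The toy table and its column names are made up for this example.

```r
# Toy long table with Spectronaut-style column names (made up for illustration).
raw <- data.frame(
  R.FileName  = c("run1", "run2"),
  R.Condition = c("ctrl", "treated")
)

# Names are the labels to display, values are the existing column names.
factor_map <- list(Marker = "R.Condition")

# Select the referenced columns and relabel them with the list names.
renamed <- setNames(raw[unname(unlist(factor_map))], names(factor_map))
colnames(renamed)
```

Using the same string for name and value, as in the configuration above, keeps the original column name.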

Lastly, we create the AnalysisConfiguration from the table annotation. The function setup_analysis creates, from a data frame in long format, a data.frame compatible with your configuration. We can now run most of the functions in the package using the data and the configuration.

config <- prolfqua::AnalysisConfiguration$new(atable)
analysis_data <- prolfqua::setup_analysis(dataLongFormat, config)

prolfqua::summarize_hierarchy(analysis_data, config)
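As a rough illustration of what a hierarchy summary conveys, the sketch below counts distinct identifiers at each hierarchy level with base R. It assumes that such a summary boils down to counting distinct proteins and peptides; the output of summarize_hierarchy itself may be organised differently. The toy table long_small is made up for this example.

```r
# Toy long table: protein P1 has two peptides, protein P2 has one.
long_small <- data.frame(
  proteinID = c("P1", "P1", "P1", "P2"),
  peptideID = c("pepA", "pepA", "pepB", "pepC"),
  sample    = c("s1", "s2", "s1", "s1")
)

# Number of distinct proteins and of distinct protein/peptide combinations.
n_proteins <- length(unique(long_small$proteinID))
n_peptides <- nrow(unique(long_small[c("proteinID", "peptideID")]))

# Number of distinct peptides per protein.
peptides_per_protein <- tapply(long_small$peptideID, long_small$proteinID,
                               function(x) length(unique(x)))

n_proteins
n_peptides
peptides_per_protein
```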

Now analysis_data and config can be used to create an LFQData instance. This object is the starting point for further analysis.

lfqdata <- prolfqua::LFQData$new(analysis_data, config)

With this, you can, for example, use the get_Summariser method to summarise and visualize the data.

smrz <- lfqdata$get_Summariser()
smrz$plot_hierarchy_counts_sample()

The prolfqua package is described in [@Wolski2022.06.07.494524].

Session Info

sessionInfo()

References



wolski/prolfqua documentation built on May 12, 2024, 10:16 p.m.