For Area = 0:
1. Check with Area = 0 treated as 0: 315 proteins left
2. Check with Area = 0 treated as a missing value: 317 proteins left
Before applying the t-test: some proteins are observed in only one group, which makes the t-test fail. Identify the proteins that have values in both groups and perform the statistical test only on those proteins.
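A minimal sketch of that filter, assuming a long-format data frame with hypothetical columns `protein`, `group` and `area` (these names and the helper `testable_proteins` are illustrative, not taken from the package). It goes slightly beyond "has two groups" and requires at least two non-missing values per group, since `t.test` stops with an error when a group has fewer than two observations:

```r
# Toy long-format data; column names (protein, group, area) are assumptions.
dat <- data.frame(
  protein = c("P1", "P1", "P1", "P1", "P2", "P2", "P3", "P3", "P3", "P3"),
  group   = c("case", "case", "control", "control",
              "case", "case",
              "case", "case", "control", "control"),
  area    = c(10, 12, 20, 22, 5, 6, 7, NA, 9, 11)
)

# Keep a protein only if every group has at least `min_n` non-missing values.
testable_proteins <- function(d, min_n = 2) {
  obs <- d[!is.na(d$area), ]
  counts <- table(obs$protein, obs$group)  # proteins x groups counts
  rownames(counts)[apply(counts >= min_n, 1, all)]
}

keep <- testable_proteins(dat)
dat_testable <- dat[dat$protein %in% keep, ]
```

Here P2 is dropped because it has no control values, and P3 because only one case value is non-missing.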
Applying the t-test: non-transformed data vs log-transformed data
1. Non-transformed data:
(i) t-test for an example protein, IGKC, and its plot
(ii) Perform the t-test for all proteins and extract their p-values
(iii) Use the p.adjust function with the false discovery rate method and keep the proteins with adjusted p-value < 0.05; discard the proteins with higher p-values
(iv) 170 proteins are statistically significant
2. Log-transformed data:
(i) Since there are not many zero intensities across the observations, add a small number (0.001) to the data
(ii) Log-transform the new values, then perform the t-test
(iii) Same procedure as for the non-transformed data; 129 proteins are statistically significant
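The per-protein procedure could be sketched roughly as follows. The data are simulated, and the column names and the function `protein_ttests` are hypothetical; only `t.test`, `p.adjust` and the 0.001 offset come from the notes above. It assumes every protein already has enough values in both groups.

```r
# Simulated long-format data; names and values are illustrative only.
set.seed(1)
dat <- data.frame(
  protein = rep(c("P1", "P2", "P3"), each = 12),
  group   = rep(rep(c("case", "control"), each = 6), times = 3),
  area    = c(rlnorm(6, 3), rlnorm(6, 6),    # P1: large group difference
              rlnorm(12, 4),                 # P2: no difference
              rlnorm(6, 2), rlnorm(6, 2.2))  # P3: small difference
)

# Run Welch's t-test for each protein and return raw and adjusted p-values.
# `offset` is added before log() so that zero areas do not become -Inf.
protein_ttests <- function(d, log_transform = FALSE, offset = 0.001) {
  if (log_transform) d$area <- log(d$area + offset)
  pvals <- sapply(split(d, d$protein), function(x) {
    t.test(area ~ group, data = x)$p.value
  })
  data.frame(
    protein = names(pvals),
    p_value = unname(pvals),
    p_adj   = unname(p.adjust(pvals, method = "fdr")),  # Benjamini-Hochberg
    row.names = NULL
  )
}

raw_res <- protein_ttests(dat)
log_res <- protein_ttests(dat, log_transform = TRUE)

# Proteins kept at an adjusted p-value below 0.05.
significant <- log_res$protein[log_res$p_adj < 0.05]
```

Running the same function with and without `log_transform` gives the two result tables to compare.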
Conclusion: the log transformation is needed. Before the transformation the data are right-skewed; after it, the distribution looks approximately normal with a few outliers. (to do : )
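One way to put a number on "right skewed" is the sample skewness, sketched below. Base R has no built-in skewness function, so `skewness` is a small helper defined here, and the `area` values are made up for illustration rather than taken from the data set.

```r
# Sample skewness (third standardised moment).
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Illustrative values only: evenly spaced on the log scale,
# so right-skewed on the raw scale.
area <- exp(c(1, 2, 3, 4, 5, 6, 7))

skew_raw <- skewness(area)               # positive: long right tail
skew_log <- skewness(log(area + 0.001))  # close to 0 after the transform
```

A histogram or normal Q-Q plot of both versions shows the same thing visually.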
28 Jan
To write up a description (with some individual protein plots) of:
- the effect of applying a log transform to the values
- reliability with and without log transform
- between-group differences with and without log transform
21 Jan 2022
- t-tests / Wilcoxon rank sum with and without log transform
- function to compute statistical tests for each protein
- logistic regression
Going back to the reliability:
7 Jan 2022: Look at the difference in reliability when proteins with Area = 0 are treated as 0 vs when they are treated as missing (NA)
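The two treatments of Area = 0 can be set up as below. The values are invented for a single protein, and the sketch only covers the recoding step, not the reliability measure itself.

```r
# Toy areas for one protein across six samples; values are illustrative.
area <- c(120, 0, 95, 0, 130, 110)

# Option 1: keep Area = 0 as a true zero.
area_zero <- area

# Option 2: treat Area = 0 as "not measured".
area_na <- replace(area, area == 0, NA)

mean_zero  <- mean(area_zero)              # zeros pull the mean down
mean_na    <- mean(area_na, na.rm = TRUE)  # mean of detected values only
n_detected <- sum(!is.na(area_na))         # samples that still contribute
```

With NA coding, a protein can lose enough values that it drops out of a later filter, so the two options may not leave the same number of proteins.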
Look for candidate biomarkers with the following tests:
- t-tests
- Wilcoxon test
- logistic regression
- test of proportion for ANY protein
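For a single protein, the tests other than the t-test might look like the sketch below. The data are made up. Reading the "test of proportion" as a comparison of how often the protein is detected (area > 0) in each group is an assumption, not something stated in these notes.

```r
# Toy data for a single protein; group labels and values are illustrative.
one <- data.frame(
  group = factor(rep(c("control", "case"), each = 8),
                 levels = c("control", "case")),
  area  = c(0, 12, 15, 0, 22, 18, 0, 25,
            20, 35, 16, 40, 28, 0, 45, 38)
)

# Wilcoxon rank sum test; exact = FALSE because the tied zeros rule out
# an exact p-value.
p_wilcox <- wilcox.test(area ~ group, data = one, exact = FALSE)$p.value

# Logistic regression: does log area predict group membership?
fit <- glm(group ~ log(area + 0.001), data = one, family = binomial)
p_logit <- summary(fit)$coefficients[2, "Pr(>|z|)"]

# Test of proportion: is the protein detected (area > 0) equally often
# in both groups? Small counts trigger an approximation warning.
detected <- tapply(one$area > 0, one$group, sum)
n        <- tapply(one$area, one$group, length)
p_prop   <- suppressWarnings(
  prop.test(as.vector(detected), as.vector(n))$p.value
)
```

With counts this small, `fisher.test` on the 2 x 2 detection table is a reasonable alternative to `prop.test`.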
Control for multiple testing
Which proteins discriminate between groups? Which test(s) are most appropriate?
Programming tasks:
8 October improved the import function
1 October 2021 package initialised
A good reference for github and R: https://happygitwithr.com/new-github-first.html
A good reference for writing R packages: https://r-pkgs.org/
A package to import data from Proteome Discoverer, screen and identify proteins of interest
Step 1. Import & wrangle the data