MQDataReader-class | R Documentation |
This class reads MQ data tables using MQDataReader::readMQ(). It also holds
the internal mapping from raw file names to short raw file names (stored in a member called
'fn_map'), which is used and updated every time MQDataReader::readMQ()
is called.
file |
(Relative) path to a MQ txt file. |
filter |
Searched for "C" and "R". If present, [c]ontaminants and [r]everse hits are removed if the respective columns are present.
E.g. to filter both, |
type |
Allowed values are:
"pg" (proteinGroups) [default]: adds abundance index columns (*AbInd*, replacing 'intensity')
"sm" (summary): splits into three row subsets (raw.file, condition, total)
"ev" (evidence): fixes empty modified.sequence cells for older MQ versions (when MBR is active)
"msms_scans": fixes invalid (negative) scan event numbers
Any other value will not add or modify any columns. |
col_subset |
A vector of column names as read by read.delim(), i.e. spaces are already replaced by dots. If given, only columns with these names (ignoring lower/uppercase) will be returned (regex allowed). E.g. col_subset=c("^lfq.intensity.", "protein.name") |
add_fs_col |
If TRUE and a column 'raw.file' is present, an additional column 'fc.raw.file' will be added with
common prefix AND common substrings removed ( |
check_invalid_lines |
After reading the data, check for an unusual number of NAs to detect whether the file was corrupted by Excel or similar software |
LFQ_action |
[For type=='pg' only] An additional custom LFQ column ('cLFQ...') is created in which zero values in LFQ columns are replaced using one of the following methods, if and only if the corresponding raw intensity is >0 (indicating that LFQ is erroneously 0):
"toNA": replace by NA
"impute": replace by the lowest LFQ value >0 (simulating 'noise') |
... |
Additional parameters passed on to read.delim() |
colname |
Name of the column (e.g. 'contaminants') in the mq.data table |
valid_entries |
Vector of values to be replaced (must contain all values expected in the column – fails otherwise) |
replacements |
Vector of values inserted with the same length as |
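The LFQ_action rule from the argument list above can be illustrated with a minimal sketch. It is not the package's implementation: the helper fix_lfq(), the toy data and the column name 'clfq.intensity.a' are hypothetical and only mirror the documented behaviour (a zero LFQ value is replaced only when the raw intensity is >0).

```r
# Toy protein table; column names and values are made up for illustration.
pg <- data.frame(
  intensity.a     = c(100, 0, 250, 80),
  lfq.intensity.a = c(90,  0,   0, 70)
)

# Hypothetical helper: replace LFQ zeros only where the raw intensity is > 0.
fix_lfq <- function(lfq, raw, action = c("toNA", "impute")) {
  action <- match.arg(action)
  bad <- (lfq == 0) & (raw > 0)        # LFQ is 0 although a signal was measured
  if (action == "toNA") {
    lfq[bad] <- NA                     # mark as missing
  } else {
    lfq[bad] <- min(lfq[lfq > 0])      # lowest positive LFQ value as 'noise'
  }
  lfq
}

lfq_na      <- fix_lfq(pg$lfq.intensity.a, pg$intensity.a, "toNA")
lfq_imputed <- fix_lfq(pg$lfq.intensity.a, pg$intensity.a, "impute")
pg$clfq.intensity.a <- lfq_imputed     # hypothetical name for the custom column
```

Row 2 keeps its zero in both variants because its raw intensity is also 0, so the zero is considered genuine.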
Since MaxQuant changes capitalization and sometimes even column names, it seemed convenient to have a function which just reads a txt file and returns unified column names, irrespective of the MQ version. So, it unifies access to columns (e.g. by using lower case for ALL columns) and ensures columns are identically named across MQ versions:
alternative term          new term
-----------------------------------------
protease                  enzyme
protein.descriptions      fasta.headers
potential.contaminant     contaminant
mass.deviations           mass.deviations..da.
basepeak.intensity        base.peak.intensity
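As a rough illustration of this unification step, the sketch below lowercases column names and then applies the renaming table. The function unify_colnames() and the variable rename_map are hypothetical names chosen for this example; the actual reader may perform the renaming differently.

```r
# Renaming table from the documentation: names = alternative term, values = new term.
rename_map <- c(
  "protease"              = "enzyme",
  "protein.descriptions"  = "fasta.headers",
  "potential.contaminant" = "contaminant",
  "mass.deviations"       = "mass.deviations..da.",
  "basepeak.intensity"    = "base.peak.intensity"
)

# Hypothetical helper: lowercase all names, then swap known alternative terms.
unify_colnames <- function(cn) {
  cn <- tolower(cn)
  hit <- cn %in% names(rename_map)
  cn[hit] <- unname(rename_map[cn[hit]])
  cn
}

unified <- unify_colnames(c("Protease", "Potential.contaminant", "Raw.file"))
print(unified)
```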
We also correct 'reporter.intensity.*' naming issues to the MQ 1.6 convention when 'reporter.intensity.not.corrected' is present.
MQ 1.5 uses: reporter.intensity.X and reporter.intensity.not.corrected.X
MQ 1.6 uses: reporter.intensity.X and reporter.intensity.corrected.X
Note: you must find a regex which matches both versions, or explicitly add both terms if you are requesting only a subset of columns!
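One way to write such a regex is sketched below. The column vectors are made-up examples of the two naming conventions, and the pattern is only a suggestion; it is checked here with base R's grep() rather than through the reader itself.

```r
# Made-up column names following the MQ 1.5 and MQ 1.6 conventions.
cols_mq15 <- c("reporter.intensity.0", "reporter.intensity.not.corrected.0", "proteins")
cols_mq16 <- c("reporter.intensity.0", "reporter.intensity.corrected.0", "proteins")

# One pattern covering plain, 'corrected' and 'not.corrected' reporter columns.
pattern <- "^reporter\\.intensity\\.((not\\.)?corrected\\.)?[0-9]+$"

hits_15 <- grep(pattern, cols_mq15, value = TRUE)
hits_16 <- grep(pattern, cols_mq16, value = TRUE)

# The pattern could then be passed along with other columns, e.g.
# col_subset = c(pattern, "proteins")
```

The alternative mentioned in the note is to list both spellings explicitly in col_subset instead of relying on a single pattern.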
Fixes for msmsScans.txt: negative Scan Event Numbers in msmsScans.txt are reconstructed by using other columns
Automatically detects UTF8-BOM encoding and deals with it (since MQ2.4).
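For illustration only, a UTF-8 byte order mark can be detected by inspecting the first three bytes of a file. The helper has_utf8_bom() is a hypothetical name, and this base-R sketch is not necessarily how the reader handles the BOM internally.

```r
# Hypothetical helper: TRUE if the file starts with the UTF-8 BOM (EF BB BF).
has_utf8_bom <- function(path) {
  first3 <- readBin(path, what = "raw", n = 3L)
  identical(first3, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Write a tiny tab-separated file with a BOM to a temporary location.
tmp <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("id\tproteins\n1\tP12345\n")), tmp)

bom_found <- has_utf8_bom(tmp)

# Base R strips the BOM when the encoding is declared as "UTF-8-BOM".
d <- read.delim(tmp, fileEncoding = "UTF-8-BOM")
```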
Example of usage:
mq = MQDataReader$new()
d_evd = mq$readMQ("evidence.txt", type="ev", filter="R", col_subset=c("proteins", "Retention.Length", "retention.time.calibration"))
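The col_subset entries in this call differ in capitalization from the unified (lowercase) column names. A minimal sketch of how case-insensitive regex matching could select columns is given below; the vector 'available' is a made-up set of column names, and the matching code is an assumption about the behaviour, not the package's source.

```r
# Made-up set of unified column names in an evidence table.
available  <- c("proteins", "retention.length", "retention.time.calibration", "raw.file")
col_subset <- c("proteins", "Retention.Length", "retention.time.calibration")

# Treat every entry as a regex and match it case-insensitively.
keep <- unique(unlist(lapply(col_subset, function(p) {
  grep(p, available, ignore.case = TRUE)
})))
selected <- available[sort(keep)]
print(selected)
```

Because the entries are treated as regular expressions, an unanchored term such as "proteins" would also match a column like "leading.proteins"; anchoring with ^ and $ avoids this.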
If the file is empty, this function shows a warning and returns NULL. If the file is present but cannot be read, the program will stop.
Wrapper to read a MQ txt file (e.g. proteinGroups.txt).
A data.frame of the respective file
Replaces values in the mq.data member with (binary) values.
Most MQ tables contain columns like 'contaminants' or 'reverse', whose values are either empty strings
or "+", which is inconvenient and can be much better represented as TRUE/FALSE.
The params valid_entries and replacements contain the matched pairs, which determine what is replaced with what.
Returns TRUE if successful.
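The pairing of valid_entries and replacements can be sketched with base R's match(). The function substitute_values() and its default arguments are assumptions made for this example; unlike the method described above, it returns the converted vector instead of modifying a member and returning TRUE.

```r
# Hypothetical stand-alone version of the substitution step.
substitute_values <- function(x,
                              valid_entries = c("", "+"),
                              replacements  = c(FALSE, TRUE)) {
  stopifnot(length(valid_entries) == length(replacements))
  idx <- match(x, valid_entries)       # position of each value in valid_entries
  if (anyNA(idx)) {
    stop("Column contains values not listed in 'valid_entries'")
  }
  replacements[idx]                    # look up the paired replacement
}

contaminants <- c("", "+", "", "+")
is_contaminant <- substitute_values(contaminants)

# An unexpected value triggers an error, mirroring the documented failure mode.
unexpected <- tryCatch(substitute_values(c("", "?")), error = function(e) "failed")
```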
getInvalidLines()
Detect broken lines (e.g. due to Excel import+export)
When editing a MQ txt file in Microsoft Excel, saving the file can cause it to be corrupted, since Excel has a single cell content limit of 32k characters (see http://office.microsoft.com/en-001/excel-help/excel-specifications-and-limits-HP010342495.aspx) while MQ can easily reach 60k (e.g. in oxidation sites column). Thus, affected cells will trigger a line break, effectively splitting one line into two (or more).
If the table has an 'id' column, we can simply check that the numbers are consecutive. If no 'id' column is available, we detect line breaks by counting the number of NAs per row and finding outliers. The line break must then be in this line (plus the preceding or following one). Depending on where the break happened, we can also detect both lines right away (if both have more NAs than expected).
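Both checks can be sketched in a few lines of base R. The helpers ids_consecutive() and find_suspect_rows(), the toy table and the outlier rule (more NAs than the median plus a slack of 1) are assumptions chosen for illustration, not the criteria used by getInvalidLines().

```r
# Check 1: an 'id' column should contain consecutive numbers without gaps.
ids_consecutive <- function(ids) {
  !anyNA(ids) && all(diff(ids) == 1)
}

# Check 2: rows with unusually many NAs are suspect.
# The threshold 'median + slack' is an arbitrary choice for this sketch.
find_suspect_rows <- function(tbl, slack = 1) {
  na_per_row <- rowSums(is.na(tbl))
  unname(which(na_per_row > median(na_per_row) + slack))
}

# Toy table: a line break split one record into rows 3 and 4.
tbl <- data.frame(
  a = c(1, 2, 3, NA, 5, 6),
  b = c("x", "y", "z", NA, "v", "w"),
  c = c(0.1, 0.2, NA, NA, 0.5, 0.6),
  d = c(10, 20, NA, NA, 50, 60)
)

ok_ids     <- ids_consecutive(0:5)
broken_ids <- ids_consecutive(c(0, 1, NA, 2))
suspects   <- find_suspect_rows(tbl)
```

In this toy case both halves of the split record have more NAs than expected, so both rows are flagged right away.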
Currently, we have no good strategy to fix the problem: columns are no longer aligned, so they do not have the class (e.g. numeric) they should have. To repair the data, one would need to undo the line break and read the whole file again.
[Solution to the problem: try LibreOffice 4.0.x or above – seems not to have this limitation]
Returns a vector of indices of broken (i.e. invalid) lines.