analyze.MSfile: Meta-analysis of Mass Spectrometry data file/frame


View source: R/AnnoMass.R

Description

Annotates an MS data file with respect to cellular localisation.

Usage

analyze.MSfile(MSfile,Annotation=NULL,Metadata="Christoforou",annotation.ID=1,data.ID=2,markers=3,group_names=NULL,clusters=NULL,output.data="annotated_data.txt",output.cdt="heatmap.cdt",output.roc="roc.pdf",sep="\t",method="kmeans",metric="euclidean",iter.max=100,nstart=1,group=NULL,subset=NULL,sort.by=1,cluster.metadata=FALSE,overlap=NULL)

Arguments

MSfile

data.frame, or character vector of path(s) to the text file(s) with MS data. See Details.

Annotation

data.frame, or character path to the text file with Annotation data. If NULL, the AnnotationAM data is used (see ?AnnotationAM). If a custom annotation file is used, the first column must be the UniProt ID.

Metadata

character vector. Name(s) of the MS data sets present in the package; see ?Metadata.

annotation.ID

integer. Column index of the protein ID (such as UniProt ID) in the Annotation data.frame. IDs must match those in the data.ID column. Default for Annotation=NULL is genename.

data.ID

integer. Column index of the protein ID (such as UniProt ID) in the MSfile(s) or data.frame, respectively. IDs must match those in the annotation.ID column.

markers

integer vector. Column indices of cellular localizations in the Annotation data.frame. If a vector is provided, the annotation is performed separately with respect to each of the indicated columns.

group_names

optional. Character vector with the names of the studies in the MSfile(s). See Details.

clusters

positive integer or NULL. Number of clusters to be created. If NULL, the number of clusters is estimated as nrow(Data)%/%5.

output

character. Prefix of the files where the results will be stored. Output files:
output_table.txt - annotated data with cluster assignment
output_javatree.cdt - heatmap for Java TreeView
output_pr.pdf - precision-recall curves, see ?plot.rocAM
output_pr_abs.pdf - number of assigned proteins against cluster precision

sep

character. The character that delimits fields in the text file(s). The default is tab-delimited text.

method

character. Clustering algorithm to be used. Acceptable values are "kmeans", "pam" or "order". Pam (partitioning around medoids) is more robust, but can be time consuming. "crude" just orders the data with respect to the first column (splitting ties in the opposite direction with respect to the following columns) and divides the sequence into the desired number of clusters. Useful for simple fractionation methods; see the Examples section.

metric

character. Metric to be used for building the distance matrix for clustering. Acceptable values for "pam" are "euclidean", "manhattan" or "correlation". For "kmeans", see the "method" argument in ?Kmeans.

iter.max

integer. The maximum number of iterations allowed.

nstart

integer. How many random sets should be chosen.

group

vector of positive integers, 0 or NULL. Which studies should be used for clustering. If NULL, all studies are used; if 0, none are used and only the Metadata are clustered.

subset

vector of integers or NULL. Lines in the data.frame (see ?get.data) used for clustering. If NULL, all lines are used.

sort.by

integer. The annotation (see the markers argument) by which the output should be sorted. This marker set will be exported as the reference into the gene description of the cdt file.

cluster.metadata

logical. Should the Metadata be clustered?

overlap

integer or NULL. If NULL, only proteins detected in all studies and Metadata will be used for clustering. If an integer (k), only proteins detected in at least k of the studies that are used for clustering (see the cluster.metadata and group arguments) will be used.
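The way overlap, clusters and method interact can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy matrix, the helper filter_by_overlap and the use of stats::kmeans as a stand-in for the clustering call are assumptions made for illustration, and Metadata is ignored. The ordering-based split is one possible reading of the description under method; in particular, the ascending sort on the first column is an assumption.

```r
# Toy data (hypothetical): rows are proteins, columns are studies, NA = not detected.
set.seed(42)
x <- matrix(runif(60), ncol = 3,
            dimnames = list(paste0("P", 1:20), c("studyA", "studyB", "studyC")))
x[1, 3]   <- NA  # P1 detected in two studies
x[2, 2:3] <- NA  # P2 detected in one study
x[3, ]    <- NA  # P3 not detected at all

# Hypothetical helper mimicking the overlap rule:
# NULL keeps proteins detected in every study, an integer k keeps those
# detected in at least k studies.
filter_by_overlap <- function(m, overlap = NULL) {
  k <- if (is.null(overlap)) ncol(m) else overlap
  m[rowSums(!is.na(m)) >= k, , drop = FALSE]
}

kept_all <- filter_by_overlap(x)               # complete rows only
kept_two <- filter_by_overlap(x, overlap = 2)  # at least two studies

# clusters = NULL: the number of clusters is estimated as nrow(Data) %/% 5
n_clusters <- nrow(kept_all) %/% 5

# k-means with iter.max and nstart (stats::kmeans used as a stand-in)
fit <- kmeans(kept_all, centers = n_clusters, iter.max = 100, nstart = 1)

# Ordering-based split: sort by the first column (ascending, an assumption),
# break ties in the opposite direction on the following columns,
# then cut the ordered sequence into n_clusters consecutive runs.
ord <- order(kept_all[, 1], -kept_all[, 2], -kept_all[, 3])
crude <- integer(nrow(kept_all))
crude[ord] <- ceiling(seq_along(ord) * n_clusters / length(ord))
```

With overlap = NULL the rows containing any NA are dropped before clustering, which is why a small integer overlap can retain noticeably more proteins when several studies are combined.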

Details

The data are clustered and annotated with respect to the chosen annotation(s); see ?get.clusters for the meaning of "assigned_location" and "main_component". Data in the output files (_table and _treeview) are sorted first with respect to the assigned_location (in the order given in the data(levelsC) data.frame), and ties are split first with respect to the number of annotations in the clusters and then with respect to the precision_assigned_location score. The values of the markers[sort.by] annotation are used.
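The sort order described above can be sketched in base R. The data frame, the column name n_annotations and the vector location_levels (standing in for the order held in data(levelsC)) are hypothetical, and the descending direction of both tie-breakers is an assumption; the package may order them differently.

```r
# Hypothetical cluster summary; n_annotations is an illustrative column name.
res <- data.frame(
  cluster = 1:5,
  assigned_location = c("Nucleus", "Cytosol", "Nucleus", "Mitochondrion", "Cytosol"),
  n_annotations = c(4, 7, 4, 3, 2),
  precision_assigned_location = c(0.60, 0.90, 0.85, 0.70, 0.95),
  stringsAsFactors = FALSE
)

# Stand-in for the location order stored in data(levelsC)
location_levels <- c("Cytosol", "Mitochondrion", "Nucleus")

# Primary key: position of the assigned location in the reference order.
loc_rank <- as.integer(factor(res$assigned_location, levels = location_levels))

# Ties: more annotations first, then higher precision first (both assumed).
ord <- order(loc_rank, -res$n_annotations, -res$precision_assigned_location)
sorted <- res[ord, ]
```

Here clusters 1 and 3 share both location and annotation count, so only the precision score separates them.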

Value

Object of class AnnoMass

Examples

library(MetaMass)

## See the vignette and the corresponding Analysis option for each example for an explanation

###################################################
### example 1
###################################################
file2<-system.file("extdata","Data_Fig2a.txt",package="MetaMass")
analyze.MSfile(MSfile = file2, overlap=2, output = "Fig2a")
## clusters with respect to MSfile only (cluster.metadata=FALSE by default)



###################################################
### example 2
###################################################
file2<-system.file("extdata","Data_Fig2a.txt",package="MetaMass")
analyze.MSfile(MSfile = file2, overlap=2, output = "Fig2acurves", markers = c(3:8))


###################################################
### example 3
###################################################
file2<-system.file("extdata","Data_Fig2a.txt",package="MetaMass")
analyze.MSfile(MSfile = file2, overlap=2, output = "Fig2aUniGOoverlap", markers = 4)


###################################################
### example 4
###################################################
study4<-system.file("extdata","Carvalho.txt",package="MetaMass")
analyze.MSfile(MSfile = study4, overlap=1, output = "study4", markers = 8)


###################################################
### example 5
###################################################
study4_9_10<-system.file("extdata",c("Carvalho.txt","Bileck.txt","Thakar.txt"),package="MetaMass")
analyze.MSfile(MSfile = study4_9_10,  overlap=2, output = "study4910")

stuchly/MetaMass documentation built on Nov. 14, 2019, 10:58 p.m.