ClusterFind: ClusterFind
In iPAC: Identification of Protein Amino acid Clustering

ClusterFind is the main method of the iPAC package. It identifies clusters of mutated amino acids while taking into account the protein structure.

ClusterFind(mutation.data, position.data, method = "MDS", alpha = 0.05, 
			MultComp = "Bonferroni", Include.Culled = "Y", Include.Full = "Y", 
			create.map = "Y", Show.Graph = "Y", Graph.Output.Path = NULL,
			Graph.File.Name = "Map.pdf", Graph.Title = "Mapping", 
			OriginX = min(position.data[, 4]), OriginY = min(position.data[, 5]),
			OriginZ = min(position.data[, 6]))

`mutation.data`	A matrix of 0's (no mutation) and 1's (mutation) where each column represents an amino acid in the protein and each row represents an individual sample (test subject, cell line, etc.). Thus, if column i in row j had a 1, that would mean that the ith amino acid for person j had a nonsynonymous mutation.
`position.data`	A data frame consisting of six columns: 1) Residue Name, 2) Amino Acid number in the protein, 3) Side Chain, 4) X-coordinate, 5) Y-coordinate and 6) Z-coordinate. Please see get.Positions and get.AlignedPositions for further information on how to construct this data frame.
`method`	The approach used to map the protein into a 1D space: either "MDS" or "Linear".
`alpha`	The significance level used in the NMC calculation. Please see Ye et al. for more information.
`MultComp`	The multiple comparisons adjustment used in the NMC calculation. Possible options are "None", "Bonferroni" and "BH". Please see Ye et al. for more information.
`Include.Culled`	If "Y", the standard NMC algorithm will be run on the protein after removing the amino acids for which there is no positional data.
`Include.Full`	If "Y", the standard NMC algorithm will be run on the full protein sequence.
`create.map`	If "Y", a graphical representation of the dimension reduction from 3D to 1D space will be created (though not necessarily displayed).
`Show.Graph`	If "Y", the graph representation will be displayed. Warning: You must be running R in a GUI environment; otherwise an error will occur.
`Graph.Output.Path`	If you would like the picture saved automatically to disk, specify the output directory here. The Graph.File.Name variable must be set as well.
`Graph.File.Name`	If you would like the picture saved automatically to disk, specify the output file name. The Graph.Output.Path variable must be set as well.
`Graph.Title`	The title of the graph to be created.
`OriginX`	If the "Linear" method is chosen, this specifies the x-coordinate part of the fixed point.
`OriginY`	If the "Linear" method is chosen, this specifies the y-coordinate part of the fixed point.
`OriginZ`	If the "Linear" method is chosen, this specifies the z-coordinate part of the fixed point.
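
As a rough illustration of the input shapes described above, the following sketch builds toy versions of mutation.data and position.data. All values and the data frame column names are made up for illustration; real position data would normally come from get.Positions or get.AlignedPositions.

```r
# Toy mutation matrix: 4 samples (rows) by 5 amino acids (columns).
# A 1 in row j, column i means sample j has a mutation at amino acid i.
mutation.data <- matrix(0, nrow = 4, ncol = 5)
mutation.data[1, 2] <- 1
mutation.data[2, 2] <- 1
mutation.data[3, 4] <- 1

# Toy position data with the six columns listed above.
# Column names are illustrative assumptions, not names required by iPAC.
position.data <- data.frame(
  Residue    = c("MET", "THR", "GLU", "LYS"),
  AA.Number  = c(1, 2, 3, 5),
  Side.Chain = c("A", "A", "A", "A"),
  X          = c(10.1, 11.4, 13.0, 15.2),
  Y          = c(4.0, 5.5, 6.1, 8.3),
  Z          = c(-2.0, -1.2, 0.4, 1.9),
  stringsAsFactors = FALSE
)

# Amino acids that appear in the mutation matrix but have no 3D position.
missing <- setdiff(seq_len(ncol(mutation.data)), position.data[, 2])
```

In this toy example amino acid 4 has no positional data, so it is the kind of position that would be reported under MissingPositions.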

The linear method fixes a point, defined by the parameters OriginX, OriginY, OriginZ, and then calculates the distance from each amino acid to that point. The graph produced by ClusterFind (if requested), shows these distances as dotted green lines. The length of the green line is used to reorder the protein, with the amino acid that corresponds to the shortest green line being ordered first and the amino acid corresponding to the longest green line being ordered last.
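
The reordering described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's internal code: the coordinates are made up, and the distance is assumed to be Euclidean.

```r
# Hypothetical coordinates for five amino acids
# (aa, x, y, z mirror columns 2, 4, 5 and 6 of position.data).
pos <- data.frame(
  aa = 1:5,
  x  = c(0, 3, 1, 6, 2),
  y  = c(0, 4, 1, 8, 2),
  z  = c(0, 0, 1, 0, 1)
)

# Fixed point, using the same defaults as OriginX, OriginY and OriginZ.
origin <- c(min(pos$x), min(pos$y), min(pos$z))

# Distance from each amino acid to the fixed point (assumed Euclidean).
d <- sqrt((pos$x - origin[1])^2 +
          (pos$y - origin[2])^2 +
          (pos$z - origin[3])^2)

# Shortest distance first, longest distance last.
new.order <- pos$aa[order(d)]
new.order  # 1 3 5 2 4
```

Here amino acid 1 sits on the fixed point and is ordered first, while amino acid 4 is farthest away and is ordered last.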

Additional methods will be available in future versions of this package.

`Remapped`	This shows the clusters found while taking the 3D structure into account.
`OriginalCulled`	This shows the clusters found if you run the NMC algorithm on the canonical linear protein, but with the amino acids for which we don't have 3D positional data removed.
`Original`	This shows the clusters found if you run the NMC algorithm on the canonical linear protein with all the amino acids.
`MissingPositions`	This shows which amino acids are present in the mutation matrix but for which we do not have positions. These amino acids are cut from the protein when calculating the Remapped and OriginalCulled results.

If no significant clusters are found, a "NULL" will be returned for the appropriate section (Remapped, OriginalCulled, or Original).

If you want the graph displayed on a new R graphics device without saving it to disk, set either the Graph.Output.Path or the Graph.File.Name parameter to NULL while leaving both the create.map and the Show.Graph parameters set to "Y".

If you are running this algorithm on a terminal with no GUI (such as a computational cluster), set Show.Graph to "N" as R will not be able to open a new graphics device. However, you can still have the graphics saved for later viewing by setting the Graph.Output.Path and Graph.File.Name variables.

If ClusterFind displays the message "Error in 0:(n - j) : NA/NaN argument", this most likely signifies that after removing the amino acids for which there is no positional data, the mutation data matrix is all 0's. For instance, if all the mutations occurred on the 5th amino acid in the protein, and we did not have 3D positional information for that amino acid, the mutation data for the remaining positions would be all 0's. An error message in this situation will be displayed in the results section as well. In such cases, the user should run the original NMC algorithm only (see nmc) or select an alternative protein structure.
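
One way to anticipate this situation is to check the mutation matrix yourself before calling ClusterFind. The sketch below uses made-up data and assumes that column i of the mutation matrix corresponds to amino acid number i in column 2 of position.data; the variable names are hypothetical.

```r
# Toy data: 3 samples, 6 amino acids; every mutation falls on amino acid 5.
mutation.data <- matrix(0, nrow = 3, ncol = 6)
mutation.data[, 5] <- 1

# Amino acid numbers with 3D positions (column 2 of position.data in practice).
# Amino acid 5 is deliberately absent here.
known.positions <- c(1, 2, 3, 4, 6)

# Keep only the columns for which positional data exists.
culled <- mutation.data[, known.positions, drop = FALSE]

# If nothing is left, the remapped analysis has no mutations to cluster.
all.zero <- sum(culled) == 0
if (all.zero) {
  message("No mutations remain after removing positions without 3D data.")
}
```

If all.zero is TRUE, running nmc on the full matrix or choosing another structure, as suggested above, is likely the better route.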

When unmapping back to the original space, the end points of the cluster in the mapped space are used as the endpoints of the cluster in the unmapped space.

Ye et al., Statistical method on nonrandom clustering with application to somatic mutations in cancer. BMC Bioinformatics. 2010. doi:10.1186/1471-2105-11-11.

#Load the package that provides ClusterFind, get.Positions and the KRAS data.
library(iPAC)

#Extract the data from a CIF file and match it up with the canonical protein sequence.
#Here we use the 3GFT structure from the PDB, which corresponds to the KRAS protein.
CIF <- "https://files.rcsb.org/view/3GFT.cif"
Fasta <- "https://www.uniprot.org/uniprot/P01116-2.fasta"
KRAS.Positions <- get.Positions(CIF, Fasta, "A")

#Load the mutational data for KRAS. Here the mutational data was obtained from the
#COSMIC database (version 58). 
data(KRAS.Mutations)

#Identify and report the clusters using the default MDS method.
ClusterFind(mutation.data = KRAS.Mutations,
            position.data = KRAS.Positions$Positions,
            create.map = "Y", Show.Graph = "Y")

#Identify and report the clusters using the linear method.
ClusterFind(mutation.data = KRAS.Mutations,
            position.data = KRAS.Positions$Positions,
            create.map = "Y", Show.Graph = "Y", method = "Linear")