UPMASKdata: Run UPMASK in a data frame
In UPMASK: Unsupervised Photometric Membership Assignment in Stellar Clusters


UPMASKdata executes the UPMASK method on a data frame, and returns another data frame as output, including the membership analysis result as additional columns.

UPMASK is a method for performing membership assignment in stellar clusters. The distributed code is prepared to use photometry and spatial positions, but it can take other types of data into account as well. The method can accommodate arbitrary error models (the user must rewrite the takeErrorsIntoAccount function), and it is unsupervised, data-driven, physical-model-free and relies on as few assumptions as possible. The membership assessment is based on an iterative process, dimensionality reduction, a clustering algorithm and a kernel density estimation.
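To make the dimensionality reduction and clustering steps more concrete, the following is a minimal sketch using only base R (`prcomp` and `kmeans`) on synthetic data. It is not the package's internal code: the synthetic matrix `phot`, the variable names, and the rule of choosing the number of clusters as the number of stars divided by the stars per cluster are assumptions made for illustration.

```r
set.seed(42)

# Synthetic "photometry": 200 stars, 6 measured quantities (illustrative only)
phot <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)

# Dimensionality reduction: PCA on centred and scaled data, keeping 4 components
pca <- prcomp(phot, center = TRUE, scale. = TRUE)
reduced <- pca$x[, 1:4]

# Clustering: k-means with roughly 25 stars per cluster on average
# (assumed rule: number of clusters = number of stars / stars per cluster)
starsPerClust <- 25
k <- max(1, round(nrow(reduced) / starsPerClust))
clust <- kmeans(reduced, centers = k, nstart = 50)

# Number of stars assigned to each k-means cluster
table(clust$cluster)
```

In this sketch the values 4, 25 and 50 mirror the defaults of `nDimsToKeep`, `starsPerClust_kmeans` and `nstarts_kmeans` in the usage below.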

UPMASKdata(dataTable, positionDataIndexes=c(1,2),
photometricDataIndexes=c(3,5,7,9,11,19,21,23,25,27),
photometricErrorDataIndexes=c(4,6,8,10,12,20,22,24,26,28), threshold=1, 
classAlgol="kmeans", maxIter=25, starsPerClust_kmeans=25, nstarts_kmeans=50, 
nRuns=8, runInParallel=FALSE, paralelization="multicore", independent=TRUE, 
verbose=FALSE, autoCalibrated=FALSE, considerErrors=FALSE, 
finalXYCut=FALSE, nDimsToKeep=4, dimRed="PCA", scale=TRUE)

`dataTable`	a data frame with the data to perform the analysis
`positionDataIndexes`	an array of integers indicating the columns of the data frame containing the spatial position measurements
`photometricDataIndexes`	an array of integers with the column numbers containing photometric measurements (or any other measurement to go into the PCA step)
`photometricErrorDataIndexes`	an array of integers with the column numbers containing the errors of the photometric measurements
`threshold`	a double indicating the thresholding level for the random field analysis
`classAlgol`	a string indicating the type of clustering algorithm to consider. Only k-means is implemented at this moment (defaults to kmeans)
`maxIter`	an integer with the maximum number of iterations of the outer loop before giving up on convergence (usually it is not necessary to modify this)
`starsPerClust_kmeans`	an integer with the average number of stars per k-means cluster
`nstarts_kmeans`	an integer with the number of random re-initializations of the k-means clustering method (usually it is not necessary to modify this)
`nRuns`	an integer with the total number of individual runs (outer loop runs) to execute
`runInParallel`	a boolean indicating if the code should run in parallel
`paralelization`	a string with the type of parallelization to use. It can be "multicore" or "MPIcluster". At this moment only "multicore" is implemented (defaults to multicore).
`independent`	a boolean indicating if non-parallel runs should be completely independent
`verbose`	a boolean indicating if the output to screen should be verbose
`autoCalibrated`	a boolean indicating if the number of random field realizations for the clustering check in the position space should be autocalibrated (experimental code, defaults to FALSE).
`considerErrors`	a boolean indicating if the errors should be taken into account
`finalXYCut`	a boolean indicating if a final cut in the XY space should be performed (defaults to FALSE)
`nDimsToKeep`	an integer with the number of dimensions to consider (defaults to 4)
`dimRed`	a string with the dimensionality reduction method to use (defaults to PCA; the only other options are LaplacianEigenmaps or None)
`scale`	a boolean indicating if the data should be scaled and centered
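
The three index arguments are plain column numbers, so for a table with a different layout they can be derived from the column names instead of being typed by hand. The sketch below assumes a hypothetical data frame in which each band (`U`, `B`, `V`) is followed by its error column (`dU`, `dB`, `dV`); these names are illustrative and are not those of the example file shipped with the package.

```r
# Hypothetical column layout: positions, then value/error pairs per band
cols <- c("x", "y", "U", "dU", "B", "dB", "V", "dV")
df <- as.data.frame(matrix(0, nrow = 1, ncol = length(cols),
                           dimnames = list(NULL, cols)))

bands <- c("U", "B", "V")

# Look up column numbers by name
posIdx     <- match(c("x", "y"), names(df))
photIdx    <- match(bands, names(df))
photErrIdx <- match(paste0("d", bands), names(df))

# Guard against misspelled or missing columns before running the analysis
stopifnot(!anyNA(c(posIdx, photIdx, photErrIdx)))
```

The resulting vectors can then be passed as `positionDataIndexes`, `photometricDataIndexes` and `photometricErrorDataIndexes`.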

A data frame with the original data used to run the method and additional columns indicating the classification at each run, as well as a membership probability in the frequentist sense.
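
As an illustration of what a membership probability "in the frequentist sense" can mean, the sketch below computes, for each star, the fraction of runs in which it was classified as a member. The matrix `runs`, its column names and the 0/1 coding are assumptions made for this example; they are not taken from the actual output of the function.

```r
# Hypothetical classifications for 5 stars over 4 runs (1 = member, 0 = field)
runs <- cbind(run1 = c(1, 1, 0, 0, 1),
              run2 = c(1, 0, 0, 1, 1),
              run3 = c(1, 1, 0, 0, 0),
              run4 = c(1, 1, 0, 0, 1))

# Frequentist membership probability: fraction of runs flagging each star
prob <- rowMeans(runs)

# Example selection of likely members with an arbitrary cut at 0.5
likelyMembers <- which(prob > 0.5)
```

With only a few runs the probabilities can take only a few distinct values, which is one reason to use a larger `nRuns` in practice.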

Alberto Krone-Martins, Andre Moitinho

Krone-Martins, A. & Moitinho, A., A&A, v.561, p.A57, 2014

## Not run: 
# Analyse a simulated open cluster using spatial and photometric data 
# Load the data into a data frame
fileNameI <- "oc_12_500_1000_1.0_p019_0880_1_25km_120nR_withcolors.dat"
inputFileName <- system.file("extdata", fileNameI, package="UPMASK")
ocData <- read.table(inputFileName, header=TRUE)

# Example of how to run UPMASK using data from a data frame
# (a serious analysis requires at least a larger nRuns)
posIdx <- c(1,2)
photIdx <- c(3,5,7,9,11,19,21,23,25,27)
photErrIdx <- c(4,6,8,10,12,20,22,24,26,28)

# Note: R is case-sensitive, so the error indexes must be passed as photErrIdx
upmaskRes <- UPMASKdata(ocData, posIdx, photIdx, photErrIdx, nRuns=2, 
                        starsPerClust_kmeans=25, verbose=TRUE)

# Create a simple raw plot to see the results
pCols <- upmaskRes[,length(upmaskRes)]/max(upmaskRes[,length(upmaskRes)])
plot(upmaskRes[,1], upmaskRes[,2], col=rgb(0,0,0,pCols), cex=0.5, pch=19)

# Clean the environment
rm(list=c("inputFileName", "ocData", "posIdx", "photIdx", "photErrIdx", 
          "upmaskRes", "pCols"))

## End(Not run)