R/data.R
In multimix: Fit Mixture Models Using the Expectation Maximisation (EM) Algorithm

#' Prostate cancer patient data
#'
#' Data on 475 prostate cancer patients
#' 
#' Twelve pre-trial covariates were measured on each patient: seven may be
#' taken to be continuous and four to be discrete, while one variable (SG)
#' is an index, nearly all of whose values lie between 7 and 15, that could
#' be considered either discrete or continuous.
#' We will treat SG as a continuous variable.
#'
#' A preliminary inspection of the data showed that the size of the 
#' primary tumour (SZ) and serum prostatic acid phosphatase (AP) were 
#' both skewed variables. These variables have therefore been transformed. 
#' A square root transformation was used for SZ, and a logarithmic
#' transformation was used for AP to achieve approximate normality. 
#' (As for correlation, skewness over the whole data set does not 
#' necessarily mean skewness within clusters. But when clusters were 
#' formed, within-cluster skewness was observed for these variables.) 
#' 
#' Observations that had missing values in any of the twelve pretreatment 
#' covariates were omitted from further analysis, leaving 475 out of the 
#' original 506 observations available. 
#' 
#' The categorical variable \code{Patient activity} had 4 levels: 'Normally 
#' Active', 'Bed rest below 50% of waking hours', 'Bed rest 50% of waking hours
#' or more', and 'Confined to bed'. Of the 475 patients, 428, 32, 12, and 3
#' fell into these groups respectively. The two least active groups are
#' combined in our data, giving 3 groups of size 428, 32, and 15.
#'
#' @format A data.frame with 475 rows and 12 columns:
#' \describe{
#' \item{age}{Age in years}
#' \item{wt}{Weight in pounds}
#' \item{pf}{Patient activity}
#' \item{hx}{Family history of cancer}
#' \item{sbp}{Systolic blood pressure}
#' \item{dbp}{Diastolic blood pressure}
#' \item{ekg}{Electrocardiogram code}
#' \item{hg}{Serum haemoglobin}
#' \item{sz}{Size of primary tumour}
#' \item{sg}{Index of tumour stage and histologic grade}
#' \item{ap}{Serum prostatic acid phosphatase}
#' \item{bm}{Bone metastases}
#' }
#' 
#' @usage 
#' data(cancer.df)
#'
#' @source D.P. Byar and S.B. Green 'The choice of treatment for cancer patients
#'   based on covariate information - application to prostate cancer', Bulletin
#'   du Cancer 1980; 67: 477--490, reproduced in D.F. Andrews and A.M. Herzberg
#'   'Data: a collection of problems from many fields for the student and
#'   research worker' p.261--274, Springer series in statistics, Springer-Verlag,
#'   New York.
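#' @examples
#' # Illustrative sketch only: simulated values, not the cancer.df data.
#' # It mirrors the preprocessing described above. The object and column
#' # names used here (raw, sz_raw, ap_raw, pf4, pf3) are hypothetical.
#' set.seed(1)
#' raw <- data.frame(sz_raw = rexp(20, rate = 0.1),
#'                   ap_raw = rlnorm(20, meanlog = 2, sdlog = 1))
#' raw$sz_raw[3] <- NA                  # one incomplete record
#' raw <- raw[complete.cases(raw), ]    # omit rows with any missing value
#' raw$sz <- sqrt(raw$sz_raw)           # square root for a skewed size
#' raw$ap <- log(raw$ap_raw)            # logarithm for a skewed assay
#'
#' # Merging the two least active of four activity groups into one.
#' pf4 <- factor(rep(1:4, times = c(428, 32, 12, 3)))
#' pf3 <- pf4
#' levels(pf3) <- c("1", "2", "3", "3") # duplicate labels merge levels
#' table(pf3)                           # group sizes 428, 32 and 15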
"cancer.df"

#' Contraceptive Method Choice data
#' 
#' This dataset is a subset of the 1987 National Indonesia Contraceptive
#' Prevalence Survey. The cases are 1473 married women who were either not 
#' pregnant or did not know whether they were at the time of interview. 
#' 
#' The variables 'age' (in years) and 'nborn' (ranging from 0 to 16) would
#' normally be treated as continuous; 'nborn' is skewed and might well
#' be transformed. The remaining 8 variables are categorical. 
#' 
#' The variables 'edu', 'eduh' and 'sol' take the values 1, 2, 3, 4; they are 
#' ordinal with 1 = low and 4 = high. The variable 'husocc' takes the
#' same 4 values, but it is not clear if the order has any significance.
#' 
#' The variables 'islam', 'working', and 'medex' are binary-valued with
#' 0=Non-Islam, 1=Islam for 'islam';  0=Yes, 1=No for 'working'; and 0=Good, 
#' 1=Not good for 'medex'.
#' 
#' The variable 'method' is ternary: 1=No-use, 2=Long-term, 3=Short-term.
#' 
#' @format A data.frame with 1473 rows and 10 columns:
#' \describe{
#' \item{age}{Wife's age}
#' \item{edu}{Wife's education}
#' \item{eduh}{Husband's education}
#' \item{nborn}{Number of children ever born}
#' \item{islam}{Wife's religion}
#' \item{working}{Wife is now working?}
#' \item{husocc}{Husband's occupation}
#' \item{sol}{Standard-of-living index}
#' \item{medex}{Media exposure}
#' \item{method}{Contraceptive method used}
#' }
#' 
#' @usage 
#' data(cmc.df)
#'   
#' @source Tjen-Sien Lim 'Contraceptive Method Choice' 1997,
#'    UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. 
#'   Irvine, CA: University of California, School of Information and Computer 
#'   Science.
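#' @examples
#' # Illustrative sketch only: a few made-up rows, not the cmc.df data.
#' # It shows one possible way to attach the codings described above
#' # to the numeric codes. The object name 'toy' is hypothetical.
#' toy <- data.frame(edu = c(1, 4, 2),
#'                   working = c(0, 1, 1),
#'                   method = c(1, 3, 2))
#' # 'edu' is ordinal with 1 = low and 4 = high.
#' toy$edu <- factor(toy$edu, levels = 1:4, ordered = TRUE)
#' # 'working' is binary with 0 = Yes and 1 = No.
#' toy$working <- factor(toy$working, levels = 0:1,
#'                       labels = c("Yes", "No"))
#' # 'method' is ternary.
#' toy$method <- factor(toy$method, levels = 1:3,
#'                      labels = c("No-use", "Long-term", "Short-term"))
#' str(toy)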
"cmc.df"

multimix documentation built on Jan. 22, 2023, 1:13 a.m.
