The functions tupleFreqs and discparcoord are the workhorse functions in the package, calculating the frequency counts to be used in the graphs and displaying them.
tupleFreqs(dataset, k=5, NAexp=1.0, countNAs=FALSE, saveCounts=FALSE,
   minFreq=NULL, accentuate=NULL, accval=100)
clsTupleFreqs(cls=NULL, dataset, k=5, NAexp=1, countNAs=FALSE)
discparcoord(data, k=5, grpcategory=NULL, permute=FALSE,
   interactive=TRUE, save=FALSE, name="Parcoords", labelsOff=TRUE,
   NAexp=1.0, countNAs=FALSE, accentuate=NULL, accval=100, inParallel=FALSE,
   cls=NULL, differentiate=FALSE, saveCounts=FALSE, minFreq=NULL)
data | The data, in data frame or matrix form.
k | The number of tuples to return. These will be the
grpcategory | Grouping column/variable.
permute | If TRUE, randomly permute the columns before plotting.
interactive | If TRUE, use interactive plotting, which allows interactively readjusting the column order and scrubbing/brushing.
save | If TRUE and interactive mode is on, the saved plot will be available from the browser.
name | The name for the plot.
labelsOff | If TRUE, labels are off. This only takes effect when interactive=FALSE.
NAexp | Scale for NA counts.
countNAs | If TRUE, count NA values.
accentuate | Character expression specifying the property to accentuate.
accval | Value to accentuate.
inParallel | If TRUE, calculate tuple frequencies in parallel.
differentiate | If TRUE, randomize coloring to differentiate overlapping lines.
saveCounts | If TRUE, save the tuple counts to the file 'tupleCounts'.
minFreq | The smallest frequency to be displayed.
dataset | The dataset to process, a data frame or data.table.
cls | Cluster to be used if
Tuple tabulation is performed by tupleFreqs, or in large cases, in parallel by clsTupleFreqs. The display is done by discparcoord.
The k most- or least-frequent tuples will be reported, with the latter specified via negative k. Optionally, tuples with NA values will count less, but will contribute toward every tuple with which they share non-NA values.
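The selection rule for k can be illustrated with a minimal base R sketch. The helper topTuples and the toy data are hypothetical and are not part of the package; the sketch ignores NA handling and only shows how a positive k picks the most frequent tuples and a negative k the least frequent ones.

```r
# Hypothetical helper, not part of the package: count the distinct rows of a
# data frame and keep the k most frequent (k > 0) or least frequent (k < 0).
topTuples <- function(df, k = 5) {
  # One row per distinct tuple, with its count in the column 'freq'
  counts <- aggregate(freq ~ ., data = cbind(df, freq = 1), FUN = sum)
  # Sort descending for positive k, ascending for negative k
  ord <- order(counts$freq, decreasing = (k > 0))
  head(counts[ord, , drop = FALSE], abs(k))
}

# Made-up toy data
toy <- data.frame(
  sex   = c("M", "M", "F", "F", "M", "F"),
  class = c("1st", "1st", "2nd", "1st", "1st", "2nd"),
  stringsAsFactors = TRUE
)

most  <- topTuples(toy, k = 1)    # the single most frequent tuple
least <- topTuples(toy, k = -1)   # the single least frequent tuple
```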
If continuous variables are present, then in most cases, either convert them to discrete form using discretize or use freqparcoord.
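As a rough illustration of what converting a continuous variable to discrete form means, the sketch below uses the base R function cut as a stand-in. It is not the package's discretize function, and the equal-width binning and the sample values are assumptions made only for this example.

```r
# Stand-in for discretization using base R; this is NOT the package's
# discretize function, only an illustration of the idea.
hours <- c(96, 150, 201, 310, 157, 262)   # made-up values

# Split the observed range into 3 equal-width bins and label them
hoursDisc <- cut(hours, breaks = 3, labels = c("low", "med", "high"))

# Tabulate how many values fall into each bin
table(hoursDisc)
```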
The data will be converted into a data.table if it is not already in that form. For this and other reasons, it is advantageous to have the data in that form to begin with, say by using data.table::fread to read the data.
Optionally, tuples that partially match a full tuple pattern except for NA values will add a partial count to the frequency count for the full pattern. If for instance the data consist of 8-tuples and a row in the data matches a given 8-tuple pattern in 7 of 8 components, this row would add a count of 7/8 to the frequency for that pattern. To reduce this weight, use a value greater than 1.0 for NAexp. If that value is 2, for example, the 7/8 increment will be 7/8 squared.
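The arithmetic of the partial count can be sketched in base R. The helper partialWeight is hypothetical and only mirrors the description above; how the package computes the weight internally is not shown here.

```r
# Hypothetical helper, not part of the package: the weight a row containing
# NAs adds to a full tuple pattern, following the description above.
partialWeight <- function(row, pattern, NAexp = 1.0) {
  observed <- !is.na(row)
  # A row contributes only if every non-NA component agrees with the pattern
  if (!all(row[observed] == pattern[observed])) return(0)
  # Fraction of matching components, raised to the power NAexp
  (sum(observed) / length(pattern))^NAexp
}

pattern <- c("a", "b", "c", "d", "e", "f", "g", "h")
row     <- c("a", "b", "c", NA,  "e", "f", "g", "h")

w1 <- partialWeight(row, pattern)              # 7/8
w2 <- partialWeight(row, pattern, NAexp = 2)   # (7/8)^2
```

With NAexp greater than 1 the increment shrinks, since the fraction of matching components is below 1.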
The functions tupleFreqs and clsTupleFreqs return an object of class c('pna','data.frame'), with each row consisting of a tuple and its count. In addition, the object will have attributes k and minFreq.
The function discparcoord returns an object of class c('plotly','htmlwidget'). Printing the object causes display of the graph.
Norm Matloff <matloff@cs.ucdavis.edu>, Vincent Yang <vinyang@ucdavis.edu>, and Harrison Nguyen <hhnguy@ucdavis.edu>
## Not run:
data(Titanic)
# Find frequencies in parallel
discparcoord(Titanic, inParallel=TRUE)
## End(Not run)
## Not run:
data(hrdata)
input1 = list("name" = "average_montly_hours",
"partitions" = 3, "labels" = c("low", "med", "high"))
input = list(input1)
# this will discretize the data by partitioning average monthly
# hours into 3 parts called low, med, and high
hrdata = discretize(hrdata, input)
print('first few discretized tuples')
# first line should be 0.38,0.53,2,low,3,0,1,00,sales,low
head(hrdata)
print('first few most-frequent tuples')
# first line should be 0.40,0.46,2,...,11
tupleFreqs(hrdata,saveCounts=FALSE)
# account for NA values and plot with parallel coordinates
discparcoord(hrdata)
# same as above, but with scrambled columns
discparcoord(hrdata, permute=TRUE)
# same as above, but show top k values
discparcoord(hrdata, k=8)
# same as above, but group according to profession
discparcoord(hrdata, grpcategory="sales")
## End(Not run)