panelDesc: micromapST panel description data.frame structure

panelDesc    R Documentation

micromapST panel description data.frame structure

Description

The panelDesc data.frame provides the micromapST function with the information required to process the statsDFrame data and panelData data.frames and to generate the required linked micromap plot.
It specifies which columns in the statsDFrame data.frame contain the data for each glyph column, the column types, labels, reference values and text, and, when a glyph needs more complex data (boxplot and time series), the name of the data structure that holds it.

  Example
    panelDesc = data.frame(
        type=c("mapcum","id","dotconf","dotconf"),
        lab1=c("","","White Males","White Females"),
        lab2=c("","","Rate and 95% CI","Rate and 95% CI"),
        lab3=c("","","Deaths per 100,000","Deaths per 100,000"),
        col1=c(NA,NA,"Rate",9), 
        col2=c(NA,NA,4,11),
        col3=c(NA,NA,5,12),
        colSize=c(NA,NA,5,5),
        refVals=c(NA,NA,NA,wflungUS[,1]),
        refTexts=c(NA,NA,NA,"US Rate"),
        panelData=c("","","","")
        )

The panelDesc data.frame (which does not have to be named "panelDesc"; any name will do) defines how many columns to create, the type of glyph in each column, where the data required by the glyph is located in the statsDFrame (column number or name), the name of a supplemental data structure when the glyph is a boxplot or time series (via the panelData entry), the column titles, and the column's reference value and label for the linked micromap.

In the following description the term "AREA" represents the geographic unit being mapped and associated with data in the statsDFrame. The naming used must match the border group specified. If the border group of "USStatesDF" is used, the areas are U.S. States and DC and 51 data rows must be present. If the border group of "USSeerDF" is used, the areas are U.S. SEER areas as defined by NCI and the number of data rows can be 9, 11, 13, 17 or 18. In all cases, the abbreviations and names defined in the border group dataset must be used in preparing the statsDFrame and panelData structures.

Glyph Types

The type vector defines the type of glyph to be used for each column. The available glyphs are:

Map types:

"map", "mapcum","maptail","mapmedian"

State or Area ID and/or Name:

"id"

Ranking:

"rank"

Graphical Type:

"dot", "dotse","dotconf", "dotsignf", "bar", "arrow", "ts", "tsconf","scatdot", "segbar", "normbar", "ctrbar", "boxplot"

The following provides a description of each panel type:

map

- US map with active areas colored

mapcum

- US map with active areas colored and previously active area highlighted generating an accumulation from top to bottom

maptail

- US map with active areas colored and previously active area highlighted until the median area, then the reverse to the end (areas that have not been active are highlighted.)

mapmedian

- US map with active areas colored. Maps above the median area have areas with values above the median highlighted. Maps below the median area have areas with values below the median highlighted. This helps define the above and below median area groups.

id

- generates a column with a colored identifier (a square) and the area or area name or abbreviation.

rank

- number the area in rank order, sequentially.

arrow

- an arrow between two values with a head.

bar

- a single bar chart.

boxplot

- a boxplot per area with box, upper and lower whiskers and outliers.

dot

- a dot for a single value.

dotse

- a dot for a single value and its standard error.

dotconf

- a dot for a single value and its confidence interval.

dotsignf

- a dot for a single value with an indicator of its significance.

ts

- a time series line for up to 30 sets of "x" and "y" values for each area. The TS glyph can have X-Axis labels formatted as numbers or dates.

tsconf

- a time series line for up to 30 sets of "x", "y", upper "y" and lower "y" values, with the upper and lower values drawn as a confidence interval band for each area. The TSConf glyph can have X-Axis labels formatted as numbers or dates.

segbar

- a horizontal stacked (segmented) bar plot starting at 0 for 2 to 9 bars.

normbar

- a stacked bar plot where the data is normalized for each area by dividing the bar segment values by the sum of the values for all of the bars. Up to 9 bars are supported.

ctrbar

- a stacked bar plot where the bar segments are centered around 0. Up to 9 bars are supported.


scatdot

- a set of points for each area with an "x" and "y" value.
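The normalization described for the normbar glyph can be sketched in base R. This is only an illustration of the arithmetic, using a hypothetical matrix of segment values (the names `segments` and `normalized` are made up for the example); it is not the package's internal code.

```r
# Hypothetical segment values: one row per area, one column per bar segment.
segments <- matrix(c(5, 2, 3,
                     1, 1, 2),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("AL", "AK"), c("s1", "s2", "s3")))

# Divide each area's segment values by that area's total and express as percent.
# rowSums() has one entry per row, so the division is applied row by row.
normalized <- 100 * segments / rowSums(segments)

normalized           # segment lengths as a share of each area's total
rowSums(normalized)  # every area now spans the full 0 to 100 range
```

After normalization the relative sizes of the segments within an area are preserved, but the areas can no longer be compared by total bar length.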

Labels (Column Headers and Footers)

micromapST supports up to 3 column labels or titles: lab1, lab2 and lab3, where lab1 and lab2 are header titles for the column. lab3 is the footer title for the column. All titles are optional. lab3 is used to indicate the unit of measure at the bottom of the columns, but is not limited to this use. For example:

     lab1=c("Col1-Title", "Col2-Title", "Col3-Title" ) # 1st title for columns
     lab2=c("Col1-Sub",   "Col2-Sub",   "Col3-Sub"   ) # 2nd title for columns
     lab3=c("Col1-Footer","Col2-Footer","Col3-Footer") # Footer title for columns
    

lab4 is used only when time series or scatter dot glyphs are used, to provide a Y axis title for the column. All label-title vectors are optional and only required when a title or label is needed.

Data References

Depending on the type of glyph selected for the column, 1 to 3 data values for each area may be required. The col1, col2 and col3 vectors serve as indexes to columns in the statsDFrame data.frame passed in the arguments of the micromapST function call. The values can be either the number of the column in the statsDFrame data.frame or the column name. If no index is required, the entry should be set to NA.

If the glyph requires one value, then only the col1 index is used and the col2 and col3 indexes are set to NA, if present. If 2 values are required, then the col1 and col2 indexes are used and the col3 index is set to NA, if present. If 3 values are required, then the col1, col2, and col3 indexes are used.

The statsDFrame column indexes can be provided as an integer or the column name. If the integer value is less than 1 or greater than the number of columns in statsDFrame or a column name is used that does not exist in statsDFrame, the micromapST function will stop and generate an error message.
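The kind of check described above can be sketched as follows. The helper `resolveCol()` and the small `statsDFrame` are hypothetical and exist only for this illustration; the actual validation inside micromapST may differ in detail and in its error messages.

```r
# Hypothetical stand-in for statsDFrame: three areas, three data columns.
statsDFrame <- data.frame(
  Rate  = c(50.1, 42.3, 61.7),
  Lower = c(48.0, 40.1, 58.9),
  Upper = c(52.2, 44.5, 64.5),
  row.names = c("AL", "AK", "AZ")
)

# resolveCol() is an illustrative helper, not part of micromapST.
# It accepts a column number or column name and returns the column number,
# NA when no column is needed, or stops with an error for an invalid reference.
resolveCol <- function(ref, df) {
  if (is.na(ref)) return(NA_integer_)
  num <- suppressWarnings(as.integer(ref))
  if (!is.na(num)) {
    if (num < 1 || num > ncol(df)) stop("column number out of range: ", ref)
    return(num)
  }
  idx <- match(ref, names(df))
  if (is.na(idx)) stop("column name not found: ", ref)
  idx
}

# Mixing names and numbers in c() coerces every entry to character.
col1 <- c(NA, NA, "Rate", 2)
resolved <- vapply(col1, resolveCol, integer(1), df = statsDFrame,
                   USE.NAMES = FALSE)
resolved
```

Because a vector that mixes names and numbers is stored as character, a check of this kind has to try the numeric interpretation first and fall back to a name lookup.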

Glyph     Meaning                 col1                  col2                  col3              panelData
arrow     Arrow                   Beginning values      Ending values         NA                NA
                                                        (arrow head)
bar       Horizontal bar          Bar end values        NA                    NA                NA
                                  (length)
segbar    Horizontal stacked bar  Values for first      Values for last       NA                NA
                                  (left-most) bar       (right-most) bar
                                  segment (length)      segment (length)
normbar   Horizontal stacked bar, Values for first      Values for last       NA                NA
          normalized to total     (left-most) bar       (right-most) bar
          100%                    segment (length)      segment (length)
ctrbar    Horizontal stacked bar, Values for first      Values for last       NA                NA
          centered on the middle  (left-most) bar       (right-most) bar
          bar                     segment (length)      segment (length)
boxplot   Horizontal box plot     NA                    NA                    NA                Name of output list from
                                                                                                call to boxplot(...plot=F)
dot       Dot                     Values for dots       NA                    NA                NA
dotconf   Dot with confidence     Values for dots       Values of lower       Values for upper  NA
          interval line                                 limits                limits
dotse     Dot with line length    Values for dots       Standard errors       NA                NA
          +/- standard error
dotsignf  Dot overprinted if not  Values for dots       P value associated    NA                NA
          significant                                   with dot
scatdot   Scatter plot of dots    Values on horizontal  Values on vertical    NA                NA
                                  (x) axis              (y) axis
ts        Time series (line) plot NA                    NA                    NA                Name of array with dimensions
                                                                                                of c(51,t,2), where t = # of
                                                                                                time points (max 15), x values
                                                                                                in [,,1], y values in [,,2]
tsconf    Time series (line) plot NA                    NA                    NA                Name of array with dimensions
          with confidence limits                                                                of c(51,t,4), as ts; lower
                                                                                                limit is [,,3] and the upper
                                                                                                limit is [,,4]
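As a rough cross-check of the table above, the number of col1 to col3 entries each glyph reads can be written down as a lookup and compared with a draft panelDesc. The vector `colsNeeded` and the sample `panelDesc` are hypothetical and transcribed by hand from the table; nothing here calls micromapST.

```r
# Number of statsDFrame column references each glyph uses (from the table).
colsNeeded <- c(arrow = 2, bar = 1, segbar = 2, normbar = 2, ctrbar = 2,
                boxplot = 0, dot = 1, dotconf = 3, dotse = 2, dotsignf = 2,
                scatdot = 2, ts = 0, tsconf = 0)

# Hypothetical draft of the data reference part of a panelDesc.
panelDesc <- data.frame(
  type = c("dot", "dotconf", "dotse", "boxplot"),
  col1 = c(3, 3, 6, NA),
  col2 = c(NA, 4, 7, NA),
  col3 = c(NA, 5, NA, NA),
  stringsAsFactors = FALSE
)

# Count the non-NA references in each row and compare with the lookup.
supplied <- rowSums(!is.na(panelDesc[, c("col1", "col2", "col3")]))
ok <- supplied == unname(colsNeeded[as.character(panelDesc$type)])
ok
```

A FALSE entry in `ok` would point to a glyph column that has too few or too many column references for its type.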

The panelData vector is only used when a glyph requires more data per area than can be provided by the statsDFrame columns. The only glyphs that use this vector are boxplots and time series.

In the case of the boxplot glyph, the boxplot function with plot=F is used to generate the boxplot statistical details for each area. The name of the resulting list of 51 sets of boxplot statistics (one for each area) is placed in the panelData data.frame element for the boxplot column.
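A minimal sketch of preparing such a list with base R follows. The data, the area identifiers and the object name `boxList` are hypothetical, and only three areas are used to keep the example short; a real US state plot would need all 51.

```r
# Hypothetical data: 20 observations for each of three areas.
set.seed(1)
areaData <- data.frame(
  area  = rep(c("AL", "AK", "AZ"), each = 20),
  value = rnorm(60, mean = 50, sd = 10)
)

# plot = FALSE returns the boxplot statistics without drawing anything.
# split() groups the values by area, giving one boxplot per area.
boxList <- boxplot(split(areaData$value, areaData$area), plot = FALSE)

boxList$names       # area identifiers, one per boxplot
dim(boxList$stats)  # 5 summary statistics per area
```

Under these assumptions, the panelData entry for the boxplot column would be the character string "boxList".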

The time series and time series with confidence interval glyphs require a 3 dimensional array of data. The first dimension ([area,,]) represents the areas. The second dimension ([,t,]) has one entry for each data point and ranges from 2 to n. There is no upper limit, but 200-250 samples is a practical limit. The third dimension ([,,v]) provides the values at data point t for the area. [,,1] is the x axis value; for time series, this is usually just the values 1 to n, used to order the y values. [,,2] is the median y value. For time series with confidence intervals, [,,3] is the lower y value and [,,4] is the upper y value.
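The array layout described above can be sketched with base R. The area identifiers, the number of time points and the generated values are all hypothetical; the object names `tsArray` and `tsOnly` are chosen only for this example.

```r
areas   <- c("AL", "AK", "AZ")  # hypothetical area identifiers
nPoints <- 5                    # number of time points per area

# Dimensions: area x time point x value (x, y, lower y, upper y).
tsArray <- array(NA_real_, dim = c(length(areas), nPoints, 4),
                 dimnames = list(areas, NULL, NULL))

set.seed(2)
for (a in seq_along(areas)) {
  y <- 50 + cumsum(rnorm(nPoints))
  tsArray[a, , 1] <- seq_len(nPoints)  # x: order of the observations
  tsArray[a, , 2] <- y                 # y value
  tsArray[a, , 3] <- y - 2             # lower limit
  tsArray[a, , 4] <- y + 2             # upper limit
}

# A plain time series column only needs the x and y layers.
tsOnly <- tsArray[, , 1:2]
dim(tsArray)
dim(tsOnly)
```

The four-layer array matches what the text describes for tsconf, and the two-layer subset matches what it describes for ts.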

Reference Lines

Reference lines can be created in arrow, bar, dot, dotconf, dotse, and segbar glyphs by specifying the reference values in the refVals= vector. A label appearing at the bottom of the column can be specified using the refTexts= vector in the panelDesc data.frame.
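As an illustration, a reference line for a single dot column could be described as follows. The value `usRate`, the column name "Rate" and the label text are hypothetical; only the panelDesc data.frame is built, and micromapST itself is not called.

```r
usRate <- 48.7  # hypothetical national rate used as the reference value

panelDesc <- data.frame(
  type     = c("map", "id", "dot"),
  lab1     = c("", "", "Lung Cancer"),
  col1     = c(NA, NA, "Rate"),
  refVals  = c(NA, NA, usRate),     # reference line only in the dot column
  refTexts = c(NA, NA, "US Rate"),  # label for the reference line
  stringsAsFactors = FALSE
)
panelDesc
```

Columns that should not show a reference line keep NA in both vectors.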

Format

The parameters in the panelDesc data.frame structure are:

type=

The type of glyph for each column of panels is specified by one of the following keywords in the type vector:

Map types:

"map", "mapcum","maptail","mapmedian"

State ID and/or Name:

"id"

Glyph Type:

"dot", "dotse","dotconf", "dotsignf", "bar", "arrow", "ts", "tsconf","scatdot", "segbar", "normbar", "ctrbar", "boxplot"

The following provides a description of each panel type:

map

- a non-highlighted map

mapcum

- maps show the accumulated areas top to bottom

maptail

- maps show the accumulated areas from the top and bottom toward median area.

mapmedian

- the maps above the median highlight the areas above the median area and maps below the median highlight areas below the median area based on the sorting variable.

id

- generates a column with a color identifier (a filled in square) and the area abbreviation or name. The plotNames parameter in the micromapST call controls whether the area's full name or 2 character abbreviation is displayed.

rank

- sequentially number areas from 1 (highest rank) to "n" (lowest rank)

arrow

- an arrow from value 1 to value 2 with value 2 the head of the arrow.

bar

- a bar for a single set of values. The values can be positive or negative.

boxplot

- a boxplot for each area using a list generated by the boxplot function with plot=F. The name of the boxplot list is passed to micromapST using the panelData vector.

dot

- a dot for a single value using one set of values.

dotse

- a dot for a single value and its standard error using two values.

dotconf

- a dot for a single value and its confidence interval using three values.

dotsignf

- a dot for a single value, overprinted if the value is not significant, using two values: the value for the dot and its P value.

ts

- a time series line plot for each area. The glyph uses the panelData vector to get the name of a three (3) dimensional array containing the data for the plot. The array contains one entry per area, 1 to 30+ data points and the x and y values. See section on panelData below for more details. A reasonable upper limit to the number of points is between 200-300. Only a few will be selected to be used as X-Axis labels. The format of the X-Axis labels is controlled by the "xIsDate" attribute on the array being set to TRUE. If the "xIsDate" attribute is not set to TRUE, the X-Axis will be formatted as numeric and axisScaling can be performed. If the "xIsDate" attribute is TRUE, the default date format of " or less than 90, a short date format will be used of " The x-axis date feature will override the specification of the axisScale call parameters on time series glyph columns.

tsconf

- a time series line and confidence interval band for each area. The glyph uses the panelData vector to get the name of a three (3) dimensional array containing the data for the plot. The array contains one entry per area, 1 to n data points and the x, y, lower y and upper y values. See section on panelData below for more details. A reasonable upper limit to the number of points is between 200-300. The format of the X-Axis labels is controlled by the "xIsDate" attribute on the array. If "xIsDate" is set to TRUE, the X-Axis values will be formatted using the default date format of " date format of " If the "xIsDate" attribute is not set to TRUE, the X-Axis will be formatted as numeric and axisScaling can be performed. The x-axis date feature will override the specification of the axisScale call parameters on time series glyph columns.

segbar

- a horizontal stacked (segmented) bar plot starting at 0 using data in the statsDFrame data.frame. The col1 and col2 columns are used to indicate the first and last columns in the statsDFrame data.frame that contain the contiguous bar segment values (lengths). For example: the data for a 5 segment bar glyph is in columns 4 through 8 in the statsDFrame (5 columns). col1 is set to 4 to identify the first column and col2 is set to 8 to identify the last column in the sequence. Column names may be used, but the column identified in col1 must precede the column identified in col2.

normbar

- a stacked bar plot where the data is normalized for each area by dividing the bar segment values by the sum of the values for all of the bars. The stacked bar plot for each area then ranges from 0 to 100% (edge to edge). The col1 and col2 columns are used to identify the first and last columns for bar data in the statsDFrame in the same way as for the "segbar" glyph (see above.)

ctrbar

- a stacked bar plot where the bar segments are centered around the middle of the data. If there is an even number of segments, the 0 point is between the lower half and the upper half of the segments. If there is an odd number of segments, the center is the midpoint of the middle segment. The other segments are plotted to the left and right of the center point. The col1 and col2 columns are used to indicate the first and last columns in the statsDFrame data.frame that contain the contiguous bar segment values. (See "segbar" type above for more information.)

scatdot

- a set of 51 points with an x and y value per area. All points are plotted in each panel with the key areas in the panel highlighted. col1 indicates statsDFrame column containing the x values and col2 indicates the column containing the y values.

Example: type=c("id","map","rank", "boxplot") specifies a micromapST plot with four columns, left to right, containing the area label, a map, the rank and a boxplot.
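The centering rule described for the ctrbar glyph can be illustrated with a few lines of base R. The helper `centerSegments()` is hypothetical and only mirrors the description (center between the halves for an even number of segments, midpoint of the middle segment for an odd number); it is not taken from the package source.

```r
# centerSegments() is an illustrative helper, not a micromapST function.
# It returns the left and right edge of every segment for one area,
# measured from the center point (0).
centerSegments <- function(len) {
  n <- length(len)
  ends <- cumsum(len)
  starts <- ends - len
  if (n %% 2 == 0) {
    center <- ends[n / 2]                 # boundary between the two halves
  } else {
    mid <- (n + 1) / 2
    center <- starts[mid] + len[mid] / 2  # midpoint of the middle segment
  }
  cbind(left = starts - center, right = ends - center)
}

evenBars <- centerSegments(c(2, 4, 6, 8))  # segments 1-2 left of 0, 3-4 right
oddBars  <- centerSegments(c(2, 4, 6))     # the middle segment straddles 0
evenBars
oddBars
```

In the odd case the middle segment extends the same distance on both sides of 0, which is why the center falls at its midpoint.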

col1=, col2=, col3=

Vectors of index numbers or names of columns in statsDFrame data.frame to be used as data for graphics. The uses of these three vectors are defined below:

any "map" type, id, boxplots, ts, and tsconf

glyphs do not use the col1, col2, or col3 vectors to locate data in the statsDFrame data.frame. If these vectors are present, the corresponding entries should be NA for the respective columns.

dot

uses col1 to specify a single data column in statsDFrame data.frame to be plotted.

bar

uses col1 to specify the data column in statsDFrame data.frame for the length of the bar. The data value can be positive or negative.

dotse

uses col1 and col2 to specify the data columns in statsDFrame data.frame to be used as the estimate and standard error values, respectively.

dotsignf

uses col1 and col2 to specify the data columns in statsDFrame data.frame to be used as the value for the dot and its associated P value.

arrow

uses col1 and col2 to specify the data columns in statsDFrame data.frame for the beginning and end values of the arrow.

segbar, normbar, ctrbar

use col1 and col2 to specify the first and last columns in the statsDFrame data.frame. The statsDFrame data.frame columns from col1 to col2 are used for the length values of each bar in the glyph. col1 must precede col2 in the statsDFrame data.frame. The minimum number of data columns is 2, with a maximum of 9 columns.

scatdot

uses col1 and col2 to specify the x and y values, respectively, for a dot for each of the 51 areas (the states and DC) in a scatter dot plot.

dotconf

uses col1, col2, and col3 to specify the data columns in statsDFrame data.frame for the estimate value, lower confidence interval, and upper confidence interval values.

See the table above.
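For the stacked bar glyphs (segbar, normbar, ctrbar), the way col1 and col2 pick out a contiguous run of columns can be sketched as follows. The `statsDFrame` and its column names are hypothetical, and the checks simply restate the rules given above (col1 before col2, 2 to 9 columns).

```r
# Hypothetical statsDFrame with five segment columns in positions 4 to 8.
statsDFrame <- data.frame(
  id = 1:3, pop = c(10, 20, 30), rate = c(1.1, 2.2, 3.3),
  s1 = c(5, 3, 4), s2 = c(2, 6, 1), s3 = c(4, 4, 4),
  s4 = c(1, 2, 3), s5 = c(3, 1, 2),
  row.names = c("AL", "AK", "AZ")
)

first <- match("s1", names(statsDFrame))  # col1 given as a column name
last  <- 8                                # col2 given as a column number
nSegs <- last - first + 1

# Restate the documented rules: col1 precedes col2, and 2 to 9 columns.
stopifnot(first < last, nSegs >= 2, nSegs <= 9)

segments <- statsDFrame[, first:last]     # the contiguous segment lengths
names(segments)
```

Every column between the two positions is treated as a segment, so unrelated columns must not sit inside the range.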

colSize=

A numeric vector used to specify the proportional width of a glyph column in relation to all other glyph columns. If used, values must be included for all glyph columns except for the map and id glyphs, which are fixed width columns. The width of a glyph column is determined by summing all of the colSize values and dividing the sum into the value for each glyph column to yield a percentage of the available width to be allocated to each column. For example: colSize=c(NA,NA,10,10,5,15) does not affect columns 1 and 2. The percentages for columns 3 through 6 are 25%, 25%, 12.5% and 37.5%. If 4 inches of space is available, the columns will be allocated 1, 1, 0.5, and 1.5 inches. The column widths are still regulated by the minimum and maximum column widths set in the package. If a value is missing for a glyph other than map or id, the package will use a value equal to the average of the provided values.
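The arithmetic in the colSize example can be reproduced directly. This sketch only restates the proportional calculation; the variable names are made up, and the package's minimum and maximum column width limits are not modeled.

```r
colSize <- c(NA, NA, 10, 10, 5, 15)  # map and id columns are left as NA
availableWidth <- 4                  # inches available to the sized columns

# Each column's share is its value divided by the sum of all values.
share  <- colSize / sum(colSize, na.rm = TRUE)
widths <- share * availableWidth

round(100 * share, 1)  # percentages for columns 3 to 6
widths                 # widths in inches for columns 3 to 6
```

Only the ratios between the colSize values matter, so c(NA,NA,2,2,1,3) would produce the same widths.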

lab1=, lab2=

Character vectors provide the two column labels (titles) lines at the top of each column. If no label is required, use "" for a blank line.

lab3=

Character vector used as a label at the bottom of each column. This is typically used to show units of measure. If no label is required, use "" for a blank line.

lab4=

Character vector used as the vertical (y) axis label for ts, tsconf, and scatdot glyphs. If no label is required, use "" for a blank line.

refVals=

A list of object names providing the reference values for each graphic column. The reference value is displayed as a dashed vertical line for each panel in the specified column.

refTexts=

Is a list of 1 or 2 labels to be displayed at the bottom of each column to identify the reference value.

panelData=

List of object names containing the boxplot data list and/or an array of time series data for each area. If boxplot and time series data are not used in a column, then associated object names should be NA.

For boxplot data, each row name in the boxplot list must be the area abbreviation (2 character) for the area associated with the data. There must be the same number of rows as in the name table and statsDFrame table. Each row must be data produced by the boxplot function. The area location identifier used in the statsDFrame data must be placed in the boxplot$names (names) attribute for that set of boxplot data so that the individual boxplots can be associated with each area.

For the time series glyph (ts), the data must be a three (3) dimensional array. The first dimension [st,,] represents one entry for each area (1 to 51). The second dimension [,t,] indexes up to 30+ data points for the area. The third dimension [,,v] holds the values at each data point. [,,1] is the x value and [,,2] is the median y value for the data point. The rownames associated with the first dimension must be the area location ids used in the statsDFrame table, to link the elements of this structure to the presentation order of the areas.

For the time series with confidence intervals glyph (tsconf), the array is extended to include: [,,3] and [,,4] for the lower y and upper y values.

For time series data, the order of the first dimension of the array must match the area order in the statsDFrame. For example, the data in dataArray[1,,] is for the area identified in statsDFrame[1,].
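One way to bring an existing array into the same area order as the statsDFrame is to index its first dimension by row name. This is a small sketch with hypothetical data; it assumes both structures carry the same area identifiers as row names.

```r
# Hypothetical statsDFrame and time series array in different area orders.
statsDFrame <- data.frame(Rate = c(3, 1, 2), row.names = c("AZ", "AL", "AK"))
dataArray <- array(1:12, dim = c(3, 2, 2),
                   dimnames = list(c("AK", "AL", "AZ"), NULL, NULL))

# Reorder the first dimension by name so that row i of each structure
# refers to the same area. drop = FALSE keeps all three dimensions.
dataArray <- dataArray[rownames(statsDFrame), , , drop = FALSE]

rownames(dataArray)
```

Indexing by name also fails with a "subscript out of bounds" error if an area in the statsDFrame is missing from the array, which makes mismatches visible early.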

The Date feature allows the caller to request that the TS X-Axis labels be formatted as dates. This requires that the TS array contain valid date data as the X data. These are numbers based on 1970-1-1 being day zero in the computer calendar. There are many functions in R to convert to and from character and date variables. In the past, before this feature, users had to use workarounds such as year numbers or year and fraction numbers. Once you have inserted the date X values into the array [,,1], modify the class of the array to add the "Date" class. micromapST will inspect the array and find the "Date" class, flag it for internal operations and remove it. The date format of " the date format will be changed to " The date feature is only available on the Time Series Glyphs.

If axisScale is set to "s" or "sn", the setting will be ignored for any TS glyph using the date feature.
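The conversion of calendar dates to the day numbers mentioned above can be sketched with base R. The dates, area identifiers and y values are hypothetical. The final step of marking the array as date-based is deliberately left out of the sketch; follow the description above for that step.

```r
# Calendar dates become day counts, with 1970-01-01 as day zero.
dates <- as.Date(c("2020-01-01", "2020-04-01", "2020-07-01"))
xDays <- as.numeric(dates)
as.numeric(as.Date("1970-01-01"))  # 0

# Hypothetical two-area time series array: layer 1 = x, layer 2 = y.
tsArray <- array(NA_real_, dim = c(2, length(dates), 2),
                 dimnames = list(c("AL", "AK"), NULL, NULL))
tsArray[, , 1] <- rep(xDays, each = 2)          # same dates for every area
tsArray[, , 2] <- c(10, 20, 11, 21, 12, 22)     # filled area by area per date

# The stored numbers convert back to the original dates.
as.Date(tsArray["AK", , 1], origin = "1970-01-01")
```

Storing the dates as plain numbers keeps the array numeric, which is what allows the x and y values to share one structure.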

Details

The panelDesc data.frame is used to describe the content of the micromapST plot to the function. It contains the index of the data in the statsDFrame data.frame, the types of graphics to be used in each column, titles, column headers, reference values and labels, etc.

Note

A descriptor may be omitted if none of the panel plots need it.

Author(s)

Daniel B. Carr, George Mason University, Fairfax VA, with contributions from Jim Pearson and Linda Pickle of StatNet Consulting, LLC, Gaithersburg, MD

See Also

micromapST


micromapST documentation built on Nov. 12, 2023, 5:06 p.m.