View source: R/visualization.R
plot_distr (R Documentation)
This function plots distributions of items (a bit like a histogram), which can easily be conditioned on another variable.
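To make the idea concrete, here is a minimal base-R sketch of the kind of summary such a plot is built on: count each item, sort from most to least frequent, keep at most nbins items, compute each item's share, and optionally gather the remaining items into a last "Other" row. The helper distr_table() is hypothetical. It only mirrors the documented behaviour of sorted, nbins, top = "frac" and other, and it is not how fplot implements plot_distr.

```r
# Hypothetical helper, NOT part of fplot: builds the kind of frequency summary
# that a distribution plot of categorical items is based on.
distr_table <- function(x, nbins = 15, other = FALSE) {
  counts <- sort(table(x), decreasing = TRUE)   # most frequent items first
  res <- data.frame(x = names(counts),
                    nb = as.numeric(counts),                 # counts ("nb")
                    frac = as.numeric(counts) / sum(counts), # shares ("frac")
                    stringsAsFactors = FALSE)
  if (nrow(res) > nbins) {
    kept <- res[seq_len(nbins), ]
    if (other) {
      # one last row gathering all the observations that are not displayed
      rest <- res[-seq_len(nbins), ]
      kept <- rbind(kept, data.frame(x = "Other", nb = sum(rest$nb),
                                     frac = sum(rest$frac),
                                     stringsAsFactors = FALSE))
    }
    res <- kept
  }
  rownames(res) <- NULL
  res
}

items <- c("a", "b", "a", "c", "a", "b", "d")
distr_table(items, nbins = 2, other = TRUE)
```

With nbins = 2, only "a" and "b" are shown individually; "c" and "d" are pooled into the "Other" row, so the shares still sum to 1.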
plot_distr(
fml,
data,
moderator,
weight,
sorted,
log,
nbins,
bin.size,
legend_options = list(),
top,
yaxis.show = TRUE,
yaxis.num,
col,
border = "black",
mod.method,
within,
total,
mod.select,
mod.NA = FALSE,
at_5,
labels.tilted,
other,
cumul = FALSE,
plot = TRUE,
sep,
centered = TRUE,
weight.fun,
int.categorical,
dict = NULL,
mod.title = TRUE,
labels.angle,
cex.axis,
trunc = 20,
trunc.method = "auto",
...
)
fml
A formula or a vector. If a formula, it must be of the type: …
data
A data.frame: the data set containing the variables in the formula.
moderator
Optional, only if argument …
weight
Optional, only if argument …
sorted
Logical: should the first elements displayed be the most frequent? By default this is the case, except for numeric values that are put in logarithmic form or treated as integers.
log
Logical, only used when the data is numeric. If …
nbins
Maximum number of items displayed. The default depends on the number of moderator cases. When there is no moderator, the default is 15, raised to 20 if there are fewer than 20 cases.
bin.size
Only used for numeric values. If provided, it creates bins of observations of size …
legend_options
A list. Other options to be passed to …
top
What to display at the top of the bars. Can be equal to "frac" (for shares), "nb" or "none". The default depends on the type of the plot. To disable it, you can also set it to …
yaxis.show
Whether the y-axis should be displayed. Default is TRUE.
yaxis.num
Whether the y-axis should display regular numbers instead of frequencies in percentage points. By default it shows numbers only when the data is weighted with a function other than the sum. For conditional distributions, a numeric y-axis can be displayed only when …
col
A vector of colors. The default is close to the "paired" palette. You can also use "set1" or "paired".
border
Outer color of the bars. Default is "black".
mod.method
A character scalar: either i) "split", the default for categorical data, ii) "side", the default for data in logarithmic form or numeric data, or iii) "stack". This is only used when there is more than one moderator. If …
within
Logical, default is missing. Whether the distributions should be scaled to reflect the distribution within each moderator value. By default it is …
total
Logical, default is missing. Whether the distributions should be scaled to reflect the total distribution (and not the distribution within each moderator value). By default it is …
mod.select
Which moderator values to select. By default, the top 3 moderators in terms of frequency (or in terms of total weight if there is a weight) are displayed. If provided, it must be a vector of moderator values whose length cannot be greater than 5. Alternatively, you can provide an integer between 1 and 5. This argument also accepts regular expressions.
mod.NA
Logical, default is FALSE. …
at_5
Equal to …
labels.tilted
Whether the x-axis labels should be tilted. Default is …
other
Logical. Should there be a last column counting the observations not displayed? Default is …
cumul
Logical, default is FALSE. …
plot
Logical, default is TRUE. If FALSE, nothing is plotted and only the processed data is returned.
sep
Positive number. The separation space between the bars. The scale depends on the type of graph.
centered
Logical, default is TRUE. …
weight.fun
A function. By default it is …
int.categorical
Logical. Whether integers should be treated as categorical variables. By default they are treated as categorical only when their range is small (i.e. smaller than 1000).
dict
A dictionary to rename the variable names in the axes and legend. It should be a named vector. By default, it is the value of …
mod.title
Character scalar. The title of the legend when there is a moderator. You can set it to …
labels.angle
Only used if the x-axis labels are tilted: the angle of the tilt.
cex.axis
Cex value to be passed to the tilted labels. By default, the right value is found automatically.
trunc
If the main variable is a character, its values are truncated to …
trunc.method
If the elements of the x-axis need to be truncated, this is the truncation method. It can be "auto", "right" or "mid".
...
Other elements to be passed to plot.
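The within and total arguments choose how a conditional distribution is scaled. Assuming that within means "shares computed inside each moderator value" and total means "shares of the grand total" (our reading of the descriptions above, not fplot's code), the two options correspond to the following base-R computations on a small made-up data set. The object names are illustrative only.

```r
# Hypothetical illustration, NOT fplot code: two ways of scaling the
# distribution of `journal` conditional on `moderator`.
obs <- data.frame(
  journal   = c("J1", "J1", "J2", "J1", "J2", "J2", "J2", "J3"),
  moderator = c("A",  "A",  "A",  "B",  "B",  "B",  "B",  "B"),
  stringsAsFactors = FALSE
)
counts <- table(obs$journal, obs$moderator)  # rows: journals, columns: moderator values

# "within" scaling: each moderator column sums to 1
within_shares <- prop.table(counts, margin = 2)

# "total" scaling: all cells together sum to 1
total_shares <- prop.table(counts)

round(within_shares, 2)
round(total_shares, 2)
```

With within scaling, J2 weighs 60% of moderator B (3 of its 5 observations); with total scaling, the same cell is 3/8 of all observations, so groups of different sizes are no longer put on the same footing.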
Most default values can be modified with the function setFplot_distr.
This function invisibly returns the output data.table containing the processed data used for plotting. With the argument plot = FALSE, only the data is returned.
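Returning the processed data invisibly, with the drawing as a side effect that plot = FALSE switches off, is a common R pattern. Below is a minimal sketch of that contract with a hypothetical function my_distr(), assuming a simple frequency table as the "processed data"; fplot's actual data.table has its own columns (the examples only show that it contains x).

```r
# Hypothetical function, NOT fplot's implementation: mimics the contract
# "return the processed data invisibly; with plot = FALSE, only return the data".
my_distr <- function(x, plot = TRUE) {
  counts <- sort(table(x), decreasing = TRUE)
  res <- data.frame(x = names(counts), value = as.numeric(counts),
                    stringsAsFactors = FALSE)
  if (plot) {
    barplot(res$value, names.arg = res$x)  # drawing is only a side effect
  }
  invisible(res)  # nothing is printed at the console, but the result can be stored
}

# Compute the data without drawing anything, then reuse it
graph_data <- my_distr(c("u1", "u2", "u1", "u3", "u1", "u2"), plot = FALSE)
graph_data$x[1:2]  # the two most frequent items
```

This is the same workflow as in the examples, where the output of a first call is used to pick the top institutions for a second graph.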
Laurent Berge
To plot temporal evolutions: plot_lines. For boxplots: plot_box.
To export graphs: pdf_fit, png_fit, fit.off.
library(fplot)
# Data on publications from U.S. institutions
data(us_pub_econ)
# 0) Let's set a dictionary for a better display of variables
setFplot_dict(c(institution = "U.S. Institution", jnl_top_25p = "Top 25% Pub.",
jnl_top_5p = "Top 5% Pub.", Frequency = "Publications"))
# 1) Let's plot the distribution of publications by institution:
plot_distr(~institution, us_pub_econ)
# When there is only the variable, you can use a vector instead:
plot_distr(us_pub_econ$institution)
# 2) Now the production of institutions, weighted by journal quality:
plot_distr(jnl_top_5p ~ institution, us_pub_econ)
# You can plot several variables:
plot_distr(1 + jnl_top_25p + jnl_top_5p ~ institution, us_pub_econ)
# 3) Let's plot the journal distribution for the top 3 institutions
# We can get the data from the previous graph
graph_data = plot_distr(jnl_top_5p ~ institution, us_pub_econ, plot = FALSE)
# And then select the top universities
top3_instit = graph_data$x[1:3]
top5_instit = graph_data$x[1:5] # we'll use it later
# Now the distribution of journals
plot_distr(~ journal | institution, us_pub_econ[institution %in% top3_instit])
# Alternatively, you can use the argument mod.select:
plot_distr(~ journal | institution, us_pub_econ, mod.select = top3_instit)
# 3') Same graph as before, with 5 institutions and an "other" column
plot_distr(~ journal | institution, us_pub_econ,
mod.select = top5_instit, other = TRUE)
#
# Example with continuous data
#
# regular histogram
plot_distr(iris$Sepal.Length)
# now splitting by species:
plot_distr(~ Sepal.Length | Species, iris)
# Same, but the three distributions are displayed separately:
plot_distr(~ Sepal.Length | Species, iris, mod.method = "split")
# Now the three are stacked
plot_distr(~ Sepal.Length | Species, iris, mod.method = "stack")
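For numeric data, bin.size groups observations into bins of a fixed width. A minimal sketch of fixed-width binning follows, assuming bins anchored at 0 and labelled by their left edge; fplot's own bin boundaries and labels may differ, and bin_counts() is a hypothetical helper.

```r
# Hypothetical sketch, NOT fplot internals: group a numeric vector into bins of
# width `bin_size`, each bin labelled by its left edge (bins anchored at 0).
bin_counts <- function(x, bin_size) {
  left_edge <- floor(x / bin_size) * bin_size
  table(left_edge)
}

# Sepal lengths grouped into bins of width 0.5: [4, 4.5), [4.5, 5), ...
bin_counts(iris$Sepal.Length, bin_size = 0.5)
```

Each observation falls in exactly one bin, so the counts add up to the 150 flowers of the iris data set.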