knitr::opts_chunk$set(comment = NA)
You will analyse the storms table, which comes with the tidyverse package.
Make sure you put library( tidyverse ) in the R chunk at the top of your R Markdown file, as shown below:
library( tidyverse )
After the library has been loaded, you will have access to the table in the storms variable.
Each row of the storms table is an observation of a storm recorded at a certain moment (date and time) at a geographical location (lat, long). Some additional storm features (wind speed, pressure, ...), classifications (status, category) and a name are also included.
For more details you may consult the help on the storms tibble with ?storms, but the following column description is sufficient for the SSA:
name: Name of the storm.
year, month, day, hour: Date and time of the observation.
lat, long: Geographical location of the storm centre (numbers).
wind: Wind speed (number, in knots).
pressure: Pressure at the storm's centre (number, in millibars).
tropicalstorm_force_diameter (or ts_diameter in older versions of the tidyverse library): Storm diameter (number, in nautical miles).
status: Storm classification (a factor, many levels).
category: Storm category (a number, range: -1..5; many values are missing).
Note that a single storm is usually observed multiple times (so one storm may be described in multiple rows).
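Because the diameter column goes by two different names depending on the installed version, code that refers to it can first look up which name is present. The following is a minimal sketch in base R; the helper diameter_column and the two toy tables are hypothetical and only stand in for the newer and older versions of storms.

```r
# Hypothetical helper (not part of the tidyverse): returns whichever of the
# two diameter column names is present in the given table.
diameter_column <- function(tbl) {
  candidates <- c("tropicalstorm_force_diameter", "ts_diameter")
  present <- intersect(candidates, names(tbl))
  if (length(present) == 0) stop("no storm diameter column found")
  present[1]
}

# Toy tables standing in for the newer and the older layout of storms
newer <- data.frame(name = "Amy", tropicalstorm_force_diameter = NA_real_)
older <- data.frame(name = "Amy", ts_diameter = 120)

diameter_column(newer)
diameter_column(older)
```

With the real table loaded, storms[[ diameter_column( storms ) ]] would extract the column under either name.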
Here is a random part of the table (some columns are omitted):
set.seed(1234L)
bind_rows(
  storms %>% filter( status == "hurricane" ) %>% sample_n( 3L ),
  storms %>% filter( status != "hurricane" ) %>% sample_n( 3L )
) %>%
  select( name, year, month, lat, long, status, category, wind, pressure ) %>%
  arrange( year, month )
Out of all storm measurements with a non-missing category value, calculate the percentage of storm observations that have a category of at least 4. Find out how to use round to round the result to 2 decimal places. Assign the result to the largeCategoryPercentage variable.
# largeCategoryPercentage <- ...
# 1p the condition >= (at least 4) is correct
# 1p the number of observations with category known correct
# 1p the percentage correct
# 1p the rounding correct
largeCategoryPercentage <-
  ( storms %>% filter( !is.na( category ), category >= 4 ) %>% nrow() ) /
  ( storms %>% filter( !is.na( category ) ) %>% nrow() ) * 100
largeCategoryPercentage <- round( largeCategoryPercentage, 2 )
largeCategoryPercentage
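The same percentage can also be obtained as the mean of a logical vector, since TRUE counts as 1 and FALSE as 0. This is only a sketch on a made-up numeric vector (the values are illustrative, not taken from storms), and it assumes category is numeric as described above.

```r
# Toy category values; NA marks observations without a category
category <- c(1, 4, NA, 5, 2, 3, NA, 1, 2)

# Keep only the known categories, then take the share that is at least 4
known <- category[!is.na(category)]
pct <- round(mean(known >= 4) * 100, 2)
pct
```

Here 2 of the 7 known values are at least 4, so the unrounded result is 200/7 and round( ..., 2 ) keeps two decimal places.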
Take the data from the status column and change the order of levels such that the first three levels are ("tropical storm", "tropical depression", "hurricane") (in exactly this order).
Then, produce a table of counts of the number of observations for each storm status level.
Store the result in the statusCounts variable.
Note: Do not modify the original storms table (a changed table may not work in other questions).
# statusCounts <- ...
# 1p some fct levels reordering is done
# 1p the order of levels is correct
# 1p the table of counts is correct
statusCounts <- storms$status %>%
  fct_relevel( "tropical storm", "tropical depression", "hurricane" ) %>%
  fct_count()
statusCounts
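To see what moving levels to the front does, and that the source vector stays untouched, here is a rough base R sketch on a toy factor. It is not the forcats implementation; the variable names and values are made up for illustration.

```r
# Toy status values; factor() sorts the levels alphabetically by default
status <- factor(c("hurricane", "tropical storm", "hurricane",
                   "other low", "tropical depression", "hurricane"))

# Put three levels first and keep the remaining ones in their old order
front <- c("tropical storm", "tropical depression", "hurricane")
reordered <- factor(status, levels = c(front, setdiff(levels(status), front)))

# Counts follow the new level order; the original factor is unchanged
counts <- table(reordered)
counts
levels(status)
```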
Create a list with some summaries of the storms table and assign this list to the variable stormsSummary. The list should have the following three elements:
obsNum -- the number of observations in the storms table,
avgWind -- the mean of observed wind speeds (force removal of missing values),
uniqueNames -- a character vector of names from the name column with duplicates removed, sorted in alphabetical order.
# stormsSummary <- ...
# 1p there is a list
# 1p elements in the list have names
# 1p obsNum is correct (nrow)
# 1p mean is calculated
# 1p NAs are skipped in mean calculation
# 1p names are uniqued
# 1p names are sorted
stormsSummary <- list(
  obsNum = nrow( storms ),
  avgWind = mean( storms$wind, na.rm = TRUE ),
  uniqueNames = storms$name %>% unique() %>% sort()
)
stormsSummary
Create a new tibble stormsNoSummer that contains all observations from storms except those that were made in summer. Consider the 21st of June to be the first day of summer and the 22nd of September to be the last day of summer.
# stormsNoSummer <- ...
# 1p any filtering is done
# 1p the filtering is correct for months < 6
# 1p the filtering is correct for months == 6
# 1p the filtering is correct for months == 7,8
# 1p the filtering is correct for months == 9
# 1p the filtering is correct for months > 9
stormsNoSummer <- storms %>%
  filter(
    month < 6 |
    ( month == 6 & day < 21 ) |
    ( month == 9 & day > 22 ) |
    month > 9
  )
stormsNoSummer
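The boundary dates are the easy place to slip, so it can help to test the condition on a few hand-picked dates. The sketch below is in base R; the helper is_summer and the toy table are hypothetical. It also checks that negating the helper agrees with the filter condition above for every month and day combination.

```r
# Hypothetical helper: TRUE for dates from 21 June to 22 September inclusive
is_summer <- function(month, day) {
  (month == 6 & day >= 21) | month %in% c(7, 8) | (month == 9 & day <= 22)
}

# Hand-picked dates around the two boundaries
boundary <- data.frame(month = c(6, 6, 7, 9, 9, 12),
                       day   = c(20, 21, 15, 22, 23, 1))
kept <- boundary[!is_summer(boundary$month, boundary$day), ]
kept

# Cross-check against the condition used in the filter above
grid <- expand.grid(month = 1:12, day = 1:31)
solution <- with(grid, month < 6 | (month == 6 & day < 21) |
                       (month == 9 & day > 22) | month > 9)
all(solution == !is_summer(grid$month, grid$day))
```

Only 20 June, 23 September and 1 December survive the filter in the toy table.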
Build a tibble reporting the fastest wind and the lowest pressure observed over all years in each month. Also report the total number of observations for each month. In the min/max calculations, force the omission of possible missing values in the respective columns.
The final table should have four columns: month, fastestWind, lowestPressure, obsNum, and it should be sorted in descending order of the number of observations (the most frequent in the top row). Store the result in the variable stormsByMonth.
# stormsByMonth <- ...
# 1p grouping is good
# 1p obsNum is correct
# 1p lowestPressure is correct (NAs removed)
# 1p fastestWind is correct (NAs removed)
# 1p the table is sorted
# 1p the table is sorted in descending order
stormsByMonth <- storms %>%
  group_by( month ) %>%
  summarise(
    fastestWind = max( wind, na.rm = TRUE ),
    lowestPressure = min( pressure, na.rm = TRUE ),
    obsNum = n()
  ) %>%
  arrange( desc( obsNum ) )
stormsByMonth
Create a tibble stormsByStatusAndMonth that contains a cross-tabulation of status and month. The result should be a table with status represented by rows, month in columns, and table values representing the number of observations for each combination of month and status values. Some entries in the cross-table will be NA: check the manual and fill them with zeros.
# stormsByStatusAndMonth <- ...
# 1p counting is correct
# 1p spreading is used
# 1p spreading is correct
# 1p NAs are replaced by zeroes
stormsByStatusAndMonth <- storms %>%
  count( status, month ) %>%
  pivot_wider( names_from = month, values_from = n, values_fill = 0L )
  #spread( month, n, fill = 0L )
stormsByStatusAndMonth
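The NA entries arise because a status and month combination that never occurs produces no row in the counts, so there is nothing to place in that cell. As a cross-check only (the result is a base R table, not the tibble the task asks for), table() counts such combinations as zero directly. The toy data below are made up.

```r
# Toy observations: no "tropical storm" row exists for month 9
obs <- data.frame(status = c("hurricane", "hurricane", "tropical storm"),
                  month  = c(8, 9, 8))

# Base R cross-tabulation: status in rows, month in columns
crosstab <- table(obs$status, obs$month)
crosstab
```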
Wind speed in the wind column is given in knots. Create a new column windKPH that expresses wind speed in km/h (1 knot = 1.852 km/h). Then, create a new column windCategory that contains a factor with levels "low", "medium", "high" (exactly in that order). The levels should be determined by the windKPH column values: "low" for windKPH < 75, "medium" for 75 <= windKPH < 150, and "high" otherwise. The final table should only have the columns name, windCategory and windKPH (exactly in this order). Store the result in the variable stormsWithWindCategory.
# stormsWithWindCategory <- ...
# 1p windKPH column added
# 1p windKPH is correct
# 1p windCategory column added
# 1p windCategory at least one category has correct condition
# 1p windCategory all categories have correct conditions
# 1p windCategory is a factor
# 1p windCategory has correct levels
# 1p the table has correct columns
# 1p the table has correct column order
stormsWithWindCategory <- storms %>%
  mutate( windKPH = wind * 1.852 ) %>%
  mutate(
    windCategory = case_when(
      windKPH < 75 ~ "low",
      windKPH < 150 ~ "medium",
      TRUE ~ "high"
    ) %>% factor( levels = c( "low", "medium", "high" ) )
  ) %>%
  select( name, windCategory, windKPH )
stormsWithWindCategory
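An alternative way to bin the speeds is base R's cut(), which returns a factor with the levels in the order of the labels. This is a sketch on made-up wind speeds, not a replacement for the solution above. One difference worth knowing: a missing speed stays NA here, whereas a TRUE ~ "high" fallback in case_when would label it "high".

```r
# Toy wind speeds in knots, including one missing value
knots <- c(30, 40.5, 60, 81, 120, NA)
windKPH <- knots * 1.852

# right = FALSE makes the intervals [-Inf, 75), [75, 150), [150, Inf)
windCategory <- cut(windKPH,
                    breaks = c(-Inf, 75, 150, Inf),
                    labels = c("low", "medium", "high"),
                    right = FALSE)
windCategory
```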
Based on the storms tibble, create a box plot:
Show pressure on the vertical axis.
In aes(...), instead of wind use factor(wind) (to make wind a categorical variable).
Use gray box fill and blue colour.
Set the vertical axis title to "Pressure [millibars]" and the horizontal to "Wind speed [knots]".
Apply the theme_bw() theme.
# ggplot( ... ) + ...
# 1p geom_boxplot is created
# 1p the vertical axis is correct
# 1p the horizontal axis is correct
# 1p both titles are correct
# 1p the theme is correct
# 1p the colour is set
# 1p the fill is set
ggplot( storms ) +
  aes( x = factor( wind ), y = pressure ) +
  geom_boxplot( fill = "gray", color = "blue" ) +
  labs( y = "Pressure [millibars]", x = "Wind speed [knots]" ) +
  theme_bw()
For this scatter plot, take from storms only the rows with a missing tropicalstorm_force_diameter (or ts_diameter) value. Use long for the horizontal axis and lat for the vertical. Use a transparency level of 0.5 and a point size of 0.75. Colour points according to wind. Finally, use the colour scale with green for low and red for high wind values.
# ggplot( ... ) + ...
# 1p geom_point is used
# 1p rows with missing ts_diameter are selected
# 1p the horizontal axis is correct
# 1p the vertical axis is correct
# 1p the transparency is set
# 1p the point size is set
# 1p wind is used for colour in aes
# 1p the colour scale is set
filteredStorms <- storms %>%
  filter( is.na( tropicalstorm_force_diameter ) )
ggplot( filteredStorms ) +
  aes( x = long, y = lat, color = wind ) +
  geom_point( alpha = 0.5, size = 0.75 ) +
  scale_color_gradient( low = "green", high = "red" )