Cloud Cover in the Pacific Arctic Region

Aandishah Tehzeeb Samara, Serina Khalifa, Clare Gaffey

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
# library(asc346)

_Summary

As climate change causes rapid warming across the world, the effect is the direst in the Arctic which is warming twice as fast as the rest of the world. It is predicted that by the year 2050, Arctic summers will be sea ice free. The complexity of the Earth’s climate system and its feedback entails that the effects of melting of sea ice have coupled effects in the hydroclimate. As bright sea ice melts exposing the dark deep ocean underneath, the surface transforms from a high albedo surface to a low albedo surface. This change in albedo is the driver for the increased warming in the Arctic. With more open ocean exposed, combined with the warmer temperatures, the surface is more prone to evaporation. Increase in ocean water evaporation leads to more water vapor available for cloud formation. Cloud cover is one of the largest barriers to creating accurate future climate models, since their impacts on the hydroclimate are difficult to predict and account for. Clouds, with their high albedo, reflect more sunlight, but also act as barrier that traps heat inside the Earth’s surface. In the Arctic especially, clouds play different roles during different season as the type of clouds change seasonally. These changes in cloud structure impact their role in the feedback systems. By analyzing the vertical structure of the clouds in the Arctic, it would give us insight as to how the changes in climate change and sea ice have impacted it.

_(1) What influences different altitude cloud cover in the Pacific Arctic? Does this relationship change seasonally?

_(2) How has the Pacific Arctic Region Cloud Cover changed over time?

_(3) How has the energy budget changed over time?

Approach and Method: An outline of the analytical methods and code you plan to use, including the names of key packages that you will draw on. This section should be composed of the following sub-sections:

The most crucial packages for this project will be raster, ncdf4, trend, and XGBoost.

https://cran.r-project.org/web/packages/trend/index.html https://cran.r-project.org/web/packages/ncdf4/index.html https://cran.r-project.org/web/packages/xgboost/index.html

The ncdf4 package will be used to read, slice and write the NARR and chlorophyll data. Raster will be used to plot the data outputs from NARR.

The trend package will be used to run the non-parametric trend tests and change point detections. In particular, the functions that will be used are as follows:

library(here) # set up working direcoty
here::here() # set working directory for images
knitr::include_graphics(here("images/trend_tests.png"))

To rank influencing factors of low altitude cloud cover, a gradient boosting regression model will be run for all of the available data, as well as separately run per season (JJA, SON, DJF, MAM as summer-spring). The XGBoost package will be used for this step. The analysis will follow the 80-10-10 procedure, using 80% of the data for training the model, 20% for testing. An accuracy assessment will be conducted for the models using the mean absolute error metric.

Data: A brief (~250 words) description and visualization of the datasets you will be using. That means spatial plots of the main datasets and their key values, and, as a bonus, a plot of summary statistics, e.g. a histogram or boxplot of one of the more importants variables in the dataset.

The primary dataset used in this analysis will be cloud cover at three altitude levels (low, medium, and high). These datasets will be downloaded as NetCDFs from NARR and will include auxiliary datasets for the regression model. The additional NARR auxiliary variables include wind speed at 10m, air temperature, evaporation, geopotential height, and relative humidity. The NARR datasets are available from 1979-2021. Additional datasets used will be monthly MODIS (2002-2021) chlorophyll-a concentrations and monthly composited SB2 sea ice concentration available from 1978-2020.

# created the netcdf and flipped it using the https://rpubs.com/boyerag/297592
# additional netcdf tasks shown in https://pjbartlein.github.io/REarthSysSci/netCDF.html#get-a-single-time-slice-of-the-data-create-an-r-data-frame-and-write-a-.csv-file
library(sp) # package for spatial manipulation
library(ncdf4) # package for netcdf manipulation
library(raster) # package for raster manipulation
library(rgdal) # package for geospatial analysis
library(trend) #package for running trend analysis
library(ggplot2) # package for plotting
library(tidyverse) # to manipulate for plotting

Download the cloud dataset

destfile <- here::here("external/data/hcdc.mon.mean.nc")
url <-  "https://downloads.psl.noaa.gov/Datasets/NARR/Monthlies/monolevel/hcdc.mon.mean.nc"
#this should work but didn't:
#nc_data <- download.file(url = url, destfile = destfile) 
# Work around to download the data: 
browseURL(url) 

Unpack the netcdf

#Once you downloaded the data, change this to match the data location
ppath <- here::here("external/data/hcdc.mon.mean.nc")
nc_data <- nc_open(ppath) 
nc_data # take a look at the file


# assign variables to objects
lon <- ncvar_get(nc_data, "x")
lat <- ncvar_get(nc_data, "y", verbose = F)
highCC <- ncvar_get(nc_data, "hcdc")
t <- ncvar_get(nc_data, "time_bnds")
#^ this also worked t <- ncvar_get(nc_data, "time")

Get some information from the High Cloud Cover NetCDF

# get summary statistics for high cloud fraction area for all time periods
summary(highCC)

#find out the lengths and dimensions of each variable
vars <- list(lon, lat, highCC, t)
x <- lapply(vars, dim)
x

# Let's see what is the visual spread of our data.
# To do so, we need to create a dataframe from the nc.
# first, reshape to a matrix
# matrix (nlon*nlat rows by 2 cols) of lons and lats
lonlat <- as.matrix(expand.grid(lon,lat))

# get a single slice of layer
m <- 1
tmp_slice <- highCC[,,m]
tmp_vec <- as.vector(tmp_slice)
length(tmp_vec)

# create dataframe and add names 
df <- data.frame(cbind(lonlat,tmp_vec))
names(df) <- c("lon","lat", paste("hcdc", as.character(m), sep="_"))

# take a look at our new dataframe
head(na.omit(df), 10)

#convert to data matrix
dm <- data.matrix(df)
head(dm)

ts(data = dm, start = 1, end = numeric(), frequency = 1,
   deltat = 1, ts.eps = getOption("ts.eps"), class = , names = )

Box Plot

# boxplot at month #1
df %>% ggplot() + geom_boxplot(aes(x = lat, y = hcdc_1)) + xlab(NULL) + 
  ylab("High Cloud Area Fraction") + ggtitle("The First Month")
knitr::include_graphics(here("images/boxplot.png"))

Prepare for plotting

#Enter Path to data below
# The data cannot be saved in the github repo
# To download the 
ppath <- here::here("external/data/") # point to dataset folder
ncpath <- ppath

#High Cloud Area Fraction
ncname1 <- "hcdc.mon.mean"
ncfname1 <- paste(ncpath, ncname1, ".nc", sep="")
dname1 <- "hcdc"

# open a netCDF file
ncin <- nc_open(ncfname1)

#grab time slice from raster brick instead of array:
var.nc1<-brick(ncfname1,varname=dname1)
var.nc1
plot(var.nc1[[1:12]]) #plot first 12 maps

m <- 1 #which time slice do we want to view (can use this to create a LOOP later) 1-504
#subset extracts a single layer from the raster brick
tmp_slice_r<-subset(var.nc1,m)
dim(tmp_slice_r)
plot(tmp_slice_r)

#create color palettes:
temp.palette <- rev(colorRampPalette(c("darkred","red","orange",
                                       "lightyellow","white","azure",
                                    "cyan","blue","darkblue"))(100))

TIME <- as.POSIXct(substr(var.nc1@data@names, start=2, stop=20), format="%Y.%m.%d")

#Create a title for plot: take TIME[m] string and how many characters from the left to keep in title? 7 or 10
ttl <- paste(dname1,"_", substr(TIME[m], 1, 7),sep="")

#test it
spplot(tmp_slice_r,  main = ttl, col.regions=temp.palette)

tmp_slice_dm <- data.matrix(var.nc1)


myraster.mk <- calc(myraster,function(x){MannKendall(rev(x))$sl})

#save it
writeRaster(tmp_slice_r, ttl, "IDRISI", overwrite=TRUE)
knitr::include_graphics(here("images/prelim_subplots.png"))
knitr::include_graphics(here("images/hcdc2021_09.png"))
Code: A bullet point summary of the analysis and coding approach that you propose to follow. For teams, this section should include a description of which member will be responsible for each bullet point.

Step 1. Create a shapefile to map the Pacific Arctic Region [Serina], Get the NARR [Aandishah], Chlorophyll and Sea Ice Concentration data [Clare]

Step 2. Crop NARR [Aandishah], sea ice and chlorophyll [Clare] data to match extent. Detrend the data to remove changes from other external forcing. [Aandishah]

Step 3. Reproject [Aandishah] and resample the Chlorophyll and sea ice data to match the extent of the NARR Data [Clare]

Step 4. Aggregate all the data monthly and seasonally [Clare] [Aandishah]

Step 5. Create a time series Linear Regression Maps for each month and season for each variable (1979-2020). [Aandishah], Calculate the energy budget using the SW and LW [Serina]

Step 6. Create Correlation Graphs between each variable and sea ice [Aandishah], XGBoost for the Cloud Cover [Clare]

Step 7. Extract values of the average of each variable from within the DBO sites and plot Line graphs to understand the change over time. [Aandishah]

Step 8. Create the Visualization of all the maps and make them a e s t h e t i c [Aandishah, Clare and Serina]

Timelines: Provide a timeline for when each portion of the analysis will be completed. These timelines should be constructed relative to the time period of presentations (during the last two weeks of class) and final project submission (during exam week). For teams, names should be associated with each step on the timeline.
knitr::include_graphics(here("images/timeline.png"))


agroimpacts/Artic-Cloud-Change-Dynamics documentation built on Jan. 1, 2022, 9:18 p.m.