A LIM is constructed as a set of linear relationships between the various members of the ecosystem and attempts to model the flows between any two of these members. The equations constrain the magnitude of each flow through the ecosystem. These linear relationships are codified into three separate matrix equations: one describes the precise equality relationships
$A \cdot \vec{x} = \vec{b}$ (1)
one describes equality relationships known with some confidence
$E \cdot \vec{x} = \vec{f}$ (2a)
$E\cdot\sigma_x = \sigma_f$ (2b)
and one describes inequality relationships
$G \cdot \vec{x} \geq \vec{h}$ (3)
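To make the notation concrete, the sketch below builds Eqs. 1-3 in R for a hypothetical three-flow network; the flow names and numbers are purely illustrative and are not taken from the CCELIM.

# Illustrative only: three flows, x = (GPP -> phytoplankton,
# phytoplankton -> zooplankton, zooplankton -> respiration).
flows <- c("gpp_to_phy", "phy_to_zoo", "zoo_to_res")

# Eq. 1: exact mass balance around the zooplankton compartment (in = out).
A <- matrix(c(0, 1, -1), nrow = 1, dimnames = list("zoo_balance", flows))
b <- 0

# Eq. 2: an approximate (measured) constraint, e.g. a hypothetical measured
# primary production of 100 mg C m^-2 d^-1 with an SD of 20.
E <- matrix(c(1, 0, 0), nrow = 1, dimnames = list("gpp_measured", flows))
f <- 100
sigma_f <- 20

# Eq. 3: inequality constraints, here simply that all flows are non-negative.
G <- diag(3)
dimnames(G) <- list(flows, flows)
h <- rep(0, 3)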
Due to the nature of most natural systems of interest to modellers, the number of flows is generally much larger than the number of equations constraining them. In other words, $\dim(\vec{x}) > \dim(\vec{b}) + \dim(\vec{f}) + \dim(\vec{h})$. Even if this is not strictly true, the presence of inequalities adds additional degrees of freedom.
Therefore LIM problems are generally underconstrained, with many degrees of freedom. The selection of a solution from the high-dimensional space of possible solutions prescribed by the underconstrained system of equations is an important step in the model pipeline. Early models faced the high dimensionality of the solution space with an analytical selection metric, namely the L$_2$-minimum norm.
The L$_2$-minimum norm selects a single solution from the high-dimensional solution space by minimizing the sum of the squared flows through the system. While parsimonious, this selection criterion has been found to be less than ideal in many cases because it pushes the inequality constraints to their extremes (Stukel 2011). The strength of this selection metric is the speed at which the solution can be computed. Even for very large models with many degrees of freedom, this analytical selection criterion can be applied and a solution found within a few seconds on any modern computer. This efficiency allows the model to be run numerous times and to be expanded to encompass more flows than would be possible otherwise. The L$_2$-minimum norm cannot handle the Eq. 2 values naturally and instead appends them to Eq. 1, so that the model is required to satisfy those observations exactly.
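As a minimal sketch of how such a solution can be computed in practice, the limSolve R package provides ldei(), which solves exactly this least-distance problem; the matrices below are the illustrative ones defined earlier. Note that limSolve's own argument names place the exact equalities in the E/F slots, the reverse of the A/E naming used in Eqs. 1 and 2 here.

library(limSolve)  # assumes the limSolve package is installed

# L2-minimum norm as described above: the approximate equations (Eq. 2) are
# appended to the exact mass balances (Eq. 1), and the flow vector with the
# smallest sum of squared flows that also satisfies Eq. 3 is returned.
x_l2 <- ldei(E = rbind(A, E), F = c(b, f), G = G, H = h)$X
x_l2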
The selection algorithm used in this model is not the L$_2$-minimum norm but a Markov Chain Monte Carlo (MCMC) technique. This method initially calculates the L$_2$-minimum norm and then stochastically searches the solution space with a random walk, weighted by the deviation of the model from the observed values of Eq. 2.
For the CCELIM, the flows, $\vec{x}$, are in units of mg C m$^{-2}$ d$^{-1}$.
Eq. 1 ensures mass balance to and from each ecosystem member, since carbon is an exactly conserved quantity within the ecosystem. The linear relations of Eq. 2 contain data taken from both the field and the laboratory. These include measured respiration rates, grazing rates, fecal pellet flux, mortality rates, vertical migration and estimates of carbon fixation. Since each of these flows is known only to within some uncertainty, the model is permitted to deviate from these equalities. The degree to which the model over- (or under-) estimates these values is penalized by a probabilistic formula weighted by the measurement's standard deviation (SD).
The inequalities of Eq. 3 constrain the model based on accepted physiological relations, which include respiration rates based on ingestion rate and biomass measurements, excretion based on grazing rates, assimilation efficiencies and gross growth efficiencies (GGE).
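To illustrate how such a physiological relation enters Eq. 3, the sketch below encodes a hypothetical requirement that a consumer's gross growth efficiency (growth divided by ingestion) lie between 10% and 40%; the flow names and bounds are assumptions for illustration only.

# Hypothetical example: two flows, ingestion into and growth out of a consumer,
# with gross growth efficiency (growth / ingestion) bounded between 0.10 and 0.40.
# Each bound is rewritten in the form G . x >= h:
#    growth - 0.10 * ingestion >= 0   (GGE >= 0.10)
#   -growth + 0.40 * ingestion >= 0   (GGE <= 0.40)
flows_gge <- c("ingestion", "growth")
G_gge <- matrix(c(-0.10,  1,
                   0.40, -1),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("gge_lower", "gge_upper"), flows_gge))
h_gge <- c(0, 0)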
Eqs. 1 and 3 together constitute an underdetermined system in which an infinite number of valid solutions $\vec{x}$ exist. Using a Markov Chain Monte Carlo method, this solution space can be statistically interrogated while integrating the soft constraints of Eq. 2. Since the criterion for accepting a new solution is a Markov Chain implementation, each new solution is based on the previous solution. The Monte Carlo acceptance model for the CCELIM is
$P(\mathrm{accept}) = e^{-0.5\,(SSR_{new} - SSR_{old})}$ (4)
where $SSR_{new}$ and $SSR_{old}$ are the summed squared residuals (Eq. 5) between the given solution and the measured values:
$SSR(sol) = \sum_i (sol_i - measured_i)^2$ (5)
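A minimal base-R sketch of this acceptance rule is given below. Eq. 5 is written unweighted; the SD weighting described earlier could be folded in by scaling each row of E and f, but that is not shown here. The candidate generator 'propose' stands in for the model's actual random-walk step and is only a placeholder.

# Summed squared residuals (Eq. 5) between a solution x and the measured
# values, where E and f are the approximate-equation matrix and data of Eq. 2.
ssr <- function(x, E, f) sum((as.vector(E %*% x) - f)^2)

# Metropolis acceptance rule (Eq. 4): a candidate that lowers the SSR is always
# accepted; otherwise it is accepted with probability exp(-0.5 * (SSR_new - SSR_old)).
accept <- function(ssr_new, ssr_old) runif(1) < exp(-0.5 * (ssr_new - ssr_old))

# One MCMC step: propose a candidate from the current solution and keep it
# only if the acceptance rule fires.
mcmc_step <- function(x_old, propose, E, f) {
  x_new <- propose(x_old)
  if (accept(ssr(x_new, E, f), ssr(x_old, E, f))) x_new else x_old
}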
Since LIMs are well suited to problems with well-itemized flows or pathways with unknown relationships and magnitudes, they have been found to be of great utility in environmental studies. For example, the chemical reactions between methane and other simple atmospheric constituents can be readily itemized based on simple kinetic experiments. It would therefore stand to reason that the fate and overall chemistry of methane in the atmosphere should be well understood and numerically constrained, but this is far from the truth. Uncertainties in the global methane budget are large, and its fate is still largely based on simplistic estimations. Using both wide-ranging and disparate data sources, a LIM offers the possibility of integrating localized measurements, laboratory rate measurements and hypothesized estimates to generate a logically consistent model of the fate of methane in the atmosphere. Below we walk through the methodological steps in approaching such a problem (see the original paper by Hein and Crutzen 1997).
Accurately accounting for the flow of groundwater into or out of an area is a difficult yet important problem in many areas around the world. Groundwater plays a role in agriculture, coastal stormwater management, and even in such unfortunate cases as nuclear spills.
Direct measurement of groundwater fluxes is impossible due to the depths involved, the scales, which range from mm to km, and the ephemeral behaviour of fluids. A complete groundwater budget often contains large uncertainties even with coarse abstractions. Techniques used to interrogate flows involve direct measurements of salinity, temperature, radioactive tracers, passive tracers, micro-magnetic changes and radar, to name but a few. Integrating such a diverse set of measurements into coherent models, such as simple mixing models, often leads to contradictory assumptions or incompatibilities.
Inverse modeling allows these assumptions to be codified and all measurements to contribute to the final product.
Although it is critically important in forming responsible management plans, interrogating and understanding ecological food webs is a notoriously difficult task (\cite{Raffaelli2006}). Often the nutrient flows are obscured by complex community dynamics that vary widely in both temporal and spatial scales. Compounding the uncertainty is a dearth of available metrics that are amenable to direct measurement. One such example is the grazing rate of microzooplankton (MIC) on heterotrophic nanoflagellates (HNF). The overall MIC grazing rate can be readily assayed (\cite{Taniguchi2012}), but class-specific rates are rarely interrogated in the field. Neither quantity can be measured \emph{in situ}, which introduces further sources of error and another layer of assumption. Due to this complex topology and the dynamic nature of life, ecological food webs are difficult to quantify.
Marine plankton communities are even more difficult to characterize than their terrestrial doppelgangers due to the plethora of species and the relative scales (\cite{Nencioli})--both large in the case of the water column and small in the case of the phytoplankter--experienced by the oceanographer. Inverse modeling is one avenue of research that permits a deep integration of both theory and data (\cite{van}).
Inverse modeling is a data regression technique where a set of structural and physical constraints are combined with field and laboratory data to render simulated ecosystem networks. An abundance of previous work has been done in both developing the methods (\cite{Stukel2,Saint-Beat2013}) as well as applying them to a variety of areas (\cite{Vezina1988,Richardson2004,van}).
The California Current Ecosystem LTER (CCE LTER) is a study region in the coastal upwelling zone of the California Current (Figure \ref{LTER}). Having its beginnings as the California Cooperative Oceanic Fisheries Investigations (CalCOFI) in 1949 (\cite{Ohman2003}), the CCE LTER hosts reliable data sets spanning the past 65 years. The study area has changed over the years, but it has always focused on the waters from San Diego to San Francisco, reaching from shore to nearly 300 km offshore. This region is highly variable in terms of productivity and community composition, and it has been studied extensively in terms of transport processes (\cite{Auad}) and fisheries management (\cite{Field, Coleman}). Nonetheless, an integration of the geophysical with the biological has remained elusive.
As an upwelling zone, the potential for massive carbon draw-down and sequestration to the sediments due to primary productivity is high. Yet the same processes that make the region particularly productive in terms of Chl \emph{a} also obscure the efficiency of the biological pump.
An important focus of the CCE LTER since its official launch in 2004 has been to identify and interrogate the abrupt and massive changes in community composition through time (\cite{Goericke2015}). These abrupt and non-linear transitions in ecosystem composition are critical to formulating intelligent fishery management plans, yet the underlying mechanisms remain elusive (\cite{Ohman2004}). Furthermore, the mechanisms and their dynamics may help reveal the impacts of climate change on this region (see \cite{Ohman2013} for an introduction). This project, in particular, aims to identify the community structure on a functional level while providing a common framework by which to understand the large-scale changes in community composition over both time and space.
The input to the model is a specific dataframe with the following fields. $Aa is the matrix of the measured, or approximate, equations. The dimensions of this matrix are (number of flows x number of approximate equations).
To generate the appropriately structured dataframe for the model, the function ReadModel() takes as arguments the Excel file and the various counts it needs.
model <- ReadModel()   # arguments (the Excel file and counts) omitted in this snippet
summary(model)
An initial solution using the L$_2$-minimum norm was used to initialize the random walk for each set of data. The random walk was then performed for 100 million steps (100 Ms) with a burn-in period of 2 Ms. The burn-in period allowed the model an opportunity to navigate away from the initial solution before generating final solutions. Since the optimal acceptance ratio for an MCMC using the Metropolis criterion has been calculated to be 0.234 (Roberts et al. 1999), a jump length of 1.3 was used to approximate that acceptance ratio (median 0.218, IQR 0.094). To ensure convergence of the modeled distribution, two criteria were required. Plotting the density of the first half and the second half of the solution set from each run ensured qualitative convergence of the distribution. In addition, the plot of GPP over the course of the model run provided information on finer-scale trends (e.g. whether the burn-in length was sufficient).
During the model run, a solution was saved once every 20,000 steps, leading to a final output of 5,000 solutions.
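The sketch below shows one way these two convergence checks could be performed in R, assuming the saved output is a matrix named solutions with 5,000 rows (one per saved solution) and one named column per flow, and that GPP is one of those columns; both names are placeholders rather than the model's actual output format.

# solutions: matrix of saved MCMC solutions (5,000 rows x number of flows);
# "GPP" is assumed to be a named column. Both names are placeholders.
n <- nrow(solutions)
first_half  <- solutions[1:(n / 2), "GPP"]
second_half <- solutions[(n / 2 + 1):n, "GPP"]

# Qualitative convergence check: the densities of the first and second halves
# of the run should overlap closely.
plot(density(first_half), main = "GPP: first vs. second half of run")
lines(density(second_half), lty = 2)

# Trace of GPP over the run, revealing finer-scale trends such as whether
# the burn-in period was long enough.
plot(solutions[, "GPP"], type = "l", xlab = "saved solution", ylab = "GPP")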