```r
knitr::opts_chunk$set(
  echo = TRUE,
  eval = !as.logical(Sys.getenv("R_SOILDB_SKIP_LONG_EXAMPLES", unset = "TRUE")),
  message = FALSE,
  warning = FALSE,
  fig.height = 7,
  fig.width = 7
)
```
The Soil Data Access (SDA) system provides REST-based access to the SSURGO and STATSGO2 databases maintained by USDA-NRCS. Instead of downloading large databases, SDA allows you to query exactly what you need via a web service.
soilDB provides several functions that help with generating requests to the
web service at a low level, as well as higher-level functions that can return
complex data structures like the SoilProfileCollection from the aqp package
or spatial object types.
SDA provides access to a SQL Server database containing a variety of soil information:
The SDA REST endpoint is available at:
https://sdmdataaccess.sc.egov.usda.gov/tabular/post.rest
All queries to SDA are executed through the SDA_query() function in soilDB,
which handles the HTTP POST requests and JSON response parsing automatically.
SDA contains both the detailed Soil Survey Geographic Database (SSURGO) and the generalized U.S. General Soil Map (STATSGO2).
Soil survey areas are identified by an area symbol (e.g. CA630, IN005); STATSGO2 data use the area symbol 'US'. See the Data Filters section below for details on filtering out STATSGO2 data.
Use SDA for current data from NRCS without manual downloads, queries across multiple survey areas, quick one-time analyses, or access to interpretations and aggregated properties. Use local SSURGO databases for repeated queries to the same survey areas (faster), offline access, complete survey databases with all tables, or direct SQLite manipulation.
The same get_SDA_*() functions documented in this vignette can be applied to
local SSURGO databases by passing the dsn parameter. See the Local SSURGO
vignette for downloading and creating local SSURGO
GeoPackage databases, and for examples of using these functions with local data.
```r
library(soilDB)
```
For spatial data examples, use sf and terra, which are both optional but
recommended.
```r
install.packages(c("sf", "terra"))
```
The SDA_query() function executes SQL queries directly against the SDA database:
```r
# Simple query to get the first 5 map units from an area
q <- "
SELECT TOP 5 mapunit.mukey, mapunit.nationalmusym,
       legend.areasymbol, mapunit.musym, mapunit.muname
FROM legend
INNER JOIN mapunit ON mapunit.lkey = legend.lkey
WHERE legend.areasymbol = 'CA630'
ORDER BY mapunit.musym
"
result <- SDA_query(q)
head(result)
```
If SDA_query() errors for any reason, the result will not be a data.frame with data. Instead it will be an object of class "try-error". You can handle the possibility of error using inherits(). For example:
```r
result <- SDA_query(q)

# check if the result is an error response
if (!inherits(result, 'try-error')) {
  head(result)
} else {
  # stop with error message extracted from result
  stop("Error occurred! ", result[1])
}
```
You can set up a loop to retry a query that fails in the event of intermittent
network or server issues. You should have a brief period of wait time
(Sys.sleep()) between requests.
For repeated failures, it is best to increase that wait time in between each try up to, for example, a maximum of 3 attempts. The following example uses "exponential backoff" which increases the wait time in between requests incrementally from ~3 seconds to ~20 seconds over the course of several attempts.
```r
n_tries <- 1
while (n_tries <= 3) {
  result <- SDA_query(q)
  # check if the result is an error response
  if (!inherits(result, 'try-error')) {
    head(result)
    # if no error, break out of the loop
    break
  } else {
    # on error, increment the number of tries
    # and sleep (time in seconds)
    Sys.sleep(exp(n_tries))
    n_tries <- n_tries + 1
    if (n_tries > 3) {
      stop("Error occurred! ", result[1])
    }
  }
}
```
SDA uses Microsoft SQL Server Transact-SQL (T-SQL) dialect.
Key differences of T-SQL from SQLite and other SQL dialects that are relevant to queries used in soilDB include:
| Feature | T-SQL | SQLite | Notes |
|---------|-------|--------|-------|
| Row limit | `TOP 10` | `LIMIT 10` | Place after `SELECT` |
| NULL functions | `ISNULL(col, 0)` | `IFNULL(col, 0)` | Different function names |
| String concatenation | `col1 + col2` | `col1 \|\| col2` | Different operators |
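To make the row-limit difference concrete, here is the same query written in both dialects (illustrative query strings only; they are not executed here):

```r
# T-SQL (SDA): row limit with TOP, placed immediately after SELECT
q_tsql <- "SELECT TOP 10 mukey, muname FROM mapunit"

# SQLite (local SSURGO database): row limit with LIMIT, placed at the end
q_sqlite <- "SELECT mukey, muname FROM mapunit LIMIT 10"

# String concatenation also differs between the two dialects:
# T-SQL:  areasymbol + '-' + musym
# SQLite: areasymbol || '-' || musym
```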
When using SDA_query() with a local SQLite database (via dsn parameter),
soilDB automatically converts some T-SQL syntax to SQLite equivalents to allow
for both to work. This behavior is triggered when the dsn argument is
specified. You can view the queries (and save them to run yourself) by passing
the query_string=TRUE argument.
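soilDB's internal conversion logic is more thorough than anything shown here, but purely as an illustration of the kind of rewriting involved, a toy translator for the two dialect differences above might look like the following (tsql_to_sqlite() is a hypothetical helper, not a soilDB function):

```r
# Hypothetical helper illustrating the kind of rewriting needed;
# soilDB's internal conversion is more thorough than this sketch.
tsql_to_sqlite <- function(q) {
  # TOP n  ->  LIMIT n  (moved to the end of the query)
  m <- regmatches(q, regexpr("TOP\\s+[0-9]+", q, ignore.case = TRUE))
  if (length(m) > 0) {
    n <- sub("TOP\\s+", "", m, ignore.case = TRUE)
    q <- sub("TOP\\s+[0-9]+\\s*", "", q, ignore.case = TRUE)
    q <- paste0(q, " LIMIT ", n)
  }
  # ISNULL() -> IFNULL()
  gsub("ISNULL\\(", "IFNULL(", q, ignore.case = TRUE)
}

tsql_to_sqlite("SELECT TOP 5 mukey, ISNULL(muname, '') FROM mapunit")
```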
Many custom SSURGO queries follow a standard pattern of relationships: from legend, to mapunit, to component, and then to a table related to component, such as the component horizon (chorizon) table.
In the SSURGO data model, primary and foreign key column names all end with the word "key". The primary key is a unique identifier of a particular record.
Foreign keys are references to primary keys of other tables and define the relationships between tables that can be used for joins.
Here we select a handful of values from all of the above-mentioned tables, corresponding to the first 10 component horizon (chorizon) records.
```r
# Get map unit, component, and horizon data
q <- "
SELECT TOP 10 mu.mukey, leg.areasymbol, mu.musym,
       co.compname, co.comppct_r,
       ch.hzname, ch.hzdept_r, ch.hzdepb_r, ch.claytotal_r
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
LEFT JOIN chorizon AS ch ON ch.cokey = co.cokey
WHERE leg.areasymbol = 'CA630'
  AND co.majcompflag = 'Yes'
  AND ch.claytotal_r IS NOT NULL
ORDER BY mu.musym, co.comppct_r DESC, ch.hzdept_r
"
result <- SDA_query(q)
head(result, 10)
```
We supplied all of the filtering constraints in a WHERE clause for clarity,
with joins only using primary and foreign keys.
In general, the same constraints can be expressed as part of a join to relevant tables, which can be more efficient in complex queries.
One of the most important considerations in SSURGO queries is data completeness: not all relationships in the SSURGO database are populated for all soils or components. Understanding when to use LEFT JOIN vs. INNER JOIN is critical for correct analysis.
Use INNER JOIN when the relationship must exist for meaningful results:
```r
# These relationships are always present in SSURGO:
# - mapunit always belongs to a legend (survey area)
# - component always belongs to a mapunit
q <- "
SELECT TOP 10 leg.areasymbol, mu.musym, co.compname, co.comppct_r
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
WHERE leg.areasymbol = 'CA630'
ORDER BY mu.musym, co.comppct_r DESC
"
result <- SDA_query(q)

# Every row represents a valid component with all parent relationships
head(result)
```
Use LEFT JOIN when a parent record may exist without child records. Two
example cases:
Case 1: Components Without Horizons
Some soil components in SSURGO do not have associated horizon data. These are often "non-soil" components (compkind "Miscellaneous Area") or minor components (majcompflag "No"), which may lack detailed horizon descriptions:
```r
# Some components have NO horizons defined
# INNER JOIN would exclude these components entirely
# LEFT JOIN preserves components even without horizon data
q <- "
SELECT leg.areasymbol, mu.mukey, mu.musym,
       co.compname, co.comppct_r,
       ch.chkey, ch.hzdept_r, ch.hzdepb_r
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
LEFT JOIN chorizon AS ch ON ch.cokey = co.cokey
WHERE leg.areasymbol = 'CA630'
  AND co.majcompflag = 'Yes'
ORDER BY mu.musym, co.comppct_r DESC, ch.hzdept_r
"
result <- SDA_query(q)

# Check for components without horizons:
# Rows with NA in horizon columns indicate a component with no horizon data
if (is.data.frame(result) && nrow(result) > 0) {
  components_no_horizons <- result[is.na(result$chkey), ]
  if (nrow(components_no_horizons) > 0) {
    cat('Found', nrow(components_no_horizons), 'components without horizon data:\n')
    print(components_no_horizons[, c('musym', 'compname', 'comppct_r')])
  }
}
```
Comparison: INNER JOIN would have excluded those components:
```r
# Using INNER JOIN on horizons drops components that lack horizon data
q_inner <- "
SELECT leg.areasymbol, mu.mukey, mu.musym,
       co.compname, co.comppct_r,
       ch.desgnmaster, ch.hzdept_r, ch.hzdepb_r
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
INNER JOIN chorizon AS ch ON ch.cokey = co.cokey
WHERE leg.areasymbol = 'CA630'
  AND co.majcompflag = 'Yes'
ORDER BY mu.musym, co.comppct_r DESC, ch.hzdept_r
"
result_inner <- SDA_query(q_inner)

# number of rows in LEFT JOIN result
nrow(result)

# number of rows in INNER JOIN result
nrow(result_inner)
```
Case 2: Horizons Without Rock Fragments
Similarly, not all horizons have child table rock fragment volume recorded. An empty chfrags table is conventionally used for horizons with no fragments, but in some cases may be used for soils that have non-zero fragment weight fractions (for example, those with human artifacts). When you need to include horizons regardless of fragment availability, use LEFT JOIN:
```r
# Some horizons have NO rock fragment data recorded
# LEFT JOIN preserves horizons even without fragment information
q <- "
SELECT mu.musym, co.compname, ch.chkey,
       ch.desgnmaster, ch.hzdept_r, ch.hzdepb_r,
       rf.fragsize_h, rf.fragvol_r
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
LEFT JOIN chorizon AS ch ON ch.cokey = co.cokey
LEFT JOIN chfrags AS rf ON rf.chkey = ch.chkey
WHERE leg.areasymbol = 'CA630'
  AND co.majcompflag = 'Yes'
ORDER BY mu.musym, co.comppct_r DESC, ch.hzdept_r
"
result <- SDA_query(q)

# Check for horizons without fragments:
# ch.chkey is included in the SELECT so horizons can be identified by key
if (is.data.frame(result) && nrow(result) > 0) {
  horizons_no_frags <- result[!duplicated(result$chkey) & is.na(result$fragsize_h), ]
  if (nrow(horizons_no_frags) > 0) {
    cat('Found', nrow(horizons_no_frags), 'horizons without rock fragment data:\n')
    print(horizons_no_frags[, c('musym', 'compname', 'desgnmaster', 'fragvol_r')])
  }
}
```
| Situation | Join Type | Reason |
|-----------|-----------|--------|
| Query result must have values at all levels | INNER | Filter out incomplete records |
| Parent may exist without children (components without horizons, horizons without fragments) | LEFT | Preserve parent records even when children are missing |
| Calculating aggregations (COUNT, SUM, AVG) with missing data | Depends | LEFT includes zero-occurrence groups; INNER ignores them |
| Applying WHERE filters to child tables | Typically LEFT | INNER JOIN combined with WHERE can lose parents unintentionally |
Tip: When using LEFT JOIN, always check for NA values in child table columns to identify records with missing child data.
SDA supports standard SQL aggregate functions (COUNT, SUM, AVG, MIN, MAX, etc.):
```r
# Get component statistics per map unit
q <- "
SELECT mu.mukey, mu.musym, mu.muname,
       COUNT(co.cokey) AS n_components,
       SUM(co.comppct_r) AS total_comppct,
       AVG(co.comppct_r) AS avg_comppct
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
WHERE leg.areasymbol = 'CA630'
GROUP BY mu.mukey, mu.musym, mu.muname
ORDER BY mu.musym
"
result <- SDA_query(q)
head(result)
```
Here we use SQL GROUP BY to define the columns used to determine unique groups. Columns that are not part of the GROUP BY statement must be wrapped in some sort of aggregation expression to be used in the SELECT clause.
When calculating map unit weighted averages using custom SQL, be aware that
comppct_r (Component Percentage) does not always sum to 100% for a map unit
due to minor component inclusions that may not be exported or missing data.
A robust weighted average should normalize weights by the sum of component percentages present in the data, rather than assuming a sum of 100. This is especially important if you are relying on INNER JOIN to chorizon table, as many mapunits have at least some components with no horizon data.
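The effect of normalization can be demonstrated in base R on a toy component table where percentages sum to less than 100 (component names and values here are made up for illustration):

```r
# Toy component data for one map unit; comppct_r sums to 85, not 100
co <- data.frame(
  compname  = c("Musick", "Holland", "Rock outcrop"),
  comppct_r = c(45, 30, 10),
  claytotal = c(20, 25, NA)
)

# Drop components with no property data
# (as an INNER JOIN to chorizon would do implicitly)
x <- co[!is.na(co$claytotal), ]

# Naive: assumes percentages sum to 100 -> underestimates the average
naive <- sum(x$claytotal * x$comppct_r) / 100

# Robust: normalize by the sum of percentages actually present
robust <- sum(x$claytotal * x$comppct_r) / sum(x$comppct_r)

c(naive = naive, robust = robust)
```

The naive estimate (16.5) is dragged down relative to the normalized one (22) simply because the weights do not sum to 100.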
When querying SDA, especially without specific map unit keys or area symbols, it is often important to exclude STATSGO2 data to prevent mixing these two distinct datasets.
Common filter to exclude STATSGO2:
```sql
WHERE areasymbol != 'US'
```
When aggregating soil properties, it is sometimes important to exclude layers that do not represent mineral soil so that depth-weighted averages and other statistics are not skewed.
Note that if zero values are present for a soil property they can drag down
averages. If you include a weighting factor in a custom query for the thickness
of a layer with a NULL value, that is essentially the same as if the property
value were 0.
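This pitfall is easy to see with weighted.mean() on toy layer data: treating a NULL (NA) layer as a zero value, while still counting its thickness as weight, drags the average down, whereas excluding the layer and its thickness does not.

```r
# Three layers: 0-20 and 20-50 cm have clay data; 50-100 cm (bedrock) is NA
clay      <- c(20, 30, NA)
thickness <- c(20, 30, 50)

# Treating the NA as 0 implicitly weights a zero value by 50 cm of thickness
mean_with_zero <- weighted.mean(ifelse(is.na(clay), 0, clay), thickness)

# Excluding the NA layer (and its thickness) from the calculation
ok <- !is.na(clay)
mean_excluded <- weighted.mean(clay[ok], thickness[ok])

c(with_zero = mean_with_zero, excluded = mean_excluded)
```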
The two most common types of non-mineral soil layers to exclude are:
- Bedrock and other Root-Limiting Layers: below the "bottom of the soil" (e.g. R, Cr horizons)
- Dry Surface Organic Layers (Duff): high-carbon, low bulk density, dry organic (O) horizons
Since SSURGO data spans decades of standards, no single column perfectly identifies these non-soil layers.
The safest approach is to use the new computed horztype column in combination
with in lieu texture class codes. Currently horztype only supports
identification of dry organic layers on the surface, but it may be extended to
other material types in the future.
1. Filter Bedrock:
Bedrock layers often have NULL property values, which is not much of a problem
provided there are data available for the overlying soil layers; but you will
need to exclude the thickness of the bedrock layers for accurate depth-weighted
average calculations.
The horztype column does not capture bedrock. You must filter by texture code:
```sql
/* Exclude Bedrock */
AND texturerv NOT IN ('BR', 'WB', 'UWB')
```
This includes the modern "bedrock" (BR), as well as the obsolete "weathered bedrock" and "unweathered bedrock". For the purposes of excluding non-soil bedrock material, they can be treated as equivalent and will cover many cases.
There are a variety of other "in lieu" texture codes that are used to identify cemented materials ("CEM", "IND", "MAT") which may or may not have other texture information and may also need to be considered for exclusion.
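The same exclusion logic can be applied in base R to a data.frame of horizon records, as when working with data already downloaded from SDA (toy records; the texture codes match the SQL filters discussed here):

```r
# Toy horizon records with in lieu texture codes
hz <- data.frame(
  hzname    = c("Oi", "A", "Bt", "Cr", "R"),
  texturerv = c("SPM", "L", "CL", "WB", "BR")
)

# Exclude bedrock codes and dry organic (duff) codes
bedrock_codes <- c("BR", "WB", "UWB")
duff_codes    <- c("SPM", "MPM", "HPM")

soil_hz <- hz[!(hz$texturerv %in% c(bedrock_codes, duff_codes)), ]
soil_hz$hzname
```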
While it might be tempting to just filter on horizon designation (e.g. excluding R or Cr layers explicitly), as discussed below this is not completely reliable on its own.
2. Filter Dry Organics (Duff):
The horztype column is currently primarily for identifying "Dry Organic"
layers. This is a new column that has been programmatically populated to help
users filter out a common set of data.
Combine horztype with texture codes like "SPM" (which is the
least-decomposed organic soil material: dry fibric material) for more robust
confirmation of filtering logic. You can also include "MPM" and "HPM" if you
are interested in excluding even more decomposed dry organic materials:
```sql
/* Exclude "Duff" / Dry Organics */
AND ISNULL(horztype, '') != 'Dry Organic'
AND texturerv NOT IN ('SPM', 'MPM', 'HPM')
```
The population of O horizons in SSURGO is inconsistent due to evolving data collection standards. While modern surveys often include numerical data for organic surface layers, legacy data may only mention them in descriptions without populated properties.
Retain these layers when analyzing data completeness or identifying diagnostic features like folistic epipedons. However, for analyses focused on mineral soil properties, excluding them is often necessary to prevent skewing depth-weighted averages.
Note: We generally want to keep "Wet Organic" layers (PEAT, MPT [mucky peat], and MUCK) as these may be diagnostic for Histosols, Histic epipedons etc., which are important concepts for identification of hydric soils.
3. Filter by Horizon Designation
It is also possible, but not completely reliable due to soil survey vintage and patterns in historical data population, to filter by horizon designation.
For instance, a logical statement to exclude the master horizon designation "O" for organics:
```sql
AND desgnmaster != 'O'
```
While using horizon designations is conceptually appealing, it is problematic when it is the only logic you are using. Horizon designations were not historically populated for the aggregate soil component data (SSURGO).
In the last two decades layers in the chorizon table have been assigned
morphologic horizon designations, but this has not been done retroactively. Some
older soil surveys still use "H" as the master horizon designation for all
layers e.g. H1, H2, H3 instead of A, B, C. If this is the case the only way to
identify bedrock, or organic layers, is to use other data elements such as in
lieu texture class.
This section demonstrates filtering horizon data based on material types.
We will start with a custom query that targets components with "Histic" taxonomic subgroup formative element.
```r
q <- "
SELECT TOP 10 component.compname, ch.hzname,
       ch.hzdept_r, ch.hzdepb_r, ch.texturerv, ch.horztype,
       ch.claytotal_r, ch.om_r
FROM chorizon AS ch
INNER JOIN component
  ON component.cokey = ch.cokey
  AND taxsubgrp LIKE '%Histic%'
ORDER BY component.cokey, hzdept_r
"
result <- SDA_query(q)
result
```
Here is a sample query to select mineral and wet organic soil layers:
```r
q <- "
SELECT TOP 10 ch.hzname, ch.hzdept_r, ch.hzdepb_r,
       ch.texturerv, ch.horztype, ch.claytotal_r, ch.om_r
FROM chorizon AS ch
WHERE
  -- 1. Exclude Bedrock
  ch.texturerv NOT IN ('BR', 'WB', 'UWB')
  -- 2. Exclude Dry Organic / Duff
  AND ISNULL(ch.horztype, '') != 'Dry Organic'
  AND ch.texturerv NOT IN ('SPM', 'MPM', 'HPM')
ORDER BY cokey, hzdept_r
"
result <- SDA_query(q)
head(result)
```
In contrast to bedrock, organic soil materials are soil materials, whether wet or dry, but it is important to know when you may need to look past them. Organic layers have very low bulk density and very high organic carbon content, both of which can skew "mineral soil" averages if included in depth-weighted calculations.
Take for example the K factor for estimating erosion hazard: users often want properties for an exposed mineral soil surface rather than that of an organic layer over the mineral soil surface. In general, the concept of K factor does not apply to the organic surface.
To demonstrate, we will look at the whole SSURGO database. We count the number
of horizons with master designation "O" and horizon top depth 0 within groups
of different combinations of Kw and Kf, and sort in decreasing order.
```r
q <- "
SELECT COUNT(chkey) AS n, kwfact, kffact
FROM chorizon
WHERE desgnmaster = 'O'
  AND hzdept_r = 0
GROUP BY kwfact, kffact
ORDER BY n DESC
"
result <- SDA_query(q)
result$percent <- round(prop.table(result$n) * 100, 1)
head(result, 10)
```
This query shows that many O horizons at the soil surface have NULL K factor values.
The takeaway: if your analysis relies on having K factor values everywhere, you will either need to filter out the O horizons, or come up with a way to impute a value to replace the NULL values.
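A minimal imputation sketch in base R (toy values; the choice of replacement value is analysis-specific and a constant like the observed median is just one illustrative option):

```r
# Toy K factor values with missing entries for O horizons
kwfact <- c(0.28, NA, 0.32, NA, 0.17)

# Option 1: drop records with missing K factor
kw_complete <- kwfact[!is.na(kwfact)]

# Option 2: impute a constant (here the median of observed values)
kw_imputed <- ifelse(is.na(kwfact), median(kwfact, na.rm = TRUE), kwfact)
kw_imputed
```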
Rock fragments are particles larger than 2 mm. In SSURGO, fragment data are stored in several places both aggregated up to the horizon level, and in child tables of horizon.
Fragments are classified by size class (e.g. gravel, cobbles, stones, boulders) and recorded as volume percentages. Derived weight fractions are also stored at the horizon level.
```r
# Query horizons with their rock fragment content
q <- "
SELECT TOP 20 mu.musym, co.compname,
       ch.hzname, ch.hzdept_r, ch.hzdepb_r,
       rf.fragkind, rf.fragsize_h, rf.fragvol_r
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
LEFT JOIN chorizon AS ch ON ch.cokey = co.cokey
LEFT JOIN chfrags AS rf ON rf.chkey = ch.chkey
WHERE leg.areasymbol = 'CA630'
  AND co.majcompflag = 'Yes'
ORDER BY mu.musym, co.comppct_r DESC, ch.hzdept_r, rf.fragsize_h
"
result <- SDA_query(q)
head(result, 20)
```
Above you can see that multiple fragment size classes can exist in a single horizon (a many-to-one relationship). Horizons without fragments generally have no records in the chfrags table.
- The chfrags table contains individual fragment records per horizon and size class (e.g. at least one each for gravel, cobbles, stones, and boulders, where present)
- fragsize_h indicates the upper end of the fragment size range for a specific record
- fragvol_r is the representative volume percentage for that specific record
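This per-record structure can be mimicked in base R by summing fragment volumes per horizon key (toy records; the values are illustrative):

```r
# Toy chfrags records: one row per horizon x fragment size class
frags <- data.frame(
  chkey      = c(1, 1, 2),
  fragsize_h = c(76, 250, 76),  # mm; upper end of size range
  fragvol_r  = c(10, 5, 25)     # volume percent for that record
)

# Total fragment volume per horizon, like SUM(fragvol_r) ... GROUP BY chkey
total <- aggregate(fragvol_r ~ chkey, data = frags, FUN = sum)
total
```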
Sum fragment volumes to get total rock fragment content per horizon:
```r
# Total rock fragment content per horizon
q <- "
SELECT mu.musym, co.compname, co.comppct_r,
       ch.hzname, ch.hzdept_r, ch.hzdepb_r,
       SUM(rf.fragvol_r) AS calc_fragvoltot_r,
       COUNT(DISTINCT rf.fragsize_h) AS n_frag_classes
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
LEFT JOIN chorizon AS ch ON ch.cokey = co.cokey
LEFT JOIN chfrags AS rf ON rf.chkey = ch.chkey
WHERE leg.areasymbol = 'CA630'
  AND co.majcompflag = 'Yes'
GROUP BY mu.musym, co.compname, co.comppct_r,
         ch.hzname, ch.hzdept_r, ch.hzdepb_r, ch.chkey
ORDER BY mu.musym, co.comppct_r DESC, ch.hzdept_r
"
result <- SDA_query(q)
head(result)
```
Note: T-SQL is strict about aggregation syntax. Every column in the SELECT clause that is not wrapped in an aggregate function (SUM, COUNT, etc.) MUST be listed in the GROUP BY clause.
Use get_SDA_property() to retrieve rock fragment properties with aggregation:
```r
# Get rock fragment weight percentages (3-10 inch class)
frag_3_10 <- get_SDA_property(
  property = "Rock Fragments 3 - 10 inches - Rep Value", # equivalent to 'frag3to10_r'
  method = "Dominant Component (Numeric)", # dominant component, weighted average by depth
  areasymbols = "CA630",
  top_depth = 0,
  bottom_depth = 100 # depth range for averaging
)
head(frag_3_10)

# get fragment volume (fragvoltot_l, fragvoltot_r, fragvoltot_h)
fragvol_total <- get_SDA_property(
  property = "fragvoltot_r",
  method = "Weighted Average", # weighted by component percentage and depth
  top_depth = 0,
  bottom_depth = 100,
  areasymbols = "CA630"
)
head(fragvol_total)
```
Note: Fragment weight fraction properties in chorizon (like frag3to10_r and fraggt10_r) are always populated, but currently the fragvoltot_* columns are not and data availability depends on the vintage of the specific SSURGO map unit data.
The SDA system has important hard limits:
- Record Limit: Queries that return more than 100,000 records are rejected
- Result Size: Response serialization is limited to approximately 32 Mb
- Memory Limit: Queries generally cannot exceed approximately 300 Mb of memory usage
- Timeouts and Complexity: Long-running queries may time out or hit other internal limits during query planning
Note on Local vs. Remote: These limits primarily apply to the Remote SDA API. When querying a local SQLite database (e.g. created via createSSURGO() or downloadSSURGO()), you are generally unrestricted by record counts or timeouts. Complex tabular queries can often be run on entire states or the whole country locally without the need for chunking or simplification.
This query will return >100k records (all components across all of SSURGO and STATSGO2) (bad):
```sql
SELECT * FROM component
```
This query selects a specific component name pattern in the WHERE clause (good):
```sql
SELECT * FROM component WHERE compname LIKE 'Miami%'
```
Use a TOP clause to test with a small result set first:
```sql
SELECT TOP 100 * FROM chorizon
```
TOP can be used after sorting with ORDER BY to easily find the values associated with an ordered sequence. For instance, if you ORDER BY comppct_r DESC (order by component percentage, descending), then taking TOP 1 returns the record associated with the dominant component. This is a very common aggregation scheme, available as an option in most tools that use SSURGO data.
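The dominant-component idea can be sketched in base R on a toy component table (names and percentages are made up): sort descending by comppct_r within each map unit and keep the first row per group.

```r
co <- data.frame(
  mukey     = c(1, 1, 2, 2),
  compname  = c("Musick", "Holland", "Sierra", "Ahwahnee"),
  comppct_r = c(60, 40, 55, 45)
)

# Order by mukey, then by descending component percentage ...
co <- co[order(co$mukey, -co$comppct_r), ]

# ... and keep the first (dominant) component per map unit,
# analogous to TOP 1 with ORDER BY comppct_r DESC per group
dominant <- co[!duplicated(co$mukey), ]
dominant$compname
```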
TOP is also generally useful for testing queries, or checking table result structure for a few resulting rows. If testing is successful, remove TOP, add filtering with WHERE or in JOIN condition (see below), or increase the TOP limit.
Efficient queries often perform filtering operations as part of their JOIN statements. This can improve server-side query planning and reduce overhead that is not needed for the final result.
Return aggregated results (fewer rows) instead of raw data, and constrain compname as part of a join condition instead of in the WHERE clause:
```sql
SELECT mu.mukey, COUNT(*) AS n_hz, AVG(ch.claytotal_r) AS avg_clay
FROM component co
INNER JOIN mapunit mu
  ON mu.mukey = co.mukey
  AND co.compname LIKE 'Musick%'
INNER JOIN chorizon ch ON ch.cokey = co.cokey
GROUP BY mu.mukey
```
Efficient queries are also explicit about the columns in the SELECT clause. While SELECT * is fine for testing, it is prone to issues if schemas change, and in most cases it returns more data than necessary.
SELECT * will return all columns, usually including many columns you
do not need:
```sql
SELECT TOP 1 *
FROM mapunit
INNER JOIN component ON mapunit.mukey = component.mukey
WHERE compname LIKE 'Musick%'
```
Use SELECT column1, column2, ... for concise results, with no
duplicated columns in complex joins:
```sql
SELECT TOP 1 mapunit.mukey, musym, muname,
       compname, localphase, comppct_r, majcompflag, compkind, cokey
FROM mapunit
INNER JOIN component ON mapunit.mukey = component.mukey
WHERE compname LIKE 'Musick%'
```
If a column name occurs in multiple tables, you need to specify which table it should come from using <tablename>.<columnname> syntax.
If you do need all columns from a specific table use <tablename>.* syntax, for
example here we get all columns from chfrags, and specific columns from
chorizon and component:
```sql
SELECT hzname, chfrags.*, component.cokey
FROM chfrags
INNER JOIN chorizon ON chorizon.chkey = chfrags.chkey
INNER JOIN component ON component.cokey = chorizon.cokey
WHERE compname LIKE 'Musick%'
```
makeChunks() is a helper function in the `soilDB` package. It takes a vector
of values as input, and assigns each value a group (chunk) number. You can then
calculate the unique chunk numbers and iterate over them to process data in
bite-sized pieces. This is covered extensively in the next section.
Check and filter for non-NULL - SDA uses IS NULL / IS NOT NULL:
```sql
SELECT TOP 10 * FROM chorizon WHERE claytotal_r IS NOT NULL
```
Use ISNULL() to provide defaults, such as converting missing clay content to 0:
```sql
SELECT TOP 10 chkey, claytotal_r, ISNULL(claytotal_r, 0) AS clay
FROM chorizon
```
Use CAST() for explicit type conversion:
```sql
SELECT TOP 10 claytotal_r,
       CAST(claytotal_r AS INT) AS clay_int,
       CAST(chkey AS CHAR(10)) AS chkey_str
FROM chorizon
WHERE claytotal_r != CAST(claytotal_r AS INT)
```
Note: in the example above, casting to INT truncates the value by removing the decimals; it does not follow the rounding rules that R's round() does, for example.
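The distinction is easy to see in R: CAST(x AS INT) in T-SQL behaves like trunc() (truncation toward zero), not like round():

```r
x <- c(2.4, 2.5, 2.6, -2.6)

# T-SQL CAST(x AS INT) truncates toward zero, like trunc()
trunc(x)

# R round() rounds to the nearest value (half-to-even for .5)
round(x)
```

Note that round(2.5) returns 2 in R (round-half-to-even), another way rounding conventions can differ between environments.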
You can use wildcards with the LIKE operator:
```sql
SELECT * FROM mapunit WHERE muname LIKE '%clay%'
```
Perform string concatenation with + operator:
```sql
SELECT areasymbol + '-' + musym AS areamusym FROM mapunit
```
When you need data for many map units, components, soil survey areas, or other entities, a single query can exceed SDA's limits.
SDA is a shared public resource, so it is generally encouraged to make requests that are as small and efficient as possible.
The makeChunks() function divides a large vector into smaller chunks for
iterative querying.
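Setting the exact soilDB implementation aside, the chunk assignment can be approximated in base R (chunk_ids() here is a hypothetical stand-in for illustration, not the soilDB function):

```r
# Assign a chunk number to each element, at most `size` elements per chunk;
# a rough base-R approximation of what soilDB::makeChunks() provides
chunk_ids <- function(x, size) {
  ceiling(seq_along(x) / size)
}

keys <- 461994:462003        # 10 keys
idx  <- chunk_ids(keys, 4)   # chunks of up to 4 elements
split(keys, idx)
```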
Before implementing any chunked query approach, you should also be certain that your query runs properly on a small set, for example a single chunk, or a result truncated using a TOP clause.
Suppose you have 5000 map unit keys to query. We can break that set up into chunks of 1000 map units and iterate over each chunk using lapply():
```r
large_mukey_list <- 461994:466995

# Divide into chunks of 1000
chunks <- soilDB::makeChunks(large_mukey_list, 1000)

# Now loop through chunks
results_list <- lapply(split(large_mukey_list, chunks), function(chunk_keys) {
  # Build IN clause
  in_clause <- soilDB::format_SQL_in_statement(chunk_keys)

  # construct query
  q <- sprintf(
    "SELECT mukey, musym, muname FROM mapunit WHERE mukey IN %s",
    in_clause
  )

  cat(sprintf("Chunk mukey %d to %d\n", min(chunk_keys), max(chunk_keys)))

  SDA_query(q)
})

# Combine results
all_results <- do.call(rbind, results_list)
```
The above approach is readily converted to parallel processing, for example using future.apply::future_lapply(), or other custom iterator functions that allow for progress reporting and other diagnostics.
If you do choose to use parallel processing, you should limit the number of concurrent connections made to the API to a reasonable number, say, 2 to 5 requests at a time, with a maximum of about 10. You should also implement proper error handling and retry mechanisms (see Handling Errors section above).
Here we query soil properties for 6 soil survey areas in chunks of 2.
```r
test_areasymbols <- c("CA067", "CA077", "CA632", "CA644", "CA630", "CA649")

# Chunk into groups of 2
chunks <- soilDB::makeChunks(test_areasymbols, 2)

results_list <- lapply(split(test_areasymbols, chunks), function(chunk_keys) {
  # Build IN clause
  in_clause <- soilDB::format_SQL_in_statement(chunk_keys)

  q <- sprintf("
    SELECT mu.mukey, mu.musym, mu.muname, co.cokey, co.compname,
           SUM(hzdepb_r - hzdept_r) AS sum_hz_thickness
    FROM legend AS leg
    INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
    INNER JOIN component AS co ON co.mukey = mu.mukey
    INNER JOIN chorizon AS ch ON ch.cokey = co.cokey
    WHERE leg.areasymbol IN %s
    GROUP BY mu.mukey, mu.musym, mu.muname, co.cokey, co.compname
  ", in_clause)

  cat(sprintf("Chunk %s completed\n", paste0(chunk_keys, collapse = ", ")))

  SDA_query(q)
})

all_results <- do.call(rbind, results_list)
head(all_results)
```
Note: The above sample query is not a good way to determine soil thickness, as it does not exclude thickness of non-soil layers (e.g. bedrock) that should not be counted as part of the soil depth. As an exercise, use the logic from the previous sections to improve this example to filter out bedrock.
- Typical queries: 1000-10000 items per chunk (safe for most queries)
- Simple lookups: 10000-100000 items per chunk (less data per item)
- Complex joins: 500-1000 items per chunk (more processing per item)
Monitor your query results and adjust based on SDA response times.
You may get an error:
```
Invalid query: The query processor ran out of internal resources and could not
produce a query plan. This is a rare event and only expected for extremely
complex queries or queries that reference a very large number of tables or
partitions. Please simplify the query. If you believe you have received this
message in error, contact Customer Support Services for more information.
```
If you see this, first confirm that your query syntax is correct for the result you are trying to obtain, for instance by trying it on a minimal small set of data. Critically evaluate the need for joins, subqueries, and complex filtering logic. Identify any areas of repeated logic and refactor to simplify redundant steps. If your query works as expected on a small data set, and you have optimized it, then consider breaking your big query up into (more) chunks.
This document mainly focuses on details of custom spatial queries using the low-level SDA_query() function, but there are several functions that provide a high-level convenience interface for spatial queries in soilDB. First we will discuss converting spatial data to Well-Known Text for use in generating custom queries. Then we will discuss the high- and low-level interfaces in soilDB.
Important: SDA requires spatial inputs (WKT) to be in WGS84 (EPSG:4326) geographic coordinates.
Passing projected coordinates (like UTM or Albers) will result in query failures or zero results.
SDA does store an alternate projected version of the geometry data (e.g. mupolygonproj, rather than mupolygongeo). This uses the Web Mercator projection ("EPSG:3857"), so it is best avoided in analysis, though it is an option for visualization in web map applications that lack the capability to project the data.
You can use sf to define a geometry, such as a bounding rectangle, and convert it to WKT (Well-Known Text) for spatial queries:
library(sf)

# Define a bounding box for a region of interest
# (xmin, ymin, xmax, ymax)
bbox <- c(-120.9, 37.7, -120.8, 37.8)

bbox_sf <- st_as_sfc(st_bbox(c(
  xmin = bbox[1],
  ymin = bbox[2],
  xmax = bbox[3],
  ymax = bbox[4]
), crs = 4326))

wkt <- st_as_text(bbox_sf)
wkt
The soilDB package provides two higher-level functions (SDA_spatialQuery() and
fetchSDA_spatial()) that make obtaining spatial data easier. Both of these
functions internally use SDA_query() and are often a better option for most
use cases compared to crafting a custom spatial query.
SDA_spatialQuery(): spatial inputs (e.g. point location, AOI rectangle) => spatial or tabular outputs
fetchSDA_spatial(): tabular inputs (e.g. map unit key, area symbol, ecological site ID) => spatial outputs
The functions take different inputs, and support various outputs including
SSURGO and STATSGO2 map unit polygon geometry, special feature
points/lines/polygons, soil survey area boundaries, Major Land Resource Area
(MLRA) boundaries, or simple tabular information using spatial data as input.
These functions will handle projection to "EPSG:4326" internally as long as the
input data has the correct CRS metadata.
# Example: Retrieve map unit polygons that intersect a bounding box
# This handles the WKT conversion and query construction automatically
# geomIntersection = TRUE clips the polygons to the bounding box
polygons <- SDA_spatialQuery(bbox_sf, what = "mupolygon", geomIntersection = TRUE)
polygons
Note that bbox_sf could be any geometry. Here it is a rectangular polygon,
but it could be a point, a collection of points, lines, or similar.
SDA_spatialQuery() allows you to get tabular data based on a spatial geometry.
For instance, to simply identify what soil survey area or map unit a single point overlaps with, you can do something like the following.
# Example: get some legend and mapunit-level tabular data at a point
point_sf <- sf::st_as_sf(data.frame(wkt = "POINT (-120.85 37.75)"),
                         wkt = "wkt", crs = "EPSG:4326")

ssa <- SDA_spatialQuery(point_sf, what = "areasymbol")
ssa

mu <- SDA_spatialQuery(point_sf, what = "mukey")
mu
Above we pass a WKT string (the centroid of bbox_sf) within a data.frame to
sf::st_as_sf(). The wkt argument identifies which column contains the WKT
string used to construct our spatial query object.
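An equivalent way to build the same point, without writing WKT by hand, is to construct it from coordinate columns (the column names x and y here are arbitrary):

```r
library(sf)

# same point as above, built from numeric coordinates instead of a WKT string
point_sf <- st_as_sf(
  data.frame(x = -120.85, y = 37.75),
  coords = c("x", "y"),
  crs = 4326
)
point_sf
```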
This section will be expanded in the future, but for now an example application of the two higher-level functions is available in the "Dominant Ecological Site" vignette.
For optimal performance, separate your spatial and tabular queries. Fetch the
list of intersecting map unit keys (mukey) first, then use those keys to query
attribute tables.
This avoids joining heavy geometry columns with complex attribute tables, which can trigger the 32 MB serialization limit. We can then use the WKT in a custom SDA query to get map units intersecting the bounding box:
# Step 1: Get the mukeys that intersect the bounding box
q <- sprintf("
  SELECT DISTINCT mu.mukey
  FROM mapunit AS mu
  INNER JOIN mupolygon AS mp ON mp.mukey = mu.mukey
  WHERE mp.mupolygongeo.STIntersects(geometry::STGeomFromText('%s', 4326)) = 1
", wkt)

spatial_result <- SDA_query(q)
head(spatial_result, 5)
This query uses special geospatial functions defined in SQL Server to perform the geometric operations (in this case, intersection).
In addition to standard T-SQL spatial functions, SDA provides pre-defined Table-Valued Functions (TVFs) optimized for intersections.
SDA_Get_Mukey_from_intersection_with_WktWgs84(wkt): returns mukey values intersecting the input WKT.

SDA_Get_MupolygonWktWgs84_from_Mukey(mukey): returns WKT geometry for a specific mukey.

These make the logic much simpler for the common operation of finding mukey
values for a spatial location or geometry:
# Input MUST be WKT in WGS84 (EPSG:4326)
q <- "SELECT * FROM SDA_Get_Mukey_from_intersection_with_WktWgs84('POINT(-120.9 37.7)')"
result <- SDA_query(q)
Also, you can do the inverse (map unit key to geometry):
# Step 2: Get Geometry from Map Unit Key (mukey)
# Useful for retrieving the polygon boundary for a specific map unit
target_mukey <- 461994
q <- sprintf("SELECT * FROM SDA_Get_MupolygonWktWgs84_from_Mukey('%s')", target_mukey)

# Result contains the mukey and the geometry in WKT format
geometry_df <- SDA_query(q)
head(geometry_df)
Using the sf (or terra) package, you can convert the resulting text WKT column to a
spatial object:
if (requireNamespace("sf", quietly = TRUE) && !is.null(geometry_df)) {
  # Parse WKT column (named 'MupolygonWktWgs84' in the TVF result,
  # vs. 'mupolygongeo' in the mupolygon table)
  poly_sf <- sf::st_as_sfc(geometry_df$MupolygonWktWgs84, crs = 4326)
  plot(poly_sf, main = paste("Geometry for mukey=", target_mukey))
}
Note: The TVF function name implies the CRS is WGS84 (authority:code EPSG:4326, SRS ID 4326; or, more correctly, OGC:CRS84).
See the SDA Advanced Query Guide for more details on TVF functions and other high-level features built into SDA.
We can use the result of our spatial filtering with property queries.
Here we extract the unique map unit keys and pass them as the mukeys
argument to get_SDA_property():
# Step 2: Use the mukeys to fetch tabular data
# First, get mukeys in bounding box
spatial_mukeys <- unique(spatial_result$mukey)

# Then query properties for those mukeys
if (length(spatial_mukeys) > 0) {
  clay_in_bbox <- get_SDA_property(
    property = "Total Clay - Rep Value",
    method = "Weighted Average",
    mukeys = spatial_mukeys,
    top_depth = 0,
    bottom_depth = 50
  )
  head(clay_in_bbox)
}
This shows how to get the data; see the Local SSURGO vignette
for more spatial visualization examples using sf.
get_SDA_property(): Component and Horizon Properties

The get_SDA_property() function queries the "SSURGO On Demand" service to
retrieve soil component and horizon properties with automatic aggregation to the
map unit level.
By default, get_SDA_property() includes minor components
(include_minors = TRUE) but excludes miscellaneous areas
(miscellaneous_areas = FALSE). Even miscellaneous areas that have some soil data
are excluded.
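If you do want miscellaneous areas included, the default exclusion can be reversed explicitly; a sketch using the same property and survey area as later examples (requires network access to SDA):

```r
library(soilDB)

# include miscellaneous areas alongside minor components
clay_all <- get_SDA_property(
  property = "Total Clay - Rep Value",
  method = "Weighted Average",
  areasymbols = "CA630",
  top_depth = 0,
  bottom_depth = 50,
  include_minors = TRUE,       # the default
  miscellaneous_areas = TRUE   # override the default exclusion
)
head(clay_all)
```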
SDA has 80+ horizon properties and 40+ component properties.
Here are some common examples.
Component properties:
"Taxonomic Order", "Taxonomic Suborder", "Taxonomic Temperature Regime""Drainage Class", "Hydrologic Group""Corrosion of Steel", "Corrosion of Concrete""Range Production - Favorable/Normal/Unfavorable Year"Horizon properties:
Water: "Available Water Capacity - Rep Value", "0.1 bar H2O - Rep Value", "0.33 bar H2O - Rep Value", "15 bar H2O - Rep Value"
Bulk Density: "Bulk Density 0.1 bar H2O - Rep Value", "Bulk Density 0.33 bar H2O - Rep Value", "Bulk Density 15 bar H2O - Rep Value", "Bulk Density oven dry - Rep Value"
Texture: "Total Sand - Rep Value", "Total Silt - Rep Value", "Total Clay - Rep Value", and sand/silt sub-fractions
Chemistry: "pH 1:1 water - Rep Value", "Electrical Conductivity - Rep Value", "Cation Exchange Capacity - Rep Value", "Sodium Adsorption Ratio - Rep Value", "Calcium Carbonate - Rep Value", "Gypsum - Rep Value"
Other: "Saturated Hydraulic Conductivity - Rep Value", "Organic Matter - Rep Value", "Free Iron - Rep Value", "Oxalate Iron - Rep Value"
Alternately, you can directly use physical column names from the SSURGO schema
like "claytotal_r", "ph1to1h2o_r", "ksat_r", etc.
The dominant component (category) method gets the categorical value from the dominant soil component. This is a distinct method because categorical data need to be handled differently from numeric data; the numeric method is described below.
# Get taxonomic classification from dominant component
tax_order <- get_SDA_property(
  property = "Taxonomic Suborder",
  method = "Dominant Component (Category)",
  areasymbols = "CA630"
)
head(tax_order)
Dominant component (numeric) retrieves properties from the dominant component, optionally averaged over a depth range (in centimeters):
# Get bulk density at 0-25 cm depth
bulk_density <- get_SDA_property(
  property = c("dbthirdbar_l", "dbthirdbar_r", "dbthirdbar_h"),
  method = "Dominant Component (Numeric)",
  areasymbols = "CA630",
  top_depth = 0,
  bottom_depth = 25
)
head(bulk_density)
The weighted average method retrieves component percentage and depth-weighted
average across all components in a map unit. In this example we explicitly
exclude minor components by changing the default include_minors argument.
# Get weighted average clay content (0-50 cm)
clay_avg <- get_SDA_property(
  property = "Total Clay - Rep Value",
  method = "Weighted Average",
  areasymbols = "CA630",
  top_depth = 0,
  bottom_depth = 50,
  include_minors = FALSE
)
head(clay_avg)
Method `"Min/Max" is used to get minimum or maximum values across all components in a mapunit.
# Get maximum sand content in any component
sand_max <- get_SDA_property(
  property = "Total Sand - Rep Value",
  method = "Min/Max",
  areasymbols = "CA630",
  FUN = "MAX",
  top_depth = 0,
  bottom_depth = 50
)
head(sand_max)
Dominant condition aggregates components with the same category value for the properties of interest, then picks the dominant percentage. This gives extra weight to categories where multiple soils in the same map unit have the same value.
# Get dominant drainage class condition
drain_dominant <- get_SDA_property(
  property = "Drainage Class",
  method = "Dominant Condition",
  areasymbols = "CA630"
)
head(drain_dominant)
Method "none" returns one row per component or component horizon, depending on
the property selected, with no map unit aggregation. This generally returns much
more data, but allows for custom aggregation outside of the SQL query, and
deeper inspection of the source data.
You cannot mix component-level and
horizon-level properties with this method. Run it twice, once for horizon-level
and once for component-level, then join the component data to the horizon data
using the cokey column to get a single data frame.
Also, the top_depth and bottom_depth arguments are deliberately ignored with
this method. Apply your own filtering to the result if you only need a specific
depth range.
# Get all pH values by component and horizon
ph_all <- get_SDA_property(
  property = "pH 1:1 water - Rep Value",
  method = "None",
  areasymbols = "CA630"
)
head(ph_all)
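As described above, component-level and horizon-level results from method = "None" can be combined with a merge() on the cokey column; a sketch (overlapping column names are handled via suffixes; requires network access to SDA):

```r
library(soilDB)

# component-level property: one row per component
comp_part <- get_SDA_property(
  property = "Drainage Class",
  method = "None",
  areasymbols = "CA630"
)

# horizon-level property: one row per component horizon
hz_part <- get_SDA_property(
  property = "Total Clay - Rep Value",
  method = "None",
  areasymbols = "CA630"
)

# join: horizon rows inherit the component-level drainage class
combined <- merge(hz_part, comp_part,
                  by = "cokey",
                  suffixes = c("", ".component"))
head(combined)
```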
We can use query_string = TRUE to see the SQL generated by get_SDA_property():
q <- get_SDA_property(
  property = "Taxonomic Suborder",
  method = "Dominant Component (Category)",
  areasymbols = "CA630",
  query_string = TRUE
)

# View first 500 characters of query
cat(substr(q, 1, 500), "...\n")
get_SDA_interpretation(): Soil Interpretations

Soil interpretations are ratings that assess how suitable or limited a soil is
for various land uses. An interpretation for a specific component or map unit
includes a rating class and a numeric value ("fuzzy rating"). The
get_SDA_interpretation() function retrieves these ratings with multiple
aggregation methods.
SDA contains 600+ interpretations organized by category:
| Category | Examples |
|----------|----------|
| FOR (Forestry) | Potential Seedling Mortality, Road Suitability, Harvest Suitability, Equipment Operability |
| AGR (Agriculture) | Crop Yield, Irrigation Suitability, Pesticide Pollution Potential |
| ENG (Engineering) | Septic Tank Absorption Fields, Construction Roads, Dwelling Foundations, Dam Sites |
| URB/REC (Urban/Recreation) | Trails, Playgrounds, Picnic Areas, Recreational Development |
| AWM (Animal Waste Management) | Wastewater Application Sites, Manure Storage |
| WMS (Water Management System) | Irrigation Suitability, Drainage, Wetland Rating |
| CPI (Commodity Crop Productivity Index) | National Commodity Crop Productivity Index |
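To discover which interpretation rule names are populated for a particular survey area, one option is to query distinct mrulename values from the cointerp table via SDA_query(); a sketch (requires network access to SDA):

```r
library(soilDB)

# list the interpretation rule names available for one survey area
q <- "SELECT DISTINCT co.mrulename
      FROM legend l
      INNER JOIN mapunit mu ON mu.lkey = l.lkey
      INNER JOIN component c ON c.mukey = mu.mukey
      INNER JOIN cointerp co ON co.cokey = c.cokey
      WHERE l.areasymbol = 'CA630'
      ORDER BY co.mrulename"
rules <- SDA_query(q)
head(rules)
```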
Get the rating from the dominant soil component:
# Get forestry ratings for dominant component
for_ratings <- get_SDA_interpretation(
  rulename = c("FOR - Potential Seedling Mortality",
               "FOR - Road Suitability (Natural Surface)"),
  method = "Dominant Component",
  areasymbols = "CA630"
)
head(for_ratings)
Aggregate similar ratings, then return the dominant:
# Get dominant engineering interpretation
eng_ratings <- get_SDA_interpretation(
  rulename = "ENG - Septic Tank Absorption Fields",
  method = "Dominant Condition",
  areasymbols = "CA649"
)
head(eng_ratings)
Component-percentage-weighted average of ratings:
# Get weighted average agricultural rating
agr_weighted <- get_SDA_interpretation(
  rulename = "AGR - Winter Wheat Yield (MT)",
  method = "Weighted Average",
  areasymbols = "MT041",
  include_minors = TRUE
)
head(agr_weighted)
Note: in this example, a regional (Montana-specific) interpretation is used, so we used a Montana areasymbol (MT041). Some interpretations are only exported for specific states, regions, or soil survey areas.
Return one row per component with no map unit aggregation:
# Get all component-level ratings
all_ratings <- get_SDA_interpretation(
  rulename = "FOR - Mechanical Planting Suitability",
  method = "None",
  areasymbols = "CA630"
)
head(all_ratings)
Use wide_reason = TRUE to pivot reason columns into separate sub-rule columns:
# Get detailed ratings with reasons pivoted
detailed <- get_SDA_interpretation(
  rulename = "FOR - Mechanical Planting Suitability",
  method = "Dominant Component",
  areasymbols = "CA630",
  wide_reason = TRUE
)
head(detailed)
get_SDA_hydric(): Hydric Soil Classifications

Hydric soils are inundated or saturated long enough during the growing season to
develop anaerobic conditions. The get_SDA_hydric() function evaluates the
proportion of hydric components in a map unit.
Returns an overall classification for each map unit:
hydric_class <- get_SDA_hydric(
  areasymbols = c("CA077", "CA630"),
  method = "MAPUNIT"
)
head(hydric_class)

# Check unique ratings
unique(hydric_class$HYDRIC_RATING)
Possible classifications:
"Nonhydric" - No hydric components
"Hydric" - All major components are hydric
"Predominantly Hydric" - Hydric components >= 50%
"Partially Hydric" - One or more major components are hydric
"Predominantly Nonhydric" - Hydric components < 50%
The output includes:
hydric_majors - Percentage of hydric major components
hydric_inclusions - Percentage of hydric minor/inclusion components
total_comppct - Total component percentage (should sum to 100)
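Assuming the hydric_majors and hydric_inclusions columns described above, a combined hydric percentage per map unit can be derived directly; a sketch (column names taken from the list above; requires network access to SDA):

```r
library(soilDB)

h <- get_SDA_hydric(areasymbols = "CA077", method = "MAPUNIT")

# combined hydric percentage = major + minor/inclusion hydric components
h$hydric_total <- h$hydric_majors + h$hydric_inclusions
head(h[, c("mukey", "HYDRIC_RATING",
           "hydric_majors", "hydric_inclusions", "hydric_total")])
```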
Get hydric status of the dominant component only:
hydric_dom <- get_SDA_hydric(
  areasymbols = "CA630",
  method = "DOMINANT COMPONENT"
)
head(hydric_dom)
Aggregate by hydric condition (Yes/No), then pick the dominant:
hydric_cond <- get_SDA_hydric(
  mukeys = c(461994, 461995, 462205),
  method = "DOMINANT CONDITION"
)
head(hydric_cond)
Return one row per component (component-level hydric status):
hydric_all <- get_SDA_hydric(
  areasymbols = "CA630",
  method = "NONE"
)
head(hydric_all)
get_SDA_muaggatt(): Map Unit Aggregate Attributes

Map unit aggregate attributes are pre-computed statistics summarizing the components in a map unit. These include surface texture, slope range, permeability, and other physical properties.
muagg <- get_SDA_muaggatt(areasymbols = "CA630")
head(muagg)
str(muagg)
This function returns a data frame with one row per map unit, containing dozens
of pre-calculated aggregate properties. See the get_SDA_muaggatt()
documentation for the full list of available columns.
You can filter by area symbol, map unit key, or custom WHERE clause:
# Get aggregate attributes for specific mukeys
muagg_filtered <- get_SDA_muaggatt(mukeys = c(461994, 461995, 463264))
head(muagg_filtered)
get_SDA_pmgroupname(): Parent Material Groups

Parent material groups classify the primary geological or organic origin of soil
material. The get_SDA_pmgroupname() function retrieves parent material
classifications by component.
Get parent material from the dominant component:
pm_dom <- get_SDA_pmgroupname(
  areasymbols = "CA630",
  method = "DOMINANT COMPONENT",
  simplify = FALSE
)
head(pm_dom)
Aggregate parent materials by group, then return the dominant:
pm_cond <- get_SDA_pmgroupname(
  mukeys = c(461994, 461995, 462205),
  method = "DOMINANT CONDITION"
)
head(pm_cond)
By default, simplify = TRUE groups parent materials into broader categories.
Set simplify = FALSE to get detailed group names:
# Simplified (broader groups)
pm_simple <- get_SDA_pmgroupname(
  mukeys = c(461994, 461995),
  simplify = TRUE
)
head(pm_simple)

# Detailed (specific groups)
pm_detailed <- get_SDA_pmgroupname(
  mukeys = c(461994, 461995),
  simplify = FALSE
)
head(pm_detailed)
get_SDA_coecoclass(): Component Ecological Classes

Ecological classes describe potential natural vegetation and ecological
conditions. The get_SDA_coecoclass() function retrieves ecological site
classifications by component.
A worked example of this function is available in the "Dominant Ecological Site" vignette.
Return component-level ecological classes without aggregation:
eco_none <- get_SDA_coecoclass(
  method = "None",
  areasymbols = "CA630"
)
head(eco_none)
The default behavior targets the NRCS Ecological Site database-related entries, but there are many other ecological/vegetation class types in SSURGO.
Columns include:
ecoclassid - Ecological Class ID
ecoclassname - Full ecological class name
ecoclasstypename - Type (e.g., "NRCS Rangeland Site", "NRCS Forestland Site")
ecoclassref - Reference document
Get ecological site from the dominant component only:
eco_dom <- get_SDA_coecoclass(
  method = "Dominant Component",
  areasymbols = "CA630"
)
head(eco_dom)
Aggregate ecological classes by ID, then return the dominant:
eco_cond <- get_SDA_coecoclass(
  method = "Dominant Condition",
  mukeys = c(461994, 461995, 462205),
  ecoclasstypename = "NRCS Forestland Site"
)
head(eco_cond)
Return all ecological sites per map unit in wide format:
eco_all <- get_SDA_coecoclass(
  method = "All",
  mukeys = c(461994, 461995),
  threshold = 5
)
head(eco_all)
Filtering by ecological class type:
# Get rangeland sites only
eco_range <- get_SDA_coecoclass(
  method = "Dominant Condition",
  areasymbols = "CA630",
  ecoclasstypename = "NRCS Rangeland Site"
)
head(eco_range)
get_SDA_cosurfmorph(): Component Surface Morphometry

Surface morphometry describes the three-dimensional shape and position of a
landscape. The get_SDA_cosurfmorph() function returns component-level
geomorphic classifications aggregated in various ways.
SDA contains four cosurfmorph tables:
| Table | Variables | Description |
|-------|-----------|-------------|
| cosurfmorphgc | geomposmntn, geomposhill, geomposflats, geompostrce | 3D geomorphic classification |
| cosurfmorphhpp | hillslopeprof | 2D hillslope position profile |
| cosurfmorphss | shapeacross, shapedown | Surface shape (convex/concave/linear) |
| cosurfmorphmr | geomicrorelief | Microrelief (slope complexity) |
# Get geomorphic position by component name
geom_3d <- get_SDA_cosurfmorph(
  table = "cosurfmorphgc",
  areasymbols = "CA630"
)
head(geom_3d)
Output includes columns like:
compname - Component name
geomposmntn, geomposmntn_n - Mountain position (count)
p_geomposmntn - Mountain position proportion
# Get hillslope position aggregated by area symbol
geom_hill <- get_SDA_cosurfmorph(
  table = "cosurfmorphhpp",
  by = "areasymbol",
  areasymbols = "CA630"
)
head(geom_hill)
# Get surface shape classes
geom_shape <- get_SDA_cosurfmorph(
  table = "cosurfmorphss",
  areasymbols = "CA649"
)
head(geom_shape)
# Get microrelief by component name
geom_micro <- get_SDA_cosurfmorph(
  table = "cosurfmorphmr",
  areasymbols = "CA630"
)
head(geom_micro)
fetchSDA(): Get Soil Profile Collection

The fetchSDA() function is a high-level wrapper around SDA_query() that
simplifies the process of downloading and assembling soil profile data into a
SoilProfileCollection object from the aqp package. It automatically handles
the complex joins between map unit, component, and horizon tables.
You can fetch all components and horizons for a specific area or set of map
units using the WHERE argument:
library(aqp)

# Query soil components by areasymbol and musym
s <- fetchSDA(WHERE = "areasymbol = 'IN005' AND musym = 'MnpB2'")

# The result is a SoilProfileCollection
s

# Check the horizon data
head(horizons(s))
This function is very powerful for getting a quick snapshot of the soil profiles in an area.
It is worth noting that fetchSDA() returns data from both SSURGO and
STATSGO2. To limit to SSURGO, use WHERE = "areasymbol != 'US'" (see the Data
Filtering section), or specify a SSURGO area symbol.
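For example, a component-name query can be restricted to SSURGO map units by excluding the STATSGO2 legend; a sketch (the component name 'Sierra' is illustrative, and the query requires network access to SDA):

```r
library(soilDB)

# exclude STATSGO2 (areasymbol 'US') while querying by component name
s_ssurgo <- fetchSDA(WHERE = "compname = 'Sierra' AND areasymbol != 'US'")
s_ssurgo
```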
The duplicates argument (defaulting to FALSE) controls how components that appear in multiple survey areas are handled.
duplicates = FALSE (default): returns only one instance of a component per nationalmusym. This eliminates "duplication" where multiple survey area (legend) map units are linked to the same source map unit and component records. This is generally preferred for statistical analysis of soil properties.

duplicates = TRUE: returns a record for each unique map unit key (mukey). This is useful for visualization or when you need to account for every occurrence of a component across different survey area correlations.

fetchLDM(): Get Laboratory Data

The fetchLDM() function provides access to the Kellogg Soil Survey Laboratory
(KSSL) Data Mart via Soil Data Access. It allows for querying laboratory data by
various criteria and returning it as a SoilProfileCollection.
You can fetch lab data by specifying an identifier (like an area symbol) and the
column it corresponds to (what argument):
# Fetch KSSL data for a specific soil survey area (CA630)
# 'what' argument specifies we are searching by 'area_code'
# 'tables' argument specifies which data tables to include (defaults to core tables)
ldm_data <- fetchLDM("CA630", what = "area_code")

# The result is a SoilProfileCollection
ldm_data

# Inspect site data
head(site(ldm_data))
You can also search by correlated or sampled-as taxon names using a custom WHERE clause:
# Fetch lab data where the correlated or sampled name is 'Musick' or 'Holland'
# CASE statement handles differences between correlated and sampled names
ldm_taxon <- fetchLDM(
  WHERE = "(CASE WHEN corr_name IS NOT NULL
                 THEN LOWER(corr_name)
                 ELSE LOWER(samp_name) END) IN ('musick', 'holland')"
)
ldm_taxon
By default, fetchLDM() retrieves a standard set of tables. You can request
specific data, such as physical properties, using the tables argument:
# Fetch physical properties for soils correlated as "Typic Argialbolls"
ldm_phys <- fetchLDM(
  x = "Typic Argialbolls",
  what = "corr_taxsubgrp",
  tables = "lab_physical_properties"
)

# Inspect the available horizon data columns
names(horizons(ldm_phys))
fetchLDM supports retrieving data from various KSSL tables, including chemical properties, mineralogy, and x-ray analysis.
Compare the same property using different aggregation methods:
# Get clay content using three different methods
# note: "Min/Max" requires the FUN argument; result column names may differ
# slightly between methods
methods <- c("Dominant Component (Numeric)", "Weighted Average", "Min/Max")

clay_results <- data.frame()
for (method in methods) {
  result <- get_SDA_property(
    property = "Total Clay - Rep Value",
    method = method,
    areasymbols = "CA630",
    FUN = if (method == "Min/Max") "MAX" else NULL,
    top_depth = 0,
    bottom_depth = 50
  )
  result$method <- method
  clay_results <- rbind(clay_results, result)
}

# compare methods for a single map unit
subset(clay_results, mukey == 1865918)
Combine soil properties and interpretations for a land use suitability assessment:
# Get drainage class and hydrologic group
drain_hydro <- get_SDA_property(
  property = c("Drainage Class", "Hydrologic Group"),
  method = "Dominant Condition",
  areasymbols = "CA630"
)

# Get an engineering interpretation
eng_interp <- get_SDA_interpretation(
  rulename = "ENG - Septic Tank Absorption Fields",
  method = "Dominant Condition",
  areasymbols = "CA630"
)

# Explicitly merge on mukey (and other shared columns to avoid duplication)
suitability <- merge(drain_hydro, eng_interp,
                     by = c("mukey", "areasymbol", "musym", "muname"))
head(suitability)
Combine hydric classifications with wetland-related interpretations:
# Get a generalized mapunit-level hydric classification
# see ?get_SDA_hydric for details on method="MAPUNIT"
hydric_stat <- get_SDA_hydric(
  areasymbols = "CA077",
  method = "MAPUNIT"
)

wet_interp <- get_SDA_interpretation(
  rulename = "WMS - Excavated Ponds (Aquifer-fed)",
  method = "Dominant Condition",
  areasymbols = "CA077"
)

wetland_assess <- merge(hydric_stat, wet_interp,
                        by = c("mukey", "areasymbol", "musym", "muname"),
                        all.x = TRUE)
subset(wetland_assess, rating_WMSExcavatedPondsAquiferfed < 0.6)
Here we use base R aggregate() (feel free to swap in your favorite R
aggregation method e.g. dplyr, data.table, or collapse) to summarize
across survey areas:
# Get properties for two areas
props_ca630 <- get_SDA_property(
  property = "Total Clay - Rep Value",
  method = "Dominant Component (Numeric)",
  areasymbols = "CA630"
)
props_ca649 <- get_SDA_property(
  property = "Total Clay - Rep Value",
  method = "Dominant Component (Numeric)",
  areasymbols = "CA649"
)

# Combine
all_props <- rbind(props_ca630, props_ca649)

# Aggregate by area symbol
area_summary <- aggregate(
  claytotal_r ~ areasymbol,
  data = all_props,
  FUN = function(x) {
    c(
      mean = mean(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE),
      sd = sd(x, na.rm = TRUE),
      n = sum(!is.na(x))
    )
  }
)
area_summary
Use method = "None" to perform custom component-level analysis:
# Get all component properties without aggregation
clay_by_comp <- get_SDA_property(
  property = "Total Clay - Rep Value",
  method = "None",
  areasymbols = "CA630",
  top_depth = 0,
  bottom_depth = 25,
  include_minors = TRUE
)
head(clay_by_comp)

# Calculate average clay by major vs. minor components
clay_summary <- aggregate(
  claytotal_r ~ majcompflag,
  data = clay_by_comp,
  FUN = mean
)
clay_summary
Use SDA_query() for custom SQL queries when predefined functions don't fit your needs

Use T-SQL syntax such as TOP, ISNULL(), CAST() for SDA queries

Choose the appropriate aggregation method (Dominant Component, Weighted Average, None, etc.) for your analysis

Use the dsn parameter to query local SSURGO databases (see Local SSURGO vignette)

Use spatial queries with SDA_query() for location-based searches (or use SDA_spatialQuery())

Test queries with TOP 10, or a small number of inputs, to verify syntax and result table structure

Use query_string = TRUE to inspect generated SQL

Filter using the areasymbols or mukeys parameters

Break large queries into chunks (see the makeChunks() section for suggested chunk sizes)

| Table | Primary Key | Parent Table | Description |
|-------|-------------|-------------|-------------|
| legend | lkey | N/A | Soil survey area (e.g., CA630) |
| mapunit | mukey | legend | Delineated soil mapping unit |
| component | cokey | mapunit | Soil component within a map unit |
| chorizon | chkey | component | Soil horizon within a component |
| cointerp | cointerpkey | component | Component-level interpretations |
| coecoclass | coecoclasskey | component | Component ecological classes |
| copmgrp | copmgrpkey | component | Component parent material groups |
| mupolygon | mupolygonkey | N/A | Map unit polygons |
| Column | Table | Description | Example |
|--------|-------|-------------|---------|
| areasymbol | legend | Soil survey area code | "CA630" |
| musym | mapunit | Map unit symbol | "7159" |
| muname | mapunit | Map unit name | "Sierra coarse sandy loam, 3 to 8 percent slopes" |
| compname | component | Component/soil series name | "Sierra" |
| comppct_r | component | Component percentage (representative) | 85 |
| majcompflag | component | Major component flag | "Yes"/"No" |
| claytotal_r | chorizon | Total clay percentage | 12.5 |
| hzdept_r | chorizon | Horizon depth top (cm) | 0 |
| hzdepb_r | chorizon | Horizon depth bottom (cm) | 25 |
| hydricrating | component | Hydric soil status | "Yes"/"No" |
| mrulename | cointerp | Interpretation rule name | "FOR - Mechanical Planting Suitability" |
| interphr | cointerp | Interpretation rating/class | "Somewhat Limited" |
| mupolygongeo | mupolygon | Map unit polygon geometry (WKT; WGS84 geographic coordinates in decimal degrees) | "POLYGON (...)" |
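The query-testing advice above can be put into practice with a small T-SQL probe before scaling a query up; a sketch (requires network access to SDA):

```r
library(soilDB)

# verify syntax and inspect the result structure on a tiny sample
probe <- SDA_query("SELECT TOP 10 mukey, muname FROM mapunit")
str(probe)

# inspect generated SQL before running a convenience function
q <- get_SDA_property(
  property = "Total Clay - Rep Value",
  method = "Weighted Average",
  areasymbols = "CA630",
  top_depth = 0,
  bottom_depth = 50,
  query_string = TRUE
)
cat(substr(q, 1, 300), "...\n")
```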