earthUI User Guide

Introduction

earthUI is a graphical user interface for the R earth package, which implements Multivariate Adaptive Regression Splines (MARS). It runs as a local Shiny application --- no login, no server, no accounts. You launch it from R, import a dataset (CSV or Excel), configure your model, and fit it interactively.

Three Purpose Modes

When you launch earthUI, a Purpose radio button at the top of the sidebar lets you choose one of three modes: General, Market, or Appraisal.

Getting Started

library(earthUI)
launch()

This opens the Shiny app at http://localhost:7878.

Locale & Regional Settings

earthUI supports international number, date, and CSV formatting conventions through a country-based locale system. The Country dropdown in Section 1 of the sidebar (below the file upload) selects a preset for 31 supported countries. Each preset configures:

Override Dropdowns

Below the country selector, four override dropdowns let you change individual settings without switching countries:

When you change the country, all overrides reset to that country's defaults. Changing an override only affects that one setting.

Saving Defaults

Click Save as my default to store your locale preferences globally. These defaults apply to all future sessions regardless of which data file you load. Per-file settings (target, predictors, parameters) are saved separately in the browser's local storage, but locale defaults persist across all files via an SQLite database.

MLS Input Data Requirements

For real estate workflows, your input data typically comes from an MLS export. earthUI accepts CSV and Excel files. On import, column names are automatically converted to snake_case, e.g. "Area ID" gets converted to "area_id". The CSV separator and decimal mark used during import are determined by the locale settings (see "Locale & Regional Settings" above).
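
As a rough illustration of the renaming described above, the sketch below converts headers to snake_case in base R. The helper `to_snake_case()` is hypothetical and is not earthUI's own routine; how earthUI treats punctuation and camelCase is an assumption here.

```r
# Hypothetical helper; earthUI's actual conversion rules may differ.
to_snake_case <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # split camelCase: "LotSize" -> "Lot_Size"
  x <- gsub("[^A-Za-z0-9]+", "_", x)            # runs of non-alphanumerics become "_"
  x <- gsub("^_+|_+$", "", x)                   # trim leading/trailing underscores
  tolower(x)
}

headers <- c("Area ID", "Sale Price", "Living Area (SF)")
snake <- to_snake_case(headers)
print(snake)
```

Checking the converted names this way before import can help you predict how your MLS headers will appear in the predictor table.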

Required Columns for Appraisal Mode

The full appraisal workflow (RCA + Sales Grid) benefits from these columns:

| Column | Special Type | Purpose |
|:-------|:-------------|:--------|
| Sale Price (or equivalent wording) | (target) | Response variable |
| Contract Date | contract_date | Computes sale_age from effective date |
| Listing Date | listing_date | Fallback DOM = contract date - listing date |
| Days on Market | dom | Displayed in Sales Grid |
| Concessions | concessions | Net SP = Sale Price - Concessions |
| Living Area (SF) | living_area | Per-SF residuals (residual_sf, cqa_sf) |
| Lot Size | lot_size | Grouped in "Site Size" row |
| Site Dimensions | site_dimensions | Grouped with lot size |
| Latitude | latitude | Rounded to 3 dp; proximity calc; Location group |
| Longitude | longitude | Rounded to 3 dp; proximity calc; Location group |
| Area ID | area | Grouped in "Location" row |
| Actual Age | actual_age | Grouped in "Age" row |
| Effective Age | effective_age | Grouped with actual age |
| Address | display_only | Shown in grid; excluded from model |
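
The derived quantities named in the Purpose column can be sketched in base R as follows. This is only an illustration on made-up rows: the data frame, the `effective_date` value, and the sign convention for sale_age are assumptions, and earthUI performs its own versions of these calculations internally.

```r
# Illustrative data; column names follow the special types in the table above.
comps <- data.frame(
  sale_price    = c(500000, 620000),
  concessions   = c(5000, 0),
  contract_date = as.Date(c("2025-01-15", "2025-06-30")),
  listing_date  = as.Date(c("2024-12-01", "2025-06-10")),
  latitude      = c(38.123456, 38.654321),
  longitude     = c(-121.987654, -121.123456)
)
effective_date <- as.Date("2025-12-31")  # assumed appraisal effective date

# Net SP = Sale Price - Concessions
comps$net_sp <- comps$sale_price - comps$concessions
# Fallback DOM = contract date - listing date, in days
comps$dom <- as.numeric(comps$contract_date - comps$listing_date)
# sale_age: days from contract date to effective date (sign convention assumed)
comps$sale_age <- as.numeric(effective_date - comps$contract_date)
# Coordinates rounded to 3 decimal places
comps$latitude  <- round(comps$latitude, 3)
comps$longitude <- round(comps$longitude, 3)
print(comps)
```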

Spreadsheet column names can be in any language; the "special" type names are in English so that earthUI can give those columns special treatment. Otherwise, the column names you supply appear in the regression models, graphs, and, for appraisals, the Intermediate Sales Grid.

Not all columns are required. earthUI adapts: if a column is missing, the corresponding feature is simply omitted. For real estate pricing models, however, certain columns are highly recommended to achieve an acceptable fit:

  1. "Sale Age," which is the number of days between the contract sale date and the effective date of the appraisal or analysis. If multi-year sales history is being used, especially for periods over 5 years, sale_age often plays a central role in estimating the sale price. In fact it is often so important that without it, earthUI fails to provide any model at all.

  2. "Living Area," which also goes by names such as "Living Sqft" and "GLA" (gross living area). It is another leading determinant of sale price.

  3. "Total Bath Count" is the combined count of full, three-quarter, half, and quarter bathrooms. For example, two full baths and one half-bath give a value of 2.5.

  4. "Garage Bays" or "Garage Area" --- the number of garage spaces or the garage square footage.

  5. "Lot Size" --- the land area of the property, typically in square feet or acres.

  6. "Longitude," "Latitude," and if available "Area ID." Adjustments for these will be combined under a single Location adjustment in the Sales Grid.
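
If your MLS export reports bath types in separate columns, a total bath count like the one in item 3 could be assembled before import along these lines. The column names are hypothetical, and the 0.75 and 0.25 weights are an assumption extrapolated from the 2.5 example; check how your MLS defines partial baths.

```r
# Hypothetical MLS columns; weights for 3/4 and 1/4 baths are assumptions.
baths <- data.frame(
  full          = c(2, 1),
  three_quarter = c(0, 1),
  half          = c(1, 0),
  quarter       = c(0, 0)
)
baths$baths_total <- with(
  baths,
  full + 0.75 * three_quarter + 0.5 * half + 0.25 * quarter
)
print(baths$baths_total)
```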

Data Quality

Subject Row Placement

In Appraisal mode, row 1 must be the subject property. In Market mode, placing the subject in row 1 is optional (use the "Skip first row" checkbox). In General mode, all rows are treated equally.

The Sidebar Workflow

The sidebar is organized into numbered sections:

  1. Import Data --- CSV or Excel file upload
  2. Project Output Folder --- where downloads are saved
  3. Variable Configuration --- target, predictors, data types, special columns
  4. Earth Call Parameters --- degree, penalty, pruning, CV, subset filters
  5. Fit Earth Model --- green button to run the model
  6. Download Output --- Excel file with predictions and diagnostics
  7. Calculate RCA Adjustments (appraisal only)
  8. Generate Sales Grid (appraisal only)
  9. Download Report --- HTML, Word, or PDF via Quarto

Variable Selection

The Predictor Table

Each column is listed with:

Special Column Types

In appraisal and market modes, each predictor can be assigned a special type:

Date & Time:

Monetary:

Size & Location:

Age:

Display:

Each special type can be assigned to only one column, except display_only. A blue badge shows the assigned type next to each variable name.

Downloading Data (Step 6)

After fitting, download an Excel file with predictions and diagnostics. This output is used in Step 7 (RCA) to assign a CQA (Condition/Quality/Appeal) rating to the subject property. The output is sorted by residual_sf and cqa_sf to help you assess where the subject falls in the ranking.

If the model is of good quality, the properties should be ranked from least to most appealing based on residual features that did not go into the regression. The middle value should be approximately 0, with negative values in the lower half and positive values in the upper half. You should find the lowest-quality homes, or "fixers," near the bottom of the ranking and the nicest homes at the top. Exceptions usually arise from anomalies such as foreclosures, short sales, probate (inheritance-related) sales, and quick sales prompted by a job change or other circumstances. Investigating an anomaly usually turns up a pertinent reason for the price.

Output Columns

| Column | Description |
|:-------|:------------|
| est_<target> | Model prediction (e.g., est_sale_price) |
| residual | Actual - predicted |
| cqa | Comparative Quality Analysis score (0-10 scale) |
| residual_sf | Residual / living area (if designated) |
| cqa_sf | CQA calculated from ranking via residual_sf |
| <var>_contribution | Per-g-function contribution |
| basis | Intercept value contribution (same for all properties) |
| calc_residual | Verification column |
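
To make the arithmetic behind these columns concrete, here is a small sketch on invented numbers. The predictions and contributions are typed in by hand rather than taken from a fitted model, and the way calc_residual is computed here (actual minus basis plus contributions) is an assumption about what the verification column checks.

```r
# Illustrative values; in practice est_sale_price comes from the fitted earth model.
out <- data.frame(
  sale_price     = c(500000, 620000, 450000),
  est_sale_price = c(510000, 600000, 455000),
  living_sqft    = c(2000, 2500, 1800)
)
out$residual    <- out$sale_price - out$est_sale_price  # actual - predicted
out$residual_sf <- out$residual / out$living_sqft       # residual per square foot

# Single-predictor example: prediction = basis + contribution
out$basis                    <- 300000
out$living_sqft_contribution <- c(210000, 300000, 155000)

# Assumed verification: rebuilding the prediction from its parts
# should reproduce the residual column.
out$calc_residual <- out$sale_price - (out$basis + out$living_sqft_contribution)
print(out)
```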

Column Ordering & Formatting

The ranking columns are placed leftmost: residual_sf, cqa_sf, residual, cqa. Excel formatting:

CQA Scores

CQA ranks each row's residual against all others on a 0--10 scale:

In appraisal/market modes, rows are sorted by residual_sf descending.
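
The exact scoring rule is not reproduced in this guide, so the sketch below assumes the simplest rank-based version: ranks of residual_sf scaled linearly so the lowest residual scores 0 and the highest scores 10. Treat it as an illustration of the idea, not as earthUI's formula.

```r
# Assumed scoring rule (not confirmed by this guide): scale ranks linearly to 0-10.
residual_sf <- c(-12.5, 3.0, 8.4, -1.2, 0.0)
n <- length(residual_sf)
cqa_sf <- 10 * (rank(residual_sf) - 1) / (n - 1)

ranking <- data.frame(residual_sf, cqa_sf)
ranking <- ranking[order(-ranking$residual_sf), ]  # highest residual first
print(ranking)
```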

mgcvUI Auto-Export

On every successful fit with degree <= 2, earthUI automatically saves the full result object as an .rds file to the Project Output Folder. The filename follows the pattern <datafile>_earthUI_result_<YYYYMMDD_HHMMSS>.rds.
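
A filename matching that pattern can be built as in the sketch below. It assumes that `<datafile>` means the data file's name without its extension, which the guide does not state, and it uses a fixed timestamp so the result is reproducible.

```r
# Build a name matching <datafile>_earthUI_result_<YYYYMMDD_HHMMSS>.rds
data_file <- "Appraisal_1.csv"
fit_time  <- as.POSIXct("2026-03-26 01:07:00", tz = "UTC")  # fixed for the example
stamp     <- format(fit_time, "%Y%m%d_%H%M%S", tz = "UTC")

rds_name <- paste0(
  tools::file_path_sans_ext(basename(data_file)),  # assumed: extension dropped
  "_earthUI_result_", stamp, ".rds"
)
print(rds_name)
```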

This file can be loaded by mgcvUI (a companion Shiny app for GAM modeling) using readRDS(). mgcvUI uses the earth model's knot locations and basis functions as starting points for GAM smooth terms, enabling a seamless transition from MARS to GAM modeling.

Models with degree > 2 are skipped because mgcvUI only supports pairwise interactions. A manual Export for mgcvUI button is also available in the sidebar for on-demand export.

RCA Calculations (Step 7, Appraisal Only)

The RCA Dialog

How It Works

  1. Comparables' CQA scores and residuals are sorted
  2. Linear interpolation maps your CQA value to a residual
  3. Subject value = model prediction + interpolated residual
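
The three steps above can be sketched with base R's `approx()`, which performs linear interpolation. The CQA scores, residuals, and subject prediction below are invented for illustration; earthUI's own implementation may differ in details such as tie handling.

```r
# Illustrative comparables: CQA scores and their residuals
cqa      <- c(0, 2.5, 5, 7.5, 10)
residual <- c(-40000, -15000, 0, 12000, 35000)

subject_cqa        <- 6.0     # CQA rating chosen for the subject
subject_prediction <- 480000  # model prediction for the subject (illustrative)

# Steps 1-2: approx() orders the points by x and interpolates linearly
interp <- approx(x = cqa, y = residual, xout = subject_cqa)$y

# Step 3: subject value = model prediction + interpolated residual
subject_value <- subject_prediction + interp
print(subject_value)
```

Here a CQA of 6.0 sits 40% of the way from 5 to 7.5, so the interpolated residual is 40% of the way from 0 to 12,000.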

Output Columns

| Column | Description |
|:-------|:------------|
| subject_value | Model prediction + interpolated residual |
| <var>_adjustment | Subject contribution - comp contribution |
| residual_adjustment | Subject residual - comp residual |
| net_adjustments | Sum of all adjustments |
| gross_adjustments | Sum of absolute adjustments |
| adjusted_sale_price | Comp sale price + net adjustments |
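
As a sketch of how the summary columns relate to the per-variable adjustments, the example below totals invented adjustments for two comps. It assumes that residual_adjustment is included in both the net and gross totals and that the unadjusted sale price is the starting point; neither detail is spelled out in this guide.

```r
# Illustrative adjustments (subject contribution - comp contribution)
adj <- data.frame(
  sale_price             = c(500000, 620000),
  living_sqft_adjustment = c(12000, -30000),
  lot_size_adjustment    = c(-3000, 5000),
  residual_adjustment    = c(4000, -8000)
)
adj_cols <- grep("_adjustment$", names(adj), value = TRUE)

adj$net_adjustments     <- rowSums(adj[adj_cols])       # sum of all adjustments
adj$gross_adjustments   <- rowSums(abs(adj[adj_cols]))  # sum of absolute adjustments
adj$adjusted_sale_price <- adj$sale_price + adj$net_adjustments
print(adj)
```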

Sales Comparison Grid (Step 8, Appraisal Only)

Overview

The Sales Grid is a multi-sheet Excel workbook with the subject and selected comparables in a structured format. Excel formulas compute adjustments and adjusted sale prices. Output: SalesGrid_<timestamp>.xlsx.

Comp Selection

A modal dialog shows:

Grid Layout

Each sheet has 20 columns: subject + 3 comps (5 columns each). Rows from top to bottom:

  1. Title and Headers
  2. Address
  3. APN | MLS# | DOM | Subj.Prox --- parcel, listing ID, days on market, Haversine distance (miles)
  4. Sales Price | Concess. | Net SP --- sale price, concessions, Net SP formula
  5. Regression Features header
  6. BASE VALUE --- model intercept
  7. Date of Sale | OffMkt | OnMkt --- contract date, sale age, DOM
  8. Grouped rows (conditional):
     - "Loc: Long | Lat | Area" --- combined VC from location variables
     - "Site Size | Dimensions" --- combined VC from site variables
     - "Actual Age | Effective Age" --- combined VC from age variables
  9. Model variable rows --- one per predictor (excluding grouped vars)
  10. CQA|Residual --- CQA score + remaining residual formula
  11. Residual feature rows --- named + blank rows for appraiser entry
  12. Total VC / Net Adjustment
  13. Net/Gross Adjustment %
  14. Adjusted Sale Price --- formula row
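
The Subj.Prox value in the grid is a Haversine distance in miles. A minimal base R version of that formula is sketched below; the function name `haversine_miles()` and the Earth radius of 3958.8 miles are assumptions for illustration, and earthUI's own constant may differ slightly.

```r
# Great-circle distance in miles via the haversine formula.
haversine_miles <- function(lat1, lon1, lat2, lon2, radius_miles = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_miles * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Subject vs. one comp (illustrative coordinates, rounded to 3 dp)
d <- haversine_miles(38.500, -121.500, 38.510, -121.480)
print(round(d, 2))
```

Because the function is vectorized, passing whole latitude and longitude columns for the comps returns one distance per comp.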

Adjusted Sale Price Formula

Sheet Protection

Sheets are protected. Only the residual feature VC input cells (light yellow background) under CQA|Residual are unlocked for appraiser entry. All formulas, data values, and labels are locked.

Working with the Grid

  1. Open in Excel
  2. Review regression-derived adjustments (locked)
  3. Allocate the residual in the yellow cells (Location, View, Condition, Quality, etc.)
  4. Watch the Remaining Residual decrease and Adjusted Sale Price update
  5. Goal: allocate until Remaining Residual is near zero

Downloading Reports (Step 9)

Three formats are available via Quarto: HTML, Word, and PDF.

Reports include: dataset description, model specification, allowed interactions, summary metrics, model equation, coefficients, variable importance, g-function plots, correlation matrix, diagnostics, ANOVA, and raw earth output.

Demo Dataset: Appraisal_1.csv

earthUI includes a demo MLS dataset for exploring the appraisal workflow. Load it with:

demo_file <- system.file("extdata", "Appraisal_1.csv", package = "earthUI")
df <- import_data(demo_file)

Or import it directly through the Shiny app file upload.

Description

The file contains 1,502 residential sales (plus 1 subject property in row 1) from a simulated MLS export. The data represents single-family home sales in a multi-area market with a range of property sizes, ages, and locations.

This is not real data, but is based on a realistic neighborhood in Northern California. All identification information has been altered or removed.

Columns

| Column | Type | Special Type | Description |
|:-------|:-----|:-------------|:------------|
| weight | numeric | --- | Observation weight (0 = exclude from fitting) |
| id | numeric | display_only | Internal record ID |
| property_id | numeric | display_only | MLS property identifier |
| listing_id | character | display_only | MLS listing number |
| parcel_number | character | display_only | County assessor parcel number (APN) |
| street_address | character | display_only | Property address |
| city_name | character | display_only | City |
| postal_code | character | display_only | ZIP code |
| county_name | character | display_only | County |
| contract_date | Date | contract_date | Sale contract date (computes sale_age) |
| sale_age | numeric | --- | Days from contract date to effective date (pre-computed) |
| coe_date | Date | display_only | Close of escrow date |
| listing_status | character | display_only | Listing status (e.g., "Sold") |
| sale_price | numeric | (target) | Sale price --- response variable |
| rent | numeric | --- | Monthly rent (for multi-target models) |
| list_price | numeric | display_only | Listing price |
| original_list_price | numeric | display_only | Original listing price |
| living_sqft | numeric | living_area | Gross living area in square feet |
| beds_total | integer | --- | Number of bedrooms |
| baths_total | numeric | --- | Total bath count (e.g., 2.5 = 2 full + 1 half) |
| lot_size | numeric | lot_size | Lot size in square feet |
| area_id | integer | area | MLS area identifier |
| area_text | character | display_only | Area name |
| age | numeric | actual_age | Property age in years |
| year_built | integer | display_only | Year of construction |
| latitude | numeric | latitude | Latitude (rounded to 3 dp for model) |
| longitude | numeric | longitude | Longitude (rounded to 3 dp for model) |
| latitude6 | numeric | display_only | Full-precision latitude |
| longitude6 | numeric | display_only | Full-precision longitude |
| garage_spaces | integer | --- | Number of garage bays |
| fp_count | integer | --- | Number of fireplaces |
| no_of_stories | numeric | --- | Number of stories |
| style | character | --- | Architectural style |
| view | character | --- | View type (e.g., "Neighborhood", "Hills") |
| days_on_market | integer | dom | Days on market |
| listing_date | Date | listing_date | Listing date |
| sale_concessions | numeric | concessions | Seller concessions |

Suggested Quick Start

  1. Launch earthUI: earthUI::launch()
  2. Import Appraisal_1.csv via the file upload
  3. Set Purpose to For Appraisal
  4. Select sale_price as the target
  5. Assign special types as shown in the table above
  6. Include predictors: sale_age, living_sqft, baths_total, lot_size, area_id (as factor), age, latitude, longitude, garage_spaces
  7. Set degree to 1, click Fit Earth Model
  8. Download intermediate output (Step 6), review the CQA ranking
  9. Compute RCA adjustments (Step 7) with a CQA score of ~5.00
  10. Generate the Sales Comparison Grid (Step 8)


earthUI documentation built on March 26, 2026, 1:07 a.m.