# Measuring the error in calculating z (zFactor: Calculate the Compressibility Factor 'z' for Hydrocarbon Gases)

## Measures of Error

To show graphically how a correlation differs from the experimental data, we will use the results of the Hall-Yarborough (HY) correlation.

```r
library(zFactor)
library(tibble)
library(ggplot2)

zFactor:::z.plot.range("HY", interval = "fine")
```


## Accuracy measurement

The comparative analysis produces tables with the following error measures:

- RMSE: Root Mean Squared Error
- MPE: Mean Percentage Error
- MAPE: Mean Absolute Percentage Error
- MSE: Mean Squared Error
- RSS: Residual Sum of Squares
- MAE: Mean Absolute Error
- MAAPE: Mean Arctangent Absolute Percentage Error


In the formulas below:

- $a_t$ are the observed (true) values; in our case, the $z$ values read from the Standing-Katz chart;
- $f_t$ are the calculated or predicted values, that is, the $z$ values calculated by the correlations; and
- $n$ is the number of samples.

RMSE, MSE and MAE are scale-dependent measures: their values depend on the scale of the data. MAE is easy to calculate and understand, but it is still affected by large outliers. MSE is even more vulnerable to outliers, because the errors are squared, and it is expressed in the square of the measured units. RMSE, the square root of MSE, returns to the units of the data, which makes it preferable, but it remains sensitive to large outliers.
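To see what scale dependence means in practice, the sketch below rescales a small set of hypothetical observed and predicted values (made up for illustration, not Standing-Katz data) by a factor of 10: MAE and RMSE grow tenfold and MSE a hundredfold. It also compares how much of the total error comes from the single outlier in the last position under absolute and squared errors. The helper names `mae`, `mse` and `rmse` are ours, not functions of zFactor.

```r
# Hypothetical observed (a) and predicted (f) values; the last pair is an outlier
a <- c(0.90, 0.85, 0.80, 0.95, 1.00)
f <- c(0.91, 0.84, 0.81, 0.94, 1.20)

# Illustrative helpers (not part of zFactor)
mae  <- function(a, f) mean(abs(a - f))
mse  <- function(a, f) mean((a - f)^2)
rmse <- function(a, f) sqrt(mse(a, f))

# Rescale both series by the same factor and compare each measure with its original value
k <- 10
ratios <- c(MAE  = mae(k * a, k * f) / mae(a, f),
            MSE  = mse(k * a, k * f) / mse(a, f),
            RMSE = rmse(k * a, k * f) / rmse(a, f))
print(ratios)

# Share of the total error contributed by the outlier
share_abs <- abs(a[5] - f[5]) / sum(abs(a - f))
share_sq  <- (a[5] - f[5])^2 / sum((a - f)^2)
print(c(absolute = share_abs, squared = share_sq))
```

The outlier accounts for about 83% of the summed absolute errors but about 99% of the summed squared errors, which is why MSE and RMSE react more strongly to it than MAE.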

MAPE is not scale-dependent, but because $a_t$ appears in the denominator it becomes unstable when the observed values are close or equal to zero. To fix this problem, Kim and Kim (2016) proposed MAAPE, which applies the arctangent to each absolute relative error $\left| \frac{a_t - f_t}{a_t} \right|$ before averaging, so that an observed value near zero no longer produces an infinite term. Geometrically, picture a right triangle in which the tangent of the angle $\theta$ is $\left| \frac{a_t - f_t}{a_t} \right|$; MAAPE averages the angles $\theta$ instead of the tangents. As $a_t$ gets smaller and closer to zero, the MAPE term tends to infinity, while the MAAPE term, the arctangent of the same quotient, tends to $\frac{\pi}{2}$.
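The limit described above can be checked numerically. The sketch below keeps a hypothetical absolute error of 0.05 fixed while the observed value $a_t$ shrinks toward zero, and prints the per-point MAPE term (without the factor of 100) next to the per-point MAAPE term. All values are illustrative.

```r
# Fixed absolute error of 0.05 while the observed value a_t approaches zero
a <- c(1, 0.1, 0.01, 0.001, 1e-6)
f <- a + 0.05

ape  <- abs((a - f) / a)   # term averaged by MAPE: grows without bound
aape <- atan(ape)          # term averaged by MAAPE: stays below pi/2

print(data.frame(a_t = a, APE = ape, AAPE = aape))

# At a_t = 0 the quotient is infinite, but its arctangent is still pi/2
atan(abs((0 - 0.05) / 0))
```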

## RMSE: Root Mean Squared Error

RMSE is a measure of accuracy used to compare the errors of different calculation models on the same dataset.

$$RMSE = \sqrt {\sum_{t=1}^n \frac {(a_t - f_t)^2} {n}}$$

RMSE code:

```r
RMSE = sqrt(mean((z.chart - z.calc)^2))
```

```r
z_hy <- z.stats("HY")   # error statistics for the Hall-Yarborough correlation

hy <- ggplot(z_hy, aes(x = Tpr, y = RMSE, col = Tpr)) +
    geom_point() + theme(legend.position = "none") +
    ggtitle("HY - Root Mean Squared Error")
hy
```
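Each point in the plot is one isotherm: the RMSE expression is evaluated separately for every value of `Tpr`. The grouping can be sketched in base R with `split()` and `sapply()` on a small hypothetical data frame whose column names (`Tpr`, `z.chart`, `z.calc`) mirror the snippet above; this illustrates the idea and is not the code inside `z.stats()`.

```r
# Hypothetical chart and calculated z values for two isotherms
df <- data.frame(
  Tpr     = c(1.05, 1.05, 1.05, 1.50, 1.50, 1.50),
  z.chart = c(0.90, 0.70, 0.40, 0.95, 0.88, 0.80),
  z.calc  = c(0.91, 0.68, 0.43, 0.95, 0.87, 0.80)
)

# Evaluate the RMSE expression once per Tpr group
rmse_by_tpr <- sapply(split(df, df$Tpr), function(g) {
  sqrt(mean((g$z.chart - g$z.calc)^2))
})
print(rmse_by_tpr)
```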



## MAPE: Mean Absolute Percentage Error

$$MAPE = \frac{100}{n} \sum_{t=1}^n \left| \frac{a_t - f_t}{a_t} \right|$$

MAPE code (`n()` is the dplyr function that returns the number of rows in the current group):

```r
MAPE = sum(abs((z.calc - z.chart) / z.chart)) * 100 / n()
```

```r
hy <- ggplot(z_hy, aes(x = Tpr, y = MAPE, col = Tpr)) +
    geom_point() + theme(legend.position = "none") +
    ggtitle("HY - Mean Absolute Percentage Error")
hy
```
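Taking the absolute value matters. In the hypothetical example below, one point is over-predicted and another under-predicted; MAPE adds both errors, while the signed mean percentage error (MPE, listed among the measures above) lets them partly cancel. The sign convention used for MPE here, (calculated - chart) / chart, is an assumption chosen to match the MAPE snippet.

```r
# Hypothetical chart (a_t) and calculated (f_t) z values
z.chart <- c(0.90, 0.80, 1.00)
z.calc  <- c(0.91, 0.78, 1.00)
n <- length(z.chart)

# MAPE: absolute percentage errors, so over- and under-predictions both count
mape <- sum(abs((z.calc - z.chart) / z.chart)) * 100 / n

# Signed mean percentage error (assumed sign convention): opposite errors cancel
mpe <- sum((z.calc - z.chart) / z.chart) * 100 / n

print(c(MAPE = mape, MPE = mpe))
```

MAPE comes out at about 1.20%, while MPE is only about -0.46% because the two errors offset each other.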



## RSS: Residual Sum of Squares

$$RSS = \sum_{t=1}^n (a_t - f_t)^2$$

RSS code:

```r
RSS  = sum((z.calc - z.chart)^2)
```

```r
hy <- ggplot(z_hy, aes(x = Tpr, y = RSS, col = Tpr)) +
    geom_point() + theme(legend.position = "none") +
    ggtitle("HY - Residual Sum of Squares")
hy
```
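RSS is the numerator of MSE: $MSE = RSS / n$ and $RMSE = \sqrt{RSS / n}$. Because RSS grows with the number of points, RSS values from isotherms with different numbers of chart readings are not directly comparable, whereas MSE and RMSE are averages. A quick check with hypothetical values:

```r
# Hypothetical chart and calculated z values
z.chart <- c(0.95, 0.88, 0.80, 0.72)
z.calc  <- c(0.94, 0.89, 0.82, 0.72)
n <- length(z.chart)

rss  <- sum((z.calc - z.chart)^2)  # Residual Sum of Squares
mse  <- rss / n                    # Mean Squared Error
rmse <- sqrt(mse)                  # Root Mean Squared Error

print(c(RSS = rss, MSE = mse, RMSE = rmse))
```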



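## MAAPE: Mean Arctangent Absolute Percentage Error

$$MAAPE = \frac{1}{n} \sum_{t=1}^n \arctan \left| \frac{a_t - f_t}{a_t} \right|$$

MAAPE code (the arctangent is applied to the relative error, as in the formula):

```r
MAAPE = sum(atan(abs((z.calc - z.chart) / z.chart))) / n()
```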
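For small relative errors, $\arctan x \approx x$, so MAAPE (in radians) is almost equal to MAPE expressed as a fraction; the two only diverge when relative errors become large. The sketch below checks this on hypothetical values, using helper functions `maape()` and `mape()` written for illustration (they are not zFactor functions).

```r
# Illustrative helpers following the formulas above (not part of zFactor)
maape <- function(a, f) mean(atan(abs((a - f) / a)))  # radians, at most pi/2
mape  <- function(a, f) 100 * mean(abs((a - f) / a))  # percent

# Hypothetical chart and calculated z values with small errors
z.chart <- c(0.90, 0.80, 1.00, 0.60)
z.calc  <- c(0.91, 0.78, 1.01, 0.61)

m_arc <- maape(z.chart, z.calc)
m_pct <- mape(z.chart, z.calc) / 100  # MAPE as a fraction

print(c(MAAPE = m_arc, MAPE_fraction = m_pct))
```

Since $\arctan x < x$ for $x > 0$, MAAPE is always slightly below the fractional MAPE, and the gap widens as the relative errors grow.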

```r
hy <- ggplot(z_hy, aes(x = Tpr, y = MAAPE, col = Tpr)) +
    geom_point() + theme(legend.position = "none") +
    ggtitle("HY - Mean Arctangent Absolute Percentage Error")
hy
```

```r
boxplot(z_hy$MAAPE, horizontal = TRUE, main = "HY", xlab = "MAAPE")
```

## RMSE vs isotherm for all correlations

```r
z_bb <- z.stats("BB")
bb <- ggplot(z_bb, aes(x = Tpr, y = RMSE, color = Tpr)) +
    geom_point() + ylim(0, 0.4) + theme(legend.position = "none") +
    ggtitle("Beggs-Brill")
bb

boxplot(z_bb$RMSE, horizontal = TRUE, main = "BB", xlab = "RMSE")
```

```r
sum_tpr <- as_tibble(z.stats("HY"))
hy <- ggplot(sum_tpr, aes(x = Tpr, y = RMSE, col = Tpr)) +
    geom_point() + ylim(0, 0.4) + theme(legend.position = "none") +
    ggtitle("Hall-Yarborough")
hy
```

```r
sum_tpr <- as_tibble(z.stats("DAK"))
dak <- ggplot(sum_tpr, aes(x = Tpr, y = RMSE, col = Tpr)) +
    geom_point() + ylim(0, 0.4) + theme(legend.position = "none") +
    ggtitle("Dranchuk-AbouKassem")
dak
```

```r
sum_tpr <- as_tibble(z.stats("SH"))
sh <- ggplot(sum_tpr, aes(x = Tpr, y = RMSE, col = Tpr)) +
    geom_point() + ylim(0, 0.4) + theme(legend.position = "none") +
    ggtitle("Shell")
sh
```

```r
sum_tpr <- as_tibble(z.stats("N10"))
n10 <- ggplot(sum_tpr, aes(x = Tpr, y = RMSE, col = Tpr)) +
    geom_point() + ylim(0, 0.4) + theme(legend.position = "none") +
    ggtitle("Neural-Network-10")
n10
```

```r
sum_tpr <- as_tibble(z.stats("PP"))
pp <- ggplot(sum_tpr, aes(x = Tpr, y = RMSE, col = Tpr)) +
    geom_point() + ylim(0, 0.4) + theme(legend.position = "none") +
    ggtitle("Papp")
pp
```

```r
# print the Hall-Yarborough statistics table
sum_tpr <- as_tibble(z.stats("HY"))
sum_tpr
```


### Beggs and Brill MAPE and RMSE

```r
z.plot.range(correlation = "BB", stat = "MAPE", interval = "fine")
z.plot.range(correlation = "BB", stat = "RMSE", interval = "fine")
```


### Hall-Yarborough MAPE and RMSE

```r
z.plot.range(correlation = "HY", stat = "MAPE", interval = "fine")
z.plot.range(correlation = "HY", stat = "RMSE", interval = "fine")
```


### Dranchuk-AbouKassem MAPE and RMSE

```r
z.plot.range(correlation = "DAK", stat = "MAPE", interval = "fine")
z.plot.range(correlation = "DAK", stat = "RMSE", interval = "fine")
```


### Shell MAPE and RMSE

```r
z.plot.range(correlation = "SH", stat = "MAPE", interval = "fine")
z.plot.range(correlation = "SH", stat = "RMSE", interval = "fine")
```


### Neural-Network MAPE and RMSE

```r
z.plot.range(correlation = "N10", stat = "MAPE", interval = "fine")
z.plot.range(correlation = "N10", stat = "RMSE", interval = "fine")
```


### Papp MAPE and RMSE

```r
z.plot.range(correlation = "PP", stat = "MAPE", interval = "fine")
z.plot.range(correlation = "PP", stat = "RMSE", interval = "fine")
```



zFactor documentation built on Aug. 1, 2019, 5:04 p.m.