dahstat: Statistical summaries of the homogenized data


View source: R/depurdat.R

Description

Lists means, standard deviations, quantiles or trends, for a specified period, from series homogenized by homogen.

Usage

dahstat(varcli, anyi, anyf, anyip=anyi, anyfp=anyf, stat="me", ndc=NA, vala=2,
  cod=NULL, mnpd=0, mxsh=0, prob=.5, last=FALSE, long=FALSE, lsnh=FALSE,
  lerr=FALSE, relref=FALSE, mh=FALSE, pernys=100, estcol=c(1,2,4), sep=',',
  dec='.', eol='\n', nei=NA, x=NA)

Arguments

varcli

Acronym of the name of the studied climatic variable, as in the data file name.

anyi

Initial year of the homogenized period.

anyf

Final year of the homogenized period.

anyip

First year of the period to analyze. (Defaults to anyi).

anyfp

Last year of the period to analyze. (Defaults to anyf).

stat

Statistical parameter to compute for the selected period:

"me":

Means (default),

"mdn"

Medians,

"max"

Maxima,

"min"

Minima,

"std"

Standard deviations,

"q"

Quantiles (see the prob parameter),

"tnd"

Trends,

"series"

Do not compute any statistics; only output all homogenized series in individual *.csv files.

ndc

Number of decimal places to be saved in the output file (1 by default).

vala

Annual values to compute from the sub-annual data:

0: None,

1: Sums,

2: Means (default),

3: Maxima,

4: Minima.

cod

Optional vector of codes of the stations to be processed.

mnpd

Minimum percentage of original data. (0 = no limit).

mxsh

Maximum SNHT value allowed. (0 = no limit).

prob

Probability for the computation of quantiles (0.5 by default, i.e., medians). Probabilities with more than two decimals may be specified, but the output file name will use the rounded percentile.

last

Logical value to compute statistics only for stations working at the end of the period of study. (FALSE by default).

long

Logical value to compute statistics only for series built from the longest homogeneous sub-period. (FALSE by default).

lsnh

Logical value to compute statistics from series built from the homogeneous sub-period with lowest SNHT. (FALSE by default).

lerr

Logical value to compute statistics only for series built from the homogeneous sub-period with lowest RMSE. (FALSE by default).

relref

If TRUE, statistics from reliable reference series will also be listed. (FALSE by default).

mh

If TRUE, read monthly data computed from daily adjusted series. (FALSE by default).

pernys

Number of years on which to compute trends. (Defaults to 100).

estcol

Columns of the homogenized stations file to be included in the output file. (Defaults to c(1,2,4), the columns of station coordinates and codes).

sep

String to use for separating the output data. (',').

dec

Character to use as decimal point in the output data. ('.').

eol

Line termination style. ('\n').

nei

Number of stations in the input files. (To be read from the *.rda file.)

x

Vector of dates. (To be read from the *.rda file.)

Details

Homogenized data are read from the file ‘VAR_ANYI-ANYF.rda’ saved by homogen, and this function saves the statistics computed for the specified period in ‘VAR_ANYIP-ANYFP.STAT’, where STAT is replaced by the requested stat statistic. The exception is stat="q", in which case the extension of the output file is qPP, where PP stands for the specified prob probability (in percent). The output period ANYIP-ANYFP must of course be contained within the period of the input data, ANYI-ANYF.
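For instance, following this naming rule (a minimal sketch, assuming a ‘pcp_1991-2010.rda’ file like the one produced in the Examples section already exists; exact output names may also carry a .csv suffix depending on the climatol version, as in the example output below):

dahstat('pcp', 1991, 2010)                         #means for the whole 1991-2010 period
dahstat('pcp', 1991, 2010, anyip=2001, anyfp=2010) #means for the 2001-2010 sub-period
dahstat('pcp', 1991, 2010, stat='q', prob=0.9)     #90th percentiles (output identified as q90)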

Parameters mnpd and mxsh act as filters, producing results only for series with at least that minimum percentage of original data and at most that maximum SNHT value. Alternatively, long, last, lsnh and lerr allow selecting the series reconstructed from the preferred homogeneous sub-period, depending on which parameter is set to TRUE; in particular, last keeps only those stations still working at the end of the studied period. Note, however, that in many cases the shorter sub-periods may have lower SNHT and RMSE values, and therefore lsnh and lerr should be used with caution. The most advisable parameters for selecting the most suitable reconstructions are long for computing normal values and last for climate monitoring of new incoming data.

No selection is performed by default, and the desired statistic is listed for all the reconstructed series (from every homogeneous sub-period).
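As an illustration (the threshold values below are arbitrary and should be adapted to the data at hand):

dahstat('pcp', 1991, 2010, mnpd=80, mxsh=25) #only series with >= 80% original data and SNHT <= 25
dahstat('pcp', 1991, 2010, long=TRUE)        #normals from the longest homogeneous sub-periods
dahstat('pcp', 1991, 2010, last=TRUE)        #only series active at the end of the period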

stat='tnd' computes trends by OLS linear regression on time, listing them in a CSV file ‘*_tnd.csv’ and their p-values in ‘*_pval.csv’.
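For example (a sketch; pernys=10 is an arbitrary choice to express the trends per decade):

dahstat('pcp', 1991, 2010, stat='tnd')            #trends per 100 years (default pernys)
dahstat('pcp', 1991, 2010, stat='tnd', pernys=10) #trends expressed per decade
read.csv('pcp_1991-2010_tnd.csv')                 #inspect the listed trends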

If stat='series' is chosen, two text files in CSV format will be produced for every station, one with the data and another with their flags: 0 for original, 1 for infilled and 2 for corrected data. (Not useful for daily series.)
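For example (a sketch, using the same ‘pcp’ dataset as in the Examples section):

dahstat('pcp', 1991, 2010, stat='series')
#Two CSV files per station (data and flags) are written; check the working
#directory for their names.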

Value

This function does not return any value, since outputs are saved to files.

See Also

homogen, dahgrid.

Examples

#Set a temporary working directory and write input files:
wd <- tempdir()
wd0 <- setwd(wd)
data(Ptest)
dim(dat) <- c(720,20)
dat[601:720,5] <- dat[601:720,5]*1.8
write(dat[481:720,1:5],'pcp_1991-2010.dat')
write.table(est.c[1:5,1:5],'pcp_1991-2010.est',row.names=FALSE,col.names=FALSE)
homogen('pcp',1991,2010,std=2)
#Now run the examples:
dahstat('pcp',1991,2010)
dahstat('pcp',1991,2010,stat='tnd')
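#Optionally list the output files written by homogen() and dahstat()
#(their names are also reported in the console messages):
print(list.files(pattern='^pcp_'))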
#Return to user's working directory:
setwd(wd0)
#Input and output files can be found in directory:
print(wd)

Example output

Loading required package: maps
Loading required package: mapdata

HOMOGEN() APPLICATION OUTPUT  (From R's contributed package 'climatol' 3.1.1)

=========== Homogenization of pcp, 1991-2010. (Wed Oct 30 09:03:26 2019)

Parameters: varcli=pcp anyi=1991 anyf=2010 suf=NA nm=NA nref=10,10,4 std=2 swa=NA ndec=1 dz.max=5 dz.min=-5 wd=0,0,100 snht1=25 snht2=25 tol=0.02 maxdif=0.05 mxdif=0.05 maxite=999 force=FALSE wz=0.001 trf=0 mndat=NA gp=3 ini=NA na.strings=NA vmin=NA vmax=NA nclust=100 cutlev=NA grdcol=#666666 mapcol=#666666 hires=TRUE expl=FALSE metad=FALSE sufbrk=m tinc=NA tz=UTC cex=1.2 verb=TRUE

Read 1200 items
Data matrix: 240 data x 5 stations

-------------------------------------------
Stations in the 2 clusters:

$`1`
    Z Code        Name
1 183 S031 Station_031
4 129 S051 Station_051

$`2`
    Z Code        Name
2 125 S047 Station_047
3 100 S098 Station_098
5  79 S081 Station_081

---------------------------------------------
Computing inter-station distances:  1  2  3  4


========== STAGE 1 (SNHT on overlapping temporal windows) ===========

Computation of missing data with outlier removal
(Suggested data replacements are provisional)
  Station(rank) Date: Observed -> Suggested (Anomaly, in std. devs.)
S098(3) 1991-10-01: 298.4 -> 92.9 (5.56)
S081(5) 2002-10-01: 724.68 -> 390.7 (6.02)

Performing shift analysis on the 5 series...


========== STAGE 2 (SNHT on the whole series) =======================

Computation of missing data with outlier removal
(Suggested data replacements are provisional)
  Station(rank) Date: Observed -> Suggested (Anomaly, in std. devs.)
S098(3) 2008-09-01: 111.3 -> 290.8 (-5.09)

Performing shift analysis on the 5 series...

S081(5) breaks at 2000-12-01 (26.3)

Update number of series:  5 + 1 = 6 

Computation of missing data with outlier removal
(Suggested data replacements are provisional)
  Station(rank) Date: Observed -> Suggested (Anomaly, in std. devs.)
S081-2(6) 1993-10-01: 309 -> 111.2 (5.58)

Performing shift analysis on the 6 series...


========== STAGE 3 (Final computation of all missing data) ==========

Computing inter-station weights... (done)

Computation of missing data with outlier removal
(Suggested data replacements are provisional)

The following lines will have one of these formats:
  Station(rank) Date: Observed -> Suggested (Anomaly, in std. devs.)
  Iteration Max.data.difference (Station_code)
2 -5.569 (S081-2)
3 -3.255 (S081-2)
4 -2.049 (S081-2)
5 -1.308 (S081-2)
6 -0.843 (S081-2)
7 -0.547 (S081-2)
8 -0.356 (S081-2)
9 -0.232 (S081-2)
10 -0.152 (S081-2)
11 -0.099 (S081-2)
12 -0.065 (S081-2)
13 -0.043 (S081-2)

Last series readjustment (please, be patient...)

======== End of the homogenization process, after 2.96 secs 

----------- Final computations:

ACmx: Station maximum absolute autocorrelations of anomalies
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2000  0.2275  0.2900  0.2733  0.3000  0.3500 

SNHT: Standard normal homogeneity test (on anomaly series)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.200   2.300   2.400   2.583   2.575   4.600 

RMSE: Root mean squared error of the estimated data
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  30.49   31.56   32.73   36.98   38.67   54.11 

POD: Percentage of original data
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  49.00   57.50   89.50   78.67   96.75   98.00 

  ACmx SNHT RMSE POD Code   Name         
1 0.30 2.6  40.5 98  S031   Station_031  
2 0.20 2.3  31.3 96  S047   Station_047  
3 0.21 2.3  32.2 97  S098   Station_098  
4 0.28 2.5  33.3 83  S051   Station_051  
5 0.35 1.2  54.1 49  S081   Station_081  
6 0.30 4.6  30.5 49  S081-2 Station_081-2

----------- Generated output files: -------------------------

pcp_1991-2010.txt :  This text output 
pcp_1991-2010_out.csv :  List of corrected outliers 
pcp_1991-2010_brk.csv :  List of corrected breaks 
pcp_1991-2010.pdf :  Diagnostic graphics 
pcp_1991-2010.rda :  Homogenization results. Postprocess with (examples):
   dahstat('pcp',1991,2010) #get averages in file pcp_1991-2010-me.csv 
   dahstat('pcp',1991,2010,stat='tnd') #get OLS trends and their p-values 
   dahgrid('pcp',1991,2010,grid=YOURGRID) #get homogenized grids 
   ... (See other options in the package documentation)

Mean values of pcp (1991-2010)
  written to pcp_1991-2010_me.csv 
Trend values of pcp (1991-2010), expressed in units per 100 years,
  written to pcp_1991-2010_tnd.csv 
P-values written to pcp_1991-2010_pval.csv 
[1] "/work/tmp/tmp/Rtmp3qrRat"
