In FernandoTusell/ipv: Computation of housing price indices

knitr::opts_chunk$set(cache=FALSE, fig.width=7, fig.height=5.5)
options(width=120)

Data

library(ipv)
library(rgdal)
library(rgeos)
library(tidyverse, verbose=FALSE)
data("venta.dep")
data("alquiler.dep")

(Beware: if you are using a publicly available version of this package, such data may have been obscured, distorted and shuffled to preserve confidentiality. You will not be able to reproduce the results in this vignette, although the data provided will suffice to test the use of functions.)

Ancillary functions

The package contains some ancillary functions which are not meant to be called directly by the user(pols and CompFechas) and one (cloromap) which affords easy representation of a variable onto the map of the CAPV (Autonomous Community of the Basque Country) or parts of it.

cloromap

Tomemos las observaciones de inmuebles a la venta y contabilicemos el número de ellas que existen por cada área funcional (codificada en la variable AF): Let+s take the observations corresponding to houses for sale and count the number in each "functional are" (coded in variable AF):

Obs <- venta.dep %>%
  group_by(AF) %>%  
  summarise( Obs= n() ) %>%
  as.data.frame()

The Obs data frame has the following structure:

head(Obs)

In order to represent these numbers on a map, we need an spatial dataframe (spframe) with polygons for the spatial entities concerned (here, functional areas, as defined in AF). We can obtain these polygons as follows:

dsn <- system.file("extdata/CB_AREAS_FUNCIONALES_5000_ETRS89.shp", 
                   package = "ipv")[1]
af  <- readOGR(dsn)

We have to check that the number and correspondence of areas in Obs and in af is consistent:

af@data
unique(Obs$AF)
cbind(af@data$A_FUNC_CAS, 
      levels(sort(unique(Obs$AF))))

There is one-to-one correspondence. We rename the levels in venta.dep with the names in af.

levels(Obs$AF) <- af@data$A_FUNC_CAS

Now we can invoke:

cloromap(sp=af, var.sp="A_FUNC_CAS", df=Obs, var.df="AF",
         var="Obs", legend.title="Obs.\ndisponibles",
         graf.title="Obs. por área funcional",
         midpoint=36000)

Lo que es crucial para que lo anterior funcione es la correspondencia 1-1 entre los niveles de la variable var.sp de la dataframe espacial sp y la variable var.df de la dataframe ordinaria df. El argumento midpoint da el valor "neutro" (codificado con blanco en la escala cromática divergente empleada), que habitualmente querremos hacer coincidir con la mediana de los valores, pero podemos fijar a nuestro antojo.

Fusion

Quite often the names identifying locations are not homogeneous across sources, which makes it difficult to merge information from different files. The function Fusion tries to simplity life a bit in such situation. It can be used in both ways, with one or two dataframes.

Whe used with one dataframe (which has to be 'X', de first argument), it scans the column named in varX, performs a match with the column locX in the translation dataframe canon, and overwrites varX with the canonical names in canon (or fills an NA if a match is not possible).

This sort of invocation with a single dataframe is useful to aggregate. Assume a number of areas, some of which make part of the CAPV (Basque Autonomous Community) while others do not. If we want to aggregate CAPV and Non-CAPV areas, we could construct a translation dataframe canon as follows:

canon <- data.frame(X=c("Bilbao","San Sebastian","Vitoria","Teruel","Soria"),
                    Y=c("Bilbao","Donostia","Vitoria/Gazteiz", "Teruel","Soria"),
                    Canon=c("CAPV","CAPV","CAPV","NoCAPV","NoCAPV"))
canon

Now considera a dataframe such as:

X     <- data.frame(a=c("Bilbao","San Sebastian","Vitoria","Teruel"),
                    b=c(100,70,35,12))
X

In order to reclassify observations in CAPV and Non-CAPV, we can invoke:

Fusion(X, varX="a", locX=1, locCanon=3, canon=canon)

We may also invoke Fusion with two data frames, X and Y, each of them containing, among others, a variable "naming" records: these could be, for instance, names of municipalities, of streets, of districts, or neighbourhoods. For instance, besides X above we may have:

Y     <- data.frame(e=c("Bilbao","Donostia","Vitoria/Gazteiz","Teruel","Soria"),
                    g=c("F","G","H","J","K"))
Y

Here the names of the communities (in column e) do not match those in column a of X above, although we are referring to the same entities (e.g., Donostia is the same as San Sebastian). Merging directly the two dataframes X and Y would fail, but Fusion may be used:

Fusion(X,Y,varX="a",varY="e",locX=1, locY=2, locCanon=2, canon=canon)

Above we are saying that we want to merge X and Y using the columns a and e, after replacing their values by the canonical names in column locCanon of canon. What this in effect does is to take the names in the dataframe Y as canonical. If locCanon is not stated, the canonical names would be taken from the column namec Canon in dataframe canon; this would not make much sense in this case:

Fusion(X,Y,varX="a",varY="e",locX=1, locY=2, canon=canon)

A translation dataframe canon can contain any number of columns, affording flexibility in dealing with data from different sources, in different languages, character codes, etc.

FernandoTusell/ipv documentation built on Nov. 7, 2022, 6:03 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com