In mcavallaro/outbreak-detection:

The population estimates are based on data from the 2011 Census and refer to the usual resident population on census day, 27th March 2011.

rm(list=ls())
a = readxl::read_excel("Data/Full MOLIS dataset minus PII 20200918.xlsx")
a = as.data.frame(a)
nomi = c("FULLNO", "ORDPATSX", "AGEYEAR", "Patient Postcode", "SAMPLE_DT", "RECEPT_DT", "Reception Month", "Reception Year", "Isolation Site Decoded", "Sterile Site Y N", "emm gene type", "CITY", "ZIP", "PHE_CENTRE_NAME")
dat2 = a[, nomi]

Add latitude & longitude using postcode.to.location2:

source('R/utils.R')
#postcode.df = read.table("Data/ukpostcodes.csv", stringsAsFactors = F, sep=',', header = T)
dat2[,c("latitude", "longitude")] = t(apply(dat2, 1, postcode.to.location2))

#png('spatial_baseline_full.png', width = 3.25, height = 3.25, units = 'in', res=500, pointsize = 4)
#par(mar=c(2,2,4,2))
A=st_read("shape_file_new/Areas.shp")
crs = st_crs(A)
B = st_read("Region__December_2015__Boundaries-shp/Region__December_2015__Boundaries.shp")
B = st_transform(B, crs=crs)
dat2 = dat2[!is.na(dat2$longitude),]
smoothScatter(dat2$longitude, dat2$latitude,  ylim= c(49,60),
  xlab='longitude',
  ylab='latitude', main='Map of cumulative GAS case')
plot(st_geometry(A),  col=NA, border='#e2e2e288', lwd=0.8,  add=T)
plot(st_geometry(B),  col=NA, border='black', lwd=0.8,  add=T)
#dev.off()

Census 2011: Headcounts and Household Estimates for Postcodes in England and Wales

from https://www.nomisweb.co.uk/census/2011/postcode_headcounts_and_household_estimates.

Detail of release

This release contains summary results - 2011 Census estimates of usual residents broken down by sex and the number of households with one or more usual residents - for unit postcodes in England and Wales. In addition to this, a separate table detailing the postcode/OA splits is provided.

The postcode headcount datasets are supplied in CSV format. The two datasets have been split into four files each to make manipulation of the data easier. These files can be combined to form one dataset as required.

Table 2 only contains information on postcodes that are split across Output Areas.

Description of the data

Table 1 is laid out as follows:

Postcode Total (usual residents) Males (usual residents) Females (usual residents) Occupied_Households (Households with at least one usual resident) 7 characters Max 4 characters Max 4 characters Max 4 characters Max. 3 characters

Table 2 is laid out as follows:

Postcode OA_Code Total (usual residents) Percentage 7 characters 9 characters Max. 4 characters Max. 3 characters

The Percentage column on table 2 provides the percentage of all usual residents in the postcode who reside within the provided Output Area’s boundaries.

Table 2 has been updated so that it now includes counts of usual residents by postcode/OA split for all valid postcode splits. This addition provides the ability to aggregate totals to Output Area level.

Please Note: Postcodes with no usual residents have not been included in these tables.

The Postcodes Estimates release provides users with counts of usual residents by postcode. Therefore postcodes which do not contain usual residents are omitted from this product. There were 16,229 postcodes which did not contain usual residents on census day, 27 March 2011. These postcodes contained empty household spaces or communal establishments, workplace addresses, second addresses and households where all residents were Short-term UK residents. In order to avoid confusion, the geography look up file only lists postcodes with usual residents, and omits the 16,229 postcodes without.

Some postcode entries may display a count of zero occupied households along with a count of one or more usual residents. This is due to all usual residents living in communal establishments.

A small number of residences were initially reported in a different output area than the one their postcode was finally allocated to. As output area counts were published prior to the creation of the Postcode Estimates files, the postcodes for these residences were altered to be consistent with the published counts and avoid disclosure.

# Population_AF = read.table("Data/Postcode_Estimates_1_A_F.csv", header=T, sep = ',', stringsAsFactors = F)
# Population_GL = read.table("Data/Postcode_Estimates_1_G_L.csv", header=T, sep = ',', stringsAsFactors = F)
# Population_MR = read.table("Data/Postcode_Estimates_1_M_R.csv", header=T, sep = ',', stringsAsFactors = F)
# Population_SZ = read.table("Data/Postcode_Estimates_1_S_Z.csv", header=T, sep = ',', stringsAsFactors = F)
# Population_ = rbind(Population_AF,Population_GL,Population_MR, Population_SZ)
# #all merged into Postcode_Estimates_Table_1.csv
# Population_MR = NULL
# Population_GL = NULL

Population = read.table("Data/Postcode_Estimates_Table_1.csv", header=T, sep = ',', stringsAsFactors = F)
# > head(Population)
#   Postcode Total Males Females Occupied_Households
# 1  AL1 1AG    14     6       8                   6
# 2  AL1 1AJ   124    60      64                  51
# 3  AL1 1AR    32    17      15                  17
# 4  AL1 1AS    34    17      17                  13
# 5  AL1 1BH    52    15      37                  41
# 6  AL1 1BX    54    26      28                  19
names(Population)[1] = 'Patient Postcode'
Population[,c("latitude", "longitude")] = t(apply(Population, 1, postcode.to.location2))

write.table(Population, file='Data/Postcode_Estimates_Table_with_coordinates3.csv', quote = F, row.names = F, sep=',')
# > tmp = read.table("Data/Postcode_Estimates_Table_with_coordinates3.csv", sep=',', header = T)
# > head(tmp)
#   Patient.Postcode Total Males Females Occupied_Households latitude longitude
# 1          AL1 1AG    14     6       8                   6 51.74529 -0.328628
# 2          AL1 1AJ   124    60      64                  51 51.74450 -0.328599
# 3          AL1 1AR    32    17      15                  17 51.73973 -0.317492
# 4          AL1 1AS    34    17      17                  13 51.74907 -0.335471
# 5          AL1 1BH    52    15      37                  41 51.74685 -0.338000
# 6          AL1 1BX    54    26      28                  19 51.74902 -0.341041

#png('spatial_baseline_full.png', width = 3.25, height = 3.25, units = 'in', res=500, pointsize = 4)
#par(mar=c(2,2,4,2))
A=st_read("shape_file_new/Areas.shp")
crs = st_crs(A)
B = st_read("Region__December_2015__Boundaries-shp/Region__December_2015__Boundaries.shp")
B = st_transform(B, crs=crs)
smoothScatter(Population$longitude, Population$latitude,  ylim= c(49,60),
  xlab='longitude',
  ylab='latitude', main='Map of postocode density')
plot(st_geometry(A),  col=NA, border='#e2e2e288', lwd=0.8,  add=T)
plot(st_geometry(B),  col=NA, border='black', lwd=0.8,  add=T)
#dev.off()

rm(list=ls())
source('R/utils.R')
postcode.df = read.table("Data/ukpostcodes.csv", stringsAsFactors = F, sep=',', header = T)
Population = read.table("Data/Postcode_Estimates_Table_1.csv", header=T, sep = ',', stringsAsFactors = F)
names(Population)[1] = 'Patient Postcode'
Population[,c("latitude", "longitude")] = t(apply(Population, 1, postcode.to.location, postcode.df))
Population$Patient.Postcode = apply(Population, 1, function(x){gsub(' ', '', x[1], fixed=T)})
write.table(Population, file='Data/Postcode_Estimates_Table_with_coordinates3.csv', quote = F, row.names = F, sep=',')