Description Usage Arguments Details Value Note Author(s) References See Also Examples
The soundexBR function will return a Census-like soundex code for a string given that the Brazilian Portuguese sound system. Each soundex code consists of 4 digits long: a letter and three numbers, such as “0-000” <capital letter><digit><digit><digit>. The integers are assigned to the remaining letters of the last name. They are, therefore, a refinement index based on the way a surname sounds rather than the way it is spelled. This function was firtly outlined to work beside RecordLinkage
package. Nonetheless, soundex codes have been employed in many settings. See details bellow.
1 | soundexBR(term)
|
term |
a list, a vector or a data frame with character strings. |
The soundex is a coded surname (last name) index based on the way a surname sounds rather than the way it is spelled. Surnames that sound the same, but are spelled differently, like SOUZA and SOUSA, have the same code and are filed together. The soundex coding system was developed so that you can find a surname even though it may have been recorded under various spellings.
A character vector or matrix with the same dimensions as term
.
This function is an adaptation of the US census soundex version. See in http://www.archives.gov/research/census/soundex.html
The genealogist Dick Eastman maintain a soundex calcualtor in his website at http://www.eogn.com/soundex/.
Daniel Marcelino
Borg, Andreas and Murat Sariyar. (2012) RecordLinkage: Record Linkage in R, R package version 0.4-1, http://CRAN.R-project.org/package=RecordLinkage.
Camargo Jr. and Coeli CM (2000) Reclink: aplicativo para o relacionamento de bases de dados, implementando o método probabilistic record linkage. Cad. Saúde Pública, 16(2), Rio de Janeiro.
Daniel Marcelino (2010) Sobre dinheiro e eleições: um estudo dos gastos de campanha para o Congresso Nacional em 2002 e 2006.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | # last name with Z
first <- 'João'
last <- 'Souza'
middle <-'Santos'
soundexBR(c(first, middle, last))
# with S, instead of Z
first <- 'João'
last <- 'Sousa'
soundexBR(c(first, last))
# Miscelania
c('João Souza', 'João Sousa', 'Joao dos Santos Souza',
'John Souza') -> names
soundexBR(names)
names <- c('Ana Karolina Kuhnen',
'Ana Carolina Kuhnen', 'Ana Karolina',
'Dilma Vana Rousseff', 'Dilma Rousef')
soundexBR(names)
# Example with RecordLinkage
#Some data:
mydata1 <- data.frame(
fname <- c('Ricardo','Maria','Tereza','Pedro','José', 'Germano'),
lname <- c('Cunha','Andrade','Silva','Soares','Silva','Lima'),
age <- c(67,89,78,65,68,67),
birth <- c(1945,1923,1934,1947,1944,1945),
date <- c(20120907,20120703,20120301,20120805,20121004,20121209) )
mydata2 <- data.frame(
fname <- c('Maria','Lúcia','Paulo','Marcos', 'Ricardo', 'Germanio'),
lname <- c('Andrade','Silva','Soares','Pereira','Cunha','Lima'),
age <- c(67,88,78,60,68,80),
birth <- c(1945,1924,1934,1952,1944,1932),
date <- c(20121208,20121103,20120302,20120105,20121004,20121209) )
# Must call RecordLinkage package
## Not run: pairs <- compare.linkage(mydata1, mydata2,
blockfld = list(c(1,2,4),c(1,2)),
phonetic <- c(1,2), phonfun = soundexBR, strcmp = FALSE,
strcmpfun <- jarowinkler, exclude=FALSE,identity1 = NA,
identity2 = NA, n_match <- NA, n_non_match = NA)
print(pairs)
editMatch(pairs)
# To access information in the object:
weights <- epiWeights(pairs, e = 0.01, f = pairs$frequencies)
hist(weights$Wdata, plot = FALSE) # Plot TRUE
getPairs(pairs, max.weight = Inf, min.weight = -Inf)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.