find.aliases: Find e-mail and real name aliases


Description

Find e-mail and real-name aliases and replace them in the result of makeforest.

Usage

normalizeauthors(authors)
sortnames(x)
emailfirst(y)
changenames(clusters,forest,accept)
findclusters(v,distance=0.3,not.take.memory)
final(d)

Arguments

authors

A character vector of author names including e-mail addresses (the third column of the result of makeforest).

x

A character vector (result of normalizeauthors).

y

A character vector (result of sortnames).

clusters

A list. The first element of each list element contains the matched name and the following elements contain the aliases found.

forest

A character vector (result of emailfirst).

accept

A numeric vector containing the indices of the accepted list elements of clusters.

v

A character vector (result of changenames).

distance

Numeric. Distance to be used for base::agrep. Defaults to 0.3.

not.take.memory

A list. The first element of each list element contains the matched name and the following elements contain aliases that are not correct.

d

A character vector (result of changenames).
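To give a feel for what a distance of 0.3 means, here is a minimal sketch that calls base::agrep directly. It assumes that distance is forwarded to agrep as max.distance, which the documentation above does not state explicitly. The author strings are made up for illustration.

```r
# Hypothetical author strings; not taken from any real mailing list.
authors <- c("john smith", "jon smith", "john smyth", "mary jones")

# A max.distance below 1 is read by agrep as a fraction of the pattern
# length (times the maximal transformation cost), so 0.3 on a
# 10-character pattern tolerates up to 3 edits.
hits <- agrep("john smith", authors, max.distance = 0.3, value = TRUE)
print(hits)
```

Larger values of distance yield more candidate aliases, and therefore more false matches to reject by hand.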

Details

normalizeauthors

Discard bounces, e-mail domains, companies or locations in parentheses, middle names, titles, and some numbers.

sortnames

Isolate first name, last name, and e-mail name.

emailfirst

Write e-mail name first, then "|", then real name.

changenames

Load dataset of already accepted aliases first (take.memory.rda). Replace all aliases by a single name.

findclusters

Find similar strings based on base::agrep. changenames has to be applied afterwards.

final

Some final transformations of author names.
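The clustering step can be pictured with a short base-R sketch. This is not snatm's implementation of findclusters: sketch_clusters is a hypothetical helper, the input strings are invented, and the grouping rule (each string joins the first cluster that claims it) is an assumption made to keep the example small.

```r
# Sketch only: group similar strings with base::agrep.
# `sketch_clusters` is a hypothetical helper, not a snatm function.
sketch_clusters <- function(v, distance = 0.3) {
  v <- unique(v)
  clusters <- list()
  assigned <- character(0)
  for (name in v) {
    # Skip strings that already belong to an earlier cluster.
    if (name %in% assigned) next
    # agrep treats the pattern as a literal string by default (fixed = TRUE).
    similar <- agrep(name, v, max.distance = distance, value = TRUE)
    aliases <- setdiff(similar, c(name, assigned))
    if (length(aliases) > 0) {
      # First element: matched name; following elements: candidate aliases.
      clusters[[length(clusters) + 1]] <- c(name, aliases)
      assigned <- c(assigned, name, aliases)
    }
  }
  clusters
}

# Invented strings in the "e-mail name|real name" form described above.
v <- c("jsmith|john smith", "jsmith|jon smith", "mjones|mary jones")
candidates <- sketch_clusters(v)
str(candidates)
```

The result has the same shape as the clusters argument: a list whose elements each start with the matched name, followed by the candidate aliases.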

Value

normalizeauthors

A character vector.

sortnames

A character vector.

emailfirst

A character vector.

changenames

A character vector.

findclusters

A list. The first element of each list element contains the matched name and the following elements contain the aliases found. The aliases are only candidates: most of them will be incorrect, so the list has to be checked manually.

final

A character vector. Insert it into the third column of the result of makeforest.
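How clusters and accept fit together can be sketched in base R. The helper sketch_replace below is hypothetical and only mimics the documented behaviour of replacing aliases by a single name. Unlike changenames, it does not load take.memory.rda, and all data are invented.

```r
# Sketch only: replace accepted aliases by the matched name.
# `sketch_replace` is a hypothetical helper, not snatm's changenames.
sketch_replace <- function(clusters, forest, accept) {
  for (i in accept) {
    canonical <- clusters[[i]][1]   # first element: matched name
    aliases <- clusters[[i]][-1]    # remaining elements: aliases
    forest[forest %in% aliases] <- canonical
  }
  forest
}

clusters <- list(
  c("jsmith|john smith", "jsmith|jon smith"),
  c("mjones|mary jones", "mjohnson|mark johnson")  # false match, not accepted
)
forest <- c("jsmith|jon smith", "mjohnson|mark johnson", "jsmith|john smith")

# Only cluster 1 passed the manual check.
result <- sketch_replace(clusters, forest, accept = 1)
print(result)
```

Because only the first cluster is accepted, the rejected pairing in the second cluster leaves "mjohnson|mark johnson" untouched.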

Author(s)

Angela Bohn angela.bohn at gmail.com

References

Based on Christian Bird, Alex Gourley, Prem Devanbu, Michael Gertz, and Anand Swaminathan. Mining Email Social Networks. 2006. In Proceedings of the 2006 international workshop on Mining software repositories, Shanghai, China. Pages 137-143. http://macbeth.cs.ucdavis.edu/msr06.pdf


snatm documentation built on May 2, 2019, 5:01 p.m.
