Paper published in a book (Scientific congresses and symposiums)
Ethnicity sensitive author disambiguation using semi-supervised learning
Louppe, Gilles; Al-Natsheh, Hussein; Susik, Mateusz et al.
2015, in Communications in Computer and Information Science
Peer reviewed
 

Files


Full Text
1508.07744.pdf
Author preprint (389.41 kB)

Details



Keywords :
Computer Science - Digital Libraries; Computer Science - Information Retrieval; Statistics - Machine Learning
Abstract :
[en] Author name disambiguation in bibliographic databases is the problem of grouping together scientific publications written by the same person, accounting for potential homonyms and/or synonyms. Among solutions to this problem, digital libraries are increasingly offering tools for authors to manually curate their publications and claim those that are theirs. Indirectly, these tools allow for the inexpensive collection of large annotated training data, which can be further leveraged to build a complementary automated disambiguation system capable of inferring patterns for identifying publications written by the same person. Building on more than 1 million publicly released crowdsourced annotations, we propose an automated author disambiguation solution exploiting this data (i) to learn an accurate classifier for identifying coreferring authors and (ii) to guide the clustering of scientific publications by distinct authors in a semi-supervised way. To the best of our knowledge, our analysis is the first to be carried out on data of this size and coverage. With respect to the state of the art, we validate the general pipeline used in most existing solutions, and improve by: (i) proposing phonetic-based blocking strategies, thereby increasing recall; and (ii) adding strong ethnicity-sensitive features for learning a linkage function, thereby tailoring disambiguation to non-Western author names whenever necessary.
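The phonetic-based blocking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which phonetic algorithm the authors use, so the simplified Soundex-like key below is an assumption, as are the function names `phonetic_key` and `block_signatures` and the "Surname, Given" input format. The idea is only that signatures whose surnames sound alike land in the same block, so that spelling variants are still compared by the later linkage and clustering stages.

```python
from collections import defaultdict

# Consonant classes borrowed from Soundex-style encodings (assumed here,
# not taken from the paper).
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def phonetic_key(surname, length=4):
    """Simplified Soundex-like key: first letter plus consonant-class digits."""
    letters = [c for c in surname.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        # Skip uncoded letters (vowels, h, w, y) and immediate repeats
        # of the same consonant class.
        if digit and digit != prev:
            key += digit
        prev = digit
    # Pad with zeros or truncate to a fixed length.
    return (key + "0" * length)[:length]


def block_signatures(signatures):
    """Group (signature_id, "Surname, Given") pairs by phonetic surname key."""
    blocks = defaultdict(list)
    for sig_id, name in signatures:
        surname = name.split(",")[0]
        blocks[phonetic_key(surname)].append(sig_id)
    return dict(blocks)


# Hypothetical signatures: two spelling variants and one unrelated name.
signatures = [(1, "Smith, John"), (2, "Smyth, J."), (3, "Louppe, Gilles")]
blocks = block_signatures(signatures)
```

In this sketch "Smith" and "Smyth" share a block even though their spellings differ, which is the recall gain that phonetic blocking aims for. Deciding whether two signatures in a block actually refer to the same person is left to the learned linkage function and the clustering step.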
Disciplines :
Computer science
Author, co-author :
Louppe, Gilles; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Big Data
Al-Natsheh, Hussein
Susik, Mateusz
Maguire, Eamonn
Language :
English
Title :
Ethnicity sensitive author disambiguation using semi-supervised learning
Publication date :
31 August 2015
Event name :
Knowledge Engineering and Semantic Web (KESW 2016)
Event date :
2016
Audience :
International
Main work title :
Communications in Computer and Information Science
Collection name :
649
Peer reviewed :
Peer reviewed
Available on ORBi :
since 28 June 2018

Statistics


Number of views
44 (1 by ULiège)
Number of downloads
68 (0 by ULiège)

Scopus citations®
60
Scopus citations® without self-citations
59
