Abstract:
[en] The biomedical scientific literature is a largely unexploited resource. Because of the staggering number of publications, gathering all the relevant information manually is intractable, so automated information extraction (IE) is key. An important subtask is named entity recognition (NER): recognizing names in the text as specific entities. NER for genes in biomedical literature is a challenging task. This paper reports preliminary results for the identification of gene names in full text with the naive Bayes, support vector machine, and random forest algorithms, showing no loss of performance compared with gene NER restricted to abstracts.