Large expert-curated database for benchmarking document similarity detection in biomedical literature search.

Information Systems; Biochemistry, Genetics and Molecular Biology (all); Agricultural and Biological Sciences (all); General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology

Abstract :

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performance. Additionally, we found that each of these methods tends to produce a distinct collection of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
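To illustrate two of the baseline methods named in the abstract, the following is a minimal sketch, not the consortium's implementation. It scores a handful of candidate titles against a seed title with Okapi BM25 and with TF-IDF cosine similarity, then measures how much the two top-ranked sets overlap using the Jaccard index. The toy titles, the whitespace tokenizer, the smoothed IDF formulas, the parameter values `k1=1.5` and `b=0.75`, and every function name are assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc against the query tokens."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Smoothed IDF variant that is always positive.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def tfidf_cosine_scores(query, docs):
    """Cosine similarity between TF-IDF vectors of the query and each doc."""
    n = len(docs)
    df = Counter(term for d in docs for term in set(d))

    def vec(tokens):
        # Smoothed IDF so that terms unseen in the corpus do not divide by zero.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                for t, c in Counter(tokens).items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec(query)
    return [cosine(q, vec(d)) for d in docs]


def top_k(scores, k):
    """Indices of the k highest-scoring documents, as a set."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def jaccard(a, b):
    """Jaccard index of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data, invented for this sketch.
seed = tokenize("protein structure prediction with deep learning")
candidates = [tokenize(t) for t in [
    "deep learning for protein structure prediction",
    "citizen science game for protein folding",
    "query logs in web search engines",
    "ranking biomedical literature with term weighting",
]]

bm25 = bm25_scores(seed, candidates)
tfidf = tfidf_cosine_scores(seed, candidates)
overlap = jaccard(top_k(bm25, 2), top_k(tfidf, 2))
print("BM25:", bm25)
print("TF-IDF cosine:", tfidf)
print("Top-2 Jaccard overlap:", overlap)
```

A low overlap between the top-ranked sets of two methods on real data would correspond to the abstract's observation that the baselines recommend distinct collections of articles.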

Disciplines :

Life sciences: Multidisciplinary, general & others

Other collaborator :

Remacle, Claire ; Université de Liège - ULiège > Département des sciences de la vie > Génétique et physiologie des microalgues

Language :

English

Title :

Large expert-curated database for benchmarking document similarity detection in biomedical literature search.

Publication date :

01 January 2019

Journal title :

Database: The Journal of Biological Databases and Curation

ISSN :

1758-0463

Publisher :

Oxford University Press, England

Volume :

2019

Pages :

1 - 67

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

http://academic.oup.com/database/article-pdf/doi/10.1093/database/baz085/34908546/baz085.pdf

Funders :

Griffith University
QCIF - Queensland Cyber Infrastructure Foundation

Funding text :

Griffith University Gowonda HPC Cluster

Commentary :

Participation in the survey

Available on ORBi :

since 09 January 2024

Statistics

Number of views

99 (0 by ULiège)

Number of downloads

353 (0 by ULiège)
