breed assignment; classification; informative SNPs; local breeds; partial least squares; SNP panel
Abstract :
Assignment of individual cattle to a specific breed often cannot rely on pedigree information. This is especially the case for local breeds, for which genomic assignment tools are required so that individuals of unknown origin can be included in their herd books. A breed assignment model can be based on two stages: (a) the selection of breed-informative markers and (b) the assignment of individuals to a breed with a classification method. However, the performance of combinations of methods used in these two stages has rarely been studied until now. In this study, combinations of 16 different SNP panels with four classification methods were developed on 562 reference genotypes from 12 cattle breeds. Based on their performance, the best models were validated on three local breeds of interest. In cross-validation, 14 models had a global accuracy higher than 90%, with a maximum of 98.22%. In validation, the best models used 7,153 or 2,005 SNPs selected by partial least squares-discriminant analysis (PLS-DA) and assigned individuals to breeds with nearest shrunken centroids. The average validation sensitivities of the two best models for the three local breeds of interest were 98.33% and 97.5%. Moreover, the results reported in this study suggest that further studies should consider the PLS-DA method when selecting breed-informative SNPs.
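The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: stage (a) uses a simple spread of per-breed mean allele dosage as a stand-in for PLS-DA, and stage (b) uses plain nearest centroids without the shrinkage step of nearest shrunken centroids. All function names and the toy genotypes (coded 0/1/2) are hypothetical and invented for illustration only.

```python
from statistics import mean


def select_informative_snps(genotypes, breeds, n_keep):
    """Stage (a), simplified: keep the n_keep SNPs whose per-breed mean
    dosage differs most between breeds (a stand-in for PLS-DA)."""
    labels = sorted(set(breeds))
    n_snps = len(genotypes[0])
    scores = []
    for j in range(n_snps):
        breed_means = [
            mean(g[j] for g, b in zip(genotypes, breeds) if b == label)
            for label in labels
        ]
        scores.append(max(breed_means) - min(breed_means))
    ranked = sorted(range(n_snps), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


def fit_centroids(genotypes, breeds, snps):
    """Stage (b), part 1: mean dosage per breed on the selected SNPs
    (no shrinkage is applied in this sketch)."""
    centroids = {}
    for label in sorted(set(breeds)):
        rows = [g for g, b in zip(genotypes, breeds) if b == label]
        centroids[label] = [mean(r[j] for r in rows) for j in snps]
    return centroids


def assign_breed(genotype, centroids, snps):
    """Stage (b), part 2: assign to the breed with the closest centroid
    (squared Euclidean distance on the selected SNPs)."""
    def sq_dist(centroid):
        return sum((genotype[j] - c) ** 2 for j, c in zip(snps, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def sensitivity(true_breeds, predicted_breeds, label):
    """Share of animals of breed `label` that were assigned to `label`."""
    hits = [p == t for t, p in zip(true_breeds, predicted_breeds) if t == label]
    return sum(hits) / len(hits)


# Toy reference data: four animals, four SNPs, two hypothetical breeds.
reference_genotypes = [
    [2, 0, 1, 2],
    [2, 1, 1, 2],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
reference_breeds = ["A", "A", "B", "B"]

selected = select_informative_snps(reference_genotypes, reference_breeds, n_keep=2)
centroids = fit_centroids(reference_genotypes, reference_breeds, selected)

# Animals of unknown origin to be assigned.
unknown = [[1, 1, 1, 2], [0, 0, 2, 0]]
assigned = [assign_breed(g, centroids, selected) for g in unknown]
print(selected, assigned)
```

In practice the panel size (here `n_keep`) and the classifier would be chosen by cross-validation on the reference genotypes, as the abstract describes for the 16 panels and four classification methods.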
Disciplines :
Animal production & animal husbandry
Author, co-author :
Wilmot, Hélène ; Université de Liège - ULiège > Département GxABT > Ingénierie des productions animales et nutrition
Bormann, Jeanne; Gouvernement du Grand Duché de Luxembourg Ministère de l'Agriculture, de la Viticulture et du Développement rural > Administration des services techniques de l'agriculture > Service de la production animale
Soyeurt, Hélène ; Université de Liège - ULiège > Département GxABT > Modélisation et développement
Hubin, Xavier; awé groupe > elevéo > R&D
Glorieux, Géry; awé groupe > elevéo > Service bovins viande
Mayeres, Patrick; awé groupe > elevéo
Bertozzi, Carlo; awé groupe > elevéo
Gengler, Nicolas ; Université de Liège - ULiège > Département GxABT > Ingénierie des productions animales et nutrition
Language :
English
Title :
Development of a genomic tool for breed assignment by comparison of different classification models: Application to three local cattle breeds
Publication date :
January 2022
Journal title :
Journal of Animal Breeding and Genetics
ISSN :
0931-2668
eISSN :
1439-0388
Publisher :
Wiley, Oxford, United Kingdom
Volume :
139
Issue :
1
Pages :
40-61
Peer reviewed :
Peer Reviewed verified by ORBi
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique
Commentary :
This is the accepted version of the following article: Wilmot H, Bormann J, Soyeurt H, Hubin X, Glorieux G, Mayeres P, Bertozzi C, Gengler N. Development of a genomic tool for breed assignment by comparison of different classification models: Application to three local cattle breeds. J Anim Breed Genet. 2022 Jan;139(1):40-61. doi: 10.1111/jbg.12643. Epub 2021 Aug 24. PMID: 34427366. The article has been published in final form at https://onlinelibrary.wiley.com/doi/full/10.1111/jbg.12643