SoccerNet-v2; SoccerNet; Dataset; Training data; Soccer; Football; Classification; Action; Annotation; Neural network; Deep learning; Machine learning; Artificial intelligence; DeepSport
Abstract :
Understanding broadcast videos is a challenging task in computer vision, as it requires generic reasoning capabilities to appreciate the content offered by the video editing. In this work, we propose SoccerNet-v2, a novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production. Specifically, we release around 300k annotations within SoccerNet's 500 untrimmed broadcast soccer videos. We extend current tasks in the realm of soccer to include action spotting, camera shot segmentation with boundary detection, and we define a novel replay grounding task. For each task, we provide and discuss benchmark results, reproducible with our open-source adapted implementations of the most relevant works in the field. SoccerNet-v2 is presented to the broader research community to help push computer vision closer to automatic solutions for more general video understanding and production purposes.
Research center :
Montefiore Institute - Montefiore Institute of Electrical Engineering and Computer Science - ULiège Telim
Disciplines :
Electrical & electronics engineering
Author, co-author :
Deliège, Adrien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Télécommunications
Cioppa, Anthony ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Télécommunications
Giancola, Silvio
Seikavandi, Meisam
Dueholm, Jacob
Nasrollahi, Kamal
Ghanem, Bernard
Moeslund, Thomas
Van Droogenbroeck, Marc ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Télécommunications
Language :
English
Title :
SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos
Publication date :
June 2021
Event name :
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), CVsports
Event organizer :
IEEE
Event place :
Nashville, TN, United States
Event date :
from 19 June 2021 to 25 June 2021
Audience :
International
Main work title :
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Pages :
4508-4519
Peer reviewed :
Peer reviewed
Name of the research project :
DeepSport
Funders :
DGTRE - Région wallonne. Direction générale des Technologies, de la Recherche et de l'Énergie
FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture