Keywords :
action spotting; artificial intelligence; soccer videos; deep learning; context-aware loss; highlights generation; sports; sport; football
Abstract :
In video understanding, action spotting consists in temporally localizing human-induced events annotated with single timestamps. In this paper, we propose a novel loss function that specifically considers the temporal context naturally present around each action, rather than focusing on the single annotated frame to spot. We benchmark our loss on a large dataset of soccer videos, SoccerNet, and achieve an improvement of 12.8% over the baseline. We show the generalization capability of our loss for generic activity proposals and detection on ActivityNet by spotting the beginning and the end of each activity. Furthermore, we provide an extended ablation study and present challenging cases for action spotting in soccer videos. Finally, we qualitatively illustrate how our loss induces a precise temporal understanding of actions and show how such semantic knowledge can be used for automatic highlights generation.
Research Center/Unit :
Montefiore Institute of Electrical Engineering and Computer Science; TELIM
Disciplines :
Computer science
Author, co-author :
Cioppa, Anthony ✱; Université de Liège - ULiège > Dept. of Electricity, Electronics and Computer Science (Montefiore Institute) > Telecommunications
Deliège, Adrien ✱; Université de Liège - ULiège > Dept. of Electricity, Electronics and Computer Science (Montefiore Institute) > Telecommunications
Giancola, Silvio ✱
Ghanem, Bernard
Van Droogenbroeck, Marc; Université de Liège - ULiège > Dept. of Electricity, Electronics and Computer Science (Montefiore Institute) > Telecommunications
Gade, Rikke
Moeslund, Thomas B.
✱ These authors have contributed equally to this work.
Language :
English
Title :
A Context-Aware Loss Function for Action Spotting in Soccer Videos
Publication date :
June 2020
Event name :
IEEE Conference on Computer Vision and Pattern Recognition
Event organizer :
IEEE
Event place :
Seattle, United States - Washington
Event date :
from 14-06-2020 to 19-06-2020
Audience :
International
Journal title :
IEEE Conference on Computer Vision and Pattern Recognition. Proceedings
ISSN :
1063-6919
eISSN :
2575-7075
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), Washington, United States - District of Columbia
Funders :
DGTRE - Région wallonne. Direction générale des Technologies, de la Recherche et de l'Énergie; FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture; KAUST Office of Sponsored Research
Commentary :
Accepted for publication in the CVPR 2020 main conference. Source code at https://github.com/cioppaanthony/context-aware-loss