action anticipation; baseline methods; broadcast video; dataset; football; soccer; SoccerNet; sport video; sports; transformer; Computer Vision and Pattern Recognition; Electrical and Electronic Engineering
Abstract :
[en] Artificial intelligence has revolutionized the way we analyze sports videos, whether to understand the actions of games in long untrimmed videos or to anticipate the players' motion in future frames. Despite these efforts, little attention has been given to anticipating game actions before they occur. In this work, we introduce the task of action anticipation for football broadcast videos, which consists of predicting future actions in unobserved future frames, within a five- or ten-second anticipation window. To benchmark this task, we release a new dataset, namely the SoccerNet Ball Action Anticipation dataset, based on SoccerNet Ball Action Spotting. Additionally, we propose a Football Action ANticipation TRAnsformer (FAANTRA), a baseline method that adapts FUTR, a state-of-the-art action anticipation model, to predict ball-related actions. To evaluate action anticipation, we introduce new metrics, including mAP@δ, which evaluates the temporal precision of predicted future actions, as well as mAP@∞, which evaluates their occurrence within the anticipation window. We also conduct extensive ablation studies to examine the impact of various task settings, input configurations, and model architectures. Experimental results highlight both the feasibility and challenges of action anticipation in football videos, providing valuable insights into the design of predictive models for sports analytics. By forecasting actions before they unfold, our work will enable applications in automated broadcasting, tactical analysis, and player decision-making. Our dataset and code are publicly available at https://github.com/MohamadDalal/FAANTRA.
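The abstract describes a tolerance-based metric (mAP@δ) that scores how temporally precise predicted future actions are. As an illustration only, the sketch below shows one common way such a metric can be computed for a single action class: greedily match predictions to ground-truth timestamps within ± δ seconds, then accumulate average precision. The function name and matching details are assumptions for this sketch; the paper's official evaluation code may differ.

```python
# Illustrative sketch, NOT the paper's official mAP@delta implementation:
# greedy confidence-ordered matching with a temporal tolerance, then AP.

def average_precision_at_delta(predictions, ground_truth, delta):
    """predictions: list of (timestamp_sec, confidence) for one action class.
    ground_truth: list of ground-truth timestamps (seconds).
    A prediction is a true positive if it lies within +/- delta seconds of a
    not-yet-matched ground-truth action (nearest unmatched one wins)."""
    preds = sorted(predictions, key=lambda p: -p[1])  # high confidence first
    matched = [False] * len(ground_truth)
    tp, fp = [], []
    for t, _conf in preds:
        best, best_dist = None, delta
        for i, g in enumerate(ground_truth):
            if not matched[i] and abs(t - g) <= best_dist:
                best, best_dist = i, abs(t - g)
        if best is not None:
            matched[best] = True
            tp.append(1); fp.append(0)
        else:
            tp.append(0); fp.append(1)
    # Walk the confidence-ranked list, summing precision * recall increments.
    ap, cum_tp, cum_fp, prev_recall = 0.0, 0, 0, 0.0
    for t_, f_ in zip(tp, fp):
        cum_tp += t_; cum_fp += f_
        recall = cum_tp / len(ground_truth)
        precision = cum_tp / (cum_tp + cum_fp)
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```

Under this reading, mAP@∞ corresponds to letting δ cover the whole anticipation window, so only the occurrence of an action is scored, not its exact timing; mAP@δ with a small δ additionally rewards temporal precision.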
Research Center/Unit :
Montefiore Institute of Electrical Engineering and Computer Science - ULiège; TELIM; VIULab
Disciplines :
Electrical & electronics engineering
Author, co-author :
Dalal, Mohamad; Aalborg University, Denmark
Xarles, Artur; Universitat de Barcelona, Spain ; Computer Vision Center, Spain
Cioppa, Anthony ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Giancola, Silvio; KAUST, Saudi Arabia
Van Droogenbroeck, Marc ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Télécommunications
Ghanem, Bernard; KAUST, Saudi Arabia
Clapés, Albert; Universitat de Barcelona, Spain ; Computer Vision Center, Spain
Escalera, Sergio; Aalborg University, Denmark ; Universitat de Barcelona, Spain ; Computer Vision Center, Spain
Moeslund, Thomas B.; Aalborg University, Denmark
Language :
English
Title :
Action Anticipation from SoccerNet Football Video Broadcasts
Publication date :
June 2025
Event name :
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
This work has been partially supported by the Spanish project PID2022-136436NB-I00 and by ICREA under the ICREA Academia programme. This work is supported by the KAUST Center of Excellence for Generative AI under award number 5940.
Yazan Abu Farha and Juergen Gall. Uncertainty-aware anticipation of activities. In IEEE/CVF Int. Conf. Comput. Vis. Work. (ICCV Work.), pages 1197-1204, Seoul, South Korea, 2019.
Yazan Abu Farha, Qiuhong Ke, Bernt Schiele, and Juergen Gall. Long-term anticipation of activities with cycle consistency. In DAGM German Conference on Pattern Recognition, pages 159-173, 2021.
Ruslan Baikulov. Solution for SoccerNet ball action spotting challenge 2023. https://github.com/lRomul/ball-action-spotting, 2023.
Kai-Shiang Chang, Wei-Yao Wang, and Wen-Chih Peng. Where will players move next? dynamic graphs and hierarchical fusion for movement forecasting in badminton. In AAAI, pages 6998-7005, Washington, D. C., USA, 2023.
Anthony Cioppa, Adrien Deliège, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck, Rikke Gade, and Thomas B. Moeslund. A context-aware loss function for action spotting in soccer videos. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 13123-13133, Seattle, WA, USA, 2020.
Anthony Cioppa, Silvio Giancola, Vladimir Somers, Victor Joos, Floriane Magera, Jan Held, Seyed Abolfazl Ghasemzadeh, Xin Zhou, Karolina Seweryn, Mateusz Kowalczyk, Zuzanna Mróz, Szymon Lukasik, Michal Halón, Hassan Mkhallati, Adrien Deliège, Carlos Hinojosa, Karen Sanchez, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Ghanem Bernard, Marc Van Droogenbroeck, Adam Gorski, Albert Clapés, Andrei Boiarov, Anton Afanasiev, Artur Xarles, Atom Scott, ByoungKwon Lim, Calvin Yeung, Cristian Gonzalez, Dominic Rüfenacht, Enzo Pacilio, Fabian Deuser, Faisal Sami Altawijri, Francisco Cachón, HanKyul Kim, Haobo Wang, Hyeonmin Choe, Hyunwoo J Kim, Il-Min Kim, Jae-Mo Kang, Jamshid Tursunboev, Jian Yang, Jihwan Hong, Jimin Lee, Jing Zhang, Junseok Lee, Kexin Zhang, Konrad Habel, Licheng Jiao, Linyi Li, Marc Gutíerrez-Pérez, Marcelo Ortega, Menglong Li, Nikita Lopatto, Milosz an Kasatkin, Norbert Nemtsev, Nikolay an Oswald, Oleg Udin, Pavel Kononov, Pei Geng, Saad Ghazai Alotaibi, Sehyung Kim, Sergei Ulasen, Sergio Escalera, Shanshan Zhang, Shuyuan Yang, Sunghwan Moon, Thomas B. Moeslund, Vasyl Shandyba, Vladimir Golovkin, Wei Dai, Won-Taek Chung, Xinyu Liu, Yongqiang Zhu, Youngseo Kim, Yuan Li, Yuting Yang, Yuxuan Xiao, and Zhihao Cheng, Zehua an Li. SoccerNet 2024 challenges results. arXiv, 2024.
Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, Chen Chen, Fabian Deuser, Feng Yan, Fufu Yu, Gal Shitrit, Guanshuo Wang, Gyusik Choi, Hankyul Kim, Hao Guo, Hasby Fahrudin, Hidenari Koguchi, Hakan Ardö, Ibrahim Salah, Ido Yerushalmy, Iftikar Muhammad, Ikuma Uchida, Ishay Be'ery, Jaonary Rabarisoa, Jeongae Lee, Jiajun Fu, Jianqin Yin, Jinghang Xu, Jongho Nang, Julien Denize, Junjie Li, Junpei Zhang, Juntae Kim, Kamil Synowiec, Kenji Kobayashi, Kexin Zhang, Konrad Habel, Kota Nakajima, Licheng Jiao, Lin Ma, Lizhi Wang, Luping Wang, Menglong Li, Mengying Zhou, Mohamed Nasr, Mohamed Abdelwahed, Mykola Liashuha, Nikolay Falaleev, Norbert Oswald, Qiong Jia, Quoc-Cuong Pham, Ran Song, Romain Hérault, Rui Peng, Ruilong Chen, Ruixuan Liu, Ruslan Baikulov, Ryuto Fukushima, Sergio Escalera, Seungcheon Lee, Shimin Chen, Shouhong Ding, Taiga Someya, Thomas B. Moeslund, Tianjiao Li, Wei Shen, Wei Zhang, Wei Li, Wei Dai, Weixin Luo, Wending Zhao, Wenjie Zhang, Xinquan Yang, Yanbiao Ma, Yeeun Joo, Yingsen Zeng, Yiyang Gan, Yongqiang Zhu, Yujie Zhong, Zheng Ruan, Zhiheng Li, Zhijian Huang, and Ziyu Meng. SoccerNet 2023 challenges results. Sports Eng., 27 (2): 1-18, 2024.
Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray. Rescaling egocentric vision: Collection, pipeline and challenges for EPIC-KITCHENS-100. Int. J. Comput. Vis., 130 (1): 33-55, 2021.
Roeland De Geest, Efstratios Gavves, Amir Ghodrati, Zhenyang Li, Cees Snoek, and Tinne Tuytelaars. Online action detection. In Eur. Conf. Comput. Vis. (ECCV), pages 269-284, 2016.
Adrien Deliège, Anthony Cioppa, Silvio Giancola, Meisam J. Seikavandi, Jacob V. Dueholm, Kamal Nasrollahi, Bernard Ghanem, Thomas B. Moeslund, and Marc Van Droogenbroeck. SoccerNet-v2: A dataset and benchmarks for holistic understanding of broadcast soccer videos. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), pages 4503-4514, Nashville, TN, USA, 2021.
Julien Denize, Mykola Liashuha, Jaonary Rabarisoa, Astrid Orcesi, and Romain Hérault. COMEDIAN: Self-supervised learning and knowledge distillation for action spotting using transformers. In IEEE/CVF Winter Conf. Appl. Comput. Vis. Work. (WACVW), pages 518-528, Waikoloa, HI, USA, 2024.
Yazan Abu Farha, Alexander Richard, and Juergen Gall. When will you do what? Anticipating temporal occurrences of activities. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 5343-5352, Salt Lake City, UT, USA, 2018.
Panna Felsen, Pulkit Agrawal, and Jitendra Malik. What will happen next? forecasting player moves in sports videos. In IEEE Int. Conf. Comput. Vis. (ICCV), pages 3362-3371, Venice, Italy, 2017.
Antonino Furnari and Giovanni Maria Farinella. Rollingunrolling LSTMs for action anticipation from first-person video. IEEE Trans. Pattern Anal. Mach. Intell., 43 (11): 4021-4036, 2021.
Harshala Gammulle, Simon Denman, Sridha Sridharan, and Clinton Fookes. Forecasting future action sequences with neural memory networks. In Br. Mach. Vis. Conf. (BMVC), pages 1-12, Cardiff, United Kingdom, 2019.
Silvio Giancola and Bernard Ghanem. Temporally-aware feature pooling for action spotting in soccer broadcasts. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), pages 4485-4494, Nashville, TN, USA, 2021.
Silvio Giancola, Mohieddine Amine, Tarek Dghaily, and Bernard Ghanem. SoccerNet: A scalable dataset for action spotting in soccer videos. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), pages 1792-179210, Salt Lake City, UT, USA, 2018.
Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, Rengang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei Zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, Yaqian Zhao, Yi Yu, Yingying Li, Yue He, Yujie Zhong, Zhenhua Guo, and Zhiheng Li. SoccerNet 2022 challenges results. In Int. ACM Work. Multimedia Content Anal. Sports (MMSports), pages 75-86, Lisbon, Port., 2022.
Silvio Giancola, Anthony Cioppa, Bernard Ghanem, and Marc Van Droogenbroeck. Deep learning for action spotting in association football videos. arXiv, abs/2410.01304, 2024.
Rohit Girdhar and Kristen Grauman. Anticipative video transformer. In IEEE/CVF Int. Conf. Comput. Vis. (ICCV), pages 13485-13495, Montreal, QC, Canada, 2021.
Ryota Goka, Yuya Moroto, Keisuke Maeda, Takahiro Ogawa, and Miki Haseyama. What to do and where to go next? Action prediction in soccer using multimodal coattention transformer. In Int. ACM Work. Multimedia Content Anal. Sports (MMSports), pages 75-79, Melbourne, Victoria, Aust., 2024.
Dayoung Gong, Joonseok Lee, Manjin Kim, Seong Jong Ha, and Minsu Cho. Future transformer for long-term action anticipation. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 3042-3051, New Orleans, LA, USA, 2022.
Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, and Jitendra Malik. Ego4D: Around the world in 3, 000 hours of egocentric video. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 18973-18990, New Orleans, LA, USA, 2022.
Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, and Jitendra Malik. AVA: A video dataset of spatio-temporally localized atomic visual actions. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 6047-6056, Salt Lake City, UT, USA, 2018.
Jan Held, Anthony Cioppa, Silvio Giancola, Abdullah Hamdi, Bernard Ghanem, and Marc Van Droogenbroeck. VARS: Video assistant referee system for automated soccer decision making from multiple views. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), pages 5086-5097, Vancouver, Can., 2023.
Jan Held, Hani Itani, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, and Marc Van Droogenbroeck. X-VARS: Introducing explainability in football refereeing with multimodal large language models. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), pages 3267-3279, Seattle, WA, USA, 2024.
James Hong, Haotian Zhang, Michaël Gharbi, Matthew Fisher, and Kayvon Fatahalian. Spotting temporally precise, fine-grained events in video. In Eur. Conf. Comput. Vis. (ECCV), pages 33-51, Tel Aviv, Israel, 2022.
Haroon Idrees, Amir R. Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, and Mubarak Shah. The THUMOS challenge on action recognition for videos "in the wild". Comput. Vis. Image Underst., 155: 1-23, 2017.
Hilde Kuehne, Ali Arslan, and Thomas Serre. The language of actions: Recovering the syntax and semantics of goal-directed human activities. In IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 780-787, Columbus, OH, USA, 2014.
Yin Li, Miao Liu, and James M. Rehg. In the eye of beholder: Joint learning of gaze and actions in first person video. In Eur. Conf. Comput. Vis. (ECCV), pages 639-655, 2018.
Yanghao Li, Chao-Yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, and Christoph Feichtenhofer. MViTv2: Improved multiscale vision transformers for classification and detection. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 4794-4804, New Orleans, LA, USA, 2022.
Junwei Liang, Lu Jiang, Juan Carlos Niebles, Alexander Hauptmann, and Li Fei-Fei. Peeking into the future: Predicting future person activities and locations in videos. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), pages 2960-2963, Long Beach, CA, USA, 2019.
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In Int. Conf. Learn. Represent., New Orleans, LA, USA, 2019.
Esteve Valls Mascaro, Hyemin Ahn, and Dongheui Lee. Intention-conditioned long-term human egocentric action anticipation. In IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), pages 6037-6046, Waikoloa, HI, USA, 2023.
Tushar Nagarajan, Yanghao Li, Christoph Feichtenhofer, and Kristen Grauman. Ego-topo: Environment affordances from egocentric video. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 160-169, Seattle, WA, USA, 2020.
Megha Nawhal, Akash Abdu Jyothi, and Greg Mori. Rethinking learning approaches for long-term action anticipation. In Eur. Conf. Comput. Vis. (ECCV), pages 558-576, 2022.
Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, and Piotr Dollar. Designing network design spaces. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 10425-10433, Seattle, WA, USA, 2020.
Fadime Sener, Dipika Singhania, and Angela Yao. Temporal aggregate representations for long-range video understanding. In Eur. Conf. Comput. Vis. (ECCV), pages 154-171, 2020.
Fadime Sener, Dibyadip Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert Wang, and Angela Yao. Assembly101: A large-scale multi-view video dataset for understanding procedural activities. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 21064-21074, New Orleans, LA, USA, 2022.
Karolina Seweryn, Anna Wróblewska, and Szymon Lukasik. Survey of action recognition, spotting and spatio-temporal localization in soccer-current trends and research perspectives. arXiv, abs/2309.12067, 2023.
Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, and Abhinav Gupta. Hollywood in homes: Crowdsourcing data collection for activity understanding. In Eur. Conf. Comput. Vis. (ECCV), pages 510-526, 2016.
João V. B. Soares, Avijit Shah, and Topojoy Biswas. Temporally precise action spotting in soccer videos using dense detection anchors. In IEEE Int. Conf. Image Process. (ICIP), pages 2796-2800, Bordeaux, France, 2022.
Sebastian Stein and Stephen J. McKenna. Combining embedded accelerometers with computer vision for recognizing food preparation activities. In ACM Int. Jt. Conf. Pervasive Ubiquitous Comput., pages 729-738, Zurich, Switzerland, 2013.
Swathikiran Sudhakaran, Sergio Escalera, and Oswald Lanz. Gate-shift-fuse for video action recognition. IEEE Trans. Pattern Anal. Mach. Intell., 45 (9): 10913-10928, 2023.
Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, and Cordelia Schmid. Relational action forecasting. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 273-283, Long Beach, CA, USA, 2019.
Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, and Rita Cucchiara. RMS-net: Regression and masking for soccer event spotting. In IEEE Int. Conf. Pattern Recognit. (ICPR), pages 7699-7706, Milan, Italy, 2021.
Joakim Valand, Haris Kadragic, Steven Hicks, Vajira Thambawita, Cise Midoglu, Tomas Kupka, Dag Johansen, Michael Riegler, and Pal Halvorsen. AI-based video clipping of soccer events. Mach. Learn. & Knowl. Extr., 3 (4): 1-19, 2021.
Joakim Valand, Haris Kadragic, Steven Hicks, Vajira Thambawita, Cise Midoglu, Tomas Kupka, Dag Johansen, Michael Riegler, and Pal Halvorsen. Automated clipping of soccer events using machine learning. In Int. Symp. Multimedia (ISM), pages 210-214, Naples, Italy, 2021.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Adv. Neural Inf. Process. Syst. (NeurIPS), pages 6000-6010, Long Beach, CA, USA, 2017.
Xinyu Wei, Patrick Lucey, Stuart Morgan, and Sridha Sridharan. Predicting shot locations in tennis using spatiotemporal data. In Digit. Image Comput.: Tech. Appl., pages 1-8, Hobart, TAS, Australia, 2013.
Xinyu Wei, Patrick Lucey, Stephen Vidas, Stuart Morgan, and Sridha Sridharan. Forecasting events using an augmented hidden conditional random field. In Asian Conf. Comput. Vis. (ACCV), pages 569-582, 2015.
Fei Wu, Qingzhong Wang, Jiang Bian, Ning Ding, Feixiang Lu, Jun Cheng, Dejing Dou, and Haoyi Xiong. A survey on video action recognition in sports: Datasets, methods and applications. IEEE Trans. Multimedia, 25: 7943-7966, 2023.
Artur Xarles, Sergio Escalera, Thomas B. Moeslund, and Albert Clapés. ASTRA: An Action Spotting TRAnsformer for soccer videos. In Int. ACM Work. Multimedia Content Anal. Sports (MMSports), page 93-102, Ottawa, Ontario, Can., 2023.
Artur Xarles, Sergio Escalera, Thomas B. Moeslund, and Albert Clapés. T-DEED: Temporal-discriminability enhancer encoder-decoder for precise event spotting in sports videos. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), pages 3410-3419, Seattle, WA, USA, 2024.
Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, and Stefano Soatto. Long short-term transformer for online action detection. In Adv. Neural Inf. Process. Syst. (NeurIPS), pages 1086-1099, 2021.
Ce Zhang, Changcheng Fu, Shijie Wang, Nakul Agarwal, Kwonjoon Lee, Chiho Choi, and Chen Sun. Object-centric video representation for long-term action anticipation. In IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), pages 6737-6747, Waikoloa, HI, USA, 2024.
Qi Zhao, Shijie Wang, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, and Chen Sun. AntGPT: Can large language models help long-term action anticipation from videos? In Int. Conf. Learn. Represent., Vienna, Austria, 2024.
Yue Zhao and Philipp Krähenbühl. Real-time online video detection with temporal smoothing transformers. In Eur. Conf. Comput. Vis. (ECCV), pages 485-502, 2022.
Yi Zhong and Wei-Shi Zheng. Unsupervised learning for forecasting action representations. In IEEE Int. Conf. Image Process. (ICIP), pages 1073-1077, Athens, Greece, 2018.
Zeyun Zhong, Manuel Martin, Michael Voit, Juergen Gall, and Jürgen Beyerer. A survey on deep learning techniques for action anticipation. arXiv, abs/2309.17257, 2023.
Zeyun Zhong, David Schneider, Michael Voit, Rainer Stiefelhagen, and Jurgen Beyerer. Anticipative feature fusion transformer for multi-modal action anticipation. In IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), pages 6057-6066, Waikoloa, HI, USA, 2023.
Zeyun Zhong, Chengzhi Wu, Manuel Martin, Michael Voit, Juergen Gall, and Jürgen Beyerer. DiffAnt: Diffusion models for action anticipation. arXiv, abs/2311.15991, 2023.
Xin Zhou, Le Kang, Zhiyu Cheng, Bo He, and Jingyu Xin. Feature combination meets attention: Baidu soccer embeddings and transformer based temporal detection. arXiv, abs/2106.14447, 2021.