Soccer; Deep learning; Sport; Sports; Semantic; Field extraction; Player detection; Football
Abstract :
[en] Automatic interpretation of sports games is a major challenge, especially when these sports feature complex player organizations and game phases. This paper describes a bottom-up approach based on the extraction of semantic features from the video stream of the main camera in the particular case of soccer using scene-specific techniques.
In our approach, all the features, ranging from the pixel level to the game event level, have a semantic meaning. First, we design our own scene-specific deep learning semantic segmentation network and hue histogram analysis to extract pixel-level semantics for the field, players, and lines.
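As a rough illustration of the hue histogram idea only, the sketch below marks the pixels whose hue lies near the most frequent hue in a frame, which on a wide soccer shot would typically be the grass. It is not the authors' method: the function name `dominant_hue_mask`, the bin count, and the tolerance are assumptions chosen for the example, and the frame is a plain list of RGB tuples rather than a real video frame.

```python
import colorsys
from collections import Counter


def dominant_hue_mask(pixels, bins=36, tolerance=1):
    """Hypothetical helper: flag pixels close to the dominant hue.

    pixels: list of rows, each row a list of (r, g, b) tuples in 0-255.
    Returns (mask, peak_bin); mask[y][x] is True when the pixel's hue bin
    is within `tolerance` bins of the most populated hue bin.
    """
    def hue_bin(rgb):
        r, g, b = (c / 255.0 for c in rgb)
        h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
        return min(int(h * bins), bins - 1)

    # Quantize every pixel's hue and build the histogram.
    binned = [[hue_bin(p) for p in row] for row in pixels]
    histogram = Counter(b for row in binned for b in row)
    peak = histogram.most_common(1)[0][0]

    def near_peak(b):
        d = abs(b - peak)
        return min(d, bins - d) <= tolerance  # hue wraps around

    mask = [[near_peak(b) for b in row] for row in binned]
    return mask, peak


# Toy frame: mostly green "field" with one red and one white pixel.
G, R, W = (0, 160, 0), (200, 0, 0), (255, 255, 255)
frame = [[G, G, G, R],
         [G, W, G, G],
         [G, G, G, G]]
field_mask, peak_bin = dominant_hue_mask(frame)
```

A real pipeline would also need to handle low-saturation pixels, whose hue is unreliable, and frames where the field is not the dominant color.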
These pixel-level semantics are then processed to compute interpretative semantic features which represent characteristics of the game in the video stream that are exploited to interpret soccer. For example, they correspond to how players are distributed in the image or the part of the field that is filmed. Finally, we show how these interpretative semantic features can be used to set up and train a semantic-based decision tree classifier for major game events with a restricted amount of training data.
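To make the last step more concrete, here is a minimal sketch of a decision tree trained on a handful of semantic feature vectors, written in plain Python with Gini impurity as the split criterion. Everything specific in it is an assumption for illustration: the two features (share of player pixels in the goal-side third of the image, share of the frame covered by the penalty area), the toy values, and the tree depth are invented and do not come from the paper, whose actual features and classifier settings are not given in this abstract.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, max_depth=3):
    """Recursively build a tree; leaves are labels, nodes are tuples."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority label
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
            build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1))


def predict(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Invented toy data: [players_in_goal_third, penalty_area_visible].
X = [[0.1, 0.0], [0.2, 0.0], [0.6, 0.3], [0.7, 0.4], [0.8, 0.9], [0.9, 0.8]]
y = ["middle game", "middle game", "attack", "attack",
     "goal or goal opportunity", "goal or goal opportunity"]
tree = build_tree(X, y)
```

Because each feature has a direct meaning, every split in such a tree can be read as a statement about the game (for example, "most players are in the goal-side third"), which is the property a semantic-based classifier relies on when training data is scarce.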
The main advantages of our semantic approach are that it only requires the video feed of the main camera to extract the semantic features, with no need for camera calibration, field homography, player tracking, or ball position estimation. While the automatic interpretation of sports games remains challenging, our approach allows us to achieve promising results for the semantic feature extraction and for the classification between major soccer game events such as attack, goal or goal opportunity, defense, and middle game.
Research Center/Unit :
Telim Montefiore Institute - Montefiore Institute of Electrical Engineering and Computer Science - ULiège
Disciplines :
Electrical & electronics engineering
Author, co-author :
Cioppa, Anthony ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Télécommunications
Deliège, Adrien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Télécommunications
Van Droogenbroeck, Marc ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Télécommunications
Language :
English
Title :
A bottom-up approach based on semantics for the interpretation of the main camera stream in soccer games
Publication date :
June 2018
Event name :
Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Event organizer :
IEEE
Event place :
Salt Lake City, United States
Event date :
from 18-06-2018 to 22-06-2018
Audience :
International
Main work title :
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Publisher :
IEEE
Pages :
1846-1855
Peer reviewed :
Peer reviewed
Name of the research project :
DeepSport
Funders :
DGTRE - Région wallonne. Direction générale des Technologies, de la Recherche et de l'Énergie