Park, Ho-Min ; Ghent University Global Campus & Ghent University, Incheon, Republic of Korea
Kim, Ganghyun ; Ghent University Global Campus, Incheon, Republic of Korea
Van Messem, Arnout ; Université de Liège - ULiège > Département de mathématique > Statistique appliquée aux sciences
De Neve, Wesley ; Ghent University Global Campus & Ghent University, Incheon, Republic of Korea
Language :
English
Title :
MuSe-Personalization 2023: Feature Engineering, Hyperparameter Optimization, and Transformer-Encoder Re-discovery
Publication date :
29 October 2023
Event name :
The 4th Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, Humour and Personalisation
Event place :
Ottawa, Canada
Event date :
29 October 2023
Audience :
International
Main work title :
MuSe '23: Proceedings of the 4th Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, Humour and Personalisation
Publisher :
Association for Computing Machinery, New York, United States - New York
ISBN/EAN :
979-8-4007-0270-9
Peer reviewed :
Peer reviewed
Funders :
NRF - National Research Foundation of Korea
UGent - Ghent University
Funding text :
This research effort was supported by the National Research Foundation (NRF) Korea (NRF-2020K1A3A1A68093469), funded by the Ministry of Science and ICT (MSIT) Korea, and by the Department of Biotechnology (India) (DBT/IC-12031(22)-ICD-DBT). This research effort was also supported by Ghent University Global Campus (GUGC) in Korea.