Feature Representations for Visual and Language: Towards Deeper Video Understanding
4:30pm - 6:00pm
Room 4402 (Lift 17-18), Academic Building

Abstract

This research enhances video understanding by leveraging Transformer-based models such as BERT for feature representation in two tasks: video question answering and humor prediction. For video question answering, using BERT to represent both visual and subtitle semantics improved accuracy on the TVQA and Pororo datasets. A comparative study of Transformer models linked their performance differences to their pre-training methods. For humor prediction, a novel multimodal method combining pose, face, and subtitle features within a sliding window outperformed previous approaches on a new comedy dataset. The work highlights the importance of selecting appropriate features and models for deeper video analysis.
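
To make the feature-representation idea concrete, the following is a minimal, illustrative sketch of how subtitle lines might be encoded into sentence-level BERT features for a video QA model. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; it is not the speaker's actual pipeline.

    # Minimal sketch (illustrative, not the speaker's pipeline): encode subtitle
    # lines with a pre-trained BERT model to obtain sentence-level features.
    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    model.eval()

    subtitles = [
        "I can't believe you did that!",
        "Well, somebody had to.",
    ]

    with torch.no_grad():
        # Tokenize the subtitle lines and run them through BERT.
        inputs = tokenizer(subtitles, padding=True, truncation=True, return_tensors="pt")
        outputs = model(**inputs)
        # Use each line's [CLS] token embedding as its subtitle feature vector.
        subtitle_features = outputs.last_hidden_state[:, 0, :]  # shape: (num_lines, 768)

    print(subtitle_features.shape)

In a full video QA system, such subtitle features would typically be combined with visual features (e.g., from detected objects or frames) before answer prediction; the fusion strategy is one of the design choices the talk compares.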


Biography

Prof. YANG Zekun is an Assistant Professor at Tokyo University of Science. He graduated from Osaka University in 2021 and previously worked at Nagoya University. His research areas are machine learning and language processing.

When
4:30pm - 6:00pm
Where
Room 4402 (Lift 17-18), Academic Building
Language
English
Speakers / Performers:
Prof. YANG Zekun
Assistant Professor, Department of Information and Computer Technology, Tokyo University of Science
Organizer
Division of Humanities, Digital Humanities Initiative
The University of Hong Kong
Hong Kong Baptist University
Beijing Normal-Hong Kong Baptist University
Registration
https://lbcube.hkust.edu.hk/ce/index.php/event/10961/