Abstract
This research enhances video understanding by leveraging Transformer-based models such as BERT for feature representation in two tasks: video question answering and humor prediction. For video QA, using BERT to represent both visual and subtitle semantics improved accuracy on the TVQA and Pororo datasets. A comparative study of Transformer models linked their performance differences to their pre-training methods. For humor prediction, a novel multimodal method that combines pose, face, and subtitle features within a sliding window outperformed previous approaches on a new comedy dataset. The work highlights the importance of selecting appropriate features and models for deeper video analysis.
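As a rough, illustrative sketch of the feature-representation idea described above (not the speaker's actual pipeline), the Python snippet below embeds subtitle text with a pre-trained BERT encoder and then stacks per-timestep pose, face, and subtitle features into overlapping sliding windows for a humor classifier. The use of the Hugging Face transformers library, the window size, and all function and variable names are assumptions made for illustration only.

    import numpy as np
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    encoder = BertModel.from_pretrained("bert-base-uncased")
    encoder.eval()

    def bert_embed(text):
        # Return the 768-d [CLS] embedding for a subtitle line (or for a
        # string of visual concept labels, so that visual and subtitle
        # semantics share one representation space).
        inputs = tokenizer(text, return_tensors="pt",
                           truncation=True, max_length=64)
        with torch.no_grad():
            return encoder(**inputs).last_hidden_state[0, 0].numpy()

    def sliding_windows(pose, face, subtitles, window=8):
        # pose: (T, Dp) and face: (T, Df) arrays of per-timestep features;
        # subtitles: list of T strings aligned to the same timesteps.
        text = np.stack([bert_embed(s) for s in subtitles])   # (T, 768)
        feats = np.concatenate([pose, face, text], axis=1)    # (T, Dp+Df+768)
        # Overlapping windows give the classifier short-range temporal context.
        return np.stack([feats[t:t + window].reshape(-1)
                         for t in range(len(feats) - window + 1)])

Each flattened window could then be fed to a binary classifier that predicts whether the segment precedes a humorous moment, e.g., audience laughter.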
Biography
Prof. YANG Zekun is an Assistant Professor at Tokyo University of Science. He graduated from Osaka University in 2021 and previously worked at Nagoya University. His research interests are machine learning and language processing.