Publications

Title: An Empirical Study of the Effect of Video Encoders on Temporal Video Grounding
Authors: Ignacio Meza, Edison Marrese-Taylor, Cristian Rodriguez-Opazo, Felipe Bravo-Marquez
Publication date: 2023
Abstract: Temporal video grounding is a fundamental task in computer vision, aiming to localize a natural language query in a long, untrimmed video. It plays a key role in the scientific community, in part due to the large amount of video generated every day. Although we find extensive work on this task, we note that research remains focused on a small selection of video representations, which may lead to architectural overfitting in the long run. To address this issue, we propose an empirical study to investigate the impact of different video features on a classical architecture. We extract features for three well-known benchmarks, Charades-STA, ActivityNet-Captions and YouCookII, using video encoders based on CNNs, temporal reasoning and transformers. Our results show significant differences in the performance of our model when simply changing the video encoder, while also revealing clear patterns and errors derived from the use of certain features, ultimately indicating potential feature complementarity.
Pages: 2850-2855
Conference name: IEEE International Conference on Computer Vision
Publisher: IEEE Press (Piscataway, NJ, USA)
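As a rough illustration of the feature-extraction step described in the abstract, the sketch below pulls clip-level features from a pretrained 3D CNN (torchvision's R3D-18) with its classification head removed. The choice of encoder, clip length, and input resolution are illustrative assumptions; the paper evaluates its own selection of CNN-, temporal-reasoning-, and transformer-based encoders with its own preprocessing.

```python
# Minimal sketch: clip-level feature extraction with a pretrained 3D CNN.
# R3D-18 is used here only as a stand-in encoder; the encoders, preprocessing,
# clip length, and resolution used in the paper may differ.
import torch
from torchvision.models.video import r3d_18, R3D_18_Weights

encoder = r3d_18(weights=R3D_18_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()  # drop the Kinetics classification head
encoder.eval()

# Dummy batch of clips: (batch, channels, frames, height, width).
# A real pipeline would decode each video and split it into such clips.
clips = torch.randn(4, 3, 16, 112, 112)

with torch.no_grad():
    features = encoder(clips)  # shape: (4, 512), one feature vector per clip

print(features.shape)
```

A temporal grounding model would then consume the resulting sequence of per-clip vectors together with the language query; swapping R3D-18 for another pretrained encoder is the kind of substitution whose effect the study measures.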