How to Efficiently Use Color and Temporal Information for Video Understanding

Yaxin Hu*, Erhardt Barth

*Corresponding author for this work

Abstract

The modeling of temporal dependencies, and the associated computational load, remain challenges in video understanding. Here, we focus on a more efficient sampling of color and temporal information: we sample the color channels not from the same frame but from consecutive frames, capturing richer temporal information without increasing the computational load. We demonstrate the effectiveness of our approach for 2D-CNNs, 3D-CNNs, and Transformers, obtaining significant performance improvements on two benchmarks: 2.43% on UCF101 and 4.55% on HMDB51 for ResNet18, 10.28% and 7.12% for 3D-ResNet18, and 15.11% and 13.71% for UniFormerV2. These improvements come at no additional cost, obtained simply by changing the way color is sampled.
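The core idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes frames in (T, H, W, 3) RGB layout and a fixed one-frame offset per channel, so each output frame combines the R, G, and B channels of three consecutive input frames.

```python
import numpy as np

def temporal_color_sampling(frames: np.ndarray) -> np.ndarray:
    """Recombine color channels across consecutive frames.

    frames: array of shape (T, H, W, 3) in RGB order.
    Returns an array of shape (T-2, H, W, 3) where output frame t
    takes R from input frame t, G from frame t+1, and B from frame t+2.
    (Channel order and offsets are illustrative assumptions.)
    """
    r = frames[:-2, :, :, 0]   # red from frame t
    g = frames[1:-1, :, :, 1]  # green from frame t+1
    b = frames[2:, :, :, 2]    # blue from frame t+2
    return np.stack([r, g, b], axis=-1)

# Usage: 8 random frames of size 4x4
video = np.random.rand(8, 4, 4, 3)
out = temporal_color_sampling(video)
print(out.shape)  # (6, 4, 4, 3)
```

The resulting tensor has the same per-frame shape as an ordinary RGB clip, so it can be fed to a 2D-CNN, 3D-CNN, or Transformer backbone unchanged, which is why no extra computation is needed.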

Original language: English
Title of host publication: Lecture Notes in Computer Science
Number of pages: 14
Volume: 15293
Publisher: Springer Nature Singapore
Publication date: 02.12.2024
Pages: 413-426
ISBN (Print): 978-981-96-6598-3
ISBN (Electronic): 978-981-96-6596-9
Publication status: Published - 02.12.2024
