Abstract: Memory-based trackers are video object segmentation methods that form the target model by concatenating recently tracked frames into a memory buffer and localize the target by attending the ...
Abstract: Audio-visual zero-shot learning (ZSL) leverages both video and audio information for model training, aiming to classify new video categories that were not seen during the training. However, ...
This repository contains the code for the paper "One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation". We provide a dockerized environment to run the code ...
一些您可能无法访问的结果已被隐去。
显示无法访问的结果