Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance. Phuc Nguyen, Tuan Duc Ngo, et al. 2024. CVPR 2024.
What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions. Brian Chen, Nina Shvetsova, et al. 2024. CVPR 2024.
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World. Yining Hong, Zishuo Zheng, et al. 2024. CVPR 2024.
Resource-Efficient Transformer Pruning for Finetuning of Large Models. Fatih Ilhan, Gong Su, et al. 2024. CVPR 2024.