SAM 3.1 Doubles Video Tracking Speed With Multiplexing

Meta's SAM 3.1 introduces object multiplexing to track multiple objects in one pass, reaching 32 fps on a single H100 GPU.
SAM 3.1 is Meta's latest update to its Segment Anything Model, built to track objects in video faster and on less powerful hardware. This article explains how multiplexing works and why it changes what real-time video AI can do.
What does object multiplexing do in SAM 3.1?
Object multiplexing processes up to 16 tracked objects in a single forward pass instead of running a separate pass for each one. This eliminates repeated computation across objects and removes memory bottlenecks. The result is a throughput increase from 16 to 32 frames per second on a single H100 GPU.
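The core idea can be sketched in a few lines. The snippet below is an illustrative simulation, not SAM 3.1's actual code: `encode_frame` and `decode_object` are hypothetical stand-ins for the expensive shared frame encoder and the cheap per-object mask head. It contrasts the per-object approach (one full pass per object) with a multiplexed pass that encodes the frame once and decodes all objects together.

```python
# Illustrative sketch only -- `encode_frame` / `decode_object` are
# hypothetical stand-ins, not the real SAM 3.1 API.

encoder_calls = 0  # counts how often the expensive shared step runs

def encode_frame(frame):
    """Expensive shared step: encode the whole frame into features."""
    global encoder_calls
    encoder_calls += 1
    return sum(frame)  # stand-in for a feature map

def decode_object(features, obj_id):
    """Cheap per-object step: produce a mask for one tracked object."""
    return (obj_id, features + obj_id)

def track_naive(frame, object_ids):
    # Per-object tracking: a full pass (encode + decode) for each object.
    return [decode_object(encode_frame(frame), i) for i in object_ids]

def track_multiplexed(frame, object_ids, max_objects=16):
    # Multiplexed tracking: encode the shared frame context once,
    # then decode every object (up to the limit) in the same pass.
    assert len(object_ids) <= max_objects
    features = encode_frame(frame)
    return [decode_object(features, i) for i in object_ids]

frame = [1, 2, 3]
objects = list(range(16))

encoder_calls = 0
track_naive(frame, objects)
naive_encodes = encoder_calls      # one encoder pass per object

encoder_calls = 0
track_multiplexed(frame, objects)
mux_encodes = encoder_calls        # one encoder pass total
print(naive_encodes, mux_encodes)
```

With 16 tracked objects, the naive loop pays the encoder cost 16 times while the multiplexed pass pays it once, which is where the speedup comes from.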
Why SAM 3.1 matters for accessible, high-performance video AI
The shift to multiplexing is not just a speed improvement. It changes the economics of running advanced video segmentation. Previously, tracking more objects meant proportionally more GPU passes, making real-time performance in complex scenes expensive and hardware-intensive. SAM 3.1 removes that constraint by reasoning about all tracked objects together in one global pass, which also improves accuracy when objects are close together or visually similar.

Because the model processes shared frame-level context once and distributes it across all objects simultaneously, it avoids the redundant encoding work that made SAM 3 slower at scale. The practical consequence is that demanding video applications like content editing, surveillance, wildlife monitoring, and robotics become feasible on smaller, more accessible hardware than before.

Developers can use SAM 3.1 as a drop-in replacement for SAM 3 without changing their existing code, while immediately gaining the performance benefit. This positions the model not just as a research improvement but as an infrastructure upgrade for any team already building on SAM 3.
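The changed economics can be made concrete with back-of-envelope arithmetic. The two fps figures below come from the article; the per-stage millisecond costs are purely illustrative assumptions chosen to show the scaling shape, not measured numbers.

```python
# Back-of-envelope sketch: why per-object passes scale badly.
# Only the 16 and 32 fps figures are from the article; the
# shared_ms / per_obj_ms costs are hypothetical assumptions.

fps_sam3, fps_sam31 = 16.0, 32.0
frame_time_sam3 = 1000.0 / fps_sam3    # per-frame budget before: 62.5 ms
frame_time_sam31 = 1000.0 / fps_sam31  # per-frame budget after: 31.25 ms

n_objects = 16
shared_ms = 25.0   # assumed cost of the shared frame encoding
per_obj_ms = 2.5   # assumed cost of one object's decode step

# Per-object passes: the shared cost is paid once PER OBJECT.
naive_ms = n_objects * (shared_ms + per_obj_ms)

# Multiplexed pass: the shared cost is paid once PER FRAME.
mux_ms = shared_ms + n_objects * per_obj_ms

print(frame_time_sam3, frame_time_sam31, naive_ms, mux_ms)
```

Under these assumed costs, the naive scheme spends 440 ms per frame at 16 objects while the multiplexed scheme spends 65 ms, and the gap widens linearly as more objects are tracked.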