May 28, 2026

Immersive Media Workflows: Synchronizing Video and Metadata with ST 2110 Standards

Broadcast & ProAV: Learn how the Macnica MEP100 helps to sync ST 2110-22 video flows with ProRes

Immersive media workflows for devices like Apple Vision Pro are pushing live production infrastructure into new territory. Massive image payloads, multi-view video, ultra-low latency requirements, and real-time operation all combine to create one of the most demanding media transport environments the industry has faced.

 

A major part of that challenge is image transport itself. High-fidelity immersive workflows generate enormous amounts of visual data, making efficient compression essential. Using ProRes over SMPTE ST 2110-22 allows these workflows to preserve image quality while reducing bandwidth to a level that can realistically move through a live IP production system. Rather than relying on proprietary transport methods or repeated transcoding stages, production systems can carry compressed immersive video directly across standards-based ST 2110 infrastructure. But the video is only part of the story.
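To make the bandwidth argument concrete, the following back-of-the-envelope sketch estimates the payload rate of an uncompressed stereo stream and compares it with 25GbE and 100GbE links. The resolution, frame rate, bit depth, compression ratio, and headroom figure are illustrative assumptions, not values taken from Apple, ProRes, or the ST 2110 specifications, and the helper names are hypothetical.

```python
def payload_gbps(width, height, fps, bits_per_pixel, views=1):
    """Raw picture payload in Gb/s, ignoring RTP/UDP/IP overhead."""
    return width * height * fps * bits_per_pixel * views / 1e9


def fits_link(rate_gbps, link_gbps, headroom=0.8):
    """True if the flow fits within a chosen fraction of the link capacity."""
    return rate_gbps <= link_gbps * headroom


# Assumed source: two 7680x4320 views, 60 fps, 10-bit 4:2:2 (20 bits per pixel).
raw = payload_gbps(7680, 4320, 60, 20, views=2)

# Assumed 8:1 compression ratio, chosen only for illustration.
compressed = raw / 8

for label, rate in (("uncompressed", raw), ("compressed", compressed)):
    print(f"{label}: {rate:.2f} Gb/s, "
          f"fits 25GbE: {fits_link(rate, 25)}, "
          f"fits 100GbE: {fits_link(rate, 100)}")
```

Under these assumptions the uncompressed stream comes to roughly 80 Gb/s, far more than a 25GbE link can carry, while the compressed stream comes to roughly 10 Gb/s.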

 

In immersive media workflows, the image alone is not enough. Every frame also depends on metadata that describes how that image should be interpreted, positioned, and experienced. This metadata must remain precisely synchronized with the video throughout the entire production pipeline.

 

That creates a very different kind of challenge from the one traditional broadcast workflows face.

 

An immersive production may need to transport lens projection data, camera calibration information, spatial graphics instructions, motion characteristics, production events, and spatial audio metadata in real time. Some of this information changes continuously, frame by frame, as content moves through cameras, graphics systems, switchers, replay servers, encoders, and playback systems.

 

One concrete example is lens projection metadata. Immersive cameras rely on detailed optical calibration and projection models to reconstruct a believable spatial scene for the viewer. Alongside the image itself, the system may need to carry camera identifiers, calibration UUIDs, projection geometry, masking information, and spatial positioning data describing how that specific lens captured the scene.

 

In Apple’s immersive workflows, this metadata is associated directly with individual frames and cameras. The projection metadata effectively becomes part of how the image is interpreted downstream. If the wrong metadata is paired with the wrong frame, the immersive scene can distort, drift spatially, or break apart entirely.
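As a rough illustration of why pairing matters, the sketch below models a per-frame metadata record and a check that rejects metadata attached to the wrong frame or camera. Every type and field name here is hypothetical and chosen only to mirror the items listed above; it does not represent Apple's actual metadata format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import uuid


@dataclass(frozen=True)
class LensProjection:
    """Hypothetical per-camera projection record (illustrative fields only)."""
    camera_id: str
    calibration_uuid: uuid.UUID
    projection_model: str                    # placeholder label, not a real enum
    projection_params: Tuple[float, ...]     # model-specific coefficients
    mask_id: Optional[str]                   # reference to masking data, if any
    position_m: Tuple[float, float, float]   # camera position in metres


@dataclass(frozen=True)
class FrameRef:
    """Identifies one frame from one camera."""
    camera_id: str
    timestamp: int


@dataclass(frozen=True)
class FrameMetadata:
    frame: FrameRef
    lens: LensProjection


def check_pairing(frame: FrameRef, meta: FrameMetadata) -> None:
    """Raise ValueError unless the metadata describes exactly this frame."""
    if meta.frame != frame:
        raise ValueError(f"metadata is for {meta.frame}, not {frame}")
    if meta.lens.camera_id != frame.camera_id:
        raise ValueError("lens calibration belongs to a different camera")


lens = LensProjection("cam-A", uuid.uuid4(), "example-model",
                      (1.0, 0.0), None, (0.0, 1.5, 0.0))
meta = FrameMetadata(FrameRef("cam-A", 1500), lens)
check_pairing(FrameRef("cam-A", 1500), meta)  # passes silently
```

A downstream stage could run a check like this before rendering, so that a mismatched record is dropped or flagged rather than applied to the wrong image.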

 

[Figure: ST 2110-22 video flow. Immersive VR requires synchronization of metadata throughout the processing pipeline.]

That makes synchronization critical. The metadata must remain locked to the exact frame it describes at every stage of the production pipeline, from capture through to playback.

 

This is where SMPTE ST 2110-41, commonly referred to as “Fast Metadata,” steps in.

 

Traditional ST 2110 metadata workflows often rely on ST 2110-40, which carries SMPTE ST 291-1 ancillary (ANC) data packets, a transport model inherited from SDI environments. Those workflows remain extremely important across broadcast production. Immersive media, however, introduces significantly richer and more dynamic metadata requirements, often involving large amounts of structured data that must remain frame-accurate throughout the production pipeline.

 

ST 2110-41 extends the ST 2110 ecosystem by allowing metadata to exist as an independent, time-aligned essence flow within the network. Rather than embedding metadata directly into the video stream, ST 2110-41 transports it separately while maintaining deterministic synchronization using PTP timing.

 

In practice, this means every immersive video frame transported over ST 2110-22 can be accompanied by corresponding metadata delivered in parallel over ST 2110-41. Video and metadata remain separate within the infrastructure, but together they describe a single coherent moment in time.
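One way a receiver might pair the two flows is by matching timestamps derived from the shared PTP clock. The sketch below is a simplified illustration, not the MEP100 API or the normative ST 2110-41 procedure: it assumes both flows stamp packets with a 32-bit RTP timestamp on a 90 kHz clock and that metadata for a frame carries the same timestamp as that frame. The `FrameAligner` class and its method names are hypothetical.

```python
from collections import OrderedDict
from fractions import Fraction

RTP_CLOCK_HZ = 90_000   # assumed clock rate for both flows
RTP_WRAP = 1 << 32      # RTP timestamps are 32-bit and wrap around


def rtp_timestamp(ptp_seconds: Fraction) -> int:
    """Map a PTP time in seconds to a wrapped 32-bit RTP timestamp."""
    return int(ptp_seconds * RTP_CLOCK_HZ) % RTP_WRAP


class FrameAligner:
    """Buffers metadata by timestamp until the matching video frame arrives."""

    def __init__(self, max_pending: int = 8):
        self._pending = OrderedDict()
        self._max_pending = max_pending

    def on_metadata(self, timestamp: int, payload: bytes) -> None:
        self._pending.setdefault(timestamp, []).append(payload)
        # Evict the oldest timestamps so unmatched metadata cannot pile up.
        while len(self._pending) > self._max_pending:
            self._pending.popitem(last=False)

    def on_video_frame(self, timestamp: int) -> list:
        """Return all metadata payloads stamped for this frame, if any."""
        return self._pending.pop(timestamp, [])


aligner = FrameAligner()
frame_time = Fraction(1, 60)          # second frame of an assumed 60 fps flow
ts = rtp_timestamp(frame_time)        # 1/60 s at 90 kHz
aligner.on_metadata(ts, b"lens-projection")
aligner.on_metadata(ts, b"spatial-audio")
print(ts, aligner.on_video_frame(ts))
```

Exact rational arithmetic via `Fraction` avoids the rounding drift that floating-point frame times could introduce. A production receiver would also need to handle late or missing packets and timestamp wraparound ordering, which this sketch leaves out.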

 

This approach becomes especially powerful in distributed live production systems. Cameras can generate metadata. Graphics engines can update it. Switchers and replay systems can combine and modify it. Encoders and playback systems can consume it. Throughout the workflow, synchronization between image, sound, and metadata is maintained with frame-level precision.

 

Enabling this reliably requires substantial transport performance.

 

Macnica’s MEP25 and MEP100 SmartNICs were designed specifically for real-time ST 2110 media workflows, providing hardware-accelerated support for both ST 2110-22 and ST 2110-41. Operating at 25GbE and 100GbE respectively, they allow immersive video and synchronized metadata to move through live production systems with deterministic timing and extremely low latency.

 

Support for ProRes over ST 2110-22 allows high-quality immersive workflows to integrate directly into standards-based IP production environments. Combined with ST 2110-41 Fast Metadata support, macOS, Windows, and Linux developers can build immersive production applications on standard compute platforms while maintaining the precision required for professional live media systems.

 

The result is a practical, scalable foundation for immersive media production.

 

Open standards like ST 2110-22 and ST 2110-41 provide the transport layer. Solutions like the MEP25 and MEP100 make that transport usable at the scale and performance required by modern immersive workflows. Together, they enable immersive experiences such as those emerging in Apple’s ecosystem, where viewers are no longer simply watching the action from a distance, but experiencing it from within the scene itself.

 

 

 

 

 
