Delivering Live TV Over HTTP: Chunked MPEG-TS and Multi-Language Audio

How Muvie turns raw MPEG-TS broadcast streams into browser-playable live TV with language selection, automatic codec fallback, and sub-second latency.

Live television usually arrives as a continuous MPEG-TS transport stream: a broadcast-oriented multiplex of video, audio, and signaling data that never really ends. Browsers do not consume that format directly, so a web player needs an intermediate path that can translate transport-stream semantics into something the browser can append and decode.

At Muvie, we built that path around a Go-based central proxy, chunked MPEG-TS delivery over HTTP, and a patched fork of mpegts.js in the browser. Upstream mpegts.js is designed for low-latency live playback and works by transmuxing MPEG-TS into fragmented MP4 for Media Source Extensions, which makes it a strong base for browser-side live TV delivery.

The result is a live playback pipeline that preserves broadcast features such as multiple language tracks, while still fitting the operational model of the web. Multi-audio support in our fork is still experimental today, but it is already usable enough to power real live TV channels with language selection and automatic fallback.

The transport problem

A raw MPEG-TS stream is just a continuous byte flow. You can proxy that directly over HTTP, and a fetch-based player can keep reading forever, but bare retransport creates the wrong scaling model for a shared live channel.

The issue is not primarily latency. The real issue is scalability: if every viewer is tied directly to a live upstream response, the proxy becomes a per-client retransmitter instead of a reusable distribution layer.

We solve that by converting the endless upstream stream into a rolling pool of bounded TS chunks. That lets one upstream reader feed many downstream viewers, while the server retains a short in-memory window that can be shared, expired, trimmed, and reused across all clients on the same channel.

This turns live TV delivery into a clean single-producer, multiple-consumer system:

One long-lived upstream fetch per channel.
One in-memory chunk pool per active channel.
Many browser clients consuming from the same rolling buffer.

That is the core architectural reason for chunking. It gives the proxy a scalable fan-out boundary, keeps memory bounded, and lets inactive channels shut down cleanly when nobody is watching.
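The fan-out boundary can be sketched in a few lines. This is a minimal illustration, not the production proxy (which is written in Go); the `ChunkPool` class, its method names, and the `maxChunks` limit are assumptions made for the example. One upstream reader publishes chunks, and every viewer reads from the same pool with its own cursor.

```typescript
// Hypothetical sketch: one rolling chunk pool shared by all viewers of a channel.
interface Chunk {
  seq: number;      // monotonically increasing sequence number
  data: Uint8Array; // bounded slice of the transport stream
}

class ChunkPool {
  private chunks: Chunk[] = [];
  private nextSeq = 0;
  private readonly maxChunks: number;

  constructor(maxChunks: number) {
    this.maxChunks = maxChunks;
  }

  // Called only by the single upstream reader for this channel.
  publish(data: Uint8Array): number {
    const seq = this.nextSeq++;
    this.chunks.push({ seq, data });
    // Trim the oldest chunks so memory stays bounded.
    while (this.chunks.length > this.maxChunks) this.chunks.shift();
    return seq;
  }

  // Called by any number of viewers; each viewer keeps its own cursor.
  // A cursor older than the retained window simply skips ahead to it.
  readFrom(cursor: number): { chunks: Chunk[]; nextCursor: number } {
    const chunks = this.chunks.filter((c) => c.seq >= cursor);
    return { chunks, nextCursor: this.nextSeq };
  }
}
```

Because viewers only hold a cursor, adding a viewer costs no extra upstream connection, and a slow viewer cannot hold old chunks in memory.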

Why PAT and PMT matter

Making MPEG-TS work in the browser is not only about moving bytes. The player also has to understand the structure of the stream, and that starts with the signaling tables inside the transport stream.

The Program Association Table (PAT) is the entry point. It tells the demuxer which PID carries the Program Map Table (PMT) for the program being broadcast.
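To make the PAT lookup concrete, here is a small sketch that reads the program loop of a PAT section and returns a map from program number to PMT PID. The function name `parsePat` is an assumption for illustration and is not taken from mpegts.js. The input is assumed to be a complete section starting at `table_id`, and the CRC is not verified.

```typescript
// Hypothetical sketch: extract program_number -> PMT PID from one PAT section.
function parsePat(section: Uint8Array): Map<number, number> {
  const view = new DataView(section.buffer, section.byteOffset, section.byteLength);
  if (view.getUint8(0) !== 0x00) throw new Error("not a PAT section");

  // section_length counts the bytes that follow it, including the 4-byte CRC.
  const sectionLength = view.getUint16(1) & 0x0fff;
  if (section.byteLength < 3 + sectionLength) throw new Error("truncated PAT section");
  const end = 3 + sectionLength - 4; // stop before CRC_32

  const programs = new Map<number, number>();
  // The program loop starts after the 8-byte section header.
  for (let off = 8; off + 4 <= end; off += 4) {
    const programNumber = view.getUint16(off);
    const pid = view.getUint16(off + 2) & 0x1fff; // low 13 bits carry the PID
    // program_number 0 points at the network PID, not at a PMT.
    if (programNumber !== 0) programs.set(programNumber, pid);
  }
  return programs;
}
```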

The Program Map Table (PMT) is where the useful playback map lives. It identifies the elementary streams in the multiplex: the video PID, each audio PID, their codec types, and any descriptors attached to those streams.

For live TV, PMT parsing is what makes language selection possible. A single channel may carry several audio streams, each on its own PID, and the PMT is where the browser-side player learns that those streams exist in the first place.

In our fork, the player does not stop at the first audio PID. It builds a fuller track model from the PMT, including codec identity and language metadata when available, so the application can expose actual language choices instead of a single default audio path.

A simplified example looks like this:
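As a hedged TypeScript sketch of that idea (the interface, the function name, and the codec labels below are illustrative assumptions, not the fork's actual API), a PMT section can be walked to collect each elementary stream's PID, stream type, and ISO 639 language descriptor:

```typescript
// Hypothetical track model built from one PMT section (CRC is not verified).
interface ElementaryStream {
  pid: number;
  streamType: number;
  codec: string;                // label derived from stream_type
  language: string | undefined; // ISO 639 code from descriptor 0x0A, if present
}

// A few common stream_type values; this table is deliberately incomplete.
const STREAM_TYPE_LABELS = new Map<number, string>([
  [0x03, "mpeg1-audio"], [0x04, "mpeg2-audio"],
  [0x0f, "aac-adts"], [0x11, "aac-latm"],
  [0x1b, "h264"], [0x24, "h265"],
  [0x81, "ac3"], [0x87, "eac3"], // ATSC assignments
]);

function parsePmtStreams(section: Uint8Array): ElementaryStream[] {
  const view = new DataView(section.buffer, section.byteOffset, section.byteLength);
  if (view.getUint8(0) !== 0x02) throw new Error("not a PMT section");
  const sectionLength = view.getUint16(1) & 0x0fff;
  if (section.byteLength < 3 + sectionLength) throw new Error("truncated PMT section");
  const end = 3 + sectionLength - 4; // stop before CRC_32

  const programInfoLength = view.getUint16(10) & 0x0fff;
  const streams: ElementaryStream[] = [];
  let off = 12 + programInfoLength; // skip program-level descriptors

  while (off + 5 <= end) {
    const streamType = view.getUint8(off);
    const pid = view.getUint16(off + 1) & 0x1fff;
    const esInfoLength = view.getUint16(off + 3) & 0x0fff;
    const descEnd = Math.min(off + 5 + esInfoLength, end);

    let language: string | undefined;
    let d = off + 5;
    while (d + 2 <= descEnd) {
      const tag = view.getUint8(d);
      const len = view.getUint8(d + 1);
      // ISO_639_language_descriptor: 3 language bytes plus 1 audio_type byte.
      if (tag === 0x0a && len >= 4 && d + 5 <= descEnd) {
        language = String.fromCharCode(
          view.getUint8(d + 2), view.getUint8(d + 3), view.getUint8(d + 4));
      }
      d += 2 + len;
    }

    const codec = STREAM_TYPE_LABELS.get(streamType) ?? "unknown";
    streams.push({ pid, streamType, codec, language });
    off = descEnd;
  }
  return streams;
}
```

Filtering the result down to audio entries gives the application the list it needs for a language menu. DVB streams often signal AC-3 through private stream types plus descriptors, which this sketch does not handle.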


That is the critical bridge between broadcast metadata and a browser UI. Once the player has parsed the PMT, the web app can present a language selector and make a real playback decision instead of blindly following the first audio stream it sees.

Chunked delivery on the server

On the server side, each active channel has a long-lived fetcher that reads the upstream MPEG-TS response and accumulates data into a buffer. That buffer is flushed into bounded chunks and stored in an in-memory pool keyed by channel and sequence number.

Those chunks are not HLS segments and they are not meant to replace adaptive packaging. They are simply reusable windows of a live transport stream, shaped to make fan-out and buffering practical.

This has a few important benefits:

Multiple viewers can share one upstream connection.
The server can evict old chunks with a short TTL.
New viewers can attach to the current rolling window instead of waiting on arbitrary upstream read timing.
Idle channels can be torn down without keeping a useless live pipe open.
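The expiry, attach, and teardown behaviors in that list can each be expressed as a small pure function. The names, the TTL value, and the idle grace period below are assumptions for illustration; the real proxy is written in Go and may structure this differently.

```typescript
// Hypothetical helpers for the pool lifecycle described above.
interface TimedChunk {
  seq: number;
  storedAtMs: number; // when the chunk was published
  data: Uint8Array;
}

// Drop chunks that have outlived the short TTL.
function evictExpired(chunks: TimedChunk[], nowMs: number, ttlMs: number): TimedChunk[] {
  return chunks.filter((c) => nowMs - c.storedAtMs <= ttlMs);
}

// Pick the starting sequence for a new viewer: a few chunks behind the live
// edge, or the oldest retained chunk if the window is shorter than that.
function attachCursor(chunks: TimedChunk[], startBehind: number): number | undefined {
  const idx = Math.max(0, chunks.length - 1 - startBehind);
  return chunks[idx]?.seq;
}

// A channel is torn down once it has had no viewers for a grace period.
function shouldTearDown(
  viewerCount: number,
  lastViewerLeftAtMs: number,
  nowMs: number,
  idleGraceMs: number,
): boolean {
  return viewerCount === 0 && nowMs - lastViewerLeftAtMs >= idleGraceMs;
}
```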

Transport correctness also matters here. MPEG-TS packets are fixed-size 188-byte units that each begin with a 0x47 sync byte, and network reads do not respect those boundaries. If the proxy flushes data in the middle of a TS packet, the client demuxer will eventually see misaligned or truncated packets.

So before publishing each chunk, the server aligns the buffer to complete TS packets and carries any remainder into the next flush. That keeps chunk transitions clean and prevents an entire class of demux and decoder failures at the browser edge.
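A minimal sketch of that alignment step follows. The `TsAligner` class and `findSync` helper are hypothetical names, and the resync heuristic (a sync byte confirmed by another sync byte 188 bytes later) is an assumption about how one might do it, not a description of the Go proxy's exact logic.

```typescript
const TS_PACKET_SIZE = 188;
const TS_SYNC_BYTE = 0x47;

// Find the first offset that looks like a packet start.
function findSync(buf: Uint8Array): number {
  for (let off = 0; off < buf.length; off++) {
    if (buf[off] !== TS_SYNC_BYTE) continue;
    const next = off + TS_PACKET_SIZE;
    // Accept if the next packet also starts with a sync byte,
    // or if there is not yet enough data to check.
    if (next >= buf.length || buf[next] === TS_SYNC_BYTE) return off;
  }
  return -1;
}

class TsAligner {
  private pending: Uint8Array = new Uint8Array(0);

  // Feed one network read; returns only whole TS packets (possibly none).
  push(read: Uint8Array): Uint8Array {
    const buf = new Uint8Array(this.pending.length + read.length);
    buf.set(this.pending, 0);
    buf.set(read, this.pending.length);

    const start = findSync(buf);
    if (start < 0) {
      // No plausible packet start; discard rather than grow without bound.
      this.pending = new Uint8Array(0);
      return new Uint8Array(0);
    }

    const whole = Math.floor((buf.length - start) / TS_PACKET_SIZE) * TS_PACKET_SIZE;
    // Carry the partial trailing packet into the next flush.
    this.pending = buf.slice(start + whole);
    return buf.slice(start, start + whole);
  }
}
```

Every chunk produced this way has a length that is a multiple of 188 and starts on a sync byte, so a viewer can begin reading at any chunk boundary.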

For channels whose audio is not browser-friendly, the proxy can also pass the stream through ffmpeg. In that path, video remains copy-mode while audio is transcoded into AAC, and all streams are preserved so alternate languages do not disappear during processing.

That preservation step is important. In a multi-language broadcast stream, transcoding is not just about changing codecs; it also has to retain the structure of the original program so the client still sees every audio option after remuxing.
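One way to express that ffmpeg path is as an argument list. The flags below are standard ffmpeg options, but the exact invocation Muvie uses is not given here, so the stdin/stdout piping, the bitrate default, and the `buildFfmpegArgs` helper are assumptions for illustration.

```typescript
// Hypothetical helper: build ffmpeg arguments that copy video, transcode
// every audio stream to AAC, and keep all audio tracks in the output.
function buildFfmpegArgs(audioBitrate: string = "128k"): string[] {
  return [
    "-hide_banner",
    "-i", "pipe:0",      // read the MPEG-TS input from stdin
    "-map", "0:v",       // keep every video stream
    "-map", "0:a",       // keep every audio stream, not just the default one
    "-c:v", "copy",      // video is passed through untouched
    "-c:a", "aac",       // all audio is transcoded to AAC
    "-b:a", audioBitrate,
    "-f", "mpegts",      // write MPEG-TS again so the client path is unchanged
    "pipe:1",
  ];
}
```

The explicit `-map` options matter: without them ffmpeg selects only one audio stream by default, which would silently drop the alternate languages.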

Client playback and audio fallback

In the browser, our patched mpegts.js fork receives the chunked TS stream, parses PAT and PMT, identifies the playable streams, and feeds the result into MSE. Upstream mpegts.js already supports low-latency MPEG-TS playback over MSE and documents less-than-one-second latency in the best case, which aligns well with this kind of live delivery model.

The main addition in our fork is experimental multi-audio awareness. Instead of treating transport-stream audio as a single route, the player keeps track of multiple audio PIDs and exposes them to the application layer.

That enables two viewer-facing behaviors.

First, the player can show a language selector when the PMT advertises multiple audio tracks. Second, it can apply automatic codec fallback when the primary broadcast audio is not suitable for browser playback.

This matters because broadcast streams often use codecs like AC-3 or E-AC-3. Upstream mpegts.js has introduced support for ATSC AC-3 and E-AC-3 over MPEG-TS in recent releases, but browser decode behavior still varies enough that a transport stream may be structurally valid while remaining a poor default choice for web playback.

In practice, that means a browser player still needs a policy layer. In our fork, if the default audio track is not a good browser target, the player can fall back to another PMT-advertised track such as AAC or MP3 before bad audio ever reaches the playback pipeline.
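Such a policy layer can be quite small. The sketch below is an assumption about its shape, not the fork's actual code: `AudioTrack`, `pickAudioTrack`, and the codec labels are hypothetical, and the set of playable codecs is passed in by the caller.

```typescript
// Hypothetical fallback policy over PMT-advertised audio tracks.
interface AudioTrack {
  pid: number;
  codec: string;                // e.g. "aac", "mp3", "ac3"
  language: string | undefined; // ISO 639 code when the PMT provides one
}

function pickAudioTrack(
  tracks: AudioTrack[],
  playable: ReadonlySet<string>,
  preferredLanguage?: string,
): AudioTrack | undefined {
  // Only consider tracks the browser is expected to decode.
  const candidates = tracks.filter((t) => playable.has(t.codec));

  // Honor the viewer's language when a playable track carries it.
  if (preferredLanguage !== undefined) {
    const match = candidates.find((t) => t.language === preferredLanguage);
    if (match !== undefined) return match;
  }

  // Otherwise fall back to the first playable track in PMT order.
  // undefined means nothing is browser-safe on this channel.
  return candidates[0];
}
```

In a browser, the `playable` set could be derived by probing `MediaSource.isTypeSupported` for each codec. An `undefined` result is the signal that the channel needs the server-side transcode path instead.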

When a user actively switches language, we currently prefer a controlled player remount over a fully dynamic in-place switch. That gives the new preferred PID a clean start from the first PMT parse and avoids carrying stale codec routing or timing state from the previous track.
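The remount approach can be sketched independently of the player library. Everything here is hypothetical: `PlayerLike`, `RemountingPlayer`, and the idea of passing a preferred audio PID into a factory are assumptions for illustration. In a real integration the factory would wrap calls such as `mpegts.createPlayer`, `attachMediaElement`, and `load`, and `destroy` would map to the player's own `destroy`.

```typescript
// Hypothetical remount wrapper: every language switch gets a fresh player.
interface PlayerLike {
  destroy(): void;
}

type PlayerFactory<P extends PlayerLike> = (preferredAudioPid: number) => P;

class RemountingPlayer<P extends PlayerLike> {
  private current: P | undefined;
  private readonly create: PlayerFactory<P>;

  constructor(create: PlayerFactory<P>) {
    this.create = create;
  }

  // Tear down the old instance first so no codec routing or timing
  // state from the previous track leaks into the new one.
  switchAudio(preferredAudioPid: number): P {
    if (this.current !== undefined) this.current.destroy();
    this.current = this.create(preferredAudioPid);
    return this.current;
  }
}
```

The cost is a brief rebuffer on each switch, which is the tradeoff for a clean first PMT parse on the new track.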

The hard part in these switches is not the UI. It is maintaining continuity across stream discovery, codec changes, and transport timestamps so the browser sees a stable media timeline instead of a discontinuity storm.

Engineering notes

This design sits between two extremes. A full HLS or DASH pipeline adds packaging and manifest overhead that is not always necessary for browser live TV, while direct retransport of an endless TS stream is simple on paper but scales poorly once many viewers share the same channel.

Chunked MPEG-TS gives us a more useful middle layer: stream-like enough for low delay, but structured enough to support pooling, expiry, fan-out, and browser-safe playback behavior. Upstream mpegts.js also supports a broad MPEG-TS-to-MSE path, including recent work around iOS 17.1+ ManagedMediaSource and newer codec support, which makes it a stronger foundation than a from-scratch browser demux stack.

Our fork builds on that foundation with experimental support for richer audio-track handling in live broadcast streams. That part is still evolving, and we treat multi-audio in the fork as an active engineering area rather than a fully settled surface.

We are also actively evaluating whether CDN caching offers any real cost benefit. Our CDN bills by egress bandwidth rather than by cache hits, so a high hit rate does not by itself lower the bill; the problem we actually need to solve is delivery latency. Our existing pipeline can already remux the bare TS stream into HLS through ffmpeg, but if the CDN's egress pricing is unfavorable, that path brings very little benefit. For high-volume streams, offloading relay work to a cheap bandwidth provider is the better path forward.

In the earlier model, the live processing chain was closely tied to a single upstream HTTP connection. If that connection stalled or ended, recovery tended to rebuild too much of the pipeline. That is workable for clean sources, but it is a poor fit for real IPTV-style streams where short disconnects, temporary rate collapse, or uneven delivery are normal. The design problem was not simply “lack of retries.” It was that the system treated transport interruption and stream identity change as almost the same class of event.

The newer design separates those two concerns. If the source itself is still the same, reconnecting to it is treated as continuity recovery rather than a full pipeline replacement. The processing side remains stable, and only the upstream feed is reattached. A full restart is reserved for cases where the system actually switches to a different upstream candidate or where the processing layer itself has failed.

There is also a small in-memory buffering stage in front of the processing layer. It is not a large persistent ring-buffer architecture, and it is not meant to act as a long-delay media reservoir. Its role is narrower and more pragmatic: decouple the processor from the exact timing jitter of the upstream socket. In other words, the processor is no longer exposed as directly to tiny fluctuations in delivery cadence. That improves tolerance to short stalls and bursty arrival patterns without turning the system into a lag-heavy buffered relay.

The reconnect strategy was also tightened. Same-source recovery is now given more chances before the system cools that source down and moves on to alternates. This matters because many unstable live endpoints do not fail in a clean binary way; they wobble. A design that fails over too quickly can actually behave worse than one that briefly persists through startup instability. The revised behavior is more deliberate: first try to preserve continuity on the current source, then fail over only when instability looks persistent enough to be meaningful.
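The decision logic described above can be sketched as a small state machine. The class name, the retry count, and the cooldown handling are assumptions for illustration, not the proxy's actual implementation. A `reconnect` decision means only the upstream feed is reattached, while `failover` is the case that justifies a full pipeline restart.

```typescript
// Hypothetical same-source-first recovery policy.
type Decision =
  | { action: "reconnect"; source: string } // keep the processing session
  | { action: "failover"; source: string }  // switch upstream candidate
  | { action: "give-up" };                  // every candidate is cooling down

class SourcePolicy {
  private failures = 0;
  private index = 0;
  private readonly coolUntil = new Map<string, number>();
  private readonly sources: string[];
  private readonly maxSameSourceRetries: number;
  private readonly cooldownMs: number;

  constructor(sources: string[], maxSameSourceRetries: number, cooldownMs: number) {
    this.sources = sources;
    this.maxSameSourceRetries = maxSameSourceRetries;
    this.cooldownMs = cooldownMs;
  }

  // Call once data is flowing again so earlier wobbles are forgiven.
  onHealthy(): void {
    this.failures = 0;
  }

  onDisconnect(nowMs: number): Decision {
    const current = this.sources[this.index];
    if (current === undefined) return { action: "give-up" };

    this.failures++;
    if (this.failures <= this.maxSameSourceRetries) {
      return { action: "reconnect", source: current };
    }

    // Instability looks persistent: cool this source down and move on.
    this.coolUntil.set(current, nowMs + this.cooldownMs);
    this.failures = 0;
    for (let step = 1; step <= this.sources.length; step++) {
      const i = (this.index + step) % this.sources.length;
      const candidate = this.sources[i];
      if (candidate === undefined) continue;
      if ((this.coolUntil.get(candidate) ?? 0) <= nowMs) {
        this.index = i;
        return { action: "failover", source: candidate };
      }
    }
    return { action: "give-up" };
  }
}
```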

At a high level, the design moved from a disposable connection-oriented model to a stable session-oriented model. The source connection may come and go, but the live processing session tries to remain intact for as long as the stream identity is still the same. For noisy real-world live inputs, that is a much better engineering tradeoff than rebuilding the whole path every time the transport blinks.

Repository: Muvie mpegts.js fork