Revenge of the Codecs Part 2 – The War for 4K Streaming Profits

One codec to rule them all…

For many years, the happy Hobbits of Netflixshire, and adjacent suburbs of Hulu, HBO, and Amazon, have been serving massive dollops of delicious HD content to all visitors. The cooks (content creators) and the Shire are paid handsomely, but the Halflings of Codec receive only a small delivery fee. 

However, darkness has shadowed Middle-Earth as all eye the rise of Profits, which are far more Precious than ever before. Large shires are breaking apart as the Apple-ings, Lords of the Mouse, Servants of the CBS Eye and others set up their own content Shires. Codec Halflings, seeing 4K as their last chance to enrich their coffers, are demanding increased tariffs on 4K streams, thus beginning the War of the 4K – One Codec to Rule Them All.

With apologies to J. R. R. Tolkien, this parable summarizes the state of consumer streaming. There are no Orcs – just companies pursuing competing profit goals.

Years ago, when Netflix was in its infancy, the MPEG-4 AVC (H.264) codec was developed for HD video. The patent pool, MPEG LA, set up a clearly defined royalty process, with set costs and an annual cap, which amounts to pocket change for large-scale streaming suppliers like Netflix today. Even so, browser makers have tried to avoid the fees: the open-source Chromium and Firefox projects don’t ship their own H.264 (or HEVC) decoders and rely on the operating system or separately licensed components instead, saving costs and advancing the Google-backed VP9/AV1 codecs.

When HEVC (High Efficiency Video Coding), or H.265, was developed for 4K streaming, everyone got greedy, and things got messy. Now there are three independent patent pools, plus many patent holders outside any pool, all insisting on a piece of the profits. As a result, HEVC adoption has been problematic, as companies can’t predict their usage and costs.

What’s even more confusing is that HEVC is really a temporary codec, to be replaced by VVC (Versatile Video Coding), or H.266, in a year or so. Turns out HEVC isn’t as efficient as its name promises. VVC is designed to be 30-50% more efficient, and promises to provide a better patent process – though experts expect the same patent mess as HEVC.

The streaming giants, such as Netflix, Amazon, Facebook, Intel, Google, Apple, Microsoft and more, formed the Alliance for Open Media to create a new 4K codec based on Google’s VP9, called AV1. It’s royalty-free, and the Alliance has the resources to fight off claims from patent pools. One pool has already been set up to collect on AV1, but yeah, good luck with that.

Consumers won’t sense the battle in the background – 4K will still look like 4K. The big winners will be the streaming vendors, as they can encode all the streams using one codec. Today, Web videos are typically VP9/AV1, and device streaming tends to be HEVC. Not that people are watching much 4K anyhow – more than 90% of streaming content is HD H.264. That will change over time. It’s interesting that Samsung is developing AI-assisted 8K upscaling as they bring out more 8K TVs. Looks like they expect content to be a mix of HD and 4K for the foreseeable future.

Who wins? Easy question, as anyone who recalls the Internet Explorer/Netscape browser wars knows – free always wins.

4K over H.264? 4 Sure – and More!

As consumers who binge hundreds of video streams from cable, DirecTV, Netflix, Amazon Prime and other services, we assume that the MPEG H.264 streams we’re watching are limited to 1080p. Surprise! Those clever cooks at MPEG LA (the entity that collects MPEG fees and royalties) always had a lot more in mind. The fact is, H.264 can crunch 4K and 8K video just as easily as 1080p!

The catch is, the streams would be much bigger, way too big to travel over the average network and WiFi. So about 95% of the content you watch at home is H.264-driven 1080p. 4K streams are encoded with HEVC H.265 or Google’s VP9 or AV1 technology (all YouTube content is VP9 format) – all able to scrunch the 4K stream into a size closer to 1080p dimensions.

That’s fine for 4K programming in the home, where billion-dollar technology makes it possible to send content to $30 dongles. The math is different in the commercial world, where we need moderately priced encoders to send in-house content and signage to lobby, lunchroom, classroom, and meeting room TVs.

While 4K H.264 streams are too big for home distribution, they’re fine for dedicated commercial media networks.

About That Spinning Wheel…

Remember the spinning wheel of Netflix that tried our patience in the days of 1.5 Mbps Internet? That was caused by buffering – the streaming box has to pull a number of frames into memory so it can figure out how to decode the video. Not a big problem these days, but H.264 (and HEVC/VP9 and so on) will always need buffering time. The encoding format is called Inter-Frame, because the stream is made up of Groups of Pictures. Each group starts with an actual picture called the I frame, essentially a JPEG. The rest of the “pictures” are just data that describe what stayed the same, what moved and what colors changed. That keeps the streams small. The decoder does the same in reverse, grabbing the first picture in the group, then storing a few more data frames, then “reading” the information to reconstruct the original video.
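To make the Group of Pictures idea concrete, here is a toy sketch in Python. It is not how H.264 really works (real encoders use motion prediction, transforms and entropy coding). Frames here are just short lists of pixel values, and the names encode_gop and decode_gop are made up for illustration. It only shows the principle: keep one whole picture, then store just what changed.

```python
from typing import Dict, List, Tuple

Frame = List[int]        # a "picture" as a flat list of pixel values
Delta = Dict[int, int]   # pixel index -> new value, only for pixels that changed


def encode_gop(frames: List[Frame]) -> Tuple[Frame, List[Delta]]:
    """Keep the first frame whole (the I frame), then store only the changes."""
    i_frame = list(frames[0])
    deltas: List[Delta] = []
    previous = i_frame
    for frame in frames[1:]:
        deltas.append({i: v for i, v in enumerate(frame) if v != previous[i]})
        previous = frame
    return i_frame, deltas


def decode_gop(i_frame: Frame, deltas: List[Delta]) -> List[Frame]:
    """Rebuild every frame by applying each set of changes in order."""
    current = list(i_frame)
    frames = [list(current)]
    for delta in deltas:
        for index, value in delta.items():
            current[index] = value
        frames.append(list(current))
    return frames


# Hypothetical 8-pixel frames: one bright "object" moving to the right.
frames = [
    [9, 0, 0, 0, 0, 0, 0, 0],
    [0, 9, 0, 0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 0, 0, 0, 0],
]
i_frame, deltas = encode_gop(frames)
stored_values = len(i_frame) + sum(len(d) for d in deltas)
raw_values = sum(len(f) for f in frames)
print(f"stored {stored_values} values instead of {raw_values}")
```

Notice that decode_gop cannot produce frame three without first having the I frame and every delta before it. That dependency is the buffering delay in miniature.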

As consumers, we’re used to that delay, happy to trade off a little time to see a great video. If you’re giving a presentation in a conference room, you want zero delay, or as close to zero as possible. So you wouldn’t want H.264 streams – or so you thought.

Those clever H.264 gnomes had a solution, called H.264-Intra. Instead of encoding groups of pictures and data, Intra creates a stream made of individual I-Frames, each one a compressed image. For our AV Geek readers, this is similar to Motion JPEG 2000, used in many IP video switching systems. It’s great for fast switching, as the decoder doesn’t have to buffer frames and calculate. It simply uncompresses each frame as it arrives.

An H.264-Intra decoder in a presentation room context could be the best of both worlds, able to switch quickly between Intra-generated content and also play back standard Inter-frame content.
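A rough way to see why Intra switches faster: a decoder that joins a stream mid-group has to wait for the next I frame before it can show anything. The sketch below averages that wait over every possible join point. The 30-frame GOP and 30 fps are assumptions picked for illustration, not figures from any particular encoder, and the model ignores network and decode time.

```python
def frames_until_next_i_frame(join_position: int, gop_length: int) -> int:
    """Frames a decoder must wait if it joins at this position within a GOP."""
    return (gop_length - join_position) % gop_length


def average_switch_wait_seconds(gop_length: int, fps: float) -> float:
    """Average wait for the next I frame over all possible join positions."""
    waits = [frames_until_next_i_frame(p, gop_length) for p in range(gop_length)]
    return sum(waits) / len(waits) / fps


# Assumed values: a 30-frame GOP at 30 fps for inter-frame, GOP of 1 for Intra.
inter_wait = average_switch_wait_seconds(gop_length=30, fps=30.0)
intra_wait = average_switch_wait_seconds(gop_length=1, fps=30.0)
print(f"inter-frame: {inter_wait:.3f} s, intra: {intra_wait:.3f} s")
```

With every frame an I frame, the wait for a usable picture drops to zero frames, which is the whole appeal for presentation switching. The price is a much larger stream.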

The Catch Is…

An existing MPEG decoder likely isn’t expecting to process 4K or intra-frame content. It’s a forgivable oversight. Vendors thought like consumers and didn’t realize that other options are part of the H.264 standard and have value for commercial applications.

But it’s an interesting concept – a more universal streaming format for presentation and distribution content. Is it better? Haven’t seen a live demonstration as yet. Is it zero latency? Of course, nothing is zero – any amount of processing adds delay, so that’s a “we’ll see” as well. Another benefit is that MPEG supports captioning data for ADA requirements – presentation codecs like Motion JPEG 2000 don’t support that.



AV Over IP – A Primer


AV Over IP is the term for technology that delivers audio visual content over Ethernet. The term also implies that content traditionally sent or switched over analog or digital switching now employs IP packets and standard Ethernet switches between the source and destination.

There are two basic AVoIP applications, Distribution and Presentation.

Distributed IP Delivery

  • Content is distributed over a large area
  • Endpoints described as encoders and decoders
  • Usually a high ratio of decoders to encoders
  • Streams are highly compressed to save network bandwidth
  • Half- to 1-second latency is acceptable, depending on application
  • Streams can provide captioning data and audio sync

Distributed content is sent over large areas, whether nationwide through Netflix or across a single site from cable channels, modulated over RF coax or carried over an IP network. Due to the larger scale of distribution, streams should be as small as possible and able to carry ADA captioning, and they don’t require instant switching for viewing. The standard format for this application is MPEG, usually MPEG-2 for off-air TV and in-house RF channels, and H.264 for commercial and consumer IP-based TV.

MPEG is designed for maximum compression – H.264 can easily compress a 3 Gbps (gigabits per second) 1080p video to a 15 Mbps stream, a 200:1 ratio, and consumer streams are compressed far more. The secret sauce is called the GOP, the Group of Pictures, in which only the first frame is an actual image. MPEG compresses the first frame into an image similar to a JPEG. The encoder then captures a few more frames, notes objects that move or change color, and saves just that data. This is called inter-frame encoding, as the encoder is continually cross-referencing frames. When a decoder changes streams, it has to capture the first several frames in the next GOP and rebuild the video back to its original content. This deconstruction/reconstruction process takes time, creating about a half- to one-second delay. That’s why you experience a pause when changing MPEG channels.
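For readers who want to check the 200:1 arithmetic, here is a quick sketch. The 60 frames per second and 24 bits per pixel are assumptions that happen to land near the 3 Gbps figure; other frame rates and sampling formats give different raw numbers.

```python
def raw_video_bps(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Uncompressed video bitrate in bits per second (active picture only)."""
    return width * height * fps * bits_per_pixel


# Assumptions: 1080p at 60 fps, 24 bits per pixel (8 bits each for three channels).
raw_bps = raw_video_bps(1920, 1080, 60, 24)
stream_bps = 15_000_000  # the 15 Mbps H.264 stream from the text
ratio = raw_bps / stream_bps
print(f"raw: {raw_bps / 1e9:.2f} Gbps, compression ratio: about {ratio:.0f}:1")
```

Under those assumptions the raw video works out to just under 3 Gbps, and the ratio to just under 200:1.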

Presentation IP Delivery

  • Displayed in a defined area
  • Endpoints described as transmitters and receivers 
  • Simpler to define and expand input/output configurations
  • Large streams require a dedicated IP network
  • Instant switching, about 2-4 frames
  • Streams composed of images and audio tracks, no captioning data

When a system is delivering a live presentation or event, the content on the video screens must be in sync with what’s on the stage or podium, and cameras need to be changed instantly. Visible latency is disconcerting to the presenter and audience. This is the traditional application for AV switching systems.

The new counterpart to AV switching is routing content over an IP network. Instead of a central switch with multi-port input and output cards, designers can define any number of individual transmitters and receivers, routed through a standard Ethernet switch. Video frames are converted to stream packets using Motion JPEG 2000, typically compressed to 3:1 to 20:1 ratios. As the stream consists of individual images, switching is fast, with little latency.
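Those light compression ratios are why presentation streams want their own network. The sketch below uses the article’s round 3 Gbps figure for uncompressed 1080p and a hypothetical 10 Gbps link, and it ignores packet overhead, so treat the stream counts as upper bounds rather than design numbers.

```python
SOURCE_BPS = 3_000_000_000  # the article's round figure for uncompressed 1080p
LINK_MBPS = 10_000          # hypothetical 10 Gbps Ethernet link (an assumption)


def compressed_mbps(source_bps: int, ratio: float) -> float:
    """Stream size in Mbps after compressing the source by ratio:1."""
    return source_bps / ratio / 1_000_000


def streams_per_link(link_mbps: float, stream_mbps: float) -> int:
    """How many whole streams fit on the link, ignoring packet overhead."""
    return int(link_mbps // stream_mbps)


# 3:1 to 20:1 is the Motion JPEG 2000 range; 200:1 is the H.264 example.
for ratio in (3, 10, 20, 200):
    mbps = compressed_mbps(SOURCE_BPS, ratio)
    count = streams_per_link(LINK_MBPS, mbps)
    print(f"{ratio:>3}:1 -> {mbps:6.0f} Mbps per stream, {count} streams per link")
```

At 3:1 a single stream needs a full gigabit, while the 200:1 H.264 stream from the distribution example needs 15 Mbps. That gap is the trade between instant switching and network load.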

However, latency is a bit looser than AV’s vertical-interval switching. The IP network gets the packets delivered, but there is no central clock to lock down timing, so there are network variables that could affect latency. As with AV switching, extensive scaling at the input and output points can affect latency as well. As there is no sync data as there is in MPEG, audio could be behind or ahead of the video, especially if the video is highly compressed and scaled and the audio is just passed through. I would imagine some IP systems offer settings to minimize audio timing issues.

What Is the Effect of 4K Video on Commercial IP Distribution Systems?

For commercial systems, 4K is more of a marketing pitch than a reality. Current room and screen designs can’t take advantage of 4K, and very little content will be 4K for some time. It just isn’t needed. 

MPEG distribution technology is typically limited to 1080p. However, H.264 does support 4K and “zero latency”. For the present, 4K codecs such as HEVC (H.265) and VP9/AV1 aren’t usable for commercial applications.