Part 1 - Vanilla HTML5 Video Sucks. Media Source Extensions are the Answer

Introduction

At WIREWAX, providing a reliable, robust and smooth video platform is imperative to creating an immersive interactive experience. In this 4-part series we take a look under the hood of our interactive player and peek at some of the technologies we've been experimenting with to deliver the next evolution of video. The best bit? We're going to show you how to do it yourself.

If you want to skip straight to the how, see Part 2.

Before HTML5 the only way to watch video in the browser was with Flash (you could argue the QuickTime browser plugin got there first, but it was nothing compared to the prevalence of Flash). Apple were undoubtedly responsible for the surge in HTML5 adoption after the launch of the iPhone in 2007 and Steve Jobs' rather brutal punch in the balls for the proprietary and ageing VM plugin. Of course, native video support has enormous advantages but was it ready for such a wide and speedy adoption? And could it deliver the same abilities as Flash?

In short, no. Missing were: dynamic quality switching, stalling detection, buffering control and (for better or worse) DRM. Since 2002, Adobe had been perfecting secure video delivery to practically any browser, with the ability to serve bitpart files and different sources depending on the end-user's connection.

HTML5 video was quickly heralded as an alternative to Flash but it was very far from matching - let alone beating - Flash for video. Only in 2015 has YouTube, the biggest video platform ever, finally got HTML5 video into a state where it is comfortable that it matches Flash for stability, performance and monitoring.

After initially using Flash for the desktop experience and HTML5 for mobile we switched to serving our content exclusively through HTML5 at the start of 2014.

Why did we abandon Flash months ago? Primarily we wanted an identical experience between mobile and desktop. Unfortunately vanilla HTML5 video can be fiendishly unstable and unpredictable. Notably it has little or no error reporting, rubbish stalling detection and no support for dynamic quality switching. We've been forced to cobble together patches and wrappers with our own proprietary monitoring solutions and rendition switching before we could allow HTML5 video to become the norm for millions of WIREWAX users. Continuously monitoring playback and regularly tweaking code to support the hundreds of frustrating idiosyncrasies in how a wide range of browsers handle HTML5 video feels like a constant battle.

We've only become truly happy using HTML5 video since the advent of Media Source Extensions (MSE). This extension of the video element allows JavaScript to override the browser's handling of the video stream and do a whole range of powerful things like controlling loading and delivery of media, client-side. No special media server license required. Your JavaScript can manage multiple video file renditions to cater for end-user bandwidth capabilities. It can even switch playback mid-stream with bitpart file chunking. It's a significant leap forward for media handling in the browser, far superior to the standard HTML5 video tag and, in many ways, an improvement on anything Flash could handle.

Note: The MSE spec is still in draft form. It's currently supported by default in the latest Chrome, and can be enabled in Firefox settings. Among Internet Explorer versions, only IE11 on Windows 8.1 supports MSE, and it is restricted to MP4 formats. In this series we're sticking to WebM support only.
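
Given how patchy support is, it's worth checking for MSE before doing anything else. Here is a minimal sketch of how that check might look. The helper name canPlayWithMse and the VP8/Vorbis codec string are our assumptions for illustration; swap in whatever codecs your WebM files actually use.

```typescript
// Minimal shape of the MediaSource constructor that this sketch relies on.
// The real browser global exposes a static isTypeSupported(type) method.
interface MediaSourceStatic {
  isTypeSupported(mimeType: string): boolean;
}

// Assumed codec string for a VP8/Vorbis WebM file; adjust to match your encode.
const WEBM_TYPE = 'video/webm; codecs="vp8, vorbis"';

// Hypothetical helper: returns true only when MSE exists and reports support
// for the given type. Pass undefined to model a browser without MSE.
function canPlayWithMse(
  mediaSource: MediaSourceStatic | undefined,
  mimeType: string = WEBM_TYPE
): boolean {
  return mediaSource !== undefined && mediaSource.isTypeSupported(mimeType);
}
```

In a browser you might call it as `canPlayWithMse(typeof MediaSource !== "undefined" ? MediaSource : undefined)` and fall back to a plain video source when it returns false.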

What's wrong with vanilla HTML5 Video?

No control over how your video files download

With classic HTML5 Video you simply specify a video source file in the src attribute. Your browser then opens a pipe to the video file and stuffs it into your video player however it likes, when it likes. If it goes wrong you have no visibility and no way of fixing it.

Stalling detection about as reliable as Vladimir Putin at a peace summit in Minsk

About the worst thing that can happen for our viewers is when the video freezes. This can happen if the video buffer screws up, the browser stops downloading the video file (see above) or the underlying browser is just being cranky. HTML5 video is supposed to alert you when this happens using a stalled event, but often this arrives late or not at all. For WIREWAX this means our tags will still happily be flying across the screen assuming the video is playing.
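
One common workaround is to stop trusting the event and poll the playhead instead. The sketch below shows the general idea rather than our production monitoring code: the class name StallDetector and the 500 ms threshold are hypothetical, and it only reads state that a video element already exposes (currentTime, paused, ended).

```typescript
// The subset of video element state the detector reads on each tick.
interface PlaybackSnapshot {
  currentTime: number; // media time in seconds
  paused: boolean;
  ended: boolean;
}

class StallDetector {
  private lastMediaTime = -1;
  private lastProgressAtMs = 0;
  private thresholdMs: number;

  // thresholdMs is a hypothetical tuning value, not a figure from this article.
  constructor(thresholdMs: number = 500) {
    this.thresholdMs = thresholdMs;
  }

  // Call on a timer. Returns true when playback should be advancing but the
  // playhead has not moved for at least thresholdMs of wall-clock time.
  check(snapshot: PlaybackSnapshot, nowMs: number): boolean {
    const idle = snapshot.paused || snapshot.ended;
    if (idle || snapshot.currentTime !== this.lastMediaTime) {
      // Either nothing is expected to move, or the playhead moved: reset.
      this.lastMediaTime = snapshot.currentTime;
      this.lastProgressAtMs = nowMs;
      return false;
    }
    return nowMs - this.lastProgressAtMs >= this.thresholdMs;
  }
}
```

In a page you could pass the video element itself as the snapshot, along with `performance.now()`, from a `setInterval` callback, and freeze the tags whenever `check` returns true.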

There's no point perfectly motion-tracking an object moving across a screen or accurately following someone's face moving through a crowd if the interactive tags aren't where they're supposed to be simply because the video has stalled without an event. Here's what it can look like:

No adaptive streaming. Slow network? Too bad.

With HTML5 video there's no way to switch the video rendition on the fly without interrupting the experience. If your network starts running slow you either have to replace the video with a new one or simply wait for it to buffer. With Flash it's been possible to use adaptive streaming to change the video quality on the fly for years.

With MSE we can switch the video source between multiple renditions stored on a Content Delivery Network (CDN). We don't even need to use a streaming server.
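
The decision of which rendition to fetch next can be plain client-side logic. As a rough sketch, assuming we already have a bandwidth estimate: the Rendition shape, the chooseRendition name and the 0.8 safety factor are all hypothetical, not part of MSE.

```typescript
interface Rendition {
  height: number;      // e.g. 1080 or 180
  bitrateKbps: number; // average encoded bitrate of the file
  url: string;         // location of the file on the CDN
}

// Picks the highest-bitrate rendition that fits inside the measured bandwidth,
// leaving some headroom. Falls back to the lowest rendition if none fits.
function chooseRendition(
  renditions: Rendition[],
  measuredKbps: number,
  safetyFactor: number = 0.8 // assumed headroom, tune to taste
): Rendition {
  const sorted = [...renditions].sort((a, b) => a.bitrateKbps - b.bitrateKbps);
  const budget = measuredKbps * safetyFactor;
  let choice: Rendition | undefined;
  for (const r of sorted) {
    // The first (lowest) rendition is always accepted as the fallback.
    if (choice === undefined || r.bitrateKbps <= budget) {
      choice = r;
    }
  }
  if (choice === undefined) {
    throw new Error("no renditions supplied");
  }
  return choice;
}
```

Because the renditions are just files on a CDN, switching amounts to fetching the next chunk from a different URL.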

Below, a network slowdown is simulated and at 00:07 the video stream smoothly switches to a lower bitrate rendition with no jarring pauses and no streaming server. This would be impossible with classic HTML5 video. In a shameless plug you can also see how cool our face recognition technology is ;D

Note: This is a video of a working example, for those without an MSE-capable browser. Scroll to the bottom for a working example.

No control over how the video buffers

With classic HTML5 video it's completely up to the browser how it buffers. Even if you know you need certain chunks for a reliable experience, there's no way to tell the video to buffer those parts but not others. You can end up unnecessarily wasting your viewer's bandwidth preloading the entire video. With MSE we can customise exactly how each video buffers to cater to specific requirements.

  • Creating a Choose Your Own Adventure style video (where the viewer can jump to another video segment by making choices in-frame)? Ensure a seamless experience by buffering all of the entry points to each segment prior to starting playback.
  • Is an engaged viewer approaching an interactive tag which may open a video-in-video overlay? Start buffering the nested video now so the viewer doesn't have to wait for content to download when the interactive tag is clicked.
  • Want to save data transit costs? Be conservative with how much unwatched video you download.
  • Be more sympathetic to your audience and only serve video chunks they are more than capable of receiving without annoying buffer time.
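
To make the first and third bullets concrete, here is a small sketch of a buffering plan: a short lead-in at every entry point plus a capped lookahead from the playhead. The function names and parameters (planBuffer, entryLeadSeconds, lookaheadSeconds) are hypothetical, and turning the resulting time ranges into actual chunk requests is left out.

```typescript
interface TimeRange {
  start: number; // seconds
  end: number;   // seconds
}

// Merge overlapping or touching ranges so no second of video is requested twice.
function mergeRanges(ranges: TimeRange[]): TimeRange[] {
  const sorted = [...ranges].sort((a, b) => a.start - b.start);
  const merged: TimeRange[] = [];
  for (const r of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && r.start <= last.end) {
      last.end = Math.max(last.end, r.end);
    } else {
      merged.push({ start: r.start, end: r.end });
    }
  }
  return merged;
}

// Returns the time ranges we would like buffered: a conservative window ahead
// of the playhead, plus the opening seconds of every segment entry point.
function planBuffer(
  entryPoints: number[],
  playhead: number,
  entryLeadSeconds: number,
  lookaheadSeconds: number,
  duration: number
): TimeRange[] {
  const wanted: TimeRange[] = [
    { start: playhead, end: Math.min(playhead + lookaheadSeconds, duration) },
  ];
  for (const entry of entryPoints) {
    wanted.push({ start: entry, end: Math.min(entry + entryLeadSeconds, duration) });
  }
  return mergeRanges(wanted);
}
```

Keeping lookaheadSeconds small is what saves on data transit, while the entry-point ranges are what make a jump feel instant.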

What does it look like in practice?

At the end of this tutorial you should be able to create something looking like this. But hopefully prettier...

Note: this demo will only work in modern Chrome, or in Firefox with MSE enabled in its config. Internet Explorer is not supported.

This video begins using a 1080 rendition. When you click the Simulate Network Slowdown button it will switch to a lower 180 rendition. The change becomes visible at the start of the next cluster, which allows the current cluster to continue playing at the same resolution so the video playback isn't interrupted.
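
Finding where the switch takes effect is a simple lookup, assuming the player keeps a sorted list of cluster start times. The function name and the example times below are hypothetical.

```typescript
// Given cluster start times in seconds, sorted ascending, returns the start of
// the first cluster that begins strictly after the playhead, or null when the
// playhead is already inside the final cluster.
function nextClusterStart(clusterStarts: number[], playhead: number): number | null {
  for (const start of clusterStarts) {
    if (start > playhead) {
      return start;
    }
  }
  return null;
}
```

With clusters starting at 0, 5, 10 and 15 seconds and the playhead at 7.2 seconds, the new rendition would first appear at 10 seconds, and everything before that keeps playing from the cluster already in the buffer.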

The current download rate is also displayed, expressed as the average time spent downloading each second of video relative to one second of playback.
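
One way to read that figure is as total download time divided by total seconds of video downloaded, so anything below 1 means chunks arrive faster than they play. The sketch below follows that reading; it is our interpretation for illustration, and the ChunkDownload shape and downloadRatio name are hypothetical.

```typescript
interface ChunkDownload {
  downloadSeconds: number; // wall-clock time spent fetching the chunk
  mediaSeconds: number;    // duration of video contained in the chunk
}

// Seconds of downloading needed per second of video, averaged over all chunks.
// Below 1: the network is keeping up. Above 1: playback will eventually stall.
// Returns null when no video has been downloaded yet.
function downloadRatio(chunks: ChunkDownload[]): number | null {
  let download = 0;
  let media = 0;
  for (const c of chunks) {
    download += c.downloadSeconds;
    media += c.mediaSeconds;
  }
  return media > 0 ? download / media : null;
}
```

A ratio creeping towards 1 is a sensible trigger for dropping to a lower rendition before the viewer ever sees a spinner.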

All of the code for this example and the upcoming examples is available in our Git repository.

In the next article we explain how to prepare a clustered WebM file and how to build a basic Media Source Extensions player.

Part 2 - Sounds great! How the frick do I do it?