Part 3 - Buffering a video cluster by cluster

Introduction

Previously we explained how to build a basic Media Source Extension HTML5 player which downloads a complete WebM video file and places it into the Media Source buffer. To create an adaptive streaming player we need to be able to select specific parts of a video to download and place into the buffer.

Extracting the WebM cluster data

The first thing we need to do is extract the cluster metadata from the clustered WebM rendition. This data is stored in the file in the form of EBML, a binary extension of XML. While it would be possible to write an EBML parser in JavaScript, we prefer to do it server-side at the point of generating a rendition and store the result as JSON.

As in Part 2 we're going to use acolwell's Media Source Extension Tools, this time the mse_json_manifest tool, i.e.:

./mse_json_manifest <input-file.webm> > <output-file.json>

Note: If you'd prefer to use Python instead of Go there's a good alternative outlined on the IONCANNON blog, however the format of the resultant JSON will be slightly different.

The generated JSON manifest should be of the form:

```json
{
  "type": "video/webm; codecs=\"vp8, vorbis\"",
  "duration": 131724.000000,
  "init": { "offset": 0, "size": 4651 },
  "media": [
    { "offset": 4651, "size": 228889, "timecode": 0.000000 },
    { "offset": 233540, "size": 1426141, "timecode": 2.228000 },
    ...
  ]
}
```

An example WebM file with associated JSON manifest can be found here:

http://edge-assets.wirewax.com/blog/vidData/example.webm

http://edge-assets.wirewax.com/blog/vidData/example.json

Understanding the JSON

The WebM format begins with an initialization cluster. This cluster is analogous to a header; it includes metadata about the rest of the file, telling the video decoder where in the file to look for different parts of the video. It is from this cluster that we have extracted our JSON metadata. The HTML5 decoder needs to be given this data to successfully play the video.

The init field in the output JSON gives us the byte-range of the initialization cluster, so in the example above we would look in the byte range 0-4,650 (the first 4,651 bytes of the file).

The duration field is the duration of the video in milliseconds. We'll want to divide this by 1,000 to make it consistent with the timecode field, which is in seconds.

The media list is a list of the non-init clusters in the video, each of which should begin with an Intra-frame so that each cluster can be played independently of previous clusters.

The offset refers to the byte-offset of that cluster, so our first cluster starts at byte 4,651. Note that this is immediately after the end of the initialization cluster, as we'd expect.

The size refers to the length of this cluster, in bytes. This means our first cluster occupies the byte-range 4,651-233,539.

The timecode refers to the start point of the cluster in the video timeline, in seconds. Note that our first cluster begins at 0 seconds as we'd expect.
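To make these relationships concrete, here is a small sketch that derives the duration in seconds and the HTTP byte ranges from the example manifest values quoted above (the variable names are illustrative, not part of the player code):

```javascript
// Sketch using the example manifest values quoted above.
var manifest = {
    duration: 131724.0, // milliseconds
    init: { offset: 0, size: 4651 },
    media: [
        { offset: 4651, size: 228889, timecode: 0.0 },
        { offset: 233540, size: 1426141, timecode: 2.228 }
    ]
};

// Duration in seconds, consistent with the timecode field.
var durationSeconds = manifest.duration / 1000; // 131.724

// HTTP Range values are inclusive at both ends.
var initRange = 'bytes=' + manifest.init.offset + '-' + (manifest.init.size - 1);
// initRange === "bytes=0-4650"

var first = manifest.media[0];
var firstClusterRange = 'bytes=' + first.offset + '-' + (first.offset + first.size - 1);
// firstClusterRange === "bytes=4651-233539"
```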

Preparing our cluster data

We're going to be using the cluster data a lot in the upcoming code so it's important we store it in a logical manner. A cluster will initially be defined by the data we receive from the cluster JSON. This will yield the fields:

  • Starting byte (inclusive)
  • End byte (inclusive, matching the inclusive end of an HTTP Range request)
  • Time-range start (inclusive)
  • Time-range end (exclusive)
  • Is it an initialization cluster?

We will also want to store the state of the cluster as it moves through our data pipeline. This should be straightforward:

  • Download requested
  • Downloaded, queued to be added to the SourceBuffer
  • Buffered in the SourceBuffer

Remember the SourceBuffer can only be appended to when it's not in the updating state, so cluster data will need to be queued up waiting for an opportunity to be added.
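The queueing constraint can be sketched in isolation (makeAppender and enqueue here are hypothetical names for illustration, not part of the player code that follows): data is appended only when the SourceBuffer is idle, and the queue is drained again on its updateend event.

```javascript
// Illustrative sketch of the queueing constraint. appendBuffer() throws
// if called while `updating` is true, so we only append when idle and
// drain the queue again on each 'updateend' event.
function makeAppender(sourceBuffer) {
    var queue = [];
    function flush() {
        if (!sourceBuffer.updating && queue.length) {
            sourceBuffer.appendBuffer(queue.shift());
        }
    }
    // Each completed append fires 'updateend'; use it to drain the queue.
    sourceBuffer.addEventListener('updateend', flush);
    return function enqueue(data) {
        queue.push(data);
        flush();
    };
}
```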

This means we'll need fields flagging:

  • requested - it's been requested and we're waiting for it to be downloaded.
  • queued - it's been downloaded and is now queued up waiting to be added to the buffer.
  • buffered - its associated data has been added to the source buffer.

Finally we'll also need a field in which to store the video data itself, once it's been downloaded. So we're going to define an object with these properties:

```javascript
function Cluster(byteStart, byteEnd, isInitCluster, timeStart, timeEnd) {
    this.byteStart = byteStart;                  // byte range start (inclusive)
    this.byteEnd = byteEnd;                      // byte range end (inclusive)
    this.timeStart = timeStart ? timeStart : -1; // timecode start (inclusive)
    this.timeEnd = timeEnd ? timeEnd : -1;       // timecode end (exclusive)
    this.requested = false;                      // cluster download has started
    this.isInitCluster = isInitCluster;          // is an init cluster
    this.queued = false;                         // downloaded and queued to be appended to the SourceBuffer
    this.buffered = false;                       // added to the SourceBuffer
    this.data = null;                            // cluster data from the video file
}
```

This object now stores all of the data we need to know about a specific cluster, as well as its state in our player pipeline.

Using HTTP Range to download a specific cluster

We now need to be able to download a specific cluster, and take advantage of some of the possibilities that downloading video parts manually allows. Here we define a download method using an HTTP Range request, which includes a timeout and a retry with a cache buster to avoid situations where the browser cache misbehaves.

```javascript
Cluster.prototype.download = function (callback) {
    this.requested = true;
    this._getClusterData(function () {
        self.flushBufferQueue();
        if (callback) {
            callback();
        }
    });
};

Cluster.prototype._makeCacheBuster = function () {
    var text = "";
    var possible = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    for (var i = 0; i < 10; i++) {
        text += possible.charAt(Math.floor(Math.random() * possible.length));
    }
    return text;
};

Cluster.prototype._getClusterData = function (callback, retryCount) {
    var xhr = new XMLHttpRequest();
    var vidUrl = self.sourceFile;
    if (retryCount) {
        vidUrl += '?cacheBuster=' + this._makeCacheBuster();
    }
    xhr.open('GET', vidUrl, true);
    xhr.responseType = 'arraybuffer';
    xhr.timeout = 6000;
    xhr.setRequestHeader('Range', 'bytes=' + this.byteStart + '-' + this.byteEnd);
    xhr.send();
    var cluster = this;
    xhr.onload = function (e) {
        if (xhr.status != 206) {
            console.error("media: Unexpected status code " + xhr.status);
            return false;
        }
        cluster.data = new Uint8Array(xhr.response);
        cluster.queued = true;
        callback();
    };
    xhr.ontimeout = function () {
        var retryAmount = !retryCount ? 0 : retryCount;
        if (retryAmount >= 2) {
            self._failed();
        } else {
            cluster._getClusterData(callback, retryAmount + 1);
        }
    };
};
```

Plugging our clusters into a basic player

The next step is to plug this cluster data into our basic player, to make a slightly-more-advanced player. We're going to download our cluster JSON file, then create a cluster object for each item in the media list plus one for the initialization cluster. We'll simply download all of the clusters in sequence, one by one, and append them to the buffer using a queue system.

First, we'll download the cluster data and wait for it to be completed before we create our MediaSource. To download all the clusters and place them in a list:

```javascript
this.downloadClusterData = function (callback) {
    var xhr = new XMLHttpRequest();
    var url = self.clusterFile;
    xhr.open('GET', url, true);
    xhr.responseType = 'json';
    xhr.send();
    xhr.onload = function (e) {
        self.createClusters(xhr.response);
        callback();
    };
};

this.createClusters = function (rslt) {
    self.clusters.push(new Cluster(
        rslt.init.offset,
        rslt.init.size - 1,
        true
    ));
    for (var i = 0; i < rslt.media.length; i++) {
        self.clusters.push(new Cluster(
            rslt.media[i].offset,
            rslt.media[i].offset + rslt.media[i].size - 1,
            false,
            rslt.media[i].timecode,
            (i === rslt.media.length - 1) ? parseFloat(rslt.duration / 1000) : rslt.media[i + 1].timecode
        ));
    }
};
```

Now we need to make some methods to fulfil our basic requirements:

  • Download the initialization cluster
  • Download all the non-initialization clusters
  • Concatenate the Uint8Array data from the list of queued clusters into a single Uint8Array to append to the SourceBuffer
  • Flush our queue of queued clusters such that the initialization cluster is always added first, when the SourceBuffer is not already updating

To keep our code clean and simple we're going to use a functional toolkit library. You can use either UnderscoreJS or Lodash depending on how hipster you're feeling today ;)

```javascript
this.createSourceBuffer = function () {
    self.sourceBuffer = self.mediaSource.addSourceBuffer('video/webm; codecs="vp8,vorbis"');
    self.sourceBuffer.addEventListener('updateend', function () {
        self.flushBufferQueue();
    }, false);
    self.setState("Downloading clusters");
    self.downloadInitCluster();
};

this.flushBufferQueue = function () {
    if (!self.sourceBuffer.updating) {
        var initCluster = _.findWhere(self.clusters, { isInitCluster: true });
        if (initCluster.queued || initCluster.buffered) {
            var bufferQueue = _.filter(self.clusters, function (cluster) {
                return (cluster.queued === true && cluster.isInitCluster === false);
            });
            if (!initCluster.buffered) {
                bufferQueue.unshift(initCluster);
            }
            if (bufferQueue.length) {
                var concatData = self.concatClusterData(bufferQueue);
                _.each(bufferQueue, function (bufferedCluster) {
                    bufferedCluster.queued = false;
                    bufferedCluster.buffered = true;
                });
                self.sourceBuffer.appendBuffer(concatData);
            }
        }
    }
};

this.downloadInitCluster = function () {
    _.findWhere(self.clusters, { isInitCluster: true }).download(self.downloadNextUnRequestedCluster);
};

this.concatClusterData = function (clusterList) {
    var bufferArrayList = [];
    _.each(clusterList, function (cluster) {
        bufferArrayList.push(cluster.data);
    });
    var arrLength = 0;
    _.each(bufferArrayList, function (bufferArray) {
        arrLength += bufferArray.length;
    });
    var returnArray = new Uint8Array(arrLength);
    var lengthSoFar = 0;
    _.each(bufferArrayList, function (bufferArray, idx) {
        returnArray.set(bufferArray, lengthSoFar);
        lengthSoFar += bufferArray.length;
    });
    return returnArray;
};

this.downloadNextUnRequestedCluster = function () {
    var nextCluster = _.chain(self.clusters)
        .filter(function (cluster) {
            return (cluster.requested === false && cluster.isInitCluster === false);
        })
        .first()
        .value();
    if (nextCluster) {
        nextCluster.download(self.downloadNextUnRequestedCluster);
    } else {
        self.setState("all clusters downloaded");
    }
};
```

We've also modified our createSourceBuffer method from the basic player so that it kicks the whole process off. We call the flush buffer queue method whenever a cluster is added to the queue and whenever the SourceBuffer finishes updating, so queued data is always added to the buffer at the earliest opportunity.

Building a simple buffering service

Now we're going to build a simple service, powered by the timeupdate event from the video element. We're going to check for unrequested clusters which are less than five seconds ahead of the current timestamp and then set them downloading.

```javascript
this.createSourceBuffer = function () {
    self.sourceBuffer = self.mediaSource.addSourceBuffer('video/webm; codecs="vp8,vorbis"');
    self.sourceBuffer.addEventListener('updateend', function () {
        self.flushBufferQueue();
    }, false);
    self.setState("Downloading clusters");
    self.downloadInitCluster();
    self.videoElement.addEventListener('timeupdate', function () {
        self.downloadUpcomingClusters();
    }, false);
};

this.downloadInitCluster = function () {
    _.findWhere(self.clusters, { isInitCluster: true }).download(self.downloadUpcomingClusters);
};

this.downloadUpcomingClusters = function () {
    var nextClusters = _.filter(self.clusters, function (cluster) {
        return (cluster.requested === false && cluster.timeStart <= self.videoElement.currentTime + 5);
    });
    if (nextClusters.length) {
        _.each(nextClusters, function (nextCluster) {
            nextCluster.download();
        });
    } else {
        if (_.filter(self.clusters, function (cluster) {
            return (cluster.requested === false);
        }).length === 0) {
            self.setState("finished buffering whole video");
        } else {
            self.finished = true;
            self.setState("finished buffering ahead");
        }
    }
};
```

Here's one we made earlier

Above, you can see the buffering amount starts a few seconds above 0 at the start and moves forward as you watch the video.

All of the code for this example and the upcoming adaptive streaming player can be found in our Git repository.

What's coming up?

In the next article we'll implement adaptive streaming, allowing us to switch between different renditions of the video on the fly.

Part 4 - Adaptive Streaming