SVE (Streaming Video Engine) is the video processing pipeline that has been in production at Facebook for the past two years. This paper gives an overview of its design and rationale. And it certainly got me thinking: suppose I needed to build a video processing pipeline, what would I do? Using one of the major cloud platforms, simple video uploading, encoding, and transcoding is almost a tutorial-level use case. Here’s A Cloud Guru describing how they built a transcoding pipeline in AWS using Elastic Transcoder, S3, and Lambda, in less than a day. Or you could use the video encoding and transcoding services in Google Cloud Platform. Or Microsoft Azure Media Services. Which is a good reminder, when you look at what’s involved in building SVE, that “you are not Facebook.” Somewhere on the journey from simple and relatively small scale video upload and processing to Facebook scale (tens of millions of uploads/day, and presumably growing fast) it starts to make sense to roll your own. Where is the handover point? (And by the time you reach it, remember, the cloud platforms will be even more capable than they are today). Let’s take a look at the situation inside Facebook to get our bearings.


Video at Facebook

At Facebook, we envision a video-first world with video being used in many of our apps and services. On an average day, videos are viewed more than 8 billion times on Facebook. Each of these videos needs to be uploaded, processed, shared, and then downloaded.

There are more than 15 different applications at Facebook that integrate video, which collectively ingest tens of millions of uploads and generate billions of video processing tasks. You’ve probably heard of some of those apps ;), which include:

- Facebook video posts
- Messenger videos
- Instagram stories
- 360 videos (recorded on 360 cameras and processed to be consumed by VR headsets or 360 players)

Beyond encoding and transcoding, it turns out there’s a lot more processing going on with those videos:

Many apps and services include application-specific video operations, such as computer vision extraction and speech recognition.

In fact, the DAG for processing Facebook video uploads averages 153 video processing tasks per upload! Messenger videos, which are sent to a specific friend or group for almost real-time interaction, have a tighter processing pipeline averaging just over 18 tasks per upload. The Instagram pipeline has 22 tasks per upload, and 360 videos generate thousands of tasks. (These numbers include parallel processing of video segments).

Reading through the paper, it becomes clear that the one metric that matters at Facebook when it comes to video is time to share. How long does it take from when a person uploads (starts uploading) a video, to when it is available for sharing?

This leads to three requirements for the video processing pipeline:

- Low latency
- Flexibility to support a range of different applications (with custom processing pipelines)
- Handling system overload and faults

How Facebook used to process video

Prior to SVE, Facebook used to process video with ‘MES’, a Monolithic Encoding Script. As you can imagine, that was hard to maintain and monitor in the presence of multiple different and evolving application requirements for encoding and transcoding. But most importantly, it used a batch-oriented sequential processing pipeline. A file is uploaded and stored, this then triggers processing, and when processing is complete the results are written to storage again and become available for sharing. That doesn’t sound too different to what you might put together with S3, Lambda, and Elastic Transcoder.

The pre-sharing video pipeline comprises on-device recording, uploading, validation (is the file well formed, repairing it where possible), re-encoding in a variety of bitrates, and storing in a BLOB storage system.

The major issue with MES is that it has a poor time-to-share. Just uploading can take minutes (with 3-10MB videos, over 50% of uploads take more than 10 seconds, and with larger video sizes we are easily into the minutes and even tens of minutes). The encoding time for larger videos is also non-trivial (again, measured in minutes). Even the time required to durably store a video makes a meaningful contribution to the overall latency. Batch is the enemy of latency.

Introducing SVE

The central idea in SVE is to process video data in a streaming fashion (strictly, mini-batches), in parallel, as videos move through the pipeline.

SVE provides low latency by harnessing parallelism in three ways that MES did not. First, SVE overlaps the uploading and processing of videos. Second, SVE parallelizes the processing of videos by chunking videos into (essentially) smaller videos and processing each chunk separately in a large cluster of machines. Third, SVE parallelizes the storing of uploaded videos (with replication for fault tolerance) with processing them.

The net result is a 2.3x-9.3x reduction in time-to-share compared to MES. Knowing what we do about the impact of user-perceived performance on business metrics, my bet would be that this makes a very material difference to Facebook’s business.
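To get a feel for why overlapping the stages helps, here is a deliberately crude back-of-the-envelope model in Python. Everything in it is an assumption made for illustration: the function names, the per-chunk timings, and the simplification that there are always enough workers for a chunk to start processing the moment it arrives. It is not how the paper measures time-to-share, and the numbers it produces are not the paper's.

```python
def batch_time_to_share(n_chunks, upload_s, process_s, store_s):
    """MES-style batch model: upload everything, store it durably,
    then process the chunks one after another."""
    return n_chunks * upload_s + store_s + n_chunks * process_s


def streaming_time_to_share(n_chunks, upload_s, process_s, store_s):
    """SVE-style model: each chunk is processed on its own worker as soon
    as it arrives, and storing the original overlaps with processing."""
    upload_done = n_chunks * upload_s
    # Assumption: no chunk ever waits for a worker, so only the last chunk's
    # processing time remains after the upload completes.
    processing_done = upload_done + process_s
    # Assumption: durable storage completes store_s after the upload ends.
    storage_done = upload_done + store_s
    return max(processing_done, storage_done)


if __name__ == "__main__":
    # Illustrative timings only: 20 chunks, 1 s upload and 2 s processing
    # per chunk, 3 s to store the original durably.
    batch = batch_time_to_share(20, 1.0, 2.0, 3.0)
    streaming = streaming_time_to_share(20, 1.0, 2.0, 3.0)
    print(f"batch: {batch:.0f}s, streaming: {streaming:.0f}s")
```

In this toy model the upload itself becomes the floor on time-to-share once processing and storage are overlapped with it, which is the intuition behind the three kinds of parallelism listed above.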


SVE breaks videos into chunks called GOPs (groups of pictures). Where possible, this is done on the client device. Each GOP in a video is independently encoded, so that each can be decoded without referencing earlier GOPs. During playback, these segments can be played independently of each other.

The video chunks are then forwarded directly to a preprocessor (rather than to storage as in the old system). The preprocessor submits encoding jobs to a distributed worker farm via a scheduler, in parallel with writing the original video to storage. As part of submitting jobs to the scheduler, the preprocessor dynamically generates the DAG of processing tasks to be used for the video. Worker processes pull tasks from queues. There’s a simple priority system: each cluster of workers has a high-priority queue and a low-priority queue. When cluster utilisation is low they pull from both queues; under heavier load they pull only from the high-priority queue.

You can probably imagine what such a system looks like. The details are in the paper, but it follows pretty much what you would expect. So in the remaining space, I want to highlight a few areas I found interesting: the motivation for building a whole new data processing system rather than reusing an existing one; the DAG execution system; and the strategies used to deal with heavy load and failures.
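The two-queue pull policy is simple enough to sketch in a few lines of Python. This is only my reading of the description above, not SVE's code: the `WorkerCluster` class, the `low_priority_cutoff` threshold, and the choice to drain high-priority tasks first are all assumptions.

```python
from collections import deque


class WorkerCluster:
    """Hypothetical cluster with a high- and a low-priority task queue."""

    def __init__(self, capacity, low_priority_cutoff=0.8):
        self.capacity = capacity                        # number of worker slots
        self.low_priority_cutoff = low_priority_cutoff  # assumed threshold
        self.high = deque()
        self.low = deque()
        self.running = 0

    def submit(self, task, high_priority):
        (self.high if high_priority else self.low).append(task)

    def utilisation(self):
        return self.running / self.capacity

    def pull(self):
        """Return the next task for an idle worker, or None if nothing is eligible."""
        if self.high:
            task = self.high.popleft()
        elif self.low and self.utilisation() < self.low_priority_cutoff:
            # Low-priority work is only picked up while the cluster is lightly loaded.
            task = self.low.popleft()
        else:
            return None
        self.running += 1
        return task

    def finish(self):
        """Mark one running task as done, freeing a worker slot."""
        self.running -= 1


if __name__ == "__main__":
    cluster = WorkerCluster(capacity=2, low_priority_cutoff=0.5)
    cluster.submit("generate_thumbnail", high_priority=False)
    cluster.submit("encode_segment", high_priority=True)
    print(cluster.pull())  # high-priority task goes first
    print(cluster.pull())  # cluster now at the cutoff, low-priority task is held back
    cluster.finish()
    print(cluster.pull())  # load has dropped, low-priority task is released
```

The nice property of this kind of policy is that shedding load needs no coordination: under pressure the low-priority queue simply stops draining.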

Why not just use X?

Before designing SVE we examined existing parallel processing frameworks including batch processing systems and stream processing systems. Batch processing systems like MapReduce, Dryad, and Spark all assume the data to be processed already exists and is accessible… Streaming processing systems like Storm, Spark Streaming, and StreamScope overlap uploading and processing, but are designed for processing continuous queries instead of discrete events.

The former don’t optimise for time-to-sharing, and the latter don’t support the overload control policies and custom per-task DAG model that Facebook wanted: “we found that almost none of our design choices, e.g., per-task priority scheduling, a dynamically created DAG per video, were provided by existing systems.”

A video processing programming model based on DAGs

As the paper hints at, video processing at Facebook includes re-encoding, but is definitely not limited to that. There may be video analysis and classification, speech-to-text translation, and all sorts of other processing steps involved.

Our primary goal for the abstraction that SVE presents is to make it as simple as possible to add video processing that harnesses parallelism. In addition, we want an abstraction that enables experimentation with new processing, allows programmers to provide hints to improve performance and reliability, and that makes fine-grained monitoring automatic. The stream-of-tracks abstraction achieves all of these goals.

The stream-of-tracks abstraction provides two dimensions of granularity: tracks within a video (e.g., the video track and the audio track), and GOP-based segments within a video. Some tasks can be specified to operate on just a single track (e.g., speech-to-text), or a single segment. Others may operate on the full video (e.g., computer vision based video classification). Programmers write tasks that execute sequentially over their inputs, and connect them into a DAG.

Here’s a simplified view of the Facebook app video processing DAG:


The initial video is split into video, audio, and metadata tracks. The video and audio tracks are then copied n times, once for each encoding bitrate, and the encoding tasks operate in parallel over these. The output segments across tracks are then joined for storage.

Pseudo-code for generating the DAG looks like this:


Dynamic generation of the DAG enables us to tailor a DAG to each video and provides a flexible way to tune performance and roll out new features. The DAG is tailored to each video based on specific video characteristics forwarded from the client or probed by the preprocessor. For instance, the DAG for a video uploaded at a low bitrate would not include tasks for re-encoding the video at a higher bitrate.

Azure Media Services has a “premium encoding” option that also gives you the ability to define your own encoding workflows. AWS Elastic Transcoder supports ‘transcoding pipelines.’ GCP gets a fail here because I couldn’t easily find out from the public documentation what capabilities they have here: everything seems to be hidden behind a ‘Contact Sales’ link. (Maybe the information is available of course, it could just be a search failure on my part).
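To make the per-video tailoring concrete, here is a rough Python sketch of the split, encode-per-bitrate, and join shape described above. It is not the paper's pseudo-code and not SVE's API: the `Task` class, the task names, and the `CANDIDATE_BITRATES_KBPS` ladder are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A node in the processing DAG; deps are the tasks it waits on."""
    name: str
    deps: list = field(default_factory=list)


# Hypothetical encoding ladder, in kbit/s.
CANDIDATE_BITRATES_KBPS = [500, 1500, 4000]


def build_dag(source_bitrate_kbps, bitrates=CANDIDATE_BITRATES_KBPS):
    """Build a DAG for one upload: split into tracks, encode the video and
    audio tracks once per target bitrate, then join everything for storage."""
    split = Task("split_tracks")
    metadata = Task("extract_metadata", deps=[split])
    encodes = []
    for kbps in bitrates:
        if kbps > source_bitrate_kbps:
            continue  # never re-encode above the bitrate of the upload
        for track in ("video", "audio"):
            encodes.append(Task(f"encode_{track}_{kbps}k", deps=[split]))
    join = Task("join_and_store", deps=encodes + [metadata])
    return [split] + encodes + [metadata, join]


if __name__ == "__main__":
    # A low-bitrate upload gets a smaller DAG than a high-bitrate one.
    for task in build_dag(source_bitrate_kbps=1000):
        print(task.name, "<-", [dep.name for dep in task.deps])
```

Because the DAG is just data built by a function, rolling out a new processing step for a subset of videos is an extra branch in that function rather than a change to the execution engine.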

Fault tolerance and overload control

There’s quite a bit of interesting material here, and I’m almost out of space, so for more details please see sections 6 and 7 in the full paper. I will just highlight here the retry policy on failure. A failed task will be retried up to 2 times locally on the same worker, and then rescheduled on another worker up to 6 more times, with the same local retries on each. That makes up to (1 + 2) × (1 + 6) = 21 execution attempts before finally giving up.

We have found that such a large number of retries does increase end-to-end reliability. Examining all video-processing tasks from a recent 1-day period shows that the success rate excluding non-recoverable exceptions on the first worker increases from 99.788% to 99.795% after 2 retries; and on different workers increases to 99.901% after 1 retry and 99.995% eventually after 6 retries.

Logs capture that re-execution was necessary, and can be mined to find tasks with non-negligible retry rates due to non-deterministic bugs.

Overload, by the way, comes from three sources: organic (e.g., social events such as the ice-bucket challenge); load-testing by Kraken; and bug-induced overload.

The last word:

SVE is a parallel processing framework that specializes data ingestion, parallel processing, the programming interface, fault tolerance, and overload control for videos at massive scale.
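The retry policy amounts to two nested loops. Here is a minimal Python sketch, assuming a task is a callable that returns True on success; the function name and constants are mine, and the sketch does not model the non-recoverable exceptions that the paper's figures exclude.

```python
LOCAL_RETRIES = 2   # extra attempts on the same worker
WORKER_RETRIES = 6  # extra workers tried after the first one


def run_with_retries(task, workers):
    """Run task(worker) until it succeeds or the retry budget is exhausted.

    Returns (succeeded, attempts). With a task that always fails and at
    least 7 workers, that is (1 + 2) * (1 + 6) = 21 attempts.
    """
    attempts = 0
    for worker in workers[: WORKER_RETRIES + 1]:
        for _ in range(LOCAL_RETRIES + 1):
            attempts += 1
            if task(worker):
                return True, attempts
    return False, attempts


if __name__ == "__main__":
    workers = [f"worker-{i}" for i in range(7)]
    print(run_with_retries(lambda worker: False, workers))
```

Returning the attempt count alongside the result mirrors the point about logs: knowing that a task needed re-execution is what lets you find the flaky ones later.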


If you envision a video-first world, and content sharing at scale is what you do, then all this specialism really can be worth it.