This blog post is also available on Medium

Recently, HTMLCanvasElement.captureStream() was implemented in browsers. This allows you to expose the contents of an HTML5 canvas as a MediaStream to be consumed by applications. This is the same base MediaStream type that getUserMedia returns, which is what websites use to get access to your webcam.
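For illustration, here is the API in its simplest form (the frame rate argument is optional, and 30 is just an example value):

```js
// Expose the contents of a canvas as a MediaStream.
const canvas = document.createElement('canvas');
const canvasStream = canvas.captureStream(30);
// canvasStream is a plain MediaStream, just like the one getUserMedia provides.
```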

The first question that comes to mind is, of course: “Is it possible to intercept calls to getUserMedia, get a hold of the webcam MediaStream, enhance it by rendering it into a canvas and doing some post-processing, then transparently returning the canvas’ MediaStream?”

As it turns out, the answer is yes.

We built a cross-platform WebExtension called Zombocam that does exactly this. Zombocam injects itself on every webpage and monkey-patches getUserMedia. If a webpage then calls getUserMedia, we transparently enhance the camera and spawn a floating UI in the DOM that lets you control your different filters and settings. This means that any website that uses your webcam will now get your enhanced webcam instead!

This blog post is a technical walk-through of the different challenges we ran into while developing Zombocam.

Monkey-patching 101

Monkey-patching getUserMedia essentially means replacing the browser’s implementation with our own. We supply our own getUserMedia function that wraps the browser’s implementation and adds an intermediary canvas processing step (and fires up a UI). Of course, since getUserMedia is a web JS API, there are one million different versions that need to be supported. There’s Navigator.getUserMedia and MediaDevices.getUserMedia, and then vendor prefixes on top of that (e.g. Navigator.webkitGetUserMedia and Navigator.mozGetUserMedia), and then there are different signatures (e.g. callbacks vs promises), and then on top of that again they historically support different syntaxes for specifying constraints. Oh, and they have different errors too. To be fair, MediaDevices.getUserMedia, the one true getUserMedia, solves all of these problems, but the web needs to wait for everyone to stop using the old versions first.


All of this boils down to having to write a lot of code to iron out the inconsistencies between the different implementations, but in the happy case we end up with something like this:
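(What follows is a simplified sketch rather than the exact Zombocam code: it only handles the modern promise-based API, and enhanceStream is a placeholder name for the canvas processing step described in the next section.)

```js
// Keep a reference to the browser's implementation...
const originalGetUserMedia =
  navigator.mediaDevices.getUserMedia.bind(navigator.mediaDevices);

// ...and replace it with our own wrapper.
navigator.mediaDevices.getUserMedia = async function (constraints) {
  const cameraStream = await originalGetUserMedia(constraints);
  if (!constraints || !constraints.video) {
    return cameraStream; // audio-only requests pass straight through
  }
  // enhanceStream() renders the camera stream into a canvas, applies the
  // active effects, and returns the canvas' captureStream() together with
  // the original audio tracks.
  return enhanceStream(cameraStream);
};
```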

The rendering pipeline

Most of the effects and filters in Zombocam are implemented as WebGL fullscreen quad shader passes. This is a WebGL rendering technique that essentially lets us generate images on the fly on a per-pixel basis by using a fragment shader. This is elaborated upon in thorough detail in this excellent article by Alexander Oldemeier. Using this technique means that the image processing can be done on the GPU, which is essential to achieve smooth real-time performance. For each video frame, the frame is uploaded to the GPU and made available to an effect’s fragment shader, which is responsible for implementing the specific transformation for that effect.
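As a rough sketch of what a single pass boils down to (this is not Zombocam's actual code; the quad geometry, program compilation and uniform setup are assumed to happen elsewhere):

```js
// Fragment shader for a single fullscreen-quad pass. It runs once per output
// pixel; an identity pass-through is shown here.
const passThroughShader = `
  precision mediump float;
  uniform sampler2D uFrame;   // the current video frame
  varying vec2 vTexCoord;     // interpolated texture coordinate from the quad
  void main() {
    gl_FragColor = texture2D(uFrame, vTexCoord);
  }
`;

function renderFrame(gl, program, video, frameTexture) {
  // Upload the current video frame to the GPU...
  gl.bindTexture(gl.TEXTURE_2D, frameTexture);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);
  // ...and draw a quad covering the whole viewport, so the fragment shader
  // is evaluated for every pixel of the output.
  gl.useProgram(program);
  gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
  requestAnimationFrame(() => renderFrame(gl, program, video, frameTexture));
}
```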

Effects in Zombocam are split into three main categories: color filters, distortion effects and overlays. Filters in the first category are implemented as non-linear per-channel functions with hard-coded mappings of input to output values in each frame. The idea is that a color grading expert creates a nice-looking preset using his or her favorite color grading tool. Then that color grading is applied to three 0–255 gradients, one for each color channel. The color graded outputs then serve as lookup tables for the pixel values in order to create a color graded output. This is a simplified version of the technique elaborated upon in this excellent article by Slick Entertainment.
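In shader terms, the lookup can be as simple as this sketch, where the three graded gradients are packed into the channels of a 256×1 lookup texture (the names are illustrative):

```js
const colorFilterShader = `
  precision mediump float;
  uniform sampler2D uFrame;   // current video frame
  uniform sampler2D uLookup;  // 256x1 texture holding the three graded gradients
  varying vec2 vTexCoord;
  void main() {
    vec4 color = texture2D(uFrame, vTexCoord);
    // Each channel's value is used as an index into its own graded curve.
    float r = texture2D(uLookup, vec2(color.r, 0.5)).r;
    float g = texture2D(uLookup, vec2(color.g, 0.5)).g;
    float b = texture2D(uLookup, vec2(color.b, 0.5)).b;
    gl_FragColor = vec4(r, g, b, color.a);
  }
`;
```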

Distortion effects are implemented as non-linear pixel coordinate transformation functions on the input image. That is, the pixel at coordinate (x, y) in the transformed image is copied from the pixel at coordinate f(x, y) in the original image. As long as you define f correctly, you can implement swirls, pinches, magnifications, hazes and all sorts of other distortions.
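A sketch of what such an f can look like in a fragment shader (this particular f is a simple magnifying lens around the centre of the frame; swirls and pinches just swap in a different f):

```js
const magnifyShader = `
  precision mediump float;
  uniform sampler2D uFrame;
  varying vec2 vTexCoord;
  void main() {
    vec2 centre = vec2(0.5, 0.5);
    vec2 offset = vTexCoord - centre;
    float distance = length(offset) / 0.5;               // 0 at the centre, 1 at the edges
    vec2 sourceCoord = centre + offset * sqrt(distance);  // pull samples towards the centre
    sourceCoord = clamp(sourceCoord, 0.0, 1.0);           // keep sampling inside the frame
    gl_FragColor = texture2D(uFrame, sourceCoord);
  }
`;
```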

Finally, overlay effects simply overlay new pixels on parts or all of the frame. These new pixels can be sourced from anywhere, including other video sources. This effectively lets us overlay Giphy videos directly in the camera stream! Productivity will never be the same.

Since effects can be chained in Zombocam, the output from one effect's rendering pass is fed directly as input to the next effect's rendering pass. This opens up a wide array of possible effect combinations.
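A rough sketch of how such a chain can be driven, ping-ponging between two offscreen framebuffers and letting the last pass render to the visible canvas (drawFullscreenQuad is a hypothetical helper that draws one pass as described above):

```js
function renderChain(gl, passes, frameTexture, fbos) {
  let inputTexture = frameTexture; // the freshly uploaded video frame
  passes.forEach((pass, index) => {
    const isLast = index === passes.length - 1;
    const target = isLast ? null : fbos[index % 2]; // null = the visible canvas
    gl.bindFramebuffer(gl.FRAMEBUFFER, target ? target.framebuffer : null);
    drawFullscreenQuad(gl, pass.program, inputTexture);
    if (!isLast) {
      inputTexture = target.texture; // this pass's output feeds the next pass
    }
  });
}
```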

Zombocam can turn you into a cyclops if you’re not careful when chaining effects!

Works everywhere! (*)

In theory, this approach works everywhere out of the box, so you can use it whether you're snapping a profile picture on Facebook or hanging out in video meetings on Appear.in or Google Hangouts. In practice, however, the story is a little more nuanced. Reliably monkey-patching getUserMedia in time, in a cross-browser fashion, via injection from a WebExtension, without going overboard with permissions, turns out to be hard in some cases. This means that if an application is adamant about calling getUserMedia reeeally early in the page's lifetime, getUserMedia might not be monkey-patched yet. In that case, Zombocam will simply never trigger, and it will be as if it were never installed.

When attempting to transparently monkey-patch APIs one has to take extreme care to make sure that the monkey-patching actually is transparent. That means properly forwarding all sorts of properties on the Streams and Tracks returned from getUserMedia that applications might expect and depend on.

One specific example of this that we ran into was with Appear.in's new premium offering, where you can screen-share and show your webcam stream in your meeting room at the same time. The application relied on the name of one of the Tracks being "Screen", which we didn't properly forward to the Tracks we got from our canvas. Because of this, Appear.in didn't know which of the tracks was the screen-sharing track, and things stopped working. Properly forwarding the name property solved the issue, and we learned an important lesson in the virtues of actually being transparent when trying to transparently intercept APIs.
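A sketch of what that forwarding can look like, assuming the property in question is the track's read-only label:

```js
// Make the canvas-derived track report the original track's label.
// MediaStreamTrack.label is read-only, so we shadow it with a getter.
function forwardLabel(originalTrack, canvasTrack) {
  Object.defineProperty(canvasTrack, 'label', {
    get: () => originalTrack.label,
  });
  return canvasTrack;
}
```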

What’s next: audio filters

With the new release of Zombocam we’ve taken it one step further and enhanced getUserMedia audio tracks as well using the Web Audio API. More on that in a later blog post!
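Without spoiling the follow-up post, the rough idea is to route the audio through an AudioContext and hand back the processed track. A minimal sketch (the low-pass filter is just an example effect, and cameraStream stands for the stream returned by getUserMedia):

```js
const audioContext = new AudioContext();
const source = audioContext.createMediaStreamSource(cameraStream);

// Any chain of Web Audio nodes can go here; a simple low-pass filter is shown.
const filter = audioContext.createBiquadFilter();
filter.type = 'lowpass';
filter.frequency.value = 1000;

const destination = audioContext.createMediaStreamDestination();
source.connect(filter);
filter.connect(destination);

// This track replaces the original audio track in the stream we hand back.
const enhancedAudioTrack = destination.stream.getAudioTracks()[0];
```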

UiO Telenor Digital Research Competition

Kickstart your career!

Join SAI and Telenor Digital’s research competition! SAI & Telenor Digital invite all anthropology and SV students to participate.

How will this work, and how can I participate?

  1. Submit a short 1-2 page research proposal by 01.02.2017, to be evaluated by SAI and Telenor staff.
  2. February 2017: the proposals that made it to the final will be announced. All finalists get a small prize. The finalists will have about one month to complete their proposed research.
  3. All finalists will receive individual guidance from SAI and Telenor.
  4. The finalists will present their findings to Telenor and SAI staff in late March.
  5. At this event the winner will be announced; the first-place winner gets a prize valued at approximately 10 000 NOK, as well as a shadow day of their dreams!

Can we participate in groups?

Absolutely! If your team wins, the prize will be split amongst the team members.

What should my research proposal include?

  • One or more clearly defined research questions anchored in anthropological thinking and methodology
  • Outline of methodology
  • Clearly defined timeline (no longer than 3 weeks)
  • Study population: who are you interested in, and how will you recruit them for your study?
  • Ethical considerations

What topic should my study be on?

The topic for this competition is communication within Norwegian families: How do families organise and communicate amongst themselves? How do they navigate, choose, understand and feel about the diverse landscape of communication tools? Cork-boards, oral messages, SMS, chat groups: use ethnography to capture the lived experience of families.

What should my research question be?

You are free to choose your own research question(s) as long as it relates to the topic: ‘Communication within Norwegian Families’. Below are some examples for inspiration:

  • Private life and public spaces:
      • How do families and their different members experience and understand the public and the private in relation to communication?
  • Communication in everyday life:
      • How do families communicate when they plan and organise? What tools do they use, and what needs do the different family members have?
      • How do family dynamics and communication practices within families influence each other?

When is the deadline?

Submit your proposal by 01.02.2017

What can I win?

  • First prize for a value of (approx) 10 000 NOK
  • Runners-up prizes for a value of 1000-2000 NOK
  • The opportunity to present your findings to Telenor Digital.
  • Shadow day tailored especially for you! Get first-hand experience on how your education can be put to work in an organisation like Telenor.

I’m still confused, how can I contact you?

Send an email to cecilie dot perez at telenordigital dot com

Polluted air in a city

Earlier this year I moved from Amsterdam to Barcelona with my family. As soon as we settled in Barcelona, there was something we immediately noticed: we could smell the pollution. Locals here didn’t notice it and, sure enough, after several months here I can’t smell it anymore myself. Out of sight, out of mind. Scary.

This got me interested in day-to-day pollution and how it impacts our lives. Turns out, we breathe more than 20,000 liters of air every day. If that sounds like a lot, it's because it is! And yet we are seldom concerned about the quality of the air that goes through our lungs. We should be, though, because air pollution has vast implications for our health. Please allow me to scare you a little bit:

Quite a grim reality. The stream of shocking research findings is endless. In short, we can conclude that air pollution is killing us slowly, in ways we may not even know yet.

What can we do about it?

That’s the million dollar question. It might seem that the forces involved are so huge that we can’t do much as individuals.

But I believe that the first step is to be aware and informed. Do you know how polluted it is where you are standing right now? And how this pollution is affecting your life?

Imagine having a service that gives you real-time information about pollution around you and gives you advice on how to deal with it. This would empower people and give them control of how much they expose themselves to pollution, and would also give them a tool to take action on reducing those levels of pollution city-wide.

For some time now I’ve been thinking of making such a service. Turns out, I am not the only one that likes that idea!

The project

At Strategic Engineering, we set out to explore the possibility of making such a service. We are proposing it as a project called WAQI (Wearable Air Quality Indicator).

We think that even if WAQI is still just a proposal, it can influence other projects at Telenor Digital, because it touches on many interesting areas:

IoT

We will measure the air quality using a wearable device equipped with a few sensors. The first iterations of the device will be simple, using Bluetooth LE for communications. Telenor Digital has made some forays into IoT, and we'll be tapping into that in-house knowledge and those networks.

Front-end

We want the front-end to be the star of the show. Users will see creative and useful visualizations of the current air quality, along with suggestions for what to do about it. The user will interact with the wearable exclusively through their phone.

Back-end

The back-end will potentially be dealing with an enormous amount of anonymized data, along with geolocation coordinates. We'll need real-time data acquisition and processing, along with a flexible model that allows us to do deep analysis and train models on this data. Given enough data, it should give us interesting insights into how pollution behaves and evolves in our cities.

These three areas are interesting for many future projects. In-house IoT knowledge, real-time databases, Machine Learning on geolocated datasets… the department can benefit from any experience and knowledge acquired while exploring this project.

University of Oslo joins the effort!

One of the people who immediately liked the idea was Hakeem, creative extraordinaire at Telenor. Turns out that Haq has connections with UiO, and he proposed WAQI as a potential research project for Interaction Design students. And it got accepted!

This is very important, because it means that we have a whole extra team of highly motivated (they are betting their semester results on it!) interaction designers helping us find ways to engage users that we might not have come up with. It’s a luxury that most projects can’t count on.

Besides interaction design related to the device and the phone application, they will research target users and scenarios where measuring air quality can be useful (from personal to professional uses), which will make us think outside the box about how to implement the project in ways that fit the different scenarios. At the same time, we'll be helping them shape their university project and showing them how a project gets implemented outside the walls of the university, in a real-world company. We'll be meeting every week to share progress and integrate each team's results.

We have already met the design students who will be looking at ways to make WAQI compelling to potential users. They are as excited and eager as we are, and we are extremely grateful to have such great help!

A videoconference selfie of the students from UiO who will help us with WAQI.

The future

We are very excited about the possibilities for this project. We are at a very initial stage, trying to find the main challenges and uses for it, but the more we think about it, the more we see a clear need for something like WAQI. We are completely open to suggestions, ideas and constructive criticism, so go ahead and drop us a line!

This article explains how MediaStreams work in Firefox and the changes I made to them to accommodate cloning.

First of all: what is a MediaStreamTrack, and how can you clone it?

A MediaStreamTrack represents a realtime stream of audio or video data.

It provides a common API to the multiple producers (getUserMedia, WebAudio, Canvas, etc.) and consumers (WebRTC, WebAudio, MediaRecorder, etc.) of MediaStreamTracks.

A MediaStream is simply put a grouping of MediaStreamTracks. A MediaStream also ensures that all tracks contained in it stay synchronized to each other, for instance when it gets played out in a media element.

Cloning a track means that you get a new MediaStreamTrack instance representing the same data as the original, but where the identifier is unique (consumers don’t know it’s a clone) and disabling and stopping work independently across the original and all its clones.
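From a web developer's point of view, that boils down to behaviour like this (stream being a MediaStream obtained from, say, getUserMedia):

```js
const [track] = stream.getVideoTracks();
const clone = track.clone();

console.log(track.id === clone.id); // false - the clone gets its own identifier
clone.enabled = false;              // disables only the clone; the original is unaffected
track.stop();                       // stops only the original; the clone keeps producing data
```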

Now, how does all this come together in Firefox?
