There are two things that I absolutely love about running neural networks in the web browser: the ability to use the browser's APIs to feed data into a model for training or testing, and the ability to render the training progress or the output of the models, such as activations, filters, and model structures, to the screen. Both tasks are incredibly powerful and fairly easy to implement in modern browsers.
Note: All of the source code used in this book can be found here: https://github.com/backstopmedia/deep-learning-browser. You can also access the demo of our Rock Paper Scissors game here: https://reiinakano.github.io/tfjs-rock-paper-scissors/ and the demo of our text generation model here: https://reiinakano.github.io/tfjs-lstm-text-generation/.
A great way to present a trained neural network to a user is to let the user interact with the model by creating sample data directly in the browser and feeding it to the network. Thanks to the capabilities of modern browsers, these interactions can include capturing images from the webcam, drawing sketches on a canvas, or recording audio with the built-in microphone.
Besides evaluating pretrained models in the browser, we can even train a gesture classifier using training images streamed from the webcam, train an MNIST-like model by drawing training samples on a canvas, or build a simple voice-recognition network.
In this chapter, we will first cover the loading, extraction and manipulation of image data. In the next step, you will learn how to render image data as well as any two-dimensional data to the screen. We will cover image blending as well as drawing shapes on top of the original images for visualizing object bounding boxes.
In the following section, we will learn how to access the webcam or built-in cameras and decode the image data of the video stream or of single images. In addition, we will also extract the raw data from the microphone stream. At the end of that section, we will load and decode audio files and output sounds on the device speakers.
In the last section, we take a look at the data retrieval and manipulation utilities of popular deep learning frameworks for the browser, such as TensorFlow.js, Keras.js, and WebDNN. We will learn how these frameworks facilitate data loading, image manipulation, and data conversion by providing useful APIs and utility tools.
Loading images in JavaScript is one of the simplest, yet most useful techniques for interactively evaluating trained neural networks in the browser. To use an image as input for a machine learning algorithm, its pixel values have to be extracted beforehand.
In this section, we will learn how to extract RGBA pixel data from images using the Canvas API. We will load image data not only from DOM elements, but also directly from URLs. Next, we will understand how to deal with the cross-origin security policy to load remote resources from within JavaScript. At the end of this section, we will see how to fetch binary blobs from the network and cast them into typed array data types.
Let’s define an HTML image tag to load a local image to the DOM.
<img src="data/cat.jpeg" id="img" />
Now we can access the image element using the global DOM API document.getElementById('img'), which is provided in every browser. The image element, which is of type HTMLImageElement, does not provide a direct API to extract its pixel values. Please note that the global document object and many other JavaScript APIs are only available in the browser and NOT available in JavaScript runtimes without a Document Object Model (DOM), such as Node.js.
For pixel manipulations, modern browsers provide the Canvas API, which can be used to programmatically draw pixel graphics to the screen. We will use this API to extract the pixel values from the image element. To extract data from a canvas element, we first have to create a canvas context. Within this context, we can draw the image content to the canvas and subsequently access and return the canvas pixel data.
Note: You can find an extensive overview of the Canvas API on MDN - Canvas API, including samples for styles, colors, animations, pixel manipulations, and optimizations.
Let’s declare a function to implement this.
function loadRgbaDataFromImage(img) {
  // create a canvas element
  const canvas = document.createElement('canvas');
  // set the canvas dimension to the image size
  canvas.width = img.width;
  canvas.height = img.height;
  // create a 2D rendering context
  const ctx = canvas.getContext('2d');
  // render the image to the canvas context
  // at position 0,0 (left, top)
  ctx.drawImage(img, 0, 0, img.width, img.height);
  // extract the image data
  const imgData = ctx.getImageData(0, 0, canvas.width, canvas.height);
  // convert the image data to int32
  return new Int32Array(imgData.data);
}
In the above code, the getImageData function returns an element of type ImageData, which is an object with the attributes width, height, and data. The data attribute stores the pixel values as a typed array of type Uint8ClampedArray. To use the image data for deep learning algorithms and frameworks at a later stage, we convert the array to an Int32Array.
Finally, we can call the function once the image is loaded. Keep in mind that the image is loaded asynchronously; therefore, we can extract the pixel values only once the image is fully loaded by the browser and the onload event is triggered.
const img = document.getElementById('img');
img.onload = () => {
  const data = loadRgbaDataFromImage(img);
  // Int32Array(40000) [255, 255, 255, 254, ...]
};
The above function returns the raw RGBA pixel values of the original image as a flat array with the layout (height, width, channel) and values in the range [0, 255].
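For example, assuming the img and data variables from the snippet above, the individual channels of a pixel can be read from the flat array as follows (a minimal sketch; the positions x and y are hypothetical):
// index into the flat RGBA array returned by loadRgbaDataFromImage
// the layout is row-major: (y * width + x) pixels, 4 channels each
const x = 10;                             // hypothetical column
const y = 5;                              // hypothetical row
const offset = (y * img.width + x) * 4;
const r = data[offset + 0];               // red
const g = data[offset + 1];               // green
const b = data[offset + 2];               // blue
const a = data[offset + 3];               // alpha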
Besides using local (demo) images on a webserver, it is usually also very useful to allow the user to load images from remote locations, for example by changing the url parameter of an image. However, this comes with a great security risk because content could be loaded from any other resource and executed within the current context. Therefore, the browser automatically blocks so-called cross-site requests to domains, protocols, or ports other than those of the current connection.
The Cross-Origin Resource Sharing (CORS) policy allows a browser to perform a cross-origin HTTP request to a resource by setting additional HTTP headers. For image elements, use the crossOrigin attribute and set it to anonymous, which will explicitly allow this element to load cross-site resources.
Note: You can find more information about CORS access control, headers, and attributes for images on MDN - HTTP CORS.
Let’s take a look at the example from the previous section. This time we are loading the sample from a different domain.
<img src="https://../cat.jpeg" crossOrigin="anonymous" id="img" />
The above statement can be used if the image element already exists. However, we can also easily create an image programmatically from within JavaScript. Due to the asynchronous behavior of loading the image resource, we return a so-called Promise instead of the finished resource. A Promise is an API for an eventually completed (resolved) or failed (rejected) process, and it is often used instead of a callback function.
Let’s create an Image object, set the crossOrigin policy, and return a Promise which resolves with the loaded image resource.
function loadImage(url) {
  return new Promise((resolve, reject) => {
    const img = new Image();
    img.crossOrigin = "anonymous";
    img.src = url;
    img.onload = () => resolve(img);
    img.onerror = reject;
  });
}
Finally, we can load the image from a remote resource using the function from the above code snippet and use the loadRgbaDataFromImage function from the previous section to extract the RGBA pixel values of the image. We use the Promise::then() function to asynchronously resolve the Promise when the resource is loaded.
const url = "https://foo.bar/cat.jpeg";

loadImage(url).then((img) => {
  const data = loadRgbaDataFromImage(img);
  console.log(data);
  // Int32Array(40000) [255, 255, 255, 254, ...]
});
Using the more elegant async/await syntax, you can write the Promise without a nested .then() function. Please note that the await keyword can only be used within an async function, so we have to wrap the complete execution block in a self-executing async function call. This looks like overkill in this simple example, but it comes in quite handy when multiple await statements are used.
(async function () {
  const url = "https://../cat.jpeg";
  const img = await loadImage(url);
  const data = loadRgbaDataFromImage(img);
}());
Note: You can find more examples of how and when to use async functions on MDN - Async Function.
In the following code snippets, the self-executing async function block will be skipped for brevity whenever the await keyword is used.
Most deep learning frameworks generate large binary blobs for datasets, model weights, activations, and much more. JavaScript is a very versatile language and has built-in support for typed arrays and array buffers. These data structures make working with binary data in the browser quite handy.
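As a minimal illustration of these data structures, we can allocate an ArrayBuffer and view the same raw bytes through different typed arrays:
// allocate 16 raw bytes
const buffer = new ArrayBuffer(16);
// view the same memory as four 32-bit floats
const floats = new Float32Array(buffer);  // length 4
// or as sixteen unsigned 8-bit integers
const bytes = new Uint8Array(buffer);     // length 16
floats[0] = 0.5;                          // writes 4 bytes into the buffer
console.log(bytes.slice(0, 4));           // raw byte representation of 0.5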
If the data can be dumped as a binary blob of a JavaScript-compatible datatype (such as int8, int16, int32, float32, or float64) in any programming language, it can easily be loaded in JavaScript using ArrayBuffer and TypedArray objects, as long as the datatype is known. The main advantage of using binary blobs and loading them via an ArrayBuffer into a typed array is that the data does not need to be parsed within JavaScript. This leads to huge improvements over textual representation formats such as CSV or JSON, especially for large files, and even makes loading the weights of larger models feasible in JavaScript.
Let’s see how this works with a simple Python snippet and NumPy. First, we generate an array of random data and then dump it to disk as a binary file.
import numpy as np

filename = "data/rand.bin"

# create an array with random values
r = np.random.rand(100, 100)

# write the array to disk as raw float32 bytes
with open(filename, 'wb') as f:
    f.write(r.astype(np.float32).tobytes())
Now we have a binary blob, rand.bin, and we can go ahead and create a function to fetch binary blobs as array buffers.
async function loadBinaryDataFromUrl(url) {
  const req = new Request(url);
  const res = await fetch(req);
  if (!res.ok) {
    throw Error(res.statusText);
  }
  // return the array buffer representation
  return res.arrayBuffer();
}
We mark the function with the async keyword in order to use the await keyword in the function body. Using await, we can wait until the fetch promise is resolved before continuing with the function execution. The fetch response implements the Response::arrayBuffer() method to return the array buffer representation of an HTTP response.
Finally, we can load the rand.bin data using the above function and cast the array buffer into the original datatype. Knowing the original array dimensions, we can also visualize the blob with the renderData function, which we will implement in the next section.
const size = 100;
const buf = await loadBinaryDataFromUrl('data/rand.bin');
const data = new Float32Array(buf);

renderData(document.body, data, size, size, false);
For every development, debugging, training, and evaluation process of deep learning models, a crucial step is visualizing results on the screen. Not only can you spot implementation and training errors by visualizing layer activations and filter weights, you can also reason about network performance and gain insights by visualizing the training progress, the class scores, or the activation of the receptive field of single input pixels and regions.
The article The Building Blocks of Interpretability (Source: https://distill.pub/2018/building-blocks/) shows the potential of visualizations in the field of deep learning in a very impressive way. Use these visualizations as motivation to learn and master the skills of rendering data to the canvas.
In this section, we will first learn how to display simple image elements on the screen. As a next step, we will render pixel data, either as RGBA for color images, or as grayscale images for single layers from output activations or filter weights. In addition, we will cover how to blend images for displaying segmentation maps and how to draw common shapes on top of existing images for bounding box visualizations.
Let’s start with the simplest approach and render an image element in the browser. To do so, we only have to append the HTMLImageElement to an element in the DOM, e.g. document.body.
function renderImage(elem, img) {
  // append the image element to the DOM
  elem.append(img);
  return img;
}
Using the above snippet and the loadImage function from the previous section, we can easily load images from remote resources and render them to the screen.
const url = "data/cat.jpeg";
const img = await loadImage(url);

renderImage(document.body, img);
We continue with a slightly more useful approach: rendering actual pixel data to the screen. This will be very useful whenever you want to visualize a chunk of data such as output activations or filter weights.
For the first approach, we assume that the data is stored in RGBA format with the dimensions (height, width, channel) and values in the range [0, 255], such as the int32 type. This format is equivalent to the one returned by the loadRgbaDataFromImage function, which was implemented in the previous section.
Let’s write a function to render RGBA data. First, we create a canvas element and size it according to the image dimensions. We then create a rendering context and retrieve the image data spanning the whole canvas from the context. Next, we transform the values into a Uint8ClampedArray and overwrite the image data with these values. Finally, we write the image data back to the canvas and append the canvas to the DOM.
function renderRgbaData(elem, data, width, height, smooth) {
  // create a canvas element
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  // create a 2D rendering context
  const ctx = canvas.getContext('2d');
  // get the ImageData object from the canvas
  const img = ctx.getImageData(0, 0, width, height);
  // convert the pixel values
  const vals = new Uint8ClampedArray(data);
  // write the values to the image data
  img.data.set(vals);
  // write the image data to the canvas context
  ctx.putImageData(img, 0, 0);
  // enable/disable automatic smoothing
  ctx.imageSmoothingEnabled = Boolean(smooth);
  // append the canvas element to the DOM
  elem.append(canvas);
  return canvas;
}
In the above function, we add a parameter smooth to control the canvas image smoothing, which is enabled by default. However, when drawing filter activations, we want to see the actual pixel values instead of interpolated, smoothed values.
Now we can use the above function to render RGBA data to the screen. We use the two previously created functions, loadImage and loadRgbaDataFromImage, to retrieve the RGBA data from an existing image.
const url = "data/cat.jpeg";
const img = await loadImage(url);
const data = loadRgbaDataFromImage(img);

renderRgbaData(document.body, data, img.width, img.height);
The above function works great for RGBA data; however, there is a slight problem: actual filter weights and output activations usually consist of more than three depth channels. So, we usually visualize each channel individually as a grayscale image.
In order to extend the above function to render grayscale images, we replace the value array after the line // convert the pixel values. We need to create a target array vals with the proper dimensions and data type. Then, we need to iterate through the original array and apply the grayscale value to all RGB channels. The modified function looks similar to renderRgbaData, except for the following difference:
function renderData(elem, data, width, height, smooth) {
  ...
  const alpha = 255;
  const len = data.length * 4;
  const vals = new Uint8ClampedArray(len);
  for (let x = 0; x < width; ++x) {
    for (let y = 0; y < height; ++y) {
      // compute the index position
      let ix0 = (y * width + x);
      let ix1 = ix0 * 4;
      // transform range [0, 1] to [255, 0]
      let val = (1 - data[ix0]) * 255;
      // write the value to all RGB channels
      // to generate a grayscale image
      vals[ix1 + 0] = val;   // R
      vals[ix1 + 1] = val;   // G
      vals[ix1 + 2] = val;   // B
      vals[ix1 + 3] = alpha; // A
    }
  }
  ...
}
Using this function, we can now visualize any two-dimensional data blob.
const size = 8;
const data = new Int32Array([
  0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0,
  1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1,
  0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0,
]);

renderData(document.body, data, size, size, false);
To visualize the results of a segmentation map, we often have to blend two images together, namely the original image and the segmentation mask. Let’s implement this method as a separate function which acts on two arrays d0 and d1 of the same dimensions. We define a parameter alpha, such that alpha=0 should return d0 and alpha=1 should return d1. All values between 0 and 1 should interpolate between the two images.
function interpolateRgba(d0, d1, alpha, width, height, channels) {
  const out = new Uint8ClampedArray(d0.length);
  const a0 = 1 - alpha;
  const a1 = alpha;
  for (let x = 0; x < width; ++x) {
    for (let y = 0; y < height; ++y) {
      for (let c = 0; c < channels; ++c) {
        let ix = (y * width + x) * channels + c;
        out[ix] = d0[ix] * a0 + d1[ix] * a1;
      }
    }
  }
  return out;
}
We implement a simple helper function loadRgbaDataFromUrl to retrieve the image data directly from any image by only providing the image’s URL.
async function loadRgbaDataFromUrl(url) {
  const img = await loadImage(url);
  return loadRgbaDataFromImage(img);
}
In addition, we extend the renderRgbaData function such that it can render to an existing canvas element instead of creating and appending a new one every time it is called.
Finally, we can load the original image and the segmentation mask, overlay the two images and show the overlay whenever the cursor moves over the original image. The code snippet would look like this:
const width = 500;
const height = 375;
const channels = 4;

const canvas = document.getElementById("scene");
const pixels = await loadRgbaDataFromUrl("data/bike.jpg");
const object = await loadRgbaDataFromUrl("data/bike_object.png");

// Compute the interpolated overlay
const layover = interpolateRgba(pixels, object, 0.5, width, height, channels);

// initial render
renderRgbaData(canvas, pixels, width, height);

canvas.onmouseover = () => renderRgbaData(canvas, layover, width, height);
canvas.onmouseleave = () => renderRgbaData(canvas, pixels, width, height);
In the above code, we assume that there exists a canvas element with the id scene in which both images are rendered.
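For reference, such an element could be declared in the HTML like this (a minimal example matching the dimensions used above):
<canvas id="scene" width="500" height="375"></canvas>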
To visualize the results of a localization task, you need to render geometric shapes, such as bounding boxes, to the canvas as an overlay. Using the Canvas API, this is quite easy to do in JavaScript.
Let’s create a function that adds a rectangle stroke on top of an existing canvas image. We only have to specify the left, top, width, height dimensions and then use ctx.rect() to draw the rectangle. We outline the rectangle with ctx.stroke(), whereas ctx.fill() would fill it.
function addRect(canvas, dims, color) {
  const ctx = canvas.getContext("2d");
  const left = dims[0];
  const top = dims[1];
  const width = dims[2];
  const height = dims[3];
  ctx.strokeStyle = color || 'black';
  // start a new path so previously drawn shapes are not re-stroked
  ctx.beginPath();
  ctx.rect(left, top, width, height);
  ctx.stroke();
}
Creating a circle is similar. We only need to pass the arc parameters cx, cy, radius, startAngle, endAngle to the ctx.arc() function. cx and cy define the circle center point, radius defines its size, and startAngle and endAngle define the start and end angle of the arc, which allows us to draw arc segments. To draw a full circle, we default startAngle to 0 and endAngle to 2 * Math.PI.
function addCircle(canvas, dims, color) {
  const ctx = canvas.getContext("2d");
  const cx = dims[0];
  const cy = dims[1];
  const radius = dims[2];
  const startAngle = dims.length > 3 ? dims[3] : 0;
  const endAngle = dims.length > 4 ? dims[4] : 2 * Math.PI;
  ctx.strokeStyle = color || 'black';
  ctx.beginPath();
  ctx.arc(cx, cy, radius, startAngle, endAngle);
  ctx.stroke();
}
Now let’s use both functions to draw a bounding box as well as a circle around the face of the cat.
const url = "data/cat.jpeg";
const img = await loadImage(url);
const data = loadRgbaDataFromImage(img);

const canvas1 = renderRgbaData(document.body, data, img.width, img.height);
addRect(canvas1, [70, 20, 100, 100], "green");

const canvas2 = renderRgbaData(document.body, data, img.width, img.height);
addCircle(canvas2, [120, 70, 50], "red");
Note: You can find more information about the shapes in the canvas rendering context on MDN - CanvasRenderingContext2D.
The most appealing aspect of using browsers to develop, train, and evaluate deep learning algorithms and applications is the variety and simplicity of the media APIs available in modern browsers. Training gesture recognition from the built-in webcam or speech recognition from the built-in microphone is just a few lines of code away.
In this section, we will first access the webcam or built-in camera to display a video stream and access its image data. We can either run a model continuously on the stream of images or run it on single images triggered by a button. Next, we will extract data from the microphone in order to feed the raw input data into a deep learning model. Finally, we will use the WebAudio API to load sound files, decode common audio formats such as MP3, WAV, and many more, and play those sounds on the device speakers.
Many deep learning algorithms and applications focus on two-dimensional datasets such as images and video frames. Modern browsers and most laptops and mobile devices provide not just a camera but also fantastic APIs to easily access camera images from within JavaScript. This greatly facilitates creating interactive and easily accessible DL applications for the browser.
The browser’s MediaDevices API gives a user access to video and audio devices such as the camera, screen sharing, the microphone, and speakers. It is part of the more general WebRTC (short for Web Real-Time Communication) API, a standard to enable peer-to-peer teleconferencing without intermediary servers.
Please note that, depending on the browser, you can only access the camera and audio via WebRTC in a single browser tab. Due to the sensitivity of camera and audio data, WebRTC might only work over HTTPS with a valid certificate. However, most browsers also allow access to WebRTC on localhost.
We can start a video stream using the MediaDevices::getUserMedia() function, which returns a promise containing the MediaStream object.
navigator.mediaDevices.getUserMedia({ video: true, audio: false })
  .then((stream) => {
    ...
  });
To extract the data from the MediaStream, we have to attach the stream to a video player element. Let’s create such an element and feed the stream into the player. If the video playback should be visible, we could also append the player to the DOM.
const player = document.createElement('video');
// if the video playback should be visible
// document.body.append(player);

navigator.mediaDevices.getUserMedia({ video: true, audio: false })
  .then((stream) => {
    player.srcObject = stream;
    // start playback so that video frames become available
    player.play();
  });
Finally, we can extract the content from the video element in the same way as we did with the img element in the previous section. We render the current frame to a canvas in the first step, and then extract the ImageData object in the second step.
function loadRgbaDataFromImage(img, width, height) {
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(img, 0, 0, width, height);
  const imgData = ctx.getImageData(0, 0, width, height);
  return new Int32Array(imgData.data);
}

const width = 240;
const height = 160;
const data = loadRgbaDataFromImage(player, width, height);
Note: You can also access the content of a video element in WebGL directly, without using a canvas element. To do so, you can bind the video element to a 2D texture using the WebGLRenderingContext.texImage2D method.
Using the same MediaDevices API as in the previous section, we can also access a stream from the microphone. However, instead of using a video element to process the data, we use the WebAudio API, which is a very flexible, graph-based audio processing API. It allows us to create audio streams and processors, and to route audio between these nodes and to the speakers.
Let’s first start by retrieving the audio stream using the MediaDevices::getUserMedia() function. We also need to define the global AudioContext.
const audioContext = new AudioContext();

function onStream() {
  ...
}

navigator.mediaDevices.getUserMedia({ audio: true, video: false })
  .then(onStream);
Next, we set up a very simple audio graph consisting of an input (the microphone stream), a simple processor, and a default output. We also need to define the properties of the processor, including the number of input and output channels as well as the buffer size of the audio chunks.
function onProcess() {
  ...
}

const bufferSize = 4096; // 256, 512, 1024, 2048, 4096, 8192, 16384
const numInputChannels = 1;
const numOutputChannels = 1;

function onStream(stream) {
  const source = audioContext.createMediaStreamSource(stream);
  const processor = audioContext.createScriptProcessor(bufferSize, numInputChannels, numOutputChannels);
  // connect the source to the processor
  source.connect(processor);
  // connect the processor to the output
  processor.connect(audioContext.destination);
  processor.onaudioprocess = onProcess;
}
As we can see in the above snippet, all audio processing is performed on an audio graph, which builds on the AudioContext API. For recording audio and feeding it to a deep neural network, we need a source, a processor, and a destination as nodes of the audio graph and connect them in series. The actual processing is then performed in the onProcess function.
function onProcess(e) {
  const data = e.inputBuffer.getChannelData(0);
  console.log(e.inputBuffer, data);
}
Using the AudioBuffer.getChannelData() function, we retrieve the raw audio data as a Float32Array of length bufferSize. The AudioBuffer object e.inputBuffer also gives us access to the duration, sampleRate, and numberOfChannels properties.
We can now use this data for further processing or feed it directly to a network. If a single audio chunk of the buffer size is not large enough, we can create a larger array and copy the consecutive chunks into the corresponding positions of that array.
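A minimal sketch of this accumulation could replace the onProcess function from above; here we assume we want to record roughly one second of audio at a sample rate of 44100 Hz:
const sampleRate = 44100;                       // assumed sample rate
const recording = new Float32Array(sampleRate); // ~1 second of audio
let offset = 0;

function onProcess(e) {
  const chunk = e.inputBuffer.getChannelData(0);
  // copy the chunk into the next free slot of the recording buffer
  if (offset + chunk.length <= recording.length) {
    recording.set(chunk, offset);
    offset += chunk.length;
  }
}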
Using the WebAudio API and the AudioContext, we can also load and decode common audio formats in the browser using the AudioContext::decodeAudioData() function. This is very useful when we want to, for example, load an MP3-encoded audio sample and feed it into a deep neural network.
In this sample, we will reuse the loadBinaryDataFromUrl function from a previous section to load the sound file as an ArrayBuffer. Let’s write a function that wraps the AudioContext::decodeAudioData() function in a Promise. This method can decode all audio formats that are supported by the audio and video tags of the HTML5 browser.
const audioContext = new AudioContext();

function decodeAudio(data) {
  return new Promise((resolve, reject) => {
    // decode the array buffer using the supported audio formats;
    // pass reject as the error callback so decoding failures are reported
    audioContext.decodeAudioData(data, (buffer) => resolve(buffer), reject);
  });
}
To test the above snippet, we also implement a minimalistic audio graph that outputs audio from a buffer to the speakers.
function playSound(buffer) {
  const source = audioContext.createBufferSource();
  source.buffer = buffer;
  source.connect(audioContext.destination);
  source.start(0);
}
Finally, we can test the functions to load a sample sound and output it to the speakers.
<script>
  const url = "data/Large-dog-barks.mp3";
  const data = await loadBinaryDataFromUrl(url);
  const audio = await decodeAudio(data);
  playSound(audio);
</script>
The audio object in the above snippet is of type AudioBuffer, exactly like in the previous section. We can now use the AudioBuffer.getChannelData() method to extract the audio arrays of the two audio channels of this MP3 file.
const c0 = audio.getChannelData(0);
const c1 = audio.getChannelData(1);
In this section, we will learn about the data loading and manipulation utilities of popular deep learning frameworks for the browser: TensorFlow.js, Keras.js, and WebDNN. As we saw in the previous chapters, each framework uses an abstraction on top of the TypedArray object to store tensor variables in a flat array. Most of these frameworks provide utilities to load, create, resize, and visualize data. Let’s take a look!
TensorFlow.js abstracts data as tf.Tensor objects, which consist of the raw data, the tensor shape, and the data type. It doesn’t provide a utility for loading images from URLs, but it does provide the tf.fromPixels function to convert image-like elements (video, image, canvas, etc.) into tf.Tensor objects. We can also use the tf.Tensor.print() function to print a tensor to the developer console.
const url = "data/cat.jpeg";
const img = await loadImage(url);
const data = tf.fromPixels(img);

data.print();
If the image needs to be resized, we can use the tf.image.resizeBilinear function.
const dataResized = tf.image.resizeBilinear(data, [100, 100]);
Converting a single image to a batch of one image doesn’t affect the underlying raw data but only the tensor shape. In TensorFlow.js, we can use the tf.Tensor.expandDims function to insert a new dimension of size one into the tensor at a defined axis.
// shape [height, width, channels] -> [1, height, width, channels]
const dataBatch = data.expandDims(0);
To load binary data and parse it into a tensor in TensorFlow.js, we can call the default tf.tensor constructor. In the following example, we use the loadBinaryDataFromUrl function implemented earlier in this chapter to load a binary blob in JavaScript. In this case, the blob contains a matrix, and hence we could also use the tf.tensor2d constructor.
const size = 100;
const buf = await loadBinaryDataFromUrl('data/rand.bin');
const data = tf.tensor(new Float32Array(buf), [size, size]);

data.print();
To render a tensor to the screen (to a canvas element), we can use the tf.toPixels function. Let’s write a wrapper that creates a new canvas element and renders the tensor to the canvas.
async function render(rootElem, data) {
  const canvas = document.createElement('canvas');
  rootElem.append(canvas);
  await tf.toPixels(data, canvas);
  return canvas;
}
In the above code, you can see that tf.toPixels returns a promise. The reason is that accessing the tensor data using tensor.data() is an asynchronous operation in TensorFlow.js. Hence, we make the render function asynchronous as well. Finally, we can use this function to render tensors to the screen.
await render(document.body, data);
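If we need the raw values of a tensor instead of rendering it, we can read them back asynchronously with tensor.data() or synchronously with tensor.dataSync(); the asynchronous variant is usually preferred because it does not block the main thread. A minimal sketch, reusing the data tensor from above:
// asynchronously download the tensor values as a Float32Array
const values = await data.data();
console.log(values.length); // 10000 for the 100x100 tensor above
// synchronous variant (blocks until the data is available)
const valuesSync = data.dataSync();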
Keras.js uses the ndarray library under the hood to abstract tensors on top of TypedArrays. ndarray is a modular multidimensional array implementation for JavaScript and should make it easy for MATLAB or NumPy users to get started with vector calculus in JavaScript.
An ndarray object can be created from an image like the following:
const url = "data/cat.jpeg";
const img = await loadImage(url);
// extract the RGBA pixel values first, then wrap them in an ndarray
const pixels = loadRgbaDataFromImage(img);
const data = ndarray(new Float32Array(pixels), [img.height, img.width, 4]);
The ndarray core API provides a lot of functionality for slicing, transposing, reversing, and reshaping arrays. However, much more functionality is available in the various ndarray-* packages.
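For example, slicing and transposing an ndarray creates lightweight views without copying the underlying data; a minimal sketch based on the core API, reusing the data array from above:
// view of the first channel only (no data is copied)
const channel0 = data.pick(null, null, 0);
// swap the first two axes
const transposed = data.transpose(1, 0, 2);
// 100x100 patch starting at row 10, column 10
const patch = data.lo(10, 10, 0).hi(100, 100, 4);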
WebDNN provides a lot of image loading, parsing, and transformation capabilities under the WebDNN.Image scope. It contains the very convenient function WebDNN.Image.getImageArray for loading an image array from a URL, parsing it, and resizing it to a defined shape. In the same manner, the function WebDNN.Image.setImageArrayToCanvas renders a tensor to a canvas element.
const url = "data/cat.jpeg";
const img = await WebDNN.Image.getImageArray(url, { dstW: 256, dstH: 256 });

const canvas01 = createCanvas(document.body, 256, 256);
WebDNN.Image.setImageArrayToCanvas(img, 256, 256, canvas01);

const canvas02 = createCanvas(document.body, 100, 100);
WebDNN.Image.setImageArrayToCanvas(img, 256, 256, canvas02, { dstW: 100, dstH: 100 });
WebDNN can easily deal with binary data as well. To visualize 2D weights as a grayscale image, we can simply add the option {color: WebDNN.Image.Color.GREY}.
const size = 100;
const buf = await loadBinaryDataFromUrl('data/rand.bin');
const img = new Float32Array(buf);

const canvas01 = createCanvas(document.body, 100, 100);
WebDNN.Image.setImageArrayToCanvas(img, 100, 100, canvas01, {
  color: WebDNN.Image.Color.GREY,
  dstW: 100,
  dstH: 100,
  scale: [255],
  bias: [-1]
});

const canvas02 = createCanvas(document.body, 256, 256);
WebDNN.Image.setImageArrayToCanvas(img, 100, 100, canvas02, {
  color: WebDNN.Image.Color.GREY,
  dstW: 256,
  dstH: 256,
  scale: [255],
  bias: [-1]
});
In this chapter, you learned how to extract data, such as images from URLs, images from the webcam, and audio from the microphone, all from within the browser. When loading data from other domains, we need to set the crossOrigin attribute accordingly to allow a cross-site request.
Binary blobs of data can be fetched and parsed into TypedArray data structures easily and very efficiently using the Fetch API and the Response::arrayBuffer method.
We need to use the Canvas API to transform images and videos into image data or to render image data to the screen. On top of the canvas element, we can draw shapes to visualize object positions or interpolate between two images to visualize the result of a segmentation model.
You also saw how to use the utility functions of TensorFlow.js, Keras.js, and WebDNN to load and parse image, audio, and binary data using their respective abstractions on top of the TypedArray object.
In the following chapter, we will go into more detail and take a look at practical applications and building blocks for advanced data manipulation. We will parse complete Caffe and TensorFlow model graphs from within JavaScript using protobuf.js, learn how to draw charts using Chart.js, and extract spectrogram features from an audio feed.