I get the news I need on the weather report

In which we publish an MJPEG stream from the BeagleBone Black.

Continuing with the series of posts on OpenCV, webcams, and MJPEG, today we will look at streaming an MJPEG capture from the BBB.  Before I get into it though, you should know that I did try FFmpeg/avconv and VLC to stream video from the BBB using RTP, but the several seconds of latency made them unsuitable for my needs.  You should also know that I do not claim that this is the one true way to stream video from the BBB.

The libraries used:

  • ZeroMQ[1] and CZMQ[2] - used to create pub/sub connections between the BBB and software running on a desktop
  • OpenCV[3] - used to display the MJPEG stream

The subscriber was compiled and tested under Windows 7 using Visual Studio 2012; however, the code should compile under Linux with very few, if any, modifications.

Background:

I am working on a project that requires a video stream from the BBB to be consumed in N places, where N is at least 2.  The stream will be processed using OpenCV, and because of the nature of this project, I need as little latency in the video stream as possible.

Theory of Operation:

The BBB will capture frames in MJPEG format from a webcam via a modified version of framegrabber.  The modified version of framegrabber can run indefinitely and outputs the frames as a series of ZeroMQ messages over a publish socket.  The clients will subscribe to the publish socket on the BBB using ZeroMQ and load each frame received into OpenCV.

The ZeroMQ pub/sub configuration allows many clients to connect to the published stream.  No synchronization is used between the publisher and the subscribers; the stream is treated as continuous, and the subscribers are free to connect and disconnect at will.
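To make the subscriber side concrete, here is a minimal Python sketch of the pattern.  The address, port, and the assumption that each ZeroMQ message carries exactly one jpeg frame are placeholders - check framegrabberPub.c and framegrabberSub.py below for the real details.

import zmq
import numpy as np
import cv2

# connect a SUB socket to the publisher running on the BBB
# the address and port here are placeholders
context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.connect("tcp://192.168.1.10:5556")
subscriber.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to everything

while True:
    # assume each message is one complete jpeg frame
    frame_bytes = subscriber.recv()
    # wrap the bytes in a numpy array and let OpenCV decode the jpeg
    pic = np.frombuffer(frame_bytes, np.uint8)
    img = cv2.imdecode(pic, 1)
    if img is None:
        continue
    cv2.imshow("stream", img)
    if cv2.waitKey(1) == 27:  # exit on ESC
        break

cv2.destroyAllWindows()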

Results:

Single subscriber, 640x480 - CPU use on the BBB ~4.3%, memory use ~0.8%

Multiple subscribers, 640x480 - CPU use on the BBB ~6.6%, memory use ~0.8%

Single subscriber, 1920x1080 - CPU use on the BBB ~23.2%, memory use ~3.5%

Using this setup, I have been able to stream frames at resolutions as high as 1920x1080 with little to no latency, but there is a limitation: the network.  When using this over WiFi with high resolutions or several clients running on one machine, I noticed the frame rate would drop the further I went from the router.  If you watch the output of the top command on the BBB as you move away from the router, you will see framegrabber's memory use begin to climb.  This is due to the publish socket buffering the data.  As you walk back towards the router, you will see the memory use drop until it, and the frame rate, stabilizes.  During this stabilization period you will probably see delayed video displayed at a higher frame rate than normal as the buffer is flushed.

There are several things you can do to reduce or eliminate this latency.

  1. If possible, use a wired connection
  2. Use an 802.11n router and clients
  3. Make sure your WiFi router is optimally located
  4. Adjust the QoS settings of your router to give higher or highest priority to the traffic on the port you publish over

To reduce the amount of time it takes for subscribers to catch up once their connection has improved, the high water mark on the socket can be reduced.  This has the effect of dropping frames once too many are buffered and essentially reduces the amount of buffered data a subscriber has to process to get in sync.
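With pyzmq, for example, lowering the high water mark is a one-liner on either socket.  The value of 2 below is just an illustration; note that recent ZeroMQ versions expose separate SNDHWM/RCVHWM options, while 2.x builds use a single HWM option, and the option must be set before connecting or binding.

import zmq

context = zmq.Context()
subscriber = context.socket(zmq.SUB)

# keep at most 2 frames queued on the subscriber side;
# anything beyond that is dropped instead of buffered
# (set before connect so it takes effect)
subscriber.setsockopt(zmq.RCVHWM, 2)
subscriber.connect("tcp://192.168.1.10:5556")  # placeholder address
subscriber.setsockopt(zmq.SUBSCRIBE, b"")

# the publisher would do the same with zmq.SNDHWM before binding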

The reader may find it interesting that it does not matter whether the BBB publisher or the subscribers are started first.  The publisher will simply drop frames until at least one subscriber connects, and the subscribers will wait on the publisher.  In addition, you can kill the publisher while subscribers are connected, restart it with new (or the same) settings, and the subscribers will continue on.  The reader can verify this by changing the resolution after the subscriber or subscribers have connected.

Code:

framegrabberPub.c (17.96 kb) Publisher - you will need zhelpers.h

compile with

gcc framegrabberPub.c -lzmq -o framegrabberPub 

framegrabberSub.c (3.79 kb) C client

[BONUS]
framegrabberSub.py (2.52 kb) Python client

The Python client will display the stream with little latency until garbage collection occurs.  When this happens, the display will freeze, and the buffered data on the BBB will increase.  Once garbage collection completes, the display will eventually resynchronize, much like in the WiFi case detailed above.

[UPDATE]
If your wireless router is capable of broadcasting at both 2.4 and 5 GHz at the same time, you can improve performance when using a WiFi connection for both the publisher and the subscriber by having one connect at 2.4 GHz and the other connect at 5 GHz.

[UPDATE]
Added a link to zhelpers.h needed to compile the publisher.

[1] http://zeromq.org/
[2] http://czmq.zeromq.org/
[3] http://opencv.org/

I said, "Do you speak-a my language?"

In which we learn how to turn a buffer of bytes into an image OpenCV can work with.

If you followed my previous post, you may have realized you can produce, with very few modifications, a stream of MJPEG (jpeg) images captured from the webcam. You may also be left wondering how to work with these images using OpenCV. It might actually be easier than you imagine.

The little bit of Python code below will display a single image. The code loads an image from a file into a buffer, converts the buffer into an OpenCV image, and displays the image. If you are receiving the image as a buffer already, you can forgo the loading step.

import cv2
from cv2 import cv
import numpy as np

# let's load a buffer from a jpeg file.
# this could come from the output of frameGrabber
with open("IMAGE.jpg", mode='rb') as file:
    fileContent = file.read()

# convert the binary buffer to a numpy array
# this is a requirement of the OpenCV Python binding
pic = np.fromstring(fileContent, np.uint8)

# here is the real magic
# OpenCV can actually decode a wide variety of image formats
img = cv2.imdecode(pic, cv.CV_LOAD_IMAGE_COLOR)

cv2.namedWindow("image")
while True:
    cv2.imshow("image", img)
    
    key = cv2.waitKey(20)
    if key == 27: # exit on ESC        
        break
    
cv2.destroyAllWindows()

For those of you working in C, you can convert a buffer into an image with the following (tested in Visual Studio 2012):

#include <stdio.h>
#include <malloc.h>
#include "opencv2\core\core_c.h"
#include "opencv2\highgui\highgui_c.h"

// used to load a jpeg image into a buffer
int load_buffer(const char *filename, char **result) 
{ 
	int size = 0;
	FILE *f = fopen(filename, "rb");
	if (f == NULL) 
	{ 
		*result = NULL;
		return -1; // -1 means file opening fail 
	} 
	fseek(f, 0, SEEK_END);
	size = ftell(f);
	fseek(f, 0, SEEK_SET);
	*result = (char *)malloc(size+1);
	if (size != fread(*result, sizeof(char), size, f)) 
	{ 
		free(*result);
		fclose(f);
		return -2; // -2 means file reading fail 
	} 
	fclose(f);
	(*result)[size] = 0;
	return size;
}

int main() 
{ 
	char *buffer; 
	int size;
	CvMat mat;
	IplImage *img;

	// load the jpeg file into a buffer
	// this could come from the output of frameGrabber
	size = load_buffer("IMG.jpg", &buffer);
	if (size < 0) 
	{ 
		puts("Error loading file");
		return 1;
	}

	// wrap the encoded jpeg bytes in a CvMat
	// note the params: rows, cols, and format - the buffer is one row of bytes
	mat = cvMat(1, size, CV_8UC1, (void*)buffer);
	// magic sauce, decode the image
	img = cvDecodeImage(&mat, 1);

	// show the image
	cvShowImage("image", img );

	// wait for a key
	cvWaitKey(0);

	// release the image and free the buffer
	cvReleaseImage(&img);
	free(buffer);

	return 0;
}

For further information see the OpenCV Docs.

OpenCVjpeg.py (2.24 kb)
OpenCVjpeg.c (2.78 kb)

I spy with my PS3Eye

In which we discover the limits of webcams connected to the BeagleBone Black.

As the previous couple of posts may have hinted, I am currently working on a computer vision application.  On the hardware side, I am using a webcam connected to a BeagleBone Black to capture and process images.  Finding the right camera and software configuration seems to be a challenge many people are trying to overcome.  The following is what I have learned through experimentation.

During my first foray into the world of webcams on the BBB, I chose the PS3Eye.  The PS3Eye has been used for many computer vision applications thanks to its ability to produce uncompressed 640x480 images at up to 60 FPS or uncompressed 320x240 at up to 120 FPS.  The ability to capture uncompressed images at high frame rates plus being available for $16.98 would normally make the PS3Eye a fantastic choice; however, we are dealing with the BBB.

If you plug a PS3Eye into the BBB and fire up an OpenCV application to capture at 640x480, you will receive "Select Timeout" errors instead of a video stream.  If you do the same but with the resolution set to 320x240, it will work.  It turns out the PS3Eye transfers data in bulk mode over USB.  In bulk mode, you are guaranteed to receive all of the transmission; however, you are not guaranteed timing.  What is essentially happening is that the PS3Eye is saturating the bulk allotment on the USB bus.  The reason you encounter this problem at 640x480 and not 320x240 is that OpenCV with Python sets the frame rate to 30 FPS and provides no way to change it.  We can calculate the amount of data put on the bus as follows:

Height * Width * (Channels * 8) * FPS

So for our uncompressed images at 640x480 we have:

640 * 480 * (3 * 8) * 30 = 221184000 bits/s or ~26.36 MB/s

and 320x240 is ~6.59 MB/s
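The arithmetic is simple enough to script if you want to try other combinations; here is a small helper that just evaluates the formula above (nothing camera specific):

def usb_bandwidth(width, height, fps, channels=3):
    # uncompressed bandwidth in bits/s and MB/s for the formula above
    bits_per_second = width * height * (channels * 8) * fps
    return bits_per_second, bits_per_second / 8.0 / (1024 * 1024)

for width, height, fps in [(640, 480, 30), (320, 240, 30), (640, 480, 15), (320, 240, 60)]:
    bits, mb = usb_bandwidth(width, height, fps)
    print('%dx%d @ %d FPS: %d bits/s (~%.2f MB/s)' % (width, height, fps, bits, mb))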

As OpenCV with Python does not allow you to set the frame rate, I modified v4l2grab[1] to accept frame rate as a command line argument.  With this, I discovered you can capture images from the PS3Eye at 640x480 as long as you set the frame rate to 15 FPS or less.  You can also capture images at 320x240 at up to 60 FPS.  The astute reader will notice that 640 * 480 * (8 * 3) * 15 = 320 * 240 * (8 * 3) * 60, which is ~13.2 MB/s.  In other words, the USB on the BBB tops out at ~13.2 MB/s for bulk transfers.

At this point you might be thinking you do not have to worry about frames per second because you will only take still shots.  It turns out the UVC driver under Linux does not support still image capture[2].  In order to capture an image, you open the webcam the same way you would to capture a stream and simply grab one frame (or more if needed).
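In OpenCV's Python binding, for example, a "still shot" ends up looking roughly like this sketch (the device number and filename are placeholders):

import cv2

# open the webcam exactly as you would for streaming
cap = cv2.VideoCapture(0)

# grab a single frame - some cameras need a few reads before
# exposure settles, so you may want to discard the first few frames
ret, frame = cap.read()
if ret:
    cv2.imwrite('still.jpg', frame)

cap.release()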

If you would like to capture 640x480 or larger images at 30 FPS or faster, all is not lost, but you will need a webcam that supports some sort of native compression.  In my case, I am using a Logitech C920.  It can compress using both H264 and MJPEG.  If you want to capture a video stream, H264 is probably your best choice as it should have fewer compression artifacts.  If you are after still shots, MJPEG will be your friend.

MJPEG typically compresses each frame as a separate jpeg*.  Since MJPEG uses intra-frame compression, you only need to capture one image for a still shot.  H264 uses inter-frame compression - meaning it relies on information from several frames to determine how to compress the current frame.  In order to reconstruct the frame, you need all the ones involved in the compression.  I know the last two sentences are a great simplification, but they suffice for our discussion.

In order to test the different combinations of frame rates and encodings, I extended the v4l2 capture sample available from the project's website[3].  To the base sample I added the ability to specify image dimensions, frame rate, and pixel format (i.e. compression).  I also added handling for CTRL-C so the webcam is not left in an inconsistent state if you kill the program, as well as the ability to set the timeout value and maximum number of timeouts.

The program is available here framegrabber.c.

Please note this software is not finished.  I am publishing it now so others may use it to determine the capabilities of their webcams, but I will be improving and extending it in the future.  You may consider the capture timing functionality described in item 1 below to be complete, while the saving of frames described in item 2 will change.

To compile framegrabber you must have the development files for v4l2 installed.

Compile with:

gcc framegrabber.c -o framegrabber

At this time, framegrabber is intended to be used in one of two ways.

1.  Timing frame capture rates.

To time frame capture rates, simply run framegrabber under time, omit the -o switch, and set -c to the number of frames you would like to capture.  Omitting -o instructs the program to simply discard all captured frames.  In this mode, framegrabber will capture the requested number of frames from the webcam as fast as possible.

Here is the simplest case:

time ./framegrabber -c 1000

And here we set the pixel format, image dimensions, and frame rate:

time ./framegrabber -f mjpeg -H 480 -W 640 -c 1000 -I 30

Have a look at all the other command line switches to get a sense of the possibilities.

2.  Capturing frames from a webcam

As mentioned above, I extended the application v4l2grab to support setting the frame rate.  v4l2grab allows you to capture jpeg images from webcams that support the YUYV format.  It grabs frames in YUYV format and then converts the frames to jpeg.

When capturing frames with framegrabber, the raw frame is written out.  No conversion to jpeg is done.  This is mostly a proof of concept to show that frames captured in MJPEG format are individual jpegs and can be written out without further processing.  This has been tested with a Logitech C920, and the output is indeed a jpeg image.  Capturing in H264 and YUYV format will also work, but you will not be able to simply open the resulting file in your favorite image editor.

Currently there is no way to specify the filename for the frame or frames captured, and if -c is greater than one, the first c - 1 frames will be overwritten by frame c.  To capture a frame, include the -o switch and set -c to one.  The resulting frame will be written to capture.jpg.

./framegrabber -f mjpeg -H 480 -W 640 -c 1 -o

And now for the results of testing both the PS3Eye and the Logitech C920.

Here we see capturing 1000 frames from the Logitech C920 in MJPEG format takes ~33.6 seconds which is ~29.76 frames per second.

Here we see capturing 1000 frames from the Logitech C920 in YUYV format takes ~67 seconds which is ~14.92 frames per second.

Moving to the PS3Eye, we see that if we try to capture at 30 FPS, we receive a select timeout error, but if we set the frame rate to 15, we are successful.  If you compare the results of the PS3Eye capture with the results of the Logitech C920 YUYV test, you will see the real times are essentially the same, almost 15 frames per second.

At this point you may be wondering why the Logitech C920 does not receive select timeouts at 30 FPS YUYV but the PS3Eye does.  If you notice, even though we set the frame rate to 30 FPS, we receive frames from the C920 at about 15 FPS.  The C920 uses isochronous transfers as opposed to bulk like the PS3Eye, and isochronous transfers guarantee transfer speed but not delivery.  It is likely that frames are getting dropped, but enough make it through fast enough that we do not receive select timeouts.  I have not tested this further as of now.  For more information on USB transfers see [4].

In our final screenshot we can see that framegrabber uses very little CPU (~0.3%) while just grabbing frames.

I hope you find framegrabber useful.  The interested reader can extend the process_image function to do as they will with frames captured from the webcam.

[UPDATE 1]
It seems some MJPEG streams omit the Huffman table causing the resulting jpeg captures to fail to open in some programs[5].  The error message, if any, is something along the lines of "Huffman table 0x00 was not defined".  If you cannot open the MJPEG captures, please try the Python script MJPEGpatcher.py below.  MJPEGpatcher should patch the captured jpeg with the required information.  It takes a single command line argument, -f FILENAME, and outputs FILENAME[patched].jpg.  The Logitech C920 does not exhibit this behavior.  MJPEGpatcher has been tested and works on images captured by a Toshiba webcam built into a laptop as well as an image submitted by a reader.  I would appreciate any feedback.
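If you would like to check whether a capture already contains a Huffman table before patching it, the jpeg DHT (Define Huffman Table) segment is introduced by the marker bytes 0xFF 0xC4.  A quick and dirty check, independent of MJPEGpatcher.py, might look like this:

def has_huffman_table(filename):
    # the DHT segment of a jpeg starts with the marker bytes 0xFF 0xC4
    with open(filename, 'rb') as f:
        data = f.read()
    return b'\xff\xc4' in data

print(has_huffman_table('capture.jpg'))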

[UPDATE 2]
William C Bonner pointed out in the comments that I neglected to provide any timing information for the C920 at resolutions greater than 640x480.  When researching for this post, I was interested in explaining why the C920 could provide 640x480 at 30 FPS and the PS3Eye could not.  In doing so, I focused on the greatest resolution and frame rate the two cameras had in common.  To redress my omission, here are timings for the C920 at 1920x1080 and 30 FPS in MJPEG, H264, and YUYV formats.  It can be seen below that the C920 is able to provide 1920x1080 at 30 FPS in both MJPEG and H264 formats, but YUYV tops out around 2.49 FPS.

framegrabber.c (18.79 kb)
MJPEGpatcher.py (6.46 kb)

[1] https://github.com/twam/v4l2grab
[2] http://www.ideasonboard.org/uvc/
[3] http://linuxtv.org/downloads/v4l-dvb-apis/capture-example.html
[4] http://www.beyondlogic.org/usbnutshell/usb4.shtml
[5] http://www.the-labs.com/Video/odmlff2-avidef.pdf

you gotta keep 'em separated

In which we threshold video frames using HSV values.

In my previous post, I provided a tool that allows one to determine the H, S, and V values of a pixel or pixels in an image for use in setting a threshold on an image or video frame.  Today, I will show you what these values are good for, namely setting a threshold on frames in a video or webcam stream.

You may download the Python code here HSVthresholder.
The code is copyright © 2013 Matthew Witherwax and released under the BSD license.

To use the application, simply supply a command line argument for either a video file or a webcam using -v.  In the case of a webcam, specify dev# where # is the number of the device.  If you only have one webcam, you should be able to pass dev0; otherwise, you may start at dev1 and increment until you find the webcam you would like to use.  In the case of a video file, the application will loop the video indefinitely until you exit with the Esc key.  For other options, pass -h on the command line.
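Internally, an argument like dev0 just has to be turned into a device index for OpenCV, while anything else can be treated as a file path.  A sketch of that mapping (not necessarily the exact code in HSVthresholder.py) is:

import cv2

def open_source(arg):
    # 'dev0', 'dev1', ... map to cv2.VideoCapture device indices;
    # anything else is treated as the path to a video file
    if arg.startswith('dev'):
        return cv2.VideoCapture(int(arg[3:]))
    return cv2.VideoCapture(arg)

cap = open_source('dev0')  # or open_source('some_video.avi')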

The general flow of the program is:

  1. Capture frame from video or webcam
  2. Dilate image
  3. Convert to HSV
  4. Threshold image
  5. Erode image
  6. Display image

You can find more information about dilation and erosion here [1] and the function inRange() used for thresholding here [2].
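Put together, the whole flow boils down to a handful of OpenCV calls.  Here is a minimal sketch, where the HSV bounds are placeholders you would replace with values found using HSVpixelpicker:

import cv2
import numpy as np

cap = cv2.VideoCapture(0)  # or a video file

# placeholder HSV range - substitute values found with HSVpixelpicker
lower = np.array([0, 100, 100], np.uint8)
upper = np.array([10, 255, 255], np.uint8)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # dilate, convert to HSV, threshold, then erode
    element = cv2.getStructuringElement(cv2.MORPH_CROSS, (5, 5))
    frame = cv2.dilate(frame, element, iterations=5)
    frameHSV = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(frameHSV, lower, upper)
    mask = cv2.erode(mask, cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5)), iterations=2)

    cv2.imshow('thresholded', mask)
    if cv2.waitKey(20) == 27:  # exit on ESC
        break

cap.release()
cv2.destroyAllWindows()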

All the interesting bits in the code are documented with comments, but I would like to draw attention to two operations with parameters you may wish to tune.

In both dilation and erosion, a structuring element is supplied.  Possible values to adjust are the element itself, which may be one of cv2.MORPH_RECT, cv2.MORPH_CROSS, or cv2.MORPH_ELLIPSE, as well as its size, the second argument to cv2.getStructuringElement.  In addition, you may adjust the number of times the operation is applied by changing the iterations parameter.  While tuning these parameters, you might find it beneficial to create a third named window to display the frame after dilation so you can see what effect your changes have.

Dilating

# here we dilate the image so we can better threshold the colors
if self.dilate:
	element = cv2.getStructuringElement(cv2.MORPH_CROSS,(5,5))
	frame = cv2.dilate(frame, element, iterations=5)


Erosion

# here we erode the image to remove noise
if self.erode:
	element = cv2.getStructuringElement(cv2.MORPH_RECT,(5,5))
	frameHSV = cv2.erode(frameHSV, element, iterations=2)

And here are some screenshots that show the application isolating the red grip on a pen.  The red grip shows up as white while the rest of the image is blacked out.  If you look closely, you will see some stray white dots.  Depending on your needs, you may either ignore these or tune the parameters until they disappear.  Now that the grip has been isolated, you can track it, but that is a topic for another post.

[1] http://docs.opencv.org/doc/tutorials/imgproc/erosion_dilatation/erosion_dilatation.html
[2] http://docs.opencv.org/modules/core/doc/operations_on_arrays.html#cv2.inRange

HSVthresholder.py (6.38 kb)

Was that Imperial Red, Lust, or Crimson? ...I am pretty sure it was just red.

In which we find the H, S, and V values for an image's pixels.

Recently I have been working on an application that requires me to locate a small colored dot in an image or video frame.  In order to accomplish this, I have been using OpenCV.  For reasons outside the scope of this post, I am working with the images/frames in the HSV color space.  You can find more information about HSV in general and OpenCV's HSV implementation here [1].  There you will also find a nifty tool - ColorWheelHSV - to help you visualize OpenCV's HSV implementation.  However, if you are like me, you might find it difficult to determine proper H, S, and V values by eyeballing your target and the output of ColorWheelHSV.

Enter HSVpixelpicker.

The Python script below will allow you to determine the H, S, and V values of a given pixel in an image as OpenCV sees it.  Simply pass the image file name with -f on the command line, click on the pixel you are interested in, and the H, S, and V values will be printed to the console.  Using this tool, you can check the exact value of the pixel or pixels you are interested in.  The more samples you have to test, the better the idea you will get of the H, S, and V values you are dealing with.  You can then refer to ColorWheelHSV to select your ranges.

The code is copyright © 2013 Matthew Witherwax and released under the BSD license.

from optparse import OptionParser
import cv2
from cv2 import cv

class HSVpixelpicker:
    def __init__(self):
        # original image
        self.image = None
        # image converted to HSV
        self.imageHSV = None
        
        # create window to show image
        cv2.namedWindow('image')
        # wire click handler
        cv.SetMouseCallback('image', self.on_mouse, 0)
        
    # handles left clicking on the image
    # gets the pixel under the cursor and prints its HSV values
    def on_mouse(self, event, x, y, flag, param):
        if event == cv.CV_EVENT_LBUTTONDOWN:
            pixel = self.imageHSV[y][x]
            print 'H:', pixel[0], '\tS:', pixel[1], '\tV:', pixel[2]
        
    def open_image(self, filename):
        self.image = cv2.imread(filename)
        self.imageHSV = cv2.cvtColor(self.image,cv2.COLOR_BGR2HSV)
        
    def show(self, filename):
        self.open_image(filename)
        while True:
            cv2.imshow('image', self.image)
            
            # show image until user presses the esc key
            if cv.WaitKey(10) == 27:
                break
            
        # clean up
        cv2.destroyAllWindows()

if __name__ == "__main__":
    parser = OptionParser()
    parser.add_option('-f', '--file', action='store',
        type='string', dest='filename')
    (options, args) = parser.parse_args()

    hsv_picker = HSVpixelpicker()
    hsv_picker.show(options.filename)

[1] http://www.shervinemami.info/colorConversion.html

HSVpixelpicker.py (3.03 kb)