FFmpeg: convert RGB(A) to YUV

I recently needed to convert a series of rendered images generated in an application into a movie file. The images are rasters of raw 32-bit RGBA values. A typical solution would be to dump the images to disk and then use a command-line program, such as ffmpeg, to convert them to the desired movie format (in this case, MPEG-2).

In my particular usage scenario, this simple solution was not an option for various reasons (nearly unbounded disk usage, user interface issues, etc.). Another option is to use the FFmpeg API to encode each frame’s raw RGBA data and dump it to a movie file. Unfortunately, I was unable to find a codec that would directly convert from the raw data to the desired movie format.

A web search turned up a potential solution: http://stackoverflow.com/questions/16667687/how-to-convert-rgb-from-yuv420p-for-ffmpeg-encoder

It turns out you can convert RGB or RGBA data into YUV using FFmpeg itself (via its libswscale library), and the resulting YUV frames can then be handed to the encoder and written to a file. The basics are just a few lines: first, create an SwsContext that specifies the image size, and the source and destination data formats:

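#include <libavcodec/avcodec.h>
#include <libswscale/swscale.h>

// Find the MPEG-2 encoder and allocate a context for it
const AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_MPEG2VIDEO);
AVCodecContext *c = avcodec_alloc_context3(codec);
// ...set up c's params
AVFrame *frame = av_frame_alloc();
// ...set up frame's params and allocate image buffer
// Same size in and out: only the pixel format changes.
// The last four arguments are flags, srcFilter, dstFilter, and param.
struct SwsContext *ctx = sws_getContext(c->width, c->height,
                                        AV_PIX_FMT_RGBA,
                                        c->width, c->height,
                                        AV_PIX_FMT_YUV420P,
                                        0, NULL, NULL, NULL);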
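To make the "allocate image buffer" step a little more concrete, here is a minimal sketch of how much memory a YUV420P image needs. It assumes tightly packed planes with no row padding; the Yuv420pLayout type and yuv420p_layout function are hypothetical names used for illustration and are not part of FFmpeg. In real code you would normally let FFmpeg allocate the buffer (for example with av_image_alloc), and it may pad rows, so frame->linesize can be larger than the plane widths computed here.

```c
#include <stddef.h>

/* Hypothetical helper, not part of FFmpeg: plane sizes for a tightly
 * packed YUV420P image (assumes no padding at the end of rows). */
typedef struct {
    size_t y_size;   /* luma plane: one byte per pixel */
    size_t uv_size;  /* each chroma plane: one byte per 2x2 pixel block */
    size_t total;    /* Y plane plus the U and V planes */
} Yuv420pLayout;

static Yuv420pLayout yuv420p_layout(int width, int height)
{
    /* Chroma dimensions are half the luma dimensions, rounded up
     * so that odd widths and heights are still covered. */
    size_t cw = ((size_t)width + 1) / 2;
    size_t ch = ((size_t)height + 1) / 2;

    Yuv420pLayout layout;
    layout.y_size  = (size_t)width * (size_t)height;
    layout.uv_size = cw * ch;
    layout.total   = layout.y_size + 2 * layout.uv_size;
    return layout;
}
```

For a 640x480 frame this gives 307200 bytes of luma plus two 76800-byte chroma planes, or 1.5 bytes per pixel, compared with 4 bytes per pixel for the RGBA input.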

And then apply the conversion to each RGBA frame (the rgba32Data pointer) as it’s generated:

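// RGBA is a packed format, so there is a single input plane;
// its linesize is the number of bytes per row
const uint8_t *inData[1]     = { rgba32Data };
int            inLinesize[1] = { 4 * c->width };
sws_scale(ctx, inData, inLinesize, 0, c->height,
          frame->data, frame->linesize);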
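For a rough sense of what the conversion computes for each pixel, here is a sketch of the widely used 8-bit fixed-point approximation of the BT.601 limited-range RGB-to-YUV formulas. This is illustrative only: rgb_to_yuv601 is a hypothetical name, and libswscale's own coefficients, rounding, and chroma handling may differ. For YUV420P output, the U and V values are additionally subsampled so that each 2x2 block of pixels shares one U and one V sample.

```c
#include <stdint.h>

/* Illustrative only, not libswscale's implementation: convert one RGB
 * pixel to BT.601 limited-range YUV using the common 8-bit fixed-point
 * approximation. The offsets (16 for Y, 128 for U and V) are added
 * before the shift so every intermediate value stays non-negative. */
static void rgb_to_yuv601(uint8_t r, uint8_t g, uint8_t b,
                          uint8_t *y, uint8_t *u, uint8_t *v)
{
    int ri = r, gi = g, bi = b;

    *y = (uint8_t)(( 66 * ri + 129 * gi +  25 * bi + 128 + (16  << 8)) >> 8);
    *u = (uint8_t)((-38 * ri -  74 * gi + 112 * bi + 128 + (128 << 8)) >> 8);
    *v = (uint8_t)((112 * ri -  94 * gi -  18 * bi + 128 + (128 << 8)) >> 8);
}
```

With these formulas, black maps to (16, 128, 128) and white to (235, 128, 128), which is why this range is called "limited": luma occupies 16 to 235 rather than the full 0 to 255.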

One important point to note: if your input data has padding at the end of each row, be sure to set inLinesize to the actual number of bytes per row (the stride), not simply 4 times the width of the image.
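As a sketch of the padded case, suppose your renderer rounds each row up to a multiple of some alignment. That is an assumption about your image source, not something FFmpeg requires, and padded_linesize is a hypothetical helper rather than an FFmpeg function. The value to pass as the line size would then be computed like this:

```c
/* Hypothetical helper, not part of FFmpeg: bytes per row when each row
 * is padded up to a multiple of `align` bytes.
 * `align` must be a power of two (e.g. 4, 16, 32). */
static int padded_linesize(int width, int bytes_per_pixel, int align)
{
    int raw = width * bytes_per_pixel;      /* bytes of actual pixel data */
    return (raw + align - 1) & ~(align - 1); /* round up to the alignment */
}
```

For example, a 101-pixel-wide RGBA image holds 404 bytes of pixel data per row; with 64-byte row alignment the line size to pass is 448, not 404. If your source reports its own stride (many image and graphics APIs do), use that value directly instead of recomputing it.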

If you’re familiar with the FFmpeg API, this info should be sufficient to get you going. The FFmpeg API is quite extensive and a bit arcane, and even something as functionally simple as dumping animation frames to a movie file is not completely trivial. Fortunately, the FFmpeg folks have provided some nice example files, including one that demonstrates some basic audio and video encoding and decoding: https://www.ffmpeg.org/doxygen/2.1/decoding__encoding_8c.html

I took the source for the video encoding function and hacked it up to incorporate the required RGBA-to-YUV conversion. The code performs all the steps needed to set up and use the FFmpeg API, start to finish, to convert a sequence of raw RGBA data to a movie file. As with the original version of the code, it synthesizes each frame’s data (an animated ramp image) and dumps it to a file. It should be easy to change the code to use real image data generated in your application. I’ve made this available on GitHub at:

https://github.com/codefromabove/FFmpegRGBAToYUV

For Mac programmers, I’ve included an Xcode 6 project that creates a single-button Cocoa app. The non-app code is separated out cleanly, so it should be easy for Linux or Windows users to make use of it.

Other input formats

The third argument to sws_getContext describes the format/packing of your data. There are a huge number of formats defined in FFmpeg (see pixfmt.h), so if your raw data is not RGBA you shouldn’t have to change how your image is generated. Be sure to compute the correct line size in bytes (inLinesize in the code snippets) when you change the input format specification. I haven’t checked which input formats sws_scale supports (all, most, just a few?). libswscale provides sws_isSupportedInput() to report whether a given pixel format can be used as input, so it would be wise to check your format or do a little experimentation.

For example, if your data is packed 24-bit RGB, and not 32-bit RGBA, then the code would look like this:

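// Only the source pixel format changes from the RGBA version
struct SwsContext *ctx = sws_getContext(c->width, c->height,
                                        AV_PIX_FMT_RGB24,
                                        c->width, c->height,
                                        AV_PIX_FMT_YUV420P,
                                        0, NULL, NULL, NULL);

// Packed RGB24 uses 3 bytes per pixel, so the line size shrinks to match
const uint8_t *inData[1]     = { rgb24Data };
int            inLinesize[1] = { 3 * c->width };
sws_scale(ctx, inData, inLinesize, 0, c->height,
          frame->data, frame->linesize);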