AV Foundation: Saving a Sequence of Raw RGB Frames to a Movie

An application may generate a sequence of images that are intended to be viewed as a movie, outside of that application. These images may be created by, say, a software 3D renderer, a procedural texture generator, etc. In a typical OS X application, these images may be in the form of a CGImage or NSImage, and in such cases there are a variety of approaches for dumping those objects to a movie. However, in some cases the image is stored simply as an array of RGB (or ARGB) values. This post discusses how to create a movie from a sequence of such “raw” (A)RGB frames.

Credit is due to the very helpful respondents on this Apple Developer Forum thread. I’ve posted an Xcode project with the complete test app on github.

Creating a movie from raw RGB data in QuickTime (i.e., with QTKit) is relatively simple. However, QTKit has been deprecated in favor of AV Foundation. AV Foundation is quite sophisticated, and any thorough treatment is far beyond the scope of this post. So, I’ll focus on the nuts and bolts of this particular problem, assuming the reader is already familiar with the classes we’ll be using, or will go elsewhere for more information.

Preliminaries

To write data to a movie file, we’ll need three AV Foundation objects. We’ll assume our per-frame data is described by a data pointer, a width, a height, and a count of the bytes per row.

@property (strong) AVAssetWriter                        *assetWriter;
@property (strong) AVAssetWriterInput                   *assetInput;
@property (strong) AVAssetWriterInputPixelBufferAdaptor *assetInputAdaptor;
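
For concreteness, that per-frame description could be bundled in a small struct like the one below. This is purely illustrative; the struct and its name aren’t used in the code that follows:

// Illustrative only: one way to describe a single raw (A)RGB frame.
typedef struct {
    const UInt8 *data;         // pointer to the pixel bytes
    size_t       width;        // frame width, in pixels
    size_t       height;       // frame height, in pixels
    size_t       bytesPerRow;  // typically width * 4 for 32-bit pixels
} RawFrameDescription;
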
Setting Up

We’ll set up our code to create an MPEG-4 container with a JPEG codec, and get ready to start dumping each frame to the movie:

NSDictionary *outputSettings = @{ AVVideoCodecKey :AVVideoCodecJPEG,
                                  AVVideoWidthKey :@(VID_WIDTH),
                                  AVVideoHeightKey:@(VID_HEIGHT) };

_assetInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo 
                                                 outputSettings:outputSettings];

NSDictionary *bufferAttributes = @{ (NSString*)kCVPixelBufferPixelFormatTypeKey:@(kCVPixelFormatType_32ARGB) };

_assetInputAdaptor = [AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:_assetInput
                                                                                      sourcePixelBufferAttributes:bufferAttributes];

_assetWriter = [AVAssetWriter assetWriterWithURL:[NSURL fileURLWithPath:@"/tmp/test.mov"] 
                                        fileType:AVFileTypeMPEG4 
                                           error:nil];

[_assetWriter addInput:_assetInput];
[_assetWriter startWriting];
[_assetWriter startSessionAtSourceTime:kCMTimeZero];

This is boilerplate asset-writing code: we first create a (minimal) dictionary for the codec type and image dimensions. We then create the three AV Foundation objects necessary for writing: a writer input, an input adaptor, and the asset writer itself. The final three lines hook the writer input up to the writer and get it ready to start writing the (A)RGB data to the movie file.
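
One caveat about the snippet above: it ignores errors (it passes nil for the error: parameter and discards -startWriting’s return value). If something goes wrong, the writer’s status and error properties will tell you why. A minimal sketch of the kind of check you might add in place of the bare -startWriting call (not part of the original code):

// Hedged sketch: -startWriting returns NO on failure, and the writer's
// error property explains why (bad output settings, unwritable URL, etc.).
if (![_assetWriter startWriting]) {
    NSLog(@"Could not start writing: %@", _assetWriter.error);
    return;
}
[_assetWriter startSessionAtSourceTime:kCMTimeZero];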

Exporting Each Frame from Raw Data

In an application you might have an array of raw frame data already available, or the data might instead be retrieved or generated on the fly. In the former case, you might have a loop that runs over the raw data frames; in the latter, you might have a function that’s called to dump each frame as it’s generated or otherwise made available. In this simple example, we’ll just do a loop, each time synthesizing some “fake” frame data. Note that we’re allocating our data on the heap, to avoid any possibility of stack allocation limitations:

for (int i = 0; i < 500; i++) {
    long size = VID_WIDTH * VID_HEIGHT * 4;
    UInt8 *data = new UInt8[size];
    memset(data, i % 255, size);

    // write this frame out to the movie file...
}

In a real-world app, the data variable would instead come from, say, an array, or be read from a file on disk, or be passed as a pointer to a function that saves out one frame. The heart of the solution is filling in that “write this frame out” functionality.
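
If your frames arrive one at a time from a producer rather than in a loop, that same functionality can live in a small method the producer calls once per frame. A sketch of the shape such a method might take (the name and signature here are hypothetical):

// Hypothetical per-frame entry point; its body is exactly the
// "write this frame out" code developed below.
- (void)writeFrameBytes:(const UInt8 *)bytes
            bytesPerRow:(NSInteger)bytesPerRow
             frameIndex:(int64_t)frameIndex
{
    // 1. Copy `bytes` into a CFData so the caller can reuse its buffer.
    // 2. Wrap the copy in a CVPixelBuffer via CVPixelBufferCreateWithBytes.
    // 3. Wait until the writer input is ready, then append the buffer
    //    with presentation time CMTimeMake(frameIndex, 30).
}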

Before addressing that issue, we’ll complete our movie-file-writing by noting that once the last frame is written, we can simply call:

[_assetWriter finishWritingWithCompletionHandler:^{}];
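
The empty completion block is fine if the app simply keeps running, but the handler is the only reliable signal that the file has actually been finalized. If you need to block until then (say, in a command-line tool), one possible approach, sketched here and not part of the original code, is yet another semaphore:

// Sketch: block the calling thread until the writer has finished the file.
dispatch_semaphore_t finishedSemaphore = dispatch_semaphore_create(0);
[_assetWriter finishWritingWithCompletionHandler:^{
    dispatch_semaphore_signal(finishedSemaphore);
}];
dispatch_semaphore_wait(finishedSemaphore, DISPATCH_TIME_FOREVER);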

In any case, we’ll be handing each frame to AV Foundation as a CVPixelBuffer object, so we need a function that lets us create such an object from our raw data raster. Fortunately, we can use CVPixelBufferCreateWithBytes to do this rather directly:

// write this frame out to the movie file...

CVPixelBufferRef pixelBuffer;
NSInteger        totalBytes = VID_HEIGHT * bytesPerRow;
CFDataRef        bufferData = CFDataCreate(NULL, data, totalBytes);

delete [] data;

CVPixelBufferCreateWithBytes(kCFAllocatorSystemDefault,
                             VID_WIDTH, VID_HEIGHT,
                             k32ARGBPixelFormat,
                             (void *)CFDataGetBytePtr(bufferData),
                             bytesPerRow,
                             ReleaseCVPixelBufferForCVPixelBufferCreateWithBytes,
                             (void*)bufferData,
                             NULL,
                             &pixelBuffer);

[_assetInputAdaptor appendPixelBuffer:pixelBuffer 
                 withPresentationTime:CMTimeMake(i, 30)];

CFRelease(pixelBuffer);  // the adaptor retains the buffer, so we can drop our reference

The CVPixelBufferCreateWithBytes function simply takes the pixel format, the image dimensions, and the image data, and makes us a CVPixelBuffer. We then hand it off to our input adaptor (releasing our own reference to the buffer), and we’re done with this frame. Note that we’re using the ARGB pixel format (k32ARGBPixelFormat) rather than RGBA; the reason is that Core Video does not support RGBA byte order.
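
If your frames happen to arrive in RGBA order instead, you’ll need to swizzle them into ARGB before wrapping them in a pixel buffer. A simple, unoptimized sketch, assuming tightly packed 32-bit pixels (this helper is not part of the original project):

// In-place RGBA -> ARGB swizzle; purely illustrative and unoptimized.
static void SwizzleRGBAToARGB(UInt8 *pixels, long pixelCount) {
    for (long p = 0; p < pixelCount; p++) {
        UInt8 *px    = pixels + p * 4;
        UInt8  alpha = px[3];
        px[3] = px[2];   // B moves to the last byte
        px[2] = px[1];   // G
        px[1] = px[0];   // R
        px[0] = alpha;   // A moves to the first byte
    }
}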

Handling Asynchrony

Ignoring our synthesizing of the “input data” as the pedagogical crutch it is, you might object that I’ve stuck in a seemingly “extra” step of first creating a CFData object, which now holds a copy of the frame data. Why copy the input data? The reason is that -appendPixelBuffer kicks off an asynchronous activity: it does not simply do its work in the current thread and then return when it’s done.

We need to be sure that the data pointer remains valid until that activity has completed. Of course, we’ll also need some way to free up that (copied) data once the activity is done. Indeed, CVPixelBufferCreateWithBytes provides a mechanism for just that: the seventh parameter, which in this example is the callback function ReleaseCVPixelBufferForCVPixelBufferCreateWithBytes. The parameter immediately following it lets us specify what needs to be released (the CFData object, in this case). The callback function is defined like this:

void ReleaseCVPixelBufferForCVPixelBufferCreateWithBytes(void *releaseRefCon, 
                                                         const void *baseAddr) {
    CFDataRef bufferData = (CFDataRef)releaseRefCon;
    CFRelease(bufferData);
}

Now, if our per-frame data pointer is guaranteed to remain valid for the entire duration of the movie-creation process, we could simply pass the raw data pointer to CVPixelBufferCreateWithBytes and provide no release callback at all. This would be the case if, for example, we had all the frames in an array from the start.
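
In that case the call simplifies to something like the following sketch, which assumes frameData points at memory that remains valid for the whole writing session:

// Sketch: frameData is owned elsewhere and outlives the asset writer,
// so Core Video can use it in place with no copy and no release callback.
CVPixelBufferRef pixelBuffer;
CVPixelBufferCreateWithBytes(kCFAllocatorSystemDefault,
                             VID_WIDTH, VID_HEIGHT,
                             k32ARGBPixelFormat,
                             frameData,
                             bytesPerRow,
                             NULL,        // no release callback...
                             NULL,        // ...and no refcon for it
                             NULL,
                             &pixelBuffer);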

Because writing a frame with -appendPixelBuffer is asynchronous, there’s a possibility that the writer input won’t be finished with its activity when we’re ready to send the next frame down. The writer input’s readiness is tracked by its “readyForMoreMediaData” property, so we could simply spin in the current thread until that property becomes true:

while (!_assetInput.isReadyForMoreMediaData) {
    usleep(1000);
}

This certainly works, but it lacks a certain elegance. Instead, we can use key-value observing to track this, in conjunction with a semaphore. The idea is that the calling thread should wait on the semaphore until “readyForMoreMediaData” becomes true. We’ll need to define a semaphore and a helper variable:

@property          BOOL                  isWaitingForInputReady;
@property (strong) dispatch_semaphore_t  writeSemaphore;

and initialize them at the beginning of our setup code, adding an observer for “readyForMoreMediaData”:

_isWaitingForInputReady = NO;
_writeSemaphore = dispatch_semaphore_create(0);

[_assetInput addObserver:self 
              forKeyPath:@"readyForMoreMediaData" 
                 options:0 
                 context:NULL];

So, instead of napping in a loop, we can wait for the writer input to signal us that it’s ready:

if (!_assetInput.isReadyForMoreMediaData) {
    _isWaitingForInputReady = YES;
    dispatch_semaphore_wait(_writeSemaphore, DISPATCH_TIME_FOREVER);
}

Our key-value observer function is this:

- (void)observeValueForKeyPath:(NSString *)keyPath
                      ofObject:(id)object
                        change:(NSDictionary *)change
                       context:(void *)context
{
    if ([keyPath isEqualToString:@"readyForMoreMediaData"]) {
        if (_isWaitingForInputReady && _assetInput.isReadyForMoreMediaData) {
            _isWaitingForInputReady = NO;
            dispatch_semaphore_signal(_writeSemaphore);
        }
    }
}

This simply checks whether the primary thread is waiting for the writer input to become ready again, and if so, signals the semaphore. The primary thread can then resume execution, hand its data off to the input adaptor, and iterate over the next frame (or return from the function and wait to be called again on the next frame).
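
One housekeeping detail: once writing is finished, we should also remove ourselves as an observer of the writer input. The full listing below does this immediately after the call to finish writing:

[_assetInput removeObserver:self 
                 forKeyPath:@"readyForMoreMediaData"];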

The Final Code

#import "AppDelegate.h"
#import <AVFoundation/AVFoundation.h>

#define VID_WIDTH  2048
#define VID_HEIGHT 1024

@interface AppDelegate ()
@property (weak) IBOutlet NSWindow                      *window;

@property (strong) NSURL                                *writeURL;
@property (strong) AVAssetWriter                        *assetWriter;
@property (strong) AVAssetWriterInput                   *assetInput;
@property (strong) AVAssetWriterInputPixelBufferAdaptor *assetInputAdaptor;
@property          BOOL                                  isWaitingForInputReady;
@property (strong) dispatch_semaphore_t                  writeSemaphore;
@end

@implementation AppDelegate

- (void)applicationDidFinishLaunching:(NSNotification *)aNotification {
    // Insert code here to initialize your application
}

- (void)applicationWillTerminate:(NSNotification *)aNotification
{
    // Insert code here to tear down your application
}

- (BOOL)applicationShouldTerminateAfterLastWindowClosed:(NSApplication *)theApplication {
    return YES;
}

- (void)observeValueForKeyPath:(NSString *)keyPath
                      ofObject:(id)object
                        change:(NSDictionary *)change
                       context:(void *)context {
    if ([keyPath isEqualToString:@"readyForMoreMediaData"]) {
        if (_isWaitingForInputReady && _assetInput.isReadyForMoreMediaData) {
            _isWaitingForInputReady = NO;
            dispatch_semaphore_signal(_writeSemaphore);
        }
    }
}

void ReleaseCVPixelBufferForCVPixelBufferCreateWithBytes(void *releaseRefCon, 
                                                         const void *baseAddr) {
    CFDataRef bufferData = (CFDataRef)releaseRefCon;
    CFRelease(bufferData);
}

- (IBAction)testVideo:(id)sender {
    _isWaitingForInputReady = NO;
    _writeSemaphore = dispatch_semaphore_create(0);

    _writeURL = [NSURL fileURLWithPath:@"/tmp/test.mov"];

    [[NSFileManager defaultManager] removeItemAtPath:[_writeURL path] error:NULL];
    
    NSDictionary *outputSettings = @{AVVideoCodecKey :AVVideoCodecJPEG,
                                     AVVideoWidthKey :@(VID_WIDTH),
                                     AVVideoHeightKey:@(VID_HEIGHT)};
    _assetInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo 
                                                     outputSettings:outputSettings];

    [_assetInput addObserver:self 
                  forKeyPath:@"readyForMoreMediaData" 
                     options:0
                     context:NULL];

    // Create the asset input adapter
    NSDictionary *bufferAttributes = @{ (NSString*)kCVPixelBufferPixelFormatTypeKey:@(kCVPixelFormatType_32ARGB) };

    _assetInputAdaptor = [AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:_assetInput
                                                                                          sourcePixelBufferAttributes:bufferAttributes];

    // Create the asset writer
    _assetWriter = [AVAssetWriter assetWriterWithURL:_writeURL 
                                            fileType:AVFileTypeMPEG4 
                                               error:nil];

    [_assetWriter addInput:_assetInput];
    [_assetWriter startWriting];
    [_assetWriter startSessionAtSourceTime:kCMTimeZero];

    for (int i = 0; i < 500; i++) {
        CVPixelBufferRef pixelBuffer;

        long size   = VID_WIDTH * VID_HEIGHT * 4;
        UInt8 *data = new UInt8[size];
        memset(data, i % 255, size);

        NSInteger samplesPerPixel = 4;
        NSInteger bytesPerRow     = VID_WIDTH  * samplesPerPixel;
        NSInteger totalBytes      = VID_HEIGHT * bytesPerRow;

        CFDataRef bufferData = CFDataCreate(NULL, data, totalBytes);
        delete [] data;

        CVPixelBufferCreateWithBytes(kCFAllocatorSystemDefault,
                                     VID_WIDTH,
                                     VID_HEIGHT,
                                     k32ARGBPixelFormat,
                                     (void *)CFDataGetBytePtr(bufferData),
                                     bytesPerRow,
                                     ReleaseCVPixelBufferForCVPixelBufferCreateWithBytes,
                                     (void*)bufferData,
                                     NULL,
                                     &pixelBuffer);

        if (!_assetInput.isReadyForMoreMediaData) {
            _isWaitingForInputReady = YES;
            dispatch_semaphore_wait(_writeSemaphore, DISPATCH_TIME_FOREVER);
        }

        [_assetInputAdaptor appendPixelBuffer:pixelBuffer 
                         withPresentationTime:CMTimeMake(i, 30)];

        CFRelease(pixelBuffer);
    }

    if (!_assetInput.isReadyForMoreMediaData) {
        _isWaitingForInputReady = YES;
        dispatch_semaphore_wait(_writeSemaphore, DISPATCH_TIME_FOREVER);
    }

    [_assetWriter finishWritingWithCompletionHandler:^{}];
    [_assetInput removeObserver:self 
                     forKeyPath:@"readyForMoreMediaData"];
}
