Warren Moore — Understanding CMTime

Understanding CMTime

[Image: a sign for the London Underground in the foreground of a shot of the world-famous Big Ben clock]

I recently opined on Twitter that the CMTimeMakeWithSeconds function (part of the Core Media framework in iOS and Mac OS X) behaves in very odd ways. This post is my attempt to explain the purpose of the CMTime structure, and how to work with the sometimes-confusing functions used to create and manipulate time objects.

A precise perspective on time

Most people never need to think very precisely about time. To take an extreme case, maybe you're an Olympic runner who cares about a difference of milliseconds between your time and the world record in the 100m dash. That's about as far as it goes. When it comes to timed media, however, we often care about very short increments of time (tens of microseconds) and sometimes very long durations (days or weeks).

Suppose we want to precisely specify a moment in a movie file, like 35:06. A naïve approach would be to represent time as some double-precision floating-point quantity, like this: NSTimeInterval t = 2106.0;. This is actually fine for most uses, but it breaks down when we care about very long stretches of time divided up into very small slices. Without going into the mechanics of floating-point formats, such a double can hold roughly 16 significant decimal digits in 8 bytes (on all of the platforms I currently care about, sizeof(NSTimeInterval) == sizeof(Float64) == sizeof(double) == 8).

Floating-point numbers suffer one big problem: repeated operations (addition, multiplication, etc.) cause the accumulation of imprecision which can result in noticeable divergences after a long period of time, potentially causing errors in synchronizing multiple media streams.

As a quick example, summing up one million terms of 0.000001 results in a value of about 1.0000000000079181. The error is caused by the fact that 1e-6 cannot be exactly represented in the floating-point format we use, so we instead use a binary approximation that differs in its less-significant bits. This error is not that big, but if you're running an HTTP streaming server, you might be accumulating this imprecision several thousand times per second for an indefinite period of time.
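To see this drift for yourself, here is a minimal C sketch. The function names accumulate_seconds and accumulate_ticks are made up for illustration and are not part of any Apple API. The first keeps the running total as a double; the second keeps the same tally as a whole number of microseconds.

```c
#include <stdint.h>

/* Adds `step` (in seconds) to a running total `count` times,
   the way a naive floating-point clock would advance. */
double accumulate_seconds(double step, long count)
{
    double total = 0.0;
    for (long i = 0; i < count; i++) {
        total += step;  /* each addition can round the result slightly */
    }
    return total;
}

/* Keeps the same tally as an integer number of ticks.
   Integer addition is exact as long as it does not overflow. */
int64_t accumulate_ticks(int64_t ticks_per_step, long count)
{
    int64_t total = 0;
    for (long i = 0; i < count; i++) {
        total += ticks_per_step;
    }
    return total;
}
```

Calling accumulate_seconds(0.000001, 1000000) on a typical IEEE 754 platform gives a total that is close to, but not exactly, 1.0, while accumulate_ticks(1, 1000000) gives exactly 1,000,000 microsecond ticks.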

This motivates us to find a way to represent time more precisely, by doing away with floating-point representations and their inherent imprecision (to say nothing of their hard-coded rounding behaviors).

Time as a rational number

There have been plenty of data structures used by Apple to represent time on the Mac and iOS platforms. With the introduction of iOS 4 and Mac OS X 10.7, a couple more were added: CMTime and CMTimeRange. Once you understand the former, the latter is pretty self-explanatory, so I won't spend any time discussing it here.

CMTime is really nothing more than a C struct with four members. It looks like this:

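typedef struct
{
   CMTimeValue    value;      // 64-bit integer: the numerator, a count of time units
   CMTimeScale    timescale;  // 32-bit integer: the denominator, units per second
   CMTimeFlags    flags;      // bit field: validity, rounding, infinity, indefinite
   CMTimeEpoch    epoch;      // 64-bit integer: tells apart otherwise-equal timelines
} CMTime;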

The remainder of this section will focus on value and timescale, but flags can take on some important values that will also merit a mention later. Various values of flags allow us to indicate that a timestamp is positive or negative infinity, or that it has been rounded as a result of some intermediate calculation. In this way, the struct is much more expressive than a single floating-point quantity could ever be, and this has numerous advantages.

Let us consider how time is actually expressed by the CMTime struct. It's very important to realize that value and timescale are both stored as integers: 64 bits and 32 bits respectively. It should be obvious from the previous discussion, but the reason value is stored as an integral value is to avoid the errors of the type we saw in the floating-point example. Also, by allocating an entire 64 bits to the numerator, we can represent 9 billion billion distinct positive values, up to 19 decimal digits, with no ambiguity, for each possible timescale value.

And what of the timescale value? It represents the number of "slices" each second is divided into. This matters because the precision of the CMTime object as a whole is limited by this quantity. For example, if timescale is 1, no timestamp smaller than 1 second can be represented by the object, and timestamps go in increments of one second. Similarly, if timescale is 1000, each second is subdivided into 1000 pieces, and the value member represents the number of milliseconds we want to signify.

How do you choose a sensible timescale to make sure you don't get truncated? Apple recommends a timescale of 600 for video, with the explanation that 600 is a multiple of the common video framerates (24, 25, and 30 FPS). You might want to crank this up to 60,000 or higher if you need sample-exact indexing on audio files. What's nice about those 64 bits in value is that you can still unambiguously represent nearly 4.9 million years in increments of 1/60,000th of a second (2^63 ticks divided by 60,000 ticks per second is about 1.5 × 10^14 seconds).
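The divisibility argument for 600 is easy to check. Here is a small C sketch; ticks_per_frame is a hypothetical helper written for this post, not a Core Media function. It reports how many ticks one frame occupies at a given timescale, or -1 when a frame doesn't land on a whole number of ticks.

```c
#include <stdint.h>

/* Number of ticks one frame occupies at `timescale` ticks per second,
   or -1 if the frame duration is not a whole number of ticks. */
int32_t ticks_per_frame(int32_t timescale, int32_t frames_per_second)
{
    if (frames_per_second <= 0 || timescale % frames_per_second != 0) {
        return -1;
    }
    return timescale / frames_per_second;
}
```

With a timescale of 600, a frame is 25 ticks at 24 FPS, 24 ticks at 25 FPS, and 20 ticks at 30 FPS. With a timescale of 1000, a 24 FPS frame has no exact representation, so the helper returns -1.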

The number of seconds represented by a timestamp is simply t.value / t.timescale, taken as a real-number division rather than C's integer division. You can use the CMTimeGetSeconds function to easily transform a CMTime into a Float64 in a way that maintains as much precision as possible.
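As a rough illustration of that conversion, here is a C sketch using a stand-in struct. SketchTime and sketch_get_seconds are hypothetical names; the real CMTimeGetSeconds also has to deal with flags and invalid times, which this sketch ignores.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two CMTime fields discussed above. */
typedef struct {
    int64_t value;      /* numerator: number of ticks */
    int32_t timescale;  /* denominator: ticks per second */
} SketchTime;

/* Seconds = value / timescale. Both operands are converted to double
   first so the fractional part is not lost to integer division. */
double sketch_get_seconds(SketchTime t)
{
    return (double)t.value / (double)t.timescale;
}
```

The 35:06 mark from earlier can be written as {2106, 1} or as {1263600, 600}; both convert to 2106.0 seconds. Half a second at a millisecond timescale is {500, 1000}.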

Not-so-convenient convenience functions

CMTimeGetSeconds may be a friendly and useful function, but its evil twin, CMTimeMakeWithSeconds, is anything but friendly to newcomers. Here's its signature:

CMTime CMTimeMakeWithSeconds (
   Float64 seconds,
   int32_t preferredTimeScale
);

The first couple of times I used this function, I could not make it do what I wanted. Now that we've gone through all the internals of CMTime, I hope it will be obvious to you where I went astray. I was trying to represent a quantity of 0.5 seconds, so I could get periodic callbacks from an AVPlayer object that was streaming an MP3. I tried to do this:

CMTime interval = CMTimeMakeWithSeconds(0.5, 1);

If you've followed along so far, you already know that interval actually represents 0, not 0.5. It doesn't make sense to ask for a quantity that represents one-half on a number line consisting only of whole numbers, but that's exactly what we're asking for here. Our value gets truncated (rounded toward zero) because our timescale isn't precise enough to represent it.
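The C sketch below models the behavior just described. It is not Apple's implementation: SketchTime and sketch_make_with_seconds are hypothetical, and the truncation simply follows the description in this post. The real function also sets flags and guards against overflow.

```c
#include <stdint.h>

/* Hypothetical stand-in for the value and timescale fields of CMTime. */
typedef struct {
    int64_t value;
    int32_t timescale;
} SketchTime;

/* Scales `seconds` by the timescale and truncates toward zero,
   following the behavior described in the text (an assumption). */
SketchTime sketch_make_with_seconds(double seconds, int32_t preferred_timescale)
{
    SketchTime t;
    /* The cast to an integer type discards the fractional part. */
    t.value = (int64_t)(seconds * (double)preferred_timescale);
    t.timescale = preferred_timescale;
    return t;
}
```

With a timescale of 1, half a second collapses to a value of 0. With a timescale of 2 it becomes 1/2, and with 600 it becomes 300/600, so picking a timescale fine enough for the quantity you care about is what keeps it intact.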

The Core Media API includes a full complement of functions for constructing, comparing, and performing arithmetic on CMTime. Although performing arithmetic operations on CMTime using these functions is verbose, it is essential to use them if you want to retain precision and integrity. Each of these functions performs special overflow checks and rounding and will set the appropriate flags on the CMTime objects when the need arises.

To add two CMTimes, use the function CMTimeAdd. To compare two times, use CMTimeCompare (the return value of this function follows the convention of comparator functions as used by the C standard library function qsort). To find the larger of two times, use CMTimeMaximum. Read the documentation for plenty of others; this set of functions is quite comprehensive.
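To get a feel for why rational arithmetic stays exact, here is a simplified C sketch of addition and comparison. SketchTime, sketch_add, and sketch_compare are hypothetical, and the approach (cross-multiplying to a common timescale) is my own simplification rather than how Core Media does it. It assumes positive timescales and performs none of the overflow checks or flag handling that the real functions provide.

```c
#include <stdint.h>

/* Hypothetical stand-in for the value and timescale fields of CMTime. */
typedef struct {
    int64_t value;
    int32_t timescale;
} SketchTime;

/* Adds two times by expressing both in a common timescale
   (the product of the two). No overflow checks are performed. */
SketchTime sketch_add(SketchTime a, SketchTime b)
{
    SketchTime sum;
    sum.timescale = a.timescale * b.timescale;
    sum.value = a.value * b.timescale + b.value * a.timescale;
    return sum;
}

/* qsort-style comparison: returns -1, 0, or 1.
   Cross-multiplying compares the two fractions without dividing. */
int sketch_compare(SketchTime a, SketchTime b)
{
    int64_t lhs = a.value * (int64_t)b.timescale;
    int64_t rhs = b.value * (int64_t)a.timescale;
    return (lhs > rhs) - (lhs < rhs);
}
```

Here 1/2 and 300/600 compare as equal, and adding them gives 1200/1200, which compares as equal to 1/1. No floating-point value is ever involved.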

Indefinites and infinities: set your freak flag high

The final topic relating to CMTime is the representation of infinities, indefinite values, and rounding. An "indefinite" time is one whose value is unknown. You might receive such a timestamp from an object if you ask for its elapsed time before it's been initialized. The flags value can be checked for each of the following to determine if any of these exceptional situations applies:

kCMTimeFlags_PositiveInfinity
kCMTimeFlags_NegativeInfinity
kCMTimeFlags_Indefinite

Since bitwise AND'ing doesn't appeal to most people, there are useful macros for testing for the presence of these flags. Respectively:

CMTIME_IS_POSITIVE_INFINITY
CMTIME_IS_NEGATIVE_INFINITY
CMTIME_IS_INDEFINITE

Helpfully, these macros also ensure that the timestamp in question is, in fact, valid. They evaluate to NO if the timestamp isn't valid in the first place.
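The check these macros perform can be sketched in C as follows. The SKETCH_ names and their bit positions are illustrative assumptions, not the real constants; in real code you would use the macros listed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits (hypothetical names and values). */
enum {
    SKETCH_FLAG_VALID             = 1 << 0,
    SKETCH_FLAG_POSITIVE_INFINITY = 1 << 2,
    SKETCH_FLAG_NEGATIVE_INFINITY = 1 << 3,
    SKETCH_FLAG_INDEFINITE        = 1 << 4
};

/* A flag is present when AND'ing it with the flags word is non-zero. */
bool sketch_is_valid(uint32_t flags)
{
    return (flags & SKETCH_FLAG_VALID) != 0;
}

/* Mirrors the behavior described above: the test only succeeds
   when the time is valid in the first place. */
bool sketch_is_positive_infinity(uint32_t flags)
{
    return sketch_is_valid(flags)
        && (flags & SKETCH_FLAG_POSITIVE_INFINITY) != 0;
}

bool sketch_is_indefinite(uint32_t flags)
{
    return sketch_is_valid(flags)
        && (flags & SKETCH_FLAG_INDEFINITE) != 0;
}
```

A flags word with only the infinity bit set, but not the valid bit, fails the check, which is the "evaluate to NO" case.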

Also, here's a laundry list of how comparisons amongst different categories of times work:

Invalid times are considered to be equal to other invalid times, and greater than any other time.
Positive infinity is considered to be less than any invalid time, equal to itself, and greater than any other time.
An indefinite time is considered to be less than any invalid time, less than positive infinity, equal to itself, and greater than any other time.
Negative infinity is considered to be equal to itself, and less than any other time.

Now you know.
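One way to keep that ordering straight is to rank the categories from smallest to largest. The C sketch below does exactly that; SketchCategory and sketch_compare_categories are hypothetical names, and this covers only the category-level ordering, not the comparison of two ordinary numeric times.

```c
/* Hypothetical categories, listed from smallest to largest
   according to the ordering described above. */
typedef enum {
    SKETCH_NEGATIVE_INFINITY = 0,
    SKETCH_NUMERIC,
    SKETCH_INDEFINITE,
    SKETCH_POSITIVE_INFINITY,
    SKETCH_INVALID
} SketchCategory;

/* Returns -1, 0, or 1. A result of 0 for two numeric times only means
   "same category"; their actual values would still need comparing. */
int sketch_compare_categories(SketchCategory a, SketchCategory b)
{
    return (a > b) - (a < b);
}
```

Read top to bottom, the enum is the whole laundry list: negative infinity, then ordinary times, then indefinite, then positive infinity, then invalid.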

Applications

I've run out of space for describing where you'll actually be using CMTime objects. But the odds are pretty good that if you're reading this, you already have a pretty good idea what you want to do with them. AVFoundation objects such as AVPlayer and AVAssetReader use CMTime to talk about time, and there's plenty more where that came from.

If you have suggestions or corrections that would improve this post, please email me. I want this post to be everything I would have hoped for back when I was grappling with this stuff for the first time, so your perspective matters to me.

Translations

This article has been translated into Russian by Victor Grushevskiy: О применении CMTime. Thanks for the translation, Victor!

warrenm

Warren Moore is a Cocoa developer and occasional trainer, speaker, and blogger.

PUBLISHED MAY 23, 2012