Moving the Image

Some sort of “playable” table of contents.

For the past three years, I have been obsessed with the potential of computer interfaces to fundamentally change our relationship with the moving image.

Producing and distributing video has never been more accessible to more people. And alongside the boom in consumer video hardware and free or low-cost networked distribution platforms, consumers of video have a plethora of free, ad-supported, subscription, à-la-carte, and illicit methods for obtaining huge troves of video. While the Internet has produced some genuinely novel uses of video—e.g. the very short and potentially transient forms—I believe that the design of interfaces for video distribution is largely stagnant and cynical. Stagnant, in aspiring to be little more than an upgrade to on-demand TV, and cynical in seeing viewers primarily as a lucrative market segment rather than as genuine co-participants in the creation of meaning.

There are two essential modes of control over the passage of time in a video:

  1. relative adjustments, a hallmark of mechanical devices that can spin faster and slower, forwards and back; and
  2. absolute (random-access, non-linear) controls, an affordance that digital media are uniquely able to offer.

The combination of the two modalities is simple and powerful; I find the utter absence of the former from the vast majority of video players inexplicable.

Simple player with drag/swipe relative adjustments, and full-width absolute seekbar.
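
To make the two modes concrete, here is a minimal sketch, in browser TypeScript, of how such a player might wire them together. It assumes a page with a `<video>` element and a full-width range input; the element ids and the drag sensitivity are illustrative choices of mine, not taken from the demo above.

```ts
// A minimal sketch, not the demo's actual source: relative drag-scrubbing
// plus an absolute seekbar on a standard HTML5 video element.
const video = document.querySelector<HTMLVideoElement>("#player")!;
const seekbar = document.querySelector<HTMLInputElement>("#seekbar")!;

const PIXELS_PER_SECOND = 100; // drag sensitivity (assumed value)

// Relative control: dragging nudges time forward and back, like
// spinning a mechanical reel faster or slower.
let dragging = false;
let lastX = 0;

video.addEventListener("pointerdown", (e) => {
  dragging = true;
  lastX = e.clientX;
  video.pause();
});

window.addEventListener("pointermove", (e) => {
  if (!dragging) return;
  video.currentTime += (e.clientX - lastX) / PIXELS_PER_SECOND;
  lastX = e.clientX;
});

window.addEventListener("pointerup", () => {
  dragging = false;
});

// Absolute control: the full-width seekbar is random-access.
seekbar.addEventListener("input", () => {
  video.currentTime = (Number(seekbar.value) / 100) * video.duration;
});

// Keep the seekbar honest while the video plays.
video.addEventListener("timeupdate", () => {
  seekbar.value = String((video.currentTime / video.duration) * 100);
});
```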

In the sections that follow, I will document some of my attempts to provide the viewer with control over the illusion that is central to the moving picture, and I will speculate about blurring the distinction between producing and consuming media.

Control over an Illusion

Let's start with our example player from above and call that the illusion of motion; this illusion is constructed out of a reality we'll call the bucket of frames.

Two-pane player + frame-bucket, with scaling transitions between and a check-box to disable the connection between video and bucket.

We rarely see the “bucket of frames” for what it is, and that's probably for the best. Disable the connection and note how tedious it is to comprehend the video without reference to the “illusion.” I am not setting up a simple dichotomy where illusion is bad and control is good. Without the illusion of motion, seeing a video for what it “really” is ironically makes it harder to understand—nearly impenetrable.
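
For the curious, the frame-bucket pane needs surprisingly little machinery. A hedged sketch, assuming a `<video>` element with metadata already loaded and a container for the grid (names and defaults are mine):

```ts
// Seek through the video at a fixed interval and stamp each frame into
// a grid of small canvases: the "bucket of frames" laid bare.
async function bucketOfFrames(
  video: HTMLVideoElement,
  container: HTMLElement,
  intervalSeconds = 1,
  thumbWidth = 160,
): Promise<void> {
  const thumbHeight = Math.round(
    thumbWidth * (video.videoHeight / video.videoWidth),
  );
  // Sample mid-interval (this also avoids a no-op seek to exactly 0).
  for (let t = intervalSeconds / 2; t < video.duration; t += intervalSeconds) {
    video.currentTime = t;
    // Wait for the seek to complete before grabbing pixels.
    await new Promise<void>((resolve) =>
      video.addEventListener("seeked", () => resolve(), { once: true }),
    );
    const canvas = document.createElement("canvas");
    canvas.width = thumbWidth;
    canvas.height = thumbHeight;
    canvas.getContext("2d")!.drawImage(video, 0, 0, thumbWidth, thumbHeight);
    container.appendChild(canvas); // left-to-right, top-to-bottom via CSS
  }
}
```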

The illusion of motion is what connects the experience of video with the experience of things that are (actually) in motion all around us and allows us to leverage our highly evolved, unconscious faculties for discerning meaning in moving imagery. “Hardware acceleration,” if you will.

So while the illusion of motion in a conventional video player leverages our powerful, intrinsic capabilities for finding meaning in our changing visual field, it also has severe downsides when compared, for example, to written material. Unlike a book, streams of video are difficult or impossible to skim, compile, underline, bookmark, excerpt, or search. Could some of these capabilities be introduced into a computer-based video interface? The standard illusion of motion and the grid-like bucket of frames are not the only ways to render video, and in early 2011 I made an interface for exploring the different ways to create timelines.

Tangle-enhanced timeline-strip “minuteman” interface.

When I first made this interface, I had several revelations:

  1. you can carve pseudo-thumbnails by sweeping through different parts of successive frames periodically;
  2. when you crop each frame to one pixel, the resulting slitscan suddenly gives an (occasionally surreal) flash into the video contents; and
  3. slowly changing the crop factor (e.g. from 40px to 30px) appears to animate the whole timeline.

While I had expected (1) to be fruitful, it was actually (2) and (3)—neither of which I anticipated—that suggested future directions for exploration. Drawing on (2), embedding an unmutilated frame within a slitscan and moving it simultaneously in space and time creates a very tactile effect, with good anticipation and recall:

Frame in slitscan.

Alternatively, drawing on the full-timeline motion noted in (3), a viewer could adjust the starting offset (or phase) of a timeline to flip through the whole thing at once:

Phase-offset flip-timeline.
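
All three revelations reduce to one small routine. Below is a sketch of how such a timeline strip might be computed; `cropWidth` and `phase` are my own parameter names, not the originals. At a `cropWidth` of one pixel it degenerates to the slitscan of (2), sweeping the source window across successive frames gives the pseudo-thumbnails of (1), and re-rendering while varying `phase` produces the whole-timeline flip of (3).

```ts
// Hedged sketch of a crop-and-stagger timeline; not the original demo code.
function drawTimelineStrip(
  video: HTMLVideoElement,
  strip: HTMLCanvasElement,
  cropWidth: number, // width of the band taken from each frame (1 = slitscan)
  phase: number,     // starting offset of the sweep within each frame
): void {
  const ctx = strip.getContext("2d")!;
  const bands = Math.floor(strip.width / cropWidth);
  const step = video.duration / bands;
  let i = 0;
  const onSeeked = () => {
    // Band i samples a vertical strip that sweeps across successive frames.
    const sx = (phase + i * cropWidth) % video.videoWidth;
    ctx.drawImage(
      video,
      sx, 0, cropWidth, video.videoHeight,       // source band from the frame
      i * cropWidth, 0, cropWidth, strip.height, // destination slot in strip
    );
    i += 1;
    if (i < bands) {
      video.currentTime = (i + 0.5) * step; // sample mid-interval
    } else {
      video.removeEventListener("seeked", onSeeked);
    }
  };
  video.addEventListener("seeked", onSeeked);
  video.currentTime = 0.5 * step;
}
```

Animating the phase then means nothing more than calling the routine again with a shifted offset, which is expensive done this way, and is one motivation for moving the work to the GPU below.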

More recently, but in a similar vein, I've turned to the GPU to overlay and composite, rather than crop or stagger, successive frames of video, allowing flip-like animation as well as abstract zoom:

GPU timelines
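
I won't reproduce shader code here, but the compositing itself is simple enough to sketch with a plain 2D canvas standing in for the GPU; `center` and `radius` are my names for the frame in focus and the blending window.

```ts
// Rough stand-in for the GPU version: overlay a window of successive
// frames with decaying opacity. Frames are assumed to be pre-extracted
// into canvases (e.g. as in the bucket-of-frames sketch earlier).
function compositeWindow(
  frames: HTMLCanvasElement[], // successive frames, oldest first
  out: HTMLCanvasElement,
  center: number, // index of the frame in focus
  radius: number, // how many neighbors to blend in on each side
): void {
  const ctx = out.getContext("2d")!;
  ctx.clearRect(0, 0, out.width, out.height);
  for (let i = center - radius; i <= center + radius; i++) {
    if (i < 0 || i >= frames.length) continue;
    // Frames nearer in time contribute more; sweeping `center` produces
    // the flip-like animation, widening `radius` the abstract zoom.
    ctx.globalAlpha = 1 - Math.abs(i - center) / (radius + 1);
    ctx.drawImage(frames[i], 0, 0, out.width, out.height);
  }
  ctx.globalAlpha = 1;
}
```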

These timeline hybrids, in their different ways, are starting to connect the illusion of motion with the reality of the bucket of frames. In other words, they start to free us from the tyranny of linear time, and provide context to help relate the part (moment) with the whole (sequence). To summarize, timeline hybrids can provide some control over the illusion of motion, can empower the viewer with improved recall of previously-watched footage, and can provide some degree of anticipation into unwatched material.

Map/territory: creating vivid terrain

The presence of dynamic timelines alters our relationship with the passage of time, and we can fill the screen with them to create new, immersive media instruments. As in a writing system, where sound has been frozen into spatial form and time progresses in visual sequence, here each frame represents a duration of time, flowing like English text from left to right and top to bottom. But unlike text, a synesthetic mapping between sound and sight does not need to be invented; the visual field can be transformed within its own medium into space. Additionally, zoom controls allow synopsis over a variable duration:

livezoom

This idea was optimized and simplified into the InterLace4 release, which combines a four-level zoom with an inlaid video player and a dense network of textual annotations. At the most zoomed-out level, the entire conference is visible on a single screen, while zooming in reveals the spectral traces of individual words and utterances.

interlace/videovortex9

When timelines move into two dimensions, they begin to form a map of the contents within. But the contents of video are often themselves spatial: the territory of video is not merely frames of still images, but traces of physical space.

By inverting the logic of a video stabilizer and zooming out to show the shifting frame of the camera rather than zooming in to hide it, a shaky camera can leave a peripheral memory.

periphery (?)
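
A hedged sketch of the idea, assuming the per-frame camera offsets have already been estimated by a tracker; the estimation is exactly what a stabilizer computes, and here we use it instead of cancelling it. The names are mine.

```ts
// Inverted stabilization: draw each frame *at* its estimated camera offset
// on an oversized canvas, and never clear, so past frames linger at the
// edges. Motion estimation itself is out of scope; `offsets` is assumed
// to come from an external tracker.
interface FrameOffset {
  dx: number; // camera shift in pixels, relative to the first frame
  dy: number;
}

function paintPeriphery(
  frames: HTMLCanvasElement[],
  offsets: FrameOffset[], // one per frame
  out: HTMLCanvasElement, // larger than a frame, to leave a margin
): void {
  const ctx = out.getContext("2d")!;
  const marginX = (out.width - frames[0].width) / 2;
  const marginY = (out.height - frames[0].height) / 2;
  frames.forEach((frame, i) => {
    // The newest frame lands on top; older frames survive only where the
    // camera has since wandered away, forming the peripheral memory.
    ctx.drawImage(frame, marginX + offsets[i].dx, marginY + offsets[i].dy);
  });
}
```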

Paradoxically, this technique depends on my horrid camera work. If there is only minor camera shake, as in this stable video, the edges hardly expand our field of vision. At the other extreme, periphery fails when the camera is in motion and the background and foreground separate irreconcilably. As you can see by trying this bus window video, the algorithm is hopelessly caught between the highway and the mountains.

While periphery fails in the case of a moving camera, we can cross-reference a camera's path with physical space, using floorplans and trajectories as a dual index into time and space.

omni spacetime

The Omni Spacetime approach is compelling for this particular footage; however, not only does it require manual construction but it also de-emphasizes whatever may be happening within the frame. Alas: we are entering a hopeless matrix of compromise!

Mathematicians are good at abstracting to the most general form of any problem, and although David Hilbert was not thinking about arranging video frames, per se, his work offers a glimpse of a powerful abstraction for turning the one-dimensional line into a 2-D form:

the hilbert studies: path, zoom, and flow

We are using successive iterations of the Hilbert Curve to create a terrain of time. Unlike the left-to-right, top-to-bottom model of text, this model never forces a discontinuous break from one line to the next. In addition, a hierarchical zoom is possible that doesn't disrupt spatial order.
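
The mapping from a frame's position in time to its cell on the terrain is the standard Hilbert index-to-coordinate walk, shown here in the same sketch register (adapted from the well-known d2xy routine):

```ts
// Map position d along a Hilbert curve over an n×n grid (n a power of two)
// to grid coordinates (x, y), so that frame d lands in a cell adjacent to
// frames d-1 and d+1 with no line-break discontinuity.
function hilbertD2XY(n: number, d: number): { x: number; y: number } {
  let x = 0;
  let y = 0;
  let t = d;
  for (let s = 1; s < n; s *= 2) {
    const rx = 1 & Math.floor(t / 2);
    const ry = 1 & (t ^ rx);
    if (ry === 0) {
      // Rotate this quadrant so adjacent indices stay adjacent in space.
      if (rx === 1) {
        x = s - 1 - x;
        y = s - 1 - y;
      }
      [x, y] = [y, x];
    }
    x += s * rx;
    y += s * ry;
    t = Math.floor(t / 4);
  }
  return { x, y };
}

// e.g. laying out 64 frames on an 8×8 terrain:
// for (let d = 0; d < 64; d++) { const { x, y } = hilbertD2XY(8, d); ... }
```

Because the curve is self-similar, zooming into any quadrant preserves the same ordering at the finer scale, which is what makes the hierarchical zoom possible.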

It's worth mentioning that the problem of line breaks is not unique to video grids: in fact, text suffers the same problem! Typographers have long employed vertical columns to alleviate it, and something like textual columns may be helpful for video grids as well.

columns; the griddle times

Reading is writing, and the value of focus

So far, I have focused almost exclusively on technique, giving comparatively little exposition of why all of these methods are in such desperate need of development. To some extent, this bias reflects my own plunge into the details of video representation at the edge of illusion and reality, but I hope this research is in the service of loftier aims. In particular, I hope to enable a mode of engagement with media where viewers can create their own layers of meaning and paths of story. Hybrid timelines and the potential of media terrains are important to introduce first because they amplify the potential of almost anything else that can be done with video.

Focusing attention is the core challenge of the 21st century, and it is the same essential task as shooting and editing a film. A finished film directs the viewers' gaze, and a talented filmmaker will so captivate her audience that they hardly notice the frame's edges and the narrative's cuts; I believe that with dynamic representations of video we can transform video from a mass medium (dependent on mastery of complex production techniques) into more of a folk medium, where communities of individuals will share the responsibility for focusing on their own priorities. Access to tools and distribution channels alone is not sufficient; if the medium is the same as TV, the new access will at best only shuffle power structures within the same general framework.

Something About Cinematic Reality

Here is an interface for collaborative logging/annotation of video. It respects the illusion of motion by keeping video playing at all times, and in group settings it allows the illusion to be shared. But at the same time, one's impressions can be “stuck” onto the timeline, and the individual can temporarily diverge from the master timeline to recall something, jumping back as needed. Adding notes, then, is hardly a laborious, start-stop process but simply an expression of focused attention. And after notes have been input, they can be lucidly recalled in extemporaneous sequence.
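
The mechanics are almost embarrassingly small. A sketch under assumed names; the real interface adds collaboration and recall of note-sequences on top of something like this:

```ts
// A note is focused attention stamped onto the timeline.
interface Note {
  time: number;
  text: string;
}

class AnnotatedPlayer {
  private notes: Note[] = [];
  private resumeAt: number | null = null; // master-timeline bookmark

  constructor(private video: HTMLVideoElement) {}

  // Stamping a note never pauses playback: the illusion stays intact.
  addNote(text: string): void {
    this.notes.push({ time: this.video.currentTime, text });
  }

  // Temporarily diverge to recall an earlier moment...
  recall(note: Note): void {
    if (this.resumeAt === null) this.resumeAt = this.video.currentTime;
    this.video.currentTime = note.time;
  }

  // ...then rejoin where we diverged (a shared session would track the
  // group's master clock here instead).
  rejoin(): void {
    if (this.resumeAt === null) return;
    this.video.currentTime = this.resumeAt;
    this.resumeAt = null;
  }
}
```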

My inspiration was to turn video into something closer to text, but in doing so I realized that text itself was lacking much of the power that the visual representation should have permitted. To reify the premise that reading is writing, I carved out an old encyclopedia and placed an updated digital encyclopedia (an offline cache of Wikipedia) and a thermal printer inside, so that when you read its contents, it prints a record of your reading.

Generating a “receipt” of paths traveled

“Narrative” at the speed of conversation

The goal of developing visual and interactive methods to represent and index video stands in contrast to text-and-number-centric relational database systems, and there is an inherent tension between the two. Databases promise instantaneous answers to queries, while video curates the passage of time. Notwithstanding the spreadsheet, which standardizes a visual form for tabular data and affords certain numeric and graphic synopses, the representation of a computer database is optimized for storage in the memory of the machine; B-tree indices do little to aid humans in their attempts to understand.

By integrating a relational database view with inline video timelines, the database referents can be explored from the referring table row.

Screen Dreams
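
As a sketch of the integration, reusing the `drawTimelineStrip` routine from earlier and an illustrative row shape of my own devising: each database row gets a small strip canvas rendered from its referent clip, so the row can be skimmed visually rather than read as an opaque foreign key.

```ts
// Hypothetical row shape; field names are illustrative, not from the demo.
interface ClipRow {
  id: number;
  title: string;
  src: string; // URL of the referent video
}

function renderClipTable(rows: ClipRow[], table: HTMLTableElement): void {
  for (const row of rows) {
    const tr = table.insertRow();
    tr.insertCell().textContent = String(row.id);
    tr.insertCell().textContent = row.title;
    // Inline timeline: a strip canvas as the row's visual synopsis.
    const strip = document.createElement("canvas");
    strip.width = 320;
    strip.height = 24;
    tr.insertCell().appendChild(strip);
    const video = document.createElement("video");
    video.src = row.src;
    video.addEventListener("loadedmetadata", () => {
      drawTimelineStrip(video, strip, 4, 0);
    });
  }
}
```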

The resulting texture of decoding a video into a spreadsheet is unsettling; the experience is inside-out. Used conventionally, the database denies film the grammar of visual-temporal continuity that gives any and all meaning to the form. Instead of crudely inserting video into formats designed for unmoving text and numerical data, we should invert the relationship and place the database in the service of maintaining video's illusion of motion.

With Montage Interdit, many of the same operations as in the database-spreadsheet can be achieved without losing our orientation within a playing video. The database provides context: beyond the instant in the video, we can see where we are within a clip, within a sequence, and within the whole archive.

Montage Interdit

The filmmaker has given us sequences that cut through the archive around a collection of topics—these are analogous to short films—but more broadly this organization has enabled a conversation that can be reincorporated within the work, as in the linked Skype interviews, and carried into live public space, as with its performance at BDF2.

The conversation is only beginning.

Future directions: escaping the tiny rectangle

These ideas and projects are the start of a vision for the moving image in a dynamic networked medium. As online video providers grow to disrupt traditional distribution channels, I hope the filmmakers and viewers within these new systems will have more options for navigating this ever-increasing quantity of media. Along with a video bitstream, viewers must be provided with software that gives them control over the illusion of motion. Video need not be distributed in contained and isolated timelines, but can be organized into a vivid terrain of intersecting paths. New media-paths created through attentive viewing are a valid form of authorship, and ultimately the creation of these new sequences should be as fast as their viewing.

But I'm coming to realize that trapping these interfaces within the canonical frame of the personal computer (mobile or not) can be isolating and limiting. While this essay has addressed web interfaces that should be offering more than TV, I am curious to explore how these techniques can be adapted for shared, social environments (e.g. the cinema and exhibition space) that can humanely acknowledge our bodies and co-presence as we stand trying to come to terms with all that has come before us.

Many thanks to many people...

people like you!