Wednesday, March 19, 2014

LED Planet Software

In my last post, I discussed the process of building a spiral-sphere of LEDs that could be controlled from my laptop. The purpose was to create visuals for each movement of Gustav Holst's The Planets for an upcoming concert of the Boulder Symphony Orchestra. I wrote about the math that went into the design, the process of 3D printing the structure in many pieces, and powering the LED strip that was glued to the outside.


The work required to get this sphere ready for filming did not end there. In fact, building the sphere was fairly simple compared to what followed. While the sphere satisfied all of the physical requirements for the film, there was still a set of requirements on how the sphere would act. I've said that the LEDs would be controlled from my laptop, but in more concrete terms, the list of requirements was:
  • Independent control over each LED from laptop
  • Full-sphere refresh rate of at least 30 Hz
  • Pre-coded, modifiable effects (pulsing, twinkling, etc)
  • Image and video projection capabilities
  • Effect mixing
  • Keyframed effect attributes
Each task on its own was a formidable challenge. Implementing all of these together would require code written on multiple platforms, from a tiny 8-bit microcontroller running at 16 MHz to my workstation-laptop with a 64-bit quad-core processor running at 2.4 GHz. Luckily, every platform I had to work with could be programmed with C, so I could stick to one language throughout the whole project. While the entire code for this post (as well as my last 3 posts) is hosted on GitHub, it's an ugly beast of a codebase. I'll try to include relevant code examples that have been isolated from the main code and cleaned up for legibility.

I considered splitting this software post into multiple sub-posts talking about the different levels of software I had to write to get everything to work. Instead of doing this, I've decided to lump it all into this post. I figure that it is the collaborative effort of all of these levels that makes the software side of this project interesting. But to help organize this post, I've split it up into 5 sections:

Part 1 - Driving the LED Strip
Part 2 - Data Transfer
Part 3 - Parallel Protocol
Part 4 - Simple Effects
Part 5 - Sequenced Effects

As I step through the different levels of the software, I will be talking about how long it takes to run certain segments of code. Since I had a target frame rate that I needed to hit, timing was of utmost importance. A frame rate of 30 Hz means that every bit of software has to work together to produce a new frame in less than 33 milliseconds. To get across the idea of how long various bits of code take relative to 33 ms, I'll be using a basic timing diagram:


The idea here is to visualize how long different blocks of code take by displaying them as, well, blocks. The horizontal bar that contains the red, yellow, orange, and white blocks represents what a given processor is doing. As time passes from left to right, the processor performs different tasks, each colored differently. The lower bar that remains the same color represents the LED strip. It does not do any appreciable data processing, so I'll treat it as a passive item. The vertical arrows from the processor to the strip indicate the flow of data, and are positioned in the center of a code block that handles the transfer. Finally, the vertical lines in the background give the time-scale of things. I have placed these 33 ms apart, since that is our target update time. Their spacing on the screen may change between images as I zoom in and out of different features, but they will always represent the same time step. The goal is to transfer new data to the LED strip at least every 33 ms in order to get an update rate of 30 Hz, so in this kind of diagram we want to see one of those black arrows occur at least as often as the background lines. In the example above, the processor serving data to the LED strip is not fast enough.

I've measured the time needed to run various blocks of code by hooking up a Saleae Logic probe to a pin of the relevant computing platform and triggering responses by inserting commands to toggle the pin to a 1 or 0 in the code. In most instances, the toggling takes around 0.0001 milliseconds. Since I'll probably be rounding most timing figures to the nearest millisecond, I've treated the toggling as instantaneous.

Part 1 - Driving the LED Strip

As with many of my projects, I used a flexible strip of WS2812s as my LEDs. These are surface-mount RGB LEDs with on-board memory and driving. Data for what color the LED should be is shifted down the strip to each LED through a single line. The datasheet for these LEDs gives details about how data must be formatted in order to properly communicate with them. The key to reliable data transfer to the strip is precise timing. Luckily, there are a few freely-available libraries on the internet that handle this timing. I happen to prefer the Adafruit Neopixel library, but I have heard that there are others just as good. This library includes sections of Assembly code that are hand-tuned for various architectures and processor speeds, all intended for use on an Arduino-compatible microcontroller. Since I'm just barely comfortable reading Assembly, I'm glad someone else has taken the time to work out this library for others.

For testing, I used my workhorse Arduino Mega. It runs at 16 MHz and has plenty of memory and 5V digital I/O pins. The interface for using the library is very simple:

To reach the target frame rate, I didn't need to be concerned with the time it took to initialize things. All that mattered was how often I could call strip.show(). Since the timing of the data transfer is determined by the strip, and not the processor sending data, updating 233 pixels takes roughly 7 ms. In my timing diagram, the above code looks like:


If all we want to do is keep telling the strip to update, we can get a frame rate of over 130 Hz. Unfortunately, this doesn't allow for changing any of the LED colors ever. Hardly seems worth the frame rate if the image never changes. What we want is for a separate computer to do the complicated calculations that determine what image should be displayed, then the computer sends the LED values to the Arduino Mega via USB. The data is parsed on the Mega and then the LED strip is updated.
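
The 7 ms figure follows directly from the strip's fixed bit rate. As a back-of-the-envelope sketch, assuming the nominal WS2812 data rate of 800 kHz (1.25 µs per bit, 24 bits per pixel) and ignoring the short reset gap between frames, the arithmetic looks like this. The function names are made up for illustration:

```c
#include <stdint.h>

/* Nominal WS2812 timing: 24 bits per pixel, 1.25 us per bit (800 kHz). */
#define WS2812_BITS_PER_PIXEL 24u

/* Time to clock out one full frame, in microseconds (reset gap ignored). */
uint32_t ws2812_frame_us(uint32_t num_pixels)
{
    /* 1.25 us per bit, written as 5/4 to stay in integer math. */
    return num_pixels * WS2812_BITS_PER_PIXEL * 5u / 4u;
}

/* Upper bound on the refresh rate if the controller did nothing else. */
uint32_t ws2812_max_fps(uint32_t num_pixels)
{
    return 1000000u / ws2812_frame_us(num_pixels);
}
```

For 233 pixels this gives 6990 µs per frame, which puts the ceiling at roughly 143 Hz before any other work is added.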

Part 2 - Data Transfer

I wrote a short C code that ran on my computer, calculated some LED values, and sent the values out to the Mega. It packaged the information for 233 LEDs in a 2048-byte packet, 5 bytes per LED and some padding on the end. The color for any given LED is specified by 3 bytes, but I included a 'pixel start' byte and a 'pixel index' byte to reduce visual glitches. The USB specification indicates that rounding up to the nearest 64 bytes for my package would have worked, but for some reason the nearest 1024 worked better. The Mega ran a code that waited for incoming serial data from the USB to serial converter, copied the incoming data to a processing buffer, parsed through the buffer to set LED values, then updated the LED strip. The additional code (neglecting some of the elements from the last snippet for brevity) looks like this:
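
As a rough sketch of the parsing step only (plain C, not the exact code from the project), the loop below walks a received buffer and pulls out pixel records. The byte order within a record (start, index, red, green, blue) and the marker value `0xFF` are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

#define NUMPIXELS   233
#define PIXEL_START 0xFF   /* assumed marker value, chosen for illustration */

/* Walk the buffer looking for [start, index, r, g, b] records.
 * A bad marker or out-of-range index is skipped one byte at a time, so a
 * single corrupted byte cannot shift every later pixel.
 * Returns the number of pixels that were updated. */
size_t parse_packet(const uint8_t *buf, size_t len,
                    uint8_t r[NUMPIXELS], uint8_t g[NUMPIXELS],
                    uint8_t b[NUMPIXELS])
{
    size_t updated = 0;
    size_t i = 0;

    while (i + 5 <= len) {
        if (buf[i] == PIXEL_START && buf[i + 1] < NUMPIXELS) {
            uint8_t idx = buf[i + 1];
            r[idx] = buf[i + 2];
            g[idx] = buf[i + 3];
            b[idx] = buf[i + 4];
            updated++;
            i += 5;
        } else {
            i++;   /* resynchronize on the next byte */
        }
    }
    return updated;
}
```

The start and index bytes are what make this tolerant of glitches: a dropped byte costs one pixel for one frame instead of smearing colors down the rest of the strip.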

On my laptop, I wrote a short code to begin serial communications, package LED color values into a 2048-byte packet, and send them along. The code to open the serial port in Linux was ripped from the example code here:
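
For reference, a minimal sketch of opening a serial device in raw mode on Linux with the standard termios calls. This is a generic pattern rather than the exact code used here; the baud rate is a placeholder, and the function name is made up:

```c
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Open a serial device in raw 8N1 mode. Returns a file descriptor, or -1. */
int open_serial(const char *device)
{
    int fd = open(device, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    struct termios tty;
    if (tcgetattr(fd, &tty) != 0) {
        close(fd);
        return -1;
    }

    /* Placeholder baud rate; adjust for the device on the other end. */
    cfsetispeed(&tty, B115200);
    cfsetospeed(&tty, B115200);

    /* Raw mode: no line editing, echo, signals, or byte translation. */
    tty.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP |
                     INLCR | IGNCR | ICRNL | IXON);
    tty.c_oflag &= ~OPOST;
    tty.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    tty.c_cflag &= ~(CSIZE | PARENB | CSTOPB);
    tty.c_cflag |= CS8 | CLOCAL | CREAD;

    /* Block reads until at least one byte arrives. */
    tty.c_cc[VMIN] = 1;
    tty.c_cc[VTIME] = 0;

    if (tcsetattr(fd, TCSANOW, &tty) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A call such as `open_serial("/dev/ttyACM0")` would be typical, though the device path depends on the system. Raw mode matters because the packet is binary: any newline translation or flow-control handling would corrupt color values.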

The data packaging code assumes it is given a struct called a 'strip' that contains arrays of length NUMPIXELS for the red, green, and blue channels:
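
A sketch of what that packaging step can look like, assuming the same record layout as above (start, index, red, green, blue) and a hypothetical start marker of `0xFF`. The struct follows the description in the text; everything else is illustrative:

```c
#include <stdint.h>
#include <string.h>

#define NUMPIXELS   233
#define PACKET_SIZE 2048
#define PIXEL_START 0xFF   /* assumed marker value */

struct strip {
    uint8_t r[NUMPIXELS];
    uint8_t g[NUMPIXELS];
    uint8_t b[NUMPIXELS];
};

/* Fill a 2048-byte packet: 5 bytes per LED, then zero padding. */
void pack_strip(const struct strip *s, uint8_t packet[PACKET_SIZE])
{
    memset(packet, 0, PACKET_SIZE);
    for (int i = 0; i < NUMPIXELS; i++) {
        uint8_t *p = &packet[5 * i];
        p[0] = PIXEL_START;   /* 'pixel start' byte */
        p[1] = (uint8_t)i;    /* 'pixel index' byte */
        p[2] = s->r[i];
        p[3] = s->g[i];
        p[4] = s->b[i];
    }
}
```

With 233 LEDs the payload is 1165 bytes, leaving 883 bytes of padding. The finished packet can then be handed to a single `write()` call on the serial file descriptor.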

The timing diagram for the Mega code and laptop code working together is as follows:


I've added a new bar to show what my laptop was doing. The blue blocks are the USB transfer and the white blocks are delays introduced to keep things synchronized. On the Mega, the orange block still shows updating the LED strip, the yellow block is parsing the data buffer, and the red block is receiving the USB data. As you can see, the data transfer rate is slow compared to the background vertical lines. My Mega was only able to copy data in at around 55 KB/s, resulting in almost 38 ms for just the data transfer. This is close to the peak transfer speeds found in others' experiments. My data parsing code (yellow) also took around 20 ms. Not enough to push the frame time past 33 ms by itself, but it certainly wasn't helping. At this point, I was updating the strip at 15 Hz. Too slow by a factor of two.

Luckily, I had a solution. If the Mega was too slow, I just had to use a faster microcontroller! I happened to have a few Teensy 3.1s sitting around, which boast USB transfer rates up to 20x faster than the Mega. The Teensy is Arduino-compatible, but uses a slightly different library for driving LED strips such as the one I was working with. Still, the interface was basically the same and it took the same amount of time to update the strip (again, determined by the strip, not the controller). I ported my Mega code over to the Teensy and ran it with the laptop still supplying USB data.


Blindingly fast! Well, at least compared to the sluggish Mega. The code on the Teensy is similar to that running on the Mega, but colored differently. Lime is the strip update, green is the data parsing, and light blue is the data receive. With the higher performance USB communication and 96 MHz clock (versus 16 MHz on the Mega), both the data transfer and the data parsing have sped up immensely. The frame rate here is about 77 Hz, more than twice what I need.

So that's it, right? I can run at 77 Hz, assuming I can get my laptop to churn out a new frame every 13 ms. Unfortunately, no. The Teensy runs at 3.3V as opposed to the Mega's 5V, so the data signal to update the LED strip needs to be buffered. I used a 74HCT245 octal bus transceiver as suggested by this site to bring the 3.3V digital signal up to 5V. It didn't work. For some reason, the LED strip did not like this voltage level. The only way I could get the strip to accept data was to drop the power supply 5V line to around 4.5V, but this was not something I could do continuously due to the nature of my power supply. Without an oscilloscope, it was nearly impossible to determine what the quality of the data line was like. It was entirely possible that the transceiver was giving me a crummy signal, but I had no way of knowing for sure. I was under a strict deadline to get the entire LED system up and running two days after this unfortunate discovery, so I had to consider my options:

 - the Mega could drive the strip, but couldn't get data fast enough
 - the Teensy could get data quickly, but couldn't drive the strip (for unknown reasons)

The answer was suddenly clear: use both! The Teensy would receive data from the laptop, pass it to the Mega, and the Mega would update the strip. The only catch was that I needed a way of passing data from the Teensy to the Mega faster than the Mega could handle USB data.

Part 3 - Parallel Protocol

Let's start by looking at how you can read in data manually to the Mega. Using some bits of non-Arduino AVR-C, the code to quickly read in a single bit of data from a digital I/O pin is:

Not bad for two lines of code. The port values (DDRA, PINA) are found in tables listing the internal registers of the Mega and the Arduino pin mappings. I use these bitwise commands instead of the standard Arduino digitalRead() in order to speed up the process of reading pin values. The above code isn't too useful, because the data being read in one bit at a time is being written to the same place in memory before the previous bit is stored anywhere. But before I go into how to manage data as it arrives, we can look at how to speed up the existing code that reads in data. You may think, how can you possibly speed up a single line of code that probably only takes a handful of clock cycles? Easy:

By removing some of the code, we've sped up data input by a factor of 8. What exactly did I do? I removed the mask that blocked out the other 7 pins available on PORTA, allowing a mapping of 8 digital I/O pins to 8 bits of internal memory. Now the code will read in 8 bits simultaneously every time that line of code is executed. Not a bad improvement. How does this compare to the serial USB communication from before? The average data rate I could get before was 55 KB/s, or roughly 284 clock cycles of the Mega for each byte. The act of grabbing a whole byte of data using this new method takes only a single clock cycle, leading to an impressive 15 MB/s. This is, of course, completely unrealistic. The Mega doesn't only have to read the value of the port, but also store the value somewhere in memory. If I want to do something useful with the incoming data, the Mega also has to make sure each new byte of data gets stored in a unique location. Then there is the issue of synchronization with whatever is supplying the incoming data. You wouldn't want to miss an incoming byte of data, or even read the same byte twice before it is updated. Solving each of these issues creates overhead that will drastically slow down the data rate. The hope is that even with appropriate data management and synchronization, this method is still faster than using serial communication.

To manage where each new byte of data goes in the Mega, I used a 2048-byte buffer that could store the entire package sent for each update for the LED strip. As each byte is read in, it is placed in the buffer, and a counter is incremented to keep track of where in the buffer the next byte should go. To handle synchronization, I added two more data pins to the existing 8 pins of this parallel protocol. One pin is the Mega-Ready pin (RDY), which signals to the data source that it is prepared to accept data. The other new pin is the Data-Clock pin (CLK), which the data source uses to tell the Mega that a new byte has been made available and that it should read it in. I used an interrupt on the Mega triggered by the CLK signal to read in a byte. The Mega code to read in data and process it once there is enough looks like this:
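
The receive logic can be modeled without any hardware. In the sketch below, which is a simplified stand-in and not the project's interrupt routine, `port_value` represents the 8-bit value read from the port (PINA on the Mega) and the `ready` flag represents the RDY line. All names are illustrative:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PACKET_SIZE 2048

struct receiver {
    uint8_t buffer[PACKET_SIZE];
    size_t  count;   /* where the next byte goes */
    bool    ready;   /* models the RDY line: true = send me data */
};

void receiver_reset(struct receiver *rx)
{
    rx->count = 0;
    rx->ready = true;
}

/* Call on every CLK edge with the 8-bit value read from the port.
 * Returns true when a full packet has arrived. */
bool receiver_on_clk(struct receiver *rx, uint8_t port_value)
{
    if (!rx->ready)
        return false;              /* sender ignored RDY; drop the byte */

    rx->buffer[rx->count++] = port_value;
    if (rx->count == PACKET_SIZE) {
        rx->ready = false;         /* hold off the sender while we parse */
        return true;
    }
    return false;
}
```

Once `receiver_on_clk` reports a full packet, the main loop would parse the buffer, update the strip, and call `receiver_reset` to raise RDY again. Keeping the interrupt body down to a store and a counter increment is what keeps the per-byte cost low.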

I found that the process of jumping to the interrupt routine and running the code within took around 6 microseconds. Assuming the data source can instantaneously provide a new byte once the Mega has finished reading the previous one, this allows for an overall data rate of 162 KB/s. Not nearly as good as the overly-ideal 15 MB/s, but much better (3x) than the original serial 55 KB/s.
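
The numbers in this section can be checked with a few lines of integer arithmetic. This is only a sanity-check sketch, taking 1 KB as 1024 bytes and using made-up function names:

```c
#include <stdint.h>

#define F_CPU_HZ 16000000u   /* Arduino Mega clock */

/* CPU cycles available per byte at a given rate in KB/s (1 KB = 1024 bytes). */
uint32_t cycles_per_byte(uint32_t kb_per_s)
{
    return F_CPU_HZ / (kb_per_s * 1024u);
}

/* Throughput in KB/s when each byte costs `us_per_byte` microseconds. */
uint32_t kb_per_s_from_us(uint32_t us_per_byte)
{
    return (1000000u / us_per_byte) / 1024u;
}

/* Time in milliseconds to move `bytes` at `kb_per_s`, rounded down. */
uint32_t transfer_ms(uint32_t bytes, uint32_t kb_per_s)
{
    return bytes * 1000u / (kb_per_s * 1024u);
}
```

At 55 KB/s the Mega has 284 cycles per byte and a 2048-byte packet needs about 36 ms by this arithmetic, in line with the measured figure. At 6 µs per byte the rate comes out to 162 KB/s, and the same packet takes about 12 ms.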

Teensy on left, Mega on right. 8 data lines plus RDY and CLK.

The data source for the Mega is the Teensy controller. Since it can handle fast communication with the laptop and has a faster clock speed than the Mega, it handles reading in USB data and splitting each byte up into one bit per pin of the Mega.

Going back to the timing diagrams, we can look at how the laptop, Teensy, and Mega work together to pass data along to the LED strip in an optimal way:


Starting with the top bar, the laptop spends its time computing what color each LED should be (black), sending the 2048-byte packet over USB to the Teensy (purple), and sitting around waiting for everyone else to catch up (white). Next, the Teensy waits for data from the laptop (white), receives and stores the data (blue), then starts arranging the data and sending it on to the Mega using the parallel protocol (green). The Mega receives the parallel data from the Teensy (red), parses it into LED values (yellow), then updates the LED strip (orange). The most important part of this figure is that by using the parallel protocol, the overall time taken to update the LED strip is less than the 33 ms vertical lines. This means the entire system can finally run at faster than the initial goal of 30 Hz.

It's interesting to see the relative time it takes to do various tasks on different computing platforms. The laptop can perform all sorts of complicated mathematical operations to determine what pattern will appear on the sphere in less time than it takes the Mega to just pass on the LED values to the strip. It's also neat to see how each platform independently handles its own share of the work, then joins with another platform to ensure a steady flow of data.

And with that, I had a reliable way of updating every LED on the spiral-sphere at just above 30 Hz. While it may not have been the simplest solution out there, I ended up very fond of this method. It was an excellent exercise in simple data transfer protocols and the limitations of different hardware platforms. For the rest of this post, I will move past the Mega+Teensy hardware and focus only on what the laptop has to do to produce the desired LED colors at each timestep.

Part 4 - Simple Effects

With the communication protocol worked out between each computing platform, my laptop had full control over the color of every LED and could update every value at 30 Hz. All it had to do was produce a properly-formatted 2048-byte packet containing the color values for each LED and send it off to the Teensy once every 33 ms. The next step in this project was to create a series of 'effects' that could be executed on the laptop and would determine what color values to send along at every update. These effects would later be the building blocks of the final visual product, so I needed a fairly general set that could be combined later.

I knew that while most effects would need a common set of parameters to determine their appearance (color, fade), they would often need their own specialized parameters based on the effect (pulsing rate, image location, etc). Instead of coding each effect to have a unique interface, I created a 'handle' struct in the code:

The handle contained all of the parameters that any effect could need. It was then up to the individual effect to use the different values within a handle appropriately. The same handle could be applied to multiple effects, although I rarely did this. Simpler effects would use the fade and color variables, while more complicated ones would use the attr and file arrays for other effect attributes. The 'effect' struct contains miscellaneous variables related to a single effect (start time, which handle to use, unique pixel buffer, etc):

When an effect is run, the appropriate function is called at every time step and passed an effect struct, a handle struct, and the current time so that the effect can know how far along it is. With this fairly straightforward interface worked out, I started creating effects.

Effect: Solid

Set every LED to the same value based on color1 and fade.
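
A stripped-down sketch of this effect, using a cut-down handle that holds only the fields Solid needs. The struct layouts and names here are assumptions for illustration and are much smaller than the real handle and effect structs:

```c
#include <stdint.h>

#define NUMPIXELS 233

struct color  { uint8_t r, g, b; };
struct handle { float fade; struct color color1, color2; };  /* subset only */
struct pixels { uint8_t r[NUMPIXELS], g[NUMPIXELS], b[NUMPIXELS]; };

/* Scale one channel by fade (clamped to 0..1), rounding to nearest. */
static uint8_t scale(uint8_t c, float fade)
{
    if (fade < 0.0f) fade = 0.0f;
    if (fade > 1.0f) fade = 1.0f;
    return (uint8_t)(c * fade + 0.5f);
}

/* Solid: every LED gets color1 scaled by fade. */
void effect_solid(const struct handle *h, struct pixels *out)
{
    for (int i = 0; i < NUMPIXELS; i++) {
        out->r[i] = scale(h->color1.r, h->fade);
        out->g[i] = scale(h->color1.g, h->fade);
        out->b[i] = scale(h->color1.b, h->fade);
    }
}
```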


Effect: Image

Use projection mapping to display an image from file. The projection method (planar, cylindrical, spherical) and the projection direction are specified in attr.


Effect: Pulse

Smoothly pulse the value of every LED between color1 and color2 at a rate determined by attr.
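
One way to get a smooth pulse is sketched below. It eases a triangle wave with a smoothstep curve, which is a stand-in chosen to keep the example dependency-free (a sinusoid would work just as well), and it assumes the rate in attr is given in cycles per second and that time is non-negative:

```c
#include <stdint.h>

struct color { uint8_t r, g, b; };

/* Blend weight in [0,1]: 0 means color1, 1 means color2.
 * One full cycle takes 1/rate_hz seconds. Assumes t >= 0. */
float pulse_mix(float t, float rate_hz)
{
    float cycles = t * rate_hz;
    float phase = cycles - (float)(long)cycles;           /* fractional part */
    float tri = phase < 0.5f ? 2.0f * phase : 2.0f - 2.0f * phase;
    return tri * tri * (3.0f - 2.0f * tri);               /* smoothstep easing */
}

/* Linear blend of one channel, rounded to nearest. */
static uint8_t lerp_u8(uint8_t a, uint8_t b, float mix)
{
    return (uint8_t)(a + (b - a) * mix + 0.5f);
}

/* Color shared by every LED at time t. */
struct color pulse_color(struct color c1, struct color c2, float t, float rate_hz)
{
    float mix = pulse_mix(t, rate_hz);
    struct color out = {
        lerp_u8(c1.r, c2.r, mix),
        lerp_u8(c1.g, c2.g, mix),
        lerp_u8(c1.b, c2.b, mix)
    };
    return out;
}
```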


Effect: Circle

Set LEDs within a certain distance of a specified location on the sphere to color1. The location and size of the circle are specified in attr, as well as a tapering parameter. This effect loads information on the 3D location of each LED from elsewhere.
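
The per-LED test can be sketched as a simple distance check. This version is a simplification under a few assumptions: it measures straight-line distance between 3D points instead of distance along the sphere's surface, compares squared distances to avoid a square root, and treats the tapering parameter as the fraction of the radius over which the edge fades out:

```c
struct vec3 { float x, y, z; };

/* Brightness weight for one LED: 1 near the center, 0 outside the circle.
 * `radius` is a straight-line distance; `taper` in [0,1] is the fraction
 * of the radius used for the soft edge (0 = hard edge). */
float circle_weight(struct vec3 led, struct vec3 center, float radius, float taper)
{
    float dx = led.x - center.x;
    float dy = led.y - center.y;
    float dz = led.z - center.z;
    float d2 = dx * dx + dy * dy + dz * dz;

    float inner = radius * (1.0f - taper);
    float outer2 = radius * radius;
    float inner2 = inner * inner;

    if (d2 >= outer2) return 0.0f;
    if (d2 <= inner2) return 1.0f;
    return (outer2 - d2) / (outer2 - inner2);   /* fade across the edge */
}
```

Each LED's color is then color1 multiplied by this weight (and by fade).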


Effect: Flicker

Every time step, tells some number of LEDs to begin lighting up to either color1 or color2. The number of LEDs to light up per second, the fade in time, and the fade out time are all specified in attr. I've used a Poisson distribution to calculate how many LEDs to light up after every time step.
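
To illustrate the Poisson step, here is a minimal sketch using Knuth's multiplication method, which is a common choice for small means but not necessarily the one used in this project. It relies on the standard `rand()` for randomness and computes the exponential with a short Taylor series so the example has no math-library dependency:

```c
#include <stdlib.h>

/* e^-x for x >= 0: sum the Taylor series of e^x, then invert. */
static double exp_neg(double x)
{
    double term = 1.0, sum = 1.0;
    for (int k = 1; k < 60; k++) {
        term *= x / k;
        sum += term;
    }
    return 1.0 / sum;
}

/* Uniform random number in [0, 1). */
static double uniform01(void)
{
    return rand() / ((double)RAND_MAX + 1.0);
}

/* Knuth's method: multiply uniforms until the product drops to e^-mean
 * or below. Suitable for small means such as LEDs per frame. */
int poisson_sample(double mean)
{
    double limit = exp_neg(mean);
    double product = 1.0;
    int k = 0;
    do {
        k++;
        product *= uniform01();
    } while (product > limit);
    return k - 1;
}

/* Number of LEDs to start lighting during a time step of dt seconds. */
int flicker_new_leds(double leds_per_second, double dt)
{
    return poisson_sample(leds_per_second * dt);
}
```

Drawing the count from a Poisson distribution keeps the long-run rate correct even when the expected number per frame is fractional, for example 3 LEDs per frame at 90 LEDs per second and 30 Hz, or 0.1 per frame at slower rates.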


Effect: Packet

Send a packet of light up or down the LED strip, ignoring the spherical geometry. The values in attr give the packet velocity, origin, and width.
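
Since this effect only cares about position along the strip, it reduces to a one-dimensional calculation. A minimal sketch, assuming velocity is in LEDs per second, origin and width are in LEDs, and the packet has a triangular brightness profile (the profile shape is an assumption):

```c
/* Brightness weight of LED `index` at time `t` seconds.
 * The packet is centered at origin + velocity * t and fades linearly
 * to zero at half its width on either side. */
float packet_weight(int index, float t, float origin, float velocity, float width)
{
    float center = origin + velocity * t;
    float d = (float)index - center;
    float half = 0.5f * width;

    if (d < 0.0f) d = -d;
    if (half <= 0.0f || d >= half) return 0.0f;
    return 1.0f - d / half;
}
```

A negative velocity sends the packet back down the strip.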


Effect: Ring

Given a start and end LED, lights up LEDs between them with a travelling wave of color. The start LED, end LED, wave speed, and wave length are specified in attr.


I wrote three more effects that were mostly variations on the ones above. There is a Marching effect that is similar to Flicker, but synchronizes the LEDs into two groups instead of letting each be random. The Video effect uses the same projection mapping as Image, but projects a frame of a video based on the elapsed time of the effect. The Single-Line Packet is a slight variation on Packet, where the packet of light is constrained to move along a single loop of the spiral-sphere. These effects were all introduced with the specific intention of using them in the final video production.

As mentioned at the beginning of this section, each effect that needs to be running is called upon at each time step. If multiple effects are called, their unique pixel buffers are added together (additive blending) to create the final frame that is sent out to the electronics. For the Image and Video effects, I had the code pre-load the appropriate files before starting into the main loop. This reduced the amount of time spent calculating these effects during playback.
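
Additive blending of two pixel buffers can be sketched in a few lines. Clamping at 255 is an assumption here; it keeps overlapping bright effects from wrapping around to dark values:

```c
#include <stddef.h>
#include <stdint.h>

/* Add `src` into `dst` channel by channel, clamping at 255 so that
 * overlapping effects saturate instead of wrapping around. */
void blend_add(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)dst[i] + src[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Starting from an all-zero frame and calling this once per active effect and channel produces the final frame.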

Part 5 - Sequenced Effects

With a complete set of effects written up, the next step of this project was to sort out how to create a sequence of effects that could be run as a unit. I not only needed multiple effects to run simultaneously, but to start and stop independent of each other. I also needed to be able to change the attributes within each handle as the sequence progressed. The simplest example of this would be a fade, where the fade attribute in a handle varies smoothly from 0 to 1 over the course of a few seconds, causing an effect to fade into view.

Ideally, I wanted key-framing. Every attribute of every handle would be able to change at any moment, and the changes would be defined by a set of key frames. Effects would be launched at specified times and produce visuals based on these key framed attributes. Since I had two days to set up the framework for this, I went with text-based key framing instead of a GUI. This required creating a 'language' that my code would interpret and turn into key frames. To build the syntax for this language, I wrote out some pseudo-code for how I wanted it to work:

# allow comments starting with a "#"
at 0:00, set fade of handle 1 to 0.0
at 0:00, set color1 of handle 1 to (255, 0, 0)
at 0:00, launch effect "solid" using handle 1
at 0:10, set fade of handle 1 to 1.0

The above example would create a handle with color1 set to red and fade set to 0. The effect Solid would be linked to this handle and launched at the beginning, but since the fade would be set to 0, it would not appear. Over the first ten seconds of the sequence, the fade would gradually increase until it equaled 1.0 (full on) at the end.

This example shows the two main commands I needed in my language: adding a key frame to an attribute of a handle and launching an effect linked to a handle. Shortening the above example to the final syntax I chose:

# comments still work
0:00 h1 fade 0.0
0:00 h1 color1 255 0 0
0:00 effect solid 1
0:10 h1 fade 1.0

Writing the code to parse these commands was fairly straightforward. Lines in a sequence file would start either with a #, indicating a comment, or a time stamp. In the latter case, the next bit of text after the space would either be "effect", indicating the launching of an effect at that time stamp, or "h*", indicating the addition of a key frame for an attribute of a handle. Since string parsing is an ugly matter in C, I'll save you the burden of looking at the code I wrote for this.
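
For the curious, the overall shape of such a parser fits in a short function. The sketch below is a simplified illustration built on `sscanf`, not the parser from the project; the struct and enum names are made up, and it only handles the two line types shown above:

```c
#include <stdio.h>
#include <string.h>

enum line_kind { LINE_SKIP, LINE_KEYFRAME, LINE_EFFECT, LINE_ERROR };

struct seq_line {
    enum line_kind kind;
    double time_s;       /* time stamp in seconds */
    int    handle;       /* handle number */
    char   name[32];     /* attribute name or effect name */
    double values[3];    /* key frame values (1 for fade, 3 for a color) */
    int    num_values;
};

enum line_kind parse_line(const char *text, struct seq_line *out)
{
    int minutes, seconds, consumed = 0;

    memset(out, 0, sizeof *out);
    while (*text == ' ' || *text == '\t')
        text++;
    if (*text == '#' || *text == '\n' || *text == '\0')
        return out->kind = LINE_SKIP;              /* comment or blank line */

    if (sscanf(text, "%d:%d %n", &minutes, &seconds, &consumed) < 2)
        return out->kind = LINE_ERROR;
    out->time_s = minutes * 60.0 + seconds;
    text += consumed;

    /* "effect <name> <handle>" launches an effect. */
    if (sscanf(text, "effect %31s %d", out->name, &out->handle) == 2)
        return out->kind = LINE_EFFECT;

    /* "h<N> <attribute> <values...>" adds a key frame. */
    consumed = 0;
    if (sscanf(text, "h%d %31s %n", &out->handle, out->name, &consumed) >= 2) {
        int n = sscanf(text + consumed, "%lf %lf %lf",
                       &out->values[0], &out->values[1], &out->values[2]);
        if (n >= 1) {
            out->num_values = n;
            return out->kind = LINE_KEYFRAME;
        }
    }
    return out->kind = LINE_ERROR;
}
```

Feeding it `0:10 h1 fade 1.0` yields a key frame at 10 seconds for the fade attribute of handle 1 with a single value of 1.0.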

An array of empty handles and effects would be created at the start of the main program and filled with information as the sequence file was parsed. To keep track of key frames, the handle struct was modified to include arrays of keys:

At each time step during the execution of the sequence, every attribute of every handle would be updated to reflect the appropriate value based on the nearest key frames:
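
The core of that update is an interpolation between the two key frames on either side of the current time. A minimal sketch, assuming linear interpolation and keys sorted by time (the struct and function names are illustrative):

```c
struct key { double time; double value; };

/* Value of an attribute at time t, given keys sorted by time.
 * Before the first key or after the last, the nearest key's value holds. */
double key_value(const struct key *keys, int num_keys, double t)
{
    if (num_keys <= 0)
        return 0.0;
    if (t <= keys[0].time)
        return keys[0].value;

    /* Linear scan for the first key at or after t. */
    for (int i = 1; i < num_keys; i++) {
        if (t <= keys[i].time) {
            const struct key *a = &keys[i - 1];
            const struct key *b = &keys[i];
            double span = b->time - a->time;
            if (span <= 0.0)
                return b->value;   /* two keys at the same time: jump */
            return a->value + (b->value - a->value) * (t - a->time) / span;
        }
    }
    return keys[num_keys - 1].value;
}
```

For the fade example above, keys at (0 s, 0.0) and (10 s, 1.0) give a fade of 0.5 at the five-second mark.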

While this might not be the most efficient method of key framing, I only ever dealt with around 30 handles and key frames. If I had a need for hundreds or thousands, I would have worked some on optimizing this algorithm. I wrote up a quick testing sequence and recorded the result:


Finally, with all of the effects playing nicely inside the sequencer, I began writing up the sequences for each movement of The Planets. I began by translating storyboards made by my collaborator on this project, then filled in gaps and introduced fades between different effects. After about 20 hours of translating pictures to text, I had the sequences written down in my made-up language:

In retrospect, maybe a GUI would have been worth an extra few days of work...

For filming the sphere in action, I would tell the main code to execute one of the sequences. It would pre-load any images and videos, then wait for my command to begin playback. With fabric draped around the sphere and cameras rolling, I would tell the code to play through the sequence. The visuals were completely determined by the written sequences, so we could do multiple takes and get identical results. I could start the sequences at any time stamp and fast-forward through the playback to get a quick preview before we committed to anything.

With around 55 minutes of content to produce, filming took quite a while. Writing sequences for the first time was a burden, but going through and editing them during the filming process was a nightmare. Even answering questions like "why is the sphere blue at 4:31" was difficult due to the nature of the sequencing language. The text-based key framing was a simple solution to a problem I had to solve in a short amount of time, but spending an extra few days implementing a graphical interface would have saved me quite a few headaches.

When the final full-length video is posted online, I will link to it here. For now, I can give you two still shots.

Venus, Bringer of Peace 

In all, I'm happy with the way this project turned out. Some of the technical bits may have been a little rough around the edges, but I've learned that's what happens when you have only a few weekends to get things working and an ideal budget of $0.

Uranus, the Magician

While the Planets project is over, the LED sphere and accompanying electronics are still fully functional. I'll have to see if I can incorporate them in some other personal project down the road. It might even make for a nice hanging lamp in my apartment.