Illumindex
December 21, 2025
An exploration of display drivers and IoT systems, from scratch.

This project started as a simple idea: I wanted a small display on my desk that could show bits of information throughout the day. There are already a ton of these informational displays on the market, but if I’m being honest, actually having the display was never as interesting to me as building it. I’d always wondered how LED matrix displays worked, and combining one with networking sounded like a great excuse to also explore IoT development.
Thus "Illumindex" was born, short for "Illuminated Information Index".
If you want to learn the nitty-gritty, low-level implementation of an LED matrix display and its driver – at the hardware and firmware level – this article will hopefully be informative and useful for you. If you just want to see the pictures and the parts list, jump down to the sections after the software description.
The software is open source and available at github.com/zbauman3/illumindex.
The primary goal of this project, from a software perspective, was to learn how to build a display driver and design a small but complete IoT system from the ground up. To support that goal, I chose to write all of the application code myself. No third-party libraries are used in the firmware, only the manufacturer SDK; the roughly 4,000 lines of code that make up the system are entirely my own.
The project is built on top of the ESP-IDF, the official SDK provided by Espressif for the ESP32 family of microcontrollers. Using the ESP-IDF provides access to the underlying hardware, startup code, and toolchain support without abstracting away the details that are important when working close to the metal. It also provides some useful utilities that saved me a ton of time, like cJSON.
For the backend, I used a simple serverless application built with Next.js and deployed on Vercel. This is what I use day to day at work, which made it easy to stand up a basic API endpoint and iterate. The backend primarily serves as a lightweight data source for the device, but also houses some visual development utilities like a local simulator that mirrors the current state of the LED matrix and a small utility for drawing and generating bitmaps.
The display driver is easily the most interesting part of this project, and will be covered in the most detail. For more detail than this article provides, feel free to check out the source code.
Before diving into the software, it is important to understand the physical layer that the display driver interacts with. An LED matrix display is nothing more than a grid where each point is made from small red, green, and blue LEDs:
In an LED matrix display, not all LEDs are illuminated at the same time. Instead, only two rows are active at any given moment: one row on the top half of the display and one row on the bottom half. To show an image, the system rapidly cycles through all rows of the display, fast enough to exploit persistence of vision and appear as a solid image. Achieving this refresh speed is trivial for a microcontroller. The ESP32-S3 used in this project has two cores operating at 240 MHz, which allows one core to be dedicated to the display driver while the other handles everything else.
The display used here is a 64x64 RGB LED matrix. It is divided into two halves, top and bottom, with each half measuring 64x32 and containing its own control hardware. Each half uses three 64-bit shift registers, one each for red, green, and blue, which represent the columns of the display. Between these shift registers and the LEDs are latch circuits. These latches allow the data for all 64 columns to be shifted in slowly and then displayed simultaneously by toggling the latch signal. The latches also expose an enable/disable signal, which is used by the driver algorithm described later.
Row selection is handled by a 5-to-32 address decoder. Conceptually, the address decoder selects which row is connected to the negative side of the circuit, while the shift registers determine which columns drive the positive side for red, green, and/or blue. With simple on/off control of red, green, and blue, the display is limited to eight colors: red, green, blue, cyan, magenta, yellow, black, and white. Producing additional colors requires more advanced techniques in the display driver, which are described later.
Each half of the matrix has its own set of shift registers and latches. However, both halves share the clock signal for the shift registers, the latch control and enable signals, and the output from the address decoder. This means that while the columns on each half can be controlled independently, all other control signals operate in unison across the entire display.
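To make the shared addressing concrete, here is a minimal sketch in C of how a logical row on the 64x64 panel could map to a half and a 5-bit decoder address. The names `row_location_t`, `locate_row`, and `address_line_level` are hypothetical and are not taken from the Illumindex source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PANEL_HEIGHT 64
#define HALF_HEIGHT (PANEL_HEIGHT / 2)

/* Where a logical row lands on the panel (hypothetical helper type). */
typedef struct {
  bool bottom_half; /* false = top 64x32 half, true = bottom 64x32 half */
  uint8_t address;  /* 0..31, shared by both halves via the decoder */
} row_location_t;

/* Rows y and y + 32 share one decoder address and are lit together. */
row_location_t locate_row(uint8_t y) {
  assert(y < PANEL_HEIGHT);
  row_location_t loc = {
      .bottom_half = y >= HALF_HEIGHT,
      .address = (uint8_t)(y % HALF_HEIGHT),
  };
  return loc;
}

/* Logic level of address line A0..A4 for a given decoder address. */
bool address_line_level(uint8_t address, uint8_t line) {
  assert(address < HALF_HEIGHT && line < 5);
  return ((address >> line) & 1u) != 0;
}
```

For example, logical row 37 sits in the bottom half at decoder address 5, which it shares with row 5 in the top half.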
Here's a simplified block diagram of these physical components:
Now that we understand the hardware involved, we can discuss how to show an image on the display using software. There are several possible approaches for driving an LED matrix, but the driver described here is built around 24-bit true color. Each pixel is stored with one byte each for red, green, and blue. As discussed earlier, the hardware can only turn each color channel fully on or fully off, giving us just eight possible color combinations per pixel. To translate the three bytes of color data into the 16,777,216 possible colors of true color, the driver uses a form of pulse-width modulation called binary coded modulation (BCM).
At a high level, the algorithm works by repeatedly displaying individual bits of the color data, holding more significant bits on the screen for longer periods of time. The steps below describe showing a single frame on the display:
… bit N in the RGB bytes for the current row. Toggle the Clock line to shift this data into the registers.
… bit N and repeat steps 1 through 7 for all 8 bits.

The most important part of this process is the delay in step 6. This delay is what enables binary coded modulation. A base time is chosen and multiplied by 2 raised to the power of the current bit index: time * 2^bit. For example, with a base time of 0.7 µs, the delays for each bit would be:

| Bit | Delay |
|---|---|
| 0 | 0.7 µs |
| 1 | 1.4 µs |
| 2 | 2.8 µs |
| 3 | 5.6 µs |
| 4 | 11.2 µs |
| 5 | 22.4 µs |
| 6 | 44.8 µs |
| 7 | 89.6 µs |
This means that more significant bits remain illuminated for longer periods, which causes larger numerical values to appear brighter to the human eye. This also enables mixing of different ratios of red, green, and blue to achieve "true color".
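As a small illustration of the time * 2^bit rule, the sketch below computes the hold time for each bit and the total time an LED stays lit for a given 8-bit channel value. The function names are hypothetical, and the 0.7 µs base time is the example value from the text, expressed in nanoseconds to keep the arithmetic in integers.

```c
#include <assert.h>
#include <stdint.h>

#define BIT_DEPTH 8
#define BASE_DELAY_NS 700u /* 0.7 us base time from the example */

/* Hold time for one bit plane: base * 2^bit. */
uint32_t bcm_delay_ns(uint8_t bit) {
  assert(bit < BIT_DEPTH);
  return BASE_DELAY_NS << bit;
}

/* Total time an LED is lit during one pass over all 8 bits of `value`. */
uint32_t bcm_on_time_ns(uint8_t value) {
  uint32_t total = 0;
  for (uint8_t bit = 0; bit < BIT_DEPTH; bit++) {
    if (value & (1u << bit)) {
      total += bcm_delay_ns(bit);
    }
  }
  return total;
}
```

Because the weights are powers of two, the lit time works out to `value * BASE_DELAY_NS`, so perceived brightness scales linearly with the channel value even though the hardware only switches each LED fully on or off.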
Choosing the base delay time is a balancing act between microcontroller performance and visible flickering. If the delay is too short, the MCU will not be able to complete all steps of the algorithm before moving on to the next row or bit. If the delay is too long, the total time required to draw all rows and bits increases, causing the display to appear flickery. Ideally, this value is tuned alongside the real-world execution time of the driver so that the entire display refreshes at a full-frame rate of roughly 120 to 240 Hz.
Here are the calculations I came up with for Illumindex:
```c
// cpu_f     = 240,000,000 Hz             // CPU CLOCK
// cycles    = 2,280                      // via cpu_hal_get_cycle_count
// oneBit    = cycles / cpu_f             // 0.0095 ms
// miscTime  = 0.001 ms                   // time between alarm & handler
// oneByte   = ((oneBit + miscTime) * 8)  // 0.084 ms

// src       = 80,000,000 Hz              // APB CLOCK
// speed     = 40,000,000 Hz              // LED_MATRIX_TIMER_RESOLUTION
// timer     = 28                         // LED_MATRIX_TIMER_ALARM
// bit0      = (timer * 2^0) / speed      // 0.0007 ms
// bit1      = (timer * 2^1) / speed      // 0.0014 ms
// bit2      = (timer * 2^2) / speed      // 0.0028 ms
// bit3      = (timer * 2^3) / speed      // 0.0056 ms
// bit4      = (timer * 2^4) / speed      // 0.0112 ms
// bit5      = (timer * 2^5) / speed      // 0.0224 ms
// bit6      = (timer * 2^6) / speed      // 0.0448 ms
// bit7      = (timer * 2^7) / speed      // 0.0896 ms
// rowTimers = bit0 + ... + bit7          // 0.1785 ms
// row       = rowTimers + oneByte        // 0.2625 ms
// screen    = row × 32                   // 8.4 ms
// hz        = screen to Hz               // 119.05 Hz
```
In this calculation, cycles represents the number of CPU cycles required to execute the algorithm for a single bit. Dividing this value by the CPU frequency, accounting for miscellaneous overhead, and then multiplying by 8 gives us the total active CPU time needed to process one byte of color data (oneByte).
Next, the total delay time spent waiting between each bit (rowTimers) is added, and the result is multiplied by 32 to account for all rows in the display. This produces a full-screen refresh rate of 119.05 Hz. It's not a perfect 120, but it's good enough for me.
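The same arithmetic can be reproduced in a few lines of C, which makes it easier to try other timer values. This is only a sketch: the constants come from the figures in this article, while the macro and function names are hypothetical.

```c
#include <assert.h>

#define CPU_HZ 240000000.0   /* CPU clock */
#define CYCLES_PER_BIT 2280.0 /* measured cycles to process one bit */
#define MISC_MS 0.001        /* time between alarm and handler */
#define TIMER_HZ 40000000.0  /* timer resolution */
#define TIMER_BASE_TICKS 28.0 /* base alarm value in timer ticks */
#define BIT_DEPTH 8
#define ROW_PAIRS 32         /* 64 rows, two lit at a time */

/* Time to show all 8 bits of one row pair, in milliseconds. */
double row_time_ms(void) {
  double one_bit_ms = CYCLES_PER_BIT / CPU_HZ * 1000.0;
  double cpu_ms = (one_bit_ms + MISC_MS) * BIT_DEPTH;
  double timers_ms = 0.0;
  for (int bit = 0; bit < BIT_DEPTH; bit++) {
    /* Delay for this bit: base ticks * 2^bit, converted to ms. */
    timers_ms += TIMER_BASE_TICKS * (double)(1 << bit) / TIMER_HZ * 1000.0;
  }
  return cpu_ms + timers_ms;
}

/* Full-screen refresh rate in Hz. */
double refresh_rate_hz(void) {
  return 1000.0 / (row_time_ms() * ROW_PAIRS);
}
```

With these inputs, `row_time_ms()` comes out at about 0.2625 ms and `refresh_rate_hz()` at about 119.05 Hz, matching the hand calculation.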
There are many ways to implement this algorithm on the ESP32-S3. For this project, I chose a combination of Dedicated GPIO, Standard GPIO, and General Purpose Timer. While there are more efficient approaches (like using SPI with DMA) I wanted direct, fine-grained control over the output signals to keep the overall design and behavior easier to understand.
Using the ESP-IDF high-level APIs introduces noticeable overhead due to safety checks and conditional logic. Since the display driver algorithm needs to execute within a few thousand CPU cycles, that overhead adds up quickly. For lower-level, performance-critical code paths, ESP-IDF also exposes Hardware Abstraction APIs. These APIs underpin large portions of the logic that the ESP-IDF is built on top of, and at the lower level they often interact with the hardware through inlined assembly calls in C, allowing GPIO operations to be performed in just a handful of CPU instructions. The tradeoff is that they require more care and discipline when writing and maintaining the code.
All runtime state for the LED matrix is stored in the led_matrix_state_t struct:
```c
typedef struct {
  led_matrix_pins_t *pins;
  dedic_gpio_bundle_handle_t gpio_bundle;
  gptimer_handle_t timer;
  uint8_t *buffer;
  uint8_t rowNum;
  uint8_t bitNum;
  uint8_t width;
  uint8_t height;
  uint8_t halfHeight;
  uint16_t splitOffset;
  bool fiveBitAddress;
  uint16_t currentBufferOffset;
} led_matrix_state_t;
```
This struct contains a pointer to the configured pins, handles for the Dedicated GPIO bundle and the general-purpose timer, a pointer to the display data buffer, and some additional state used by the driver. The row address pins (A0 through A4) and the output-enable pin are driven using standard GPIO, while the shift register pins for both halves of the display (the RGB data lines, clock, and latch) are controlled using Dedicated GPIO.
This split allows both the top and bottom row RGB values to be written to the shift registers in just two instructions. At 240 MHz, these writes occur so quickly that the shift registers themselves cannot keep up, requiring an explicit nop instruction to introduce a tiny delay between clock edges:
```c
#define shift_out_val(_val)                                        \
  ({                                                               \
    dedic_gpio_cpu_ll_write_mask(0b01111111, (_val));              \
    asm volatile("nop"); /* delay for ICN2037 timing needs */      \
    dedic_gpio_cpu_ll_write_mask(0b01111111, (_val) | 0b01000000); \
  })
```
The implementation does not use double buffering, but the display data is pre-processed before being rendered. The frame buffer is stored as a byte array where each byte directly corresponds to a Dedicated GPIO output value. Each byte is laid out in the form 0,0,R1,G1,B1,R2,G2,B2, with the remaining two bits reserved for control signals such as the clock and latch. This allows each column's RGB data for both rows to be shifted out with a single write operation.
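To illustrate the pre-processing step, here is a minimal sketch of packing one bit plane of two pixels (one from the top half, one from the bottom half) into a single byte in the 0,0,R1,G1,B1,R2,G2,B2 layout. The `rgb_t` type and the `pack_pixel_pair` and `pack_row` helpers are hypothetical, and how the real firmware walks its buffer is an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint8_t r, g, b;
} rgb_t;

/* Build one byte 0,0,R1,G1,B1,R2,G2,B2 from bit `bit` of each channel.
 * The top two bits stay clear for the clock and latch signals. */
uint8_t pack_pixel_pair(rgb_t top, rgb_t bottom, uint8_t bit) {
  assert(bit < 8);
  uint8_t out = 0;
  out |= (uint8_t)(((top.r >> bit) & 1u) << 5);    /* R1 */
  out |= (uint8_t)(((top.g >> bit) & 1u) << 4);    /* G1 */
  out |= (uint8_t)(((top.b >> bit) & 1u) << 3);    /* B1 */
  out |= (uint8_t)(((bottom.r >> bit) & 1u) << 2); /* R2 */
  out |= (uint8_t)(((bottom.g >> bit) & 1u) << 1); /* G2 */
  out |= (uint8_t)((bottom.b >> bit) & 1u);        /* B2 */
  return out;
}

/* Pack one bit plane for a whole row pair: one output byte per column. */
void pack_row(const rgb_t *top, const rgb_t *bottom, size_t width,
              uint8_t bit, uint8_t *out) {
  for (size_t x = 0; x < width; x++) {
    out[x] = pack_pixel_pair(top[x], bottom[x], bit);
  }
}
```

Doing this work once per frame, rather than inside the timer callback, is what lets the driver push each column with a plain byte write.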
The core rendering logic of the algorithm itself is relatively straightforward, and the loop that runs it is driven by the General Purpose Timer. One detail worth noting is that shift_out_row is a macro rather than a function, and it unrolls the writes for all 64 columns.
Here's the implementation:
```c
static bool IRAM_ATTR led_matrix_timer_callback(
    gptimer_handle_t timer, const gptimer_alarm_event_data_t *event_data,
    void *user_data) {
  static led_matrix_handle_t matrix;
  matrix = (led_matrix_handle_t)user_data;

  // stop the timer to prevent it from going off again during this run. It will
  // not interrupt this function since it's the same priority, but it will cause
  // this to re-run immediately.
  gptimer_stop(matrix->timer);

  // cycle through rows and bits
  matrix->rowNum++;
  if (matrix->rowNum >= matrix->halfHeight) {
    matrix->rowNum = 0;
    matrix->bitNum++;
    if (matrix->bitNum >= LED_MATRIX_BIT_DEPTH) {
      matrix->bitNum = 0;
    }
  }

  // calculate the current offset into the buffer so that we don't need to for
  // every pixel
  matrix->currentBufferOffset =
      (matrix->rowNum * matrix->width) +
      (matrix->bitNum * matrix->width * matrix->height);

  // shift out RGB for both rows at once using dedicated GPIO
  shift_out_row(matrix->buffer, matrix->currentBufferOffset);

  // blank screen
  gpio_ll_set_level(&GPIO, matrix->pins->oe, 1);

  // set new address.
  gpio_ll_set_level(&GPIO, matrix->pins->a0, matrix->rowNum & 0b00001);
  gpio_ll_set_level(&GPIO, matrix->pins->a1, matrix->rowNum & 0b00010);
  gpio_ll_set_level(&GPIO, matrix->pins->a2, matrix->rowNum & 0b00100);
  gpio_ll_set_level(&GPIO, matrix->pins->a3, matrix->rowNum & 0b01000);
  // for this project, it's always 5 bits, but this is here in case I ever
  // reuse the logic
  if (matrix->fiveBitAddress) {
    gpio_ll_set_level(&GPIO, matrix->pins->a4, matrix->rowNum & 0b10000);
  }

  // latch, then reset all bundle outputs
  dedic_gpio_cpu_ll_write_mask(0b10000000, 0b00000000);
  asm volatile("nop"); // delay so the ICN2037 can keep up
  dedic_gpio_cpu_ll_write_mask(0b10000000, 0b10000000);

  // show the new row
  gpio_ll_set_level(&GPIO, matrix->pins->oe, 0);

  // reset and start the timer with the delay that is appropriate for this bit.
  // this likely could be adjusted by the time it took to run the above code,
  // but this is close enough for now.
  gptimer_set_raw_count(matrix->timer, timer_count_values[matrix->bitNum]);
  gptimer_start(matrix->timer);

  return false;
}
```
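The shift_out_row macro itself is not reproduced in this article. As a rough idea of how macro-based unrolling for 64 columns can work, here is a host-side sketch that swaps the Dedicated GPIO write for a mock that records each write. Everything in it (`mock_write_mask`, `SHIFT_OUT_VAL`, the `REPEAT_*` macros, `shift_out_row_unrolled`) is hypothetical and may differ from the real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WRITES 256
#define DATA_CLOCK_MASK 0x7F /* six RGB bits plus the clock bit */
#define CLOCK_BIT 0x40

/* Host-side stand-in for the GPIO write: logs the masked value. */
uint8_t write_log[MAX_WRITES];
size_t write_count = 0;

void mock_write_mask(uint8_t mask, uint8_t value) {
  assert(write_count < MAX_WRITES);
  write_log[write_count++] = (uint8_t)(value & mask);
}

/* Present the data with clock low, then raise the clock to shift it in. */
#define SHIFT_OUT_VAL(val)                                     \
  do {                                                         \
    uint8_t v_ = (val);                                        \
    mock_write_mask(DATA_CLOCK_MASK, v_);                      \
    mock_write_mask(DATA_CLOCK_MASK, (uint8_t)(v_ | CLOCK_BIT)); \
  } while (0)

/* Repeat a statement 4, 16, or 64 times with no loop counter or branch. */
#define REPEAT_4(stmt) stmt stmt stmt stmt
#define REPEAT_16(stmt) \
  REPEAT_4(stmt) REPEAT_4(stmt) REPEAT_4(stmt) REPEAT_4(stmt)
#define REPEAT_64(stmt) \
  REPEAT_16(stmt) REPEAT_16(stmt) REPEAT_16(stmt) REPEAT_16(stmt)

/* Shift out 64 pre-packed column bytes starting at buffer + offset. */
void shift_out_row_unrolled(const uint8_t *buffer, size_t offset) {
  const uint8_t *p = buffer + offset;
  REPEAT_64(SHIFT_OUT_VAL(*p++);)
}
```

Unrolling trades code size for speed: the preprocessor emits 64 copies of the write pair, so the hot path has no loop compare or branch. On the real hardware a short delay between the two writes would still be needed, as in the shift_out_val macro shown earlier.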
Outside of the display driver, most of the software architecture is fairly mundane, so I will only cover it at a high level. The firmware is broken into logical units based on responsibility, using the ESP-IDF concept of Components. Together, these components form a simple pipeline that moves data from the network to the display. At a glance, the firmware is composed of the following major components:
The gfx component provides low-level drawing primitives, including a display buffer and font support. These utilities operate on bitmapped graphics and expose functions for drawing text and lines.
The network component wraps ESP-IDF networking APIs and is responsible for managing the WiFi connection and making HTTP requests.
The commands component is responsible for transforming API responses into simplified C structures that represent drawing operations. Instead of transferring full bitmap data over the network, the API returns a compact set of commands that describe how to generate the image locally. These commands are then applied to the display buffer using the gfx primitives.
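As an illustration of the idea, drawing commands can be modeled as small tagged structs that are replayed against a pixel buffer. The sketch below is not the Illumindex command set: the command types, field names, and buffer layout are all hypothetical, and JSON parsing is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DISPLAY_WIDTH 64
#define DISPLAY_HEIGHT 64

typedef struct {
  uint8_t r, g, b;
} color_t;

typedef enum { CMD_FILL, CMD_HLINE } command_type_t;

/* One drawing operation (hypothetical shape). */
typedef struct {
  command_type_t type;
  color_t color;
  uint8_t x, y, length; /* used by CMD_HLINE only */
} command_t;

typedef struct {
  color_t pixels[DISPLAY_HEIGHT][DISPLAY_WIDTH];
} display_buffer_t;

/* Apply a single command, clipping anything that falls off the display. */
void apply_command(display_buffer_t *buf, const command_t *cmd) {
  assert(buf != NULL && cmd != NULL);
  switch (cmd->type) {
  case CMD_FILL:
    for (size_t y = 0; y < DISPLAY_HEIGHT; y++) {
      for (size_t x = 0; x < DISPLAY_WIDTH; x++) {
        buf->pixels[y][x] = cmd->color;
      }
    }
    break;
  case CMD_HLINE:
    if (cmd->y >= DISPLAY_HEIGHT) {
      break;
    }
    for (size_t i = 0; i < cmd->length; i++) {
      size_t x = (size_t)cmd->x + i;
      if (x >= DISPLAY_WIDTH) {
        break; /* clip at the right edge */
      }
      buf->pixels[cmd->y][x] = cmd->color;
    }
    break;
  }
}

/* Replay a list of commands in order. */
void apply_commands(display_buffer_t *buf, const command_t *cmds,
                    size_t count) {
  for (size_t i = 0; i < count; i++) {
    apply_command(buf, &cmds[i]);
  }
}
```

A few dozen bytes of commands like these can describe a full frame, compared with 12 KB for a raw 64x64 RGB bitmap, which is the motivation for generating the image on the device.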
Finally, the display and state components tie the system together. They initialize all subsystems, periodically fetch new data via the network component, pass responses to commands for processing, apply the resulting drawing commands to the display buffer, and preprocess the final buffer for consumption by the LED matrix driver.
It’s the end of 2025 at the time of writing. LLMs are everywhere and anyone in the tech industry is aware that they're nearly inescapable at this point. Endless discussions are being had about the future of software engineering and how LLMs fit into it – or maybe, how humans fit into it. I don’t believe this article is the place for me to brain dump my opinions. But I'll say that I am concerned with the patterns that we are beginning to see. More and more engineers use LLMs to achieve a goal, but then move on without taking time to learn from the problems that they have just solved. This velocity is important for business objectives, but is not ideal for skills growth.
I believe that growth in engineering fundamentals is a journey that never ends, and outsourcing thought and understanding to LLMs will have deep, negative impacts on engineers in the long run. Less human involvement in software engineering may be the future, and that's okay. But for now, I like software engineering and deeply value the growth that comes from understanding a problem and designing a solution.
This project was written without any LLM "agent" involvement. It was written with LLM autocomplete "suggestions". To me, this is the perfect symbiosis between LLMs and software engineering. It allows me to drive the line-by-line architecture of the system, encounter problems, explore solutions, choose the directions of the software, and maintain a deep understanding of the system, while at the same time speeding up development and reducing physical fatigue.
This article, however, has been significantly reviewed and edited by an LLM. I've written all initial versions, but I've also passed the output to an LLM for corrections and consistency.
The hardware for this project is nothing special. The MCU is an ESP32-S3 development board, connected to a 64x64 RGB LED Matrix panel. It is powered with a 5V 4A switching power supply. I also added a switch to toggle the MCU's power source between the USB port and the power supply, since there are no protection diodes for the USB port on the development board I used.
| Part | Description |
|---|---|
| Adafruit ESP32-S3 Feather | The main MCU |
| 64x64 RGB LED matrix panel | The LED matrix panel |
| 5V 4A (4000mA) switching power supply | Power supply |
| 2x8 IDC breakout pins | Connection pins for the ribbon cable |
| 2x8 IDC ribbon cable | Connection between the MCU and the LED matrix |
| Black LED diffusion acrylic | Makes the LEDs less harsh and easier to look at |
| Adafruit Perma-Proto Half-sized Breadboard PCB | The board that everything is soldered to |

The body was printed with PETG. If I were to improve this design, I would make it easier to plug in the USB cord without needing to disassemble the body.
The schematic was designed with KiCad.
