Introduction
What if I told you that your tiny ESP32 board, the same one you use to blink LEDs or log sensor data, could run a language model like a miniature version of ChatGPT?
Sounds impossible, right? But it isn't.
Yes, you can run a Large Language Model (LLM) on a microcontroller!
Thanks to an amazing open-source project, you can now run a tiny LLM on an ESP32-S3 microcontroller. That means real AI inference, text generation and storytelling, running directly on a chip that costs less than a cup of coffee.
In this blog, I’ll show you how to make that magic happen using both the Arduino IDE (for quick prototyping) and ESP-IDF (for full control and performance). Whether you’re an embedded tinkerer, a hobbyist, or just curious about what’s next in edge AI, this is for you.
Ready to bring AI to the edge? Let’s dive in!
In this blog, you'll learn two ways to run a small LLM on ESP32:
- Using Arduino IDE
- Using ESP-IDF (Espressif’s official SDK)
Understanding the ESP32-S3 Architecture and Pinout
The ESP32-S3 is a powerful dual-core microcontroller from Espressif, designed for AIoT and edge computing applications. At its heart lies the Xtensa® LX7 dual-core processor running up to 240 MHz, backed by ample on-chip SRAM, cache, and support for external PSRAM—making it uniquely capable of running lightweight AI models like Tiny LLMs. It features integrated Wi-Fi and Bluetooth Low Energy (BLE) radios, multiple I/O peripherals (SPI, I2C, UART, I2S), and even native USB OTG support. The development board includes essential components such as a USB-to-UART bridge, 3.3V LDO regulator, RGB LED, and accessible GPIO pin headers. With buttons for boot and reset, and dual USB ports, the ESP32-S3 board makes flashing firmware and experimenting with peripherals effortless. Its advanced security features like secure boot, flash encryption, and cryptographic accelerators also ensure your edge AI applications stay safe and reliable. All of these capabilities together make the ESP32-S3 a perfect platform to explore and deploy tiny LLMs in real-time, even without the cloud.
What Is This Tiny LLM?
- Based on the llama2.c model (a minimal C-based transformer).
- Trained on the TinyStories dataset (child-level English content).
- Supports basic token generation at ~19 tokens/sec.
- Model size: ~1 MB (fits in an ESP32-S3 with 2 MB PSRAM).
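As a quick sanity check on that size claim: llama2.c checkpoints store weights as 32-bit floats, so 260K parameters come out to roughly 1 MB. This is a back-of-the-envelope sketch that ignores the small checkpoint header, not the exact file layout:

```c
/* Approximate size in bytes of a float32 llama2.c-style checkpoint
   holding `params` weights. Ignores the small config header, so this
   is a lower-bound estimate. */
static long checkpoint_bytes(long params) {
    return params * 4L; /* 4 bytes per float32 weight */
}
```

260,000 × 4 ≈ 1.04 MB, which leaves comfortable headroom within 2 MB of PSRAM.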
What You Need
| Item | Details |
|---|---|
| Board | ESP32-S3 with PSRAM (e.g., ESP32-S3FH4R2) |
| Toolchain | Arduino IDE or ESP-IDF |
| Model | tinyllama.bin (260K parameters) |
| Cable | USB-C or micro-USB for flashing |
Method 1: Using Arduino IDE
Step 1: Install Arduino Core for ESP32
- Open Arduino IDE.
- Go to Preferences > Additional Board Manager URLs.
- Add: `https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json`
- Go to Boards Manager, then search for and install "esp32" by Espressif.
Step 2: Download the Code
The current project is in ESP-IDF format. For Arduino IDE, you can adapt it or wait for an Arduino port (coming soon). Meanwhile, here's a simple structure.
- Create a new sketch: `esp32_llm_arduino.ino`
- Add your inference logic to the sketch.

Note: You'll need to convert the model weights (`tinyllama.bin`) into a C header file or read them from PSRAM/flash.
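Here is a minimal sketch of what that structure could look like. The model-loading calls (`llm_init()`, `llm_forward()`, `llm_decode()`, `tinyllama_model.h`, `VOCAB_SIZE`) are hypothetical placeholders, not the real project API; the greedy `argmax()` decode step is the only fully concrete part:

```cpp
#ifdef ARDUINO
#include <Arduino.h>
// #include "tinyllama_model.h"  // hypothetical: weights exported as a C array
#endif

// Greedy decoding: pick the highest-scoring token from the model's logits.
// (llama2.c also supports temperature sampling; argmax is the simplest case.)
static int argmax(const float *logits, int n) {
  int best = 0;
  for (int i = 1; i < n; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}

#ifdef ARDUINO
void setup() {
  Serial.begin(115200);
  // Hypothetical: point the transformer at weights stored in flash or PSRAM.
  // llm_init(tinyllama_weights);
}

void loop() {
  // Hypothetical per-token step: one forward pass, decode, print.
  // float *logits = llm_forward(prev_token);
  // int next = argmax(logits, VOCAB_SIZE);
  // Serial.print(llm_decode(next));
  delay(1000);
}
#endif
```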
Step 3: Upload and Run
- Select your ESP32 board.
- Upload the code.
- Open Serial Monitor at 115200 baud.
- You’ll see the model generate a few simple tokens based on your prompt!
Method 2: Using ESP-IDF
Step 1: Install ESP-IDF
Follow the official guide: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/get-started/
Step 2: Clone the Repo
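The repository URL is from the project's GitHub page (see the references at the end of this post):

```shell
# Clone the esp32-llm project and enter its directory
git clone https://github.com/DaveBben/esp32-llm.git
cd esp32-llm
```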
Step 3: Build the Project
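With ESP-IDF installed and its environment sourced (the `export.sh` step from the install guide), a typical build looks like:

```shell
# Select the ESP32-S3 target, then compile the firmware
idf.py set-target esp32s3
idf.py build
```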
Step 4: Flash to Board
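Replace the serial port with whatever your board enumerates as; `/dev/ttyUSB0` below is just an example (on macOS it may appear as `/dev/cu.usbserial-*`, on Windows as `COMx`):

```shell
# Flash the firmware and open the serial monitor in one step
idf.py -p /dev/ttyUSB0 flash monitor
```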
Output:
You'll see the model stream a short TinyStories-style passage over the serial console, at roughly 19 tokens per second.
Tips to Improve
- Use ESP32-S3 with 2MB PSRAM.
- Enable dual-core execution.
- Use ESP-DSP for vector operations.
- Optimize model size using quantization (optional).
Demo Video
See it in action:
YouTube: Tiny LLM Running on ESP32-S3
Why Would You Do This?
While it's not practical for production AI, it proves:
- AI inference can run on constrained hardware
- Great for education, demos, and edge experiments
- Future of embedded AI is exciting!
Useful Links
| Link | Description |
|---|---|
| esp32-llm | Main GitHub repo |
| llama2.c | Original LLM C implementation |
| ESP-IDF | Official ESP32 SDK |
| TinyStories Dataset | Dataset used for training |
Bibliography / References
DaveBben / esp32-llm (GitHub Repository)
A working implementation of a Tiny LLM on ESP32-S3 with ESP-IDF
URL: https://github.com/DaveBben/esp32-llm
Karpathy / llama2.c (GitHub Repository)
A minimal, educational C implementation of LLaMA2-style transformers
URL: https://github.com/karpathy/llama2.c
TinyStories Dataset – HuggingFace
A synthetic dataset used to train small LLMs for children’s story generation
URL: https://huggingface.co/datasets/roneneldan/TinyStories
Espressif ESP-IDF Official Documentation
The official SDK and development guide for ESP32, ESP32-S2, ESP32-S3 and ESP32-C3
URL: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/get-started/
Hackaday – Large Language Models on Small Computers
A blog exploring the feasibility and novelty of running LLMs on microcontrollers
URL: https://hackaday.com/2024/09/07/large-language-models-on-small-computers
YouTube – Running an LLM on ESP32 by DaveBben
A real-time demonstration of Tiny LLM inference running on the ESP32-S3 board
URL: https://www.youtube.com/watch?v=E6E_KrfyWFQ
Arduino ESP32 Board Support Package
Arduino core for ESP32 microcontrollers by Espressif
URL: https://github.com/espressif/arduino-esp32
Image Links:
https://www.elprocus.com/wp-content/uploads/ESP32-S3-Development-Board-Hardware.jpg
https://krishworkstech.com/wp-content/uploads/2024/11/Group-1000006441-1536x1156.jpg
https://www.electronics-lab.com/wp-content/uploads/2023/01/esp32-s3-block-diagram-1.png