Introduction
What if I told you that your tiny ESP32 board, the same one you use to blink LEDs or log sensor data, could run a language model like a miniature version of ChatGPT?
Sounds impossible, right? But it isn't.
Yes, you can run a Large Language Model (LLM) on a microcontroller!
Thanks to an amazing open-source project, you can now run a tiny LLM on an ESP32-S3 microcontroller. That means real AI inference, text generation and storytelling, running directly on a chip that costs less than a cup of coffee.
In this blog, I’ll show you how to make that magic happen using both the Arduino IDE (for quick prototyping) and ESP-IDF (for full control and performance). Whether you’re an embedded tinkerer, a hobbyist, or just curious about what’s next in edge AI, this is for you.
Ready to bring AI to the edge? Let’s dive in!
In this blog, you'll learn two ways to run a small LLM on ESP32:
- Using Arduino IDE
- Using ESP-IDF (Espressif’s official SDK)
Understanding the ESP32-S3 Architecture and Pinout
The ESP32-S3 is a powerful dual-core microcontroller from Espressif, designed for AIoT and edge computing applications. At its heart lies the Xtensa® LX7 dual-core processor running up to 240 MHz, backed by ample on-chip SRAM, cache, and support for external PSRAM—making it uniquely capable of running lightweight AI models like Tiny LLMs. It features integrated Wi-Fi and Bluetooth Low Energy (BLE) radios, multiple I/O peripherals (SPI, I2C, UART, I2S), and even native USB OTG support. The development board includes essential components such as a USB-to-UART bridge, 3.3V LDO regulator, RGB LED, and accessible GPIO pin headers. With buttons for boot and reset, and dual USB ports, the ESP32-S3 board makes flashing firmware and experimenting with peripherals effortless. Its advanced security features like secure boot, flash encryption, and cryptographic accelerators also ensure your edge AI applications stay safe and reliable. All of these capabilities together make the ESP32-S3 a perfect platform to explore and deploy tiny LLMs in real-time, even without the cloud.
What Is This Tiny LLM?
- Based on the llama2.c model (a minimal C-based transformer).
- Trained on the TinyStories dataset (child-level English content).
- Supports basic token generation at ~19 tokens/sec.
- Model size: ~1 MB (fits in an ESP32-S3 with 2 MB PSRAM).
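As a quick sanity check on that size claim: llama2.c checkpoints store weights as 32-bit floats, so 260K parameters come out to roughly 1 MB. This is a back-of-the-envelope sketch that ignores the small checkpoint header, not the exact file layout:

```c
/* Approximate size in bytes of a float32 llama2.c-style checkpoint
   holding `params` weights. Ignores the small config header, so this
   is a lower-bound estimate. */
static long checkpoint_bytes(long params) {
    return params * 4L; /* 4 bytes per float32 weight */
}
```

260,000 × 4 ≈ 1.04 MB, which leaves comfortable headroom within 2 MB of PSRAM.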
What You Need
| Item | Details |
|---|---|
| Board | ESP32-S3 with PSRAM (e.g., ESP32-S3FH4R2) |
| Toolchain | Arduino IDE or ESP-IDF |
| Model | tinyllama.bin (260K parameters) |
| Cable | USB-C or micro-USB for flashing |
Method 1: Using Arduino IDE
Step 1: Install Arduino Core for ESP32
- Open Arduino IDE.
- Go to Preferences > Additional Board Manager URLs.
- Add: `https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json`
- Go to Boards Manager, then search for and install "esp32" by Espressif.
Step 2: Download the Code
The current project is in ESP-IDF format. For Arduino IDE, you can adapt it or wait for an Arduino port (coming soon). Meanwhile, here's a simple structure.
- Create a new sketch: `esp32_llm_arduino.ino`
- Add your inference logic to the sketch.

Note: You'll need to convert the model weights (`tinyllama.bin`) into a C header file or read them from PSRAM/flash.
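Here is a minimal sketch of what that structure could look like. The model-loading calls (`llm_init()`, `llm_forward()`, `llm_decode()`, `tinyllama_model.h`, `VOCAB_SIZE`) are hypothetical placeholders, not the real project API; the greedy `argmax()` decode step is the only fully concrete part:

```cpp
#ifdef ARDUINO
#include <Arduino.h>
// #include "tinyllama_model.h"  // hypothetical: weights exported as a C array
#endif

// Greedy decoding: pick the highest-scoring token from the model's logits.
// (llama2.c also supports temperature sampling; argmax is the simplest case.)
static int argmax(const float *logits, int n) {
  int best = 0;
  for (int i = 1; i < n; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}

#ifdef ARDUINO
void setup() {
  Serial.begin(115200);
  // Hypothetical: point the transformer at weights stored in flash or PSRAM.
  // llm_init(tinyllama_weights);
}

void loop() {
  // Hypothetical per-token step: one forward pass, decode, print.
  // float *logits = llm_forward(prev_token);
  // int next = argmax(logits, VOCAB_SIZE);
  // Serial.print(llm_decode(next));
  delay(1000);
}
#endif
```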
Step 3: Upload and Run
- Select your ESP32 board.
- Upload the code.
- Open Serial Monitor at 115200 baud.
- You’ll see the model generate a few simple tokens based on your prompt!
Method 2: Using ESP-IDF
Step 1: Install ESP-IDF
Follow the official guide: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/get-started/
Step 2: Clone the Repo
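The repository URL is from the project's GitHub page (see the references at the end of this post):

```shell
# Clone the esp32-llm project and enter its directory
git clone https://github.com/DaveBben/esp32-llm.git
cd esp32-llm
```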
Step 3: Build the Project
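With ESP-IDF installed and its environment sourced (the `export.sh` step from the install guide), a typical build looks like:

```shell
# Select the ESP32-S3 target, then compile the firmware
idf.py set-target esp32s3
idf.py build
```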
Step 4: Flash to Board
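Replace the serial port with whatever your board enumerates as; `/dev/ttyUSB0` below is just an example (on macOS it may appear as `/dev/cu.usbserial-*`, on Windows as `COMx`):

```shell
# Flash the firmware and open the serial monitor in one step
idf.py -p /dev/ttyUSB0 flash monitor
```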
Output:
You'll see the model stream a short TinyStories-style passage over the serial console, at roughly 19 tokens per second.
Tips to Improve
- Use ESP32-S3 with 2MB PSRAM.
- Enable dual-core execution.
- Use ESP-DSP for vector operations.
- Optimize model size using quantization (optional).
Demo Video
See it in action:
YouTube: Tiny LLM Running on ESP32-S3
Why Would You Do This?
While it's not practical for production AI, it proves:
- AI inference can run on constrained hardware
- Great for education, demos, and edge experiments
- Future of embedded AI is exciting!
Useful Links
| Link | Description |
|---|---|
| esp32-llm | Main GitHub repo |
| llama2.c | Original LLM C implementation |
| ESP-IDF | Official ESP32 SDK |
| TinyStories Dataset | Dataset used for training |
Bibliography / References
DaveBben / esp32-llm (GitHub Repository)
A working implementation of a Tiny LLM on ESP32-S3 with ESP-IDF
URL: https://github.com/DaveBben/esp32-llm
Karpathy / llama2.c (GitHub Repository)
A minimal, educational C implementation of LLaMA2-style transformers
URL: https://github.com/karpathy/llama2.c
TinyStories Dataset – HuggingFace
A synthetic dataset used to train small LLMs for children’s story generation
URL: https://huggingface.co/datasets/roneneldan/TinyStories
Espressif ESP-IDF Official Documentation
The official SDK and development guide for ESP32, ESP32-S2, ESP32-S3 and ESP32-C3
URL: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/get-started/
Hackaday – Large Language Models on Small Computers
A blog exploring the feasibility and novelty of running LLMs on microcontrollers
URL: https://hackaday.com/2024/09/07/large-language-models-on-small-computers
YouTube – Running an LLM on ESP32 by DaveBben
A real-time demonstration of Tiny LLM inference running on the ESP32-S3 board
URL: https://www.youtube.com/watch?v=E6E_KrfyWFQ
Arduino ESP32 Board Support Package
Arduino core for ESP32 microcontrollers by Espressif
URL: https://github.com/espressif/arduino-esp32
Image Links:
https://www.elprocus.com/wp-content/uploads/ESP32-S3-Development-Board-Hardware.jpg
https://krishworkstech.com/wp-content/uploads/2024/11/Group-1000006441-1536x1156.jpg
https://www.electronics-lab.com/wp-content/uploads/2023/01/esp32-s3-block-diagram-1.png