
Teaching a Small LLM to Design Electronic Circuits: Fine-Tuning Qwen3-4B on 100K KiCad Netlists

How I built a 100K-example dataset of executable circuit netlists and fine-tuned a 4B parameter model that scores 88% on functional circuit generation, rivaling GPT-4o's published benchmarks.


The Problem: LLMs Can't Do PCB Design (Yet)

If you've ever asked ChatGPT to design you a circuit, you know the pain. It'll happily describe how an H-bridge works, but ask it to produce an actual netlist you can open in KiCad? You'll get hallucinated pin numbers, impossible connections, and formats that parse to nothing.

The core issue is KiCad's native s-expression netlist format. It's deeply nested parentheses with domain-specific tokens, completely alien to language models trained on code and prose. Recent research confirms this: PCBSchemaGen (arXiv:2602.00510) found that even GPT-4o struggles with raw s-expressions but achieves 87% Pass@1 when you reframe the problem as Python code generation using SKiDL.

I can confirm this firsthand. I tried the s-expression approach before pivoting to SKiDL. My earlier models (AbijahKaj/qwen3.5-4b-kicad-netlist and AbijahKaj/qwen3-4b-kicad-netlist) were trained on raw KiCad s-expression netlists, and they failed badly. The models simply couldn't keep track of which nested block they were generating: they'd lose count of opening and closing parentheses, mix up which (property ...) belonged to which (symbol ...), and degenerate into garbage midway through a component definition, producing structurally invalid netlists that KiCad couldn't parse at all.

That hard-won insight, treating circuits as Python rather than s-expressions, is what this entire project is built on.

The Idea: SKiDL as the Bridge

SKiDL is a Python library that lets you define electronic circuits programmatically:

```python
from skidl import *

# Parts pull symbols and footprints from the standard KiCad libraries.
mcu = Part("MCU_Microchip", "ATmega328P-AU", footprint="Package_QFP:TQFP-32_7x7mm_P0.8mm")
cap = Part("Device", "C", value="100nF", footprint="Capacitor_SMD:C_0402_1005Metric")

# Nets are created by name; pins attach with +=.
vcc = Net("VCC")
gnd = Net("GND")

vcc += mcu["VCC"], cap["1"]
gnd += mcu["GND"], cap["2"]

# Writes a KiCad-importable .net file next to the script.
generate_netlist()
```

Run that Python script and you get a valid .net file KiCad can import directly. The format is:

  • 3× more compact than s-expression netlists

  • Executable, so syntax errors are immediately caught

  • Python, exactly what LLMs are best at generating

If I could build a large enough dataset of (natural language → SKiDL Python) pairs, a small fine-tuned model might actually learn to design circuits.

Building the Dataset: From 0 to 100K Examples

Dataset: AbijahKaj/kicad-netlist-sft-dataset

The dataset was built over about a week (April 28 to May 2, 2026) through multiple conversion pipelines, each contributing a different source of circuit knowledge:

Source 1: Real KiCad Schematics from GitHub (~45,000 examples)

Starting from bshada/open-schematics, a collection of .kicad_sch files scraped from 6,000+ GitHub repositories, I wrote a pure-Python Union-Find converter that:

  1. Parses KiCad schematic files (both modern v6+ and legacy EESchema formats)

  2. Extracts components, pin assignments, and wire connectivity

  3. Generates equivalent SKiDL Python code

  4. Produces a parallel structured JSON netlist

These are real circuits designed by real engineers: ESP32 IoT boards, STM32 flight controllers, Raspberry Pi CM4 carriers, mechanical keyboard PCBs, FPGA eval boards, VESC motor controllers. The full breadth of open-source hardware.
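The connectivity extraction (step 2 above) is where Union-Find earns its keep: wire endpoints that touch get merged into a single net. Here is a minimal, hypothetical sketch of that merge logic; the actual converter's geometry parsing for `.kicad_sch` files is omitted.

```python
class UnionFind:
    """Minimal union-find for merging wire endpoints into nets.
    Illustrative sketch only; the real converter also has to parse
    schematic geometry before it gets to this step."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Wires are segments between (x, y) endpoints; touching endpoints share a net.
uf = UnionFind()
wires = [((0, 0), (10, 0)), ((10, 0), (10, 5)), ((20, 20), (30, 20))]
for a, b in wires:
    uf.union(a, b)

# Endpoints collapse into connected groups, one per net.
groups = {uf.find(p) for w in wires for p in w}
print(len(groups))  # 2
```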

Source 2: LTspice Circuits → SKiDL (~54,000 examples)

Si7li/ltspice-spice-circuits provided a large corpus of simulation circuits. These needed a SPICE-to-SKiDL translation layer since the component naming conventions, net syntax, and pin numbering are all different. But the underlying circuit knowledge (op-amp configurations, filter topologies, power supply designs) is invaluable training signal.
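The core of such a translation layer is a mapping from SPICE element prefixes to KiCad library parts. A toy sketch, assuming SKiDL's Net.fetch get-or-create helper on the output side; the real converter covers many more element types and pin conventions.

```python
# Hypothetical SPICE-to-SKiDL translation for simple two-terminal elements
# like "R1 VCC OUT 10k". Prefix letter -> (KiCad library, part name).
SPICE_TO_KICAD = {"R": ("Device", "R"), "C": ("Device", "C"), "L": ("Device", "L")}

def translate_line(line):
    ref, n1, n2, value = line.split()
    lib, part = SPICE_TO_KICAD[ref[0].upper()]
    var = ref.lower()
    return (
        f'{var} = Part("{lib}", "{part}", value="{value}", ref="{ref}")\n'
        f'Net.fetch("{n1}") += {var}[1]\n'   # get-or-create the net, attach pin 1
        f'Net.fetch("{n2}") += {var}[2]'
    )

print(translate_line("R1 VCC OUT 10k"))
```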

Source 3: Quality Filtering & Cleanup

Not every conversion produces a useful training example. I scored each example on:

  • Component and net presence

  • GND/power net inclusion

  • Multi-node connectivity (circuits where nets actually connect multiple things)

  • Net-to-component ratio

I filtered at score ≥ 0.6 with at least 2 multi-node nets, removed 595 examples with empty netlist_json, then relaxed the filters slightly to recover 528 borderline schematics that were still useful.
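A hypothetical sketch of such a scoring function, mirroring the criteria above; the actual weights and checks in the pipeline may differ.

```python
# Illustrative quality scorer: each criterion from the list above
# contributes equally. Weights are assumptions, not the pipeline's exact values.
def multi_node_nets(netlist):
    return [n for n, pins in netlist["nets"].items() if len(pins) >= 2]

def quality_score(netlist):
    score = 0.0
    if netlist["components"]:
        score += 0.25                       # component presence
    if netlist["nets"]:
        score += 0.25                       # net presence
    if any(n.upper() in ("GND", "VCC", "VDD") for n in netlist["nets"]):
        score += 0.25                       # power/ground net inclusion
    if len(multi_node_nets(netlist)) >= 2:
        score += 0.25                       # real multi-node connectivity
    return score

example = {
    "components": ["U1", "C1"],
    "nets": {"VCC": ["U1.VCC", "C1.1"], "GND": ["U1.GND", "C1.2"]},
}
keep = quality_score(example) >= 0.6 and len(multi_node_nets(example)) >= 2
print(keep)  # True
```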

Source 4: Tool-Augmented Examples (285)

A small but important subset: multi-turn conversations where the model first calls search_component and get_datasheet_info tools to look up IC pinouts before generating the netlist. This teaches the model to reason about component selection rather than just pattern-match.
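A tool-augmented conversation might look like the following sketch; the field names and tool-response schema here are illustrative, not the dataset's exact format.

```python
# Hypothetical multi-turn example: the model calls a lookup tool, receives
# verified pinout data, then emits SKiDL code. Schema is an assumption.
example = [
    {"role": "user", "content": "Design a 3.3V LDO supply using an AMS1117."},
    {"role": "assistant", "content": None, "tool_calls": [
        {"name": "search_component", "arguments": {"query": "AMS1117-3.3"}}
    ]},
    {"role": "tool", "name": "search_component",
     "content": '{"lib": "Regulator_Linear", "part": "AMS1117-3.3", '
                '"pins": {"1": "GND", "2": "VO", "3": "VI"}}'},
    {"role": "assistant",
     "content": "reg = Part('Regulator_Linear', 'AMS1117-3.3', ...)"},
]

# The final SKiDL turn is conditioned on real pinout data, not guesses.
roles = [m["role"] for m in example]
print(roles)
```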

Final Dataset Stats

| Metric | Value |
|---|---|
| Total examples | 100,179 |
| Format | ChatML (system + user + assistant) |
| Output format | SKiDL Python (in messages) + JSON netlist (parallel column) |
| Unique source repos | ~6,000+ |
| Tool-augmented | 285 |

Every single example includes both representations, so you can train on SKiDL Python, structured JSON, or both.
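Concretely, a row carries both representations side by side. The column names below follow the description above ("messages" ChatML plus a parallel "netlist_json" column), but treat them as assumptions and check the dataset card for the exact schema.

```python
import json

# Mock of one dataset row; for the real thing:
# load_dataset("AbijahKaj/kicad-netlist-sft-dataset", split="train")
row = {
    "messages": [
        {"role": "system", "content": "You are an expert electronics engineer..."},
        {"role": "user", "content": "Connect a 100nF decoupling cap to an MCU."},
        {"role": "assistant", "content": 'cap = Part("Device", "C", value="100nF")'},
    ],
    "netlist_json": '{"components": [{"ref": "C1", "value": "100nF"}], "nets": []}',
}

skidl_code = row["messages"][-1]["content"]   # train on SKiDL Python...
structured = json.loads(row["netlist_json"])  # ...or on the structured netlist
print(structured["components"][0]["ref"])  # C1
```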

Training: LoRA on Qwen3-4B

Model: AbijahKaj/qwen3-4b-skidl

I chose Qwen3-4B as the base. It's a strong code-capable model at a size that's actually deployable on consumer hardware. The full training run took about 17 hours on my RTX 5090 on May 4–5, checkpointing every 500 steps. The 32GB of VRAM on the 5090 made it comfortable to run LoRA at r=64 with 8192 context length without any gradient checkpointing tricks.

For the dataset curation and training script iteration, I leaned heavily on ML-Intern, Hugging Face's AI coding assistant. It handled a lot of the tedious parts: researching current TRL/PEFT APIs, writing the conversion pipelines, debugging dataset formatting issues, and setting up the training configs. Having it look up documentation and find working examples saved me from the usual cycle of "try outdated API, get cryptic error, google for 20 minutes, repeat." I could focus on the actual design decisions (which format to use, how to score quality, what LoRA rank makes sense) while it handled the boilerplate and plumbing.

Training Configuration

| Parameter | Value |
|---|---|
| Method | SFT + LoRA (PEFT) |
| LoRA rank / alpha | r=64, α=32, dropout=0.05 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Trainable params | ~132M (5.65% of total) |
| Epochs | 2 |
| Peak learning rate | 2e-4 (cosine decay) |
| Effective batch size | 8 |
| Max sequence length | 8192 tokens |
| Total steps | ~23,618 |

Why LoRA at r=64?

Circuit netlists are structurally repetitive (lots of Part(...), Net(...), += patterns) but semantically diverse (thousands of different component libraries, pin assignments, connection topologies). A relatively high rank (64) gives the adapter enough capacity to learn the format without losing the nuance of which pins connect where.
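As a sanity check on the trainable-parameter count: LoRA adds r·(d_in + d_out) parameters per targeted matrix. Plugging in Qwen3-4B's published dimensions (treat the exact numbers below as assumptions) lands right around 132M.

```python
# Back-of-envelope LoRA parameter count for r=64 on all seven projections.
# Dimensions assumed from Qwen3-4B's config: hidden 2560, 36 layers,
# GQA with 32 query / 8 KV heads of dim 128, MLP intermediate 9728.
r = 64
hidden, layers = 2560, 36
q_out, kv_out, mlp = 32 * 128, 8 * 128, 9728

def lora_params(d_in, d_out):
    return r * (d_in + d_out)

per_layer = (
    lora_params(hidden, q_out)          # q_proj
    + 2 * lora_params(hidden, kv_out)   # k_proj, v_proj
    + lora_params(q_out, hidden)        # o_proj
    + 2 * lora_params(hidden, mlp)      # gate_proj, up_proj
    + lora_params(mlp, hidden)          # down_proj
)
total = per_layer * layers
print(f"{total / 1e6:.0f}M")  # 132M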

Loss Curve

The model converged quickly since circuits have strong structural regularity:

| Phase | Train Loss | Token Accuracy |
|---|---|---|
| Start | 0.86 | 75% |
| End of epoch 1 | 0.17 | 95% |
| End of epoch 2 | 0.15 | 95–96% |
| Final eval loss | 0.158 | 95.3% |

Evaluation: Does It Actually Work?

I ran functional validation on 5 held-out circuits, scoring each on:

  • Python syntax correctness

  • SKiDL import/structure validity

  • Correct number of nets and components

  • Proper connectivity (are pins actually wired together?)

  • GND net presence and correctness

| Circuit | Score |
|---|---|
| LED blink (ATtiny85) | 0.85 |
| USB power meter (ATmega328P + INA219) | 0.85 |
| CAN bus (MCP2515 + TJA1050) | 0.95 |
| Average | 0.883 |
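Each criterion reduces to a small programmatic check over the structured netlist. Here is a hypothetical sketch of just the GND criterion; the real rubric and its JSON shape may differ.

```python
import json

# Illustrative check: GND must exist AND actually connect multiple pins.
# The JSON shape here is an assumption, not the dataset's exact schema.
def check_gnd(netlist_json):
    nets = json.loads(netlist_json)["nets"]
    gnd = next((n for n in nets if n["name"].upper() == "GND"), None)
    return gnd is not None and len(gnd["pins"]) >= 2

good = '{"nets": [{"name": "GND", "pins": ["U1.GND", "C1.2"]}]}'
bad = '{"nets": [{"name": "GND", "pins": []}]}'
print(check_gnd(good), check_gnd(bad))  # True False
```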

An 88.3% functional score from a 4B model is noteworthy. For context, PCBSchemaGen reports 87% Pass@1 with GPT-4o (a model orders of magnitude larger) on a similar SKiDL generation task. The benchmarks aren't directly comparable since they use different circuits and different scoring rubrics, but it suggests that focused fine-tuning on a high-quality domain dataset can close the gap with general-purpose frontier models.

Try It

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "AbijahKaj/qwen3-4b-skidl", dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("AbijahKaj/qwen3-4b-skidl")

messages = [
    {"role": "system", "content": "You are an expert electronics engineer and KiCad schematic designer. When given a description of an electronic circuit, generate executable SKiDL Python code that defines the circuit using the SKiDL library."},
    {"role": "user", "content": "Design a CAN bus interface using MCP2515 controller and TJA1050 transceiver on SPI, with 120 ohm termination resistor."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

The output is runnable. python my_circuit.py produces a .net file you can import into KiCad.
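That also makes validation scriptable: execute the generated code in a scratch directory and check that a netlist file appears. A sketch of that loop; the checker itself is plain Python, and only real model output would need skidl installed.

```python
import pathlib
import subprocess
import sys
import tempfile

# Hypothetical end-to-end check: write generated SKiDL code to a temp
# script, run it, and confirm exactly one .net file was produced.
def produces_netlist(skidl_code: str) -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        script = pathlib.Path(tmp) / "my_circuit.py"
        script.write_text(skidl_code)
        result = subprocess.run(
            [sys.executable, script.name], cwd=tmp,
            capture_output=True, timeout=60,
        )
        nets = list(pathlib.Path(tmp).glob("*.net"))
        return result.returncode == 0 and len(nets) == 1
```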

What I Learned

1. Format matters more than data volume. The pivot from s-expressions to SKiDL Python was the single biggest decision. Research (PCBSchemaGen, CircuitLM) guided this. Always check the literature before committing to a data format.

2. Real schematics > synthetic circuits. The 45K examples from actual GitHub repos carry design patterns (decoupling cap placement, pull-up resistor conventions, power supply topologies) that synthetic generation struggles to replicate.

3. Parallel representations are cheap insurance. Every example has both SKiDL Python and structured JSON. It cost almost nothing to generate both during conversion, and it means the dataset supports multiple training strategies without re-processing.

4. Small models can punch above their weight. 4B parameters + 100K focused examples + LoRA = competitive with frontier models on a narrow domain task. The model doesn't know what a haiku is anymore, but it can wire up an STM32.

What's Next

  • Validation harness: Actually running the generated SKiDL code and checking it produces valid netlists (beyond the 5-circuit spot check)

  • ERC pass rate: Running KiCad's Electrical Rules Check on generated circuits

  • More tool-augmented data: The 285 tool-use examples are promising but tiny. Scaling this could teach the model to verify its own pin assignments

  • Bigger models: The same dataset on Qwen3-8B or 14B could push scores higher


All open-source, all Apache 2.0.