Teaching a Small LLM to Design Electronic Circuits: Fine-Tuning Qwen3-4B on 100K KiCad Netlists
How I built a 100K-example dataset of executable circuit netlists and fine-tuned a 4B parameter model that scores 88% on functional circuit generation, rivaling GPT-4o's published benchmarks.
The Problem: LLMs Can't Do PCB Design (Yet)
If you've ever asked ChatGPT to design you a circuit, you know the pain. It'll happily describe how an H-bridge works, but ask it to produce an actual netlist you can open in KiCad? You'll get hallucinated pin numbers, impossible connections, and formats that parse to nothing.
The core issue is KiCad's native s-expression netlist format. It's deeply nested parentheses with domain-specific tokens, completely alien to language models trained on code and prose. Recent research confirms this: PCBSchemaGen (arXiv:2602.00510) found that even GPT-4o struggles with raw s-expressions but achieves 87% Pass@1 when you reframe the problem as Python code generation using SKiDL.
I can confirm this firsthand: I tried the s-expression approach before pivoting to SKiDL. My earlier models (AbijahKaj/qwen3.5-4b-kicad-netlist and AbijahKaj/qwen3-4b-kicad-netlist) were trained on raw KiCad s-expression netlists, and they failed badly. The models simply couldn't keep track of which nested block they were generating: as the parentheses stacked up, the output would degenerate into garbage midway through a component definition. They'd lose count of opening/closing parens, mix up which `(property ...)` belonged to which `(symbol ...)`, and produce structurally invalid netlists that KiCad couldn't parse at all.
That hard-won insight, "circuits as Python, not s-expressions", is what this entire project is built on.
The Idea: SKiDL as the Bridge
SKiDL is a Python library that lets you define electronic circuits programmatically:
```python
from skidl import *

mcu = Part("MCU_Microchip", "ATmega328P-AU", footprint="Package_QFP:TQFP-32_7x7mm_P0.8mm")
cap = Part("Device", "C", value="100nF", footprint="Capacitor_SMD:C_0402_1005Metric")

vcc = Net("VCC")
gnd = Net("GND")
vcc += mcu["VCC"], cap["1"]
gnd += mcu["GND"], cap["2"]

generate_netlist()
```
Run that Python script and you get a valid `.net` file KiCad can import directly. The format is:

- 3× more compact than s-expression netlists
- Executable, so syntax errors are caught immediately
- Python, exactly what LLMs are best at generating
If I could build a large enough dataset of (natural language → SKiDL Python) pairs, a small fine-tuned model might actually learn to design circuits.
Building the Dataset: From 0 to 100K Examples
Dataset: AbijahKaj/kicad-netlist-sft-dataset
The dataset was built over about a week (April 28 to May 2, 2026) through multiple conversion pipelines, each contributing a different source of circuit knowledge:
Source 1: Real KiCad Schematics from GitHub (~45,000 examples)
Starting from bshada/open-schematics, a collection of .kicad_sch files scraped from 6,000+ GitHub repositories, I wrote a pure-Python Union-Find converter that:
- Parses KiCad schematic files (both modern v6+ and legacy EESchema formats)
- Extracts components, pin assignments, and wire connectivity
- Generates equivalent SKiDL Python code
- Produces a parallel structured JSON netlist
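The connectivity-merging core of such a converter can be sketched with a few lines of Union-Find. This is an illustrative reconstruction, not the actual converter: the coordinate-pair wire format and the function names here are assumptions.

```python
# Illustrative sketch: merge wire segments into electrical nets with
# Union-Find. A "wire" is a pair of (x, y) endpoints; two wires that share
# an endpoint belong to the same net. Names and shapes are hypothetical.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def nets_from_wires(wires):
    """Group wire endpoints into connected components (one per net)."""
    uf = UnionFind()
    for start, end in wires:
        uf.union(start, end)
    nets = {}
    for point in list(uf.parent):
        nets.setdefault(uf.find(point), set()).add(point)
    return list(nets.values())

# Two wires sharing the point (1, 0) collapse into a single net.
wires = [((0, 0), (1, 0)), ((1, 0), (2, 0)), ((5, 5), (6, 5))]
print(nets_from_wires(wires))
```

Once wire segments are grouped this way, each group becomes one `Net(...)` and every component pin touching a point in the group gets attached to it.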
These are real circuits designed by real engineers: ESP32 IoT boards, STM32 flight controllers, Raspberry Pi CM4 carriers, mechanical keyboard PCBs, FPGA eval boards, VESC motor controllers. The full breadth of open-source hardware.
Source 2: LTspice Circuits → SKiDL (~54,000 examples)
Si7li/ltspice-spice-circuits provided a large corpus of simulation circuits. These needed a SPICE-to-SKiDL translation layer since the component naming conventions, net syntax, and pin numbering are all different. But the underlying circuit knowledge (op-amp configurations, filter topologies, power supply designs) is invaluable training signal.
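As a rough illustration of what a SPICE-to-SKiDL translation layer does, here is a hedged sketch that maps SPICE element prefixes to KiCad library parts and emits SKiDL lines. The prefix table and function are hypothetical simplifications; the real pipeline also has to handle subcircuits, models, and multi-pin devices.

```python
# Hypothetical sketch of SPICE-to-SKiDL translation for simple two-terminal
# element cards like "R1 in out 10k". Not the real converter.

SPICE_PREFIX_TO_PART = {
    "R": ("Device", "R"),
    "C": ("Device", "C"),
    "L": ("Device", "L"),
    "D": ("Device", "D"),
}

def spice_line_to_skidl(line):
    """Translate one SPICE element card into SKiDL source lines."""
    name, node_a, node_b, value = line.split()[:4]
    lib, part = SPICE_PREFIX_TO_PART[name[0].upper()]
    var = name.lower()
    return (
        f'{var} = Part("{lib}", "{part}", value="{value}", ref="{name}")\n'
        f'Net("{node_a}") += {var}[1]\n'
        f'Net("{node_b}") += {var}[2]'
    )

print(spice_line_to_skidl("R1 in out 10k"))
```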
Source 3: Quality Filtering & Cleanup
Not every conversion produces a useful training example. I scored each example on:
- Component and net presence
- GND/power net inclusion
- Multi-node connectivity (circuits where nets actually connect multiple things)
- Net-to-component ratio
I filtered at score ≥ 0.6 with at least 2 multi-node nets, removed 595 examples with empty netlist_json, then relaxed the filters slightly to recover 528 borderline schematics that were still useful.
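The scoring heuristics can be sketched roughly as follows. The weights and the netlist dict shape are assumptions for illustration; only the criteria and the thresholds (score ≥ 0.6, ≥ 2 multi-node nets) come from the actual pipeline.

```python
# Illustrative quality score over a netlist dict of the (assumed) form
# {"components": [...], "nets": {name: [pins...]}}. Weights are made up.

POWER_NAMES = {"GND", "VCC", "VDD", "3V3", "5V", "+5V", "+3V3"}

def quality_score(netlist):
    comps = netlist.get("components", [])
    nets = netlist.get("nets", {})
    multi_node = [n for n, pins in nets.items() if len(pins) >= 2]
    score = 0.0
    if comps and nets:
        score += 0.3                                   # components and nets present
    if any(n.upper() in POWER_NAMES for n in nets):
        score += 0.2                                   # power/GND net included
    if len(multi_node) >= 2:
        score += 0.3                                   # real multi-node connectivity
    if comps and 0.3 <= len(nets) / len(comps) <= 3.0:
        score += 0.2                                   # sane net-to-component ratio
    return score, len(multi_node)

def keep(netlist):
    score, multi = quality_score(netlist)
    return score >= 0.6 and multi >= 2                 # thresholds from the post
```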
Source 4: Tool-Augmented Examples (285)
A small but important subset: multi-turn conversations where the model first calls search_component and get_datasheet_info tools to look up IC pinouts before generating the netlist. This teaches the model to reason about component selection rather than just pattern-match.
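For a sense of what these examples look like, here is a hedged mock-up of one tool-augmented conversation in ChatML-style messages. The tool names are the real ones from the dataset; the circuit, the tool-call schema, and the returned payload are invented for illustration.

```python
# Hypothetical shape of one tool-augmented training example. Only the tool
# names (search_component, get_datasheet_info) are real; everything else
# here is a made-up illustration of the multi-turn structure.

example = {
    "messages": [
        {"role": "system",
         "content": "You are an expert electronics engineer..."},
        {"role": "user",
         "content": "Design a 3.3V LDO regulator circuit."},
        {"role": "assistant", "content": None,
         "tool_calls": [{"name": "search_component",
                         "arguments": {"query": "3.3V LDO regulator"}}]},
        {"role": "tool", "name": "search_component",
         "content": '{"lib": "Regulator_Linear", "part": "AMS1117-3.3", '
                    '"pins": {"1": "GND", "2": "VO", "3": "VI"}}'},
        {"role": "assistant",
         "content": "from skidl import *\n# ...SKiDL code using the pinout..."},
    ]
}

roles = [m["role"] for m in example["messages"]]
print(roles)
```

The key property is that the final assistant turn is conditioned on a verified pinout rather than on memorized (and often hallucinated) pin numbers.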
Final Dataset Stats
| Metric | Value |
|---|---|
| Total examples | 100,179 |
| Format | ChatML (system + user + assistant) |
| Output format | SKiDL Python (in messages) + JSON netlist (parallel column) |
| Unique source repos | ~6,000+ |
| Tool-augmented | 285 |
Every single example includes both representations, so you can train on SKiDL Python, structured JSON, or both.
Training: LoRA on Qwen3-4B
Model: AbijahKaj/qwen3-4b-skidl
I chose Qwen3-4B as the base. It's a strong code-capable model at a size that's actually deployable on consumer hardware. The full training run took about 17 hours on my RTX 5090 on May 4–5, checkpointing every 500 steps. The 32GB of VRAM on the 5090 made it comfortable to run LoRA at r=64 with 8192 context length without any gradient checkpointing tricks.
For the dataset curation and training script iteration, I leaned heavily on ML-Intern, Hugging Face's AI coding assistant. It handled a lot of the tedious parts: researching current TRL/PEFT APIs, writing the conversion pipelines, debugging dataset formatting issues, and setting up the training configs. Having it look up documentation and find working examples saved me from the usual cycle of "try outdated API, get cryptic error, google for 20 minutes, repeat." I could focus on the actual design decisions (which format to use, how to score quality, what LoRA rank makes sense) while it handled the boilerplate and plumbing.
Training Configuration
| Parameter | Value |
|---|---|
| Method | SFT + LoRA (PEFT) |
| LoRA rank / alpha | r=64, α=32, dropout=0.05 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Trainable params | ~132M (5.65% of total) |
| Epochs | 2 |
| Peak learning rate | 2e-4 (cosine decay) |
| Effective batch size | 8 |
| Max sequence length | 8192 tokens |
| Total steps | ~23,618 |
Why LoRA at r=64?
Circuit netlists are structurally repetitive (lots of Part(...), Net(...), += patterns) but semantically diverse (thousands of different component libraries, pin assignments, connection topologies). A relatively high rank (64) gives the adapter enough capacity to learn the format without losing the nuance of which pins connect where.
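For reference, the configuration in the table above maps onto `peft`/`transformers` roughly as below. The output path and the per-device batch / gradient-accumulation split are assumptions (any split multiplying to an effective batch of 8 matches the table); this is a sketch, not the actual training script.

```python
# Sketch of the LoRA + training setup matching the config table.
# output_dir and the batch/accumulation split are assumptions.

from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qwen3-4b-skidl",         # assumed path
    num_train_epochs=2,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=2,       # assumption: 2 x 4 accumulation = 8
    gradient_accumulation_steps=4,
    bf16=True,
    save_steps=500,                      # checkpoint every 500 steps
)
```

The 8192-token max sequence length is applied at the SFT/tokenization layer rather than in `TrainingArguments`.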
Loss Curve
The model converged quickly since circuits have strong structural regularity:
| Phase | Loss | Token Accuracy |
|---|---|---|
| Start | 0.86 | 75% |
| End of epoch 1 | 0.17 | 95% |
| End of epoch 2 | 0.15 | 95–96% |
| Final eval loss | 0.158 | 95.3% |
Evaluation: Does It Actually Work?
I ran functional validation on 5 held-out circuits, scoring each on:
- Python syntax correctness
- SKiDL import/structure validity
- Correct number of nets and components
- Proper connectivity (are pins actually wired together?)
- GND net presence and correctness
| Circuit | Score |
|---|---|
| LED blink (ATtiny85) | 0.85 |
| USB power meter (ATmega328P + INA219) | 0.85 |
| CAN bus (MCP2515 + TJA1050) | 0.95 |
| Average | 0.883 |
An 88.3% functional score from a 4B model is noteworthy. For context, PCBSchemaGen reports 87% Pass@1 with GPT-4o (a model orders of magnitude larger) on a similar SKiDL generation task. The benchmarks aren't directly comparable since they use different circuits and different scoring rubrics, but it suggests that focused fine-tuning on a high-quality domain dataset can close the gap with general-purpose frontier models.
Try It
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "AbijahKaj/qwen3-4b-skidl", dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("AbijahKaj/qwen3-4b-skidl")

messages = [
    {"role": "system", "content": "You are an expert electronics engineer and KiCad schematic designer. When given a description of an electronic circuit, generate executable SKiDL Python code that defines the circuit using the SKiDL library."},
    {"role": "user", "content": "Design a CAN bus interface using MCP2515 controller and TJA1050 transceiver on SPI, with 120 ohm termination resistor."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
The output is runnable: `python my_circuit.py` produces a `.net` file you can import into KiCad.
What I Learned
1. Format matters more than data volume. The pivot from s-expressions to SKiDL Python was the single biggest decision. Research (PCBSchemaGen, CircuitLM) guided this. Always check the literature before committing to a data format.
2. Real schematics > synthetic circuits. The 45K examples from actual GitHub repos carry design patterns (decoupling cap placement, pull-up resistor conventions, power supply topologies) that synthetic generation struggles to replicate.
3. Parallel representations are cheap insurance. Every example has both SKiDL Python and structured JSON. It cost almost nothing to generate both during conversion, and it means the dataset supports multiple training strategies without re-processing.
4. Small models can punch above their weight. 4B parameters + 100K focused examples + LoRA = competitive with frontier models on a narrow domain task. The model doesn't know what a haiku is anymore, but it can wire up an STM32.
What's Next
- Validation harness: actually running the generated SKiDL code and checking it produces valid netlists (beyond the 5-circuit spot check)
- ERC pass rate: running KiCad's Electrical Rules Check on generated circuits
- More tool-augmented data: the 285 tool-use examples are promising but tiny. Scaling this could teach the model to verify its own pin assignments
- Bigger models: the same dataset on Qwen3-8B or 14B could push scores higher
Links
🗂️ Dataset: AbijahKaj/kicad-netlist-sft-dataset (100,179 examples, Apache 2.0)
🤖 Model: AbijahKaj/qwen3-4b-skidl (merged weights + LoRA adapter)
🚫 Failed s-expression models: AbijahKaj/qwen3.5-4b-kicad-netlist, AbijahKaj/qwen3-4b-kicad-netlist (the "before" attempts)
📄 PCBSchemaGen: arXiv:2602.00510
📄 CircuitLM: arXiv:2601.04505
🔧 SKiDL: github.com/devbisme/skidl
All open-source, all Apache 2.0.