
Teaching a Small LLM to Design Electronic Circuits: Fine-Tuning Qwen3-4B on 100K KiCad Netlists

How I built a 100K-example dataset of executable circuit netlists and fine-tuned a 4B parameter model that scores 88% on functional circuit generation, rivaling GPT-4o's published benchmarks.


The Problem: LLMs Can't Do PCB Design (Yet)

If you've ever asked ChatGPT to design you a circuit, you know the pain. It'll happily describe how an H-bridge works, but ask it to produce an actual netlist you can open in KiCad? You'll get hallucinated pin numbers, impossible connections, and formats that parse to nothing.

The core issue is KiCad's native s-expression netlist format. It's deeply nested parentheses with domain-specific tokens, completely alien to language models trained on code and prose. Recent research confirms this: PCBSchemaGen (arXiv:2602.00510) found that even GPT-4o struggles with raw s-expressions but achieves 87% Pass@1 when you reframe the problem as Python code generation using SKiDL.

I can confirm this firsthand. I tried the s-expression approach before pivoting to SKiDL. My earlier models (AbijahKaj/qwen3.5-4b-kicad-netlist and AbijahKaj/qwen3-4b-kicad-netlist) were trained on raw KiCad s-expression netlists, and they failed badly. The models simply couldn't keep track of which nested block they were generating: they'd lose count of opening and closing parentheses, mix up which (property ...) belonged to which (symbol ...), and degenerate into garbage midway through a component definition, producing structurally invalid netlists that KiCad couldn't parse at all.

That hard-won insight, treating circuits as Python rather than s-expressions, is what this entire project is built on.

The Idea: SKiDL as the Bridge

SKiDL is a Python library that lets you define electronic circuits programmatically:

```python
from skidl import *

# Parts pull symbols and footprints from the standard KiCad libraries.
mcu = Part("MCU_Microchip", "ATmega328P-AU", footprint="Package_QFP:TQFP-32_7x7mm_P0.8mm")
cap = Part("Device", "C", value="100nF", footprint="Capacitor_SMD:C_0402_1005Metric")

# Nets are created by name; pins attach with +=.
vcc = Net("VCC")
gnd = Net("GND")

vcc += mcu["VCC"], cap["1"]
gnd += mcu["GND"], cap["2"]

# Writes a KiCad-importable .net file next to the script.
generate_netlist()
```

Run that Python script and you get a valid .net file KiCad can import directly. The format is:

  • 3× more compact than s-expression netlists

  • Executable, so syntax errors are immediately caught

  • Python, exactly what LLMs are best at generating

If I could build a large enough dataset of (natural language → SKiDL Python) pairs, a small fine-tuned model might actually learn to design circuits.

Building the Dataset: From 0 to 100K Examples

Dataset: AbijahKaj/kicad-netlist-sft-dataset

The dataset was built over about a week (April 28 to May 2, 2026) through multiple conversion pipelines, each contributing a different source of circuit knowledge:

Source 1: Real KiCad Schematics from GitHub (~45,000 examples)

Starting from bshada/open-schematics, a collection of .kicad_sch files scraped from 6,000+ GitHub repositories, I wrote a pure-Python Union-Find converter that:

  1. Parses KiCad schematic files (both modern v6+ and legacy EESchema formats)

  2. Extracts components, pin assignments, and wire connectivity

  3. Generates equivalent SKiDL Python code

  4. Produces a parallel structured JSON netlist

These are real circuits designed by real engineers: ESP32 IoT boards, STM32 flight controllers, Raspberry Pi CM4 carriers, mechanical keyboard PCBs, FPGA eval boards, VESC motor controllers. The full breadth of open-source hardware.
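The connectivity extraction (step 2 above) is where Union-Find earns its keep: wire endpoints that touch get merged into a single net. Here is a minimal, hypothetical sketch of that merge logic; the actual converter's geometry parsing for `.kicad_sch` files is omitted.

```python
class UnionFind:
    """Minimal union-find for merging wire endpoints into nets.
    Illustrative sketch only; the real converter also has to parse
    schematic geometry before it gets to this step."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Wires are segments between (x, y) endpoints; touching endpoints share a net.
uf = UnionFind()
wires = [((0, 0), (10, 0)), ((10, 0), (10, 5)), ((20, 20), (30, 20))]
for a, b in wires:
    uf.union(a, b)

# Endpoints collapse into connected groups, one per net.
groups = {uf.find(p) for w in wires for p in w}
print(len(groups))  # 2
```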

Source 2: LTspice Circuits → SKiDL (~54,000 examples)

Si7li/ltspice-spice-circuits provided a large corpus of simulation circuits. These needed a SPICE-to-SKiDL translation layer since the component naming conventions, net syntax, and pin numbering are all different. But the underlying circuit knowledge (op-amp configurations, filter topologies, power supply designs) is invaluable training signal.
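The core of such a translation layer is a mapping from SPICE element prefixes to KiCad library parts. A toy sketch, assuming SKiDL's Net.fetch get-or-create helper on the output side; the real converter covers many more element types and pin conventions.

```python
# Hypothetical SPICE-to-SKiDL translation for simple two-terminal elements
# like "R1 VCC OUT 10k". Prefix letter -> (KiCad library, part name).
SPICE_TO_KICAD = {"R": ("Device", "R"), "C": ("Device", "C"), "L": ("Device", "L")}

def translate_line(line):
    ref, n1, n2, value = line.split()
    lib, part = SPICE_TO_KICAD[ref[0].upper()]
    var = ref.lower()
    return (
        f'{var} = Part("{lib}", "{part}", value="{value}", ref="{ref}")\n'
        f'Net.fetch("{n1}") += {var}[1]\n'   # get-or-create the net, attach pin 1
        f'Net.fetch("{n2}") += {var}[2]'
    )

print(translate_line("R1 VCC OUT 10k"))
```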

Source 3: Quality Filtering & Cleanup

Not every conversion produces a useful training example. I scored each example on:

  • Component and net presence

  • GND/power net inclusion

  • Multi-node connectivity (circuits where nets actually connect multiple things)

  • Net-to-component ratio

I filtered at score ≥ 0.6 with at least 2 multi-node nets, removed 595 examples with empty netlist_json, then relaxed the filters slightly to recover 528 borderline schematics that were still useful.
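A hypothetical sketch of such a scoring function, mirroring the criteria above; the actual weights and checks in the pipeline may differ.

```python
# Illustrative quality scorer: each criterion from the list above
# contributes equally. Weights are assumptions, not the pipeline's exact values.
def multi_node_nets(netlist):
    return [n for n, pins in netlist["nets"].items() if len(pins) >= 2]

def quality_score(netlist):
    score = 0.0
    if netlist["components"]:
        score += 0.25                       # component presence
    if netlist["nets"]:
        score += 0.25                       # net presence
    if any(n.upper() in ("GND", "VCC", "VDD") for n in netlist["nets"]):
        score += 0.25                       # power/ground net inclusion
    if len(multi_node_nets(netlist)) >= 2:
        score += 0.25                       # real multi-node connectivity
    return score

example = {
    "components": ["U1", "C1"],
    "nets": {"VCC": ["U1.VCC", "C1.1"], "GND": ["U1.GND", "C1.2"]},
}
keep = quality_score(example) >= 0.6 and len(multi_node_nets(example)) >= 2
print(keep)  # True
```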

Source 4: Tool-Augmented Examples (285)

A small but important subset: multi-turn conversations where the model first calls search_component and get_datasheet_info tools to look up IC pinouts before generating the netlist. This teaches the model to reason about component selection rather than just pattern-match.
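A tool-augmented conversation might look like the following sketch; the field names and tool-response schema here are illustrative, not the dataset's exact format.

```python
# Hypothetical multi-turn example: the model calls a lookup tool, receives
# verified pinout data, then emits SKiDL code. Schema is an assumption.
example = [
    {"role": "user", "content": "Design a 3.3V LDO supply using an AMS1117."},
    {"role": "assistant", "content": None, "tool_calls": [
        {"name": "search_component", "arguments": {"query": "AMS1117-3.3"}}
    ]},
    {"role": "tool", "name": "search_component",
     "content": '{"lib": "Regulator_Linear", "part": "AMS1117-3.3", '
                '"pins": {"1": "GND", "2": "VO", "3": "VI"}}'},
    {"role": "assistant",
     "content": "reg = Part('Regulator_Linear', 'AMS1117-3.3', ...)"},
]

# The final SKiDL turn is conditioned on real pinout data, not guesses.
roles = [m["role"] for m in example]
print(roles)
```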

Final Dataset Stats

| Metric | Value |
|---|---|
| Total examples | 100,179 |
| Format | ChatML (system + user + assistant) |
| Output format | SKiDL Python (in messages) + JSON netlist (parallel column) |
| Unique source repos | ~6,000+ |
| Tool-augmented | 285 |

Every single example includes both representations, so you can train on SKiDL Python, structured JSON, or both.
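Concretely, a row carries both representations side by side. The column names below follow the description above ("messages" ChatML plus a parallel "netlist_json" column), but treat them as assumptions and check the dataset card for the exact schema.

```python
import json

# Mock of one dataset row; for the real thing:
# load_dataset("AbijahKaj/kicad-netlist-sft-dataset", split="train")
row = {
    "messages": [
        {"role": "system", "content": "You are an expert electronics engineer..."},
        {"role": "user", "content": "Connect a 100nF decoupling cap to an MCU."},
        {"role": "assistant", "content": 'cap = Part("Device", "C", value="100nF")'},
    ],
    "netlist_json": '{"components": [{"ref": "C1", "value": "100nF"}], "nets": []}',
}

skidl_code = row["messages"][-1]["content"]   # train on SKiDL Python...
structured = json.loads(row["netlist_json"])  # ...or on the structured netlist
print(structured["components"][0]["ref"])  # C1
```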

Training: LoRA on Qwen3-4B

Model: AbijahKaj/qwen3-4b-skidl

I chose Qwen3-4B as the base. It's a strong code-capable model at a size that's actually deployable on consumer hardware. The full training run took about 17 hours on my RTX 5090 on May 4–5, checkpointing every 500 steps. The 32GB of VRAM on the 5090 made it comfortable to run LoRA at r=64 with 8192 context length without any gradient checkpointing tricks.

For the dataset curation and training script iteration, I leaned heavily on ML-Intern, Hugging Face's AI coding assistant. It handled a lot of the tedious parts: researching current TRL/PEFT APIs, writing the conversion pipelines, debugging dataset formatting issues, and setting up the training configs. Having it look up documentation and find working examples saved me from the usual cycle of "try outdated API, get cryptic error, google for 20 minutes, repeat." I could focus on the actual design decisions (which format to use, how to score quality, what LoRA rank makes sense) while it handled the boilerplate and plumbing.

Training Configuration

| Parameter | Value |
|---|---|
| Method | SFT + LoRA (PEFT) |
| LoRA rank / alpha | r=64, α=32, dropout=0.05 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Trainable params | ~132M (5.65% of total) |
| Epochs | 2 |
| Peak learning rate | 2e-4 (cosine decay) |
| Effective batch size | 8 |
| Max sequence length | 8192 tokens |
| Total steps | ~23,618 |

Why LoRA at r=64?

Circuit netlists are structurally repetitive (lots of Part(...), Net(...), += patterns) but semantically diverse (thousands of different component libraries, pin assignments, connection topologies). A relatively high rank (64) gives the adapter enough capacity to learn the format without losing the nuance of which pins connect where.
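As a sanity check on the trainable-parameter count: LoRA adds r·(d_in + d_out) parameters per targeted matrix. Plugging in Qwen3-4B's published dimensions (treat the exact numbers below as assumptions) lands right around 132M.

```python
# Back-of-envelope LoRA parameter count for r=64 on all seven projections.
# Dimensions assumed from Qwen3-4B's config: hidden 2560, 36 layers,
# GQA with 32 query / 8 KV heads of dim 128, MLP intermediate 9728.
r = 64
hidden, layers = 2560, 36
q_out, kv_out, mlp = 32 * 128, 8 * 128, 9728

def lora_params(d_in, d_out):
    return r * (d_in + d_out)

per_layer = (
    lora_params(hidden, q_out)          # q_proj
    + 2 * lora_params(hidden, kv_out)   # k_proj, v_proj
    + lora_params(q_out, hidden)        # o_proj
    + 2 * lora_params(hidden, mlp)      # gate_proj, up_proj
    + lora_params(mlp, hidden)          # down_proj
)
total = per_layer * layers
print(f"{total / 1e6:.0f}M")  # 132M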

Loss Curve

The model converged quickly since circuits have strong structural regularity:

| Phase | Train Loss | Token Accuracy |
|---|---|---|
| Start | 0.86 | 75% |
| End of epoch 1 | 0.17 | 95% |
| End of epoch 2 | 0.15 | 95–96% |
| Final eval loss | 0.158 | 95.3% |

Evaluation: Does It Actually Work?

I ran functional validation on 5 held-out circuits, scoring each on:

  • Python syntax correctness

  • SKiDL import/structure validity

  • Correct number of nets and components

  • Proper connectivity (are pins actually wired together?)

  • GND net presence and correctness

| Circuit | Score |
|---|---|
| LED blink (ATtiny85) | 0.85 |
| USB power meter (ATmega328P + INA219) | 0.85 |
| CAN bus (MCP2515 + TJA1050) | 0.95 |
| Average | 0.883 |
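Each criterion reduces to a small programmatic check over the structured netlist. Here is a hypothetical sketch of just the GND criterion; the real rubric and its JSON shape may differ.

```python
import json

# Illustrative check: GND must exist AND actually connect multiple pins.
# The JSON shape here is an assumption, not the dataset's exact schema.
def check_gnd(netlist_json):
    nets = json.loads(netlist_json)["nets"]
    gnd = next((n for n in nets if n["name"].upper() == "GND"), None)
    return gnd is not None and len(gnd["pins"]) >= 2

good = '{"nets": [{"name": "GND", "pins": ["U1.GND", "C1.2"]}]}'
bad = '{"nets": [{"name": "GND", "pins": []}]}'
print(check_gnd(good), check_gnd(bad))  # True False
```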

An 88.3% functional score from a 4B model is noteworthy. For context, PCBSchemaGen reports 87% Pass@1 with GPT-4o (a model orders of magnitude larger) on a similar SKiDL generation task. The benchmarks aren't directly comparable since they use different circuits and different scoring rubrics, but it suggests that focused fine-tuning on a high-quality domain dataset can close the gap with general-purpose frontier models.

Try It

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "AbijahKaj/qwen3-4b-skidl", dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("AbijahKaj/qwen3-4b-skidl")

messages = [
    {"role": "system", "content": "You are an expert electronics engineer and KiCad schematic designer. When given a description of an electronic circuit, generate executable SKiDL Python code that defines the circuit using the SKiDL library."},
    {"role": "user", "content": "Design a CAN bus interface using MCP2515 controller and TJA1050 transceiver on SPI, with 120 ohm termination resistor."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

The output is runnable. python my_circuit.py produces a .net file you can import into KiCad.
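That also makes validation scriptable: execute the generated code in a scratch directory and check that a netlist file appears. A sketch of that loop; the checker itself is plain Python, and only real model output would need skidl installed.

```python
import pathlib
import subprocess
import sys
import tempfile

# Hypothetical end-to-end check: write generated SKiDL code to a temp
# script, run it, and confirm exactly one .net file was produced.
def produces_netlist(skidl_code: str) -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        script = pathlib.Path(tmp) / "my_circuit.py"
        script.write_text(skidl_code)
        result = subprocess.run(
            [sys.executable, script.name], cwd=tmp,
            capture_output=True, timeout=60,
        )
        nets = list(pathlib.Path(tmp).glob("*.net"))
        return result.returncode == 0 and len(nets) == 1
```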

What I Learned

1. Format matters more than data volume. The pivot from s-expressions to SKiDL Python was the single biggest decision. Research (PCBSchemaGen, CircuitLM) guided this. Always check the literature before committing to a data format.

2. Real schematics > synthetic circuits. The 45K examples from actual GitHub repos carry design patterns (decoupling cap placement, pull-up resistor conventions, power supply topologies) that synthetic generation struggles to replicate.

3. Parallel representations are cheap insurance. Every example has both SKiDL Python and structured JSON. It cost almost nothing to generate both during conversion, and it means the dataset supports multiple training strategies without re-processing.

4. Small models can punch above their weight. 4B parameters + 100K focused examples + LoRA = competitive with frontier models on a narrow domain task. The model doesn't know what a haiku is anymore, but it can wire up an STM32.

What's Next

  • Validation harness: Actually running the generated SKiDL code and checking it produces valid netlists (beyond the 5-circuit spot check)

  • ERC pass rate: Running KiCad's Electrical Rules Check on generated circuits

  • More tool-augmented data: The 285 tool-use examples are promising but tiny. Scaling this could teach the model to verify its own pin assignments

  • Bigger models: The same dataset on Qwen3-8B or 14B could push scores higher


All open-source, all Apache 2.0.