The cuckoo's nest

STM8 microcontrollers use a proprietary (but well documented) programming protocol called "SWIM". The usual way to program them is with an ST-Link programmer. Without such luxury at hand, I managed to talk to the SWIM interface using a Raspberry Pi's SPI interface, one transistor and one resistor. Of course, it would be simpler to just buy an ST-Link, but that's not the point :)

Here's the source code if you're interested.

TL;DR: connect a pull-up resistor, MISO and MOSI to SWIM (MOSI through an open-collector transistor), ignore SCLK. Drive the SPI hardware at 8 MHz, generate 8 MHz-resolution output bit sequences in software, send these buffers out through SPI, and parse the returned buffers as bit sequences in software. Implement SWIM on top of that.

STM8 SWIM protocol

The programming (and debugging) protocol for STM8 is documented on the manufacturer's website. The flash programming manual is also relevant. If you are interested in the details, go ahead and read these. I will try to describe the essentials here.

To quote from the document, "The SWIM is a single wire interface based on asynchronous, high sink (8mA), open-drain, bidirectional communication". In other words it uses one bidirectional pin with a pull-up resistor which is driven low by either the STM8 or the debugger. SWIM communications are synchronized to the STM8's SWIM clock, which defaults to 8MHz (regardless of the clock source of the microcontroller). Data bits are transmitted with a sequence of low/high pulses on the SWIM line. For example, in the "low speed bit format":

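data bit     idle   0      1      idle
time [us]    0 1 2 3 4 5 6
SWIM line    1      0 1    0 1    1

Each bit starts with a low pulse and ends with the line high; the length of the low pulse carries the value. On the receiving side, the signal is decoded as a 0 bit if the low signal exceeds 9 SWIM clocks, or 1 otherwise.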

Most of the high-level operations are defined as sequences of these bits. There is also the "entry sequence", which uses slower signalling.
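
As a rough sketch (not taken from the actual source code), the low-speed bit format can be modelled in C as an array of line samples, one per SWIM clock. The pulse lengths below (a 22-clock bit period, 20 low clocks for a 0 and 2 low clocks for a 1) are my reading of ST's SWIM manual and should be checked against it; the threshold of 9 low clocks is the one quoted above. The function and constant names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed low-speed timing, in SWIM clocks (verify against ST's SWIM manual). */
enum {
    SWIM_BIT_CLOCKS = 22, /* total length of one low-speed bit */
    SWIM_LOW_FOR_0  = 20, /* a 0 bit holds the line low for most of the period */
    SWIM_LOW_FOR_1  = 2,  /* a 1 bit is only a short low pulse */
    SWIM_LOW_LIMIT  = 9   /* more low samples than this decodes as 0 */
};

/* Write one bit as SWIM_BIT_CLOCKS line samples (0 = low, 1 = high).
 * Returns the number of samples written. */
size_t swim_encode_bit(int bit, uint8_t *samples)
{
    size_t low = bit ? SWIM_LOW_FOR_1 : SWIM_LOW_FOR_0;
    for (size_t i = 0; i < SWIM_BIT_CLOCKS; i++)
        samples[i] = (i < low) ? 0 : 1;
    return SWIM_BIT_CLOCKS;
}

/* Decode one bit from samples that start at its falling edge:
 * count the leading low samples and compare against the limit. */
int swim_decode_bit(const uint8_t *samples, size_t n)
{
    size_t low = 0;
    while (low < n && samples[low] == 0)
        low++;
    return (low > SWIM_LOW_LIMIT) ? 0 : 1;
}
```

With one sample per SWIM clock, a whole SWIM command becomes a plain array of 0/1 levels, which is the representation the rest of this page builds on.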

Any device implementing this protocol needs to read/write the SWIM line quickly (8MHz resolution) and with predictable timing. A bit-banging implementation would not be practical on a Linux-based board (like the Raspberry Pi), because meeting the timing requirements could be difficult (feel free to prove me wrong :).

(ab)using SPI

SPI, or Serial Peripheral Interface, is a simple serial protocol for inter-IC communications. In its most basic form, two devices communicate using three serial lines: SCLK, MISO and MOSI (CS, chip select, is not very interesting for our purposes). One of the devices, designated as the "master", generates the clock signal (SCLK). During each cycle of the clock signal, both devices send and receive one bit of data: the master transmits on the MOSI (master out, slave in) line and the slave transmits on the MISO (master in, slave out) line. Usually, the sequence of bits is interpreted in groups of 8 (bytes), with the most significant bit transmitted first.

bytes (master->slave)   idle   0x12              0x34              idle
bytes (slave->master)   idle   0xfe              0xdc              idle
SCLK                    0      8 clock pulses    8 clock pulses    0
MOSI                    X      0 0 0 1 0 0 1 0   0 0 1 1 0 1 0 0   0
MISO                    X      1 1 1 1 1 1 1 0   1 1 0 1 1 1 0 0   0

The master device is responsible for driving the clock signal whenever it wants to communicate with the slave. Obviously, it also determines the transmission speed by controlling the clock frequency. Most microcontrollers have dedicated SPI hardware which allows for precise clock generation (on the master side) or synchronization (on the slave side) and timing of the MISO/MOSI signals. More advanced devices support DMA, meaning that whole sequences of bytes can be streamed from the device's RAM to SPI (and back) while the CPU executes other code.

Note that the SPI master hardware can be thought of as a digital signal sampler/generator: whenever data is to be transmitted/received, the SPI controller generates the clock signal and outputs some bits in sequence on MOSI while sampling another sequence of bits on MISO. We can ignore SCLK, and use MISO/MOSI as a high-speed signal generator/analyzer. This allows us to implement a high-speed "bit-banging" protocol in software, with some limitations.
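
To make the "signal generator/analyzer" idea concrete, here is a minimal sketch of converting between a sequence of signal levels (one array element per SPI clock) and the bytes handed to the SPI controller. It assumes the usual MSB-first bit order described above; the function names are hypothetical and not taken from the actual source code.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack one-sample-per-element levels into SPI bytes, MSB first.
 * Unused bits in the last byte are filled with `idle`.
 * Returns the number of bytes written. */
size_t samples_to_spi(const uint8_t *samples, size_t n, int idle, uint8_t *bytes)
{
    size_t nbytes = (n + 7) / 8;
    for (size_t i = 0; i < nbytes * 8; i++) {
        int level = (i < n) ? samples[i] : idle;
        if (i % 8 == 0)
            bytes[i / 8] = 0;
        if (level)
            bytes[i / 8] |= (uint8_t)(0x80u >> (i % 8));
    }
    return nbytes;
}

/* Unpack received SPI bytes back into one sample per element. */
void spi_to_samples(const uint8_t *bytes, size_t nbytes, uint8_t *samples)
{
    for (size_t i = 0; i < nbytes * 8; i++)
        samples[i] = (bytes[i / 8] >> (7 - i % 8)) & 1u;
}
```

For instance, the MOSI levels 0 0 0 1 0 0 1 0 from the diagram above pack into the byte 0x12. With the SPI clock set to 8 MHz, each element corresponds to one SWIM clock at the default SWIM speed.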

The implementation


I decided to use a Raspberry Pi (which I had lying around) as the SWIM programmer. First, I needed to combine the MISO and MOSI lines to operate on one wire. The schematic below shows a simple open-collector (or rather open-drain) configuration. Apart from that, the Raspberry Pi needs to drive the STM8's reset line with one of its GPIOs, but that's not very interesting.

The resistor was chosen to provide minimum recovery time while staying within the 8mA current limit at 3.3V Vcc. The (field-effect) transistor was chosen from the ones I had on hand. It is important to pick a transistor which can turn off quickly, otherwise any low pulse will be prolonged. I first tried a BC547, but it stayed turned on too long when saturated (this could be improved with some bias voltages, but I was too lazy to experiment with that).

The transistor configuration works as an inverter, so that driving MOSI high pulls SWIM low. The MISO line monitors SWIM directly, so that it will receive any bits transmitted by the Pi or the STM8. Transmitting a sequence of bits on MOSI without the STM8 attached should in theory return the same sequence, inverted, on MISO. In reality, the transistor's turn-off time turned out to be skewing the results, extending each low-level period considerably. My software works with a simple work-around - the low periods are simply transmitted shorter than required. It sucks, but it works.
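
The inversion and the "shorter low periods" work-around could look roughly like the sketch below. It is only an illustration of the idea, not the code from the repository: the function name is made up, and `trim` (how many samples to cut from the end of each low pulse) is a hypothetical tuning parameter that would have to be found by experiment for a given transistor and resistor.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert desired SWIM line levels (0 = low, 1 = high) into MOSI levels.
 * MOSI is the inverse of the line because the transistor inverts.
 * The last `trim` samples of every low pulse are released early to
 * compensate for slow turn-off; a pulse is never cut below one sample. */
void swim_line_to_mosi(const uint8_t *line, size_t n, size_t trim, uint8_t *mosi)
{
    size_t i = 0;
    while (i < n) {
        if (line[i]) {              /* line high: transistor off, MOSI low */
            mosi[i++] = 0;
            continue;
        }
        size_t start = i;
        while (i < n && line[i] == 0)
            i++;                    /* i now points just past the low pulse */
        size_t len = i - start;
        size_t cut = (trim < len) ? trim : len - 1;
        for (size_t k = start; k < i; k++)
            mosi[k] = (k < i - cut) ? 1 : 0;
    }
}
```

With `trim` set to 0 this is a plain inversion; larger values release the line earlier and leave the rest of the low period to the transistor's turn-off delay.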


The programming software accesses SPI using the Linux spidev driver (/dev/spidev#.#). The driver allows programs to send out a reasonably long buffer of bytes and simultaneously receive the response into another buffer. The software fills the SPI buffer using three primitives: a low-speed 0 bit, a low-speed 1 bit, and a "space" during which the line is left idle so that the STM8 can transmit.

Analogously, the received buffers are interpreted as low-speed bits 0/1, skipping the parts transmitted by us to recover data transmitted by the STM8.
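
For reference, a full-duplex spidev transfer of this kind can be sketched with the standard `SPI_IOC_MESSAGE` ioctl from `<linux/spi/spidev.h>`. This is a minimal example, not the code from the repository; the wrapper name `spi_exchange` is made up, and error handling is reduced to a single return value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

/* Clock `len` bytes out of `tx` while capturing `len` bytes into `rx`.
 * `fd` is an open /dev/spidev#.# descriptor.
 * Returns 0 on success, -1 on error. */
int spi_exchange(int fd, const uint8_t *tx, uint8_t *rx, size_t len, uint32_t speed_hz)
{
    struct spi_ioc_transfer tr;

    memset(&tr, 0, sizeof tr);
    tr.tx_buf = (uintptr_t)tx;      /* the kernel expects pointers as 64-bit integers */
    tr.rx_buf = (uintptr_t)rx;
    tr.len = (uint32_t)len;
    tr.speed_hz = speed_hz;         /* e.g. 8000000 for one bit per SWIM clock */
    tr.bits_per_word = 8;

    return ioctl(fd, SPI_IOC_MESSAGE(1), &tr) < 0 ? -1 : 0;
}
```

The descriptor would come from something like `open("/dev/spidev0.0", O_RDWR)`, where the device numbers depend on the board and are only an example here. Because the whole buffer is clocked out in one transfer, the timing inside the buffer is determined by the SPI hardware rather than by the Linux scheduler.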

Based on these primitives, SWIM operations such as ROTF (read-on-the-fly) and WOTF (write-on-the-fly) can be implemented. An example transmission is shown below. Abbreviations: a - ack, X - the host sends a "space" instead of 0/1 bit

operation                       SWIM (Pi->STM)   ls bits (output)      SWIM (STM->Pi)   ls bits (input)
Pi: Read 1 byte from 0x123456   ROTF             0 0 0 1 1                              0 0 0 1 1
                                                 X                     a                1
                                1                0 0 0 0 0 0 0 0 1 1                    0 0 0 0 0 0 0 0 1 1
                                                 X                     a                1
                                0x12             0 0 0 0 1 0 0 1 0 0                    0 0 0 0 1 0 0 1 0 0
                                                 X                     a                1
                                0x34             0 0 0 1 1 0 1 0 0 1                    0 0 0 1 1 0 1 0 0 1
                                                 X                     a                1
                                0x56             0 0 1 0 1 0 1 1 0 0                    0 0 1 0 1 0 1 1 0 0
                                                 X                     a                1
STM: 0x78                                        X                     0x78             1 0 1 1 1 1 0 0 0 0
                                a                1                                      1

The host prepares an SPI buffer with the following operations:

  1. Send out a low-speed-bit sequence representing the "ROTF" (read-on-the-fly) operation.
  2. Make some space to receive an acknowledgement from the STM (X)
  3. Assume that the ack ('1' low-speed-bit from the STM) was received correctly. This will be verified only after the whole buffer is transferred and response is received.
  4. Send subsequent data bytes (1, 0x12, 0x34, 0x56) making space for acks.
  5. Make some more space to receive the response byte from the STM.
  6. Preemptively send an acknowledgement confirming that the data was received correctly.
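
The steps above can be sketched as a function that produces the host's symbol sequence in the notation of the example table ('0'/'1' for low-speed bits, 'X' for a space). The frame layout (a leading 0 bit, data bits MSB first, then a parity bit equal to the XOR of the data bits) is inferred from the example transmission and should be checked against ST's manual. The names are hypothetical, and repeating the space/ack pair for reads longer than one byte is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Append one host frame: header bit 0, `nbits` data bits (MSB first),
 * a parity bit (XOR of the data bits), then 'X' as room for the ack. */
static char *put_frame(char *p, unsigned value, int nbits)
{
    int parity = 0;
    *p++ = '0';
    for (int i = nbits - 1; i >= 0; i--) {
        int bit = (int)((value >> i) & 1u);
        parity ^= bit;
        *p++ = (char)('0' + bit);
    }
    *p++ = (char)('0' + parity);
    *p++ = 'X';
    return p;
}

#define SWIM_CMD_ROTF 1u /* 3-bit command code, as seen in the example table */

/* Build the symbol string for "read `count` bytes from `addr`".
 * `out` needs room for at least 51 + 2 * count characters. */
void rotf_symbols(uint32_t addr, uint8_t count, char *out)
{
    char *p = out;
    p = put_frame(p, SWIM_CMD_ROTF, 3);
    p = put_frame(p, count, 8);
    p = put_frame(p, (addr >> 16) & 0xff, 8);
    p = put_frame(p, (addr >> 8) & 0xff, 8);
    p = put_frame(p, addr & 0xff, 8);
    for (unsigned i = 0; i < count; i++) {
        *p++ = 'X';  /* room for one response byte from the STM8 */
        *p++ = '1';  /* ack sent in advance, before the reply is checked */
    }
    *p = '\0';
}
```

Calling `rotf_symbols(0x123456, 1, buf)` reproduces the "ls bits (output)" row of the example. Each symbol would then be expanded into line samples and packed into the SPI buffer.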

After the I/O is performed, the response buffer is scanned for acks and responses. Occasionally, the STM8 doesn't acknowledge these operations properly; I assume this is because of the timing disruptions mentioned above. In such a case, the operation is simply retried.
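
A sketch of that scan for the one-byte read from the example: it takes the input already decoded into low-speed bits (one '0' or '1' character per bit, as in the "ls bits (input)" row), checks the five ack positions, then checks the header and parity of the reply frame. The bit positions are derived from the example transmission; the function name and the string representation are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Validate the decoded input bits of a one-byte ROTF exchange and extract
 * the byte. Returns 0 on success, -1 if the operation should be retried. */
int rotf_parse_reply(const char *in, size_t n, uint8_t *byte)
{
    /* Ack positions: after the 5-bit command, then after each 10-bit frame. */
    static const size_t ack_pos[] = { 5, 16, 27, 38, 49 };
    const size_t reply = 50;            /* first bit of the STM8's frame */
    unsigned value = 0, parity = 0;

    if (n < reply + 10)
        return -1;
    for (size_t i = 0; i < sizeof ack_pos / sizeof ack_pos[0]; i++)
        if (in[ack_pos[i]] != '1')      /* the STM8 acks with a 1 bit */
            return -1;
    if (in[reply] != '1')               /* frames from the target start with 1 */
        return -1;
    for (size_t i = 1; i <= 8; i++) {
        unsigned bit = (unsigned)(in[reply + i] - '0');
        value = (value << 1) | bit;
        parity ^= bit;
    }
    if ((unsigned)(in[reply + 9] - '0') != parity)
        return -1;
    *byte = (uint8_t)value;
    return 0;
}
```

Fed with the input row of the example, this yields 0x78. A caller would wrap the whole build/transfer/parse cycle in a retry loop.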

ROTF and WOTF operations, in turn, allow the whole flash programming procedure to be performed. See the manufacturer's documentation or source code for the boring details.

Source code

I've used stm8flash as the base for my programming software. The original code supports SWIM, although only using the ST-LINK hardware. I've extended it to support SPI-based SWIM programming, and uploaded my version to GitHub.

2017-06-25 21:22:00
Excellent tool and a nice tutorial. In your experience, how hard would it be to also implement the debugging part of the SWIM protocol?
2017-06-25 21:22:00
Thank you :)

I don't have any experience with debugging over SWIM. A brief look at the datasheets suggests that the debugging protocol is based on simple register reads/writes, so theoretically this wouldn't be any more difficult than the flash programming part. I wouldn't attempt to "roll my own" debugger, but OpenOCD claims to support SWIM (using an st-link USB adapter). If you are interested in a working debugger, that's where you should look.
On the other hand, if you wish to use my SPI hack (why?:), you could just replace the USB-based transport in OpenOCD with my SPI code and it should "just work".
2017-06-25 21:22:00
Nice job!
Luke Kenneth Casson Leighton
2017-10-23 10:02:58
hiya andrzej, this is excellent stuff. i am creating an open hardware series of projects for 3D printing, where i am using this amazingly-low-cost processor to control A4982 steppers, and for a z-probe height sensor, and also a filament width sensor. normally people use ATTiny or in one case an Arduino Uno for those, but i want to keep cost and size down.

the only thing is: most people would be very reluctant to also have to buy an STLinkv2 in addition to the sensors, just to *maybe* reprogram it, so what you've done here is actually very very important. the possibility exists to port this version stm8flash onto an Arduino, or actually integrate it into Marlin or RepRapFirmware as a way to upload new firmware binaries to the various sensors.

however (ab)using SPI would be... quite awkward, as those particular pins would not necessarily be available. would you be interested to help make a GPIO bit-banging equivalent of the SPI functions? so that a user could simply... pick absolutely any spare GPIO pins, one for reset and one for SWIM?
Luke Kenneth Casson Leighton
2017-10-23 10:16:05
p.s. bit-banging using timers to do PWM (for write) and timer interrupt service routines (for read) would get you the speeds needed to use plain GPIO with accurate enough resolution
2017-10-28 22:13:48
This sounds like a good idea.

A bit-banging implementation of SWIM would probably be simpler than trying to use SPI. Also, you could try to skip the transistor and use the GPIO in a pseudo-open-collector mode, basically switching between output-low and input mode.

I'm not sure if my code will be very useful for you, maybe more as an inspiration. The SWIM protocol is quite simple and implementing it in bit-banging mode should not be very troublesome. The only issue would be the frequency. If you mean to run it on a 16MHz AVR, that would mean producing/consuming one bit in two clock cycles == two instructions. Forget about timers and interrupts - for such tight timing you would need hand-written assembly. It seems possible - the bit states don't change that often so you could squeeze in some logic between the GPIO state switching, but it's certainly not a trivial task.

I don't have much free time these days, so I can't offer any "hands-on" help, but if you have any further questions feel free to ask.
sohaib qamar
2018-12-19 12:44:58
Excellent work.
I need to program STM8 (STNRG388A) using mbed LPC1768, can i use your source code and related stuff?
Waiting for your positive response.
2018-12-20 22:40:00
Your case seems to be similar to the question posted above. When using a dedicated microcontroller, you may achieve the required timing with GPIO bit-banging, and you wouldn't be forced to use the SPI trick. I see that the LPC1768 runs at 96 MHz, which should give you plenty of time to manage the bitstream.

Anyway, feel free to use anything you like. The source code is available on github (see link at the top of the page) under GPLv2.
2020-06-29 20:38:30
Tried with AO3442, but shows error
"Determine FLASH area
Could not communicate with target
Error communicating with MCU. Please check your SWIM connection."

Can I use AO3442 or NX7002AK instead of FDV301N since I have these two in my hands?
2020-07-02 22:14:18
My best advice would be to check the signal using an oscilloscope. The transistor needs to turn on/off fast enough for the edges to be recognizable by the STM.

A quick check reveals that AO3442 is a high-current device and has *huge* capacitances compared to the FDV, and it will limit the switching speed. Its threshold voltage is also quite high, meaning that it takes more to turn on. The NX7002 should be a better choice, although still somewhat worse than the FDV301N.
2023-08-28 16:28:40
Just found this post. Very interesting.
I only have a IRLML2502 and a IRF7807ZTRPBF N Channel Mosfet available.
I tried the first one (IRLML2502) with a 470 ohm resistor on a rpi3B+
My logic analyzer can see RST going low and some traffic on MISO but it doesn't seem to work (I've debugged the sync value):

./stm8flash -c spi -p stm8s003?3 -w test.ihx
Determine FLASH area
sync = 0
sync = 1
sync = 1
sync = 1

Could not communicate with target

Do you think the mosfet is not fast enough ?