A statistical insight

I’ve been working during the weekends on an instrumentation frontend to precisely measure the resistance of an RTD sensor using a ratiometric approach.

After building it and waiting a prudent time to let it warm up, I saved an hour of samples (3600) and fired up Octave.

The mean and standard deviation looked ok and while a plot showed a bit of noise it was well within reasonable limits.

Just for the sake of it I did a histogram and, oh the horror:

This is clearly not OK. It should be more like a Gaussian (the real formula is quite daunting but still retains symmetry) and that looks a lot like a bimodal distribution. Changing the number of bins does not help.

The ADC I used does not have a reference input so I make two differential reads and then take the quotient (I know… but it was the only one available when I started).
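A minimal sketch of that quotient, assuming the RTD sits in series with a known reference resistor so both see the same excitation current. The name `R_REF`, its value and the sample readings are made up for illustration; they are not taken from the actual board.

```python
# Hypothetical reference resistor value in ohms (an assumption, not the real part).
R_REF = 1000.0

def rtd_resistance(v_rtd, v_ref, r_ref=R_REF):
    """Return the RTD resistance from two differential readings.

    Both resistors carry the same current I, so
    v_rtd / v_ref = (I * R_rtd) / (I * r_ref) and I cancels out.
    """
    if v_ref == 0:
        raise ValueError("reference reading is zero")
    return r_ref * float(v_rtd) / v_ref

if __name__ == "__main__":
    # Made-up readings in ADC counts; only their ratio matters.
    print(rtd_resistance(v_rtd=11000, v_ref=10000))
```

The catch is that the two reads are not simultaneous, so anything that changes between them (noise, pickup) does not cancel in the ratio.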

Perhaps the input multiplexer is at fault? (the unused channels are grounded, so I discarded that as a cause). I repeated the experiment but this time doing a full run on each channel instead of switching them and this is the result:

Well, both are skewed so there’s something else going on.

Scoping at the inputs shows what seems to be AM at around 70MHz even without power applied (that’s on the tv broadcast band here) and it kind of makes sense because I didn’t use a shield. Head bangs on the desk.

Anyways, using a quick digital filter makes everything look nicer but I’ll still have to shield this:

The transient at the beginning is not going to be an issue, as in real life I don’t expect such a step change (from 0 to ~3k) and in any case the antialias filter will get rid of it.
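The post does not say which filter was used, so this is only a sketch of one common choice: a single-pole IIR low-pass (exponential moving average). The `alpha` value is an arbitrary assumption. Starting the filter state at 0 and feeding it readings around 3000 reproduces the kind of start-up transient described above.

```python
def low_pass(samples, alpha=0.1, initial=0.0):
    """Single-pole IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out = []
    y = initial
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

if __name__ == "__main__":
    # A constant input of 3000 with the filter state starting at 0:
    # the output climbs from 0 towards 3000 instead of jumping there.
    filtered = low_pass([3000.0] * 100)
    print(filtered[0], filtered[-1])
```

Seeding `initial` with the first sample would remove the transient altogether, at the cost of trusting that first reading.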

On second thought, those skewed chunks are really interesting and I should have spotted them as a failure symptom earlier.

Learning again

I got into computers at the age of 8 (1994), when a very weird lady came to our school and offered to teach us. A week after that she was back with a bunch of friends and unloaded a stack of XTs and ATs in an empty room.

We spent two months doing exercises on paper and then she introduced us to QBasic (and DOS).

After that I continued with Visual Basic (and stopped at 6; I made a couple of programs for a very good amount of money) and assembler for PIC micros. I also used C with TurboC, DJGPP and even Visual C.

I stumbled upon Python and its community and I fell in love. It is still my go to language for almost everything, even if only to try out solutions and then port them to the target language.

The last major event was a couple (and a bit more…) of years ago when I started to use more and more JavaScript, from plain servers using Express to fully encompassing frameworks like LoopBack, and from big iron to smaller micros (besides, code isomorphism is really interesting).

After that I toyed with some languages but nothing really caught my attention. Among them I used Ruby, Haskell, Go and a bit of Java just for kicks or helping out long time friends. Also Tcl: I liked it as an extension language but I wouldn’t write an application from scratch using it. Go is really interesting (mainly because of Docker and all the tools revolving around it) but I really need some itch that calls for it.

The major breakthrough came when I rediscovered Forth and Erlang.

Forth is really interesting as an embedded language because it’s so easy to implement and there are so many things powered by it.

Erlang was on my radar for a while and has, at least in my circle of friends, this aura of mystery.

After a post on PyAr I decided to join a local group and give it a go. I started with the exercises of Learn You Some Erlang and I couldn’t be happier.

It makes my head twitch and think in new ways. I just lust for the time of the day devoted to it.

Breaking a simple captcha with Python and Pillow

A while ago one of our long time customers approached us to automate tasks on a government portal. At least here most of them are kind of ugly, work on a specific set of browser versions and are painfully slow. We already helped him with problems like this before, so instead of having someone enter manually all the data they just populate a database and then our robot does all the work, simulating the actions on the web portal.

This one is a bit different, because they introduced a captcha in order to infuriate users (seriously, it looks like they don’t want people logging in).

Most of the time they look like this:

The first thing I tried was to remove the lines and feed the result into an ocr engine. So I made a very simple filter using Pillow:


from PIL import Image
import sys, os

def filter_lines(src):
    w,h = src.size

    stripes = []
    ss = {}

    for x in range(w):
        count = 0
        for y in range(h):
            if src.getpixel( (x,y) ) != (248, 255, 255):
                count += 1
        if count == h:
            # Every pixel in this column is non-background: it is a vertical line.
            stripes.append(x)

    for x in stripes:
        for y in range(h):
            src.putpixel( (x,y),  (248, 255, 255) )
    return src

if __name__ == '__main__':
    src = Image.open(sys.argv[1])
    region = filter_lines(src)

Now it looks better but after trying gocr and tesseract it still needs more work:

Just for kicks I decided to filter 100 images and overlap them, this is what I got:

That is interesting… I used this script (not the most efficient approach, but still..)


from PIL import Image
import sys, os

dst = Image.new('RGB', (86, 21) )

w,h = 86, 21

for x in range(w):
    for y in range(h):
        dst.putpixel( (x,y),  (255, 255, 255) )

for idx in range(30):
    src = Image.open('filtradas/%i.bmp'%idx)

    for x in range(w):
        for y in range(h):
            if src.getpixel( (x,y) ) != (248, 255, 255):
                dst.putpixel( (x,y),  (255, 0, 0) )


With this piece of information I can focus my efforts on that area only.
That font, even distorted, looks quite familiar to me. And indeed it is, it’s Helvetica.
This makes the problem a lot easier.

I grabbed a bitmapped version of the same size and made a grid that shows where a number can land assuming 8×13 symbols:

This shows that there is a slight overlap between digits.
I went for a brute force approach, dividing the captcha in cells and comparing each one with every digit on the font with a small amount of overlap between them.
The symbols are smaller than the cell, so for every one of them I build regions on the cell and assign a score for the number of pixels that are equal on both.
The one that has the highest score is (likely) the correct number.

This is really simple, and even though we do a lot of comparisons it performs ok (the images are quite small). Without tuning we got about a 30% success rate (the server also adds noise and more aggressive distortions from time to time).

Have a difficult or non conventional problem? Give us a call, we are like the A-Team of technology.

This is the complete algorithm (it’s in Spanish but shouldn’t be hard to follow), can also be found here: https://gist.github.com/pardo-bsso/a6ab7aa41bad3ca32e30


from PIL import Image
import sys, os

imgpatrones = []
pixelpatrones = []

for idx in range(10):
    img = Image.open("patrones/%i.png" % idx).convert('RGB')
    pixelpatrones.append( list(img.getdata()) )

def compara(region, patron):
    pixels = list(region.getdata())
    size = min(len(pixels), len(patron))

    res = 0.0
    for idx in range(size):
        if pixels[idx] == patron[idx]:
            res = res + 1

    return res / size

def elimina_lineas(src):
    cropeada = src.crop( (4, 1, 49, 19) )
    w,h = cropeada.size
    stripes = []

    for x in range(w):
        count = 0
        for y in range(h):
            if cropeada.getpixel( (x,y) ) != (248, 255, 255):
                count += 1

        if count == h:
            # Every pixel in this column is non-background: it is a vertical line.
            stripes.append(x)

    for x in stripes:
        for y in range(h):
            cropeada.putpixel( (x,y),  (248, 255, 255) )
            cropeada.putpixel( (x,y),  (255, 0, 0) )

    return cropeada

def crear_crops(src, celda):
    limites = range(38)
    xceldas = [0, 8, 16, 24, 32, 40]
    xoffsets = range(-3,4)
    yceldas = range(6)
    boxes = []
    crops = []

    x = xceldas[celda]
    x = [ (x+off) for off in xoffsets if (x+off) in limites ]

    for left in x:
        for top in yceldas:
            boxes.append( (left, top, left+8, top+13) )

    for box in boxes:
        crops.append( src.crop(box) )

    return crops

def compara_crops_con_patron(crops, patron):
    scores = []
    for crop in crops:
        scores.append( compara(crop, pixelpatrones[patron] ))
    return max(scores)

def decodifica_celda(src, celda):
    pesos = []
    crops = crear_crops(src, celda)

    for patron in range(10):
        pesos.append( compara_crops_con_patron(crops, patron) )

    return pesos.index( max(pesos) )

def decodifica(filename):
    original = Image.open(filename)
    src = elimina_lineas(original)
    res = []

    for celda in range(6):
        res.append( decodifica_celda(src, celda) )

    return ''.join( str(x) for x in res )

if __name__ == '__main__':
    print(decodifica(sys.argv[1]))

How to make an awesome Android dashboard for your embedded widget

– Use a standard protocol. We chose Firmata.

– Play with nice things. We used Ionic, Apache Cordova and the wonderful BluetoothSerial plugin.

– Remember to either modify the Firmata firmware to use the default 9600 bps speed of our HC06 adapter or reconfigure the adapter to work at 57600.

– Browserify a Node implementation of the protocol and make a port-like object so it talks over Bluetooth.

– The Bluetooth plugin doesn’t like to work with binary data so we improved it.

After writing some glue code you end up with a nice and working control panel. I only got the working part but there are a ton of cool reusable widgets out there (like NexusUI or KievII)
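To see why binary-safe transport matters, here is a sketch of how a few Firmata messages are framed. It is written in Python rather than the JavaScript used in the app, and the helper names are made up for illustration; the byte layouts follow the published Firmata message format. Every command byte has its high bit set, which is presumably what a text-oriented serial plugin chokes on.

```python
# Firmata command bytes (all have the high bit set).
SET_PIN_MODE = 0xF4
DIGITAL_MESSAGE = 0x90
ANALOG_MESSAGE = 0xE0
PIN_MODE_OUTPUT = 0x01

def set_pin_mode(pin, mode):
    """Set pin mode: command byte, pin number, mode."""
    return bytearray([SET_PIN_MODE, pin & 0x7F, mode & 0x7F])

def analog_write(pin, value):
    """Analog message: command | pin, then the value as two 7-bit bytes."""
    return bytearray([ANALOG_MESSAGE | (pin & 0x0F),
                      value & 0x7F,
                      (value >> 7) & 0x7F])

def digital_port_write(port, mask):
    """Digital message: command | port, then the pin bitmask as two 7-bit bytes."""
    return bytearray([DIGITAL_MESSAGE | (port & 0x0F),
                      mask & 0x7F,
                      (mask >> 7) & 0x7F])

if __name__ == "__main__":
    for message in (set_pin_mode(13, PIN_MODE_OUTPUT),
                    analog_write(9, 255),
                    digital_port_write(1, 0x20)):
        print(" ".join("%02X" % b for b in message))
```

Data bytes are limited to 7 bits so they can never be mistaken for a command, which is why a value such as 255 is split in two.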

New life for an old tv tuner.

A long long time ago I bought a cheap pci tuner card to listen to the radio. I was able to watch tv with it out of the box but the radio didn’t work. So I patched the driver and it made it into mainline.

Fast forward to today: I don’t have a machine with a pci slot anymore but still wanted to listen to the airwaves again as my other stereo broke. So I took out the tuner (a tnf 9835) and cooked a simple pygtk app to control it. It looks just like what you can expect from being born from such a quick hack. After trying for a while to make the hardware i2c interface work I settled for a pure software implementation (also, pull-ups are not optional). I got all the bit masking ops right the first time (but on the other hand that was just copy pasted from drivers/media/tuners with some small edits). It is basically a nice interface around the SN761677.
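A pure software i2c master boils down to wiggling two lines in the right order. The sketch below is not the code from the app: `RecordingBus` is a made-up stand-in for two GPIO lines so the sequence can be checked without hardware, and the address byte 0xC2 is only an example value.

```python
class RecordingBus(object):
    """Stand-in for two GPIO lines; records SDA at every rising SCL edge."""

    def __init__(self):
        self.sda = 1
        self.scl = 1
        self.sampled = []

    def set_sda(self, level):
        self.sda = level

    def set_scl(self, level):
        if level and not self.scl:
            # Devices sample SDA when SCL goes high.
            self.sampled.append(self.sda)
        self.scl = level

    def read_sda(self):
        # Pretend a device always pulls SDA low to acknowledge.
        return 0

def i2c_start(bus):
    bus.set_sda(1)
    bus.set_scl(1)
    bus.set_sda(0)  # SDA falls while SCL is high: START
    bus.set_scl(0)

def i2c_stop(bus):
    bus.set_sda(0)
    bus.set_scl(1)
    bus.set_sda(1)  # SDA rises while SCL is high: STOP

def i2c_write_byte(bus, value):
    """Clock out one byte, MSB first, and return True if it was acknowledged."""
    for bit in range(7, -1, -1):
        bus.set_sda((value >> bit) & 1)
        bus.set_scl(1)
        bus.set_scl(0)
    bus.set_sda(1)  # release SDA so the device can drive the ack bit
    bus.set_scl(1)
    acked = bus.read_sda() == 0
    bus.set_scl(0)
    return acked

if __name__ == "__main__":
    bus = RecordingBus()
    i2c_start(bus)
    print(i2c_write_byte(bus, 0xC2))
    i2c_stop(bus)
    print(bus.sampled[:8])
```

On real hardware "setting a line to 1" means releasing it and letting the pull-up do the work, which is exactly why the pull-ups are not optional.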

Compared with my phone the sensitivity is rather poor as it needs quite a bit of an aerial but the sound quality feels better. Alas, the latter is totally subjective.


Things are looking different today…

Clearly I’m a glutton for punishment.

Today I decided that I can’t have enough and it was time to upgrade my distro so I can play with newer things (and also because google was nagging me to use an updated browser).

I did a dist-upgrade and that not only completed without a hitch but I also had a bit more of free space afterwards. Previously I hammered the thing and then just gave up all hope.

I’m starting to like this new future where things work like they should.

Now that I jinxed it I went full steam with a do-release-upgrade. It is downloading 3277 files at the blazing speed of 20 kB/s.

Let’s see if I still have a working machine by monday.

Chasing mysterious 500 errors within php.

For the last couple of days I’ve been investing time learning to do some (serious) things with Drupal and I quite like it, given that for my previous gig involving php I had to manually compile and patch php 5.2 in order to work with a monstrosity made with Textpattern and CakePHP (and a spice of hand crafted database code).

Last morning I was almost ecstatic reading about Features and went on to make a new one just to try it out.

I select a few components, hit “Download feature” and after a while, nothing. Same happened with “Generate feature”.

On the error.log I see:
2014-10-29 07:49:31: (mod_fastcgi.c.2543) unexpected end-of-file (perhaps the fastcgi process died): pid: 11992 socket: unix:/tmp/php.socket-3
2014-10-29 07:49:31: (mod_fastcgi.c.3329) response not received, request sent: 1106 on socket: unix:/tmp/php.socket-3 for /some_site/index.php?q=admin/structure/features/create, closing connection

That was a bit odd, since the memory limit is set to an ample 256M and it died long before the time limit.

Just to be sure I tried using Apache instead of Lighttpd but no dice.

On the system log I see:

php-cgi[13015]: segfault at bf7c6fcc ip b738201a sp bf7c6fd0 error 6 in libpcre.so.3.13.1[b736d000+3f000]

With that clue I edit php.ini and shave a couple of zeros out of pcre.recursion_limit from the default of 100000. After restarting the server everything worked fine.

I shudder thinking of something that really needs a call stack 100 thousand levels deep. But on the other hand I cut my teeth on a micro with 68 bytes of ram.

Back to basics.

For a course I’m taking at the uni I spent the last weeks programming entirely in assembler for a not very small micro.

Last time I did that for real work was about 12 years ago, dinosaurs roamed the Earth and the 16F84A was popular among my friends here.

It felt quite refreshing but the compiler sometimes behaved like a real jerk.

For instance this was flagged as invalid:

label   db $BA, $DC, $0E

But trimming the spaces made it happy:

label   db $BA,$DC,$0E

(and I spent my weekly cursing budget chasing it)

Fortunately it runs fine under Wine, can be called from a makefile and the debugger/simulator works.

Adventures in smps carnage I.

A while ago while cleaning the trash pile I thought that it’d be nice to mod one of the many computer supplies to have a variable output. So I picked the least crappy one, replaced the transformer with one with a better turns ratio to achieve a higher output voltage and put a pot on the feedback loop.

At first it kind of worked but with a lot of unstable points and weird modes. Then I realized that I fed the feedback from about 50K when the nominal was near 10K (and also there is considerable input current there). A simple emitter follower took care of that; now only plain oscillations remain.

The operating point moves a lot considering that I want the output to be adjustable between 5V and 50V and without a fixed load. The original compensation scheme was a plain integrator plus a zero. I can make things a little better by slowing it down a lot, but where’s the fun in that?

So instead of blindly doing things I set out to measure the loop response using Middlebrook’s method. I cobbled up a quick Python program with Gtk and GStreamer to generate the test signals with a computer sound card. Initially I expected to just sweep the frequency and measure some points manually on the scope, but there is a lot of 50Hz induced interference that, together with switching residuals, makes that task impossible; I really need to perform synchronous detection in order to get a meaningful result. That means I’ll have to make room for some more quality time coding to get the scope samples in an automated fashion. The USB protocol is documented here ( http://elinux.org/Das_Oszi_Protocol#0x02_Read_sample_data ).
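As a rough idea of what the synchronous detection step could look like once the samples are on the computer, here is a sketch that multiplies the captured signal by a cosine and a sine at the test frequency and averages the products. The function names, the sample rate and the test signal are assumptions made up for the example; nothing here talks to the scope.

```python
import math

def synchronous_detect(samples, freq, sample_rate):
    """Return (amplitude, phase) of the component of `samples` at `freq`.

    Works best when the capture spans a whole number of periods of
    both the test tone and the interference.
    """
    i_sum = 0.0
    q_sum = 0.0
    for n, x in enumerate(samples):
        angle = 2.0 * math.pi * freq * n / sample_rate
        i_sum += x * math.cos(angle)
        q_sum += x * math.sin(angle)
    i_avg = 2.0 * i_sum / len(samples)
    q_avg = 2.0 * q_sum / len(samples)
    # For x = A*cos(wt + phi): i_avg = A*cos(phi), q_avg = -A*sin(phi).
    return math.hypot(i_avg, q_avg), math.atan2(-q_avg, i_avg)

def make_test_signal(n_samples, sample_rate):
    """A 1 kHz tone (amplitude 0.5, phase 0.3 rad) buried under 50 Hz hum."""
    signal = []
    for n in range(n_samples):
        t = n / float(sample_rate)
        tone = 0.5 * math.cos(2.0 * math.pi * 1000.0 * t + 0.3)
        hum = 1.0 * math.cos(2.0 * math.pi * 50.0 * t)
        signal.append(tone + hum)
    return signal

if __name__ == "__main__":
    rate = 48000
    samples = make_test_signal(4800, rate)  # 0.1 s: 100 tone periods, 5 hum periods
    amplitude, phase = synchronous_detect(samples, 1000.0, rate)
    print(amplitude, phase)
```

Running the same detection on the signals at both sides of the injection point and taking the ratio of the results would give one point of the loop response; repeating it per frequency gives the sweep.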

The setup is a far cry from the ones depicted in the famous AN70 by Jim Williams. I used an H-Field probe to rule out magnetics as an interference source. I expected the output filters and the transformer to be troublesome but their effects on the point of injection are negligible. On the other hand, long wires on the feedback path (even twisted) and the snap recovery diodes aren’t a good match.


The root of all evil.

I just love when I forget to add ‘volatile’ and the compiler happily optimizes away a chunk of code.

After staring for a while at the screen trying to figure out why it doesn’t work as expected I went for a quick nap. When I got back I noticed several warnings about it that were invisible to my eyes before.

Installing Flumotion on Debian Jessie.

Being on the edge sometimes hurts.

I grabbed https://github.com/inaes-tic/flumotion and https://github.com/inaes-tic/flumotion-ugly

just to be greeted with

AttributeError: 'EPollReactor' object has no attribute 'listenWith'

Instead of force-installing an older (<=11) python-twisted I fetched http://twistedmatrix.com/Releases/Twisted/11.1/Twisted-11.1.0.tar.bz2 and http://twistedmatrix.com/Releases/Web/11.1/TwistedWeb-11.1.0.tar.bz2.

Uncompressed, did python setup.py build && python setup.py install.

And the thing worked.



Today I woke up almost as tired as I went to bed yesterday. Most people are not working because of a multi-day holiday. Or something like that.

Like yesterday I (unsuccessfully) tried to figure out why WebVfx refuses to play nice with gstshm. So I went for a walk to clear my mind.

One of the nicest things about living in Berisso is that I have, really really close, almost virgin fields and beaches, an island, “normal” city stuff and industrial/maritime landscapes. Today I went to Ensenada, where many places look like a still from movies such as Tank Girl or Mad Max; I toyed around the docks and abandoned ships. Also met a woman that kinda looked like Lori Petty these days. Scary.


Google says it was a 12.5 km trip. It took me a bit longer but I tried really hard to slow down and enjoy it instead of just walking.

Back at home I’m out of ideas and this is still broken. I guess it’s time to panic.