Burning down the house

(Watch out)

So many drafts, some stories and pictures from the last PyCon at Bahía Blanca.

I was happily hacking in the kitchen the other Saturday when I heard a strange noise coming from the garden.

To my dismay I saw that the shed was on fire and part of the roof had collapsed. I went in to take out a propane can to avoid an impending catastrophe and called the firemen (lucky us, they are a few blocks away).

We lost the roof, tools, vinyls and books in an adjacent room, but nothing that can’t be replaced. Still, fuck.

Some pictures of PyCon on Flickr (not mine): https://www.flickr.com/photos/70871182@N04/sets/72157677377824525

Fixing a microwave oven with a broken keypad

This is by far one of the most productive things I did this week outside of work (at least the one I can write about here).

A couple of months ago my ex gave it to me. It had started with intermittent display issues and one day it stopped working completely. I picked it up and stored it.

The other weekend I was in a bit of a cleaning frenzy and remembered that it was taking up valuable space in the shack doing nothing, so I set out to see if it had any hope of working again. Otherwise I’d salvage the transformer and dish motor, the magnetron would go to a friend and the rest would be sold as scrap.

This is the second time I’ve fixed a microwave oven and I’m amazed at the amount of grease and acid stench that accumulates inside them.

I bridged the safety interlock pads on the control board and powered it through an isolation transformer. It kinda turned on but was unresponsive and only some digits were dimly lit. It was also very sticky.

After that I cleaned it using lukewarm water, detergent and a toothbrush, gave it a pass with a hair dryer and then another bath with alcohol.

Now it works!

The keypad is a mess: besides being sticky and stinky too, the conductive traces were broken, as if dissolved, on the connector side. For some models there are still replacements on the market, but they aren’t cheap and, also, what’s the fun in that?

I peeled away the layers, traced it and made a replacement using tact switches. The decal will be glued on top of that. It works fine, there’s less waste (but I’m short of a spot welder) and off it goes to Radio Futura.

Doomed

So, I’m facing an issue and the best tools so far (or the least bad ones) seem to involve both PHP and XSLT. And a braindead web service. Go team.
Not totally unrelated, I’m surprised at the amount of stuff that can be found by searching for “depressed developer”.

Yummy.

The other week I felt like cooking.

I made cubes of calabaza in syrup like grandma used to, with ash or lime to harden the outside: about half a day sitting in water and lime, a thorough rinse and then five hours, give or take, on the stove with lots of sugar.

After that I roasted a sweet potato and just for kicks I also fried a banana in a mixture of honey and butter. That was really tasty.

Guidelines for C source code auditing and other tales.

The papers and articles on this site are quite interesting, even if a little dated. Somehow I had many of them open for a couple of days but only now took the time to really read them.

Guidelines for C source code auditing: http://www.ouah.org/mixtercguide.html

Syscall Proxying – Simulating remote execution: http://www.ouah.org/SyscallProxying.pdf

An Overview of Unix Rootkits: http://www.ouah.org/iRootkits.pdf

Know your Enemy: http://www.ouah.org/motives.html

…and part of the other series: http://web.archive.org/web/20010607083412/http://project.honeynet.org/papers/

A statistical insight

I’ve been working during the weekends on an instrumentation frontend to precisely measure the resistance of an RTD sensor using a ratiometric approach.

After building it and waiting a prudent amount of time to let it warm up, I saved an hour of samples (3600) and fired up Octave.

The mean and standard deviation looked OK and, while a plot showed a bit of noise, it was well within reasonable limits.

Just for the sake of it I did a histogram and, oh the horror:

This is clearly not OK. It should look more like a Gaussian (the real formula is quite daunting but still retains symmetry), and this looks a lot like a bimodal distribution. Changing the number of bins does not help.

The ADC I used does not have a reference input, so I make two differential readings and then take the quotient (I know… but it was the only one available when I started).
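For reference, the quotient idea boils down to the sketch below. It assumes, hypothetically, that the two differential readings come from the RTD and from a known reference resistor carrying the same excitation current (the post doesn’t describe the exact topology), so the excitation cancels out. The function name and the 1 kΩ reference value are illustrative only:

```python
def rtd_resistance(v_rtd, v_ref, r_ref=1000.0):
    """Ratiometric RTD resistance from two differential readings.

    Assumes the RTD and a reference resistor r_ref are in series, so the
    same current flows through both and v_rtd / v_ref == R_rtd / r_ref.
    Any drift in the excitation cancels out. Readings can be raw ADC
    counts, as long as both use the same scale.
    """
    if v_ref == 0:
        raise ValueError("reference reading is zero")
    return r_ref * v_rtd / v_ref


if __name__ == "__main__":
    # Made-up counts: the RTD reading is 1.1 times the reference reading
    print(rtd_resistance(11000, 10000))  # 1100.0
```

Because only the ratio matters, a noise source that hits both readings alike tends to cancel, while anything that differs between the two reads (like switching the multiplexer) does not.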

Perhaps the input multiplexer is at fault? (The unused channels are grounded, so I discarded them as a cause.) I repeated the experiment, but this time doing a full run on each channel instead of switching between them, and this is the result:

Well, both are skewed so there’s something else going on.

Scoping the inputs shows what seems to be AM at around 70 MHz even without power applied (that’s in the TV broadcast band here), and it kind of makes sense because I didn’t use a shield. Head bangs on the desk.

Anyways, using a quick digital filter makes everything look nicer but I’ll still have to shield this:
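The post doesn’t say which filter was used. As an illustration only, here is one of the quickest options, a single-pole IIR low-pass (an exponential moving average) in plain Python; the smoothing factor `alpha` is an arbitrary assumption. Starting the filter from zero also reproduces the kind of start-up transient discussed next:

```python
def low_pass(samples, alpha=0.1, initial=0.0):
    """Single-pole IIR low-pass (exponential moving average).

    y[n] = y[n-1] + alpha * (x[n] - y[n-1]), with 0 < alpha <= 1.
    A smaller alpha gives heavier smoothing and a slower step response.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    out = []
    y = initial
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


if __name__ == "__main__":
    # A step from 0 to 3000: the output creeps up towards the input
    print(low_pass([3000] * 5, alpha=0.5))  # [1500.0, 2250.0, 2625.0, 2812.5, 2906.25]
```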

The transient at the beginning is not going to be an issue: in real life I don’t expect such a step change (from 0 to ~3k), and in any case the anti-aliasing filter will get rid of it.

On second thought, those skewed-up chunks are really interesting and I should have spotted them as a failure symptom earlier.

Learning again

I got into computers at the age of 8 (1994), when a very weird lady came to our school and offered to teach us. A week later she was back with a bunch of friends and unloaded a stack of XTs and ATs into an empty room.

We spent two months doing exercises on paper and then she introduced us to QBasic (and DOS).

After that I continued with Visual Basic (I stopped at version 6, after making a couple of programs for a very good amount of money) and assembler for PIC micros. I also used C with Turbo C, DJGPP and even Visual C.

I stumbled upon Python and its community and fell in love. It is still my go-to language for almost everything, even if only to try out solutions and then port them to the target language.

The last major event was a couple of years ago (and a bit more…), when I started to use more and more JavaScript: from plain servers using Express to all-encompassing frameworks like LoopBack, and from big iron to smaller micros (besides, code isomorphism is really interesting).

After that I toyed with some languages but nothing really caught my attention. Among them I used Ruby, Haskell, Go and a bit of Java, just for kicks or to help out long-time friends. Also Tcl: I liked it as an extension language, but I wouldn’t write an application from scratch using it. Go is really interesting (mainly because of Docker and all the tools revolving around it), but I really need some itch that calls for it.

The major breakthrough came when I rediscovered Forth and Erlang.

Forth is really interesting as an embedded language because it’s so easy to implement and there are so many things powered by it.
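To give an idea of why Forth is so easy to implement, here is a toy sketch of its core in Python: a data stack, a dictionary of words and colon definitions. It only knows integers and a handful of built-in words picked for illustration; a real Forth compiles definitions into threaded code instead of re-reading text:

```python
def make_forth():
    """Return a run(source) function with its own stack and dictionary."""
    stack = []
    words = {}

    def binop(fn):
        # Wrap a two-argument function as a word: pops b, then a, pushes fn(a, b)
        def word():
            b = stack.pop()
            a = stack.pop()
            stack.append(fn(a, b))
        return word

    words["+"] = binop(lambda a, b: a + b)
    words["-"] = binop(lambda a, b: a - b)
    words["*"] = binop(lambda a, b: a * b)
    words["dup"] = lambda: stack.append(stack[-1])
    words["drop"] = lambda: stack.pop()
    words["swap"] = lambda: stack.extend([stack.pop(), stack.pop()])
    words["."] = lambda: print(stack.pop())

    def run(source):
        tokens = source.split()
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok == ":":
                # Colon definition ": name body ... ;" adds a new word built
                # from existing ones; the body is re-interpreted on each call
                end = tokens.index(";", i)
                name, body = tokens[i + 1], " ".join(tokens[i + 2:end])
                words[name] = lambda body=body: run(body)
                i = end + 1
            elif tok in words:
                words[tok]()
                i += 1
            else:
                stack.append(int(tok))  # anything else must be a number
                i += 1
        return stack

    return run


if __name__ == "__main__":
    forth = make_forth()
    forth(": square dup * ; 3 square 4 + .")  # prints 13
```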

Erlang was on my radar for a while and has, at least in my circle of friends, this aura of mystery.

After a post on PyAr I decided to join a local group and give it a go. I started with the exercises of Learn You Some Erlang and I couldn’t be happier.

It makes my head twitch and think in new ways. I just lust for the time of day devoted to it.

Breaking a simple captcha with Python and Pillow

A while ago one of our long-time customers approached us to automate tasks on a government portal. At least here, most of them are kind of ugly, only work on a specific set of browser versions and are painfully slow. We had already helped him with problems like this before, so instead of having someone manually enter all the data, they just populate a database and then our robot does all the work, simulating the actions on the web portal.

This one is a bit different, because they introduced a captcha in order to infuriate users (seriously, it looks like they don’t want people logging in).

Most of the time they look like this:

The first thing I tried was to remove the lines and feed the result into an OCR engine. So I made a very simple filter using Pillow:


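#!/usr/bin/python

from PIL import Image
import sys

# Background colour of the captcha
BACKGROUND = (248, 255, 255)

def filter_lines(src):
    """Repaint with the background colour every column that has no
    background pixels at all (the full-height distracting lines)."""
    w, h = src.size

    stripes = []
    for x in range(w):
        count = 0
        for y in range(h):
            if src.getpixel((x, y)) != BACKGROUND:
                count += 1
        if count == h:
            stripes.append(x)

    for x in stripes:
        for y in range(h):
            src.putpixel((x, y), BACKGROUND)
    return src

if __name__ == '__main__':
    # convert('RGB') so getpixel() always returns (r, g, b) tuples
    src = Image.open(sys.argv[1]).convert('RGB')
    region = filter_lines(src)
    region.save(sys.argv[2])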

Now it looks better but after trying gocr and tesseract it still needs more work:

Just for kicks I decided to filter 100 images and overlap them; this is what I got:

That is interesting… I used this script (not the most efficient approach, but still…):


#!/usr/bin/python

from PIL import Image

# Size of the captcha images and their background colour
w, h = 86, 21
BACKGROUND = (248, 255, 255)

# Start from a white canvas
dst = Image.new('RGB', (w, h), (255, 255, 255))

# Paint in red every pixel that is not background in any filtered image
for idx in range(30):
    src = Image.open('filtradas/%i.bmp' % idx).convert('RGB')

    for x in range(w):
        for y in range(h):
            if src.getpixel((x, y)) != BACKGROUND:
                dst.putpixel((x, y), (255, 0, 0))

dst.save('overlapeada.bmp')

With this piece of information I can focus my efforts on that area only.
That font, even distorted, looks quite familiar to me. And indeed it is, it’s Helvetica.
This makes the problem a lot easier.

I grabbed a bitmapped version of the same size and made a grid that shows where a digit can land, assuming 8×13 symbols:

This shows that there is a slight overlap between digits.
I went for a brute-force approach, dividing the captcha into cells and comparing each one with every digit of the font, with a small amount of overlap between them.
The symbols are smaller than the cells, so for each of them I build candidate regions within the cell and assign a score based on the number of pixels that match in both.
The one with the highest score is (likely) the correct number.

This is really simple and, even though we do a lot of comparisons, it performs OK (the images are quite small). Without tuning we got about a 30% success rate (the server also adds noise and more aggressive distortions from time to time).

Have a difficult or non conventional problem? Give us a call, we are like the A-Team of technology.

This is the complete algorithm (the identifiers are in Spanish, but it shouldn’t be hard to follow); it can also be found here: https://gist.github.com/pardo-bsso/a6ab7aa41bad3ca32e30


#!/usr/bin/python

from PIL import Image
import sys, os


imgpatrones = []
pixelpatrones = []

for idx in range(10):
    img = Image.open("patrones/%i.png" % idx).convert('RGB')
    imgpatrones.append(img)
    pixelpatrones.append( list(img.getdata()) )


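def compara(region, patron):
    # Score = fraction of pixels that are identical in the region and the
    # digit pattern, both in row-major order as getdata() returns them
    pixels = list(region.getdata())
    size = min(len(pixels), len(patron))

    res = 0.0
    for idx in range(size):
        if pixels[idx] == patron[idx]:
            res = res + 1

    return res / size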


def elimina_lineas(src):
    cropeada = src.crop( (4, 1, 49, 19) )
    w,h = cropeada.size
    stripes = []

    for x in range(w):
        count = 0
        for y in range(h):
            if cropeada.getpixel( (x,y) ) != (248, 255, 255):
                count += 1

        if count == h:
            stripes.append(x)

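    # Repaint the detected stripes. The second putpixel() overrides the
    # first one, so removed stripes end up marked in red.
    for x in stripes:
        for y in range(h):
            cropeada.putpixel( (x,y),  (248, 255, 255) )
            cropeada.putpixel( (x,y),  (255, 0, 0) )

    return cropeada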

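def crear_crops(src, celda):
    # Candidate 8x13 windows for cell `celda`: the nominal cell origin shifted
    # -3..+3 px horizontally and 0..5 px vertically. Lefts outside 0..37 are
    # dropped so every window fits inside the 45 px wide cropped image.
    limites = range(38)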
    xceldas = [0, 8, 16, 24, 32, 40]
    xoffsets = range(-3,4)
    yceldas = range(6)
    boxes = []
    crops = []

    x = xceldas[celda]
    x = [ (x+off) for off in xoffsets if (x+off) in limites ]

    for left in x:
        for top in yceldas:
            boxes.append( (left, top, left+8, top+13) )

    for box in boxes:
        crops.append( src.crop(box) )

    return crops

def compara_crops_con_patron(crops, patron):
    scores = []
    for crop in crops:
        scores.append( compara(crop, pixelpatrones[patron] ))
    return max(scores)

def decodifica_celda(src, celda):
    pesos = []
    crops = crear_crops(src, celda)

    for patron in range(10):
        pesos.append( compara_crops_con_patron(crops, patron) )

    return pesos.index( max(pesos) )

def decodifica(filename):
    original = Image.open(filename)
    src = elimina_lineas(original)
    res = []

    for celda in range(6):
        res.append( decodifica_celda(src, celda) )

    return ''.join( str(x) for x in res )

if __name__ == '__main__':
    print(decodifica(sys.argv[1]))

(trying to) Measure temperature

In a while I’ll need to characterize an oven and perhaps build a new one.
Just to start I have to apply a power step and measure how the internal temperature evolves.

In order to save time I searched my local distributors and bought a K-type thermocouple with an amplifier and cold junction compensation. It is not the most accurate, but it is more than enough for now. There are a couple of ICs available that give a direct digital output, but the work needed to breadboard them and get a meaningful reading is beyond the scope of this stage.

This is what I bought:

It appears in many places as a “Grove High Temperature Sensor”. It sports an OPA333 precision op-amp and a CJ432 adjusted to provide a 1.5V reference. The rest of the circuit is nothing special, except that the manufacturer labeled the thermistor “light”. It can be consulted here.

First lights

While I have more capable hardware at hand, I grabbed an Arduino Nano and the official library from https://github.com/Seeed-Studio/Grove_HighTemp_Sensor and, lo and behold, I had it streaming temperature to my terminal.

Let’s get graphical

I cooked up a simple GUI in Python using Qt and Qwt while listening to Olivia Newton.
It is pretty barebones: it only has facilities to export to CSV, a couple of tracking cursors, and it gracefully handles device disconnections (say, I yank the cable). I expect to post-process the data using QtiPlot or Kst.

Tweaking

One of the first things I noticed was that the measured temperature jumped in big steps of about 2°C.
Using the default setup with a 5V “reference” and considering the amplifier gain, every ADC bit amounts to:

V_{bit} = \frac{5000\,\mathrm{mV}}{1023 \cdot 54.16} \approx 0.09024\,\mathrm{mV}

Looking at the polynomial coefficients used by the library (ITS-90) and taking a first-order approximation, one bit corresponds to a 2.26°C step, and it grows bigger with the measured temperature as other terms start to influence the result. Even though the output is low-pass filtered at about 1.6 kHz and averaged over 32 points, there’s still noise.

Changing the reference to the regulated 3.3V brings it to about 1.5°C; even if that is more than enough for what I need, it can be better.
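The per-bit arithmetic above is easy to double-check in a few lines of Python. The first-order coefficient is an assumption made here: 25.08355 °C/mV is the linear term of the ITS-90 type K inverse polynomial for the 0 to 500°C range, which reproduces the 2.26°C figure. Higher-order terms are ignored, so this only gives the low-temperature step:

```python
# Linear term of the ITS-90 type K inverse polynomial (0 to 500 degC range),
# in degC per mV. Assumed here; higher-order terms are ignored.
D1_DEG_PER_MV = 25.08355

AMP_GAIN = 54.16  # amplifier gain


def lsb_mv(vref_mv, full_scale=1023, gain=AMP_GAIN):
    """Thermocouple voltage (mV) represented by one ADC step."""
    return vref_mv / (full_scale * gain)


def lsb_deg(vref_mv, full_scale=1023, gain=AMP_GAIN):
    """Approximate temperature step (degC) per ADC step, first order only."""
    return lsb_mv(vref_mv, full_scale, gain) * D1_DEG_PER_MV


if __name__ == "__main__":
    print("5 V reference:   %.2f degC per bit" % lsb_deg(5000))  # 2.26
    print("3.3 V reference: %.2f degC per bit" % lsb_deg(3300))  # 1.49
```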

With a couple more bits I can achieve better resolution. Instead of using an external ADC, I took advantage of the inherent noise on the reference and output and chose to apply 16 times oversampling in order to get 12 bits out of the 10-bit ADC. Application note AVR121 explains that nicely. Now I am limited (in theory…) to 0.37°C steps, and I can average on top of that to further reduce variations.
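The oversample-and-decimate step from AVR121 is short enough to sketch in Python: add up 4^n readings and shift the sum right by n bits. The `read_adc` callable stands in for whatever actually reads the pin on the Nano, and the noisy fake reader in the example is hypothetical:

```python
import itertools


def oversample(read_adc, extra_bits=2):
    """Oversampling and decimation, as described in Atmel AVR121.

    Adds up 4**extra_bits readings from read_adc() and shifts the sum right
    by extra_bits. For a 10 bit ADC and extra_bits=2 that is 16 readings
    and a 12 bit result (0..4092). It only helps if the input carries
    enough noise (around 1 LSB) to toggle between neighbouring codes.
    """
    n = 4 ** extra_bits
    total = sum(read_adc() for _ in range(n))
    return total >> extra_bits


if __name__ == "__main__":
    # Hypothetical noisy ADC whose true value sits between codes 511 and 512
    fake_readings = itertools.cycle([511, 512, 512, 511])
    print(oversample(lambda: next(fake_readings)))  # 2046, i.e. 511.5 in 10 bit units
```

With two extra bits the step from the 3.3V case shrinks by a factor of four, roughly 1.49 / 4 ≈ 0.37°C, which matches the figure above.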

The last source of error (besides not knowing for sure the “real” value of the references) is that the library assumes a fixed 350mV offset: the circuit ideally floats the amplified thermocouple voltage around that. In order to measure it I added a small relay from my stash (TQ2SA-5V) to short the input. It is not meant to be used as a dry relay, but it does fine so far.
Upon startup it reads 348mV; while a 2mV difference may not seem that big, it turns out to be at least 185m°C. Anyway, the main sources of error now are the thermocouple and the ADC reference.
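Using the measured offset instead of the hard-coded one could look like the following sketch. It assumes the 3.3V reference and the 12-bit oversampled readings described above; the constants and function names are illustrative, not taken from the Seeed library:

```python
AMP_GAIN = 54.16   # amplifier gain
VREF_MV = 3300.0   # regulated 3.3 V used as ADC reference
FULL_SCALE = 4092  # 10 bit ADC oversampled to 12 bits (1023 * 4)


def counts_to_mv(counts, vref_mv=VREF_MV, full_scale=FULL_SCALE):
    """Amplifier output in mV for a given ADC reading."""
    return counts * vref_mv / full_scale


def thermocouple_mv(counts, offset_mv, vref_mv=VREF_MV,
                    full_scale=FULL_SCALE, gain=AMP_GAIN):
    """Thermocouple voltage in mV, using an offset measured at runtime.

    offset_mv is the amplifier output read with the input shorted by the
    relay (348 mV in the post), used instead of a fixed 350 mV.
    """
    return (counts_to_mv(counts, vref_mv, full_scale) - offset_mv) / gain


if __name__ == "__main__":
    # Hypothetical flow: close the relay, read the offset, open the relay
    offset = counts_to_mv(2046)           # stand-in for the shorted-input reading
    print(thermocouple_mv(2046, offset))  # 0.0: no thermocouple voltage
```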

How to make an awesome Android dashboard for your embedded widget

– Use a standard protocol. We chose Firmata.

– Play with nice things. We used Ionic, Apache Cordova and the wonderful BluetoothSerial plugin.

– Remember to either modify the Firmata firmware to use the default 9600 bps speed of our HC-06 adapter or reconfigure the adapter to work at 57600.

– Browserify a Node implementation of the protocol and make a port-like object so it talks over Bluetooth.

– The Bluetooth plugin doesn’t like to work with binary data, so we improved it.
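Firmata is a binary protocol, which is why a plugin that mangles binary data gets in the way. As a rough illustration (in Python, not the Node implementation used in the app), this decodes the two most common 3-byte messages: analog messages (0xE0 | pin) and digital port messages (0x90 | port), each carrying a value split into two 7-bit data bytes. Everything else is skipped, and a real receiver would also buffer incomplete messages between reads:

```python
ANALOG_MESSAGE = 0xE0   # 0xE0 | pin, then value bits 0-6, then bits 7-13
DIGITAL_MESSAGE = 0x90  # 0x90 | port, then mask bits 0-6, then bit 7


def parse_firmata(data):
    """Decode analog and digital messages from a Firmata byte stream.

    Returns a list of ("analog", pin, value) or ("digital", port, mask)
    tuples. Other bytes (SysEx, version reports...) are skipped, and a
    trailing incomplete message is ignored.
    """
    events = []
    i = 0
    while i + 2 < len(data):
        command = data[i] & 0xF0
        channel = data[i] & 0x0F
        if command in (ANALOG_MESSAGE, DIGITAL_MESSAGE):
            # Reassemble the 14 bit value from two 7 bit data bytes
            value = (data[i + 1] & 0x7F) | ((data[i + 2] & 0x7F) << 7)
            kind = "analog" if command == ANALOG_MESSAGE else "digital"
            events.append((kind, channel, value))
            i += 3
        else:
            i += 1
    return events


if __name__ == "__main__":
    stream = bytes([0xE0, 0x7F, 0x07, 0x91, 0x05, 0x01])
    print(parse_firmata(stream))  # [('analog', 0, 1023), ('digital', 1, 133)]
```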

After writing some glue code you end up with a nice, working control panel. I only got the working part, but there are a ton of cool reusable widgets out there (like NexusUI or KievII).

New life for an old TV tuner.

A long, long time ago I bought a cheap PCI tuner card to listen to the radio. I was able to watch TV with it out of the box, but the radio didn’t work, so I patched the driver and the patch made it into mainline.

Fast forward to today: I don’t have a machine with a PCI slot anymore, but I still wanted to listen to the airwaves again as my other stereo broke. So I took out the tuner (a tnf 9835) and cooked up a simple PyGTK app to control it. It looks just like what you can expect from something born of such a quick hack. After trying for a while to make the hardware I2C interface work, I settled for a pure software implementation (also, pull-ups are not optional). I got all the bit masking ops right the first time (but on the other hand, that was just copy-pasted from drivers/media/tuners with some small edits). It is basically a nice interface around the SN761677.
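The bit-banging code isn’t shown here, so as a rough idea of what a pure software I2C master involves, this is a minimal write-only sketch in Python. The pin interface is hypothetical (real code would toggle GPIOs with delays in between), and the 0x42 address in the example is arbitrary, not the tuner’s. Lines are only ever driven low or released, which is exactly why the pull-ups are not optional:

```python
class BitBangI2C:
    """Minimal software (bit-banged) I2C master, write-only.

    `pins` is a hypothetical object with sda(level), scl(level) and
    read_sda(). level=True means "release the line" so the pull-up
    resistor brings it high; False means "drive it low". Delays between
    transitions and clock stretching are left out for brevity.
    """

    def __init__(self, pins):
        self.pins = pins

    def start(self):
        # START condition: SDA falls while SCL is high
        self.pins.sda(True)
        self.pins.scl(True)
        self.pins.sda(False)
        self.pins.scl(False)

    def stop(self):
        # STOP condition: SDA rises while SCL is high
        self.pins.sda(False)
        self.pins.scl(True)
        self.pins.sda(True)

    def write_byte(self, byte):
        # Shift out MSB first; SDA only changes while SCL is low
        for bit in range(7, -1, -1):
            self.pins.sda(bool((byte >> bit) & 1))
            self.pins.scl(True)
            self.pins.scl(False)
        # Ninth clock: release SDA, the slave pulls it low to acknowledge
        self.pins.sda(True)
        self.pins.scl(True)
        ack = not self.pins.read_sda()
        self.pins.scl(False)
        return ack

    def write(self, address, payload):
        """Send payload to a 7 bit address; True if every byte was ACKed."""
        self.start()
        ok = self.write_byte(address << 1)  # R/W bit 0 = write
        for b in payload:
            if not ok:
                break
            ok = self.write_byte(b)
        self.stop()
        return ok


class RecordingPins:
    """Fake bus for trying the class without hardware: records the SDA
    level on every rising edge of SCL and always answers with an ACK."""

    def __init__(self):
        self.sda_level = True
        self.scl_level = True
        self.bits = []

    def sda(self, level):
        self.sda_level = level

    def scl(self, level):
        if level and not self.scl_level:
            self.bits.append(int(self.sda_level))
        self.scl_level = level

    def read_sda(self):
        return False  # slave holds SDA low: ACK


if __name__ == "__main__":
    pins = RecordingPins()
    bus = BitBangI2C(pins)
    print(bus.write(0x42, [0xA5]))  # True
    print(pins.bits[:8])            # [1, 0, 0, 0, 0, 1, 0, 0] (0x42 << 1)
```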

Compared with my phone the sensitivity is rather poor, as it needs quite an aerial, but the sound quality feels better. Alas, the latter is totally subjective.

Things are looking different today…

Clearly I’m a glutton for punishment.

Today I decided that I couldn’t get enough and that it was time to upgrade my distro so I could play with newer things (and also because Google was nagging me to use an updated browser).

I did a dist-upgrade and not only did it complete without a hitch, I also had a bit more free space afterwards. Previously I would hammer at the thing and then just give up all hope.

I’m starting to like this new future where things work like they should.

Now that I jinxed it I went full steam with a do-release-upgrade. It is downloading 3277 files at the blazing speed of 20 kB/s.

Let’s see if I still have a working machine by Monday.

Chasing mysterious 500 errors within PHP.

For the last couple of days I’ve been investing time learning to do some (serious) things with Drupal, and I quite like it, given that for my previous gig involving PHP I had to manually compile and patch PHP 5.2 in order to work with a monstrosity made with Textpattern and CakePHP (and a spice of hand-crafted database code).

Yesterday morning I was almost ecstatic reading about Features and went on to make a new one just to try it out.

I select a few components, hit “Download feature” and, after a while, nothing. The same happens with “Generate feature”.

In error.log I see:
2014-10-29 07:49:31: (mod_fastcgi.c.2543) unexpected end-of-file (perhaps the fastcgi process died): pid: 11992 socket: unix:/tmp/php.socket-3
2014-10-29 07:49:31: (mod_fastcgi.c.3329) response not received, request sent: 1106 on socket: unix:/tmp/php.socket-3 for /some_site/index.php?q=admin/structure/features/create, closing connection

That was a bit odd, since the memory limit is set to an ample 256M and it died long before the time limit.

Just to be sure I tried using Apache instead of Lighttpd but no dice.

In the system log I see:

php-cgi[13015]: segfault at bf7c6fcc ip b738201a sp bf7c6fd0 error 6 in libpcre.so.3.13.1[b736d000+3f000]

With that clue I edit php.ini and shave a couple of zeros off pcre.recursion_limit, down from the default of 100000. After restarting the server everything works fine.

I shudder thinking of something that really needs a call stack 100 thousand levels deep. But on the other hand, I cut my teeth on a micro with 68 bytes of RAM.