A small detour on my way to PyConAR 2016

It’s that time of the year (again) when I rent a car and hit the route.

This time I’m heading to Bahía Blanca in order to help a bit and attend to PyConAR.

At the side of Ruta 51 on Coronel Pringles, a bit after the crossing with Ruta 72 there’s a wonderful lake.
Even if there were about 60 kms left I had to stop to enjoy the day and stretch a bit my legs.

Doomed

So, I’m facing an issue and the best tools so far (or the ones that are the less worse) seem to involve both php and xslt. And a braindead webservice. Go team.
Not totally unrelated, I’m surprised at the amount of stuff that can be found with “depressed developer”.

Why do I even bother…

I tell you it’s for your own good people but no, you keep doing the same horrible things.

While bisecting a nasty bug I land into a monster commit:


$ git show --stat THE_COMMIT_HASH
commit 123456789A04a0d558749337badc0de9deadbeef
Author: root
Date: Tue Aug 4 09:10:16 2015 -0300

THE PROJECT NAME.-THE AUTHOR HANDLE
(files changed...)
38 files changed, 865 insertions(+), 657 deletions(-)

And this is one of the smaller ones. It updates vendor libraries, adds middlewares to our api, changes the authentication scheme and does some touch ups to the web frontend. I couldn’t care less that it was committed as root but the log message is murder to my eyes.

How hard is to understand that doing this is bad for everyone? It’s very easy to do this instead of making a couple of extra commits but when things break you come crying asking me to fix them and instead of being a simple task I have to sift through mountains of unrelated stuff.

You are more than welcome.

Guidelines for C source code auditing and other tales.

The papers and articles at this site are quite interesting, even if a little dated. Somehow I had many of them opened from a couple of days ago but just now took the time to really read them.

Guidelines for C source code auditing: http://www.ouah.org/mixtercguide.html

Syscall Proxying – Simulating remote execution: http://www.ouah.org/SyscallProxying.pdf

An Overview of Unix Rootkits: http://www.ouah.org/iRootkits.pdf

Know your Enemy: http://www.ouah.org/motives.html

…and part the other series: http://web.archive.org/web/20010607083412/http://project.honeynet.org/papers/

10 times.

Somehow this is a grossly exaggerated notion within the software community but in every field we find people that performs better (by some measure) than the average by a very high margin.

I witnessed that with my very own eyes and they are not magical creatures. There’s a lot written about them but they tend to share some common traits of overachievers that make the real difference from the rest:

  • They have a clue about what they are doing.
  • They are focused.
  • They work on things that matter.

(Or as Yosef puts it in http://yosefk.com/blog/10x-more-selective.html, things that aren’t going down the toilet. I like it when he says, “The hardest part of “managing” these 10x folks – people widely known as extremely productive – is actually convincing them to work on something. (The rest of managing them tends to be easy – they know what’s what; once they decide to do something, it’s done.“)

Now, I’m not the brightest bulb but when I tackle a problem I try as much as I can to understand its domain. I ask myself frequently if there’s a better way to approach it, as it’s very rare to come across something so unique that nobody worked on anything that barely resembles it (or that can be applied to the current problem).

I’m working on a system that it’s getting older but the foundation is solid and it shows. Everything makes sense, even if you have no idea about a piece is often it can be found in an intuitive way and the core looks beautiful, even if it’s made with dying technologies. The architecture is very well designed and implemented.

But then people came and started adding little things here and there without very much thought. They built XML files concatenating strings, they copied the routines into 34 (that’s real) places and each one has a little difference (that’s kinda ok, they talk to things so horrible that can’t process CDATA fields and use a custom encoding instead). The idea of having all the common stuff in one place never crossed their minds or, shiver, use a standard library (they existed and were mature back when those things were implemented).

They also wrote lots, and lots, of functions like (php):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
private function frobnicate($the_frob)
{
    // Selects the baz of the frob.
 
    // If it's one kind of baz
    if($the_frob->is_of_type('one kind of baz'))
    {
        return $this->baz = 'Baz1';
    }
 
    // If it's a special baz
    if($the_frob->is_of_type('a special baz'))
    {
        return $this->baz = 'Special';
    }
 
    // If it's a straw one
    if($the_frob->is_of_type('a straw one'))
    {
        return $this->baz = 'nuts';
    }
 
    // ... snip about 200 lines of the same ...
 
    return $this->baz = 'the default value';
}

And this is repeated in about 56 places, intertwined with many more conditionals. I only hope this was generated code and not typed by hand.

Anyway, I nuked it and turned that wall of if statements into:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
private function frobnicate($the_frob)
{
    // Selects the baz of the frob.
 
    $map = array(
        'one kind of baz' => 'Baz1',
        'a special baz'   => 'Special',
        'a straw one'     => 'nuts',
        // ... you get the picture ...
    );
 
    foreach($map as $type => $baz) {
        if ($the_frob->is_of_type($type)) {
            return $this->baz = $baz;
        }
    }
 
    return $this->baz = 'the default value';
 
}

And that’s even not clever (table driven programming has been around for quite some time).

I mean, after doing it three times I question if there’s a better, more concise way of expressing the same. But for some people that moment never comes.

Focus.

These days I’m having trouble to keep focused for more than four hours straight, make that six in a very very god day. I guess that’s a given with age and more responsibilities. Doing boring stuff doesn’t help either, but it’s a good incentive to finish as soon as possible without mistakes.

Do things that matter.

This deeply touches me.

Nowadays most of the stuff I do at work to put food on the plate is meaningless and boring to death. It makes life easier for a lot of people but nothing will change if it goes away overnight. Many will cry but nothing terrible.

It doesn’t give me a technical challenge any more. At best it teaches me patience and how to deal with utterly broken and stupid systems that were not designed to be used (not by computers and certainly not by humans). It drains my energy and I’m past the point where it makes sense to put up with it.

It doesn’t make the world a better place, not even by chance.

Do I do things that matter? Yes, on weekends and sometimes by night.
Can I make a living out of them? Not now.
Not yet.

A statistical insight

I’ve been working during the weekends on an instrumentation frontend to precisely measure the resistance of an RTD sensor using a ratiometric approach.

After building it and waiting a prudential time to let it warm I saved an hour of samples (3600) and fired Octave.

The mean and standard deviation looked ok and while a plot showed a bit of noise it was well within reasonable limits.

Just for the sake of it I did a histogram and, oh the horror:

This is clearly not OK. It should be more like a Gaussian (the real formula is quite daunting but still retains symmetry) and that looks a lot like a bimodal distribution. Changing the number of bins does not help.

The ADC I used does not have a reference input so I make two differential reads and then take the quotient (I know… but it was the only one available when started).

Perhaps the input multiplexer is at fault? (the unused channels are grounded, so I discarded that as a cause). I repeated the experiment but this time doing a full run on each channel instead of switching them and this is the result:

Well, both are skewed so there’s something else going on.

Scoping at the inputs shows what seems to be AM at around 70MHz even without power applied (that’s on the tv broadcast band here) and it kind of makes sense because I didn’t use a shield. Head bangs on the desk.

Anyways, using a quick digital filter makes everything look nicer but I’ll still have to shield this:

The transient at the beginning is not going to be an issue, as in real life I don’t expect such a step change (from 0 to ~3k) and in any case the antialias filter will get rid of it.

On a second thought, those chunks skewed up are really interesting and I should spotted that as a failure symptom earlier.

Learning again

I got into computers at the age of 8 (1994), when a very weird lady came to our school and offered to teach us. A week after that she was back with a bunch of friends and unloaded a stack of XT’s and AT’s on an empty room.

We spent two months doing exercises on paper and then she introduced us to QBasic (and DOS).

After that I continued with Visual Basic (and stopped at 6, I made a couple of programs for a very good amount of money), assembler for PIC micros. I also used C with TurboC, DJGPP and even Visual C.

I stumbled upon Python and its community and I fell in love. It is still my go to language for almost everything, even if only to try out solutions and then port them to the target language.

The last major event was a couple (and a bit more…) years ago when I started to use more and more Javascript, from plain servers using express to fully encompassing frameworks like LoopBack. From big iron to smaller micros (besides code isomorphism is really interesting)

After that I toyed with some languages but nothing really caught my attention. Among them I used Ruby, Haskell, Go and a bit of Java just for kicks or helping out long time friends. Also TCL, I   liked it as an extension language but I wouldn’t write an application from scratch using it. Go is really interesting (mainly because Docker and all the tools revolving around it) but I really need some itch that calls for it.

The major breakthrough came when I rediscovered Forth and Erlang.

Forth is really interesting as an embedded language because it’s so easy to implement and there are so many things powered by it.

Erlang was on my radar for a while and has, at least on my circle of friends, this aura of mystery.

After a post on PyAr I decided to join a local group and give it a go. I started with the exercises of Learn You Some Erlang and I couldn’t be happier.

It makes my head twitch and think in new ways. I just lust for the time of the day devoted to it.

Breaking a simple captcha with Python and Pillow

A while ago one of our long time customers approached us to automate tasks on a government portal. At least here most of them are kind of ugly, work on a specific set of browser versions and are painfully slow. We already helped him with problems like this before, so instead of having someone enter manually all the data they just populate a database and then our robot does all the work, simulating the actions on the web portal.

This one is a bit different, because they introduced a captcha in order to infuriate users (seriously, it looks like they don’t want people logging in).

Most of the time they look like this:

The first thing I tried was to remove the lines and feed the result into an ocr engine. So I made a very simple filter using Pillow:


#!/usr/bin/python

from PIL import Image
import sys, os

def filter_lines(src):
    w,h = src.size

    stripes = []
    ss = {}

    for x in range(w):
        count = 0
        for y in range(h):
            if src.getpixel( (x,y) ) != (248, 255, 255):
                count += 1
        if count == h:
            stripes.append(x)

    for x in stripes:
        for y in range(h):
            src.putpixel( (x,y),  (248, 255, 255) )
    return src

if __name__ == '__main__':
    src = Image.open(sys.argv[1])
    region = filter_lines(src)
    region.save(sys.argv[2])

Now it looks better but after trying gocr and tesseract it still needs more work:

Just for kicks I decided to filter 100 images and overlap them, this is what I got:

That is interesting… I used this script (not the most efficient approach, but still..)


#!/usr/bin/python

from PIL import Image
import sys, os

dst = Image.new('RGB', (86, 21) )

w,h = 86, 21

for x in range(w):
    for y in range(h):
        dst.putpixel( (x,y),  (255, 255, 255) )

for idx in range(30):
    src = Image.open('filtradas/%i.bmp'%idx)

    for x in range(w):
        for y in range(h):
            if src.getpixel( (x,y) ) != (248, 255, 255):
                dst.putpixel( (x,y),  (255, 0, 0) )

dst.save('overlapeada.bmp')

With this piece of information I can focus my efforts on that area only.
That font, even distorted, looks quite familiar to me. And indeed it is, it’s Helvetica.
This makes the problem a lot easier.

I grabbed a bitmapped version of the same size and made a grid that shows were can a number land assuming 8×13 symbols:

This shows that there is a slightly overlap between digits.
I went for a brute force approach, dividing the captcha in cells and comparing each one with every digit on the font with a small amount of overlap between them.
The symbols are smaller than the cell, so for every one of them I build regions on the cell and assign a score for the number of pixels that are equal on both.
The one that has a highest score is (likely) the correct number.

This is really simple, event tough we do a lot of comparisons performs ok (the images are quite small), and without tunning we got about 30% success rate (the server also adds noise and more aggressive distortions from time to time).

Have a difficult or non conventional problem? Give us a call, we are like the A-Team of technology.

This is the complete algorithm (it’s in Spanish but shouldn’t be hard to follow), can also be found here: https://gist.github.com/pardo-bsso/a6ab7aa41bad3ca32e30


#!/usr/bin/python

from PIL import Image
import sys, os


imgpatrones = []
pixelpatrones = []

for idx in range(10):
    img = Image.open("patrones/%i.png" % idx).convert('RGB')
    imgpatrones.append(img)
    pixelpatrones.append( list(img.getdata()) )


def compara(region, patron):
    pixels = list(region.getdata())
    size = min(len(pixels), len(patron))

    res = 0.0
    for idx in range(size):
        if pixels[idx] == patron[idx]:
            res = res + 1

    return res / size


def elimina_lineas(src):
    cropeada = src.crop( (4, 1, 49, 19) )
    w,h = cropeada.size
    stripes = []

    for x in range(w):
        count = 0
        for y in range(h):
            if cropeada.getpixel( (x,y) ) != (248, 255, 255):
                count += 1

        if count == h:
            stripes.append(x)

    for x in stripes:
        for y in range(h):
            cropeada.putpixel( (x,y),  (248, 255, 255) )
            cropeada.putpixel( (x,y),  (255, 0, 0) )

    return cropeada

def crear_crops(src, celda):
    limites = range(38)
    xceldas = [0, 8, 16, 24, 32, 40]
    xoffsets = range(-3,4)
    yceldas = range(6)
    boxes = []
    crops = []

    x = xceldas[celda]
    x = [ (x+off) for off in xoffsets if (x+off) in limites ]

    for left in x:
        for top in yceldas:
            boxes.append( (left, top, left+8, top+13) )

    for box in boxes:
        crops.append( src.crop(box) )

    return crops

def compara_crops_con_patron(crops, patron):
    scores = []
    for crop in crops:
        scores.append( compara(crop, pixelpatrones[patron] ))
    return max(scores)

def decodifica_celda(src, celda):
    pesos = []
    crops = crear_crops(src, celda)

    for patron in range(10):
        pesos.append( compara_crops_con_patron(crops, patron) )

    return pesos.index( max(pesos) )

def decodifica(filename):
    original = Image.open(filename)
    src = elimina_lineas(original)
    res = []

    for celda in range(6):
        res.append( decodifica_celda(src, celda) )

    return ''.join( str(x) for x in res )

if __name__ == '__main__':
    print decodifica(sys.argv[1])

How to make an awesome Android dashboard for your embedded widget

– Use a standard protocol. We chose Firmata.

– Play with nice things. We used Ionic, Apache Cordoba and the wonderful BluetoothSerial plugin.

– Remember to either modify the Firmata firmware to use the default 9600 bps speed of our HC06 adapter or change it to work at 57600.

Browserify a Node implementation of the protocol and make a port-like object so it talks over Bluetooth.

– The bluetooth plugin doesn’t like to work with binary data so we improve it.

After writing some glue code you end up with a nice and working control panel. I only got the working part but there are a ton of cool reusable widgets out there (like NexusUI or KievII)

New life for an old tv tuner.

A long long time ago I bought a cheap pci tuner card to listen to the radio. I was able to watch tv with it out of the box but the radio didn’t work. So I patched the driver and it made it into mainline.

Fast forward today, I don’t have a machine with a pci slot anymore but still wanted to listen to airwaves again as my other stereo broke. So I took out the tuner (a tnf 9835) and cooked a simple pygtk app to control it. I looks just like what you can expect from being born from such a quick hack. After trying for a while to make the hardware i2c interface work I settled for a pure software implementation. (also, pull-ups are not optional) I got all the bit masking ops right the first time (but on the other hand that was just copy pasted from drivers/media/tuners with some small edits). It is basically a nice interface around the SN761677.

Compared with my phone the sensitivity is rather poor as it needs quite a bit of an aerial but the sound quality feels better. Alas, the latter is totally subjective.

 

Things are looking different today…

Clearly I’m a glutton for punishment.

Today I decided that I can’t have enough and it was time to upgrade my distro so I can play with newer things (and also because google was nagging me to use an updated browser).

I did a dist-upgrade and that not only completed without a hitch but I also had a bit more of free space afterwards. Previously I hammered the thing and then just gave up all hope.

I’m starting to like this new future were things work like they should.

Now that I jinxed it I went full steam with a do-release-upgrade. It is downloading 3277 files at the blazing speed of 20 kB/s.

Let’s see if I still have a working machine by monday.