Monday, September 24, 2007

Refactoring

I'm doing some refactoring in my spare time. Sounds horrible? Well, it is. But I have some old obligations that I'm trying to finish off so I can devote all my time and energy to my new job.

My main advice for refactoring is: take one small step at a time.

For me there are mainly two cases.

1) I know the code needs refactoring but don't know how or where to start. What I do then is locate one line of bad code or duplication and refactor just that. After every change I run the unit tests to make sure that the piece of code still works.

2) I know the code needs refactoring and I see room for improvement in every single line. What I do here is still to move in small steps: change one thing at a time and run the unit tests. Bad code and scarce unit tests always seem to go together, so chances are you'll end up writing a complete test suite for the module you're refactoring. But take that extra time! Unit tests are essential for refactoring; without them you'll almost never know what the effects of your changes are.
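Here is a minimal sketch of one such small step, in Python. The function names and the VAT rule are made up for illustration. The test pins down the current behaviour first, then one duplicated expression is extracted into a helper, and the test is rerun after the change.

```python
import unittest


def _with_vat(amount, vat_rate=0.25):
    """Helper extracted in one small step; it replaces a duplicated expression."""
    return amount * (1 + vat_rate)


def invoice_total(prices, shipping):
    # Before the refactoring, both lines spelled out `x * (1 + 0.25)` by hand.
    goods = _with_vat(sum(prices))
    freight = _with_vat(shipping)
    return goods + freight


class InvoiceTotalTest(unittest.TestCase):
    def test_total_includes_vat_on_goods_and_shipping(self):
        # Written before touching the code and rerun after every change.
        self.assertAlmostEqual(invoice_total([100, 60], 40), 250.0)
```

If the test goes red after a step, you know exactly which change caused it, because you only made one.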

I know that when you have many ideas for code improvements it can be a pain to stop and write tests. I mean, I have an idea and then I wanna try it right away, right? This really is a problem: you have to catch all these ideas or you might forget them, but you can't act on all of them at once. I most often solve this by writing a comment saying FIXME followed by what I want to do. This has proven to be quite useful, because when you do major refactoring you'll often end up not needing some of the old code, and the unneeded code is often the very code you had a nice refactoring in mind for. I think the reason is that when you're refactoring you see a problem that needs to be solved. You then try out every solution you can find, but sooner or later you settle on one, and that makes parts of the others obsolete.

That was about all I had to say about refactoring (besides the usual advice: always strive for DRYness, encapsulation, orthogonality, etc.). I'll definitely get back to this subject when I finally get to read Martin Fowler's book Refactoring.

Mocking & BDD

First of all I wanna say: if you find a bug, try to catch it with a test before you fix it. When you're done, make sure the test passes. This can be hard, but I think it is a great way to get better at writing tests; you'll see what you let slip the last time and then (hopefully) take it into account the next time you implement and write tests. No, of course it should be: write tests and then implement.
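As a small sketch of that workflow in Python (the function and the bug are hypothetical, picked only to show the order of things): the first test was written while the bug still existed and failed, then the guard was added, and the second test checks that the old behaviour survived the fix.

```python
def average(values):
    """Return the mean of `values`, or 0.0 for an empty list."""
    # The guard below is the fix. Without it, average([]) raised
    # ZeroDivisionError, which is the (hypothetical) bug that was reported.
    if not values:
        return 0.0
    return sum(values) / len(values)


def test_average_of_empty_list_is_zero():
    # Written first, while the bug still existed, and watched to fail.
    assert average([]) == 0.0


def test_average_of_typical_list():
    # The existing behaviour must survive the fix.
    assert average([2, 4, 6]) == 4.0
```

The regression test stays in the suite afterwards, so the same bug can't sneak back in unnoticed.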

I recently came across stubbing and mocking and I must say that it's great. I used to think that there are situations you just can't test, even if you want to. At one particular time I was writing a file reader that needed testing, but since it used a low-level C API it seemed impossible to test. Or rather, to fit into our testing system. I could of course put loads of test data in our repository, but that would really increase our test times. And, even more important, chances are that I would then end up just testing the C API. Well, you never know, you might discover some bugs, but that's not really what you wanna do, is it?

For me this was a very common problem. I tried to have the tests generate the right sort of data. The problem is that such generators are complex at best, and even if you manage to get them right they make the tests hard as hell to read. Once I actually had the program I was developing generate data, which I then manually opened in Excel to manipulate. Then I saved the different data sets with smart names like 'data_with_none_existing_article_entry.dat' and so on...

As you can imagine, this is very error-prone, and you'll eventually end up testing your functions against data that you don't know is right in the first place. So this is not the way to do it. How do you do it then? The answer is that you mock.

Mocking means that you create an object and then tell it to return whatever you want from it. Here we also have to introduce the term 'stubbing'. When you mock, you expect the mocked method to be called; if it's not called exactly the number of times you specify, an error is raised. When you stub an object, you just supply a couple of methods, and if they happen to be called they return the specified values.

Sounds easy? Well, it is. Say you're developing a class that keeps track of all the grades in a school class. You're storing all the data in a database and then just do some calculations before you show it to the user. There is no need to test the database and its accessors; that belongs to another test case, possibly even to another programmer in another company. You mock the database. This can be done in a number of ways, but there are good mocking frameworks for most languages. So let me show.
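To make the difference concrete, here is a rough sketch using Python's standard unittest.mock module. The names fetch_grades and highest_grade are made up for the example. Note one difference from the description above: in unittest.mock the same Mock class serves both roles, and the call expectation is checked after the fact instead of being declared up front.

```python
from unittest.mock import Mock

# Stub: canned answers only. Nothing checks whether the method was called.
stub_db = Mock()
stub_db.fetch_grades.return_value = [2, 3, 4]

# Mock: the same canned answer, plus an expectation about the calls.
mock_db = Mock()
mock_db.fetch_grades.return_value = [2, 3, 4]


def highest_grade(db):
    """Hypothetical code under test; it only talks to the object it is given."""
    return max(db.fetch_grades())


assert highest_grade(stub_db) == 4  # used as a stub: no call check afterwards
assert highest_grade(mock_db) == 4

# Used as a mock: raises AssertionError unless fetch_grades() was called
# exactly once, with no arguments.
mock_db.fetch_grades.assert_called_once_with()
```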

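def test_calc_median_grade
  grades = [2, 2, 3, 3, 4, 4, 5]
  # Partial mock: a real Pupil whose grades accessor (the database call) is
  # replaced. Assumes Pupil.new takes no arguments.
  pupil = Pupil.new
  pupil.should_receive(:grades).and_return(grades)
  assert pupil.calc_median_grade == 3
end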
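For comparison, the same idea could look roughly like this in Python with the standard unittest.mock module. The GradeBook class and the grades_for method are hypothetical names invented for the sketch. The point is only that the database is replaced by a mock, so the test exercises the calculation and nothing else.

```python
import statistics
from unittest.mock import Mock


class GradeBook:
    """Hypothetical class: does calculations on grades fetched from a database."""

    def __init__(self, database):
        self.database = database

    def median_grade(self, pupil_id):
        return statistics.median(self.database.grades_for(pupil_id))


def test_median_grade():
    # The database is a mock; no real storage is involved.
    database = Mock()
    database.grades_for.return_value = [2, 2, 3, 3, 4, 4, 5]

    book = GradeBook(database)

    assert book.median_grade("an ordinary pupil") == 3
    # Fails unless the database was asked exactly once, for this pupil.
    database.grades_for.assert_called_once_with("an ordinary pupil")
```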

By the way, if I do it in a more behavior driven way it would look like this:

describe Pupil, "grades" do

  it "should have a correctly calculated median" do
    grades = ....
    pupil.calc_median_grade.should eql(3)
  end

end

It's hard to explain why the behavior driven way is so powerful; it just is. Try it for a while and you'll see. I too was very sceptical at first.

But now back to the file reader. How do you do that? The trick is to stub/mock the 'open' function. To do this you just go on like I showed before, but at this particular time I was developing in Python and couldn't find a good mocking framework for it. In the end I decided to write my own stubbed object, and that's actually very easy (at least in dynamically typed languages such as Python). So this is what I did:
import unittest

class Reader:
    """Stub that stands in for the real file reader."""
    def open(self, foo=None):
        return self
    def readlines(self):
        return ["foo", "bar"]
    def close(self):
        pass

class SomeTestCase(unittest.TestCase):
    def test_module_depending_on_filereader(self):
        # 'module' is the module under test; swap its Reader for the stub.
        module.Reader = Reader()
        # call whatever function you wanna call
        # assert what you want to assert
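The Python standard library has since gained unittest.mock (Python 3.3 and later), which includes a mock_open helper for exactly this situation. The following is only a sketch of how the hand-written stub above could be replaced with it. The function read_entries and the file name are hypothetical stand-ins for the real code under test.

```python
from unittest.mock import mock_open, patch


def read_entries(path):
    """Hypothetical code under test: reads a file and strips line endings."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle.readlines()]


def test_read_entries_without_touching_disk():
    fake_open = mock_open(read_data="foo\nbar\n")
    # Replace the built-in open() only for the duration of the with-block.
    with patch("builtins.open", fake_open):
        entries = read_entries("does_not_exist.dat")

    assert entries == ["foo", "bar"]
    # The file name was passed through, but no real file was opened.
    fake_open.assert_called_once_with("does_not_exist.dat")
```

Because patch restores the real open() when the with-block ends, other tests are not affected, which is something the module-level assignment in the hand-written version does not give you.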

Wednesday, September 19, 2007

A bit about security

I must admit that I don't know that much about securing websites. After asking around, it seems that bad user input such as XSS (cross-site scripting) is about 90% of the problem, so if you secure your website against malicious users trying to script you, you've come quite a long way.

I'm kind of new to XSS, but now it is up to me to prevent it from happening on our site. So let's start with what XSS is. The thing is, if you intend to accept user input on your site and then present that input, you are a target for scripting. (Which means that almost every site out there is.) Why? Say that you have an ordinary description field for a user. You would probably use a text area or something. Say that you just take the info from the user and then show it to him/her (I actually have a feeling that it's mostly a him); then it's totally open for him to put every HTML tag in the book in it, including the script tag, and the site will render it along with all the other HTML the next time it is shown.

If I, for example, enter <script>alert('This site sucks');</script>, an ugly popup will show the next time I reload the site. This could be used, and has been, for porn ads and other disturbing things you don't want on your site. But it could actually be put to more harmful uses, which I'm not going to show and don't know much about.

So what do you do about it? The answer is to escape it, which means that you translate all the tags into something that won't be interpreted as HTML and therefore will just be shown as text. The former example will then just show '<script>alert('This site sucks');</script>' as text on the page, which isn't that harmful.
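To show what escaping actually does to the input, here is a small sketch using html.escape from the Python standard library. This is not the Rails helper, just the same idea in another language.

```python
import html

user_input = "<script>alert('This site sucks');</script>"

# Characters with a special meaning in HTML are replaced by entities,
# so the browser displays the text instead of running it.
escaped = html.escape(user_input)

print(escaped)
# → &lt;script&gt;alert(&#x27;This site sucks&#x27;);&lt;/script&gt;
```

The browser turns the entities back into the original characters when it displays them, so the visitor sees the script as plain text.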

You also have to be aware of SQL injection, which is when a user, via input, extends your queries to your database. This way they can get loads of info you don't want to supply to a random user, such as passwords and credit card numbers.

These problems are, as I said, very common, and therefore most web frameworks have built-in HTML escaping and SQL injection prevention. So just use them. The motto is: 'Never trust user input'. Even if users shouldn't be able to insert things into your queries or fields via your site, they could do many bad things with scripts or command-line utilities such as curl.
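A rough illustration of the SQL side, using Python's built-in sqlite3 module with an in-memory database. The table and the malicious input are invented for the example. The first query glues the input into the SQL text; the second passes it as a parameter, which is the kind of protection a framework typically gives you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "secret1"), ("bob", "secret2")],
)

malicious = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and rewrites the query,
# so the WHERE clause is always true and every user comes back.
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the driver passes the input as a value, never as SQL, so it is
# compared literally against the name column and matches nobody.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```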

I highly recommend that you watch this railscast (and this and this) on the subject. And by the way, the function that escapes HTML for you in Rails is h(). Just go: h("<script>alert('This site sucks');</script>") and your problem is solved; just don't forget to put it everywhere malicious input/output can occur.

Tuesday, September 18, 2007

Get it real. Coding examples.

Enough theory, let's code. I guess you could prove all of your algorithms on paper. But most of the time you use an algorithm from a paper or from examples on the Internet, and all you wanna do is ensure that you've got it right.

Let's take a sorting method as an example. You wouldn't wanna invent a new algorithm for this; you probably wanna stick to quicksort or mergesort. These algorithms have already been proven, so... let's go:

First write a test like:

def test_typical_sort
  array1 = [1, 5, 7]
  array2 = [2, 4, 6]
  sorted_array = sort(array2, array1)
  assert sorted_array == [1, 2, 4, 5, 6, 7]
end

This will definitely break; I mean, we don't even have the method yet. So let's implement it.

def sort(array1, array2)
  # insert random sort algorithm here
end

Now your test will work just fine. So let's test some extremes.

def test_empty_test
  assert sort([], []) == []
end

And you could go on like that; you probably wanna test with some error-prone data, and so on. So when do you stop? Some say: 'test everything that could possibly break'. I don't know if I agree, but it's a good thought. Some very useful hints on what to test are shown in this railscast. (Railscasts is a site with free video tutorials for Rails, but I think much of the stuff there is applicable to web programming in general. Check it out.)
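To give an idea of what "going on like that" might look like, here is the same exercise sketched in Python with a few more cases. The implementation simply leans on Python's built-in sorted(), which is my stand-in for "insert random sort algorithm here". The extra test cases are my own picks, not a complete list.

```python
def sort(array1, array2):
    """Combine two lists and return one sorted list (stand-in implementation)."""
    return sorted(array1 + array2)


def test_typical_sort():
    assert sort([2, 4, 6], [1, 5, 7]) == [1, 2, 4, 5, 6, 7]


def test_empty_sort():
    assert sort([], []) == []


def test_one_side_empty():
    assert sort([3, 1], []) == [1, 3]


def test_duplicates_and_negatives():
    assert sort([2, -1, 2], [0, -1]) == [-1, -1, 0, 2, 2]


def test_inputs_are_not_modified():
    first, second = [3, 1], [2]
    sort(first, second)
    # The caller's lists should be left exactly as they were.
    assert first == [3, 1] and second == [2]
```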

TDD, Test-Driven Development

I've been reading about and trying out TDD. It's of course really great, especially if you are as error-prone as I am :). The idea is that you write the tests first and then the code. You first specify a function you want, write a test for it, make sure that the test fails for the right reasons, and then implement the function just well enough to satisfy the test. It's really easy, and powerful.

First I'd like to say something about testing in general. Some people seem to think it's awkward and go: 'First I write the code and then I write the same code again to verify it's right? Then what's the point?' I myself thought something like this and, well, there is no point. Or there wouldn't be, if that was the way you wrote tests. The thing is that you (as you may recall from that algorithms course) can't algorithmically prove, in the general case, that a program works. What you do instead is pick certain cases, usually two extremes (empty and infinity) and one perfectly normal test case, implement them as tests and see that they pass. I sort of did that before anyway, but now I do it in a more organized way. A more hands-on tutorial on this can be found in the nice Python book Dive Into Python.

I tried out TDD and found to my surprise that it actually increased the speed of my development. One of the reasons is that I didn't even have to fire up my browser to test things. Typos and other slips are discovered right there on the spot, so you save time there too. And you can always lean on your tests when your brain seems to have gone on vacation.

Unit tests are great but sort of old and don't really fit with OO. I mean, they work fine and you can really do it well, but software development, as far as I know, is all about how powerful your abstractions are. Therefore I recommend the behavior driven pattern. I actually had nothing to do with the decision to start using it, but now that I've got the hang of it, I love it. A really great article about this can be found here. (Most of it goes for unit testing as well...) If I were to give a short intro to behavior driven development, I would say that it's all about describing objects and what you should be able to do with them.