Saturday, September 13, 2008

Why emacs can survive without automated tests

Automated tests are an accepted best practice in software engineering, and it's hard to find a recently-formed, well-run software organization these days that doesn't have some suite of small automated tests run against its codebase.

The argument for automated tests is relatively simple: without them, there is no easy way to verify that a release of a project works. Manual testing is slow and expensive, and all the work you put into it is lost when the next release comes out. Furthermore, with frequently run automated tests, you can find out very quickly what changes caused which bugs, and so revert or fix those changes.

Given all that, how does emacs survive, given that it has a huge codebase and no tests?

The best thing it has going for it is that it's a desktop app. That means that users have to upgrade from one version to the next, as opposed to web apps in which all users are upgraded when the server version changes. Since users have to upgrade, and are in control of which version they get, users' versions range from very old to a build of yesterday's CVS head. That means that a certain percentage of users are going to be on the bleeding edge. Those users are either contributors on the project, or those that can tolerate increased bugginess.

Another key is that emacs is a tool people use every day. In fact, it's fair to say that everyone who is contributing to emacs is probably using emacs as their development environment. So those contributing are using the CVS head, and are actively using the latest version.

Let's assume that 1% of the emacs-using population is running off of a recent version of CVS head. So, 1% is exposed to the latest code, and since these same people use it all the time, to do a diverse variety of things, they are testing it continuously, while getting their work done at the same time. Not only are they testing it, but since these are power users, they know how to fix most problems they encounter.

For example, imagine two developers, each working off of CVS head. One checks in a change that breaks dired. The second will get the latest changes, build it, run it, work on it, notice the error, know how to fix it, and can submit a fix immediately.

Eventually, some time after any particular change is made, it can be assumed to be safe, because enough emacs users have been running systems with this change in, and have not had a problem.

However, this implies that if emacs ever wants a stable version, it has to stop non-bugfix checkins, otherwise there will always be a change that hasn't been fully vetted yet. And this is what emacs does - for a new release there is a long phase in which no new functionality is put in.
It's worth noting that this phase is particularly long. Emacs release cycles for major versions can last years.

Let's imagine a world in which emacs did have automated tests of reasonable coverage. What would change?

First, the codebase would be more stable overall, so if you are an emacs contributor, you can expect a less buggy environment. This certainly would make contributing more efficient, since you can concentrate on functionality, not fixing bugs.

Secondly, the feature freeze stage could be significantly shortened. If the unit tests were extremely comprehensive, one could imagine that no feature freeze would be necessary.

All in all, then, in exchange for the considerable effort of developing unit tests for emacs, we can expect a codebase that can change much more rapidly and radically. Whether this is a tradeoff that makes sense I can only guess. Emacs is a mature product, not going through many radical changes anymore. But part of the reason radical changes are not happening is not that they aren't needed, but that they are frankly just too hard. It would take a very brave developer to take on making emacs multithreaded, or switching to Scheme as a basis for elisp. That could change with unit tests. The tasks would still be hard, but they would not be so destabilizing anymore. And automated tests wouldn't take much bravery to write, so I think automated tests may be a path to allow the more radical changes in emacs's future.

What these tests would look like is a subject for another post.

Wednesday, August 06, 2008

Code is the easy part: Preventing data corruptions

Coding, coding, coding. There's an awful lot of attention paid to coding in the software engineering world. I read reddit's programming category a lot these days, and most software engineering posts are specifically about code. I think this is not because coding is the most important part of being a programmer, but because it is the most interesting and fun part.

Code is easy to change. If you have a bug, you reproduce it and fix it, and then it goes away. Sometimes it's difficult, but even if it is, you fix it, make sure it doesn't happen again by writing test cases (preferably before you fix it), and then you forget about it.

Let's contrast the fixing of bugs in code with fixing corrupted data. Corrupted data in any reasonable system is probably going to be caused by a software bug. So, you first have to find out why the data is corrupted. Of course, maybe the corruption is old, and the bug has been fixed. Hard to say. You have to investigate. You could try to reproduce it, if you can figure out a likely scenario. Most likely you will fail to reproduce it, and even more likely you know you will fail to reproduce it, so you don't try. So instead you look at the code, seeing how it could happen. This is a useful practice. You can sometimes find the cause here. Then you can write a test for it and fix it. After you fix it, your work isn't done, of course. You have to actually fix the corruptions. This could be as simple as running a SQL query, or as complicated as writing a script to patch things up using a code library to do the work. Of course, sometimes you never can find out why your data is corrupted, so you just have to fix it, if you can figure out how.


Some amount of corrupted data is an inevitability, and in fact some may come from design decisions. For example, some database systems cannot do two-phase commits, and if you need to hook into one, you may have to accept a certain amount of data corruption due to not having atomicity in your transactions. If the error rate is very low, some corruption may be a fair price to pay for whatever benefits the second system is getting you. Even so, this is dangerous, and a low error rate today may be a severe error rate tomorrow, leaving you with a lot of angry customers with corrupted data, and a few dejected developers who have to clean it all up.

There are a few best practices you can follow to avoid data corruption:
  • Use one transactional system with ACID properties, and use it correctly.
  • When using SQL, use foreign key references whenever possible. 
  • Before saving data, assert its correctness as a precondition.  This includes both the values stored and the relationships between the data and any records that link to it or that it links to.
  • Create a data checker that will run a series of tests on your data.  This is basically like unit-tests for data, but you can run it not only after a series of tests, but in your production system too.  Run this program regularly, and pay attention to the output.  You want to be alerted of any changes in the sanity of your data.  Like unit tests, the more invariants you encode into this tool, the better you will be.  When changing or adding data,  modify the data checker code.  After each QA cycle, run the data checker.  Any errors should be entered as bugs.
  • If your data can be repaired, have an automated data repairer.  This shouldn't be run regularly, because you don't want to get too complacent about your corruptions.  Instead, if you notice that the data checker has picked up some new errors, then you run this, modifying it first if the errors are of a new type.

Doing a good job on all these tips should prevent most data corruption, but not all.  Like bugs, even the best preventative measures will not guarantee success.
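
As a rough illustration of the data checker idea, here is a minimal sketch in Java. The Customer and Order types and the particular invariants are made up for this example; a real checker would read from your actual store and encode your own rules. The point is only that each invariant becomes a small check that reports an error instead of silently passing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical record types, invented for this sketch.
final class Customer {
    final int id;
    final String email;

    Customer(int id, String email) {
        this.id = id;
        this.email = email;
    }
}

final class Order {
    final int id;
    final int customerId;
    final long totalCents;

    Order(int id, int customerId, long totalCents) {
        this.id = id;
        this.customerId = customerId;
        this.totalCents = totalCents;
    }
}

final class DataChecker {
    /** Returns one message per violated invariant; an empty list means no problems were found. */
    static List<String> check(List<Customer> customers, List<Order> orders) {
        List<String> errors = new ArrayList<>();
        Set<Integer> customerIds = new HashSet<>();

        for (Customer c : customers) {
            // Invariant: customer ids are unique.
            if (!customerIds.add(c.id)) {
                errors.add("duplicate customer id " + c.id);
            }
            // Invariant: every customer has a plausible email address.
            if (c.email == null || !c.email.contains("@")) {
                errors.add("customer " + c.id + " has an invalid email");
            }
        }

        for (Order o : orders) {
            // Invariant: every order points at an existing customer
            // (what a foreign key would enforce in SQL).
            if (!customerIds.contains(o.customerId)) {
                errors.add("order " + o.id + " references missing customer " + o.customerId);
            }
            // Invariant: totals are never negative.
            if (o.totalCents < 0) {
                errors.add("order " + o.id + " has a negative total");
            }
        }
        return errors;
    }
}
```

Run on a schedule against production data, a non-empty result is the signal to file a bug and, if needed, reach for the repairer.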

Having clean data is extremely important.  This data is not yours, it is your customers', and they trust you with it.  You need to protect it, and it isn't easy, but preventing data corruptions is always the right thing to do.

In org-mode, abandoning GTD

I've already mentioned the problem with contexts in the GTD system. In using org-mode, I've come to realize that another important concept of GTD is either flawed or redundant: the next action.

Next actions seem good at first. Every project has a next action, a short, doable task that will advance the state of the project. Of course, some projects have several actions you can do in parallel, so there are several next actions. As the number of projects you have grows, you get more and more next actions. At some point, the number of next actions becomes too much to keep track of. Some people might say this goes away with proper contexts. However, I've never found a good use for contexts, because in truth at any point I can do any of my next actions. Another strategy for getting rid of excessive projects is to move some to a someday/maybe folder. I think this is reasonable, but sometimes you know that in a particular week you are just going to work on a few different projects, perhaps because of deadlines or other prioritization. What then?

In org-mode, I solved this problem by using the agenda, and scheduling my next actions that I wanted to work on for the current day. I would then see a list of the next actions I had to accomplish that day. If I didn't get them done that day, the next day I'd move them up a day, to the current day. It ended up being a daily-planner-like system a lot like Tom Limoncelli recommends in Time Management for System Administrators. But with next actions.

Last week I had a revelation: scheduled items, for me, were the same as next actions! So I removed all next actions, just having states TODO, WAITING, and DONE. Like next actions, when I finish a scheduled item, the next item in the project becomes scheduled. I like my new system. It combines the flexibility of Limoncelli's day-planner system with the project-planning of GTD.

I spend my day as before, taking tasks from the day view of org-mode, and using a weekly review to schedule or de-schedule items from that view. I also add notes to tasks all the time, which proves to be helpful. I think I've benefitted from next actions, since they force you to think of actions at a level of granularity such that each task is specific, concrete, and immediately actionable. I treat tasks the exact same way now, even without next actions themselves.

For more info about org-mode, check out the talk on org-mode which happened after I invited Carsten Dominik to give a talk at Google about his system.

Friday, August 31, 2007

org-mode

In my previous post on the problems with GTD, I mentioned a bit about using emacs' org-mode to implement a GTD-like system. I've been using it for a while now, with good success.

I've previously used kGTD, as well as iGTD, but I like org-mode the best. With any other application, you are quite limited in what you can do. You can basically do what they want you to, maybe modulo a few scripts or preferences. With org-mode, the basic functionality is extremely flexible, and since it's all just a bunch of lisp code, you can rewrite and alter it to your heart's content.

org-mode, at its simplest, presents an outline. You can expand and collapse it. Like other programs, you can put all sorts of stuff in the outline, in a very natural manner. For example:

* Fix bug 1919
** Reproduce the bug
This only breaks for very old users
** Write test case for bug

The stars here represent the indentation level. It's all just text, folks. This is emacs, what else would you expect? But notice that I can just add random text anywhere. It will be collapsed along with the "Reproduce the bug" header. This is a very convenient place to put notes when you are working. As other programs do, you can have links. When I write "Fix bug 1919", the "1919" can link to my work's bug tracking system. More impressively, this being emacs, I can link to a specific location in a buffer. Opening it takes me right there, and off I go to do whatever work I need to.

I also use TODO tags, which are states associated with each work item. So, if this is an item that can become, in the GTD sense, a "next action", then it has a tag. There are four tags I use:
  1. TODO: An item which is still to be done, but is not yet actionable.
  2. NEXT: A "next action", which should be done.
  3. WAITING: An action that is waiting on some outside signal to either go back to NEXT or DONE.
  4. DONE: Finished. Hooray!

So, the above outline is more like:

* Fix bug 1919
** NEXT Reproduce the bug
This only breaks for very old users
** TODO Write test case for bug

Here is some customization code to enable this, as well as to enable useful agenda commands:

(setq org-todo-keywords '("TODO" "NEXT" "WAITING" "DONE"))
(setq org-agenda-custom-commands
      '(("w" todo "WAITING" nil)
        ("n" todo "NEXT" nil)))

I like to, with one keystroke, mark an entry done and change the next one to NEXT. I wrote the following lisp code to do it:

(defun ash-org-todo-item-p ()
  "Return non-nil if the current line is a headline in the TODO state."
  (save-excursion
    (beginning-of-line)
    (looking-at "\\*+[ \t]+TODO\\>")))

(defun ash-org-mark-done ()
  "Mark the current entry DONE and change the next sibling from TODO to NEXT."
  (interactive)
  (save-excursion
    ;; org-entry-is-done-p has a bug where if you are at the first
    ;; char of a line it doesn't always work. Let's work around it
    ;; here.
    (end-of-line)
    (when (not (org-entry-is-done-p))
      (org-todo 'done)
      (outline-forward-same-level 1)
      (when (ash-org-todo-item-p)
        (org-todo "NEXT")))))

(define-key global-map "\C-c\C-x\C-c" 'ash-org-mark-done)

The preceding code gives me a pretty nice, basic GTD system. On top of this I've been adding my own tweaks. One thing I like to do is to keep track of what I'm working on at the current time. This helps me keep focused. Org-mode has a way for keeping track of what you are doing at any one time, which is a clock which you can start or stop on any particular entry. Using this as a basis, I wrote the following nifty function for binding the F9 key to go back to the last current item. I know an item is current if I use the clock-in feature.

(defvar ash-org-current-task-loc nil
  "A cons of the buffer & location of the current task")

(defadvice org-clock-in (after ash-org-mark-task activate)
  "When the user clocks in, bind F9 to go back to the worked on task."
  (setq ash-org-current-task-loc (cons (current-buffer)
                                       (point)))
  (define-key global-map [f9] (lambda ()
                                (interactive)
                                (switch-to-buffer
                                 (car ash-org-current-task-loc))
                                (goto-char
                                 (cdr ash-org-current-task-loc)))))

This also helps me mark a task as complete: I just hit F9, then C-c C-x C-c, which closes it out (stopping the clock and printing elapsed time) and marks the following item NEXT. Pretty nifty!

This isn't enough, yet. I'm still tweaking the system, trying to perfect it to suit the particular problems I have. I may post more on this in the future.

Saturday, August 04, 2007

The problem with GTD

David Allen's productivity system, GTD (standing for Getting Things Done), continues to be more and more popular. If you are at all interested in being more productive, then likely you've heard of this already. I follow it myself, using emacs and org-mode.  Well, actually, when I say I follow it, I currently only follow part of it, because part of it is just not useful.


There are a few key concepts in GTD, but the most often applied are next actions, contexts, and weekly reviews. I think that next actions are an important concept. The general idea is that in your todo list you break down tasks to a level where each is simple and directly actionable, with no planning or meta-level thinking required. One of these is your next action, which is the very next thing you have to do to move your project forward. This is a cool thing, because it forces your tasks to be of the correct granularity. Also, most GTD software will show you all your next actions, so that you can see what you have to work on. Once you complete one next action, the next todo item will become a next action. For example, if your project is to fix a bug, then the next action here is to attempt to reproduce the problem locally.


Actually, this concept is slightly simplistic.  Sometimes your project can have several next actions.  If your project is to launch a new feature, then you might have several next actions: one to write the introduction and high-level overview of a design document, another to send a mail to your manager asking who could be allocated to work on this feature.  A good GTD system should let you mark items as next actions arbitrarily.  org-mode does this, which is one of the reasons I like it.


Another key concept of GTD is the weekly review, which means that you update your tasks and take time to plan new ones.  Good advice.  I have nothing to add to it.


Finally, GTD has the notion of contexts, which are the bread-and-butter of most GTD apps. Contexts represent some state in which your actions are performed. One obvious place for contexts is for direct communication with someone. You might have several projects that have an action that requires you to talk to your project manager. You can associate those tasks with a context, and now you have a context that you look at before you go speak to the project manager, to see all the appropriate tasks. Then when you go talk with them, you have a list of things to ask them.


Sounds vaguely useful. And that's the problem. For software engineers and other types of knowledge workers, we just don't have a lot of contexts. We are pretty well connected. Usually things that can be done at work in front of our workstation can also be done at home, through VPN. I find that most of my actions are to be done in front of my workstation, so the vast majority of my tasks have the context of the workstation. What I find useful is to not use contexts for these tasks, since they all have a default obvious context. I use contexts only for people (such as the context for my project manager in my example). And even then it's only somewhat useful. It's possible that I could start doing work on my subway ride, in which case that is a useful context. The problem would be that things done on the subway are a subset of things I can do at work. Here again the concept of rigid contexts doesn't work. So when I'm at my workstation, every task is fair game. If I'm somewhere specialized, a specific context may come in handy. But, ultimately, for most modern "knowledge-workers", contexts are not something we should be spending a lot of time thinking about.


With contexts, org-mode again comes in handy.  It's not a real GTD app, so it doesn't insist on contexts.  It just lets you tag items, and those tags are your contexts.  So you can put things in multiple contexts, or none.  The tags inherit.  I think it's a really perfect, flexible solution for the modern GTD user.  Next post I'll detail exactly how I use it and give some customizations.

Wednesday, July 04, 2007

Generics

In my last post, I discussed annotations, and their contributions to the Java language. The short version: they are pretty simple, and surprisingly useful. Generics, also introduced with Java 1.5, are a much larger pill to swallow. Not in terms of usefulness; they are undeniably helpful. They do something very important, which is to make run-time errors visible at compile time, while at the same time increasing the self-documentation of code. This is a great thing. However, there is no doubt that they add complexity. To even document all cases of complexity would take a whole blog dedicated to just that subject. I'll give one example:

void foo(List<B> list);
void bar(List<? extends B> list);
void tor(List<? super B> list);

So, what's the difference between these? Well, suppose B has a subtype C, and a supertype A. So the inheritance hierarchy goes A -> B -> C. You can call foo with a List<B>, but not a List<C>. But you can call bar with a List<C> as well as a List<B>. You can call tor with a List<A> or a List<B>, but not a List<C>. Keep in mind that the contents of the list are not what I am talking about, just the compile-time type of the list.
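
To make the three cases concrete, here is a small sketch that compiles as written. The empty classes A, B, and C stand in for the hierarchy above, and the method bodies are invented just to show what each wildcard allows inside the method; the calls that the compiler rejects are left as comments.

```java
import java.util.ArrayList;
import java.util.List;

class A {}
class B extends A {}
class C extends B {}

final class WildcardDemo {
    // Accepts exactly List<B>.
    static int foo(List<B> list) {
        return list.size();
    }

    // Accepts List<B> or List<C>. Elements can be read as B,
    // but nothing (other than null) can be added.
    static int bar(List<? extends B> list) {
        int count = 0;
        for (B b : list) {
            count++;
        }
        return count;
    }

    // Accepts List<B>, List<A>, or List<Object>. B values (and subtypes)
    // can be added, but elements can only be read back as Object.
    static void tor(List<? super B> list) {
        list.add(new B());
        list.add(new C());
    }

    static void demo() {
        List<A> as = new ArrayList<>();
        List<B> bs = new ArrayList<>();
        List<C> cs = new ArrayList<>();

        foo(bs);
        // foo(cs);  // does not compile: List<C> is not a List<B>

        bar(bs);
        bar(cs);
        // bar(as);  // does not compile: A is not a subtype of B

        tor(as);
        tor(bs);
        // tor(cs);  // does not compile: C is not a supertype of B
    }
}
```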

Generics are a bit more complicated than most people assume. Almost anyone who has spent time with Generics has gotten into several puzzling situations that they can't explain, and where the solution isn't obvious.

The question I wonder about is whether generics have done more harm than good. This is not a matter of opinion, I think. If we could count up all the hours of lost productivity in wrestling with generics, and count the hours saved by catching bugs and improving the clarity of the code, would the time saved exceed the time lost? My anecdotal evidence is that they are a net benefit. They have rescued the Java world from the perils of type-casting, and I do remember getting many ClassCastExceptions in the past. Nowadays they are rare, and I don't spend excessive time on the complexities.
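
For anyone who never lived through the pre-generics days, the ClassCastException problem looked roughly like the sketch below. The class and method names are invented for illustration. The raw-list version compiles and then fails at run time; the generic version moves the same mistake to compile time.

```java
import java.util.ArrayList;
import java.util.List;

final class CastDemo {
    // Pre-generics style: a raw list and a cast on every read.
    // Returns true if the bad element caused a ClassCastException.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawListFailsAtRuntime() {
        List names = new ArrayList();
        names.add("alice");
        names.add(Integer.valueOf(42)); // nothing stops this at compile time
        try {
            for (Object o : names) {
                String s = (String) o; // throws on the Integer
                s.length();
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // With generics the element type is part of the list's type,
    // so no casts are needed and the bad add is rejected by the compiler.
    static int genericListTotalLength() {
        List<String> names = new ArrayList<>();
        names.add("alice");
        // names.add(Integer.valueOf(42)); // does not compile
        int total = 0;
        for (String s : names) {
            total += s.length();
        }
        return total;
    }
}
```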

Tuesday, February 13, 2007

Annotations

This post is a few years too late. Annotations came out in Java 1.5, and we already are on Java 1.6. But I'd like to make a few comments on them, because I think they are a great example of a low-cost but high-use feature.

Making improvements to a popular language is always difficult. You want each addition in functionality to be well worth it. Each addition should make things possible that were not possible before, and do so in a way that both fits with the spirit of the language, and results in the least possible disruption.

Java annotations are a good example of a good change. I find I use them more and more, and they enable things I now realize I in fact need. Did I need them before annotations were around? It didn't seem so, but now they are here I would find it difficult to live without them.

I should illustrate how useful they are with a made-up example. Let's say you want a simple webserver, one that takes the path and uses it to call into a dispatcher object. So you would call http://localhost/doSomething?foo=aaa&bar=bbb. This would use Java reflection to call a method Dispatcher.doSomething(String foo, String bar). But at reflection time you don't know the names of arguments, since those are all compiled away. So, the first challenge is to figure out which argument is foo and which is bar. How can we do this? Before annotations, I would say you cannot do this. You could do something such as populate a String to String map, and pass that in instead of arguments, but this is awkward for the non-reflection callers, and also doesn't enforce important invariants at compile-time. Luckily for us, we do have annotations, and the annotations stick around at runtime (if you want them to). So, we can make an annotation defined as:

import java.lang.annotation.*;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
public @interface ParameterName {
    String value();
}

And we can use it like so:

void doSomething(@ParameterName("foo") String foo,
                 @ParameterName("bar") String bar) {
    ...
}


Now reflection can match HTTP parameters up to the method arguments.
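
The matching step itself might look something like the following sketch. ReflectiveInvoker, its invoke method, and the body of doSomething are made up for this example, and it assumes every parameter is a String carrying a @ParameterName; a real dispatcher would also need type conversion and overload handling.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface ParameterName {
    String value();
}

class Dispatcher {
    // The body is invented so the example has something to return.
    public String doSomething(@ParameterName("foo") String foo,
                              @ParameterName("bar") String bar) {
        return "foo=" + foo + ", bar=" + bar;
    }
}

// Hypothetical helper: finds a method by name and fills in its arguments
// from a map of HTTP parameters, using @ParameterName to line them up.
final class ReflectiveInvoker {
    static Object invoke(Object target, String methodName, Map<String, String> params) {
        for (Method m : target.getClass().getDeclaredMethods()) {
            if (!m.getName().equals(methodName)) {
                continue;
            }
            // One Annotation[] per parameter, in declaration order.
            Annotation[][] annotations = m.getParameterAnnotations();
            Object[] args = new Object[annotations.length];
            for (int i = 0; i < annotations.length; i++) {
                // A parameter missing from the request is passed as null.
                args[i] = params.get(nameOf(annotations[i]));
            }
            try {
                return m.invoke(target, args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("call to " + methodName + " failed", e);
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }

    private static String nameOf(Annotation[] parameterAnnotations) {
        for (Annotation a : parameterAnnotations) {
            if (a instanceof ParameterName) {
                return ((ParameterName) a).value();
            }
        }
        throw new IllegalArgumentException("parameter lacks @ParameterName");
    }
}
```

Because the lookup goes by annotation value rather than position, the order of the HTTP parameters in the URL doesn't matter.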

But the fun doesn't stop there. What if you wanted your delegator class to only have some methods accessible from the web? You could try to separate them into another class that has only web methods, but that is overly constraining on the design. Or, you could maintain a separate list of web-safe methods in a configuration file. That solution seems prone to mistakes - developers will forget to update the configuration file, then wonder why their new method isn't accessible. However, with annotations, you can simply tag methods with an annotation showing that they are web-safe. We can create a runtime method-level annotation called WebExecutable and use it like so:


@WebExecutable
void doSomething(@ParameterName("foo") String foo,
                 @ParameterName("bar") String bar) {
    ...
}


Now, your dispatcher can check for the annotation before it executes. Instant safety! You can imagine further annotations that may classify whether methods need authentication or not, or other such improvements.
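
A sketch of that check, under some simplifying assumptions: the methods take no arguments so the example stays short, and GuardedDispatcher, WebGate, and deleteEverything are names invented here. Only the WebExecutable annotation comes from the discussion above, and its exact definition is my guess at what a runtime method-level annotation would look like.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface WebExecutable {
}

// Hypothetical dispatcher with one web-safe method and one that is not.
class GuardedDispatcher {
    @WebExecutable
    public String doSomething() {
        return "done";
    }

    public String deleteEverything() {
        return "deleted";
    }
}

// Hypothetical gatekeeper: refuses to call anything not tagged @WebExecutable.
final class WebGate {
    static Object call(Object target, String methodName) {
        try {
            Method m = target.getClass().getMethod(methodName);
            if (!m.isAnnotationPresent(WebExecutable.class)) {
                throw new SecurityException(methodName + " is not web-executable");
            }
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("call to " + methodName + " failed", e);
        }
    }
}
```

A new method is unreachable from the web until someone deliberately tags it, which is the opposite failure mode from a forgotten configuration entry silently exposing or hiding something.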

I've really been digging annotations. They make things possible that never were possible before, and don't introduce much complexity or confusion. There is some new syntax to learn, and some non-obvious things (such as "@interface"), but all in all, the part you have to look at the most is very easy and intuitive. I think it's an ideal addition to the language.

Next, I intend to look at generics, which has quite a different profile.