When abstractions attack!

Had an interesting little bug report today. Apparently someone has created an email address with an ampersand in the local-part. While this is perfectly valid, it means we enter a world of pain when someone decides they want that email address published on a web page.

Anyone who's ever done web or XML stuff will know that & is a special character used to introduce entity references, which stand in for other characters. For example, if you want to display a greater-than character (>) in HTML, you'd normally type &gt; rather than typing > directly, as greater-than is one of the HTML (and XML) delimiters.

The bug report was that when the user tried to use our drag-and-drool WYSIKCTWYG (What You See Is Kinda Close To What You Get) editor, adding the mailto: link resulted in &amp; being inserted, and there didn't seem to be any way around it, even by editing the source.
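For what it's worth, a mailto: link for an address like this needs two separate encodings, one on top of the other. Here's a minimal sketch, assuming Python's standard library and my reading of RFC 6068 (the mailto: URI spec): the "&" in the address gets percent-encoded as %26, because "&" separates header fields in a mailto: URI, and only after that is the result HTML-escaped for the page. The function name and the fred&wilma address are made up for illustration; this isn't what our editor actually does.

```python
import html
from urllib.parse import quote


def mailto_link(address: str) -> str:
    """Build an HTML mailto: anchor for an address that may contain '&'.

    Hypothetical helper for illustration only.
    """
    # Layer 1: URI encoding. '&' becomes %26; '@' is left as-is.
    href = "mailto:" + quote(address, safe="@")
    # Layer 2: HTML escaping, for the attribute value and the visible link text.
    return f'<a href="{html.escape(href)}">{html.escape(address)}</a>'
```

So mailto_link("fred&wilma@example.com") gives an href of mailto:fred%26wilma@example.com, with fred&amp;wilma@example.com as the link text. Mix those two layers up, or apply one of them twice, and you get exactly the kind of mess in the bug report.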

So I started thinking about the many levels of abstraction and translation in our application. It's quite staggering when you think about it, and any modern, complex application is likely to have similar layers.

Starting at the front-end, you've got the JavaScript editor control, which uses the browser's DOM to translate things into HTML in a GUI. Next the page is stored in XML according to schemas defined in the CMS. These, in turn, are stored in a database -- I've no idea if they're stored natively in XML or translated yet again.

When the content is published, the whole thing happens in reverse except that the HTML is generated by stitching together all the little snippets according to the templates. Finally we see for sure whether anything went wrong.
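To see how one layer can trip up the next, here's a small sketch using Python's standard library as a stand-in for the real layers. I'm assuming the XML store behaves like a normal XML serialiser; I haven't seen our CMS's internals. A well-behaved layer escapes on the way in and unescapes on the way out, so the round trip is lossless. A layer that escapes text which has already been escaped gives you &amp;amp;, and a single decode on the way back only gets you to &amp;.

```python
import html
import xml.etree.ElementTree as ET


def store_in_xml(text: str) -> str:
    """Serialise text into a <p> element, as a storage layer might (assumed behaviour)."""
    para = ET.Element("p")
    para.text = text  # ElementTree escapes '&' as &amp; when serialising
    return ET.tostring(para, encoding="unicode")


def load_from_xml(stored: str) -> str:
    """Parse the stored XML back into plain text (the reverse trip)."""
    return ET.fromstring(stored).text


def escape_twice(text: str) -> str:
    """What a buggy layer does: HTML-escape something that is already escaped."""
    return html.escape(html.escape(text))
```

With "fred&wilma@example.com", store_in_xml gives <p>fred&amp;wilma@example.com</p> and load_from_xml gives back the original address. escape_twice, on the other hand, gives fred&amp;amp;wilma@example.com.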

It's quite amazing how many layers of abstraction and translation there are. This is actually a pretty simple bug, and could probably be resolved in the JavaScript editor. Of course, given there's an easy work-around (change the email address to not have an ampersand) I'm unlikely to get around to having it fixed.

My justification for insisting they change the email address is that this system isn't likely to be the only one that has problems with the ampersand. I know for a fact that there are hundreds of broken email address validation systems out there that don't allow a whole stack of perfectly-valid characters in email addresses.
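For anyone writing one of those validators, here's a rough sketch of a check that at least lets through the characters RFC 5322 permits unquoted in a dot-atom local-part, "&" included. It's deliberately incomplete, and the assumptions are mine: it ignores quoted local-parts, comments and IP-literal domains, and it requires at least one dot in the domain.

```python
import re

# RFC 5322 "atext": characters allowed unquoted in a dot-atom local-part.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# dot-atom: runs of atext separated by single dots (no leading/trailing/double dots).
LOCAL_PART = ATEXT + r"+(?:\." + ATEXT + r"+)*"
# Simplified DNS label; at least two labels are required (an assumption, not RFC).
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
DOMAIN = LABEL + r"(?:\." + LABEL + r")+"
ADDRESS_RE = re.compile(LOCAL_PART + "@" + DOMAIN)


def looks_valid(address: str) -> bool:
    """Loose sanity check, not a full RFC 5322 parser."""
    return ADDRESS_RE.fullmatch(address) is not None
```

With this, looks_valid("fred&wilma@example.com") and looks_valid("o'brien+news@example.com.au") are both accepted, while "fred..wilma@example.com" is rejected.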

Virgin Blue confirmation doesn't print


Maybe I'm expecting too much, but if I were the web developer at Virgin Blue, I'd test that the itinerary lookup page printed alright. It's the kinda page your customers might think would be worth printing.

Instead you get this absolute dog's breakfast. WTF? Before you ask, this is what printing looks like in Internet Exploder 6 and 8, which probably accounts for 80% of your customers. I did try Firefox first, then reverted to IE to see if it fared any better.

I don't need fancy styling, but having the text I'm interested in reading all overlapping isn't what I had in mind.

What are Planets for?

Russell is discussing the purpose of Planets, the feed aggregators many of us enjoy. I don't think it's ever really been enumerated, and it probably should be. But I bet it's different things to different people.

I enjoy the Planets for the rich and varied insights you get into people's lives. We're not all one-dimensional geeks, though I imagine those reading Planet Linux Australia might not have worked that out since I've been censored there.

Russell discusses etiquette for the planet operators. I strongly agree. I've found myself added, then censored from some Planets without any interaction from me -- yes Planet Linux Australia's operating cabal, I'm talking to you. Without some sort of policy, it seems these sites are just run at the whim of the person with the password.

Running separate Planet installs, one filtered to be solely on-topic and one that shows everything, is a good idea. I think that would resolve many of the complaints about off-topic posts from those who aren't interested in everything. Another thing I'd like to see is the banning of Twitter. The inane wibblings of people who are clearly deranged aren't of any interest, and there's certainly nothing on-topic in there. The average Twitter post seems to get less coherence and thought than goes into a text message. Enough already!
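The two-Planet idea comes down to a simple filter. Here's a sketch of the logic only. It is not Planet's or Venus's actual filter API, and the Entry type, the tag set and the banned-host list are all hypothetical. Everything except banned hosts goes to the "everything" Planet, and only entries carrying an on-topic tag go to the filtered one.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Entry:
    """Hypothetical minimal feed entry; tags are assumed to be lowercased."""
    title: str
    link: str
    tags: set[str] = field(default_factory=set)


# Hypothetical configuration, not real Planet/Venus settings.
ON_TOPIC_TAGS = {"linux", "open source"}
BANNED_HOSTS = {"twitter.com"}


def is_banned(entry: Entry) -> bool:
    """True if the entry links to a banned host or one of its subdomains."""
    host = urlparse(entry.link).hostname or ""
    return host in BANNED_HOSTS or host.endswith(tuple("." + h for h in BANNED_HOSTS))


def split_planets(entries: list[Entry]) -> tuple[list[Entry], list[Entry]]:
    """Return (on_topic, everything), both with banned entries removed."""
    everything = [e for e in entries if not is_banned(e)]
    on_topic = [e for e in everything if e.tags & ON_TOPIC_TAGS]
    return on_topic, everything
```

Because the on-topic list is just a subset of the unfiltered one, both Planets could be generated in the same run.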

One thing I would recommend is not running the official Planet software. It breaks badly on perfectly-valid feeds, and seems to be abandonware. Try Venus instead, which has refactored much of the code.

Note to self: use prototype

Gah. I spent far too much time today trying to work around a weird JavaScript bug in Firefox. I had a form that needed to have different hidden parameters depending on the search type in a select list (ahh, the joys of integrating disparate systems). The rewriting worked just fine the first time, but when the user switched back it'd break in a strange way.

Just for relief I tried doing it the Prototype way. Not only was it more readable, logical and shorter, it magically didn't encounter the same bug.

I think it's time for Prototype to be available throughout the site I work on. I've used it for a few small, targeted applications, and it's wonderful. Time for it to be used everywhere.

First, however, I need to see if I can lazy-load it.

Music taxonomy

Back when I used to listen to music on pieces of etched plastic, I had a simple classification system. Music was slotted into one of "Chilled", "Techno", "Hip hop", "Rock/Pop", "Classical", "Jazz", "Other". There could easily be overlap there, but because I was doing the classifying, it made sense to me.

When I started listening to mp3s, I carried over this basic system, with Rock and Pop split into separate categories and a couple of new ones added: "Reggae", "Radio" (for podcasts), "Country" (for Johnny Cash). My musical tastes got broader too: where in the 1990s I listened almost exclusively to electronic music, I started getting back into Rock, and into some of the new musical forms like Post Rock (a ghastly term, worse than "Progressive <anything>").

The cracks started to show in my taxonomy. Godspeed You! Black Emperor ended up in "Chilled", yet Tortoise and M83 ended up in "Rock", by virtue of using guitars. Goldfrapp is in "Chilled", despite mostly being stomping electro rather than something you could go to sleep to. I dunno, it doesn't make much sense now.

My new toy has prompted me to revisit this issue. I've long avoided music library software, preferring to browse by my own folder structure, and I despise shuffle in most circumstances, either choosing individual tracks or listening to whole albums. The Squeezebox allows me to browse by my directory structure, but it also has a bunch of other useful and cool ways to slice and dice my music, so the time has come to get my music tagged properly.

It's going to be a long process, tagging all my music. I've generally gone for <artist> - <album>/<trackno>.<trackname>, but with so many Bowie albums, for example, that's getting unmanageable too. So I might end up with artist/album/ after all, though I will never do the brain-damaged thing iTunes does and move stuff from compilations into individual artist directories -- that's just dumb.
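One way to bootstrap the Great Retagging is to pull tags out of the existing folder names, since they already encode artist, album, track number and title. Here's a minimal sketch, assuming every album folder follows the <artist> - <album> pattern and that the first " - " separates artist from album, which won't hold for every artist. Actually writing the tags into the files would need a tagging library, and that's out of scope here. Note that a compilation folder stays together under its folder-level artist rather than being scattered.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple


class TrackTags(NamedTuple):
    artist: str
    album: str
    track_number: int
    title: str


# "<artist> - <album>" directory; the first " - " is assumed to split artist from album.
ALBUM_DIR_RE = re.compile(r"^(?P<artist>.+?) - (?P<album>.+)$")
# "<trackno>.<trackname>" file stem (extension already stripped).
TRACK_FILE_RE = re.compile(r"^(?P<trackno>\d+)\.\s*(?P<title>.+)$")


def tags_from_path(path: str) -> TrackTags:
    """Derive tags from an existing '<artist> - <album>/<trackno>.<trackname>' path."""
    p = PurePosixPath(path)
    album_match = ALBUM_DIR_RE.match(p.parent.name)
    track_match = TRACK_FILE_RE.match(p.stem)
    if album_match is None or track_match is None:
        raise ValueError(f"unrecognised layout: {path}")
    return TrackTags(
        artist=album_match["artist"],
        album=album_match["album"],
        track_number=int(track_match["trackno"]),
        title=track_match["title"],
    )


def nested_path(path: str) -> str:
    """Propose an 'artist/album/NN.title.ext' path for the same file."""
    tags = tags_from_path(path)
    name = f"{tags.track_number:02d}.{tags.title}{PurePosixPath(path).suffix}"
    return str(PurePosixPath(tags.artist, tags.album, name))
```

For example, nested_path("David Bowie - Low/03.What In The World.mp3") proposes David Bowie/Low/03.What In The World.mp3.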

So I'll shortly be starting the Great Retagging. Any suggestions on tools and approaches?

Postcodes in electoral divisions

This is just to let geeks know that this data exists. I'm sure it will be of real interest to someone, perhaps someone building a mySociety for Australia.

The federal Parliamentary Library has just published a paper called Postcodes in electoral divisions, which lists the postcodes in each federal electoral division and the percentage of each postcode that falls within that division. Very handy stuff, for a very narrow group of users.
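The obvious use is a "who's my member?" lookup from a postcode. Here's a sketch, assuming someone has transcribed the paper's tables into a CSV with postcode, division and percent columns. That format is my assumption, and the rows below are made-up placeholders, not figures from the paper. Where a postcode is split across divisions, the sketch picks the division holding the largest share, so it can still be wrong for an individual address.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only; NOT real data from the paper.
SAMPLE_CSV = """postcode,division,percent
9990,Division A,70.0
9990,Division B,30.0
9991,Division B,100.0
"""


def load_postcodes(f) -> dict[str, list[tuple[str, float]]]:
    """Map each postcode to its (division, percent) pairs, largest share first."""
    table: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for row in csv.DictReader(f):
        table[row["postcode"]].append((row["division"], float(row["percent"])))
    for divisions in table.values():
        divisions.sort(key=lambda d: d[1], reverse=True)
    return dict(table)


def likely_division(table: dict[str, list[tuple[str, float]]], postcode: str):
    """Division covering most of the postcode, or None if the postcode is unknown."""
    divisions = table.get(postcode)
    return divisions[0][0] if divisions else None
```

With the placeholder rows, postcode 9990 resolves to Division A. A real service would probably want to show every candidate division along with its percentage.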

The copyright statement at the end is probably boilerplate, because it certainly doesn't match up with the publication of such a useful dataset, paid for by the public.

PS: It's well worth signing up to library feeds as there's a wealth of useful stuff issued by this organisation. I've particularly appreciated the bills digests, which explain the purpose of bills before parliament in quite neutral language.

ABC peering

I seem to have been misunderstood here. My post was rather rambling, for which I apologise, though that's the nature of blogs really. The ABC does, indeed, peer with ISPs through PIPE, and no doubt peers whenever approached to do so.

The problem is that they've started delivering their multimedia content (i.e., the chunky stuff) through the Akamai content distribution network. This makes it difficult for the ISPs to work out what is ABC content and what is from somewhere else, as Akamai have a lot of customers.

Now every ISP of anything more than puny size has an Akamai box somewhere in their network, so most of the content would indeed be coming locally. But the point of Akamai's network design is that if the local box is overloaded, doesn't have the latest content, is down, or is unavailable for any other reason, it falls back to the next nearest suitable node. So this content is normally local, but might not be.

What I'm suggesting is that the ABC work with Akamai and ISPs to work out a solution to this. It could be that ABC content almost always comes from the local node. Could it even be always except when the local node is down, in which case the content comes from the ABC network via peering? Dunno. Bigger brains than mine, who know the area well, should work on it.
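To make the suggestion concrete, here's a sketch of the source-selection order I have in mind, next to the current fall-back behaviour. It's a thought experiment, not how Akamai actually picks nodes. The Node type, the usable() rule and the node names are all hypothetical. The proposal slots the ABC's own servers, reached over a peering link, in between the local cache and the general CDN fall-back.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical content source; the fields are illustrative, not Akamai's model."""
    name: str
    up: bool
    has_content: bool
    overloaded: bool = False

    def usable(self) -> bool:
        return self.up and self.has_content and not self.overloaded


def choose_source(local_cache: Node, abc_via_peering: Node,
                  cdn_fallback: list[Node]) -> Optional[str]:
    """Pick where a request is served from, cheapest (for the ISP) first."""
    # 1. The CDN box inside the ISP's own network.
    if local_cache.usable():
        return local_cache.name
    # 2. Proposed step: the ABC's own servers over a free peering link.
    if abc_via_peering.usable():
        return abc_via_peering.name
    # 3. Today's behaviour: the next nearest suitable CDN node, wherever it is.
    for node in cdn_fallback:
        if node.usable():
            return node.name
    return None
```

The point of step 2 is that, from the ISP's side, it is just as cheap as step 1. Only step 3 risks dragging the traffic across paid links.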

My simplistic thinking runs to some kind of multicasting over their peering links to ISPs that sign up to the program, with something on the ISP network caching the content. That way the ISP can push the content as far out into their network as demand requires. Though I suppose that's pushing the cost onto the ISPs, so there's probably a better approach, and smarter minds than mine should be brought to bear on the problem.

Hell, those of us with sweet, sweet megabit-in-the-teens ADSL2+ links could get really, really nice quality.

ABC Playback and quotas

"Sam" comments on Stilgherrian's blog in response to some of his ABC Playback feedback. In particular, he points out that the video is highly compressed because they're mindful of "download caps that the public has to contemplate". There is a solution for this.

Download caps in Australia are a response to the high cost of transit bandwidth. They're a principle consumers find easy to understand, but they bear only a passing relationship to the actual cost of bandwidth. At the wholesale level, bandwidth is more usually charged based on the size of a fixed pipe or a pipe's average utilisation. So a gigabyte of download from a consumer is only a cost if it happens at a time when the pipe is under heavy use. Hence the "off-peak" bandwidth allocations from some ISPs.
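A back-of-envelope sketch shows the fixed-pipe case. All the figures below are invented for illustration, and the model ignores average-utilisation billing entirely. Under fixed-pipe pricing, extra traffic only costs the ISP money if it pushes load past the pipe it's already paying for, and off-peak there's plenty of headroom.

```python
# Invented figures for illustration only.
PIPE_MBPS = 100  # size of the fixed transit pipe the ISP already pays for
# Average load for each hour of the day (0-23), in Mbit/s.
HOURLY_LOAD = [20] * 7 + [40] * 10 + [95] * 5 + [50] * 2


def gb_to_mbps_over(gigabytes: float, hours: float) -> float:
    """Average rate (Mbit/s) needed to move `gigabytes` (decimal GB) in `hours`."""
    return gigabytes * 8_000 / (hours * 3_600)


def costs_extra(hour: int, extra_mbps: float) -> bool:
    """Under fixed-pipe pricing, extra traffic only costs money if it pushes
    that hour's load past the pipe the ISP is already paying for."""
    return HOURLY_LOAD[hour] + extra_mbps > PIPE_MBPS
```

A single 1 GB download spread over an hour averages only about 2.2 Mbit/s. Thirty megabits of extra demand at 3 am fits comfortably in this made-up pipe, but the same demand at 7 pm doesn't.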

Now the majority of these costs are for international bandwidth, but the difficulty of determining whether a given chunk of traffic is international or local, explaining this to consumers, and the inevitable disputes led to bandwidth simply being charged as a single block. To enhance their offerings, some ISPs offer "free" bandwidth for specific resources, generally ones that are hosted on cheap bandwidth and don't consume any international bandwidth.

The actual bandwidth costs are on a descending scale that looks something like this: international, national (other network), national (own network), ISP's own data centre(s), peering links (but note Telstra refuses to do this), exchange to consumer. So one of the cheapest sources of bandwidth is free peering links, and the ABC sensibly does do such peering.

So the solution here is actually quite simple. The ABC needs to peer with ISPs in a free and open manner, ideally in at least all the major capital cities. The video downloads that will be supplied when ABC Playback is eventually launched would then be a "value add" for ISPs that want to supply them for free.

The BBC has done something exactly like this, though they don't have broadband caps in the UK so it was actually done for rights-management reasons. That way the Beeb can be sure that a consumer ISP service is actually supplied to a UK address, meaning the person has (theoretically) paid the TV Licence fee and so is entitled to the media.

Now there's a problem here that I haven't mentioned. Due to The Chaser vodcasts' popularity last year, the ABC started using Akamai's content distribution network to distribute their multimedia content. This caused my ISP, Internode, to stop including this content in their "unmetered" offering, because they can't distinguish between ABC content and other Akamai-hosted stuff.
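Here's why the CDN gets in the way, as a sketch. My assumption is that unmetered ("free") traffic is recognised by the source address of the traffic, which I believe is the usual approach but which I haven't confirmed for Internode. The prefixes below are reserved documentation ranges standing in for real ones. A peer's own servers sit in a prefix that can be whitelisted. A CDN node's address serves many customers, so the address alone can't say whose content it is.

```python
import ipaddress

# Documentation prefixes (RFC 5737) standing in for real ones; hypothetical.
UNMETERED = [ipaddress.ip_network("192.0.2.0/24")]     # e.g. a peer's own servers
CDN_SHARED = [ipaddress.ip_network("198.51.100.0/24")]  # CDN nodes serving many customers


def classify(source_ip: str) -> str:
    """Decide how traffic from source_ip is counted against a user's quota."""
    addr = ipaddress.ip_address(source_ip)
    if any(addr in net for net in UNMETERED):
        return "unmetered"
    if any(addr in net for net in CDN_SHARED):
        # Same address serves ABC content and everyone else's: can't whitelist it.
        return "metered (CDN, content owner unknown)"
    return "metered"
```

That's the crux of it: once ABC content comes from an address in the shared range, it falls into the same bucket as everything else Akamai hosts.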

It shouldn't be too difficult for the ABC, ISPs and Akamai to come up with a solution, and by offering it on the same basis to all ISPs, there's no conflict with the ABC Charter. It would also mean that the ABC can make available vastly better-quality streams, and be pretty sure that the downloader is Australia-based.

So how about it Auntie?

Recognition at last!


Finally, someone has noticed that I can never find hats or bike helmets to fit. The reason is obvious. I'm massively-brained, according to Stilgherrian.

Next step: forming the "massively-brained" support group, with an online shopping service for the larger-endowed cranium. I can just see the spams now: "Impress her with your massive... Sudoku score"?

ABC Playback shouldn't be hard to hack

Despite ABC Playback being touted as supporting only Windows and Mac, it should be pretty easy to hack up. Looking at it on Windows this morning with the trusty Live HTTP Headers Firefox extension switched on, I grabbed what seem to be the important URLs, which point to XML files.

The actual interface is embarrassingly tedious. They've tried for the latest and greatest in Flash crappiness, which probably means it'll be slow as a dog on a low-end computer. Fortunately, the data is passed through XML, so it'll only be a short time before someone writes a MythTV plugin to grab, parse and play.

There's a channel list, which covers the three channels: ABC CatchUp, ABC Real and ABC Shop. Loading one of the URLs given for a channel gets you a listing of the content on that channel, including where to get each file. It seems you need "rtmp://cp44823.edgefcs.net/ondemand" followed by the bit after.
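As a starting point for that plugin, here's a sketch of the parse-and-build-URL step. The element and attribute names (channel, item, title, path) and the sample listing are guesses for illustration only. I haven't reproduced the real XML here, so check them against what Live HTTP Headers shows. Only the rtmp:// base comes from the captured URLs.

```python
import xml.etree.ElementTree as ET

RTMP_BASE = "rtmp://cp44823.edgefcs.net/ondemand"

# HYPOTHETICAL listing: element/attribute names are guesses, not the real schema.
SAMPLE_LISTING = """<channel name="ABC CatchUp">
  <item title="Example programme" path="path/to/episode.flv"/>
</channel>"""


def stream_urls(listing_xml: str) -> list[tuple[str, str]]:
    """Return (title, rtmp URL) pairs for every item in a channel listing."""
    root = ET.fromstring(listing_xml)
    return [
        (item.get("title"), f"{RTMP_BASE}/{item.get('path').lstrip('/')}")
        for item in root.iter("item")
    ]
```

From there, it's a matter of handing the URL to something that speaks RTMP.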

So, despite being annoyed that they're blowing our money on crappy Flash stuff instead of supporting all operating systems, we should be able to bend it to our will.