When abstractions attack!

Had an interesting little bug report today. Apparently someone has created an email address with an ampersand in the local-part. While this is perfectly valid, it means we enter a world of pain when someone wants that email address published on a web page.

Anyone who's ever done web or XML stuff will know that & is a special character used to introduce references to other characters. For example, if you want to draw a greater-than character (>) in HTML, you have to type &gt; rather than typing > directly, as greater-than is one of the HTML (and XML) delimiters.
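To make that concrete, here's a minimal sketch in Python using the standard library's html module. The sample string is made up for illustration; it has nothing to do with our actual CMS.

```python
import html

# A made-up string containing two of the HTML/XML special characters.
raw = "a > b & c"

# html.escape replaces &, < and > (and quotes, by default) with entities.
escaped = html.escape(raw)

# html.unescape reverses the translation.
restored = html.unescape(escaped)

print(escaped)
print(restored)
```

The important bit is that escape and unescape come in pairs: every layer that does one needs a matching layer that does the other.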

The bug report was that when the user tried to use our drag-and-drool WYSIKCTWYG (What You See Is Kinda Close To What You Get) editor, adding the mailto: link resulted in &amp; being inserted, and there didn't seem to be any way around it, including editing the source.
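For what it's worth, here's a rough sketch of what I'd expect a well-behaved link builder to do with such an address. This is not our editor's code; the function name mailto_link and the address r&d@example.com are both invented for illustration. It assumes two layers of escaping: percent-encoding for the URL, then entity-escaping for the HTML.

```python
import html
from urllib.parse import quote


def mailto_link(address: str) -> str:
    """Build an HTML mailto: link for an address (hypothetical helper)."""
    # Layer 1: URL rules. Percent-encode reserved characters such as "&",
    # keeping the "@" separator readable.
    href = "mailto:" + quote(address, safe="@")
    # Layer 2: HTML rules. Escape the attribute value and the visible text.
    return '<a href="{}">{}</a>'.format(
        html.escape(href, quote=True),
        html.escape(address),
    )


link = mailto_link("r&d@example.com")
print(link)
```

Under these assumptions the ampersand becomes %26 inside the href and &amp; in the visible text, which a browser renders back as a plain &.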

So I started thinking about the many levels of abstraction and translation in our application. It's quite staggering when you think about it, and any modern, complex application is likely to have similar layers.

Starting at the front-end, you've got the JavaScript editor control, which uses the browser's DOM to translate things into HTML in a GUI. Next the page is stored in XML according to schemas defined in the CMS. These, in turn, are stored in a database -- I've no idea if they're stored natively in XML or translated yet again.

When the content is published, the whole thing happens in reverse except that the HTML is generated by stitching together all the little snippets according to the templates. Finally we see for sure whether anything went wrong.
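Here's a toy illustration of how a stack like that can go wrong. The store and load functions are stand-ins I've made up for "a layer that escapes on the way in" and "a layer that unescapes on the way out"; I don't know which of our real layers actually misbehaves.

```python
import html


def store(text: str) -> str:
    """Hypothetical layer that escapes text on the way in."""
    return html.escape(text)


def load(stored: str) -> str:
    """Hypothetical layer that unescapes text on the way out."""
    return html.unescape(stored)


original = "r&d@example.com"

# Balanced: one escape, one unescape. The address survives.
balanced = load(store(original))

# Unbalanced: two layers escape but only one unescapes,
# so a stray entity leaks through to the published page.
leaked = load(store(store(original)))

print(balanced)
print(leaked)
```

With more layers there are more places for an escape to go unmatched, and you only find out at the very end.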

It's quite amazing how many layers of abstraction and translation there are. This is actually a pretty simple bug, and could probably be resolved in the JavaScript editor. Of course, given there's an easy work-around (change the email address to not have an ampersand) I'm unlikely to get around to having it fixed.

My justification for insisting they change the email address is that this system isn't likely to be the only one that has problems with the ampersand. I know for a fact that there are hundreds of broken email address validation systems out there that don't allow a whole stack of perfectly-valid characters in email addresses.
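As a sketch of the difference, compare an overly strict pattern of the kind I mean (TOO_STRICT below is invented, not taken from any particular system) with one built from the characters RFC 5322 allows in an unquoted local-part. It only checks the local-part and ignores quoted strings, so treat it as an illustration rather than a complete validator.

```python
import re

# Characters RFC 5322 permits in an unquoted local-part ("atext").
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"

# Dot-atom form: runs of atext separated by single dots.
DOT_ATOM = re.compile(ATEXT + r"(?:\." + ATEXT + r")*")

# A made-up example of an overly strict pattern.
TOO_STRICT = re.compile(r"[A-Za-z0-9._-]+")


def local_part_ok(address: str, pattern: re.Pattern) -> bool:
    """Check only the part before the last "@" against the given pattern."""
    local, sep, domain = address.rpartition("@")
    return bool(sep and domain and pattern.fullmatch(local))


print(local_part_ok("r&d@example.com", DOT_ATOM))
print(local_part_ok("r&d@example.com", TOO_STRICT))
```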