Look very closely at the newsfeed at the bottom of the screens installed in the back of some New York City taxicabs. If the newsfeed includes an ampersand, you will see a mysterious amp; in its place (so, for example, “Crate & Barrel reports quarterly loss” turns into “Crate amp; Barrel reports quarterly loss”). The first few times this didn’t even reach my threshold of detail: I’m so used to seeing bugs that result from common programming mistakes that this error made a lot of sense to me (which isn’t at all an excuse; in fact, I’m surprised that this defect was allowed to exist for so long!). But then I stepped back and realized that this bug I’m taking for granted is a fascinating case of how prevalent various technologies are, and how likely one technology is to build on another. If I were to explain the cause of this error to my mom, I thought to myself, I would find it very difficult (I feared that there is so much context I just assume is broadly known that the exercise would turn into telling the history of computing).
I like to do difficult things so I decided to try (explaining this to my mom):
1. This is a bug — which is just a phrase for an error that was caused by a mistake in the programming of the device
2. It happens whenever the newsfeed needs to display an ampersand — instead, that ampersand gets replaced with the word amp followed immediately by a semicolon
3. The device needs to somehow know what it is supposed to display. I don’t know for sure the way it is done but programmers like to reuse parts of code, and standards, and conventions that have been widely accepted, so I have an idea for how this works
4. The device is connected to some kind of component whose task is to receive information from the outside (after all, the newsfeed is regularly updated with fresh news) and pass it on to the device so it can ultimately be displayed on screen
5. It doesn’t matter what that component is: it may be a radio receiver which periodically receives all the information from a radio transmitter somewhere in the City (just like the RDS on the radio, which lets you see what band is currently playing), or a 3G receiver which receives the information from a cell tower (just like your cell phone is able to receive email), or maybe the cab driver plugs the device into some kind of box when he parks the cab at the end of his shift, and the data is transferred then
6. In any case, the data is transmitted to the device. There are many ways to transmit the data. Remember that the transmission is digital, which means that there has to be some code for each character of the newsfeed. This is similar to Morse code. A widely used convention is a code called ASCII, which assigns every character a number that is typically stored as a combination of eight bits (each bit is either a zero or a one). I’m guessing the data is transmitted this way because I haven’t seen anything more complicated in the newsfeed than regular text; if I saw little icons, or Greek characters, I would have to go for a more complicated code (ASCII proper defines only 128 characters, and even using all eight bits you can encode at most 256)
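To make the eight-bit idea concrete, here is a minimal Python sketch. The helper name to_bits is made up for illustration, and nothing here is known about what the taxi device really does; it only shows what the bit patterns would be if the feed were plain ASCII.

```python
def to_bits(text: str) -> list[str]:
    """Return one eight-bit string per character of an ASCII text."""
    # encode("ascii") raises UnicodeEncodeError for anything outside ASCII,
    # for example Greek letters or icons.
    return [format(byte, "08b") for byte in text.encode("ascii")]


bits = to_bits("Crate & Barrel")
print(bits[0])  # the letter C -> 01000011
print(bits[6])  # the ampersand -> 00100110
```

Every character, including the ampersand, is just another number at this level. Nothing about ASCII makes the ampersand special; that only happens one layer up.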
7. But the newsfeed is not the only thing that gets transmitted. Weather information gets transmitted (and displayed as a number — the temperature — and an icon — rain, snow, etc.); those annoying commercials must also be transmitted the same way. The latter is a lot more information than the newsfeed, but the fundamental way of transmission is the same
8. Because there is more than one piece of information to transmit, there has to be some way to organize this information. ASCII has no built-in way to do this because all it does is encode individual characters. For example, if the device received just a “stream of consciousness” like this:
Crate & Barrel reports quarterly loss. President Obama arrived in Denmark. Monday 67 degrees sunny. Tuesday 70 degrees heavy rain.
it would be difficult for the device to decipher what belongs to the newsfeed, and what belongs to the weather forecast. Within each group, there are structures as well (the newsfeed has individual ticker items; each day of forecast contains the day of the week, the temperature, and the weather conditions). So programmers use another convention called XML which allows them to organize information in a way that computer programs can read. XML allows you to surround text with special words enclosed in brackets, which programs treat as structure rather than as text to display. There are a few rules that stuck (for example, to use angle brackets, and to put a slash in the word that ends a block). So, for example, the above transmission would look like this:
<news>
<item>Crate & Barrel reports quarterly loss.</item>
<item>President Obama arrived in Denmark.</item>
</news>
<weather>
<item>
<day>Monday</day>
<temperature>67</temperature>
<condition>sunny</condition>
</item>
<item>
<day>Tuesday</day>
<temperature>70</temperature>
<condition>heavy rain</condition>
</item>
</weather>
You will see that all information is there, it’s just highly structured. A computer program can then ask for things like, “give me every item in the weather block, and for every item, piece together what’s in the day block, what’s in the temperature block (adding the word “degrees” to the end), and what’s in the condition block.”
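That “give me every item in the weather block” request can be sketched in a few lines of Python using the standard library’s xml.etree.ElementTree. The function name forecast_lines is hypothetical, and the device may well use a different language or parser; this only illustrates how structured data makes the question easy to ask.

```python
import xml.etree.ElementTree as ET

WEATHER = """
<weather>
  <item><day>Monday</day><temperature>67</temperature><condition>sunny</condition></item>
  <item><day>Tuesday</day><temperature>70</temperature><condition>heavy rain</condition></item>
</weather>
""".strip()


def forecast_lines(xml_text: str) -> list[str]:
    """Piece together day, temperature and condition for every weather item."""
    root = ET.fromstring(xml_text)
    lines = []
    for item in root.findall("item"):
        day = item.findtext("day")
        temperature = item.findtext("temperature")
        condition = item.findtext("condition")
        # The word "degrees" is added by the program, not stored in the feed.
        lines.append(f"{day} {temperature} degrees {condition}")
    return lines


print(forecast_lines(WEATHER))
# ['Monday 67 degrees sunny', 'Tuesday 70 degrees heavy rain']
```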
9. XML has some limitations. For example, an opening angle bracket cannot be used anywhere in the text because the program will think that it’s the beginning of a special block and probably not allow this transmission:
<item>To write that 2+2<5 will confuse the hell out of this program</item>
By “not allow” I mean that the part of the program doing the transmitting may be expecting correct XML (that is, every opening angle bracket has a corresponding closing angle bracket, and so on). Perhaps the part of the program doing the receiving is expecting correct XML. Very likely both are (because programmers reuse code: somebody else wrote the code for interpreting XML, and they probably wrote it in a way that prevents common mistakes from happening).
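Here is what “not allow” looks like with one such reusable piece of code, Python’s standard XML parser. The helper is_well_formed is a made-up name, and this is only a sketch of how a strict parser reacts, not a claim about the taxi software.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the standard parser accepts the text as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser refuses the whole document rather than guessing.
        return False


print(is_well_formed("<item>2+2 is less than 5</item>"))  # True
print(is_well_formed("<item>To write that 2+2<5 will confuse this program</item>"))  # False
print(is_well_formed("<item>Crate & Barrel reports quarterly loss.</item>"))  # False
```

The third line shows that a bare ampersand is rejected just like a bare angle bracket.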
10. To get around this problem, if you want to display an opening angle bracket, you have to use a special code instead. This code (again, by convention) is &lt; (which stands for “less than”): the ampersand denotes the beginning of a special code, and the semicolon ends it (you need both; otherwise the word “altitude” would be rendered as “a<itude”). Similarly, &gt; is the closing angle bracket (“greater than”). So the above really needs to be written as
<item>To write that 2+2&lt;5 will confuse the hell out of this program</item>
11. We’re almost there. This is a cool way to solve one problem, but unfortunately it introduces another one: you can’t display an ampersand! (because the program will think that you mean a special character). The way programmers solved this problem is to create a special code for the ampersand itself: &amp;. So the news item that gets transferred to the device probably looks like this:
<news>
<item>Crate &amp; Barrel reports quarterly loss.</item>
</news>
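The round trip of encoding on the way out and decoding on the way in can be sketched with Python’s standard library: xml.sax.saxutils.escape writes the special codes, and the parser turns them back into characters. This is an illustration of the convention only; the variable names are mine.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

headline = "Crate & Barrel reports quarterly loss."

# Sender: replace &, < and > with their special codes before wrapping in tags.
wire = f"<news><item>{escape(headline)}</item></news>"
print(wire)
# <news><item>Crate &amp; Barrel reports quarterly loss.</item></news>

# Receiver: a proper XML parser decodes the special codes automatically.
decoded = ET.fromstring(wire).findtext("item")
print(decoded)
# Crate & Barrel reports quarterly loss.
```

When both ends use code like this, the ampersand survives the trip and nobody ever sees the special code.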
12. I’m pretty confident that so far I’ve been fairly right: &amp; is seen pretty much whenever XML is involved. One can come up with many theories at this point. Here is one.
13. I already mentioned that programmers would usually include a frequently used piece of code that does the proper decoding (i.e. turns &amp; into an ampersand). It’s possible that in this case they didn’t use that common code and instead wrote their own (thinking it’s easy to write something simple like this). That code may simply be ignoring all special codes and just passing on whatever it encounters. Then, down the road, another piece of code would pick up whatever it received (at this point it’s no longer aware that the text came from XML) and, as a safety measure, simply strip any characters that it didn’t expect. This includes ampersands.
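This theory is easy to reproduce. The Python sketch below is pure guesswork about the device: both function names and the list of allowed characters are invented. It pairs a hand-rolled extractor that never decodes special codes with a later step that drops any character it doesn’t expect.

```python
import re


def naive_item_text(xml_text: str) -> list[str]:
    """Hand-rolled extraction: grab what sits between <item> tags, decode nothing."""
    return re.findall(r"<item>(.*?)</item>", xml_text)


def strip_unexpected(text: str) -> str:
    """Keep letters, digits, spaces and a little punctuation; drop everything else."""
    # The ampersand is not on the allowed list, so it silently disappears.
    return re.sub(r"[^A-Za-z0-9 .,;:'!?-]", "", text)


wire = "<news><item>Crate &amp; Barrel reports quarterly loss.</item></news>"
ticker = [strip_unexpected(text) for text in naive_item_text(wire)]
print(ticker)
# ['Crate amp; Barrel reports quarterly loss.']
```

Each step looks harmless on its own; the bug only appears when the two are chained.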
14. Why is this a safety measure? Programs are usually written in a defensive way, that is, they make very few assumptions about what they are given. Instead, they err on the safe side and double, triple check everything. One of the checks commonly performed is called sanitization, and it’s a process of turning possibly erroneous input into a correct one by stripping bad data or data that could be malicious (if interpreted literally). For example, suppose that the same newsfeed has a command that defines how fast the newsfeed is moving on screen (since the XML data is structured, we can just add a special block for this):
<news>
<item>Crate &amp; Barrel reports quarterly loss.</item>
<speed>16</speed>
</news>
So if the program ever encounters a speed block, it knows to set the speed of the ticker to the given number. Now suppose that I can submit my own news items and they will be displayed. Suppose that this gets integrated with all the other news items by replacing the word CUSTOM below with the user-provided item:
<news>
<item>This is the regular news item.</item>
<item>CUSTOM</item>
</news>
This is all fine for simple news items (for example, if I provide “Hello world”, the word “CUSTOM” will be replaced with the phrase “Hello world” and everything is good). But if I knew about the speed command, I could submit an item that looks like this (read carefully!):
My news item.</item><speed>99999</speed><item>
All the < and > will be interpreted as opening and closing brackets and so here is what the transmission will look like:
<news>
<item>This is the regular news item.</item>
<item>My news item.</item><speed>99999</speed><item></item>
</news>
I just added a speed command to the feed even though I wasn’t allowed to! If this is not caught, I could crash the program by, for example, passing in a really large value for speed (or a negative value, or zero, or some nonsense). If the program didn’t strip certain characters (such as angle brackets), it could be vulnerable to attacks like this (this, by the way, is called an injection attack because I’m injecting special code into the data that I’m allowed to provide). Ampersands can also be interpreted as special characters (because &lt; is translated into an angle bracket which I can use to form a news item that changes the speed!) so they are stripped.
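The injection can be replayed in a few lines of Python. The names TEMPLATE, ATTACK and build_feed are hypothetical, and note one deliberate swap: instead of stripping characters as described above, this sketch sanitizes by escaping them with xml.sax.saxutils.escape, which is the other common defense.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TEMPLATE = "<news><item>This is the regular news item.</item><item>CUSTOM</item></news>"
ATTACK = "My news item.</item><speed>99999</speed><item>"


def build_feed(user_text: str, sanitize: bool) -> ET.Element:
    """Drop the user's text into the template and parse the result."""
    if sanitize:
        # Turn < > & into special codes so they can only ever be text.
        user_text = escape(user_text)
    return ET.fromstring(TEMPLATE.replace("CUSTOM", user_text))


unsafe = build_feed(ATTACK, sanitize=False)
safe = build_feed(ATTACK, sanitize=True)

print(unsafe.findtext("speed"))  # 99999 (the injected command is now part of the feed)
print(safe.findtext("speed"))    # None (no speed block was created)
print(safe.findall("item")[1].text == ATTACK)  # True (the text is shown as typed)
```

Escaping keeps the user’s text intact while making it harmless. Stripping also blocks the attack, but it is exactly the kind of step that can mangle legitimate characters along the way.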
15. Hence, &amp; becomes amp;. Now the newsfeed contains no special characters and hence we see amp; on the screen.
Hopefully now my mom can see where all those years of computer science education went… and at the same time she learned about injection attacks. Pretty good for one post.