on exploration, introspection and creation

Archive for the ‘paramathematics’ Category

How revolutionary change happens

Monday, June 14th, 2010

I think that there is a pattern to revolutions.

  • Revolutions reflect a zeitgeist, a mutual understanding between a large group of people, that change is necessary
  • Revolutions happen through individuals, but the specific individual is not instrumental to the revolution: the individual just happens to be the catalyst

I like to explain this process as a superposition of two probability functions. One is the intensity of the mutual understanding, which grows and declines over time. The other is the ability of specific individuals to push the group over the boundary. Revolutions then happen with a probability that compounds those two effects. If a particularly strong individual comes around, the revolution is simply more likely to happen.

Betting on the Timing of an Event

Sunday, June 13th, 2010

There are times when people disagree over when a particular event will happen and want to turn the difference in opinions into money. It is common to place “over-under” bets: if the event happens before time T, Andrew gets the money, otherwise Bob gets it. Usually, the further the event is from T, the more money changes hands.

I don’t like this style of betting because it’s simply not expressive enough. Instead, I prefer to bet by specifying my probability distribution of the timing of the event, and then using these distributions to determine payouts. My friend and I made such a bet once and it was a fun activity: despite the math going on behind the scenes, it was not that frustrating and required very little mathematics to actually place the bet.

Essentially, each party draws a probability distribution of the timing of the event: a histogram with time on the horizontal axis and the probability density on the vertical axis. The latter can simply be intuited as “the relative probability that the event will happen around the time specified on the horizontal axis”. So if the histogram is twice as tall around 8pm as around 7pm, the event is twice as likely to happen around 8pm as around 7pm.

That’s all each person really needs to do. There is no need to worry about the area of the histogram summing up to 1, since the vertical axis can be scaled appropriately. The two people should also agree on how much money they are willing to bet, say k dollars each.

When the event actually occurs at time T, the two people compare the value of the probability density function (the height of the bar) at time T on their graphs and pay up based on the difference between these values. The heights are scaled so that the area under each graph adds up to 1, so of course you can’t cheat by making your graph taller.

Settling such bets is a little difficult since it involves calculating areas under graphs which may be very irregular. Those who are mathematically masochistic can constrain themselves to piecewise linear functions or, in the extreme, to easily integrable functions; otherwise, the graph can be scanned and a simple graphics editing application (like Photoshop) can be used to determine the area under the graph (using the flood fill and histogram tools).
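
A minimal Python sketch of settling such a bet, assuming each histogram is a list of bar heights over equal-width time bins. The text only says the payout depends on the difference in heights, so the rule used here, k·(pA − pB)/(pA + pB), is a hypothetical choice that keeps the transfer between −k and k; the function names are illustrative.

```python
def normalize(heights, bin_width):
    """Scale bar heights so the total area under the histogram is 1."""
    area = sum(heights) * bin_width
    return [h / area for h in heights]


def density_at(heights, start, bin_width, t):
    """Height of the bar containing time t (0 outside the histogram)."""
    i = int((t - start) // bin_width)
    return heights[i] if 0 <= i < len(heights) else 0.0


def settle(heights_a, heights_b, start, bin_width, t, k):
    """Hypothetical payout rule: positive means B pays A, negative means A pays B."""
    pa = density_at(normalize(heights_a, bin_width), start, bin_width, t)
    pb = density_at(normalize(heights_b, bin_width), start, bin_width, t)
    if pa + pb == 0:
        return 0.0  # neither party thought this time was possible: no transfer
    return k * (pa - pb) / (pa + pb)


# Andrew and Bob sketch one-hour bars starting at 18:00; heights need not sum to 1.
andrew = [1, 2, 1]  # 7pm-8pm twice as likely as either neighboring hour
bob = [2, 1, 1]
print(settle(andrew, bob, start=18, bin_width=1, t=19.5, k=30))  # 10.0
```

Because both histograms are normalized before comparison, drawing one graph taller than the other changes nothing.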

Sunrise and Sunset

Thursday, December 3rd, 2009

I have known since I was ten (we had quite a comprehensive curriculum at school) that the shortest day of the year falls on December 22nd. What I didn’t ponder until very recently was whether it was also the day of the latest sunrise (and the earliest sunset).

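While it may seem like a natural consequence of December 22nd being the shortest day, it doesn’t necessarily have to be true. Model the times of sunrise and sunset as two sine waves, $a(t) = A\sin(t+\alpha) + K$ and $b(t) = B\sin(t+\beta) + L$ (with $A, B > 0$), and let $t_0$ minimize the length of the day, $b(t) - a(t)$. The constants do not affect where the minimum falls, so we can drop them and minimize

$$B\sin(t+\beta) - A\sin(t+\alpha)$$

This means that

$$\left.\frac{d\,(b(t)-a(t))}{dt}\right|_{t_0} = 0 \quad\Longrightarrow\quad B\cos(t_0+\beta) - A\cos(t_0+\alpha) = 0$$

We need to show that this equality may hold (for some values of $\alpha$, $\beta$, $A$ and $B$) even if one of the waves is not at its extremum at $t_0$. Let

$$\left.\frac{da(t)}{dt}\right|_{t_0} = P \neq 0 \quad\Longrightarrow\quad A\cos(t_0+\alpha) = P$$

$$B\cos(t_0+\beta) = A\cos(t_0+\alpha) = P$$

$$\cos(t_0+\beta) = \frac{P}{B}, \qquad -\frac{A}{B} \le \frac{P}{B} \le \frac{A}{B}$$

Since $|P| \le A$, we can always choose amplitudes with $B \ge A$, which puts $P/B$ between $-1$ and $1$; hence the equation is satisfied for some phase $\beta$, even though sunrise is not at its latest at $t_0$.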

We can also take a shorter route and recall that a linear combination of two sine waves of the same frequency but not necessarily the same phase is still a sine wave of that frequency. Its phase depends on the amplitudes and phases of the two input waves. The value of t0 that minimizes the resulting wave is therefore not necessarily going to minimize either of the two input waves, because the three phases are different (and not different by a multiple of π).
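
The effect is easy to reproduce numerically. The Python sketch below builds sunrise and sunset from a half-day length that is shortest at t = 0, plus a small shift applied equally to both; under that assumption each of a(t) and b(t) is still a single sine wave plus a constant, as in the model above. All parameters are made up for illustration, not fitted to real data. With these numbers the latest sunrise lands roughly ten days after the shortest day and the earliest sunset roughly ten days before it.

```python
import math

# Made-up parameters (hours); t is the time of year in radians, t = 0 at the shortest day.
NOON = 12.0
H = 4.5   # average half-day length
D = 1.5   # seasonal swing of the half-day length
E = 0.25  # amplitude of a shift applied equally to sunrise and sunset


def half_day(t):
    return H - D * math.cos(t)  # smallest at t = 0


def sunrise(t):
    return NOON + E * math.sin(t) - half_day(t)


def sunset(t):
    return NOON + E * math.sin(t) + half_day(t)


grid = [-math.pi + i * 2 * math.pi / 3650 for i in range(3650)]  # about ten samples per day

shortest_day = min(grid, key=lambda t: sunset(t) - sunrise(t))
latest_sunrise = max(grid, key=sunrise)
earliest_sunset = min(grid, key=sunset)


def to_days(t):
    return t / (2 * math.pi) * 365


for label, t in [("shortest day", shortest_day),
                 ("latest sunrise", latest_sunrise),
                 ("earliest sunset", earliest_sunset)]:
    print(f"{label}: {to_days(t):+.1f} days from t = 0")
```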

In fact, if you look at the sunrise and sunset times in Connecticut around this December, the shortest day, unsurprisingly, falls on December 22nd, but the latest sunrise is on January 4th 2010 and the earliest sunset is on December 8th.

This is great news: it means that starting on December 8th (and not the 22nd), it will finally start getting dark later and later!

Color Pickers

Sunday, November 29th, 2009

If you’ve looked at the crowdsourced art experiment you may have noticed a little color picker on the page. It looks like this:

My Color Picker

If you’ve seen other color pickers (for example in Photoshop, the Office / iWork suites, or even MS Paint), you’ll notice that this one is slightly different. I made it to have a property that a lot of other pickers lack: it displays all possible colors on one two-dimensional plane. If you think about it, the other pickers you’ve seen either have an additional slider that changes the 2D plane (e.g. Photoshop or the built-in Windows or OS X one) or they don’t allow you to specify some colors (e.g. the Pantone picker or a lot of the pickers online).

As you know, I’m fascinated with color, especially when there’s math or technology involved. My picker happens to involve both and so here we go.

The color space is three-dimensional, that is, to uniquely identify a color, you have to specify three values. There are many ways to specify a color. The most common one (albeit not the most natural one to understand) is the RGB space: each color is identified by the intensities of pure red, pure green and pure blue components that, when mixed together additively, produce the desired color. I emphasize additively because when we “mix” color with paint (oil or watercolor), the mixing is subtractive, which is not how our eye composes light. In the RGB space you specify three numbers for each color, so yellow can be defined as (1.0, 1.0, 0.0) because it consists of maximum-intensity red and green and no blue. Violet can be defined as (0.6, 0.3, 0.7).

Similarly, there are other ways to describe a color: the HSL space, which I find fairly intuitive, describes a color by specifying its hue (its location on the rainbow), its saturation (how vibrant the color is) and its lightness (how bright it is). Again, three dimensions (why the color space is three-dimensional is an interesting question…).
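
To see the two descriptions side by side, Python’s standard-library `colorsys` module converts RGB triples in the 0 to 1 range into HSL. Note that it names the space “HLS” and returns (hue, lightness, saturation) in that order; the `describe` helper is just for illustration.

```python
import colorsys


def describe(name, rgb):
    # colorsys returns (hue, lightness, saturation), each in the range 0..1
    h, l, s = colorsys.rgb_to_hls(*rgb)
    print(f"{name}: RGB {rgb} -> hue {h * 360:.0f} deg, saturation {s:.2f}, lightness {l:.2f}")
    return h, s, l


yellow = describe("yellow", (1.0, 1.0, 0.0))
violet = describe("violet", (0.6, 0.3, 0.7))
```

Yellow comes out at a hue of 60° with full saturation; the violet above sits at about 285°, less saturated, at the same middle lightness.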

This fact makes it hard to design user interface elements that let you pick a color (color pickers): the screen is two-dimensional, so you either need a slider or some other way to change the third dimension, or you can only show a subset of all colors. Being a visual person, I wanted a picker that displays all the colors at once, without some stupid slider.

My first attempt took advantage of the fact that on a computer every measure is discrete (there is no such thing as infinity in computing), so I can collapse the three dimensions into two simply by cleverly rearranging where each color should be placed. Assuming that each color component has 256 levels of intensity (which is the case for computer screens these days), we can list all colors, for example

(0, 0, 0), (0, 0, 1), …, (0, 0, 255), (0, 1, 0), (0, 1, 1), …, (255, 255, 255)

We can now map the sequence into a two-dimensional one, for example

(0, 0), (0, 1), …, (0, 4095), (1, 0), (1, 1), …, (4095, 4095)

The problem, however, is that we want the mapping to be smooth, i.e. ideally we would like nearby pixels to have similar colors, and the mapping above (and in fact most mappings) won’t guarantee this.
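
To see the problem concretely, here is a small Python sketch of the naive mapping above: number the colors in the listed order and wrap that sequence into rows of 4096 pixels (256³ = 4096², so the square is exactly filled). The function names are illustrative.

```python
import math

SIDE = 4096  # 256**3 colors == 4096**2 pixels


def color_to_pixel(r, g, b):
    """Position of a color in the naive row-by-row layout, as (row, column)."""
    index = (r * 256 + g) * 256 + b  # order: (0,0,0), (0,0,1), ..., (255,255,255)
    return divmod(index, SIDE)


def pixel_to_color(row, col):
    """Inverse of color_to_pixel."""
    index = row * SIDE + col
    rg, b = divmod(index, 256)
    r, g = divmod(rg, 256)
    return r, g, b


# Two horizontally adjacent pixels...
left, right = pixel_to_color(0, 255), pixel_to_color(0, 256)
print(left, right)             # ...hold pure blue and almost-black
print(math.dist(left, right))  # their distance in RGB space is about 255
```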

This is where math comes in handy, specifically the field of space-filling curves. I kept one dimension (say, the blue component) as the X coordinate of the resulting picker. For the other, I used a Hilbert-like curve to collapse the green and red components into one Y coordinate. A nice thing about Hilbert curves is that they are somewhat smooth: if you pick two neighboring points, they are likely to be close to one another on the curve. I’ll leave it as an exercise for the reader to determine the actual expected Euclidean distance in color intensity between two randomly picked neighboring points on a Hilbert curve.
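
Here is a sketch of that construction in Python, using the well-known iterative conversion between grid cells and Hilbert-curve positions (a standard Hilbert curve, not necessarily the “Hilbert-like” curve used in the original ring.php). It maps the (red, green) pair on a 256 × 256 grid to a single position along the curve; `belt_coordinates` is a hypothetical helper.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so each sub-curve is oriented correctly."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) on an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def d2xy(n, d):
    """Inverse of xy2d: the cell visited at step d."""
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y


def belt_coordinates(r, g, b):
    """Hypothetical layout: X = blue, Y = position of (red, green) along the curve."""
    return b, xy2d(256, r, g)


# Consecutive steps along the curve are always neighboring (red, green) cells:
steps = [d2xy(256, d) for d in range(256 * 256)]
max_jump = max(abs(x1 - x2) + abs(y1 - y2)
               for (x1, y1), (x2, y2) in zip(steps, steps[1:]))
print(max_jump)  # 1
```

The guarantee only runs one way: moving one step along Y never jumps in color, but two cells that touch in the (red, green) plane can still be far apart along the curve, which is where the visible seams in the belt come from.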

Well, such was the first experiment. The resulting “color belt” was very long (because collapsing two dimensions into one makes it a very long dimension!) and the lack of smoothness was pretty obvious:

The color belt

My first approach to create a two-dimensional complete color picker

I then took the belt and made it into a ring by “curving” the long dimension around, to take advantage of the fact that the circumference is 2π times longer than the radius:

The color ring

Besides looking very Tolkienian, the ring has a pleasing æsthetic to it. Still, it’s somewhat hard to pick out the color you want because the colors are fairly scattered (if your mouse is off by one pixel, you may be picking a totally different color).
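
One way to do the “curving” is to render the ring pixel by pixel and look each pixel up in the belt using polar coordinates: the distance from the center selects the short (blue) dimension and the angle selects a position along the long dimension. This Python sketch is an assumption about how such a mapping could work, not the code in ring.php; the function and parameter names are hypothetical.

```python
import math


def ring_to_belt(px, py, size, inner_radius, belt_width, belt_length):
    """Map pixel (px, py) of a size x size ring image to a belt position (x, y), or None."""
    dx = px + 0.5 - size / 2  # offset of the pixel center from the image center
    dy = py + 0.5 - size / 2
    x = math.floor(math.hypot(dx, dy) - inner_radius)  # radius -> short dimension
    if not 0 <= x < belt_width:
        return None  # outside the ring
    angle = math.atan2(dy, dx) % (2 * math.pi)  # 0 <= angle < 2*pi
    y = min(int(angle / (2 * math.pi) * belt_length), belt_length - 1)  # angle -> long dimension
    return x, y


# A 1024 x 1024 image holding a 256-pixel-wide ring around a 200-pixel hole:
print(ring_to_belt(722, 512, 1024, 200, 256, 65536))
print(ring_to_belt(512, 512, 1024, 200, 256, 65536))  # center of the hole: None
```

With a belt tens of thousands of rows long and a ring only a few thousand pixels around, each step around the ring skips many rows of the belt, which is one way to see why neighboring pixels can differ so much.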

The next approach was to be a little smarter with the choice of dimensions. Instead of picking one of the components as one dimension and trying to collapse the other two, I chose the light intensity of a color as the horizontal dimension and then collapsed all colors of a given intensity into the vertical dimension. The advantage of this approach was increased smoothness: all colors along a vertical line have the same intensity, and because the intensity function is linear in the values of R, G and B (it’s a weighted average of the red, green and blue intensities), the colors on a horizontal line are all similar.

I also varied the order in which colors were placed on a vertical line, based on a heuristic of the R, G and B components. Some of the choices gave very interesting results. For example, if I plotted each vertical line in lexicographical order (sort by R, then G, then B), I got a picker that looked like this:

Intensity-based picker; pixels ordered lexicographically (R, G, B)

I got really interesting results when I made the heuristic slightly more complex: for example, if I ordered the pixels by the sum of squares of the components, I got

Intensity-based picker; pixels ordered by R^2+G^2+B^2
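
A rough Python sketch of the intensity-based layout, scaled down so it runs instantly. Assumptions not in the text: 16 levels per component instead of 256, the Rec. 601 luma weights (0.299, 0.587, 0.114) as the “weighted average”, and 64 columns. Each column holds the colors of (roughly) one intensity, and the sort key decides the vertical order, lexicographic or by R²+G²+B² as in the two pickers above.

```python
from collections import defaultdict
from itertools import product

LEVELS = 16                      # assumption: 16 levels per component (the real picker uses 256)
WEIGHTS = (0.299, 0.587, 0.114)  # assumption: Rec. 601 luma weights for "intensity"


def intensity(color):
    """Weighted average of the components, between 0 and LEVELS - 1."""
    return sum(w * c for w, c in zip(WEIGHTS, color))


def build_columns(order_key, n_columns=64):
    """Group every color into a column by intensity, then sort each column with order_key."""
    columns = defaultdict(list)
    top = LEVELS - 1
    for color in product(range(LEVELS), repeat=3):
        col = min(int(intensity(color) / top * n_columns), n_columns - 1)
        columns[col].append(color)
    return [sorted(columns[i], key=order_key) for i in range(n_columns)]


def lexicographic(color):
    return color  # sort by R, then G, then B


def sum_of_squares(color):
    r, g, b = color
    return r * r + g * g + b * b


picker = build_columns(lexicographic)
print(len(picker), sum(len(col) for col in picker))  # 64 columns, 4096 colors
```

Swapping `lexicographic` for `sum_of_squares` (or any other key) changes only the vertical order within each column, never which column a color lands in.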

Finally, in order to bundle similar colors together (i.e. to have fewer “boundaries” where neighboring colors differ significantly, at the expense of more pronounced boundaries), I used a heuristic that “favored” one component over another. The first heuristic, for example, orders pixels like this:

  • First, pixels whose blue component is higher than their red component appear higher in the image
  • Then pixels whose green component is higher than their blue component
  • Finally pixels whose red component is higher than their green component

Intensity-based picker; pixels ordered based on prevalent component (two different heuristics)
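
The first heuristic can be written as a sort key. In this Python sketch each comparison contributes one element of a tuple, so the first rule dominates, the second breaks ties, and so on; the final lexicographic tie-break is an assumption, since the text does not say how remaining ties are ordered. Smaller keys sort first, which here means higher in the image.

```python
def prevalent_component_key(color):
    r, g, b = color
    return (
        not b > r,  # first: blue higher than red (False sorts first, i.e. placed higher)
        not g > b,  # then: green higher than blue
        not r > g,  # finally: red higher than green
        color,      # assumed tie-break: lexicographic (R, G, B)
    )


column = [(200, 10, 10), (10, 200, 10), (10, 10, 200), (100, 100, 100)]
print(sorted(column, key=prevalent_component_key))
# [(10, 10, 200), (10, 200, 10), (200, 10, 10), (100, 100, 100)]
```

Applied to each intensity column, a key like this groups bluish, greenish and reddish colors into bands, with neutral grays at the bottom.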

The little picker at the very top of this post is a slightly massaged (reduced and despeckled, to make picking more accurate) version of this approach with a lexicographical order of pixels. I also added a pure gray gradient on the side: even though the picker itself contains every shade of gray, they are scattered throughout the image. If you’re interested, here is what the picker looks like when I order the pixels by saturation. It looks… phantasmagorical:

Intensity-based picker; pixels ordered by saturation

Source code for all color picker generators:

  • Color belt and color ring generator: ring.php (it generates a PNG file, so redirect the stdout to a file, say output.png)
  • A useful cheatsheet of color conversions (source unknown): colorConversions.txt
  • Layered color picker: picker.c (it generates a PNM file that Photoshop/ImageMagick can read, so redirect the stdout to a file, say output.pnm)

The consulting industry and entropy

Thursday, November 26th, 2009

I like entropy because it’s such a simple concept that’s easy to define (a measure of disorder) and universal (it looks the same throughout the universe). Its definition doesn’t depend on other words that can be loaded (such as “happiness” or “language” or “intelligence” or even “evolution”) so it’s a good term to use as the basis of a framework for thinking about life.

Entropy in the thermodynamic sense has a close equivalent in the field of information theory. In fact, its equivalent is so close that it’s also called entropy. It is a testament to the universality of the concept — in fact, I truly believe that I could define everything I can think of in terms of entropy. Perhaps one day I’ll publish a dictionary that does precisely that.

In the meantime, my friend, wishing to see me suffer, challenged me to define a seemingly unrelated thing in terms of entropy. Here goes.

I’m intrigued by the consulting industry, specifically management consulting. At first I had a somewhat cynical view of it, in part because I saw many of my friends prepare for interviews that seemed to ask nothing but Fermi problems (ironically, most of my friends didn’t know they were Fermi problems… here’s a cheat sheet if you’re preparing for a consulting interview). I interacted with consultants at work (and saw the best of them in the movie Office Space) and found it very difficult to understand how they could be adding any value. After all, they don’t bring any skills to the table. Most importantly, there was an unsettling stench of slickness to many of the consultants I interacted with, almost as if every conversation was a game to be won or lost (or, again taking a cynical view, as if it was through slick conversations that the consultants created the impression of adding value).

Then I thought about it some more, in the abstract, starting from what I imagined to be the history of consulting. I had this impression of the consulting industry as being pioneered by a few incredibly smart people (university professors perhaps), whom I call “founders”, who worked through the theory of efficient management, information flow and social interactions and developed a theoretical framework that they published in a seminal work in the mid-1950s. Following them were ambitious and innovative entrepreneurs who decided to put the theory into practice: they worked out the kinks that usually prevent elegant theory from being applied to the real world (this is also probably why we hear about groundbreaking research in battery technology yet nothing seems to reach the mass market). I call these people “visionaries”. They came up with principles and set up the first consulting companies. Over time, as always happens, the principles were lost and the companies lost sight of their mission and their roots; it all became a matter of making money (incidentally, this happens in a lot of industries, which is why sophisticated enough companies all look the same, regardless of what they do). Surely, then, consultants add value in theory, even though what we see today obscures it well. Let’s find it (and let’s use entropy to explain it!)

Entropy in the information-theoretic sense measures how much information some data contains. A string of zeros, 0000000000, contains no information, while a random string of zeros and ones, 001101001011101, contains plenty of information. In other words, entropy tells you how unpredictable the data is. We can also talk about entropy rate, a measure of the “density” of information. For example, English text has an entropy rate of about 1 bit per letter, which means that if you were to represent English in the most efficient way (without losing any information), Shakespeare’s Romeo and Juliet would take up about 169 thousand bits. Note that the most popular representation of English on a computer today is ASCII (each letter stored in an 8-bit byte), in which the same play takes up about 1.3 million bits. The number of bits needed depends on the encoding, and this is why entropy assumes you find the most efficient encoding.
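
The simplest version of this measurement counts symbol frequencies and applies Shannon’s formula H = −Σ p·log₂ p. This Python sketch does that for the two example strings. It ignores order, so it measures how unpredictable individual symbols are rather than the true entropy rate: a string like 0101010101 also scores 1 bit per symbol even though it is perfectly predictable.

```python
import math
from collections import Counter


def symbol_entropy(text):
    """Empirical Shannon entropy in bits per symbol, from symbol frequencies alone."""
    n = len(text)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(text).values())


print(symbol_entropy("0000000000"))       # no information
print(symbol_entropy("001101001011101"))  # close to 1 bit per symbol
```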

Of course, entropy rate depends very much on the domain of the information. While the entropy rate of English is about 1 bit per character, the entropy rate of, say, soap opera scripts is much lower. That’s because what is said in soap operas is much more predictable: fewer sophisticated words are used.

While it’s hard to find the theoretically correct entropy rate of some data, a good first-order approximation is to compress it using a good lossless compression algorithm (for example, make a zip of the data) and look at the compression ratio. For example, Romeo and Juliet takes up 165kB (which translates to the aforementioned 1.3 million bits). Once you compress it, the text takes up 63kB. The compression ratio of 38% means that the entropy rate of Romeo and Juliet is about 38% × 8 bits per character (what ASCII uses) ≈ 3 bits per character. That is more than regular English, but then again, Shakespeare is a little less predictable than your average Joe (plus, the compression program adds its own overhead and isn’t ideal either).
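
The compression trick is easy to script. This Python sketch uses the standard-library `zlib` module (DEFLATE, the same family of algorithm used in zip files) at its highest compression level and reports compressed bits per input byte. Like any real compressor it overestimates the true entropy rate. The file name in the commented usage is hypothetical.

```python
import os
import zlib


def bits_per_byte(data: bytes) -> float:
    """Rough entropy-rate estimate: compressed size in bits per original byte."""
    compressed = zlib.compress(data, 9)
    return 8 * len(compressed) / len(data)


print(bits_per_byte(b"0" * 10_000))       # very predictable: far below 1
print(bits_per_byte(os.urandom(10_000)))  # random bytes: about 8, plus a little overhead

# Hypothetical usage on a plain-text copy of the play:
# with open("romeo_and_juliet.txt", "rb") as f:
#     print(bits_per_byte(f.read()))
```

For the numbers quoted above, the same calculation is 8 × 63 / 165 ≈ 3.05 bits per character.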

Now the punch line: consultants are often brought in to synthesize information and come up with recommendations. While the recommendations are usually commonsense, the synthesis is hard. They have to interview lots of people, look through reams of documents, and stare at various charts and graphs. This is a lot of information. At the end of their engagement they come up with a powerpoint document that summarizes the important findings. It’s usually an incredibly dense powerpoint: carefully chosen words, terse bullet lists, not even full sentences; articles and prepositions are the first to go. If they’ve done the synthesis right, they were able to compress the information into a powerpoint fairly losslessly (given what they were supposed to recommend on). Hence, to assess the value of their work, simply find the entropy rate of their deliverables. High entropy means high value.

You don’t even have to bother reading the powerpoint document!

Coincidences and randomness

Thursday, November 19th, 2009

It happened twice in the past few days… each time, I encountered the same fairly obscure (i.e. unlikely to appear in my life) concept in two (or more!) completely unrelated contexts.

  • On Saturday, I went to see The Phantom of the Opera (an outstanding production, the kind of Broadway that elevates you). As I flipped through the Playbill, I read an article about A Steady Rain, a show starring Hugh Jackman and Daniel Craig. Then, that night, I was flipping through a hotel brochure and read a different article about how Broadway has started attracting movie actors… A Steady Rain was mentioned. Finally, on Sunday I was talking to a producer who brought up the same show. Quite a coincidence.
  • I just finished watching House, M.D. (one of two TV shows I religiously watch). The case (SPOILERS!!) revolved around the “hygiene hypothesis”, the idea that living in an environment that’s too germ-free actually makes us sicker*. Coincidentally, just a few hours earlier I was listening to a not-too-recent RadioLab episode and the same hypothesis was brought up! Uncanny.

I was quick to burst my own bubble: selection bias is definitely at play here. We perceive coincidences as rare, weird and somewhat mystical, so when they actually happen (and they are bound to, every so often), we definitely remember them, unlike the thousands of non-coincidences that happen to us every day.

But as I thought about it more, I thought of a curious, seemingly unlikely interpretation of this phenomenon: coincidences are equivalent to randomness; and the fact that they happen is simply an outcome of the fact that there is plenty of randomness around us. Without coincidences, there is no randomness.

Say what? Why would coincidences and randomness be equivalent? Getting 7 tails in a row in a series of 100 coin tosses is actually a rather likely event, but we’d consider it a big coincidence.† As shown by a mathematician named Ramsey, in a complex enough system you can find pretty much any structure you can think of. The world that surrounds us is an incredibly random (and thus complex) system: we are bombarded by terabytes of information every minute, and most of this information is linked.‡ It should therefore come as no surprise that the world we experience will feature plenty of structure, even of the most unlikely kind. It is this hard-to-believe structure that we call coincidence. Randomness, therefore, is the existence of unexpected structure.
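
The claim about streaks is easy to check exactly. This Python sketch tracks, after each toss of a fair coin, the probability of each current streak length of tails, and accumulates the probability that a streak of length k has already occurred. For 7 tails in a row somewhere in 100 tosses it comes out to roughly one chance in three.

```python
def prob_streak(n_tosses, k, p_tails=0.5):
    """Probability of at least one run of k tails in a row within n_tosses independent tosses."""
    # state[j] = P(no run of k yet, and the last j tosses were tails), for j < k
    state = [1.0] + [0.0] * (k - 1)
    done = 0.0
    for _ in range(n_tosses):
        new = [0.0] * k
        for j, prob in enumerate(state):
            new[0] += prob * (1 - p_tails)  # heads resets the streak
            if j + 1 == k:
                done += prob * p_tails  # the streak reaches k and stays counted
            else:
                new[j + 1] += prob * p_tails  # tails extends the streak
        state = new
    return done


print(round(prob_streak(100, 7), 3))
```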



* Something I’ve personally believed in ever since childhood–when I was little I would get sick all the time. I went through pneumonia, rubella, chickenpox, measles, you-name-it. At some point I got fed up with this (quite a bold thing to do when you’re eight!) and stealthily plodded through whatever ailment I was suffering from, coughing in secret and all. I haven’t been seriously sick ever since.
† Randomness–stochasticity–was the subject of another RadioLab episode I’ve recently listened to. The coin toss experiment was described there and this is the episode that inspired my randomness theory. You’ve really got to start listening to this show…
‡ The laws of physics are one example of such a link: if we drop a heavy object, it will make a sound as it hits the floor, so the visual and the auditory information are linked.

Misuse of statistics

Monday, November 16th, 2009

Just a few days ago, CNN reported 3,900 deaths in the U.S. this year due to swine flu. Everybody panic, right? Except that the flu claims about 35,000 lives every year (swine flu might in fact be much less lethal than initially suspected), and it mostly claims the lives of people who are elderly, vulnerable and already sick.

Sometimes I wish numbers hadn’t been invented. Why? Because in the wrong hands, they are harmful. And, the sad truth is, most hands are wrong hands.

Absolute and Relative Happiness

Wednesday, October 28th, 2009

What is happiness? My friend P.W. and I were talking about it some time ago and we found it difficult to agree even on the most fundamental characteristics of happiness. It was clear to both of us that happiness is subjective (the best way to assess whether somebody is happy is to ask them); in fact, the subjectivity is pretty strong: a lot of the factors we considered (wealth, intelligence, family, physical appearance, health) didn’t seem obviously related to happiness. We could think of plenty of common situations where added wealth didn’t make the person any happier, for example.

When I think of happiness, I imagine a two-dimensional model. One dimension is the person’s capacity to perceive happiness. This capacity changes over time, varies from person to person, but it changes slowly. It defines a sort of spectrum. The other dimension is where in that spectrum the person is currently perceiving themselves to be. Essentially, at any given point people are somewhere between very unhappy and very happy, but those limits change over time as people realize how much happier (or how much more unhappy) they can be–either because they understand the world better, or see additional opportunities that they haven’t seen before, get hardened in life, or simply experience more. I think a significant hurdle people encounter when they talk about happiness is that these two dimensions are frequently confused. It’s not clear to me, for example, whether what we frequently refer to as “happiness” is the first dimension (the range of happiness), the second dimension (the absolute happiness), or some combination of the two. A simple yet useful way to think about these dimensions is to imagine a scale going from X to Y (the range) and a pointer Z somewhere between X and Y (absolute happiness). There are three quantities we may care about: Y (or X) (the extremes of happiness and unhappiness that we can conceive of), Z (happiness on an absolute scale), or (Z-X)/(Y-X) (a notion of where we are relative to the range: happiness on a relative scale).

If we use this framework for understanding happiness, we’ll see that increasing your range (i.e. discovering what it means to truly be happy) lowers your relative happiness, as long as your absolute happiness stays the same. You can see this fairly easily if you think about the frequently cited adage “Ignorance is bliss”. Would you rather lead a life where you are happy but don’t have much of a conception of the complexity of the world around you, or a life where you are not so happy but are aware of what surrounds you, understand the world, and can perceive how much happier you could be relative to the other case? Would you prefer your happiness range (Y-X) to be small and your relative happiness to be large, or vice versa? In other words, do you care about absolute happiness or relative happiness?
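
The claim follows directly from the formula for relative happiness, assuming X and Z stay fixed while the upper limit Y grows (and Z > X):

```latex
r(Y) = \frac{Z - X}{Y - X},
\qquad
\frac{\partial r}{\partial Y} = -\frac{Z - X}{(Y - X)^2} < 0 \quad \text{for } Z > X.
```

For example, with X = 0 and Z = 5, widening the range from Y = 10 to Y = 20 halves relative happiness, from 0.5 to 0.25.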

What is Intelligence (part I)

Sunday, October 25th, 2009

What if intelligence were simply the ability to convince others you’re intelligent? I know it sounds like a cop-out (isn’t any recursive definition a cop-out?), but with so many wildly different theories about what intelligence really is, it’s far from the most unreasonable one.

This brings me to an interesting thought on how intelligence should be measured. The intelligence quotient test (you know, those puzzles) has been widely criticized as suffering from both false positives (you can train yourself to solve these brain teasers and puzzles; they are fairly predictable, and being able to do them doesn’t necessarily make you intelligent) and false negatives (people who don’t have strong verbal skills will be deemed “unintelligent”). How about I measure intelligence by asking everyone how intelligent they think everyone else is? I would then weight the opinions by how well each reviewer knows the reviewee, and by how intelligent the reviewer was deemed to be. This is recursive, yes, based on the hypothesis that intelligent people can gauge intelligence better than unintelligent people.

How exactly would this work? Say I send out a survey where I ask everyone to place everyone they know in one of several “buckets” (equivalence classes) based on how intelligent they are. Bucketing is my favorite way of assessing things. The problem with ranking is that it takes too much time and often ends up being arbitrary (it’s much harder to answer the question “is person X more intelligent than person Y” than the question “is person X by and large as intelligent as person Y”). The problem with a numeric scale is, again, that it’s arbitrary to begin with: it either requires some kind of reference point, at which point it simply becomes a ranking, or it’s a heuristic, at which point it suffers from the same problem as IQ. If there aren’t too many buckets, this exercise is not particularly hard.

Then I figure out how intelligent everyone is by collating people’s reviews of them. I would weight the reviews based on how well the reviewer knows the reviewee (again, on a 5-point scale or something), i.e. how confident the reviewer is of his/her read of the reviewee. If the population is large enough, the resulting intelligence level will be pretty granular (much more granular than the number of buckets we started with). Now, since I’m assuming that intelligent people can tell intelligence better, I’d go back and weight the opinions by how intelligent each reviewer is. This will change the intelligence “score” slightly, so I feed the new scores back into the weighting, and so on, until the entire process converges on some numbers.

I don’t know yet — it’s a very interesting question — whether it will always converge, but my guess is that it will. It will definitely not diverge (since the intelligence score is bounded). It may oscillate.
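
The procedure above is a fixed-point iteration, and a minimal Python sketch makes the loop explicit. Assumptions not in the text: buckets are positive numbers (say 1 to 5), familiarity is a positive weight, everyone starts out equally credible, and the loop stops when scores change by less than a tolerance or after a fixed number of rounds (the cap matters if the process oscillates). The input format and function name are hypothetical. Because every score is a weighted average of bucket values, it stays between the lowest and highest bucket, which is the boundedness argument above.

```python
def consensus_scores(reviews, max_rounds=100, tol=1e-9):
    """reviews maps (reviewer, reviewee) -> (bucket, familiarity); returns a score per person."""
    people = {person for pair in reviews for person in pair}
    scores = {person: 1.0 for person in people}  # everyone starts equally credible
    for _ in range(max_rounds):
        new_scores = {}
        for person in people:
            total = weight_sum = 0.0
            for (reviewer, reviewee), (bucket, familiarity) in reviews.items():
                if reviewee == person:
                    # smarter reviewers who know the person better count more
                    weight = scores[reviewer] * familiarity
                    total += weight * bucket
                    weight_sum += weight
            # people nobody reviewed keep their previous score
            new_scores[person] = total / weight_sum if weight_sum else scores[person]
        change = max(abs(new_scores[p] - scores[p]) for p in people)
        scores = new_scores
        if change < tol:
            break
    return scores


reviews = {
    ("ann", "bob"): (4, 3), ("ann", "cat"): (2, 1),
    ("bob", "ann"): (5, 2), ("bob", "cat"): (3, 2),
    ("cat", "ann"): (4, 1), ("cat", "bob"): (3, 3),
}
print(consensus_scores(reviews))
```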

The Metric system vs the Imperial system

Sunday, October 25th, 2009

This must be the most popular topic of conversation whenever a bunch of Europeans get together…

The U.S. adopted the English (I’m so rubbing it in) system of measures several hundred years ago and while the rest of the world (including Great Britain) moved on, going with the Metric system instead, we stuck with the good old pounds, inches, and ounces. As is the case whenever an entity used internationally lacks international standards, it’s been causing loads of confusion and even disasters. Besides asking why this is so (the United States is uniquely insular in this respect, even more than actual islands such as Japan or Great Britain where such a thing would have a natural justification), a lot of people engage in the often time-consuming and fruitless rhetoric of which system is superior. I’m going to add to this hairball, but only a little bit. And I’ll try to use first principles rather than dogma (we’ll see if I succeed).

The goal of a standard system of units is to provide a common framework for society to efficiently and robustly (i.e. in a way that’s resilient to errors) convey information about measures, without requiring special skills (i.e. to make the framework usable by as many people as possible).

I think there are two factors one must take into consideration when comparing the two systems: the intrinsic properties of the units (how practical they are in daily use) and the way they can be manipulated and composed (how to multiply them, compare them, convert them). I claim that the Imperial system is superior at the former, and the Metric system at the latter.

What makes Imperial units intrinsically superior is the very choice of how much the primitive unit of various frequently used measures actually measures. I think an inch is superior to a (centi)meter. I’m sure that if we were to take a survey of all lengths that humans refer to in their lives (controlling for selection bias: people will tend to round up or down to the nearest unit of whatever system they are using) and draw a histogram of such usage, there would be a peak around the inch and not the (centi)meter. I’m even giving the Metric system the benefit of the uncertainty around which unit specifically should count as the primitive. In other words, we’re more likely to talk about things that are the size of an inch than things that are the size of a centimeter. Similarly, the (milli)liter is inferior to the fluid ounce: a fluid ounce is a more natural measure of a “splash” of a liquid.

There is a second-order effect of measures around, not at, the primitive: do things more naturally come in (sub)harmonics of an inch (“half an inch”/“one-quarter of a pound”/“three ounces”) or of the meter (“three centimeters”/“one-half of a liter”)? This is probably much harder to determine. However, one important property of the Imperial system is that it operates primarily on natural and not decimal fractions of units. The Metric system talks about 0.2 centimeters; the Imperial system talks about one-eighth of an inch. Natural fractions are (even by the very definition) more ingrained in human nature than decimal fractions: we’re used to dividing things into equal parts and visualizing individual parts rather than dividing into a fixed number of parts (10) and visualizing multiples of that fraction.

The most important benefit of the Imperial system, in my view, is that it operates (mostly) on base 12 and not 10. I already wrote about how base 12 is far superior to base 10: 12 divides cleanly by 2, 3, 4 and 6, while 10 divides cleanly only by 2 and 5. Having intrinsically more divisors is better because it avoids awkward infinite fractions and, ultimately, inefficient communication.
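
A small Python check of the divisibility argument: 1/n has a finite expansion in a given base exactly when every prime factor of n also divides the base. The helper names are illustrative.

```python
from math import gcd


def proper_divisors(base):
    """Divisors of base other than 1 and base itself."""
    return [d for d in range(2, base) if base % d == 0]


def terminates(n, base):
    """True if 1/n has a finite (non-repeating) expansion in the given base."""
    g = gcd(n, base)
    while g > 1:  # strip every prime factor that n shares with the base
        while n % g == 0:
            n //= g
        g = gcd(n, base)
    return n == 1  # anything left over makes the expansion repeat forever


for base in (10, 12):
    clean = [n for n in range(2, 13) if terminates(n, base)]
    print(base, proper_divisors(base), clean)
# 10 [2, 5] [2, 4, 5, 8, 10]
# 12 [2, 3, 4, 6] [2, 3, 4, 6, 8, 9, 12]
```

So thirds, sixths and ninths are clean in base 12 (one-third is 0.4 there) but repeat forever in base 10, while base 10 only gains fifths and tenths in return.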

When it comes to the second factor, though (how the units are manipulated and composed), the Metric system wins hands down. The Imperial system is inconsistent (12 inches to a foot; 16 ounces to a pound); there are far more units to remember (fluid ounce, pint, quart, gallon) and more units to remember the relationship between. The Imperial system captures many fewer measures at the very small and the very large range — in fact, the only way to represent very small or very large numbers is to use the multipliers that are the very foundation of the Metric system (10 million pounds, for example)!

Most of the disagreement, then (especially when a bunch of Europeans get together…), can be boiled down to a philosophical difference (I define a “philosophical difference” as a difference in opinions that cannot be reconciled with logic because it’s simply too costly to find a way to compare the opinions objectively): do you prefer the intrinsic properties of the units (which I feel are more aligned with human nature), or the composition properties of the units (which are more aligned with civilization)?