home
on exploration, introspection and creation

Archive for the ‘code’ Category

Useful Programming References

Monday, November 8th, 2010

Algorithms

C

Java

Ruby

OSX/Unix

postScript

LaTeX

JavaScript

Useful Reference Cards

Wednesday, November 3rd, 2010

Technology

Miscellaneous

Programming Unix/Mac circa 2006

Tuesday, November 2nd, 2010

The computing world changed tremendously between 2000 and 2006… in 2006 I completed a second version of the reference guide, this time focused on Unix and Mac superuser experience (less about programming, more about terminal usage). Most of the stuff here is timeless and still works beautifully, thus yet again proving that good technology never becomes obsolete.

Programming Mac/Unix circa 2003-06

Programming PCs circa 2000

Monday, October 18th, 2010

As a teenager I spent a lot of time programming. I was mostly interested in making games, and for this I had to get involved in the fairly low-level structures, such as interrupts, system calls, and I/O ports. While I mostly programmed in C (DJGPP — a C clone that was available for free for PCs, implementing an extended memory module which allowed me to finally relax the limitation of being able to use up at most 640 kB of memory in my games), I had to implement some assembly routines because using a high-level language for this was simply too slow. The most complex graphics routine I implemented was a routine to copy a rectangular block of pixels from one place (usually an off-the-screen buffer) to another in a way to preserve all pixels marked with a special color (a mask).

It was a wonderful world–there is something magical about dealing with low-level operations. The information I needed to display pixels on screen, or to control the mouse felt in a way like knowing a secret code that unlocks the door with the treasure behind it.

Around 2000 I had enough knowledge (all before I ever got a modem!) to compile it into a kind of cheatsheet. Now, about ten years later, it’s time to share it with the world. Of course, very little of it is relevant anymore — although most if not all of the information should still produce the same effects, thanks for the crippling yet comforting fact that PCs have been painfully backwards compatible for the past decade or more.

The PDF is a little dense, so it deserves a brief walkthrough.

  • Some operations were possibly simply by inspecting a fixed location in the computer’s memory. Most of the information could simply be read, but some could also be fetched by writing a particular set of codes in specific locations. Putting information directly in the computer’s memory has never been recommended (and these days, virtual memory makes it almost impossible), but by inserting bytes directly in memory the computer’s behavior could be changed wildly — most times it would crash the OS, but sometimes it allowed you to get infinite lives in your favorite game or get some creepy screen effects. I used to stay up at night and hunt for locations in memory, mucking with which produced the most spectacular effects.
  • Most low-level operations were provided through issuing a set of interrupt operations to the microprocessor. The OS (in this case DOS) would interpret them in a particular way. Usually you needed to specify additional information — you did that by writing directly to the processor’s registers
  • I spent a lot of time figuring out how to display graphics on the screen. Back in the early 2000, everyone’s resolution of choice was a 320×200 resolution with a color table of 256 colors. Since every pixel was a byte, and the colors could be changed globally, this allowed for a number of games that displayed graphics fast and cycled through the colors. Extended modes (called VESA) were also possible and they offered higher resolutions and full color spectra (15-bit, 16-bit or 24-bit)
  • When you wrote a game, you pretty much had to intercept everything that the OS (MS-DOS) tried to do for you. The default mouse offered by MS-DOS was awful, the keyboard had a frustrating delay that you couldn’t get rid of and the OS wouldn’t even inform you when most of the keys were pressed. Fortunately, it was possible to handle the keyboard and the mouse through similar OS interrupts
  • Finally, the document concludes with some common file formats.

Fortunately, technology today has great abstractions and high level routines that make it unnecessary to know most of what’s in the cheatsheet. I am proud to have had to figure all this out, though — in an extension to many Java jokes, I think in order to be a great programmer, you have to understand the technology stack all the way down to microprocessor commands. The effect is a kind of deep connection with technology, the ability to make optimizations based on what’s going on deeper in the stack (such as my need to write assembly code back in 2000), but also a good deal of humility.

Here it is: programming-cheatsheet-2000.pdf

It’s Easter Eggs all the way down

Monday, October 11th, 2010

I love the idea of Easter Eggs — small pieces of functionality hidden from the user (and, obviously, undocumented). If you haven’t already, you should read Thompson’s paper on trust, and, related, the possibility of a combination of scary, but very clever Easter Eggs: hidden functionality that allows you to login to any Unix terminal. The login utility is frequently recompiled, but if there is a corresponding Easter Egg in GCC that introduces the backdoor whenever login is compiled, the backdoor could be preserved. GCC itself is recompiled frequently — but it’s recompiled with another, older version of GCC — so if GCC also included an Easter Egg that tells GCC recompiled with it to introduce an Easter Egg whenever login is compiled with it… voilĂ !

What if we take this principle and apply it even further, as far as we can? Using the fact that as systems get more low-level, they get more complex, and so it’s increasingly more difficult to audit their functionality. So, if we’re worried that people may inspect the assembly language of that original GCC (or write their own GCC), we could put an Easter Egg in the microcontroller that looks for a specific combination of assembly instructions and executes special, tucked away code. At this point inspection becomes very difficult. Of course, the particular sequence of commands to look out for is complicated (and there are probably many such sequences — depending on the compiler optimizations, a simple compiler instruction may get assembled into all sorts of crap). However, with any system that’s complicated enough, you can see any pattern you like, and you shouldn’t underestimate the possibility of even simple patterns remaining undetected for a long time.

Can we keep going (as if there was much point in continuing!)? Of course; people use layout software (and–even better–usually simply just higher level synthesis software such as Verilog) to lay out what will become a microcontroller. We can add an Easter Egg there. This takes us back to software which is compiled with GCC — and you can see how we could continue this process indefinitely, with tools that we would never think would be used in that “critical path”. So I dare you, dear reader, to create the longest chain of Easter Eggs you can. Bonus points for creative inclusions! (Of course, it would be silly to keep doing it a lot, for the resulting code would likely be massive — think lots of lookup tables. Unless we do some serious hackery — it is, after all, possible for very short code to generate very long sequences…).

Can we protect ourselves against such a mindblowing seemingly infinite chain of Easter Eggs (or–more relevantly–security vulnerabilities)?

Disposable Software

Monday, October 4th, 2010

So far we have been optimizing our software development practices for the increased lifetime of our software — yes, simple features, especially around the beginning of the project will take a long time to develop because we want to structure our code in a way that will reduce tech debt in the future. We tend to keep tech debt at a steady low number — something like 10% — precisely because it’s debt, and we don’t want to pay it off over a long period of time.

What if we make it easier for software to be thrown away, and start optimizing for rapid release? That would be a huge paradigm shift: our users would have to get used to the feature set changing rapidly, even losing feature they may like. There are some cool things we could do, for example write code that literally expires (and a countdown clock for the user so he/she is not surprised when the feature disappears).

Something to think about…

The structure of elevenseconds.com

Tuesday, September 21st, 2010

I tend to favor building to buying when I want to learn something, so when I first started constructing the main site elevenseconds.com I thought of building myself a small Rails-like framework. The main difference would be that it would apply to a simple principle of just-enough abstraction layers, that is, I would introduce a new abstraction layer when I left that the cost of maintaining the current code was too high. I am currently pretty happy with the structure of the site, although updating the front page takes a bit of time so perhaps it’s time to introduce a new layer of abstraction.

At a high level, all requests go to an index page

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.cgi [QSA,L]

which grabs the desired page to be visited from the URL as well as the subpage option (I thought it would be cool to separate the option with a colon — I never liked the questions marks and ampersands, and overloading slashes for paths and parameter separations never agreed with me. And so http://elevenseconds.com/photography/italy:slides takes you to an index page which knows to display a page called photography/italy with an option of slides.

Most of the pages are very similar so I put in place a kind of Metacontent-Metaview paradigm: every page request will fetch content from the /content/ directory. The ruby files there are organized in the same hierarchy as the URLs, so the above link will fetch content from /content/photography/italy.rb.

The actual content file is very configuration-like; let’s take a look at a part of the main content file (root URL resolves to /content/main.rb):


append = []
append << rendering('text', {
  'x'=>1,
  'y'=>0,
  'text'=>'<img src="/images/ui/whatis.png"/>11seconds: on exploration, introspection and creation'
})
append << rendering('verticalLine', {
  'x'=>1,
  'y'=>2,
  'size'=>3
})
append << rendering('imageWithCaptionOnRight', {
  'x'=>1,
  'y'=>2,
  'link'=>'http://blog.elevenseconds.com/trends/',
  'src'=>'/images/thumbs/main/glogo.png',
  'text'=>'trends'
})
append << rendering('imageOnRight', {
  'x'=>1,
  'y'=>11.5,
  'link'=>'/pictures/glowdoodle',
  'src'=>'/thumb.cgi?im=/images/pictures/glowdoodle/sadterminator.jpg&x1=197&y1=74&ss=344',
  'id'=>'projects',
  'text'=>'glow doodles'
})
render('main', {'append'=>append})

append is really a collection of items to append. Each item is a rendered view with some parameters. Each view is simply some HTML markup with Ruby inline code. For example, imageOnRight.rb contains

<div class="rightOfLine" style="left:#{x*35+2}px;top:#{y*35+69}px">
  <a href="#{link}">
    <img class="image" src="#{src}"
      onMouseOver='caption("#{id}", "#{text}", "#{link}")'
      onMouseOut='caption("#{id}", "", "")'/>
  </a>
</div>

(To make the page more structured, I decided on absolute positioning — maybe that’s because I’m a bit of a control freak.)

Note that I’m not quite allowing hierarchical content — there is no need for it given the current design of the site. All I need to do is a two-level hierarchy: a main view (the last line in the main.rb file above may include a bunch of other views.

I needed to introduce another abstraction because a lot of the sites with photography looked the same and I saw myself writing the same code over and over again. So I wrote a function that displays a generic set of pictures. Take our italy.rb content page:


category = 'photography/italy'
caption = 'italy june 2010'
images = [
  ['italy-0', true, '115, 45, 305'],
  ['italy-1', true, ''],
  ['italy-2', true, '296, 20, 54'],
  ...
  ['italy-45', true, '117, 75, 85'],
]
append = []
append << generateSeries(images, category, caption, [534, 356, 2.42])
render('main', {'append'=>append})

The above specifies the absolute minimum that is needed to render the Italy photographs — a definition of each images (I don’t always just number them in sequence), with a specification of whether the image is in landscape (true) or portrait (false) mode, and the top-left coordinate and the size of a window which will be made into a thumbnail. If not specified, the function chooses a smart default.


$explore = [
  ['http://www.gagneint.com/Final%20site/Animation/Sensology/Sensology.html', '/images/thumbs/main/sensology.png', 'sensology'],
  ['http://js1k.com/demos', '/images/thumbs/main/js1k.png', 'demos'],
  ...
]

I do the same with the front page — since it’s mostly made up of a bunch of links, I define each (in the example above, all I need for a link is a URL and a thumbnail picture) and then just cycle over the array. I only look at the latest 10 images for each category (except for EXPLORE which gets 20) but when you click on the link you get them all (http://elevenseconds.com/detail:explore) — again, code reuse.

As the content gets more streamlined (and as I get more lazy), I’ll probably go a step further towards configuration and just define everything in some standard configuration format like JSON (XML may be human-readable, but it’s certainly not human-writeable!). My goal is not quite to minimize the number of characters that I type (because then the conventions at play will no doubt be too obscure, plus the amount of plumbing code I’ll have to write too much for me to remember what I wrote in a year), but I’ll want to get close to it.

The thumbnails are also pretty painless. The utility page, thumb.cgi, generates a thumbnail out of any picture. You just need to specify the top-left coordinate and the size of the window to made into a thumbnail, and the resulting size. Most thumbnails on my page are 32×32 (it’s a kind of hallmark of the site). Some are 67×67 (not 64×64 because I want the images to line up — two 32×32 in a row actually take up ((1+32+1)+1+(1+32+1)) = 69 pixels because of the border and the one-pixel gap so to replace it with a bigger picture means it needs to be 67: (1+67+1)=69.

thumb.cgi saves the autogenerated thumbnails in a cache directory so that the images don’t have to be scaled each time the page is loaded. That is, when thumb.cgi is called with the same arguments again, it will just bring up the saved thumbnail instead of resizing and cropping the picture:


# by default output png unless the target size is specified
format = cgi.has_key?('sc') ? 'jpeg' : 'png'
hash = Digest::SHA1.hexdigest(ENV['REQUEST_URI'].to_s())[0..7]
cacheFile = "cache/" + hash + "." + format
if(File.exist?(cacheFile))
  puts cgi.header("image/#{format}")
  file = new File(cacheFile, "r")
  puts file.sysread(20)
  exit
end

But (great example of just-enough abstraction layers), I go even further: instead of having thumb.cgi do the logic, I replace the source of the image when rendering the view with its cached image. That way I don’t even have to call thumb.cgi which significantly speeds up the page load:


def gen2cache(html)
  source = html.clone
  links = []
  while(source =~ /"(\/thumb.cgi\?[^"]+)"/)
    links.push($1)
    source[$1] = ''
  end
  links.each { |link|
    format = link.index('sc=') ? 'jpeg' : 'png'
    hash = Digest::SHA1.hexdigest(link)[0..7]
    cacheFile = "cache/" + hash + "." + format
    if(File.exist?(cacheFile))
      html[link] = "/"+cacheFile
    end
  }
  return html
end

I guess the next step would be to actually autogenerate the entire HTML response.

Anyway, while Rails is certainly a much better solution to arrive at the final answer, I’ve learned a great deal about Ruby through this exercise. I doubt I’ll make the site much more complex than that.

Life Hack #27: Spaces in Filenames

Tuesday, September 14th, 2010

Trust me — you will save yourself a lot of time if you avoid using spaces in filenames.

I am known to be a space-hater, but the reason is very simple: in computing, the space character has been overloaded to mean too many things. Most crucially, it delimits parameters in command calls, and because of that you should avoid using spaces in things which are likely to become parameters in command calls (like filenames).

The Actual Boston Subway Map

Tuesday, August 3rd, 2010

Being a son of a seafarer, I developed a kind of fascination with being on the sea, and with maps. It is because of the latter (and because I happened to live in Boston, and because I didn’t quite like how MBTA imitated Harry Beck, and because I always wanted to know how far it actually between the different subway stops) that in 2005 I decided to make an actual Boston subway map, that is, a geographically-accurate map of all subway stops.

It was several years ago — I believe MBTA may have added a few subway stops since then, and you can also see all these stops on Google Maps, but there’s something elegant in the simplicity of my diagram. It’s also a good case study of Google Maps, scripting and LaTeX.

The idea was to find all the subway stops on a map downloaded from Google Maps using the locations of the stops as reported by MBTA (as you can imagine, it was a humongous pain to click on every single station map to figure out where to actually plot each station), and put the coordinates of each station in a LaTeX file that would generate the pdf image of the subway map. I used pstricks, which is a great LaTeX package for drawing graphics.

The following tcsh script downloads the relevant quadrants from Google Maps and creates an HTML file that displays all the quadrants on one large page. The URL format for the quadrants has changed since 2005 but you get the idea:

get.tcsh

Then I opened the large map in Photoshop and figured out the coordinates of each subway station and turned them into a LaTeX file:

mbta.txt and mbta.tex

Finally, I ran LaTeX to generate the following pdf file (click on the file to download the pdf):

The actual Boston T Map

Tools and Libraries

Sunday, January 10th, 2010