The T-Files


Fri, 09 May 2008

Big One coming up?

Earthquakes are quite frequent in Japan, but now we had three tremor days in a row, which is a little unnerving.

And did I mention that Earth's rotational speed has increased recently, for unknown reasons no less?

Tue, 06 May 2008

No Country for Old Men

Movie poster

Hunting for deer in the Texas desert, Llewelyn Moss (Josh Brolin) stumbles upon the bloody aftermath of a drug deal gone wrong, including a suitcase full of money, which he decides to keep for himself. Unfortunately, the Mexican bosses find out about him and send deranged killer Anton Chigurh (Javier Bardem) after him (strangest name since Keyser Soze, you say? Well, he is equally creepy, too). Ageing third-generation sheriff Ed Tom Bell (Tommy Lee Jones) can do little to stop the rampage that follows and starts seriously thinking about retirement.

This is the Coen Brothers back in Blood Simple mode. All the violence of Fargo, without any of the comedy pieces. And for some reason they refuse to give us a satisfying ending. The typical Hollywood movie would have seen Brolin and Jones overcoming Bardem in a big shootout at a motel (wrecking the place in the process). A darker version would have Bardem win. In either case, the whole movie (which is after all kind of a western) was building up towards that epic showdown. Well, does not happen. Or rather, does happen, but we don't get to see any of it. Is this the Coens telling us that violence does not pay, not even for movie-goers?

8 points

Sat, 03 May 2008

Leap Seconds

I was reading up about time zones on Wikipedia, when I came upon this chart showing the difference between UTC and the real time (mean solar time) over the last few years, and how leap seconds are introduced to keep UTC from diverging too far.

Apparently, there has been a leap second about once a year until 1999, when the divergence rate slowed down and did not require adjustment until 2005. So I thought that they maybe tweaked the formula a little, but as it turns out, for unknown reasons, Earth has sped up after the year 2000, so the mean solar day has become 1 ms shorter and fewer leap seconds have been needed since then.

Should I be worried?

Sun, 27 Apr 2008

Atonement

Movie poster

A lavish country-side estate in England, 1935. Young Briony catches glimpses of the developing romance between her older sister Cecilia and the long-time family servant Robbie, which leaves her confused, frightened and angry. When her cousin Lola is assaulted at night, she thinks she saw Robbie do it and falsely accuses him. Based on Briony's testimony, Robbie is sent to prison. The story picks up four years later with Robbie as a soldier trying to get out of France (climaxing in an epic scene on the beach at Dunkirk) and return home to Cecilia, who, estranged from her family, has stood by him and now works as a nurse. Briony is also a nurse and tries to find a way to mend the damage she has done.

7 points

Sun, 20 Apr 2008

Cherie Dolce

Circle K sunkus wishes to offer you a unique and cozy store to put a smile on everyone's face. That wish has born a new concept Cherie Dolce. Its comfortable and warm atmosphere will smooth you and ease your mind anytime. We wish to begin a new style of c-store with this new concept Cherie Dolce. It would be a little bit comfortable c-store for you.

Sun, 13 Apr 2008

Andy Oram & Greg Wilson (Ed.): Beautiful Code

An O'Reilly book without the popular animal cover design that collects essays where leading programmers explain how they think and present examples of elegant solutions to hard problems.

| Author | Subject | Programming Language |
|---|---|---|
| Brian Kernighan | A regular expression matcher | C |
| Karl Fogel | An internal data structure of Subversion | C |
| Jon Bentley | Quicksort | C |
| Tim Bray | Web server log file analysis | Ruby |
| Elliotte Rusty Harold | XML verification | Java |
| Michael Feathers | The FIT Framework for Integrated Test | Java |
| Alberto Savoia | JUnit | Java |
| Charles Petzold | On-the-fly code generation | C, C#, CLR Intermediate Language |
| Douglas Crockford | Top-down-operator-precedence parsers | JavaScript |
| Henry S. Warren, Jr. | Counting the number of set bits in a word | C and circuit diagrams |
| Ashish Gulhati | Secure web-based email | Perl |
| Lincoln Stein | Data visualisation for bioinformatics | Perl |
| Jim Kent | A genome analyser web application | C |
| Jack Dongarra and Piotr Luszczek | Libraries to solve linear equations | MATLAB, Fortran |
| Adam Kolawa | The CERN mathematical library | Fortran |
| Greg Kroah-Hartman | Linux kernel drivers | C |
| Diomidis Spinellis | Layers of indirection in the FreeBSD filesystem drivers | C |
| Andrew Kuchling | Python's dictionary data structure | C, Python |
| Travis E. Oliphant | Multidimensional array iterators | C, Python |
| Ronald Mak | A highly reliable information portal for the NASA Mars Rover Mission | Java |
| Rogerio Atem de Carvalho and Rafael Monnerat | Enterprise Resource Planning | Python |
| Bryan Cantrill | Thread synchronisation and prioritisation in Solaris | C |
| Jeffrey Dean and Sanjay Ghemawat | Map-Reduce | C++ |
| Simon Peyton Jones | Software Transactional Memory | Haskell |
| R. Kent Dybvig | Macro expansions | Scheme |
| William R. Otte and Douglas C. Schmidt | A networked logging service | C++ |
| Andrew Patzer | REST (as opposed to SOAP) for integrating business partners | Java |
| Andreas Zeller | Systematic debugging | Python |
| Yukihiro Matsumoto | Brevity and human-readability | Ruby |
| Arun Mehta | A one-button user interface for Professor Hawking | Visual Basic |
| T.V. Raman | Emacspeak (auditory output from Emacs) | Emacs Lisp |
| Laura Wingerd and Christopher Seiwald | The Seven Pillars of Pretty Code | C |
| Brian Hayes | Computational Geometry | Lisp |

Thu, 10 Apr 2008

Are relational databases on the way out?

For decades, the default choice when it comes to storing application data has been relational databases. Recently, however, we see a lot of alternative approaches gaining widespread exposure (not sure about acceptance yet), especially as part of Web 2.0 platforms. Think Amazon's SimpleDB, Google's Bigtable, or Apache CouchDB.

Cluster architecture: RDBMS have traditionally always been client-server oriented, meaning that you can have multiple clients access the same database concurrently over a network. This alone is an enormous improvement over file-based storage, and it is also useful for three-tier web applications, as it allows you to scale out the number of application servers. In order not to have the single database server as a bottleneck and single point of failure, you eventually will want to spread its functionality over a cluster of machines. This is a more advanced option that most RDBMS have added in one form or another, but it seems these new web databases were designed specifically to run on distributed nodes.

Schema-free: RDBMS rely on data schema definitions (tables with typed columns) and have great difficulty handling unstructured documents. In particular, a relational system offers no way to query data other than by column value, and makes it very difficult to query data across tables. Again, most RDBMS now have non-relational extensions like XML query capabilities or full text search. In contrast, the newcomers appear to be very document-centric, where every document can have its own set of attributes. One could argue that a data schema is part of the data integrity validation that a database system should perform. On the other hand, most people seem happy with doing that in the application instead, and in any case, it seems like it should be an optional feature. One could also argue that a fixed schema makes for more efficient storage and access paths. In this case, the schema is seen more as a necessary evil, and one would be happy to give up on it if any performance problems can be avoided some other way.

Impedance mismatch: A big complication when using an RDBMS for storing application data is that everything has to be broken down and mapped to tables and columns using only the rather primitive (scalar) data types of the RDBMS. This gets complex very quickly, both conceptually and with regard to how the resulting data will be stored, retrieved and queried. Multi-table joins are not easy to understand, and also not especially fast to execute.

Transactions: Probably the main selling point of an RDBMS is that it passes the famous ACID test: Atomicity (all or nothing: no incomplete updates), Consistency (the state of the database does not get corrupted at any time, even in the presence of crashes), Isolation (no one can see the results of a transaction before it is committed), Durability (no committed update can be lost). These properties are essential for many applications, but they come at a cost. In particular, they make it difficult to efficiently replicate or distribute the system. The newer non-relational databases tend to relax these constraints considerably, which makes them unusable when you really need a transactional database. But if you don't ...
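To make the "all or nothing" part concrete, here is a minimal JDBC sketch of an atomic two-row update. The `account(id, balance)` table and the `Transfer` class are made up for illustration; this only shows the commit/rollback pattern, not any particular product's behaviour.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class Transfer {
    /**
     * Moves an amount between two rows of a hypothetical account(id, balance)
     * table. Either both updates become visible (commit) or neither (rollback).
     */
    static void transfer(Connection con, int fromId, int toId, int amount) throws SQLException {
        con.setAutoCommit(false); // group both updates into one transaction
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE account SET balance = balance + ? WHERE id = ?")) {
            ps.setInt(1, -amount);
            ps.setInt(2, fromId);
            ps.executeUpdate();

            ps.setInt(1, amount);
            ps.setInt(2, toId);
            ps.executeUpdate();

            con.commit();
        } catch (SQLException e) {
            con.rollback(); // undo the first update if the second one failed
            throw e;
        }
    }
}
```

A store that relaxes atomicity would leave it to the application to detect and repair the half-finished case.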

Performance: One would assume that RDBMS with all their compacted and normalised storage schemes and their indices are the fastest way to go. And I guess that they do offer the fastest possible way to sort fifty million records, but how often do you really need to do that? Especially if sorting these fifty million records in the fastest possible fashion is still too slow for an interactive application, you start looking at alternative approaches such as an intelligent hierarchy of pre-computed aggregated data. In the RDBMS world this is called data warehousing. Once you get used to the idea that ad-hoc queries are impossible anyway, and that anticipated queries can be satisfied using clever indexing (that may not even need to be completely up-to-date), the performance benefits of operations that you can avoid become less important.

So, in summary, I think that these new databases are obviously not able to replace an RDBMS in its traditional field of operation (record processing where consistent reads and writes, transaction isolation, and atomic updates are critical), but they may very well take a sizable chunk of the huge market where RDBMS are currently being used solely because there have been no other choices. There may be no need for an RDBMS in the usual web application stack after all.

Mon, 07 Apr 2008

Hemorrhoid Pictures

Confusing gmail ad of the month. Usually, those targeted ads are actually really close to the contents of the mail thread that they are displayed for. But I really cannot see how big bad hemorrhoids (I am not giving you a link here, and I am certainly not going to click on it myself, being in the middle of dinner and all) are related to the following conversation...

-------------
sakura pictures
 
   :-)
   attached: IMGA0110.JPG IMGA0112.JPG
   
-------------
Thanks Thilo,

And here is the video of the S. Carolina beauty contestant:
http://youtube.com/watch?v=WALIARHHLII

Sun, 06 Apr 2008

Deutsches Dorf Tokyo

Just like the New Tokyo International Airport, and Tokyo Disneyland (and Sea), the Country Farm Tokyo German Village is not really in Tokyo, but in the neighbouring prefecture of Chiba, where things are less crammed and there is more space for roomy ventures like, well, an international airport or a theme park. The German Village is mostly a big park (in the traditional sense, with meadows and flowers, and ponds) which is intended to bring a healthy breath of country-side lifestyle to stressed big city families. It is only mildly interested in trying to recreate Germany (or Bavaria): You do get beer tent background music, imported sausages, beer, Maus and Diddl goods, and Haribo, but there are also completely generic attractions like golf courses, a petting zoo, a pizza restaurant, a video game arcade, a Ferris wheel, and decidedly un-German foodstuff, such as dried jellyfish and other local (as in Chiba) snacks.

Sat, 05 Apr 2008

Philip Pullman: His Dark Materials

I quite enjoyed the Golden Compass movie and immediately ordered this boxed set of Lyra's adventures (the Golden Compass, the Subtle Knife, and the Amber Spyglass) from Amazon. It is being marketed as a Young Adult book, probably as a result of the main characters all being teenagers, but it certainly tackles more serious topics than, say, Harry Potter, and there are also a number of rather shocking plot developments.

When the Catholic League called for a boycott of the Golden Compass movie, they said it was less about the picture and more about keeping children away from the books. And indeed, Pullman is quite aggressive in his attack on the concept of organised religion, to the point where one has to wonder if he is actively trying to offend.

Hardcore fans of the novels also disparaged the movie for watering down the controversial content to make it more commercially viable. I do not think that this actually happened, and the religious themes are not all that prominent until the later volumes anyway, but the movie does deviate from the source material in other ways, most notably in that it cuts off the ending (an anti-Happy-Ending if there ever was one) and reverses the order of the two main events before that. Apparently Pullman approved of these changes, though.

Sun, 23 Mar 2008

Treasures of the Household

Part Six: Finally, a dish washer!

Fri, 21 Mar 2008

The Darjeeling Limited

Movie poster

Three rich and estranged American brothers (and their eleven suitcases, the printer, and the laminating machine) on a train voyage across India to find their mother (turned nun in the Himalayan foothills) and renew the family bond.

Wes Anderson's latest oeuvre is, well, a Wes Anderson film. The focus is clearly on quirky character flaws, oddball dialogue, surreal situations, meticulous attention to detail, the retro soundtrack, and the colour schemes, and Anderson fans will be able to enjoy that. You even get a short Bill Murray cameo to round off the cast of usual suspects (Owen Wilson, Jason Schwartzman, Anjelica Huston). But if you were looking for plot lines, character development, or a message, you might end up disappointed. Or offended that the India depicted is a collection of stereotypes and spoiled Western boys' dreams and serves as no more than exotic backdrop. Or annoyed that for all of the pretentiousness (especially with the opening short film), there is not much substance to it.

7 points

Wed, 19 Mar 2008

Disney Mobile

Japan has a new mobile phone service provider: Disney Mobile launched at the beginning of this month. They are a virtual network operator using Softbank's infrastructure (and also collaborate with Softbank in other ways, such as marketing; there are posters all over the place now). There used to be a Disney Mobile in America, but they failed and folded last year. In Japan they target women in their twenties and thirties, rather than families with children as in the US. This could actually work: Disney's various franchises are very popular in that demographic and they can also draw on an existing base of three million subscribers to their mobile content offerings (ring tones and such).

There have also been rumors that if the iPhone gets introduced in Japan, it would be on Disney Mobile. Unless this is going to be a non-exclusive deal, I do not think that this is a good match, seeing how Disney only targets a very specific market, and how both of these strong brands would probably not like to share the limelight with the other one. On the other hand, Steve Jobs is the biggest individual shareholder in Disney...

In any case, au sent me (completely unsolicited and for free) another three months' worth of pre-paid calling cards, so I am good until November now.

Sat, 15 Mar 2008

Eldritch Horror in the Great War

Wed, 12 Mar 2008

More JDBC Microbenchmarks

Now that I managed to log in to my OTN account, here are the results of Saturday's test suite for Oracle XE on Windows XP:

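| Oracle | run0 [ms] | run1 [ms] | run2 [ms] | run3 [ms] | updates/sec |
|---|---|---|---|---|---|
| A | 3490 | 3382 | 3339 | 3402 | 296 |
| B | 1578 | 1478 | 1444 | 1452 | 686 |
| C | 1272 | 1253 | 1241 | 1474 | 756 |
| D | 660 | 660 | 657 | 659 | 1518 |
| E | 47 | 45 | 43 | 41 | 23256 |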

  • With Oracle, using prepared statements makes a lot of difference: going from interpolated variables to bind variables more than doubles the throughput, and reusing the same statement adds another ten percent. This is good news in more than one way, because that first part (low-hanging fruit for a programmer) brings such a big gain that you can argue against the need for the extra few percent from the second improvement, which is trickier to implement in a general fashion (although you could turn on statement caching in the driver; I need to measure that some time).
  • Using the batch-update interface when applicable gives a spectacular boost; in this case it is about 15 times faster. Further testing is needed to see how this plays out with different batch sizes, specifically whether there are upper and lower limits for when it makes sense to use the feature.
  • As for how much time it takes to get a connection from the pool, it depends on whether you turn on the validation feature of the pool, which checks if the connection is still alive before giving it out. With validation turned off, there is basically no overhead; with validation, it adds a few milliseconds every time you get a connection, in my case one to two ms (I only tested this with Oracle, and the times are not included in the charts).
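For the batch-size question, a test routine could look roughly like the sketch below: it sends the queued updates every `batchSize` rows instead of all at once, so the same run can be repeated with different sizes. The `bench(id, val)` table and the `ChunkedBatch` class are invented for this example and are not part of the test suite measured above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class ChunkedBatch {
    /**
     * Queues n updates and sends them in batches of at most batchSize,
     * committing once at the end. Returns the number of executeBatch calls.
     */
    static int run(Connection con, int n, int batchSize) throws SQLException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushes = 0;
        con.setAutoCommit(false);
        // Hypothetical table: bench(id INT PRIMARY KEY, val INT)
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE bench SET val = ? WHERE id = ?")) {
            for (int i = 0; i < n; i++) {
                ps.setInt(1, i);
                ps.setInt(2, 1);
                ps.addBatch();
                if ((i + 1) % batchSize == 0) { // batch is full: send it
                    ps.executeBatch();
                    flushes++;
                }
            }
            if (n % batchSize != 0) {           // send the incomplete last batch
                ps.executeBatch();
                flushes++;
            }
        }
        con.commit();
        return flushes;
    }
}
```

Timing `run(con, 1000, size)` for a range of sizes would show where the gain levels off, if it does.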

After these measurements for a thousand updates, I also took timings for a different scenario:

  1. SELECT non-existing row
  2. INSERT the row
  3. SELECT again
  4. UPDATE the row
  5. SELECT again
  6. DELETE the row
  7. SELECT the now missing row again

This pattern was run in two variations (as shown above and without the selects) in two different implementations (using bind variables or not using them). The four routines were run interleaved (ABCDABCD...) for a total of 101 iterations, with the results of the first iteration discarded, and the time taken per iteration becomes the benchmark result. The connection was in auto-commit mode the whole time.
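The interleaving and the discarded first iteration can be sketched as a small timing harness. This is a reconstruction under assumptions, not the original test program: the class and method names are made up, and averaging the remaining iterations is my reading of "time per iteration".

```java
import java.util.List;

final class InterleavedBench {
    /**
     * Runs the routines interleaved (ABCDABCD...) for the given number of
     * rounds and returns the mean time in ms per routine, ignoring round 0.
     */
    static double[] meanMillis(List<Runnable> routines, int rounds) {
        if (rounds < 2) {
            throw new IllegalArgumentException("need at least two rounds");
        }
        long[] totalNanos = new long[routines.size()];
        for (int round = 0; round < rounds; round++) {
            for (int i = 0; i < routines.size(); i++) {
                long start = System.nanoTime();
                routines.get(i).run();
                long elapsed = System.nanoTime() - start;
                if (round > 0) { // the first round only warms things up
                    totalNanos[i] += elapsed;
                }
            }
        }
        double[] result = new double[routines.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = totalNanos[i] / 1e6 / (rounds - 1);
        }
        return result;
    }
}
```

With four routines and `rounds = 101`, each routine contributes 100 timed iterations, as in the setup described above.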

| [ms/run] | Postgresql | MySQL | Oracle |
|---|---|---|---|
| Insert, update, delete (no binds) | 4.8 | 32 | 8.8 |
| Insert, update, delete (binds) | 5.3 | 32 | 5.3 |
| Plus selects (no binds) | 9.9 | 37 | 14.6 |
| Plus selects (binds) | 10.3 | 36 | 9 |

Again, we see prepared statements making a big difference on Oracle, not so much (even a slight slow-down?) on the open source databases, and that MySQL suffers because of the slow commits (of course, it should still be fast enough, that part is unlikely to become the bottleneck).

Potential follow-ups to this would be to properly profile the connection pool's validation feature, to include Hibernate into the mix and measure its overhead, to record the strain on the server, and to use multiple threads to see how bind variables affect scalability. But I promise that if I do that, I will not bore you with the results here on my blog (one thing that I do want to put here, though, are the results of running these two benchmarks on the same machines in Perl instead of Java).

Mon, 10 Mar 2008

Me wearing other people's glasses

Part twelve: Protective goggles in the hardware store.

Sat, 08 Mar 2008

JDBC Microbenchmark

What is the overhead of getting a fresh connection from the connection pool instead of passing the connection around? How much faster are repeated SQL statements when using a fixed query string with bind variables as opposed to directly interpolating the data into the query string? How much faster when re-using the same prepared statement? How much faster when using a batched update?

I ran a little benchmark.

  • A) 1000x [getConnection createStatement executeUpdate commit]
  • B) 1000x [getConnection prepareStatement executeUpdate commit]
  • C) getConnection prepareStatement 1000x [executeUpdate commit]
  • D) getConnection prepareStatement 1000x [executeUpdate] commit
  • E) getConnection prepareStatement 1000x [addBatch] executeBatch commit
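As a rough illustration of what the statement-level differences look like in JDBC, here is a minimal sketch of variants A, D and E. The `bench(id, val)` table and the class and method names are made up for this example, and the connection-pool step (`getConnection`) is left out, so this is not the actual test program.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

final class UpdateVariants {
    // Hypothetical table: bench(id INT PRIMARY KEY, val INT)
    private static final String BOUND_SQL = "UPDATE bench SET val = ? WHERE id = ?";

    /** Like A: new statement per update, data interpolated into the SQL text, commit every time. */
    static void interpolated(Connection con, int n) throws SQLException {
        con.setAutoCommit(false);
        for (int i = 0; i < n; i++) {
            try (Statement st = con.createStatement()) {
                st.executeUpdate("UPDATE bench SET val = " + i + " WHERE id = 1");
            }
            con.commit();
        }
    }

    /** Like D: one prepared statement with bind variables, a single commit at the end. */
    static void preparedSingleCommit(Connection con, int n) throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(BOUND_SQL)) {
            for (int i = 0; i < n; i++) {
                ps.setInt(1, i);
                ps.setInt(2, 1);
                ps.executeUpdate();
            }
        }
        con.commit();
    }

    /** Like E: queue the updates with addBatch, send them with executeBatch, commit once. */
    static void batched(Connection con, int n) throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(BOUND_SQL)) {
            for (int i = 0; i < n; i++) {
                ps.setInt(1, i);
                ps.setInt(2, 1);
                ps.addBatch();
            }
            ps.executeBatch();
        }
        con.commit();
    }
}
```

Variants B and C sit between these: B prepares the bound statement anew for every update, C prepares it once but still commits after each one.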

I wanted to test Oracle XE on Ubuntu, but did not get either installed (the eMachine did not like the Ubuntu CD, and Oracle's web-site was unresponsive), so I went with Postgresql 8.3 and MySQL 5 (InnoDB) instead. The databases were running on Windows XP, both fresh installs using the default settings, accessed from the Java test program on a Mac mini over the local Ethernet network.

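| Postgresql | run0 [ms] | run1 [ms] | run2 [ms] | run3 [ms] | updates/sec |
|---|---|---|---|---|---|
| A | 2013 | 1945 | 1755 | 1809 | 545 |
| B | 2088 | 1791 | 1875 | 1731 | 556 |
| C | 1667 | 1729 | 1658 | 1714 | 588 |
| D | 1213 | 1198 | 1179 | 1169 | 846 |
| E | 769 | 780 | 767 | 766 | 1297 |

| MySQL | run0 [ms] | run1 [ms] | run2 [ms] | run3 [ms] | updates/sec |
|---|---|---|---|---|---|
| A | 9479 | 9379 | 9299 | 9479 | 107 |
| B | 9382 | 9357 | 9264 | 9274 | 108 |
| C | 9314 | 9371 | 9394 | 9222 | 107 |
| D | 632 | 617 | 674 | 641 | 1553 |
| E | 650 | 650 | 613 | 634 | 1581 |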
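The post does not say how the updates/sec column is derived. The figures are consistent with dividing the 1000 updates by the mean of run1 to run3, treating run0 as a warm-up, so the sketch below assumes that; the class and method names are made up.

```java
final class Throughput {
    /**
     * Updates per second from a series of run times in ms.
     * Assumption: the first run (run0) is a warm-up and is left out of the mean.
     */
    static long updatesPerSecond(int updatesPerRun, int... runMillis) {
        if (runMillis.length < 2) {
            throw new IllegalArgumentException("need a warm-up run plus at least one timed run");
        }
        double sum = 0;
        for (int i = 1; i < runMillis.length; i++) {
            sum += runMillis[i];
        }
        double meanMillis = sum / (runMillis.length - 1);
        return Math.round(updatesPerRun * 1000.0 / meanMillis);
    }
}
```

For example, `Throughput.updatesPerSecond(1000, 769, 780, 767, 766)` gives 1297, which matches the Postgresql figure for method E.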

  • Commits against MySQL are amazingly slow. I assume that this is a problem with my setup, or with Windows. This also probably only affects the transactional InnoDB backend.
  • With MySQL, there is no speed difference between methods A, B, and C, and hence no visible performance advantage to prepared statements. Maybe the JDBC driver does not implement the feature. With Postgresql it seems to improve throughput, but not by much. The Oracle figures should be interesting here.
  • Committing only once instead of separately after every update makes a big difference, especially with MySQL (see above). Of course, performance considerations should not be a factor in deciding what a transaction is.
  • Bulk updates give another big boost to Postgresql, not so much to MySQL.

Wed, 05 Mar 2008

From X to O: Price

The XO-1 was supposed to be the $100 Laptop, but unfortunately went over budget and in its current version costs $188. The OLPC project hopes that an increased output combined with price drops in its off-the-shelf components will bring production costs down enough to reach $100 by the end of this year. Considering that OLPC mainly targets the world's poorest countries, and that for example Intel sells more conventional computers (together with Windows licenses, training, and support) starting at less than $300 in these markets, the price tag could easily become the decisive factor for OLPC's success, regardless of the educational and social concepts that they also have to offer.

Sun, 02 Mar 2008

The Golden Compass

Movie poster

Whether the adaptation of Philip Pullman's fantasy novels will become the trilogy it is clearly intended to be will depend on the financial success of this first part. Box office results in North America have been disappointing, probably because the Catholic League called for a boycott, but overseas performance has been solid. It seems to be up to Japan now.

7 points (ahead of Narnia, slightly ahead of Potter).

Sat, 01 Mar 2008

AVCHD

We have a video camera (Panasonic HDC-SD5) shooting in Full HD resolution, which is more of a down payment towards a future home entertainment system than anything we can really use right now (nothing in the house can play back at a 1920x1080 resolution). So for now I am just left with these huge files that take up lots of disk space to store and ages of CPU cycles to process.

The camera records in AVCHD, which is a highly compressed format that still takes about 1GB per 10 minutes. Things get worse when importing them into iMovie, because iMovie insists on decompressing the files, resulting in 1GB per minute, and the decompression is painfully CPU-intensive, running at about real-time. I want to store the movies exclusively in their native AVCHD until I feel like actually editing them, but there seems to be no viewer application for that format (which is weird, as iMovie's import wizard can preview them). At least I can copy them off the camera onto the hard disk to free up the memory card.