Saturday 21 July 2007

Flex VS Silverlight - another shot in the war

Another interesting post by Kurt Cagle on The Open-Sourcing of FLEX (no, I'm not paid to shill for Kurt - even if we do live in the same town). I've been wondering how the Flash/Silverlight war was going to evolve, but I'd missed this turn in the strategy. Kurt has some interesting points:
  • Open Source is a weapon to gain market share
  • Does this promote the slow death of SVG? And will Mozilla team up with Adobe to promote Flex? (If so, maybe this answers the question of how the OS community will mount a reply to Silverlight)
  • Does OS-ing Flex enhance Adobe's ability to improve the technology with feedback from OS adopters?
The battle for the client UI is going to be very interesting to watch, and the outcome is going to affect the way we design applications for a long time to come. I'd give the edge to MS in this one (much as I hate to say it) except for one thing - will they sacrifice lock-in to the Windows platform in order to ensure dominance for their client platform? That's a hard one to call...

Friday 20 July 2007

Where is XML going?

Kurt Cagle has an interesting post on "Where is XML going".

One of his bold predictions is that SQL databases (by which I think he means relational DBs) are going to plateau and slowly be replaced by XML-based DBs. I find that a fascinating possibility. Certainly the emergence of XQuery finally provides a decent query language for XML (and it's interesting to note that it draws from or at least parallels many SQL concepts).

One question I have is whether the XML information model is in fact more expressive than the relational model. It certainly seems so, since you can easily represent a table in XML, but the converse is not so simple. XML carries implicit information about hierarchy, which must be supplied explicitly (via keys and joins) in a relational model. It seems like there should be some sort of information-content metric which could be used to determine which model is more powerful. Although perhaps the human factors are in the end more important.

Also, to me it feels like XML suffers from too much concern with syntax (whereas relational is a pretty pure data model). There's also the cruft of elements versus attributes, processing instructions, etc. All of these complicate the XML model without bringing much more data expressiveness. Perhaps simple use patterns will emerge and we'll all just get used to ignoring this stuff (but it seems to me that's been said about SOAP as well, and look how successful that's been...)

Wednesday 18 July 2007

Do mountains exist?

I came across this provocatively-titled paper while web-surfing for information on ontology. It's a fun read, if you're into that kind of stuff. I don't think the paper really tackles its thesis head-on, but it does raise some interesting points about the philosophy of geospatial data, such as:
  • What is the real definition of a mountain? Sure, we all know one when we see one - or do we? Go ahead - try and define this concept in a rigorous way...
  • There is a long-standing schism between feature-based and field-based views of natural phenomena. Both are useful, in different contexts. (This seems to me to echo the particle-wave duality of quantum mechanics - is there possibly some sort of fundamental dichotomy going on here?)
  • Every so often this kind of debate surfaces in the geospatial world (a recent example is this thread on the GeoWeb blog). But we are as babes in the wood compared to philosophers, who have been debating this topic for at least 2000 years. Fortunately, it looks like the kindergarten of computer science is beginning to engage in a useful dialogue with the fusty temples of philosophy.

Thursday 12 July 2007

The ultimate Point-In-Polygon implementation

The classic stabbing-line/ray-crossing point-in-polygon algorithm has been around for longer than computational geometry itself. (See William Franklin and SoftSurfer for background and description.) The algorithm as usually published is short, effective and clever (a minimal sketch appears after the list below). But it does have some limitations:
  1. It is implemented as an iteration over a data structure representing a sequence of line segments. This ties the algorithm to a fixed data structure, which leads to code duplication
  2. It handles simple rings, but not fully general polygons (which may contain holes and/or multiple shells)
  3. It does not detect the case where the query point lies on one of the segments in the polygon
  4. It runs in O(n) time. This is reasonable for a single query point, but is not the most efficient way to perform many point queries against a single fixed polygon
  5. It uses conventional floating-point arithmetic. This makes it non-robust (i.e. potentially incorrect for some inputs)
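For illustration, here is a minimal Java sketch of the classic test, essentially William Franklin's formulation. The class name and the parallel-array representation are mine, chosen just to keep the example self-contained - which itself demonstrates limitation 1, since the code is welded to one particular data structure:

    // Classic ray-crossing point-in-polygon test (Franklin-style).
    // The ring is given as parallel coordinate arrays, implicitly closed
    // (the last vertex connects back to the first).
    public class ClassicPointInPolygon {

        public static boolean contains(double px, double py, double[] xs, double[] ys) {
            boolean inside = false;
            int n = xs.length;
            for (int i = 0, j = n - 1; i < n; j = i++) {
                // Does segment (j, i) cross the horizontal ray extending
                // to the right of the query point?
                if (((ys[i] > py) != (ys[j] > py))
                        && (px < (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i]))
                    inside = !inside;   // each crossing flips the parity
            }
            return inside;
        }

        public static void main(String[] args) {
            // Unit square: (0.5, 0.5) is inside, (1.5, 0.5) is outside.
            double[] xs = { 0, 1, 1, 0 };
            double[] ys = { 0, 0, 1, 1 };
            System.out.println(contains(0.5, 0.5, xs, ys));   // true
            System.out.println(contains(1.5, 0.5, xs, ys));   // false
        }
    }

Note how the boundary case (limitation 3) is simply ignored, and the crossing test is done in plain floating-point arithmetic (limitation 5).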
This predicate forms a key component of many spatial processing algorithms. In JTS it's used all over the place. The more places it's used, the more the above limitations have become apparent. Over time I've been slowly chipping away at removing them.

Most recently, I've revisited P-I-P in the course of some work on improving the performance of spatial predicates. And I think I've finally arrived at the ultimate Point-In-Polygon implementation. It removes all of the above limitations with the following design features:
  1. It uses an incremental design, which frees the algorithm from being tied to any particular segment data structure
  2. It handles all polygonal geometry types defined in JTS. (Implementing this was made much cleaner by the pleasant realization that the PIP algorithm really doesn't care where in a polygonal geometry the segments come from. The only thing that matters is the parity of the segment crossings. This holds true for any number of holes and shells.)
  3. It detects the point-on-segment case, and reports it via a trivalent return value (in the set {INTERIOR, BOUNDARY, EXTERIOR}).
  4. It is straightforward to use with a spatial index on the segments of the polygon (such as STRtree or BinTree), thus providing O(log n) performance
  5. Thanks to the RobustDeterminant class in JTS, the implementation is much more robust. (Due to some remaining floating-point arithmetic it may still not be 100% robust - but I hope to address that sometime as well.)
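To give a flavour of the incremental design, here is a hedged sketch in Java. The class name, method names and bare-doubles coordinates are purely illustrative, not the actual JTS API (the real code builds on JTS's own Coordinate and RobustDeterminant classes). The point is the shape of the contract: the caller feeds in segments one at a time, from whatever data structure it happens to have, and asks for the trivalent location at the end:

    // Illustrative incremental crossing counter for a single query point.
    // Segments may be supplied in any order, from any shell or hole -
    // only the parity of the ray crossings matters.
    public class CrossingCounter {
        public static final int INTERIOR = 0, BOUNDARY = 1, EXTERIOR = 2;

        private final double px, py;          // the query point
        private int crossingCount = 0;
        private boolean pointOnSegment = false;

        public CrossingCounter(double px, double py) {
            this.px = px;
            this.py = py;
        }

        // Processes one segment of the polygonal geometry.
        public void countSegment(double x1, double y1, double x2, double y2) {
            if (pointOnSegment) return;
            // Boundary check: is the point collinear with, and within the
            // extent of, this segment? (In JTS a robust determinant would
            // be used for this orientation test; plain doubles here.)
            double det = (x2 - x1) * (py - y1) - (px - x1) * (y2 - y1);
            if (det == 0
                    && px >= Math.min(x1, x2) && px <= Math.max(x1, x2)
                    && py >= Math.min(y1, y2) && py <= Math.max(y1, y2)) {
                pointOnSegment = true;
                return;
            }
            // Count a crossing of the ray extending rightwards from the point.
            if ((y1 > py) != (y2 > py)) {
                double xIntersect = (x2 - x1) * (py - y1) / (y2 - y1) + x1;
                if (px < xIntersect) crossingCount++;
            }
        }

        // The trivalent result, available once all segments are counted.
        public int getLocation() {
            if (pointOnSegment) return BOUNDARY;
            return (crossingCount % 2 == 1) ? INTERIOR : EXTERIOR;
        }
    }

A client iterating over a coordinate array, a linked ring, or the segments returned by a spatial index query just calls countSegment for each one and getLocation at the end - which is exactly what makes features 1, 2 and 4 fall out naturally.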
Even with all these changes, it's not much more complex than the original algorithm (at least, given various supporting components which are already available in JTS).

Coming soon to a JTS version near you...

Monday 9 July 2007

Relations - the machine code for data?

In the rushing torrent of computer science, a few concepts stand like rocks which resist being swept away. One of the most familiar and useful is relational database technology. It took a relatively short time for this concept to be identified (in Codd's paper of 1970) and implemented (around 1975, in IBM's System R prototype, among others). Perhaps only 10 or 15 years elapsed between the earliest database systems and the emergence of commercially viable RDBMS's. For over 30 years since then RDBMS's have stood the test of time and repelled all challengers to their central role in data management.

A recent and serious challenger was Object database technology. In the 1990's this area looked poised to ride the gathering wave of object-oriented programming and sweep aside the fusty old RDBMS technology. Since then, while the OO wave has swept far up the beach, the promise of ODBMS seems to have been left behind in the receding foam, never having been fully realized.

So why is this?

I think one reason is that the relational paradigm could be a fundamental data structure for information representation. In a sense, relations are the machine code of data modelling - they can be used to represent any required data model. Granted, relational models aren't always elegant or efficient - but that simply confirms the metaphor. Moreover, their simplicity allows relational theory to be firmly grounded in mathematics and logic. (Of course, the relational hard-cores have been saying this for years - in some cases a bit too vociferously, IMHO. I'm just reluctantly coming round to thinking that they might be right.)

A further piece of evidence for this observation is that RDBMS vendors have a fine track record of adapting the core paradigm to accommodate new technical ideas. Queuing, security models, data analytics, and of course even (to an extent) object-orientation are some of the areas which have been implemented quite reasonably in terms of an underlying relational model.

ODBMS's, in contrast, have never quite seemed to develop a truly compelling and general paradigm. The various OO query languages don't seem to have the expressive power of SQL, and object data models seem to run up against thorny semantic issues (such as schema evolution and inheritance modelling) sooner than relational ones do. Granted, implementing an object-based data model using relational technology doesn't actually solve any of these issues, but somehow the maturity of relational techniques and tools makes them seem less painful.

Some people (myself included) have tended to see the rise of Object-Relational mapping technology as at best a temporary expedient, and at worst a string-and-baling-wire solution to cope with the OO-RDB impedance mismatch. Just give ODB's a few more years to mature, we hope, and we can do away with this inelegant hack! But if relational really is a fundamental concept, perhaps the quest for pure OO databases is misguided. Perhaps O-R mapping tools are the final goal, not just a stop-gap. They can be thought of as compilers for the data access layer (with all the opportunity for performance improvement and platform independence that the compilation paradigm provides).

This isn't to say that RDB technology has evolved as far as it needs to. I hope to see much more powerful querying capability, using concepts from the areas of logic and functional programming, machine reasoning and knowledge representation. All of these seem to be continuing to mature - I look forward to seeing them make the leap to practicality like OO did years ago.