Tuesday 13 January 2009

More on Database Thawing

I just saw this blog post in rebuttal to Martin Fowler's DatabaseThaw article. It's a typically well-thought out response from Typical Programmer. The short summary is that he rejects Fowler's criticism of using RDB's as the mechanism for application integration. (And don't dismiss him as simply a RDB bigot - he's not shy about nailing relational gurus, as in Relational Database Experts Jump The MapReduce Shark).

TP's arguments ring true to me, although in my experience it takes a very concientious DB shop to fully maintain the "encapsulation of data and operations" across many applications, many custodians, and many developers.

It would be great if Fowler's vision of the integration point being HTTP rather than SQL solved all our problems. Loose coupling is a wonderful thing to have. But so far I'm not seeing how this provides equivalent performance for all cases, and how it avoids the problem of data model mismatch.

Friday 9 January 2009

Is Global Database Warming happening?

Martin Fowler has a great blikiTM post on DatabaseThaw. The soundbite is something like "What happens after relational DBs?". He has some interesting links to things like Drizzle.

His opinon of relational DBs is that

their dominance is due less to their role in data management than their role in integration

This makes sense, but I think they also have offered the benefit of providing a fairly powerful, standard way of modelling and querying data. Over the lifetime of this technology this has fought off some powerful challengers, notably object-oriented data modelling. (Although perhaps this battle was only won by becoming more like the "loser"). The latest challenger is semantic data modelling (RDF, SPARQL, etc). Perhaps the world is ready to make this transition - although I'm not rushing to sell my Oracle stock.

He also makes a good point that the focus of integration is shifting from the DB to the Web. Hard to argue with this... the clouds are moving in!

Thursday 8 January 2009

Computing Geometric Similarity

The GEOS team is working on porting the JTS CascadedPolygonUnion algorithm. Reasonably enough they asked whether there were unit tests that they could use to check the correctness of their code. I was a bit dismayed to realize that in fact there weren't any unit tests for this class (although having used the code extensively I'm pretty confident that it's correct.)

So fine, some unit tests are needed. This should be easy, right? Just create or obtain some polygonal datasets, union them using both CascadedPolygonUnion (fast) and Geometry.union (slow), and compare the results.

But there's a catch (in geometric processing there usually is). Due to rounding effects, there's no guarantee that the results of the two algorithms will be vertex-by-vertex identical. Of course, they should be very, very similar... but how to check this?

In geometry processing it's usually much harder to test "similar" than it is to test "exact", and this case is no exception. In fact, there's many different tests that could be used, depending on how you want to define "similar". In the context of unit tests, similarity testing is essentially checking for discrepancies which might be introduced into a theoretically correct result. So the tests to use may depend on the operation(s) that are being performed, since different kinds of algorithms can produce different kinds of discrepancies. A further issue which needs to be considered is to detect gross differences which might result from catastrophic algorithm failure.

So now the task is to write a SimilarityValidator for geometries. For polygons there's various characteristics which can be compared:
  • A simple thing to do is to compare basic quantities like the area and perimeter length of the polygons. Paradoxically, these are good for detecting small discrepancies, but not for detecting gross ones (e.g. two identical rectangles which are offset by a large distance)
  • One obvious thing to do is to compute the symmetricDifference of the polygons and inspect the area of the result. This has a couple of issues. One is what area tolerance to use? A more serious issue is that the symmetric difference operation is itself an approximation to an exact answer. This is especially true in situations where the inputs are very close - which is exactly the situation we are trying to check! So there may be a certain lack of confidence that this is detecting all discrepancies
  • A more robust test is to compute the Hausdorff distance between the rings of the polygons. This allows expressing the tolerance in terms of distance, which is more intuitive. It's also fairly robust and easy to confirm the answer. There's a challenge in developing an efficient Hausdorff algorithm - but that's a topic for another blog post.
  • Other tests might include comparing structural things like comparing the number of rings in the polygons.


Hausdorff Distance between two polygons

With this validator written and some test datasets run I can finally confirm that CascadedPolygonUnion actually works!

Friday 2 January 2009

JTS Version 1.10 released

I'm happy to announce that after a long gestation JTS 1.10 has been released. (This is my contribution to help fulfill James Fee's prediction 8^)

The package is available on SourceForge here.

There are numerous enhancements in this version. The highlights are:
  • Dramatic improvements to buffer performance, robustness and quality
  • Moving the GML I/O classes into the core library, and provided more control over the emitted GML
  • numerous bug fixes and performance improvements.
The full list of changes is:

Functionality Improvements

  • Added Geometry.reverse() method for all geometry types
  • Added setSrsName, setNamespace, setCustomRootElements methods to GMLWriter
  • Added Envelope.getArea method
  • Added copy, copyCoord methods to CoordinateSequences
  • Added area method to Envelope
  • Added extractPoint(pt, offset) methods to LengthIndexedLine and LocationIndexedLine
  • Added CoordinatePrecisionReducerFilter
  • Added UnaryUnionOp(Collection, GeometryFactory) constructor to handle empty inputs more automatically
  • Added DiscreteHausdorffDistance class
  • Made LineMerger able to be called incrementally
  • Added GeometricShapeFactory.createArcPolygon to create a polygonal arc
  • Enhanced Geometry.buffer() to preserve SRID

Performance Improvements

  • Improved performance for EdgeList (by using a more efficient technique for detecting duplicate edges)
  • Improved performance for ByteArrayInStream (by avoiding use of java.io.ByteArrayInputStream)
  • Unrolled intersection computation in HCoordinate to avoid object allocation
  • Improved performance for buffering via better offset curve generation and simplification.
  • Improved performance for IsValipOp by switching to use STRtree for nested hole checking

Bug Fixes

  • Fixed Geometry.getClassSortIndex() to lazily initialize the sorted class list. This fixes a threading bug.
  • Fixed RectangleContains to return correct result for points on the boundary of the rectangle
  • Fixed error in com.vividsolutions.jts.simplify.LineSegmentIndex which caused polygons simplified using TopologyPreservingSimplifier to be invalid in certain situations
  • Fixed error in DouglasPeuckerSimplifier which caused empty polygons to be returned when they contained a very small hole
  • Fixed PackedCoordinateSequence to return NaN for null ordinate values
  • Fixed Geometry.centroid() (CentroidArea) so that it handles degenerate (zero-area) polygons
  • Fixed Geometry.buffer() (OffsetCurveBuilder) so that it handles JOIN_MITRE cases with nearly collinear lines correctly
  • Fixed GeometryFactory.toGeometry(Envelope) to return a CW polygon
  • Fixed UnaryUnionOp to correctly handle heterogeneous inputs with P/L/A components
  • Fixed UnaryUnionOp to accept LINEARRINGs
  • Fixed CentroidArea to handle zero-area polygons correctly
  • Fixed WKBWriter to always output 3D when requested, and to handle 2D PackedCoordinateSequences correctly in this case
  • Fixed NodedSegmentString to handle zero-length line segments correctly (via safeOctant)
  • Cleaned up code to remove unneeded CGAlgorithms objects
  • Fixed GeometricShapeFactory.createArc to ensure arc has requested number of vertices

API Changes

  • Moved GML I/O classes into core JTS codebase
  • Changed GMLWriter to not write the srsName attribute by default
  • In DistanceOp switched to using nearestPoints method names
  • Exposed STRtree.getRoot() method

JTS TestBuilder

UI Improvements

  • Added ability to read GML from input panel
  • Added GML output to View dialog
  • Added file drag'n'drop to Geometry Input text areas
  • Add display of computation time
  • Added Stats panel
  • Added Scalar functions panel, with extensible function list
  • Added Save as PNG...

JTS TestRunner

Functionality Improvements

  • Added -testCaseIndex command-line option