Thursday, November 20, 2014

Dynamic creation of mapping service

In one of our projects we received a huge amount of georeferenced plane survey sheets (see Fig. 1). Because of their thematic link we published them as a single time-enabled Web Map Service (WMS) and Web Coverage Service (WCS) layer. The main goal of the WMS and WCS was to support researchers and other stakeholders in using our map data within their local research environments (e.g. a desktop GIS). After a couple of months I got a call from a person at another state department. She was quite happy about our data and the possibility to consume it via GI services, but she had a hard time figuring out how to integrate the time-enabled services into their desktop GIS. I don't have a complete overview of how well the classical desktop GIS systems support time-enabled GI services, but it seems the support is still not the best. I managed to help her on the phone, but afterwards I thought about the drawbacks of the time-enabled service solution, which is basically its higher complexity.

Figure 1: Digital georeferenced plane survey sheet
The next morning I tried an alternative approach. Normally most people request only a single or a few plane survey sheets. Because we already use the UMN MapServer software for publishing our WMS and WCS, I wanted to stick with it. A single service in which every plane survey sheet is represented by its own layer was not a solution for me: it would bloat our mapfile with over 6,000 layers, making it slow and hard to manage. Producing more than 6,000 services with a single layer each was not a good option either, since that would leave me with a folder of over 6,000 mapfiles, which wasn't really appealing.

Finally I decided to use MapScript and developed a small script that checks whether a service already exists for a given service name and, if not, creates one from a template and information queried from a database. The script basically looks like this:

#!/usr/bin/env python

# This script parses the WMS service request and checks if a matching
# mapfile exists. If not, it creates one from a template.
# Dependencies: python-psycopg2, python-mapscript
# @author: jacmendt

import os
import psycopg2
from mapscript import mapObj, OWSRequest

# Configuration parameters (placeholder values)
MAPFILE_DIR = '/path/to/mapfiles'
MAPFILE_TEMPLATE = '/path/to/template.map'

# Utility functions (e.g. updateMapObj, which queries the database via
# psycopg2 for the extent and raster path of the given sheet and updates
# the cloned template accordingly) are omitted here for brevity.

# Parse the incoming OGC service request
wmsRequest = OWSRequest()
wmsRequest.loadParams()

# Read the 'map' parameter and check whether a matching mapfile exists
mapfileId = wmsRequest.getValueByName('map')
mapfilePath = os.path.join(MAPFILE_DIR, mapfileId + '.map')

if not os.path.exists(mapfilePath):
    # No mapfile yet: clone the template, update it with the sheet's
    # metadata from the database and save it for future requests
    wmsMap = mapObj(MAPFILE_TEMPLATE).clone()
    updateMapObj(wmsMap, mapfileId)
    wmsMap.save(mapfilePath)

# Initialize the mapfile and dispatch the request
wmsMap = mapObj(mapfilePath)
wmsMap.OWSDispatch(wmsRequest)

The complete script can be found on GitHub together with the used template file.
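For illustration, a client request that would trigger the script, and thereby the on-the-fly mapfile creation, might look like this. The endpoint and the sheet id are hypothetical placeholders, not our actual production values:

```python
from urllib.parse import urlencode

# Hypothetical CGI endpoint and sheet id -- placeholders for illustration.
endpoint = 'http://example.com/cgi-bin/mapservice'
params = {
    'map': 'sheet_0010',          # id of the requested plane survey sheet
    'SERVICE': 'WMS',
    'VERSION': '1.1.1',
    'REQUEST': 'GetCapabilities',
}
request_url = endpoint + '?' + urlencode(params)
print(request_url)
```

On the first such request the mapfile for the sheet is created from the template; all later requests are served from the saved mapfile.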

Saturday, August 11, 2012

What Should Your Data Do?

Object Oriented Design tells us that everything is an object. That's easy to grasp. It further tells us that objects communicate via messages issued against their public interfaces. Interacting objects represent context, data, function, interaction, types: in short, anything that a program has to know about. But that is only a way to think about the dynamic part of an object system. OOP tells us to think about design in an executable context.

Sadly, the way to create an executable context that maps to the intended object structure leads straight to the design of classes, which are by their very nature static. Having started with a coherent idea about the domain, the programmer now faces the challenge of deciding what classes to use and how to control their plumbing (prior to the days of IoC, of course). He now has to decide whether to use an abstract class, a static class, a trait or a case class, how the particular instantiation and public interface are shaped, and even how to instruct a persistence framework. And everything has mutable (shared) state.

OOP is not wrong in itself. We just sometimes take it too far. We do Class Oriented Programming. Think of a list of URLs. Which data type would you like? ArrayList? LinkedList? HashMap? SortedArrayList? Or ProjectsURLListWithFeatureX? The OOP idea conflates the structure of data with its related functions. So the question follows: what do you want your list to actually do? Uh, nothing, you know. Just be a list.

Defining functions on data structures is essential. But tying them to a particular data structure is only good as long as they merely preserve the integrity of the data. Beware!
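A minimal sketch of the point in Python (the URLs and function names are made up for illustration): the data stays a plain list, and the project-specific behavior lives in an ordinary function instead of a bespoke ProjectsURLListWithFeatureX class.

```python
# A plain list holds the data; nothing more is needed.
urls = [
    'https://example.org/a',
    'http://example.org/b',
    'https://example.org/c',
]

# Behavior lives in an ordinary function that accepts any sequence
# of URLs, instead of being baked into a list subclass.
def secure_only(urls):
    """Return only the https URLs."""
    return [u for u in urls if u.startswith('https://')]

print(secure_only(urls))  # the list stays 'just a list'
```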

Friday, February 3, 2012

Professional Hide and Seek

For Java developers, it's natural to use a lot of tools for the tasks that arise around writing code, assembling configuration files and generally handling all the moving parts of a Java application. If you lack those tools, you will very quickly find yourself fighting a losing battle against confusing machinery. The design of Java is inherently complex and specification based. Because of that, a Java solution is usually over-sized and over-engineered for clear and manageable problems. On the other hand, it might be of good use for large applications with a lot of processes and policies to govern large development teams. That's why Java still has such a good reputation amongst executives, whereas agile-minded developers tend to hate its style and initial complexity. It seems that 'javarized' developers love their tooling and IDEs. It's mostly beyond their imagination to tackle a simple programming problem with just a text editor and a compiler, or even to use a command line for that matter. I know some exceptions to that assumption, but to me it's a real cultural difference.

One problem of this tool fetish, however, has made its way out of the enterprise zone into agile web projects and other formerly lightweight architectures. Much of this is because web projects use Object Oriented Programming techniques just like the Java mindset demands. A new generation of developers has come out of school not knowing anything about functional programming and its mathematical foundations. Sometimes they don't know a single OOP design pattern. They are therefore doomed to repeat old-school C'ish OOP design failures over and over again, producing immensely large code bases of ensnared component mazes, manageable only through a complex suite of tools, where nobody but the vendors understands what is actually going on. And all of that happens in the name of efficiency! Developers shall not care too much about the complexity they're dealing with; just leave it to the tools. This is professional hide and seek!

We see this trend has been slowed down to some extent by the rise of functional languages and the desire for aesthetic design, but the devastation wrought by 'class oriented' languages disguised as object oriented remains. It's time to move on! In the end, we are writing code for people to read. Machines can handle anything. Always think that way.

Monday, December 5, 2011

Pure Event Programming

I am just thinking of a special kind of system and programming environment where everything is based on publish/subscribe events. This would be a purely event-driven system. Even code distribution would be an event, handled by an already running code handler that is automatically subscribed to such events. Processes are registered with the system and may hook up on specific event patterns. The core runtime system bootstraps and starts the particular process, and removes it when processing is done and all events have been handled.

As a basic principle, the system is always running. The system changes at run time. All bootstrapping and introduction of event matching rules happen at run time. There is no need for external configuration via config files. New code comes into the system as an event that the process manager is subscribed to. The implications of this are interesting. You could start an "empty" system (no registered processes). You could run "empty" programs (no matching rules). You could then watch all events fly by and program by example, refining your program as it runs. Just imagine how test-driven development could be taken to the extreme.

Clearly, it's a shared-nothing situation. All disposed memory objects get garbage collected. All data is shared via published events only. Those events may have temporal and locational properties like time_to_live, local_only, distributed or idempotent. Properties allow sophisticated matching rules like "match Y 2 seconds after X has been processed". Memory objects can have quality properties like temporal, persistent, frequently_used, rarely_used, distributed. Setting memory object properties would be the only way to choose a persistence strategy for objects. Persistent data is known and shared between all instances of a process.
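The pattern-matching core such a system might have can be sketched as a toy in a few lines of Python. All names, event shapes and the matching rule format below are invented for illustration; real temporal rules and distribution would of course need much more machinery:

```python
class EventBus:
    """Toy publish/subscribe core: handlers hook up on event patterns."""

    def __init__(self):
        self._rules = []  # list of (pattern, handler) pairs

    def subscribe(self, pattern, handler):
        # A pattern is a dict of required event properties, e.g.
        # {'type': 'new_code'}; the handler fires for every published
        # event whose properties include all of the pattern's entries.
        self._rules.append((pattern, handler))

    def publish(self, event):
        for pattern, handler in self._rules:
            if all(event.get(k) == v for k, v in pattern.items()):
                handler(event)

bus = EventBus()
seen = []

# Even code distribution is just an event a 'process manager' handles.
bus.subscribe({'type': 'new_code'}, lambda e: seen.append(e['module']))

bus.publish({'type': 'new_code', 'module': 'reports', 'time_to_live': 5})
bus.publish({'type': 'heartbeat'})  # no matching rule: flies by unhandled
print(seen)  # ['reports']
```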

Maybe there are some systems that already have parts of these characteristics, functional programming ones most likely. Again, I am just thinking...

Tuesday, October 11, 2011

Data vs. Behavior

When programming in popular object oriented languages, it's an ongoing struggle to design a good class hierarchy. Oftentimes things get messed up when classes are seen as mere functionality holders (or "method bags", if you like) with some private data. As we know, OOP encourages us to build classes that represent concrete things in the particular domain and to define methods that are applicable to those things. So far, so good. But are these methods implementing business functionality? Or aren't they just an interface to the data structure? I would make the case for the latter.
Object methods are just operations on a data structure; business methods usually go beyond that.

Objects as models of a concrete domain concept may play a lot of roles depending on the business use cases they serve. If the domain logic has only a few bounded contexts, a single type of model might be appropriate to serve all use cases. But this is usually not the case in bigger contexts, and one can easily find finer-grained scenarios even in a simple blogging web application. An application therefore has multiple, potentially overlapping models for different use cases. From an implementation perspective there are at minimum two models in a simple database-backed web application: the application model, often designed using the programming language's class types, and the relational model on the database side. Using non-relational databases merely diminishes the impedance mismatch a bit; it does not make either conceptual model go away entirely. By embedding persistence behavior into the models, they take on an additional technical responsibility totally unrelated to the domain. We have to acknowledge the presence of multiple models, otherwise "supermodels", a.k.a. god objects, will arise, providing representations, abstractions and functionality for each and every domain and non-domain use case that might appear within the application. In the end, data abstraction has been overdone.
Don't cram business methods and rules into data structures; they belong in different classes.

Implementing non-domain logic in domain models is a common pattern in OO programs, especially when the Fat Model metaphor has been applied faithfully. This is not new, and avoiding it often leads to the Data Mapper pattern. But domain-related computation doesn't have to happen in data models at all. In fact, there should be a code layer expressing business functionality as services operating on useful models that represent the data needed for a given domain operation. This way, OOP practices and a functional programming style can be used side by side, avoiding too many objects whose names end with 'er': worker, manager and so on.
Don't conceal language verbosity with clever architecture.
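A minimal sketch of that separation in Python (the invoice domain, names and tax rule are invented for illustration): the data model stays a plain structure, and the business rule lives in a service function operating on it.

```python
from dataclasses import dataclass

# The model is a plain data structure; its only job is to hold its data.
@dataclass
class Invoice:
    amount: float
    country: str

# The business rule lives in a service function operating on the model,
# not in the model itself.
def total_with_tax(invoice, tax_rates):
    """Apply the country-specific tax rate to an invoice amount."""
    rate = tax_rates.get(invoice.country, 0.0)
    return round(invoice.amount * (1.0 + rate), 2)

rates = {'DE': 0.19, 'FR': 0.20}
print(total_with_tax(Invoice(amount=100.0, country='DE'), rates))  # 119.0
```

The Invoice class neither knows about tax rules nor about persistence; both can change, or be swapped per use case, without touching the data structure.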

Is polymorphism really natural to business methods? Doesn't it apply more naturally to domain models? Well-known OOP techniques still have a lot of benefits when it comes to code design. But class hierarchies mapping functionality are often quite different from domain model hierarchies (e.g. validation vs. persistence). Property-handling logic, for example, is often messy, verbose, boring to code and laborious to maintain in most languages. But it's clear and understandable, like all boring straightforward code. Clever architecture, by contrast, forces the reader to understand and dive deep into a set of abstractions that is uncommon amongst developers and often not even related to the application domain anyway. But that's part of a bigger story :-)

Friday, October 7, 2011

Life Library Data

From a programmer's perspective, a library is a collection of functions, called an API, addressing very specific computational topics. There is often comprehensive documentation on how to use those functions, and there are forums and blog posts on the web that help to build an understanding of a library's internal workings. Is this metaphor still related to real life?

Imagine the first thing you would think of if you wanted an overview of the data referred to in current political debate in your local community or town. Where would you look for it? At a library website, or rather in a local newspaper? If I want to know what is happening in my community or at my university, Google might be the best place to start, provided the information I am looking for is relevant according to my Google profile and has been published on the web at all.

Newspapers usually keep us well informed about topics of general interest. Journalists are the people behind those stories. But is the data they use kept secret in some licensed agency data set somewhere? Maybe. How about the community? How do they keep themselves informed? Well, there are plenty of resources out on the web, on television, on the radio. They also communicate at the places where they meet, be it on the street, on Facebook or at the library.

People at the library are often very well informed about cultural and political events. They actively contribute to local life as citizens, and thus they know, and might be able to share, bits of information to leverage communication. It's common sense amongst librarians that people don't really come for books, magazine orders or a laptop power supply. What they are looking for is knowledge (which oftentimes happens to be found in books) and an atmosphere of work, communication and collaboration. That's what takes them to the library building physically. But information is able to flow freely over the internet, so people just don't care or know much about data curation. Libraries are aware of this to some extent, but, despite deploying shiny websites, they are stuck in old terms of curation and preservation, now merely extended to digital affairs.

But libraries should not just collect and conserve knowledge; they should also gather online and open access data. Further, they should not just collect data but generate and publish open data themselves. They should find, encourage and help people who are willing to contribute data and conversation! And finally, libraries should develop and provide expertise in data visualization and evolve into data organizations, perhaps approaching linked open data principles as a technical benefit. I think this would help to connect a library to the people, and to the life outside.

Monday, December 20, 2010

No comments please

Commenting source code is usually considered good programming practice. And well, it is. Code without comments leaves the reader with a single option to gain understanding: reading all of it. Often somebody who is left with the code to put more features into it gets lost in details and isn't able to get an overview of the design ideas. Comments might assist here. As I personally don't want anyone to have to wade through every bit of my programs, I tended to comment *everything*. Having a comment-to-code ratio close to 80% comments became an incentive in my programming practice.

But writing comments at such a high rate generated quite a bit of extra effort, both in writing programs and in maintaining them. Plus, a lot of the comments I wrote seemed redundant and uninformative, especially when describing data structures. But despite this feeling, I continued commenting like mad in all the following projects. I kept commenting. It was a dogma.

One day, after another insane amount of commenting effort, I stopped for a short review, asking myself: what is the return on investment of commenting that much? Seriously, comments saved my life when I had to get back to older (>2 weeks) code. Reading comments instead of the real code, which is obviously harder to grasp than natural language, has been a pleasure. But there were still lots of comments in places where I felt there was no good reason for having them, besides sticking to my coding guidelines. No additional meaning. No clarification. No explanation of intent.

Then lately, reading Clean Code: A Handbook of Agile Software Craftsmanship, I found validation of my personal findings. There are some rules of good measure about commenting, as there are with coding, testing and so on. Consider the following example code, where most of the comments are absolutely redundant:

/**
 * Represents a circle with central point and radius.
 */
public class Circle implements Shape {

    /**
     * Center point of the circle.
     */
    private Point center;

    /**
     * Radius of the circle.
     */
    private Float radius;

    /**
     * Create a circle.
     * @param center Point that defines the center
     *               of the circle.
     * @param radius Value that defines the radius
     *               of the circle.
     */
    public Circle(Point center, Float radius) {
        this.center = center.clone();
        this.radius = radius;
    }
}

It's OK to strip those comments out and still preserve the expression of intent, because the code is that simple and easy to follow. As long as the interface is clear and unambiguous, comments are just clutter here. Note that the comments on private elements are not part of the public API at all! They are left for the next developer touching the code. Of course, there are more complex situations in real programs, but the point is still valid there: prefer a clear API over extensive commenting.