Chapter 1.
I have been writing (and there's a lot to do):
This is not a book of answers. It’s a book full of questions – hard-won observations about enterprise-class software and its implementation. My intent is to put some new thinking in front of you, the reader. I would never claim to have invented many of the techniques that you will read about, but I do hope that this book brings some of them together for the first time.
The keywords I relied on professionally used to include reliability, encapsulation, normalization, and so forth. These are still great words! We need what they imply. But there are more words that are important – resilience, partial information, self-repair, aspects, deployment… Enterprise-class software needs these too.
I’ve spent 20 years following the winding evolution of programming. Like most programmers my age, I began my learning process playing with early microcomputers like the Apple II and the Atari 800. We have come a long way since POKE 101, 25. Computer languages have sparked and grown in an ecosphere full of needs. Assembly language still has its place today, but for most uses there are easier and more appropriate dialects to work in. We’ve seen evolution from assembly to procedural languages, from structured to object-oriented, and a myriad of functional and logic languages. Each has its place, its strengths and weaknesses.
As of 2003, most applications are constructed with an object-oriented basis. A number of mature methodologies exist for creating object models; they have produced some excellent results. One of the noted strengths of good object-oriented design is the ease with which change can be accommodated by altering the model. Change is at the heart of almost every software enterprise. Any software that is actually used has constantly evolving requirements.
We also find a growing trend towards special-purpose interfaces. Rarely is software constructed in a vacuum: we code to the reality of the internet, to the sharing of data, and to many users. With so much information available, we often wish to focus what is shown to a particular class of user. Specialized interfaces meet this need, deployed against particular platforms such as web browsers, Java, or a traditional client. We also see hybrid user interfaces gaining in popularity, where the best of traditional user interface design is coupled with web capabilities.
The only constant in all of this is change. New methodologies, generally referred to as Agile Development, have arisen to help manage change within the development process. Object-orientation in general is capable of doing a good job of handling change at the design level. We can refactor a design to handle new tasks, achieve better organization, expand on capabilities, and so forth. This represents a significant evolution from the early days of programming, where applications were excessively brittle and hard to evolve once written.
Modern object-oriented programming environments make it easier to deploy these applications. The various incarnations of the Java platform and of Microsoft’s .NET architecture are prime examples. These systems come with vast libraries of functionality, and many tools to effect the sophisticated deployment of applications. Java achieves a high level of portability as well.
For all this coding and development flexibility, we have made little progress in creating applications that are flexible at runtime. We need applications that can evolve, adapt, and even reason. We need to structure our information in a way that allows this complex processing to take place. At the same time, we don’t want to lose the structuring capability that a traditional object model provides. Object models still give us a good mechanism for dealing with some of the real-world things we want to model, and for making the result comprehensible.
What we want to create is a blend of conventional application design and techniques from the worlds of knowledge management and artificial intelligence. We want a common way of expressing this information, of transforming it, of querying it, and of displaying it. Aha, you may think at this point, this sounds like XML! XML is an interesting way of expressing information, but it is difficult to work with and not really appropriate for the kinds of applications we want to construct. As the specifications stand, XML has problems with scale and querying – very few tools can really deal with large amounts of XML (say, gigabytes’ worth).
There’s something more fundamental than XML’s tree of information.
Instead, we want to break things down to the basics. Statements (or facts) are at the base of the information pyramid in computer science. We can construct complex entities by binding together statements. Consider the following:
Ross has a last name of Judson.
Here we indicate that a person named Ross has a certain last name. If we decide to be a little more “computery” about this, we might rephrase as follows:
Ross has_last_name Judson.
There are three symbols involved here: Ross, has_last_name, and Judson. Symbols are extremely useful; we’ll be delving into them extensively later on, and talking about efficient ways to implement them and work with them. Our symbol-based example doesn’t quite express what we want, though. Let’s add the following:
Ross is_a Person.
Now we know that Ross has a last name of Judson, and Ross is a person. The examples given so far are in a triple form. Triples are very useful things; they form the basis of RDF, an important standard we will discuss later. A triple is a special form of a tuple.
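To make the triple form concrete, here is a minimal sketch in Java of statements held as subject, predicate, and object, with a pattern match over them. The names TripleSketch, Triple, add, and match are my own inventions for illustration, and a flat list scanned linearly is an assumption made for brevity; this is not RDF, nor the efficient symbol handling promised for later.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical.
final class TripleSketch {

    /** One statement: subject, predicate, object. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object + ".";
        }
    }

    private final List<Triple> statements = new ArrayList<>();

    /** Records a new statement. */
    void add(String subject, String predicate, String object) {
        statements.add(new Triple(subject, predicate, object));
    }

    /** Returns the statements matching a pattern; null acts as a wildcard. */
    List<Triple> match(String subject, String predicate, String object) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : statements) {
            if ((subject == null || subject.equals(t.subject))
                    && (predicate == null || predicate.equals(t.predicate))
                    && (object == null || object.equals(t.object))) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TripleSketch kb = new TripleSketch();
        kb.add("Ross", "has_last_name", "Judson");
        kb.add("Ross", "is_a", "Person");

        // Ask for everything we know about Ross.
        for (Triple t : kb.match("Ross", null, null)) {
            System.out.println(t);
        }
    }
}
```

Leaving a position null asks "what fills this slot?", which is about the simplest query a collection of statements can answer.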
These atomic statements can be used to construct just about anything. Consider:
Employee subclass_of Person.
Here we have declared that employee is a subclass of person. In a conventional object language, we might do something like the following:
class Person {
    String name;
}
class Employee extends Person {
    String department;
}
What I am trying to convey here is that for any given object-oriented program, we can encode both the data and metadata of that program in simple statements. There can be any number of such encodings; RDFS is one example that we will discuss at length later. Symbols and other identifiers can be used to join the statements together into a complex model. Concepts such as generalization, aggregation, and association can easily be mapped into statements. Once we have done this, we have created a fully fluid metamodel. A fluid metamodel easily copes with change at runtime: we simply alter the statements that define it. This characteristic is key to the flexible applications we seek to develop.
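As a rough illustration of what "fluid" means here, the sketch below keeps both data (Ross is_a Employee) and metadata (Employee subclass_of Person) as statements, and answers type questions by walking them. The class FluidModelSketch and its methods are hypothetical names of mine, and storing each statement as a three-element list is just an assumption to keep the example short.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: all names here are hypothetical.
final class FluidModelSketch {

    // Each statement is a [subject, predicate, object] list.
    private final Set<List<String>> statements = new HashSet<>();

    void assertFact(String s, String p, String o) {
        statements.add(Arrays.asList(s, p, o));
    }

    void retract(String s, String p, String o) {
        statements.remove(Arrays.asList(s, p, o));
    }

    /** True if type equals ancestor, or reaches it through subclass_of. */
    boolean isSubclassOf(String type, String ancestor) {
        return isSubclassOf(type, ancestor, new HashSet<String>());
    }

    private boolean isSubclassOf(String type, String ancestor, Set<String> seen) {
        if (type.equals(ancestor)) {
            return true;
        }
        if (!seen.add(type)) {
            return false; // already visited: guards against cyclic statements
        }
        for (List<String> st : statements) {
            if (st.get(0).equals(type)
                    && st.get(1).equals("subclass_of")
                    && isSubclassOf(st.get(2), ancestor, seen)) {
                return true;
            }
        }
        return false;
    }

    /** True if the individual is_a some type that is (a subclass of) type. */
    boolean isInstanceOf(String individual, String type) {
        for (List<String> st : statements) {
            if (st.get(0).equals(individual)
                    && st.get(1).equals("is_a")
                    && isSubclassOf(st.get(2), type)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FluidModelSketch kb = new FluidModelSketch();
        kb.assertFact("Employee", "subclass_of", "Person");
        kb.assertFact("Ross", "is_a", "Employee");
        System.out.println(kb.isInstanceOf("Ross", "Person")); // true
        System.out.println(kb.isInstanceOf("Ross", "Agent"));  // false

        // Change the metamodel while the program is running.
        kb.assertFact("Person", "subclass_of", "Agent");
        System.out.println(kb.isInstanceOf("Ross", "Agent"));  // true
    }
}
```

Nothing was recompiled between the second and third questions; one added statement changed the class hierarchy. "Agent" is an invented class used only to show the change.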
Conventional object-oriented languages execute a compilation phase that creates a runtime version of the application’s metamodel. This information is sometimes available, at least in part, to programs written in the language. A Java program can, for example, elicit a good deal of information about itself from the runtime system through reflection. Most C++ programs cannot, though – the standard C++ run-time type information is primitive at best, and a C++ program is not privy to most of what the compiler knows about it.
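For instance, a Java program can ask the runtime about the Person and Employee classes shown earlier and print what it learns as statements. The sketch below uses the standard java.lang.reflect API; the wrapper class ReflectionSketch, the describe method, and the has_field predicate are my own hypothetical names.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: describe() and has_field are hypothetical names.
final class ReflectionSketch {

    static class Person {
        String name;
    }

    static class Employee extends Person {
        String department;
    }

    /** Renders what the runtime knows about a class as simple statements. */
    static List<String> describe(Class<?> c) {
        List<String> out = new ArrayList<>();
        Class<?> parent = c.getSuperclass();
        if (parent != null && parent != Object.class) {
            out.add(c.getSimpleName() + " subclass_of " + parent.getSimpleName() + ".");
        }
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue; // skip compiler- or tool-generated fields
            }
            out.add(c.getSimpleName() + " has_field " + f.getName() + ".");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String statement : describe(Employee.class)) {
            System.out.println(statement);
        }
        for (String statement : describe(Person.class)) {
            System.out.println(statement);
        }
    }
}
```

The information is there, but it is read-only: the program can inspect its metamodel, not rewrite it while running.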
Instead of creating an object model, we create a knowledge base. We structure the knowledge base with statements that correspond to object-oriented concepts. Knowledge bases are highly amenable to logic processing with systems such as Prolog (for backward chaining) and CLIPS (for forward chaining). Using these tools we can easily react to change in our knowledge base (run rules), and also perform a significant level of ad-hoc querying. Because we are expressing our data and metamodel in the same way, we can easily react to changes in our metamodel, as well as changes in the data.
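To give a flavor of forward chaining without bringing in CLIPS itself, here is a toy sketch with one hard-coded rule: if X is_a C and C subclass_of D, then X is_a D. It keeps firing until no new statements appear. The class ForwardChainSketch and the method saturate are hypothetical names, and a real rules engine matches far more efficiently than this nested loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: a single built-in rule, not a rules engine.
final class ForwardChainSketch {

    static List<String> fact(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    /** Applies the rule repeatedly until no new statements are derived. */
    static Set<List<String>> saturate(Set<List<String>> facts) {
        Set<List<String>> kb = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Collect conclusions first so we never modify kb while iterating it.
            List<List<String>> derived = new ArrayList<>();
            for (List<String> a : kb) {
                if (!a.get(1).equals("is_a")) {
                    continue;
                }
                for (List<String> b : kb) {
                    if (b.get(1).equals("subclass_of") && b.get(0).equals(a.get(2))) {
                        derived.add(fact(a.get(0), "is_a", b.get(2)));
                    }
                }
            }
            for (List<String> d : derived) {
                if (kb.add(d)) {
                    changed = true; // something new was learned; run another pass
                }
            }
        }
        return kb;
    }

    public static void main(String[] args) {
        Set<List<String>> facts = new HashSet<>();
        facts.add(fact("Employee", "subclass_of", "Person"));
        facts.add(fact("Ross", "is_a", "Employee"));

        Set<List<String>> kb = saturate(facts);
        System.out.println(kb.contains(fact("Ross", "is_a", "Person"))); // true
    }
}
```

Because the rule reads metadata and data alike, adding a new subclass_of statement and running it again produces new conclusions with no code changes.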
As important as a flexible metamodel is, it is equally important to be able to distribute that knowledge and work with it in applications that span large areas, geographically or otherwise. When we condense our metamodel and model down into statements, we only need to develop mechanisms that can distribute the statement information. Once we have successfully done that, all our knowledge and metamodels are easily transferred. Distributed propagation of a complex object model is very difficult and error-prone. Numerous point mechanisms are usually constructed, and the “hidden corners” are what bite back in the deployment of such a system. We choose instead to propagate our statements, and the edits to those statements. This is a substantially easier problem, as a quick recitation of Linda tuplespace history will show (covered later).
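The idea of propagating edits rather than objects can be sketched very simply: an edit either adds or removes one statement, and any replica that applies the same edits in the same order ends up with the same knowledge. Everything named below (EditLogSketch, Edit, Replica) is hypothetical, and the sketch assumes ordered, reliable delivery; it ignores networking and conflicts entirely and is not a tuplespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: assumes edits arrive in order, with no conflicts.
final class EditLogSketch {

    enum Kind { ADD, REMOVE }

    /** One change to the knowledge base: add or remove a single statement. */
    static final class Edit {
        final Kind kind;
        final List<String> statement;

        Edit(Kind kind, String s, String p, String o) {
            this.kind = kind;
            this.statement = Arrays.asList(s, p, o);
        }
    }

    /** A copy of the knowledge base that is kept current by applying edits. */
    static final class Replica {
        final Set<List<String>> statements = new HashSet<>();

        void apply(Edit e) {
            if (e.kind == Kind.ADD) {
                statements.add(e.statement);
            } else {
                statements.remove(e.statement);
            }
        }

        void applyAll(List<Edit> log) {
            for (Edit e : log) {
                apply(e);
            }
        }
    }

    public static void main(String[] args) {
        List<Edit> log = new ArrayList<>();
        log.add(new Edit(Kind.ADD, "Ross", "has_last_name", "Judson"));
        log.add(new Edit(Kind.ADD, "Ross", "is_a", "Person"));
        log.add(new Edit(Kind.REMOVE, "Ross", "is_a", "Person"));
        log.add(new Edit(Kind.ADD, "Ross", "is_a", "Employee"));

        Replica here = new Replica();
        Replica there = new Replica();
        here.applyAll(log);
        there.applyAll(log);

        System.out.println(here.statements.equals(there.statements)); // true
    }
}
```

The only thing that ever crosses the wire is a small, uniform record, which is why there are so few hidden corners compared with shipping object graphs around.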
Finally, straightforward mappings of knowledge to XML are available (RDF is one such mapping). Once the knowledge base has been queried and transformed into XML, a whole host of capabilities become available for generating pages and other kinds of displays that can be presented to a user (in XUL, HTML or XHTML form).
We do not need to transform into XML before presenting to a user, though – we can create widgets that are able, through adapters, to interoperate directly with the knowledge base the application contains. When a user works with the application, we make appropriate changes to the knowledge base. Our rules engines and other user interfaces react appropriately.
Ted Nelson (of hypertext fame) said this:
“Intertwingularity is not generally acknowledged -- people keep pretending they can make things deeply hierarchical, categorizable and sequential when they can't. Everything is deeply intertwingled.”
I can’t vouch for the presence of the word intertwingularity in a dictionary, but it should be there! What I am trying to do in this book is describe some of the ways of creating this intertwingularity and working with it to create real applications. At the least, I hope to change the way you view object models, and open your mind to the simpler and more powerful techniques that pure knowledge representations offer.
In other words, everything you know is wrong. Well, not everything, but enough of it that I hope you will have at least a few lightning bolts strike in your mind as you engage it on what follows…I know that as I learned of these ideas for the first time a few brain cells were singed.