I am going to implement a publish/subscribe mechanism for our recent Web-based research prototype, so I am interested in what others have already thought about this pattern. Searching for the pub/sub paradigm, Google returns the corresponding Wikipedia article in first place. Skimming the article, a few starting points are worth remembering:
- Pub/sub is a sibling of the message queue paradigm
- Subscribers typically receive only a sub-set of the total messages published; selecting messages for reception is called filtering
- In topic-based systems messages are published to topics or named logical channels
- In content-based systems, attributes or the content of messages must match constraints defined by the subscriber
- In hybrid systems, publishers post messages to topics, while subscribers register content-based subscriptions on one or more topics
- Brokers might be used to maintain subscriptions, store and forward messages and perform the filtering
- The first time a pub/sub mechanism was described was in Exploiting Virtual Synchrony in Distributed Systems [pdf] by K. Birman and T. Joseph.
- Publishers and subscribers remain ignorant of the system topology; publishers don’t know about the existence of any subscribers; this makes it possible to build a loosely coupled system
- Scalability for pub/sub under high load in large deployments currently remains a research question
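To make the difference between topic-based and hybrid filtering more tangible for myself, here is a minimal in-process sketch in Python. The `Broker` class, its method names and the "quotes" topic are my own assumptions for illustration and do not come from the Wikipedia article.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]
Handler = Callable[[Message], None]
Predicate = Callable[[Message], bool]


class Broker:
    """Hypothetical in-process broker: keeps subscriptions and performs the filtering."""

    def __init__(self) -> None:
        # topic name -> list of (predicate, handler) pairs
        self._subscriptions: Dict[str, List[Tuple[Predicate, Handler]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler,
                  predicate: Predicate = lambda message: True) -> None:
        # Topic-based: the default predicate accepts every message on the topic.
        # Hybrid: a custom predicate adds a content-based constraint on that topic.
        self._subscriptions[topic].append((predicate, handler))

    def publish(self, topic: str, message: Message) -> int:
        # The publisher names only the topic; it never sees the subscribers.
        delivered = 0
        for predicate, handler in list(self._subscriptions.get(topic, [])):
            if predicate(message):
                handler(message)
                delivered += 1
        return delivered


broker = Broker()
all_quotes: List[Message] = []
expensive_quotes: List[Message] = []

broker.subscribe("quotes", all_quotes.append)
broker.subscribe("quotes", expensive_quotes.append,
                 predicate=lambda m: m["price"] > 100)

broker.publish("quotes", {"symbol": "ABC", "price": 42})
broker.publish("quotes", {"symbol": "XYZ", "price": 250})
```

The first subscriber receives both messages, the second one only the message that satisfies its constraint. A real broker would additionally store and forward messages, which this sketch leaves out.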
Looking for more information on the Web, I came across the Publish/Subscribe integration pattern from Microsoft’s patterns & practices. The problem statement seems reasonable:
- How can an application in an integration architecture only send messages to the applications that are interested in receiving the messages without knowing the identities of the receivers?
In the context description, the following communication infrastructures are mentioned:
- Bus
- Broker
- Point-to-Point
In contrast to the Wikipedia article, we learn about three different types of mechanisms:
- List-based Publish/Subscribe
- Broadcast-based Publish/Subscribe
- Content-based Publish/Subscribe
Having a closer look, the List-based Publish/Subscribe mechanism maintains a list of subscribers, similar to the Observer pattern. Attach() and Detach() operations modify the list of subscribers, while a Notify() operation sends updates to the subscribers. This seems well suited if you have one publisher and many subscribers, but does not look suitable if subscribers watch many subjects. The core functionality of List-based Publish/Subscribe can thus be identified as
- The publisher maintains a list of all subscribers
- The publisher notifies each one individually
If we understand subscription lists as named channels, List-based Publish/Subscribe represents a topic-based subscription mechanism.
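A minimal sketch of the list-based variant in Python could look as follows. The class name `ListBasedPublisher` and the lower-case method names are my own choice; only the Attach/Detach/Notify idea is taken from the pattern description.

```python
from typing import Callable, List

Subscriber = Callable[[str], None]


class ListBasedPublisher:
    """The publisher itself maintains the list of subscribers (Observer style)."""

    def __init__(self) -> None:
        self._subscribers: List[Subscriber] = []

    def attach(self, subscriber: Subscriber) -> None:
        # Add a subscriber once; repeated attach calls are ignored.
        if subscriber not in self._subscribers:
            self._subscribers.append(subscriber)

    def detach(self, subscriber: Subscriber) -> None:
        # Remove a subscriber if it is currently on the list.
        if subscriber in self._subscribers:
            self._subscribers.remove(subscriber)

    def notify(self, update: str) -> None:
        # Notify each subscriber individually; iterate over a copy so that
        # a subscriber may detach itself while being notified.
        for subscriber in list(self._subscribers):
            subscriber(update)


log_a: List[str] = []
log_b: List[str] = []


def subscriber_a(update: str) -> None:
    log_a.append(update)


def subscriber_b(update: str) -> None:
    log_b.append(update)


publisher = ListBasedPublisher()
publisher.attach(subscriber_a)
publisher.attach(subscriber_b)
publisher.notify("v1")
publisher.detach(subscriber_b)
publisher.notify("v2")
```

After the second notification only `subscriber_a` has seen both updates. A subscriber watching many subjects would have to attach to every single publisher, which is exactly the weakness noted above.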
The Broadcast-based Publish/Subscribe mechanism simply dumps messages onto the local area network. Each subscriber is responsible for listening and inspecting the subject line of each message. If the subject line matches, the subscriber processes the message. This approach seems to be optimal for decoupling the system. Clearly, this can be identified as some kind of topic-based system. If the publisher needs to know about the subscribers to a particular topic, a hybrid approach can be chosen, where an additional process requests information about interested subscribers. To establish the hybrid system, however, every subscriber must respond to the request. Another name mentioned in the article is publish/subscribe channel with reactive filtering, due to the responsibility of each subscriber to filter the messages on its own.
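The reactive filtering can be sketched as below. Note that `SimulatedLan` is only a stand-in of mine for a real broadcast network, and all class and field names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BroadcastMessage:
    subject: str
    body: str


@dataclass
class ReactiveSubscriber:
    subjects: Set[str]                      # subject lines this subscriber cares about
    processed: List[str] = field(default_factory=list)
    seen: int = 0                           # every broadcast reaches every subscriber

    def on_broadcast(self, message: BroadcastMessage) -> None:
        self.seen += 1
        # Reactive filtering: the subscriber inspects the subject line itself.
        if message.subject in self.subjects:
            self.processed.append(message.body)


class SimulatedLan:
    """Stand-in for the local area network: delivers every message to everyone."""

    def __init__(self) -> None:
        self._listeners: List[ReactiveSubscriber] = []

    def join(self, subscriber: ReactiveSubscriber) -> None:
        self._listeners.append(subscriber)

    def broadcast(self, message: BroadcastMessage) -> None:
        # The publisher keeps no subscriber list and does no filtering.
        for listener in self._listeners:
            listener.on_broadcast(message)


lan = SimulatedLan()
weather = ReactiveSubscriber(subjects={"weather"})
sports = ReactiveSubscriber(subjects={"sports"})
lan.join(weather)
lan.join(sports)

lan.broadcast(BroadcastMessage("weather", "sunny"))
lan.broadcast(BroadcastMessage("sports", "2:1"))
lan.broadcast(BroadcastMessage("politics", "election"))
```

Both subscribers see all three messages but each one processes only a single message. The decoupling is paid for with the filtering work on every subscriber.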
Unlike the Wikipedia article, the author of this article differentiates between topic-based and content-based mechanisms. In this context, both List-based Publish/Subscribe and Broadcast-based Publish/Subscribe are understood as topic-based mechanisms.
While topics are considered a pre-defined set of subjects, each message in a content-based system can be understood as a single dynamic logical channel. This idea was proposed in The Evolution of Publish/Subscribe Communication Systems [pdf]. We will come back to this paper later.
Where to implement the pub/sub functionality depends on your underlying communication structure:
- Bus: Implement the subscription mechanism in the bus interface
- Broker: Implement the mechanism through subscription lists to the broker
- Point-to-Point: Implement the mechanism through subscription lists in the publisher
The article also differentiates between fixed subscriptions and dynamic subscriptions. With fixed subscriptions, applications cannot control their subscriptions, whereas dynamic subscriptions can be modified through certain control messages.
Some more keywords are listed in the article:
- Initial subscription: How do subscribers communicate their subscriptions to the communication infrastructure when they are initially added
- Wildcard subscription: If supported, subscribers can subscribe to multiple topics through one subscription
- Topic discovery: How can subscribers discover available topics if dynamic subscriptions are supported
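Wildcard subscription and topic discovery can be sketched with Python's standard `fnmatch` module. The dotted topic names, the `WildcardBroker` class and the rule that a topic becomes discoverable once something was published to it are assumptions of mine, not part of the article.

```python
from fnmatch import fnmatchcase
from typing import Any, Callable, List, Set, Tuple

Handler = Callable[[str, Any], None]


class WildcardBroker:
    def __init__(self) -> None:
        self._known_topics: Set[str] = set()
        self._subscriptions: List[Tuple[str, Handler]] = []

    def subscribe(self, pattern: str, handler: Handler) -> None:
        # One subscription may cover many topics, e.g. "sensors.*".
        self._subscriptions.append((pattern, handler))

    def publish(self, topic: str, payload: Any) -> None:
        # Assumption: a topic is known as soon as something is published to it.
        self._known_topics.add(topic)
        for pattern, handler in self._subscriptions:
            if fnmatchcase(topic, pattern):
                handler(topic, payload)

    def discover(self, pattern: str = "*") -> List[str]:
        # Topic discovery: list the known topics matching a shell-style pattern.
        return sorted(t for t in self._known_topics if fnmatchcase(t, pattern))


broker = WildcardBroker()
sensor_events: List[Tuple[str, Any]] = []
broker.subscribe("sensors.*",
                 lambda topic, payload: sensor_events.append((topic, payload)))

broker.publish("sensors.temperature", 21.5)
broker.publish("sensors.humidity", 40)
broker.publish("billing.invoice", "INV-1")
```

Keep in mind that `*` in `fnmatch` patterns also matches dots, so "sensors.*" would cover deeper names such as "sensors.room1.temperature" as well.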
How to implement a dynamic list-based publish-subscribe pattern is illustrated in the MSDN library.
I also found an article about the way EDA (event-driven architecture) extends SOA, including a nice depiction of the idea behind EDA. There, EDA is proposed as a publish/subscribe mechanism rather than a command/control mechanism as provided by SOA. EDA seems especially suitable when you are facing
- Workflow type of processes and
- Processes that cross functional and organizational borders.
It also mentions that a declarative model would be good support for the EDA pub/sub pattern.
I also came across an article giving a brief overview of the Publish-Subscribe Channel pattern from the book Enterprise Integration Patterns by G. Hohpe and B. Woolf. It basically says that the channel delivers a copy of the message to each of its output channels, where each output channel has only one subscriber. After a message is consumed, it is removed from its channel.
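My reading of the Publish-Subscribe Channel can be sketched like this. The class and method names are hypothetical and the output channels are plain in-memory queues.

```python
import copy
from collections import deque
from typing import Any, Deque, Dict, Optional


class PublishSubscribeChannel:
    """One input channel that fans out into one output channel per subscriber."""

    def __init__(self) -> None:
        self._outputs: Dict[str, Deque[Any]] = {}

    def add_subscriber(self, name: str) -> None:
        # Each subscriber gets its own output channel.
        self._outputs[name] = deque()

    def send(self, message: Any) -> None:
        # Deliver an independent copy of the message to every output channel.
        for output in self._outputs.values():
            output.append(copy.deepcopy(message))

    def receive(self, name: str) -> Optional[Any]:
        # Consuming removes the copy from this subscriber's channel only.
        output = self._outputs[name]
        return output.popleft() if output else None


channel = PublishSubscribeChannel()
channel.add_subscriber("audit")
channel.add_subscriber("billing")
channel.send({"order": 1})

first_audit = channel.receive("audit")
second_audit = channel.receive("audit")
first_billing = channel.receive("billing")
```

The audit subscriber consumes its copy exactly once, while the billing subscriber still finds its own copy waiting.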
Having a look into Exploiting Virtual Synchrony in Distributed Systems, mentioned in the beginning, gives you an insight into several issues in distributed systems. One interesting point to bear in mind is synchrony vs. asynchrony. If your publisher requires responses, it could expect 0, 1 or n of them for n subscribers. If you expect 0 responses, you are actually running an asynchronous system. For so-called process groups an interface is provided that allows processes to join or leave a group but also to receive updates on the group membership. Sounds familiar? It is the Observer pattern again. In the described news service, one already recognizes the common concepts described before: “Each subscriber receives a copy of any message having a ‘subject’ for which it has enrolled on the order they were posted.” The overall description is rather abstract, but gives an interesting insight into the development of the mechanism.
Afterwards I ended up directly with The Evolution of Publish/Subscribe Communication Systems, which provides a well-written summary of the publish/subscribe paradigm. In particular, the decoupling is structured into
- Anonymity: parties do not need to know each other,
- Decoupling in time: interacting parties do not need to be up at the same time,
- Decoupling in flow: sending and receipt does not block parties.
Again, we see content-based and topic-based mechanisms, which makes me think twice about the classification proposed in the Wikipedia article. Back to the paper, the authors state that content-based pub/sub systems cannot rely on
- Centralized architectures or on
- Network-level solutions.
A single server simply cannot deal with a high number of subscribers, and the limited number of IP multicast addresses does not fit the large number of logical channels. Rather, they propose an application-level realization through a set of event brokers exchanging information on a point-to-point basis. For broker interaction the following issues are pointed out:
- Subscription and information routing: i.e. creating a mapping between subscriptions and subscribers, and the matching and forwarding operations.
Maybe it’s worth mentioning that both papers address communication systems at the network and overlay-network infrastructure level rather than at the application level. However, the concepts are the same.
Baldoni comes up with the concept of ad-hoc subscription languages, comparable to SQL for databases. This, however, requires a-priori knowledge of the structure of the information space. At least, the idea of selecting subscriptions or topics using a query language sounds quite appealing. As a future research direction, a formal specification of the service provided by a pub/sub system is proposed:
- Notification semantics would specify whether, when and how many times a piece of information is delivered to a subscriber. This is pointed out as a mandatory feature if pub/sub mechanisms are to be applied to mission-critical or dependable applications.
- Publishing semantics should allow defining the lifetime of information. In a pub/sub-based system, the subscriber has no right to remove elements from a queue. To avoid overflow, the information must nevertheless be removed from the queue at some point. When this happens, however, is clearly publisher dependent.
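One possible reading of the publishing semantics is a publisher-defined lifetime per item. The following sketch is my own interpretation and not taken from the paper. The names `ExpiringQueue`, `lifetime` and the explicit `now` argument (used instead of a real clock to keep the example deterministic) are assumptions.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class ExpiringQueue:
    """Items carry a publisher-defined lifetime; readers never remove anything."""

    def __init__(self) -> None:
        # Each entry is (expiry_time, item).
        self._items: Deque[Tuple[float, Any]] = deque()

    def publish(self, item: Any, now: float, lifetime: float) -> None:
        # The publisher decides how long the information stays valid.
        self._items.append((now + lifetime, item))

    def purge(self, now: float) -> int:
        # Drop expired items to avoid overflow; returns the number removed.
        before = len(self._items)
        self._items = deque((expiry, item) for expiry, item in self._items
                            if expiry > now)
        return before - len(self._items)

    def read_all(self, now: float) -> List[Any]:
        # Subscribers only read; removal happens through expiry alone.
        self.purge(now)
        return [item for _, item in self._items]


queue = ExpiringQueue()
queue.publish("short-lived", now=0.0, lifetime=5.0)
queue.publish("long-lived", now=0.0, lifetime=20.0)

early = queue.read_all(now=1.0)
later = queue.read_all(now=10.0)
again = queue.read_all(now=10.0)
```

Reading twice at the same point in time returns the same items, since subscribers have no way of removing them.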
Another often-cited paper I had a look at is The Many Faces of Publish/Subscribe [pdf]. Similar to the paper before, the three decoupling dimensions time, space and synchronization are used to extract the common concepts of different variants of the pub/sub paradigm. We learn that individual point-to-point and synchronous communication leads to rigid and static applications. Three types of pub/sub mechanisms are introduced:
- Topic-based
- Content-based
- Type-based
The basic terms for sending and receiving messages through a software bus/event used here are
- Event for the message to be delivered and
- Notification for the act of delivering this event.
The core system should provide an
- Event notification service providing
- Storage and management for subscriptions and
- Efficient delivering of events.
The operations used here are called subscribe(), unsubscribe() and publish(), not that different from the ones we know from the Observer pattern. A new operation is advertise(), which announces the nature of the future events of a publisher. That way, the event service can adjust to the expected event flows, and subscribers can learn when new types of information become available. We also learn about alternative communication paradigms here:
- Message passing is just about sending and receiving messages through communication channels. For the sender, the process is asynchronous, while the receiver must act synchronously. Both parties must be active at the same time and the sender must know its receivers. Consequently, the parties are coupled in both space and time.
- RPC (mentioned for the first time in Implementing Remote Procedure Calls [pdf] and A Survey of Remote Procedure Calls [pdf]) makes remote interactions appear the same way as local ones. Here we have a strong space and time coupling, since the invoking object holds a reference to the invoked one. One attempt at removing synchrony was applied e.g. by CORBA using one-way modifiers. In this context, the authors mention the expression fire-and-forget.
- Notifications allow a decoupling of synchronization by performing two independent invocations. The first (sent from client to server) provides a callback reference used by the server to notify the client about changes. This is said to be a limited version of a pub/sub mechanism and directly related to the Observer pattern we already learned about before.
- Shared spaces are definitely not what I am going to use; however, it is interesting to read the summary. All communication between parties takes place using tuple spaces (e.g. known from Linda) with three operations in(), out() and read(). This approach is decoupled in both time and space but remains synchronized and is thus somewhat limited in scalability.
- Message queuing often uses some pub/sub mechanism. Unlike tuple spaces, message queues provide some transactional, timing and ordering guarantees. Unlike the pub/sub mechanisms we learned about before, messages are concurrently pulled by the consumers.
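Coming back to the subscribe(), unsubscribe(), publish() and advertise() operations of the event notification service, a minimal sketch might look like this. Only the four operation names come from the paper. The class `EventNotificationService`, the string-based event kinds and the `advertised_kinds()` helper are my assumptions.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Set

Handler = Callable[[Any], None]


class EventNotificationService:
    """Stores and manages subscriptions and delivers events to subscribers."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self._advertised: DefaultDict[str, Set[str]] = defaultdict(set)

    def advertise(self, publisher: str, event_kind: str) -> None:
        # A publisher announces the kind of events it intends to publish.
        self._advertised[event_kind].add(publisher)

    def advertised_kinds(self) -> List[str]:
        # Hypothetical helper: lets subscribers learn what may become available.
        return sorted(self._advertised)

    def subscribe(self, event_kind: str, handler: Handler) -> None:
        self._handlers[event_kind].append(handler)

    def unsubscribe(self, event_kind: str, handler: Handler) -> None:
        if handler in self._handlers[event_kind]:
            self._handlers[event_kind].remove(handler)

    def publish(self, event_kind: str, event: Any) -> None:
        # Notification: the act of delivering the event to each subscriber.
        for handler in list(self._handlers[event_kind]):
            handler(event)


received: List[Any] = []


def on_price(event: Any) -> None:
    received.append(event)


service = EventNotificationService()
service.advertise("ticker-1", "price-changed")
service.subscribe("price-changed", on_price)
service.publish("price-changed", 101)
service.unsubscribe("price-changed", on_price)
service.publish("price-changed", 102)
```

The second event is no longer delivered because the handler unsubscribed in between. Delivery here is a synchronous call, so this sketch does not show the synchronization decoupling discussed in the paper.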
For the three pub/sub forms we find some more detailed information:
- Topic-based pub/sub is based on the notion of topics or subjects, extending the notion of channels. Subscribers can subscribe to topics, which are identified by keywords and are related to the concept of groups and group communication. Thinking back to the paper we discussed before: the Isis system, described in Exploiting Virtual Synchrony in Distributed Systems, is also mentioned here as the one that first introduced the pub/sub concept. A nice expression I read in the related section was the concept of an event space. In topic-based systems, the event space can be addressed hierarchically, while groups usually offer only a flat structure.
- Content-based pub/sub (aka property-based) introduces a subscription scheme based on the properties of the particular event. Such properties could be internal attributes of data structures or meta-data associated with events. Here again, we read about subscription languages, but in more detail about filters in the form of name-value pairs combined with simple operators (=, <, >, <=, >=), resulting in so-called subscription patterns.
- Type-based pub/sub is meant to replace the name-based classification of topics by a scheme according to the type of events.
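The subscription patterns of content-based pub/sub, i.e. name-value pairs combined with the simple operators listed above, can be sketched in a few lines. Representing a pattern as a list of (name, operator, value) triples and events as dictionaries is my own assumption, as is the rule that all constraints must hold.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# The simple comparison operators named in the paper, mapped to Python functions.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}

Constraint = Tuple[str, str, Any]   # (attribute name, operator, value)


def matches(event: Dict[str, Any], pattern: List[Constraint]) -> bool:
    """True if the event satisfies every constraint of the subscription pattern."""
    for name, op, value in pattern:
        # An event lacking the attribute cannot satisfy the constraint.
        if name not in event or not OPERATORS[op](event[name], value):
            return False
    return True


pattern: List[Constraint] = [("symbol", "=", "ABC"), ("price", ">=", 100)]

hit = matches({"symbol": "ABC", "price": 120}, pattern)
too_cheap = matches({"symbol": "ABC", "price": 99}, pattern)
no_price = matches({"symbol": "ABC"}, pattern)
```

Only the first event matches the pattern. An event service would evaluate such a pattern for every subscription on every published event, which hints at why efficient matching is a topic of its own.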
Having a closer look at events, we learn about the classification into messages (delivery through a single operation, e.g. notify) and invocations (the event triggers some specific operation on the subscriber). Furthermore, invocations are directed to a certain kind of object and provide some well-known semantics. You can also differentiate between one-way invocations (COM+ or CORBA Event Service) and those requiring some return value.
We see different kinds of architectures there:
- Centralized architectures use a centralized component for storing and forwarding events. Consequently, this component is a single point of failure.
- Distributed architectures omit this centralized component and are well suited for efficient delivery of messages.
- Hybrid approaches provide a decentralized notification and storage service.
Dissemination of messages is also discussed but relies a lot on the underlying concepts. Efficient multicast in content-based pub/sub systems, however, is pointed out to be still an issue.
Some more points to be considered are related to QoS:
- Since the publisher does not know when and whether the sent messages are processed, some mechanism is required to ensure persistence of the information.
- More QoS features deal with priorities (only relevant for messages in transit) and transactions, if multiple messages are combined into atomic operations.
- Reliability is finally pointed out as one of the most important features in distributed information systems.
Bearing this information in mind, I now have a look into the Publish-Subscribe Notification for Web Services [pdf] whitepaper as part of the WS-Notification family. The document deals with the notification pattern for notification-based or event-driven systems in the Web service context. Here we see the same pattern as learned before:
- Subscribers register dynamically with the publisher
- Multiple subscribers can register with a publisher
- The distributing Web service sends one separate copy to each of the subscribers
The spec defines (among others) some interesting requirements:
- Support of resource-constrained devices
- Support both direct and brokered notification
- Transformation and aggregation of brokered topics
- Publishing of runtime meta-data (for discovering available elements)
- Allow federation of brokers
In the terminology section we find another interesting statement saying “a Subscription is a WS-Resource” where a WS-Resource is defined as follows:
“A Web service having an association with a stateful resource, where the stateful resource is defined by a resource properties document type and the association is expressed by annotating a WSDL portType with the type definition of the resource properties document”
Got it? At least let us think of subscriptions as resources. This idea lines up well with my current research.
Hierarchically structured topics are considered as well: topic trees are hierarchically structured topics, and topic spaces are sets of topic trees grouped together into the same namespace (obviously for administrative reasons).
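To get a feeling for topic trees and topic spaces, here is a small sketch. It is not the WS-Notification API. The slash-separated topic paths, the `TopicSpace` class, the example namespace and the rule that a subscription to a node also covers its descendants are all simplifying assumptions of mine.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


def covers(subscribed: str, topic: str) -> bool:
    """True if `topic` is `subscribed` itself or one of its descendants in the tree."""
    subscribed_parts = subscribed.split("/")
    return topic.split("/")[:len(subscribed_parts)] == subscribed_parts


class TopicSpace:
    """A set of topic trees grouped together under one namespace."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self._subscriptions: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic_path: str, handler: Handler) -> None:
        self._subscriptions[topic_path].append(handler)

    def publish(self, topic_path: str, message: Any) -> int:
        # Assumption: subscribers of every ancestor topic are notified as well.
        delivered = 0
        for subscribed, handlers in self._subscriptions.items():
            if covers(subscribed, topic_path):
                for handler in handlers:
                    handler(message)
                    delivered += 1
        return delivered


space = TopicSpace("http://example.org/topics")  # hypothetical namespace
all_weather: List[Any] = []
europe_only: List[Any] = []
space.subscribe("weather", all_weather.append)
space.subscribe("weather/europe", europe_only.append)

space.publish("weather/europe/vienna", "rain")
space.publish("weather/asia", "sun")
```

The subscription on the root of the tree receives both messages, the one on "weather/europe" only the first.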
It is actually the first document dealing with security aspects, listing the following classes of attacks:
- Message alteration
- Message disclosure aka confidentiality
- Key integrity
- Authentication
- Accountability, i.e. a function of the type and strength of the key/algorithm used
- Availability, e.g. DoS attacks
- Replay of messages
Finally, I found the Distributed Publish/Subscribe Event System on CodePlex: In the whitepaper, the various types of pub/sub are characterized by
- Coupling
- Brokered subscriptions
- Persistent vs. transient subscriptions
- Delivery of events and
- Routing.
The Web Solutions Platform (WSP) is designed as a distributed pub/sub system and works both intra-machine and inter-machine. Applications here subscribe to event types, so it looks like an event-based pub/sub system. The document provides some more descriptions of the system itself but no more information on publish/subscribe mechanisms in general.
That’s a lot of stuff, and now I have to spend some time reflecting on all this information for my design.