ZeroMQ Sockets

27 Aug 2011 16:20
Tags: zeromq

ZeroMQ provides a scalability layer between application endpoints. It implements several different message topologies through a series of Socket Types, each with its own well-defined function and behaviour. This article summarises the behaviour and function of each Socket Type, and the most typical Socket combinations, valid in ZeroMQ 2.1.x.

ZeroMQ Socket Types

||~ Socket ||~ Topology ||~ Send/Receive ||~ Incoming Routing / Msg Action ||~ Outgoing Routing / Msg Action ||~ HWM Action ||
|| REQ || Request-Reply || Send, Receive, Send, … || Last peer / Removes empty part || Load-balanced / Prepends empty part, queues unsent msgs || Block ||
|| REP || Request-Reply || Receive, Send, Receive, … || Fair-queued / Retains message parts up to 1st empty part || Last peer / Prepends retained parts, queues unsent msgs || Drop ||
|| DEALER || Request-Reply || Unrestricted || Fair-queued / No changes made || Load-balanced / No changes made, queues unsent msgs || Block ||
|| ROUTER || Request-Reply || Unrestricted || Fair-queued / Prepends reply address to message || Addressed peer / Strips 1st message part, drops message if identity not known or peer not available || Drop ||
|| PUB || Pub-Sub || Send only || n/a || Fan-out / No changes made, drops message if no subscribers || Drop ||
|| SUB || Pub-Sub || Receive only || Fair-queued, optional filtering / No changes made || n/a || Drop ||
|| PUSH || Pipeline || Send only || n/a || Load-balanced / No changes made, queues unsent msgs || Block ||
|| PULL || Pipeline || Receive only || Fair-queued / No changes made || n/a || n/a ||
|| PAIR || Exclusive Pair || Unrestricted || n/a || Queues unsent msgs || Block ||

Where:

  • Fair-queued means the socket takes input from all sockets connected to it in turn.
  • Load-balanced means the socket sends messages to all sockets connected to it in turn.
  • Last peer means the socket sends the message to the last socket that sent it a message.
  • Fan-out means the message is sent to all connected peers simultaneously.
  • HWM Action is what the socket does once its high-water mark (queue limit) is reached: Block the sending call, or Drop further messages.
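
The REQ/REP row above can be made concrete with a few lines of pyzmq (a minimal sketch, assuming the ZeroMQ Python binding is installed; the port number is an arbitrary choice). REQ enforces the strict send, receive, send, … lockstep, with the empty delimiter part added and stripped behind the scenes:

[[code type="python"]]
import zmq

ctx = zmq.Context()

# Server side: REP follows a receive, send, receive, ... pattern.
rep = ctx.socket(zmq.REP)
rep.bind("tcp://*:5555")

# Client side: REQ follows a send, receive, send, ... pattern.
req = ctx.socket(zmq.REQ)
req.connect("tcp://localhost:5555")

req.send(b"ping")      # REQ silently prepends the empty message part
print(rep.recv())      # REP strips it again before delivery: b"ping"
rep.send(b"pong")      # REP prepends the retained parts on the way back
print(req.recv())      # b"pong"
[[/code]]

Calling req.send() twice in a row would raise an error: the socket state machine only allows the alternating pattern shown in the table.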

ZeroMQ Socket Combinations

ZeroMQ only supports the following Socket combinations in a connect-bind pair:

||~ Combination ||~ Examples from the ZeroMQ ZGuide [((bibcite zguide))] ||
|| PUB and SUB || Uni-directional message broadcast from Publisher to Subscriber(s); Publish-Subscribe Proxy Server; Slow Subscribers (Suicidal Snail Pattern); High-speed Subscribers (Black Box Pattern); Reliable Pub-Sub shared key-value system (Clone Pattern) ||
|| REQ and REP || Sequential message request-reply between two Endpoints (e.g. client to server) ||
|| REQ and ROUTER || Frontend broker connection to multiple clients; Req-Rep broker connection to multiple workers using custom routing (LRU) ||
|| DEALER and REP || Simple synchronous (one request, one reply) broker connection to one or more separate worker processes using a load-balancing DEALER socket ||
|| DEALER and ROUTER || Custom 1..n request routing from a single client to multiple servers; Asynchronous n..1 request routing to a single server ||
|| DEALER and DEALER || Backend asynchronous (one request, many replies) broker connection to multiple workers ||
|| ROUTER and ROUTER || Asynchronous brokerless reliability (Freelance Pattern) ||
|| PUSH and PULL || Task ventilator connected to a pool of workers that forward results to a task sink; Parallel Pipeline with kill signalling; Pipeline streamer device ||
|| PAIR and PAIR || Signalling between Threads ||
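
As an example of the PUSH and PULL combination, here is a minimal single-process pyzmq sketch (the port number and task payloads are arbitrary). PUSH load-balances tasks across whichever PULL workers are connected, which is the basis of the ventilator/worker/sink pattern listed above:

[[code type="python"]]
import zmq

ctx = zmq.Context()

ventilator = ctx.socket(zmq.PUSH)    # the bound end hands out the work
ventilator.bind("tcp://*:5557")

worker = ctx.socket(zmq.PULL)        # workers pull tasks, fair-queued
worker.connect("tcp://localhost:5557")

for task in (b"task-1", b"task-2", b"task-3"):
    ventilator.send(task)            # load-balanced across all workers

for _ in range(3):
    print(worker.recv())             # one worker here, so it gets all three
[[/code]]

With several worker sockets connected, the same three sends would be distributed among them in turn, as described in the Socket Types table.
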
[[bibliography title="Further Reading"]]
: zguide : ZGuide, the ZeroMQ Guide [http://zguide.zeromq.org/]
: concepts : Concepts [http://www.250bpm.com/concepts]
: api : ZeroMQ API Reference [http://api.zeromq.org/]
[[/bibliography]]


——

Messaging #5 - Duplicate Messages

27 Aug 2011 11:07
Tags: messaging

This is the fifth in a series of articles about SOA (Service-Orientated Architecture) messaging systems.

Duplicate Messages

Every Component should expect to receive duplicate messages from the message system at any time. Duplicates can arise from:

  • A technical failure in a publishing Component, or a delivery failure and attempted recovery event by the messaging infrastructure itself.
  • An external (with respect to the system) message source (e.g. an external FIX endpoint at a customer), outside the control of the receiving System.

No messaging architecture can guarantee duplicate-free message delivery under all circumstances, particularly not the kinds of architectures required for low-latency, high-volume distributed processing. Duplicate message handling must therefore be considered an integral part of a Service Component's design.

Duplicate messages can be either:

  • Explicitly flagged as possible duplicate messages. For example, 29West sets a flag on a message if it has been delivered by the persistence store (which acts as a proxy for the original publisher). Another example is message replay during a FIX session re-connection, where previously sent messages are re-sent by the FIX client.
  • Unknown duplicates. The message initially appears to be a unique, new event. Receipt and subsequent detection of such messages usually indicates a major failure somewhere in the messaging layer.

Functional and technical analysis should define the strategies for detecting and handling confirmed and suspected duplicates. Care should be taken in balancing the requirements for low-latency processing (e.g. the time to update a cached currency position) against the required accuracy and integrity of the position data itself. Implementation strategies may include:

  • Use of UUIDs or other unambiguous data-matching techniques
  • Heuristic pattern matching of message data against existing records
  • Optimistic detection, whereby unflagged messages are initially assumed to be unique, ensuring the fastest possible processing latency, followed by subsequent duplicate detection that may result in the effect of the message being unwound in the (assumed rare) cases where a duplicate has been received (see the sketch below).
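
As a sketch of the UUID-based approach combined with optimistic detection (all names here, such as DuplicateFilter and msg_id, are invented for illustration and do not correspond to any particular middleware API):

[[code type="python"]]
import uuid
from collections import OrderedDict

class DuplicateFilter:
    """Remembers the most recent `capacity` message IDs seen."""

    def __init__(self, capacity=100000):
        self.capacity = capacity
        self.seen = OrderedDict()

    def is_duplicate(self, msg_id):
        if msg_id in self.seen:
            return True
        self.seen[msg_id] = True
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)   # evict the oldest entry
        return False

f = DuplicateFilter()
m = str(uuid.uuid4())          # publisher stamps each message once
print(f.is_duplicate(m))       # False: new event, process it
print(f.is_duplicate(m))       # True: replay detected, unwind or discard
[[/code]]

The bounded cache keeps the check O(1) and memory-stable, at the cost of not detecting duplicates older than the cache window; the window size is a trade-off the functional analysis would have to define.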

——

Messaging #4 - Resiliency Groups

20 Aug 2011 14:54
Tags: messaging

This is the fourth article in a series about designing messaging architectures.

Resiliency Groups

To understand what we mean by “resiliency” or its cousin “reliability”, we need to look at their alter ego: “failure”. We can consider failure originating from:

  • Application code can throw unhandled exceptions, crash, run out of memory or other resources, or get into a deadlock or infinite loop and stop responding to input. This is the most likely source of failure, as application code is less likely to be shared with, and hence proven by, other systems.
  • System code, such as the messaging middleware, can die or run out of memory (e.g. caused by slow subscribers). This should happen less often than in application code, if only because other systems will probably also be using the same middleware, so bugs found in one deployment can be fixed in all of them.
  • Message queues can overflow.
  • Network transmission can fail temporarily, causing intermittent message loss.
  • Hardware can fail: from individual servers and network switches to entire data centers.

For the purposes of this article, we will look at how to mitigate the failure of an individual Service Component instance. A key insight is that the strategies required to mitigate each of the different failure modes listed above must be included in the design from day 1. Wherever possible, the implementation of messaging resiliency should be moved away from application code into the “Bus” layer or the middleware system, so that a common set of strategies can be applied across all Components.

Components in a SOA architecture are defined by the Service(s) that each type provides. If we assume that a given Service is only provided by Component instances of the same type (i.e. with the same application code), the resiliency strategy for those instances needs to be considered. Additionally, scalability needs to be factored into the design: how many instances of the same Component are required in the local or global system to fulfil the performance requirements for the Service?

One approach is to define, by configuration (see above), one or more Resiliency Groups for each Component type. A Resiliency Group ensures that exactly one instance of the Component within the Group, known as the “Primary” (P), receives and processes a given message sent to the Service on the Logical Message Bus. Backup instances (see below) are denoted Secondary (S), and there may also be Disaster Recovery (D) instances that run on dedicated DR-ready infrastructure.

Scalability is provided by splitting the message workload across multiple Resiliency Groups (which may, for example, be located in different WAN regions). The most efficient way to achieve this is likely to be topic subscription filtering, based on information in the topic name itself, enabling the horizontal scaling to be mediated by the messaging infrastructure and Logical Message Bus rather than by application code.
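
A minimal pyzmq illustration of such topic-based splitting (the topic names, port and sleep are arbitrary choices for the sketch; a real system would derive the prefixes from Group configuration):

[[code type="python"]]
import time
import zmq

ctx = zmq.Context()

pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5556")

# This Resiliency Group only subscribes to its own shard of the workload.
group_a = ctx.socket(zmq.SUB)
group_a.connect("tcp://localhost:5556")
group_a.setsockopt(zmq.SUBSCRIBE, b"EURUSD")

time.sleep(0.5)                     # allow the subscription to propagate

pub.send(b"EURUSD bid=1.4400")      # delivered to this Group
pub.send(b"GBPUSD bid=1.6300")      # filtered out by the middleware
print(group_a.recv())               # b'EURUSD bid=1.4400'
[[/code]]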

Primary Instances within the Group receive messages from the Logical Message Bus (if there is more than one Primary, the messaging layer can route to different Primary instances using appropriate routing logic: LRU, round-robin, etc.).

Secondary and DR Instances can run in hot standby (receiving messages and processing them to update internal state, but not generating any output or effect), warm standby (initialized and running, but not receiving messages), or cold standby (instance not running unless required).

Primary and Secondary / DR instances can then be composed into Resiliency Groups, with the Group behaviour mediated by the software layer forming the Logical Message Bus on top of the messaging infrastructure. This layer would use heartbeating or similar to observe component failures and ensure the appropriate role change occurs to a backup instance in the Group:

  • PPD (or even PPPPD!). Two or more Primary instances, with a load-balancing function to distribute the message load across the active Primary instances. The DR instance only becomes active and starts processing messages if no Primary instances are running in the Group.
  • PSD. One Primary instance processes all messages; the S and D instances run in warm standby. If the Primary instance fails, the Secondary instance detects this and takes over the Primary role (see the sketch below). If the Secondary instance should also fail, the DR instance takes over. If a previously failed instance restarts successfully, it joins as a new Secondary instance.
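
A simplified sketch of the PSD role change driven by heartbeats (the three-beat liveness threshold and the StandbyInstance class are illustrative assumptions; a real Group would exchange heartbeats over the Logical Message Bus):

[[code type="python"]]
import time

HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeats from the Primary
LIVENESS = 3               # missed beats before declaring the Primary dead

class StandbyInstance:
    def __init__(self, role):
        self.role = role                    # "S" or "D"
        self.last_beat = time.monotonic()

    def on_heartbeat(self):
        self.last_beat = time.monotonic()   # the Primary is still alive

    def check(self):
        silence = time.monotonic() - self.last_beat
        if self.role != "P" and silence > LIVENESS * HEARTBEAT_INTERVAL:
            self.role = "P"                 # promote: take over as Primary
            # a stateful Service must resynchronize its state here first
        return self.role

secondary = StandbyInstance("S")
secondary.on_heartbeat()
print(secondary.check())   # "S": the Primary is still heartbeating
[[/code]]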

The role handover used in PSD depends on whether the Service(s) provided by the Component are stateless: if they are, the new Primary instance can go straight ahead and process the next message. If message processing is stateful, the new instance must first synchronize its internal state (e.g. from a persistent store or another Service) before processing the message stream.


——

Messaging #3 - Logical Data Bus

07 Aug 2011 22:19
Tags: messaging

This is the third in a series of messaging-related articles.

Build a Logical Message Bus

The messaging infrastructure supports the creation of a messaging topology and provides messaging-level services (guaranteed delivery, messaging resilience, WAN bridging, etc.). However, there is still a significant functional gap between that and the application domain-level code that developers and the business want to spend most of their attention on.

A shared common layer can be built that abstracts the messaging infrastructure from Service Component application code, and provides further services:

  • Topology configuration (e.g. converting message topology configuration into behaviour and state).
  • Topic building (e.g. using a Builder Pattern implementation).
  • Component resiliency (warm standby, round-robin).
  • Inter-component heartbeating.
  • Test injection / mocking for unit testing.
  • Message performance monitoring.
  • A messaging API for application code that abstracts the implementation of the messaging layer, allowing heterogeneous messaging architectures to be utilized to extract maximum performance.
  • The messaging-level protocol(s) used for message creation and bundling (the actual wire byte protocol would be determined by the messaging infrastructure itself).
  • A common point of control for messaging functionality across all Components (see the sketch below).
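
A hypothetical sketch of what such a layer's API might look like (every name here, such as MessageBus and TopicBuilder, is invented for illustration; the point is that application code sees only this facade, never the middleware):

[[code type="python"]]
class TopicBuilder:
    """Builder Pattern for topic names, e.g. region/service/instrument."""

    def __init__(self):
        self.parts = []

    def add(self, part):
        self.parts.append(part)
        return self                      # chainable

    def build(self):
        return "/".join(self.parts)

class MessageBus:
    """Facade hiding the concrete middleware, enabling test injection."""

    def __init__(self, transport):
        self.transport = transport       # real middleware, or a mock in tests

    def publish(self, topic, payload):
        self.transport.send(topic, payload)

class RecordingTransport:
    """Trivial mock transport for unit tests."""

    def __init__(self):
        self.sent = []

    def send(self, topic, payload):
        self.sent.append((topic, payload))

bus = MessageBus(RecordingTransport())
topic = TopicBuilder().add("EU").add("pricing").add("EURUSD").build()
bus.publish(topic, b"bid=1.4400 ask=1.4402")
print(bus.transport.sent)   # [('EU/pricing/EURUSD', b'bid=1.4400 ask=1.4402')]
[[/code]]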

The combination of these services effectively creates a logical message bus (although the underlying messaging topology may be one or more of many different types and implementations, and not formally recognized as a “bus” at all).

This shared code becomes a critical part of meeting the System's performance objectives and, as such, requires the oversight of senior technical development personnel in its design and development.

