Requirement
"A communications sub-system is required
to provide sequenced bi-directional data transfer
between the Central Server of a Trading System
and each of the outlying Traders' workstations."
In this Service, the unit of transfer is the Message.
In order to limit the bandwidth needed to provide the required service,
common information may be sent to all Traders by means of multicast messages via UDP.
The Unreliable Broadcast Service was developed in the clear understanding
that UDP communications are inherently unreliable:
therefore a user-level protocol layer is needed to help meet the Project Requirements of
guaranteed delivery and fairness to all Clients.
The sub-system allows access only to registered users & monitors the integrity of connections.
Compliance People - Please note:
"If a communications line is poor, then you cannot guarantee delivery; and fairness is totally unachievable".
Each Client is autonomous, effectively behaving as if there were no other Client.
The approach taken is to use UDP multicasts from UB server to all UB Clients for public messages
and to use TCP unicasts from Server to a Client or Client to Server for "private" messages.
TCP automatically guarantees delivery & sequency for unicasts in each direction,
at least up to the limitations of TCP/IP stream responsiveness.
UDP multicasts are unreliable.
Unicast messages are stored (in a transmit Q) until they are sent.
If a Client realises that it has missed one or more messages
it should request retransmission of the appropriate message(s).
Periodically, each Client sends a HeartBeat message to UBServer
to indicate the latest UDP message received (with no prior missing messages).
As the ACK information is not used to request retransmission,
the UB Server periodically issues a HeartBeat
to allow a dozy Client another chance to realise that it has missed something.
It is necessary to re-synchronise the arrival of unicast & multicast messages at each UB Client.
A Ubc will initiate communications by performing a TCP connect to one of its Ubs host machines.
It is possible for messages to be received by an unauthorised host on the network.
A Client may notice a WindBack which was requested by another Client.
This should be ignored, unless it contains any message(s) needed by this Client.
Several Clients may share a WAN comms line.
In this case the bandwidth available should not be
exceeded by the aggregate data volume sent, in either direction.
UB Server/Clients must bolster-up the delivery & sequency performance of UDP.
Messages may be undelivered or delivered twice (or more) or delivered out-of-sequence.
Therefore "Sequence Numbers" are added to UDP messages by the Ubs
so that the receiving Ubc's can re-establish sequency,
ignore repetitions & request re-transmission of any messages not received.
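The three outcomes named above (in sequence, repetition, something missing) can be sketched as a single comparison. This is only an illustration: the enum names and the function `classify_seq` are hypothetical, and it assumes that the 16-bit seq_no wraps around, so that a "distance" of more than half the number space means the message is an old one.

```c
#include <stdint.h>

/* Hypothetical verdict names; the source does not define them. */
enum seq_verdict { SEQ_IN_ORDER, SEQ_DUPLICATE, SEQ_GAP };

/*
 * Compare a received multicast seq_no with the next one expected.
 * Assumption: sequence numbers are 16 bits wide and wrap, so the
 * difference is taken modulo 2^16 and split at the half-way point.
 */
static enum seq_verdict classify_seq(uint16_t expected, uint16_t got)
{
    uint16_t dist = (uint16_t)(got - expected);

    if (dist == 0)
        return SEQ_IN_ORDER;    /* forward it and advance 'expected' */
    if (dist >= 0x8000u)
        return SEQ_DUPLICATE;   /* behind us: ignore the repetition */
    return SEQ_GAP;             /* ahead of us: request re-transmission */
}
```

A Ubc would call this for every multicast received; only the SEQ_GAP case needs to talk to the Ubs.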
When the write() sys-call returns satisfactorily,
the message buffer is free()'d
(because TCP guarantees delivery, unless the channel dies completely).
Multicast messages are stored until all connected Clients have received them.
Of course, if any Client gives up and disconnects we no longer need to store any messages for that Client.
Received messages are stored in a receive Q until they can be forwarded appropriately.
This is the Negative AcKnowledgement part of the protocol.
When the UBServer receives a NAK it initiates a re-transmission procedure.
UB Client should re-request re-transmission if a WindBack is not noticed
within a reasonable time after each request.
This is the positive ACKnowledgement part of the protocol.
UB Server uses this information to free space in the Multicast Q (Purge operation).
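As an illustration of the Purge operation, the sketch below works out how many messages at the head of the Multicast Q every Client has acknowledged. The function name `purgeable` and its parameters are hypothetical; it assumes 16-bit wrapping sequence numbers and that no Client is ever further behind than the message just before the head of the queue.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * How many messages at the head of the Multicast Q can be freed?
 *   head_seq : seq_no of the oldest message still queued
 *   q_len    : number of messages currently queued
 *   acked[]  : latest in-order seq_no reported by each connected Client
 * Assumption: no Client is further behind than (head_seq - 1), which
 * holds if messages are only purged once every Client has acked them.
 */
static uint16_t purgeable(uint16_t head_seq, uint16_t q_len,
                          const uint16_t *acked, size_t n_clients)
{
    uint16_t can_free = q_len;  /* with no Clients, nothing need be kept */
    size_t i;

    for (i = 0; i < n_clients; i++) {
        /* count of messages head_seq..acked[i] inclusive, modulo 2^16 */
        uint16_t have = (uint16_t)(acked[i] - head_seq + 1);
        if (have < can_free)
            can_free = have;
    }
    return can_free;
}
```

The slowest Client sets the pace: one Client that stops acknowledging pins the whole queue, which is why such a Client is eventually tailed off.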
If any Client does not receive the Server HeartBeats,
it probably would not receive re-transmitted data either,
& so eventually it will be disconnected:
either because the TCP stream fails or because it is tailed off when the MultQ fills up.
For this, the Sync part of the protocol is used.
When the UB Server notices that an incoming message from CS is of a different mode to the previous message from CS,
the UB Server sends a Sync message on the current stream before sending the new message.
If the current mode is Unicast, then the Sync message must be sent to every connected Client.
At the UB Client,
receipt of Sync message indicates that the WEP output stream should switch to the other input mode and,
if necessary, wait for further messages on that stream.
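The server side of the Sync rule can be sketched as a small planning function. It is a toy model, not the real Ubs code: messages from CS are represented as a string of 'M' (multicast) and 'U' (unicast), the output uses 'm' and 'u' for a Sync sent on the multicast or unicast stream, and the name `plan_sends` is hypothetical. It also assumes that the first message from CS needs no Sync in front of it.

```c
#include <stddef.h>

/*
 * from_cs : modes of successive CS messages, e.g. "MMUUM"
 * out     : receives what the Ubs sends, in order:
 *           'M'/'U' = data on that stream, 'm'/'u' = Sync on that stream
 * Returns the number of characters written (excluding the terminator).
 */
static size_t plan_sends(const char *from_cs, char *out, size_t out_cap)
{
    size_t n = 0;
    char prev = '\0';           /* assumption: no Sync before the first message */

    if (out_cap == 0)
        return 0;
    for (; *from_cs != '\0'; from_cs++) {
        char mode = *from_cs;
        if (prev != '\0' && mode != prev) {
            if (n + 1 >= out_cap)
                break;
            out[n++] = (prev == 'M') ? 'm' : 'u';   /* Sync on the OLD stream */
        }
        if (n + 1 >= out_cap)
            break;
        out[n++] = mode;        /* then the new message on its own stream */
        prev = mode;
    }
    out[n] = '\0';
    return n;
}
```

For the input "MMUUM" the plan is "MMmUUuM": each change of mode is announced on the stream being left, which is what lets the Ubc know when to switch its input.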
A Ubs which knows it is a Slave should direct the Client to the current Master.
Minimal authentication is performed by the Ubs - merely checking the IP address of the caller.
An invader should be disconnected.
It is left to the Trading system to perform, say, a DNA check on the Trader.
This would provide a data-feed from the Trading System.
If this is a serious problem, the message data may be encrypted/decrypted.
Alternatively, or additionally, part of the Multicast Receive Address may be specified
pseudo-randomly by the Ubs.
However, the Multicast Receive Address should not be changed during Operation and
certainly not during Trading.
Suggestion: use byte-2 of IP address as Trading System Identifier,
byte-3 as randomly chosen at Ubs startup time
and byte-4 as low byte of Ubs host IP address.
Note that only the low 23 bits of IP address are mapped to Ethernet multicast address.
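The suggestion and the 23-bit caveat can be tried out with the sketch below. The function names are hypothetical, and the choice of first byte is an assumption (any Class D value would do; the source does not name one). The Ethernet mapping itself is the standard one: the fixed prefix 01:00:5e followed by the low 23 bits of the group address.

```c
#include <stdint.h>

/*
 * Compose a multicast group address (host byte order) as suggested above.
 * 'first' must be a Class D first byte; which value to use is an
 * assumption, the source does not say.
 */
static uint32_t make_group_addr(uint8_t first, uint8_t system_id,
                                uint8_t random_byte, uint8_t host_low_byte)
{
    return ((uint32_t)first << 24) | ((uint32_t)system_id << 16) |
           ((uint32_t)random_byte << 8) | (uint32_t)host_low_byte;
}

/* Map a group address to its Ethernet multicast MAC address. */
static void group_to_mac(uint32_t group, uint8_t mac[6])
{
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (uint8_t)((group >> 16) & 0x7f);   /* top bit of byte-2 is lost */
    mac[4] = (uint8_t)((group >> 8) & 0xff);
    mac[5] = (uint8_t)(group & 0xff);
}
```

One consequence worth noting: two Trading System Identifiers that differ only in the top bit (say 0x05 and 0x85) give the same Ethernet address, so hosts would have to filter them apart at the IP level.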
typedef struct Client       // used for connected Clients
{
    struct Client *next;    // next Client in TxTimer list
    u_long  in_addr;        // inet address (in host form)
    u_short fd;             // file descriptor used for this Client
    u_short delta;          // incremental time until TxTimer due
    u_short quota;          // number of Clients sharing BW
    long    rewind;         // Multicast seq_no to replay from for this Client, or (-1)
    u_short state;          // sub-state for this fd/stream
    u_short diag;           // per-stream diagnostic mask
    qio     r;              // read parameters
    qio     w;              // write parameters
    rcvd    recv;           // recv Client list header
} Client;
Tx state on any stream is identified by the combination of poll_b & poll_c for that stream:
| State | poll_b | poll_c | change down | change up |
| Idle | clr | set | Rx new msg | - |
| Prodded | set | set | Tx selected | Tx q empty |
| Sending | set | clr | msg too big | end of msg sent |
| Choked | clr | clr | - | choke timeout |
Only need to Prod if Idle; must not Prod if Choked.
May re-Prod if Prodded or Sending without ill effects.
Use the Prod() macro to achieve this:
# define Prod( fd )                         \
    do                                      \
    {                                       \
        if ( FD_ISSET ( (fd), &poll_c ))    \
            FD_SET ( (fd), &poll_b );       \
    } while (0)
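To check the macro against the state table, the sketch below decodes the four states from the two fd_sets and can be used to walk a stream through them. The enum and the function `tx_state_of` are hypothetical; `poll_b` and `poll_c` are declared here as plain globals, which is an assumption about how the real code holds them.

```c
#include <sys/select.h>

/* Stand-ins for the poll_b and poll_c sets (assumed to be globals). */
static fd_set poll_b, poll_c;

#define Prod(fd)                        \
    do {                                \
        if (FD_ISSET((fd), &poll_c))    \
            FD_SET((fd), &poll_b);      \
    } while (0)

/* Hypothetical enum; the names follow the state table. */
enum tx_state { TX_IDLE, TX_PRODDED, TX_SENDING, TX_CHOKED };

/* Decode the Tx state of a stream from its poll_b / poll_c bits. */
static enum tx_state tx_state_of(int fd)
{
    int b = FD_ISSET(fd, &poll_b) != 0;
    int c = FD_ISSET(fd, &poll_c) != 0;

    if (c)
        return b ? TX_PRODDED : TX_IDLE;
    return b ? TX_SENDING : TX_CHOKED;
}
```

Because Prod() only sets poll_b when poll_c is already set, it moves Idle to Prodded and leaves the other three states alone, so a Choked stream cannot be prodded by accident.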
The message data passed are TSI messages with the following abbreviated structure:
struct TSI_MASSAGE
{
    u_short tsi_msg_no;     // normally known as a "message type"
    u_short tsi_msg_len;    // length of message (including header)
    u_short tsi_user_no;    // user number
    u_char  tsi_system_id;  // Destination system id
    u_long  tsi_inet_addr;  // inet address of recipient
    ...
};
Comms messages conform to the following structure:
typedef struct msg
{
    u_short type;           // comms message type
    u_short len;            // message length (including header)
    u_short seqno;          // last-used m/c sequence number
    u_char  msg[ 0 ];       // buffer for unmolested TSI message
} msg;
All messages are read in two chunks:
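One plausible reading of "two chunks" on a TCP stream is: first the fixed 6-byte comms header, then the remaining (len - 6) bytes that the header announces. The sketch below shows that pattern only as an assumption about the intent. The names `read_full` and `read_message` are hypothetical, and the header fields are assumed to travel in network byte order.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define HDR_LEN 6   /* type + len + seqno, two bytes each */

/* Read exactly n bytes, looping over short reads. Returns 0 or -1. */
static int read_full(int fd, void *buf, size_t n)
{
    unsigned char *p = buf;

    while (n > 0) {
        ssize_t got = read(fd, p, n);
        if (got <= 0)
            return -1;          /* error or end of stream */
        p += got;
        n -= (size_t)got;
    }
    return 0;
}

/*
 * Chunk 1: the fixed header, which carries the total length.
 * Chunk 2: the remaining (len - HDR_LEN) bytes of TSI message.
 * Returns a malloc'd copy of the whole message (header included, still
 * in wire byte order), or NULL on failure. *len_out receives the length.
 */
static unsigned char *read_message(int fd, uint16_t *len_out)
{
    unsigned char hdr[HDR_LEN];
    unsigned char *buf;
    uint16_t wire_len, len;

    if (read_full(fd, hdr, HDR_LEN) != 0)
        return NULL;
    memcpy(&wire_len, hdr + 2, sizeof wire_len);
    len = ntohs(wire_len);      /* assumption: header fields are big-endian */
    if (len < HDR_LEN)
        return NULL;            /* malformed length */

    buf = malloc(len);
    if (buf == NULL)
        return NULL;
    memcpy(buf, hdr, HDR_LEN);
    if (read_full(fd, buf + HDR_LEN, (size_t)(len - HDR_LEN)) != 0) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```

This matters only for the TCP stream, where message boundaries are not preserved; a UDP datagram arrives whole or not at all.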
Message Types
| Type | Name | TCP, Ubs to Ubc | UDP, Ubs to Ubc | TCP, Ubc to Ubs | Description |
| 0 | Normal | * | * | * | Normal TSI data message. |
| 1 | Ctrl | - | * | * | UBS to all UBC's (CS up/down, New UBS); & UBC heartbeat to UBS. |
| 2 | Re-tran | * | - | * | Re-transmission request (or reply to one Client). |
| 3 | Sync | * | * | - | Mode change between unicast & multicast from CS. |
| 4 | SyncInit | * | - | - | Initialise synchronisation at UBC. |

Multicast (Ubs to all Ubc's):
  0  Normal TSI message to all WEP's.
  1  Control message to all UBC's: HeartBeats, CS up/down, New UBS.
  3  Sync to indicate change of mode to unicasts from UBS.

Unicast (Ubs to one Ubc):
  0  Normal private TSI's from CS to this WEP.
  2  Re-tran for this UBC only; no Sync/Ack required.
  3  Sync to indicate change of mode to multicasts from UBS.
  4  SyncInit to start synchronising operations at Ubc.

Unicast (Ubc to Ubs):
  0  Normal TSI message from WEP to CS.
  2  Retran request for missed m/c message.
Heartbeats are sent by all connected Ubc's to Ubs at regular intervals (5s default). Hearts are prodded by HeartFd timeout. Heartbeat message is a Ctrl message with length of 6 (i.e. just the comms header). The latest m/c seq_no is included in the comms header.
Ubs sends heartbeats, to keep every Ubc up-to-date. Ubs heartbeats are not strictly necessary for this protocol; however, they'll keep Tom happy.
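A heartbeat is small enough to build by hand. The sketch below lays out the 6-byte Ctrl message described above; the function name `encode_heartbeat` is hypothetical, and it assumes that all three header fields are sent in network byte order (the source states this only for the Retran seq_no).

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

enum { MSG_CTRL = 1, HDR_LEN = 6 };

/*
 * Build a heartbeat: a Ctrl message consisting of the comms header only.
 * latest_seqno is the latest m/c seq_no received with nothing missing
 * before it. Assumption: header fields are big-endian on the wire.
 */
static size_t encode_heartbeat(uint16_t latest_seqno, unsigned char out[HDR_LEN])
{
    uint16_t f[3];

    f[0] = htons(MSG_CTRL);         /* type  */
    f[1] = htons(HDR_LEN);          /* len   */
    f[2] = htons(latest_seqno);     /* seqno */
    memcpy(out, f, sizeof f);
    return sizeof f;
}
```

The fields are copied out with memcpy rather than by casting the buffer to a struct, which avoids any reliance on structure padding or alignment.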
When Ubc detects a missed multicast message, it requests a Retran & sets a timer (RetranFd). If a WindBack is not noticed before the Retran timer matures, then another Retran is requested and the timer is again primed. This caters for the case of a retran not being received (cos the WAN was still poorly) and no further multicasts have been sent since.
Unicast messages are self-sequenced by TCP. This completely achieves sequency for Ubc to Ubs direction. Also, any unicasts for a given Ubc will appear in sequence.
To ensure the correct interleaving of m/c & u/c messages, unicast messages carry the last m/c sequence number.
Retain TCPq messages until the appropriate m/c Sync has been established in RxRemQ.
Ubs should send a m/c Sync message to indicate that next message from CS is a unicast. This prevents a Ubc from zooming past its unicast which TCP is having trouble pushing through.
When CS next sends a multicast, the Ubs should indicate the mode change by sending a unicast Sync on all connected Ubc's first.
This will take about 10ms of circuit bandwidth (in parallel, for any number of Ubc's), but it may take 0.5s to 3s to tell 2000 Ubc's (sequential TCP access to the router via Ethernet).
Sync message is essentially like "Over" in a R/T call: i.e. when a change of mode is necessary a Sync message is sent on the old mode.
On the TCP stream socket, unicasts following a Sync on TCP cannot arrive before the Sync. Thus we can guarantee one or more multicasts appear in the correct sequence between unicasts.
On the other hand, embedded unicasts in multicasts are guaranteed sequency because the Sync on multicast is allocated a seq_no & the following multicasts are not forwarded to WEP until all previous messages have been received.
Re-transmission request from Ubc to Ubs includes the seq_no after the missing message(s). This is sent in network byte order as a u_short in msg[ 0 ].
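Both ends of the Retran request can be sketched as follows. The function names are hypothetical. The header is assumed to be big-endian like the payload, and the value placed in the header's seqno field (here the last m/c seq_no received in order) is also an assumption, as the source does not say what a Ubc puts there.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

enum { MSG_RETRAN = 2, HDR_LEN = 6, RETRAN_LEN = HDR_LEN + 2 };

/*
 * Ubc side: build a Retran request.
 * seq_after_gap is the seq_no after the missing message(s), carried as a
 * u_short in network byte order in msg[0..1].
 */
static size_t encode_retran(uint16_t last_seqno, uint16_t seq_after_gap,
                            unsigned char out[RETRAN_LEN])
{
    uint16_t f[4];

    f[0] = htons(MSG_RETRAN);       /* type  */
    f[1] = htons(RETRAN_LEN);       /* len   */
    f[2] = htons(last_seqno);       /* seqno (assumed meaning, see above) */
    f[3] = htons(seq_after_gap);    /* msg[0..1] */
    memcpy(out, f, sizeof f);
    return sizeof f;
}

/* Ubs side: pull the requested seq_no back out of msg[0..1]. */
static uint16_t decode_retran_seq(const unsigned char *wire)
{
    uint16_t v;

    memcpy(&v, wire + HDR_LEN, sizeof v);
    return ntohs(v);
}
```

If a Ubc has everything up to 41 and then receives 45, it would send encode_retran(41, 45, buf) and expect the WindBack to cover 42 to 44.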
UDP/IP header: 28 bytes (20-byte IP header without options + 8-byte UDP header).
| Range of first byte | 1-126 | 128-191 | 192-223 | 224-239 |
| Class | A | B | C | D |
| No of network bits | 7 | 14 | 21 | 28 |
| No of host bits | 24 | 16 | 8 | 0 |
| Bits in first byte | 0xxxxxxx | 10xxxxxx | 110xxxxx | 1110xxxx |
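The bit patterns in the last row translate directly into mask tests. The function name `addr_class` is hypothetical; note that it classifies purely by bit pattern, so the reserved first bytes 0 and 127 come out as Class A even though the table leaves them out of the range.

```c
#include <stdint.h>

/* Classify an IPv4 address by the leading bits of its first byte. */
static char addr_class(uint8_t first_byte)
{
    if ((first_byte & 0x80) == 0x00)
        return 'A';     /* 0xxxxxxx */
    if ((first_byte & 0xc0) == 0x80)
        return 'B';     /* 10xxxxxx */
    if ((first_byte & 0xe0) == 0xc0)
        return 'C';     /* 110xxxxx */
    if ((first_byte & 0xf0) == 0xe0)
        return 'D';     /* 1110xxxx: multicast */
    return '?';         /* 1111xxxx: outside the table */
}
```

A Ubs could use such a check at startup to refuse a Multicast Receive Address that is not Class D.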