Client not receiving expected errors

Aug 28, 2014 at 10:31 PM
Keith --

We are progressing with our install of WSP; we are now in a staging environment and have gotten to the point of the plug-pulling party. It didn't respond in the way we had expected, and we are wondering what behavior you would expect and whether you have any advice.

In this environment, we have 2 hubs and a number of nodes. An example of an unexpected result came from our test of what occurs when WSPEventRouter is shut down on a node:
1. Shut down WSPEventRouter on node 1 (all shutdowns are done via Task Manager "End Process", as per your suggestion in another discussion).

2. Start the test program on node 1, which can subscribe and publish.

3. Try subscribing. This apparently worked as far as the test program was concerned. We were expecting an error here.
    The client then didn't receive any messages from other machines (obviously).

4. Try publishing. Again, the test program thought it was publishing successfully; we were expecting an error.
    The client did receive its own self-published messages, but they were not pushed out to other machines.
Other tests were similar:
  • when a hub was shut down, the nodes that were already connected to it did not recover by switching over to the other hub. Those nodes essentially stopped sending/receiving messages, but no errors were thrown.
  • when all hubs were shut down, we did not get errors when starting WSP on a node.
FYI: We are using WSP to control aggressive caching in order to limit hits to a database for a system which is mostly, but not entirely, read-only. When a cached record is updated on one server, that server uses WSP to tell the other servers about the change.
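
If it helps to picture that pattern, here is a minimal sketch (in Python, purely illustrative and not WSP code; the event shape and names are made-up assumptions) of publish-on-update / evict-on-notification cache invalidation:

```python
# Hypothetical sketch of the cache-invalidation pattern described above.
# "publish" stands in for whatever the real publishing call is.
local_cache = {}

def on_record_updated(record_id, new_value, publish):
    # This server changed the record: update its own cache and tell the others.
    local_cache[record_id] = new_value
    publish({"type": "cache-invalidate", "record_id": record_id})

def on_notification(event):
    # Another server changed the record: drop the local copy so the next
    # read goes back to the database.
    if event.get("type") == "cache-invalidate":
        local_cache.pop(event["record_id"], None)
```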
Coordinator
Aug 28, 2014 at 11:06 PM
What you are seeing is the expected behavior. When you shut down the wsprouter, you must have had an app running which was publishing or subscribing to events; this held the shared memory open. In this situation, apps can still start the publishmgr/subscriptionmgr, and they can publish and subscribe to events. The only catch is that they can only do this until shared memory becomes full. At that point, all event activity will stop until wsprouter starts back up. When it does, it will connect to the other hubs and start processing the events which are in shared memory.

Had you published more events such that shared memory filled, your app would have gotten an error trying to publish events.
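
To make that failure mode concrete, here is a minimal sketch (Python, purely illustrative; this is not the WSP API, and the capacity and names are made-up assumptions) of a bounded buffer standing in for shared memory: publishes keep succeeding while the router is down, the app only sees an error once the buffer fills, and a restarted router drains the backlog.

```python
from collections import deque

SHARED_MEM_CAPACITY = 4  # assumed tiny capacity, just for demonstration

class SharedMemoryStub:
    def __init__(self, capacity=SHARED_MEM_CAPACITY):
        self.capacity = capacity
        self.events = deque()

    def publish(self, event):
        # With the router down, publishes still succeed until the buffer fills.
        if len(self.events) >= self.capacity:
            raise RuntimeError("shared memory full; publish fails only now")
        self.events.append(event)

    def drain_on_router_restart(self):
        # When wsprouter starts back up it reconnects to the hubs and then
        # processes whatever accumulated while it was down.
        while self.events:
            yield self.events.popleft()

mem = SharedMemoryStub()
for i in range(SHARED_MEM_CAPACITY):
    mem.publish(f"event-{i}")       # all succeed; the app sees no error
try:
    mem.publish("one-too-many")     # only this publish reports a failure
except RuntimeError as err:
    print(err)
print(list(mem.drain_on_router_restart()))  # the restart processes the backlog
```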

When a node loses its connection to a hub, it will randomly connect to another hub in its group; I think it retries every 10 seconds. So if it didn't connect to a different hub in the same group within a reasonable amount of time, that is something I haven't seen before. If no hubs are up, wsprouter on the nodes will start just fine and apps can publish and subscribe. The node will remain in a continuous state of trying to connect to a hub in its group.
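
As a rough sketch of that reconnect behavior (Python, purely illustrative; the hub addresses, port, and 10-second interval are assumptions, not taken from the WSP source):

```python
import random
import socket
import time

HUBS = [("hub1.example.local", 1300), ("hub2.example.local", 1300)]  # hypothetical group

def connect_to_some_hub(hubs=HUBS, retry_interval=10.0):
    """Keep picking a random hub from the group until one accepts a connection."""
    while True:
        host, port = random.choice(hubs)
        try:
            return socket.create_connection((host, port), timeout=5)
        except OSError:
            # No hub reachable yet; the node keeps running, and local apps can
            # still publish/subscribe while it retries.
            time.sleep(retry_interval)
```
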
Aug 29, 2014 at 2:39 AM
Thanks again for your fast response. We will retest with that in mind.

A follow-up, though: if keeping the subscriber open was keeping the shared memory queue alive, shouldn't restarting the service have then pushed the queued messages out? We did not see that happening.
Coordinator
Aug 29, 2014 at 7:15 AM
The behavior will be different depending upon whether the hub lost its connection to the other hubs or was killed and restarted. If it keeps running and loses the connection, then it knows there was a subscription from the other hub, it already has a queue for that hub, and it just keeps putting events in that queue. When the connection is reestablished, the events are sent.

If the hub is restarting, then it does not have any knowledge of subscriptions from other hubs/nodes. In this case, it would still process the events that had been put in shared memory, but unless it first establishes connections to the other hubs/nodes, it would do nothing with those events.
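
A tiny sketch of that distinction (Python, purely illustrative; none of these names come from WSP): a hub that merely loses a connection still holds per-destination queues and keeps filling them, whereas a restarted hub starts with empty queues and no subscription knowledge, so drained events have nowhere to go until connections are reestablished.

```python
from collections import defaultdict, deque

class HubStub:
    def __init__(self):
        self.queues = defaultdict(deque)   # per-destination event queues
        self.subscribers = set()           # hubs/nodes known to want events

    def on_event(self, event):
        # Queue the event for every known subscriber, connected or not.
        for dest in self.subscribers:
            self.queues[dest].append(event)

    def on_connection_restored(self, dest, send):
        # A surviving hub flushes the queue it kept while disconnected.
        while self.queues[dest]:
            send(dest, self.queues[dest].popleft())

# A hub that stayed up: it remembers "hub2" and keeps queueing for it.
survivor = HubStub()
survivor.subscribers.add("hub2")
survivor.on_event("cache-update-1")
survivor.on_connection_restored("hub2", lambda d, e: print(f"forward {e!r} to {d}"))

# A hub that was killed and restarted: no subscribers are known yet, so an
# event drained from shared memory is routed nowhere.
restarted = HubStub()
restarted.on_event("cache-update-2")   # no destinations -> effectively dropped
```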

I never saw this be an issue with servers in production, since they usually kept running until they were taken out of rotation to perform maintenance. If this were a real issue for you, you could put a 30-second sleep in the Listener startup. This would allow time for connections to be established prior to processing the events that exist in shared memory.
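
If it helps, that workaround would amount to something like this (Python, purely illustrative; the function names are hypothetical placeholders, not the actual Listener code):

```python
import time

STARTUP_DELAY_SECONDS = 30  # the sleep suggested above

def listener_startup(establish_connections, drain_shared_memory):
    establish_connections()             # start connecting to the other hubs/nodes
    time.sleep(STARTUP_DELAY_SECONDS)   # give those connections time to come up
    drain_shared_memory()               # only then forward the queued events
```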