Minutes Delay

Nov 29, 2012 at 12:15 AM

Hi, Team,

We are using WSP to publish messages from a server (origin role) to multiple clients (client role). We noticed that messages are sometimes delayed for several minutes. We have more than 10 event types defined and publish them at different frequencies. Do you know what the cause might be?

Thanks

Coordinator
Nov 29, 2012 at 4:37 AM

I haven't noticed this issue before. How many events / second are you sending? If you look at the perf counters, you will find two perf counter categories relating to WSP. Start Performance Monitor and select WspEventRouter and WspEventRouterCommunication. View it in "report" style. This shows you the internal queues and the queue for each connection. You should see the queue properties at 0. Are you seeing events in any of the queues?

You can also execute "netstat -an | findstr 1300" in a cmd window. Notice what the port numbers are and if they are changing over time. Since WSP keeps the connections open, you should see the port numbers staying the same. If you're constantly seeing new port numbers then something is causing the connections to drop.
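
A minimal Python sketch of that check, assuming Windows-style "netstat -an" output with English state names. The function names are made up for illustration and are not part of Wsp. It compares two snapshots and reports which connections on port 1300 disappeared and which are new:

```python
import subprocess

WSP_PORT = "1300"  # port used by the routers in this thread


def wsp_connections(netstat_output, port=WSP_PORT):
    """Return the set of (local, remote) endpoints of established TCP
    connections where either side uses the given port."""
    connections = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Windows "netstat -an" TCP rows: Proto, Local, Foreign, State
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        if state != "ESTABLISHED":
            continue
        # Compare the port exactly, so 13000 does not match 1300.
        if port in (local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1]):
            connections.add((local, remote))
    return connections


def churn(before, after):
    """Connections that disappeared and connections that are new."""
    return before - after, after - before


def snapshot():
    """Take a live snapshot (only meaningful on a machine running a router)."""
    result = subprocess.run(["netstat", "-an"], capture_output=True, text=True)
    return wsp_connections(result.stdout)
```

Calling snapshot() twice, a minute or so apart, and passing both results to churn() should return two empty sets if the connections are stable.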

Keith

Nov 29, 2012 at 7:27 AM

Hi, Keith,

Thanks for the reply

Actually, we had commented out the performance counters, since on some client machines we don't have the rights to install them.

We do get the messages eventually, but they are delayed for several minutes. We also noticed that the WSP router on the server had consumed a lot of memory (7 GB), and the CPU was peaking.

Also, a client application that sits on the same PC as the server receives 100% of the messages without delay. We verified that the TCP connection was established successfully. Do you have any idea about this?

Thanks again

Coordinator
Nov 29, 2012 at 12:24 PM

How many events / second do you see on the server when it starts using memory? You must be queuing events on the server when it's consuming lots of memory. In which queues do the perf counters show events being queued?

Dec 3, 2012 at 12:17 AM

In which cases will an event be queued? Is there one queue per event type? If nobody has subscribed to a message, will it still be routed from the parent to the client? Do you mean that the router created a lot of WSP queues, which can eat up a lot of memory?

We are running some stress tests with some dummy event types. One of our team members describes the issue like this: This was about 10 minutes after a restart. I’d restarted because my machine bluescreened, which I suspect was due to the same thing. I had been starting our application, but not sending any messages. I can’t see anything obvious in the event logs and it seems to be fine now (I disabled all the services and started again): that router is showing about 110MB, which still seems high.

Coordinator
Dec 3, 2012 at 3:04 AM

 

There are numerous internal queues within the Wsp router. The perf counters for "WspEventRouter" will show the state of the internal queues and what the events / second rate is. The perf counters for "WspEventRouterCommunication" will show the state of the queue for each parent/child connection. If you overwhelm the Wsp router, you may see events queuing up in the internal queues. If the rate of events is too great for the network connection or if the network connection is lost, you would see the events queuing up for the connection. As the events queue, they begin using memory. It will depend upon the size of your average event but for our usage, I've seen ~500,000 events queued which consumed ~14GB of memory.

If an event isn't subscribed to from another server, it will never be forwarded. All events sent between servers use the same queue.

The amount of memory the Wsp router process uses is roughly the size of the shared queue setting (look at the config file) + the app itself + space used by queued events. I believe the default setting for the shared queue is 100 MB so having the process use 110 MB is to be expected. The size of the shared queue needs to correspond to the size of your events and the events / second rate. As these two factors increase, you need to increase the size of the shared queue. I use 100 MB for the client servers and 1 GB for the parent servers which are only running as Wsp routers.
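
As a rough worked example of that sum, here is a small Python sketch. The function name and the 10 MB figure for the app itself are assumptions for illustration, and the per-event cost is simply derived from the ~500,000 events / ~14GB observation above, so it will differ for other event sizes:

```python
def estimate_router_memory(shared_queue_bytes, queued_events,
                           avg_queued_event_bytes, app_bytes=10_000_000):
    """Rough estimate: shared queue + the app itself + queued events.

    app_bytes is a guess (about 10 MB), not a measured value.
    """
    return shared_queue_bytes + app_bytes + queued_events * avg_queued_event_bytes


# Per-event cost implied by ~14 GB for ~500,000 queued events.
BYTES_PER_QUEUED_EVENT = 14_000_000_000 // 500_000

# A 100 MB shared queue (102400000 bytes) with nothing queued.
idle = estimate_router_memory(102_400_000, 0, BYTES_PER_QUEUED_EVENT)

# The same router with 250,000 events backed up behind a slow connection.
backed_up = estimate_router_memory(102_400_000, 250_000, BYTES_PER_QUEUED_EVENT)

print(f"bytes per queued event: {BYTES_PER_QUEUED_EVENT}")
print(f"idle: {idle / 1e6:.0f} MB, backed up: {backed_up / 1e9:.1f} GB")
```

Under these assumptions, a backlog in the low hundreds of thousands of events would already put the process in the multi-gigabyte range.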

One thing to look at, you mentioned you removed perf counters from the code for your client machines since you didn't have permission at install time to create the counters. Also at setup time, the code creates an event source in the "System" event log which is used for logging. I think this also requires elevated privileges. If you didn’t change all the code which writes to the event log, I’m not sure what the behavior will be if the Wsp router tries writing to the event log and the event source doesn’t exist.

 

Dec 3, 2012 at 6:36 AM
Edited Dec 3, 2012 at 6:56 AM

Keithh,

Thanks for the detailed explanation.

We have set up the performance counters and are trying to reproduce the issue.

What should we do when the network connection is lost or something else goes wrong with it?

One more question: why does EventsProcessed always show some figure and then go back to zero? It should keep increasing, right?

Thanks.

 

Coordinator
Dec 3, 2012 at 12:43 PM

When a network connection is lost between servers, there are some settings in the config file which control the max size of the queue and how long the queue should be kept for. These settings only apply when the connection is lost. So when the connection is lost, the Wsp router will continue queuing events for that connection. It will delete the queue when the max size is reached or if the connection hasn't been restored within the specified amount of time.
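
The policy can be modeled with a toy Python class. This is only a sketch of the behavior described above, not the Wsp implementation. The class and method names are hypothetical, the two limits are named after the maxQueueSize and maxTimeout settings, and what happens to events arriving after the queue is deleted (here they are dropped) is an assumption:

```python
import time
from collections import deque


class LostConnectionQueue:
    """Toy model of an output queue for a connection that has been lost."""

    def __init__(self, max_queue_size, max_timeout, clock=time.monotonic):
        self.max_queue_size = max_queue_size  # bytes, like maxQueueSize
        self.max_timeout = max_timeout        # seconds, like maxTimeout
        self._clock = clock
        self._lost_at = clock()               # moment the connection dropped
        self._events = deque()
        self._bytes = 0
        self.deleted = False

    def enqueue(self, event):
        """Queue one serialized event; return False once the queue is deleted."""
        if self.deleted:
            return False
        expired = self._clock() - self._lost_at >= self.max_timeout
        too_big = self._bytes + len(event) > self.max_queue_size
        if expired or too_big:
            # Either limit was hit: discard everything queued so far.
            self._events.clear()
            self._bytes = 0
            self.deleted = True
            return False
        self._events.append(event)
        self._bytes += len(event)
        return True

    def queued_bytes(self):
        return self._bytes
```

The clock parameter exists only so the timeout can be exercised without waiting; by default the class uses time.monotonic.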

 

Dec 4, 2012 at 1:21 AM

Hi, Keithh,

Our client doesn't publish anything to the parent; it just listens for events from the parent. In our case, do the events from the parent occupy a lot of memory? Do you know why those events were queued without being dequeued? Is there a setting, like outputCommunicationQueues, that controls when all the events from the parent in the queue will be deleted?

Thanks again for your kind help.

Coordinator
Dec 4, 2012 at 3:40 AM

When you see the events being queued on the one server, what is the state of the network connection? Run a "netstat -an | findstr 1300" in a cmd window which will show the state of the connections. Do this on each server.

BTW, I don't ever see the behavior you are describing.

Can you send me the config file from each server?

Dec 4, 2012 at 5:22 AM

Hi, Keithh,

Below are the config files. So far we haven't reproduced the issue; we will post the connection information once it happens again. One thing I forgot to mention: we always run three WSP routers on one dev machine, one for QA, one for UAT, and one for Production, to receive different messages from different parents.

-----Client

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
 <configSections>
  <section name="eventRouterSettings" type="RouterSettings"/>
  <section name="eventPersistSettings" type="PersistSettings"/>
 </configSections>

 <eventRouterSettings>
    <!-- If you choose autoConfig to be true then you need to first install the origin router and have the DNS configured with -->
    <!-- data center routers if your topology requires it. With autoConfig turned on, all config files will be kept identical -->
    <!-- to the origin's config file. When it is changed, all others will automatically be changed. -->
    <!-- The bootstrapUrl must resolve to the origin router. It will initially be called by a client to retrieve its config file. -->
    <!-- The mgmtGroup is the eventId which will be used to communicate all config info while servers are running. -->
    <!-- Role defines what role a given server takes when automatically establishing the topology. -->
    <!-- The values for role are: -->
    <!--   origin   -->
    <!--   primary   -->
    <!--   secondary   -->
    <!--   client   -->
    <!-- <configInfo role="origin" autoConfig="false" bootstrapUrl="http://WspOrigin/GetConfig" mgmtGroup="2B2B78DB-8AE7-4a16-AB6C-850F54A82D54"/> -->

    <configInfo role="client" autoConfig="false" bootstrapUrl=""/>

    <clientRoleInfo>

      <subscriptionManagement refreshIncrement="3"  expirationIncrement="10"/>

      <localPublish eventQueueName="WspEventQueue" eventQueueSize="102400000" averageEventSize="10240"/>

      <!-- These settings control what should happen to an output queue when communications is lost to a parent or child.-->
      <!-- maxQueueSize is in bytes and maxTimeout is in seconds.-->
      <!-- When the maxQueueSize is reached or the maxTimeout is reached for a communication that has been lost, the queue is deleted.-->
      <outputCommunicationQueues maxQueueSize="200000000" maxTimeout="600"/>

      <!-- nic can be an alias which specifies a specific IP address or an IP address. -->
      <!-- port can be 0 if you don't want to have the router open a listening port to be a parent to other routers. -->
      <thisRouter nic="" port="1300" bufferSize="1024000" timeout="30000" />

      <parentRouter name="pcname.dns.net" numConnections="2" port="1300" bufferSize="1024000" timeout="30000" />

    </clientRoleInfo>

 </eventRouterSettings>

 <eventPersistSettings>

     <!-- type specifies the EventType to be persisted.-->
  <!-- localOnly is a boolean which specifies whether only events published on this machine are persisted or if events from the entire network are persisted.-->
  <!-- maxFileSize specifies the maximum size in bytes that the persisted file should be before it is copied.-->
  <!-- maxCopyInterval specifies in seconds the longest time interval before the persisted file is copied.-->
  <!-- fieldTerminator specifies the character used between fields.-->
  <!-- rowTerminator specifies the character used at the end of each row written.-->
  <!-- tempFileDirectory is the local directory used for writing out the persisted event data.-->
  <!-- copyToFileDirectory is the final destination of the persisted data file. It can be local or remote using a UNC.-->

  <!-- <event type="78422526-7B21-4559-8B9A-BC551B46AE34" localOnly="true" maxFileSize="2000000000" maxCopyInterval="60" fieldTerminator="," rowTerminator="\n" tempFileDirectory="c:\temp\WebEvents\" copyToFileDirectory="c:\temp\WebEvents\log\" /> -->

 </eventPersistSettings>
</configuration>

-- Server (Origin)

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="eventRouterSettings" type="RouterSettings"/>
    <section name="eventPersistSettings" type="PersistSettings"/>
  </configSections>

  <eventRouterSettings>

    <!-- If you choose autoConfig to be true then you need to first install the origin router and have the DNS configured with -->
    <!-- data center routers if your topology requires it. With autoConfig turned on, all config files will be kept identical -->
    <!-- to the origin's config file. When it is changed, all others will automatically be changed. -->
    <!-- The bootstrapUrl must resolve to the origin router. It will initially be called by a client to retrieve its config file. -->
    <!-- The mgmtGroup is the eventId which will be used to communicate all config info while servers are running. -->
    <!-- Role defines what role a given server takes when automatically establishing the topology. -->
    <!-- The values for role are, though you can dynamically create your own names (i.e. fooRoleInfo): -->
    <!--   origin   -->
    <!--   primary   -->
    <!--   secondary   -->
    <!--   client   -->
    <!-- <configInfo role="client" autoConfig="false" bootstrapUrl="http://WspEventRouterOrigin/GetConfig" mgmtGroup="2B2B78DB-8AE7-4a16-AB6C-850F54A82D54"/> -->
    <configInfo role="origin" autoConfig="true" bootstrapUrl="http://localhost:4092/GetConfig" mgmtGroup="D979AEB4-C501-4027-BBCA-C38F7B56FD00" cmdGroup="C8EDEB22-7E4A-4441-B7B4-419DDB856321"/>

    <!-- refreshIncrement should be about 1/3 of what the expirationIncrement is. -->
    <!-- This setting needs to be consistent across all the machines in the eventing network. -->
    <!-- <subscriptionManagement refreshIncrement="3"  expirationIncrement="10"/> -->

    <!-- <localPublish eventQueueName="WspEventQueue" eventQueueSize="102400000" averageEventSize="10240"/> -->

    <!-- These settings control what should happen to an output queue when communications is lost to a parent or child.-->
    <!-- maxQueueSize is in bytes and maxTimeout is in seconds.-->
    <!-- When the maxQueueSize is reached or the maxTimeout is reached for a communication that has been lost, the queue is deleted.-->
    <!-- <outputCommunicationQueues maxQueueSize="200000000" maxTimeout="600"/> -->

    <!-- nic can be an alias which specifies a specific IP address or an IP address. -->
    <!-- port can be 0 if you don't want to have the router open a listening port to be a parent to other routers. -->
    <!-- <thisRouter nic="" port="1300" bufferSize="1024000" timeout="30000" /> -->

    <!-- <parentRouter name="ParentMachineName" port="1300" bufferSize="1024000" timeout="30000" />  -->

    <originRoleInfo>
      <subscriptionManagement refreshIncrement="3"  expirationIncrement="10"/>

      <localPublish eventQueueName="WspEventQueue" eventQueueSize="10240000" averageEventSize="10240"/>

      <outputCommunicationQueues maxQueueSize="20000000" maxTimeout="600"/>

      <thisRouter nic="" port="1300" bufferSize="1024000" timeout="30000" />

      <!-- <parentRouter name="" numConnections="2" port="1300" bufferSize="1024000" timeout="30000" /> -->
    </originRoleInfo>

    <primaryRoleInfo>
      <subscriptionManagement refreshIncrement="3"  expirationIncrement="10"/>

      <localPublish eventQueueName="WspEventQueue" eventQueueSize="10240000" averageEventSize="10240"/>

      <outputCommunicationQueues maxQueueSize="20000000" maxTimeout="600"/>

      <thisRouter nic="" port="1300" bufferSize="1024000" timeout="30000" />

      <parentRouter name="WspDcOrigin" numConnections="10" port="1300" bufferSize="1024000" timeout="30000" />
    </primaryRoleInfo>

    <secondaryRoleInfo>
      <subscriptionManagement refreshIncrement="3"  expirationIncrement="10"/>

      <localPublish eventQueueName="WspEventQueue" eventQueueSize="102400000" averageEventSize="10240"/>

      <outputCommunicationQueues maxQueueSize="200000000" maxTimeout="600"/>

      <thisRouter nic="" port="1300" bufferSize="1024000" timeout="30000" />

      <parentRouter name="WspDcPrimary1" numConnections="10" port="1300" bufferSize="1024000" timeout="30000" />
    </secondaryRoleInfo>

    <!-- clientRoleInfo will be the default setting for when autoConfig is false. -->
    <clientRoleInfo>
      <subscriptionManagement refreshIncrement="3"  expirationIncrement="10"/>

      <localPublish eventQueueName="WspEventQueue" eventQueueSize="102400000" averageEventSize="10240"/>

      <outputCommunicationQueues maxQueueSize="20000000" maxTimeout="600"/>

      <thisRouter nic="" port="" bufferSize="1024000" timeout="30000" />

      <parentRouter name="localhost" numConnections="2" port="1300" bufferSize="1024000" timeout="30000" />
    </clientRoleInfo>
  </eventRouterSettings>

  <eventPersistSettings>

    <!-- type specifies the EventType to be persisted.-->
    <!-- localOnly is a boolean which specifies whether only events published on this machine are persisted or if events from the entire network are persisted.-->
    <!-- maxFileSize specifies the maximum size in bytes that the persisted file should be before it is copied.-->
    <!-- maxCopyInterval specifies in seconds the longest time interval before the persisted file is copied.-->
    <!-- fieldTerminator specifies the character used between fields.-->
    <!-- rowTerminator specifies the character used at the end of each row written.-->
    <!-- tempFileDirectory is the local directory used for writing out the persisted event data.-->
    <!-- copyToFileDirectory is the final destination of the persisted data file. It can be local or remote using a UNC.-->

    <!-- <event type="78422526-7B21-4559-8B9A-BC551B46AE34" localOnly="true" maxFileSize="2000000000" maxCopyInterval="60" createEmptyFiles="false"
          fieldTerminator="," rowTerminator="\n" tempFileDirectory="c:\temp\WebEvents\" copyToFileDirectory="c:\temp\WebEvents\log\" /> -->

  </eventPersistSettings>
</configuration>

 

Thanks

Coordinator
Dec 4, 2012 at 4:38 PM

I have never run multiple instances of the Wsp router on a single server. You would need to modify the code to make this work. You'd have to use different shared memory names, port numbers, perf counter names, etc. This may be your problem.

You should have the same config file for both servers and the only difference should be the role. One server would be "client" and the other server would be "origin". Notice the queue size for the origin is 10MB and the queue size for the client is 100MB.

Coordinator
Dec 4, 2012 at 4:41 PM

BTW, you want to use the config file from the origin but change the parentRouter in the client section to pcname.dns.net.
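
One way to check such a file before deploying it is to list the per-role settings side by side. The following Python sketch uses only the standard library. The function name is hypothetical, and the sample is a trimmed-down version of the origin config posted above with the client's parentRouter already changed:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<configuration>
  <eventRouterSettings>
    <configInfo role="origin" autoConfig="true" bootstrapUrl="http://localhost:4092/GetConfig"/>
    <originRoleInfo>
      <localPublish eventQueueName="WspEventQueue" eventQueueSize="10240000" averageEventSize="10240"/>
      <outputCommunicationQueues maxQueueSize="20000000" maxTimeout="600"/>
    </originRoleInfo>
    <clientRoleInfo>
      <localPublish eventQueueName="WspEventQueue" eventQueueSize="102400000" averageEventSize="10240"/>
      <outputCommunicationQueues maxQueueSize="20000000" maxTimeout="600"/>
      <parentRouter name="pcname.dns.net" numConnections="2" port="1300"/>
    </clientRoleInfo>
  </eventRouterSettings>
</configuration>"""


def role_settings(config_xml):
    """Collect queue sizes and the parent router name for each *RoleInfo section."""
    root = ET.fromstring(config_xml)
    result = {}
    for section in root.find("eventRouterSettings"):
        if not section.tag.endswith("RoleInfo"):
            continue  # skip configInfo and anything else that is not a role
        publish = section.find("localPublish")
        output = section.find("outputCommunicationQueues")
        parent = section.find("parentRouter")
        result[section.tag] = {
            "eventQueueSize": int(publish.get("eventQueueSize")),
            "maxQueueSize": int(output.get("maxQueueSize")),
            "parentRouter": parent.get("name") if parent is not None else None,
        }
    return result


for role, values in role_settings(SAMPLE).items():
    print(role, values)
```

Running the same function over the file from each computer and comparing the results should show identical values everywhere, with only the role in configInfo differing.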

Dec 5, 2012 at 12:53 AM
Edited Dec 5, 2012 at 12:57 AM

Keithh,

On the server side, each environment has just one origin WSP router: one origin router for QA, one for UA, and one for Production. On our dev box, however, we want to run multiple routers to receive messages from the different servers. Eventually, when our product goes live, there will be just one client router running on the user's box. In our testing, we can successfully receive the messages from the different servers. Do you really think we need to modify the code to support this? I mean, to listen for messages based on the environment the user selected during login.

 

Thanks

Coordinator
Dec 5, 2012 at 4:35 AM

You should just need one origin server. Have all the other computers be clients. The config file should be the same for all computers with the role being "client" except for the origin computer where its role is "origin". Use different event types for the events so QA, UA, and Production GUIDs are all different.

You shouldn't be running multiple instances of the Wsp router in different processes on one computer.

Dec 5, 2012 at 5:17 AM

Keithh,

From what I have learned about WSP, it should be OK for every subscriber to watch the same shared queue. Every message, no matter where it came from, is queued in the shared queue, and every subscriber gets notified. Our clients never publish messages directly; we always call a service on the server to publish WSP messages, so I think we can have them all listening on the same port.

In the user's environment, our application doesn't have the rights to stop the QA WSP router and start the UA WSP router after the user switches environments, so we just keep all the routers running to give the application the ability to receive the different messages from the different servers.

Are we heading in the right direction?

 

Coordinator
Dec 5, 2012 at 12:58 PM

Just to be clear, the wspeventrouter.exe on each computer is what I refer to as the Wsp router. It is the only process which opens the Tcp port to its parent and on which a parent (the origin) listens for its children to connect. You can't have more than one wspeventrouter.exe process running on the same computer.

The events are passed from computer to computer only if some application has subscribed to that type of event. When the event is received by the wspeventrouter.exe, it is put in the shared queue. Your application is using a DLL to listen for the events it is subscribing to and when it sees one of those events, it takes it from the shared queue and gives it to your application.

It doesn't matter which computer you publish the events on or subscribe to events from, the events will go to every computer where there is a subscription for that event type.

For your application, the thing that should change when you go from QA to UA to production is what event type your application is subscribing to.
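
In application code this can be as small as a lookup from the selected environment to an event type. The Python sketch below only illustrates the idea. The GUIDs are placeholders, the function name is hypothetical, and no Wsp call is shown because the subscription API itself is not covered in this thread:

```python
import uuid

# Hypothetical event type GUIDs, one per environment. Real values would be
# whatever GUIDs the publisher uses for each environment.
EVENT_TYPES = {
    "QA": uuid.UUID("11111111-1111-1111-1111-111111111111"),
    "UA": uuid.UUID("22222222-2222-2222-2222-222222222222"),
    "Production": uuid.UUID("33333333-3333-3333-3333-333333333333"),
}


def event_type_for(environment):
    """Return the event type the application should subscribe to."""
    try:
        return EVENT_TYPES[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
```

The application would pass the result to its subscribe call at login, so a single router per computer can serve all three environments.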

Dec 7, 2012 at 2:13 AM

Keithh,

Thanks for the answer.

One more question.

Currently, we have 100 client routers connected to the same origin router, listening for messages from the parent. Even when no messages are being published, we noticed the origin WSP router always has pretty high CPU usage. I think this is because all the clients send the internal subscription event to the parent every 3 seconds, and it takes some effort for the parent to process all of them. That is normal WSP behavior, but we are worried that we will soon run into serious problems as the number of clients increases. Do you have any suggestions on this?

Going by the description on the homepage, the WSP router should be able to handle this:

WSP is currently installed on >9,000 production servers in Microsoft. A real-time monitoring application uses the event system for communications and monitors 100% of the IIS requests. Within 3 seconds, all http requests across all servers and all datacenters are aggregated, displayed on a console, and able to instigate alerts.

How did you configure the 9,000 production servers? Clients connect to a primary router, and all the primary routers are linked by the origin router, right? How many clients does one primary cover? 400?

Coordinator
Dec 7, 2012 at 2:26 AM

From: EnchantedBusiness

Dec 7, 2012 at 4:25 AM

Keithh,

Did you forget to type the answer, or is what I asked the answer itself? Please clarify.

Thanks for your patience.

Coordinator
Dec 10, 2012 at 3:56 PM

I tried replying from my cell phone but I guess it didn't work. What I said was:

The origin should be using hardly any CPU. Something is not right. What are the perf counter values for wspeventrouter on the origin when you say it's using CPU?

Jul 29, 2013 at 1:59 PM
Hi keithh,
I have a problem with a Wsp server application. I run my "server" application, which subscribes to an event. Another PC runs a client application that sends many events (as a stress test). The server reads the events and writes them into a database.

The problem is that the memory used by the server increases very rapidly.

What could the problem be?
Coordinator
Jul 29, 2013 at 3:43 PM
It sounds like you're publishing events at a higher rate than your server can write them. If you use Wsp v3.0 then there are some perf counters you can look at to see if events are being queued in your app process. Also, if you need to write events at a rate faster than a single subscriber can handle, you can partition your events across multiple subscribers by using a filtered subscription.
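
The partitioning idea can be sketched in Python independently of Wsp. How a filter is actually expressed in a Wsp subscription is not shown here, so the function names and the choice of a CRC32 hash are assumptions for illustration only. Each of N subscribers accepts just the events whose key hashes to its own index, so every event is written by exactly one of them:

```python
import zlib


def partition_of(key, partitions):
    """Map a key to a partition number in [0, partitions), stably across runs."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def make_filter(index, partitions):
    """Build the predicate subscriber number `index` would use as its filter."""
    def accepts(key):
        return partition_of(key, partitions) == index
    return accepts


# Three hypothetical subscribers, each writing its share to the database.
filters = [make_filter(i, 3) for i in range(3)]
keys = [f"order-{n}" for n in range(1000)]

# For every key, count how many subscribers would accept it.
handled_by = [sum(1 for accepts in filters if accepts(key)) for key in keys]
```

A stable hash is used rather than Python's built-in hash(), which can change between processes and would send the same key to different subscribers.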
Jul 29, 2013 at 5:34 PM
Thanks for the fast answer, keithh.
I will consider moving to Wsp 3.0 to use the perf counters.
Could you kindly tell me how I can partition my events across multiple subscribers using a filtered subscription?
Thanks a lot.
Coordinator
Jul 30, 2013 at 4:22 AM
See my thread on Filtered Subscription Example