VIP with the load balancer

May 27, 2013 at 2:22 PM
Keithh,

In our production environment we have two boxes, Prod1 and Prod2, with a VIP set up between them for load balancing. Many clients use the VIP to connect to one of them. Behind the VIP, we configured Prod1 as the origin and Prod2 as the primary, connecting to Prod1 as its parent. An external service calls a WCF service hosted on both boxes to publish WSP messages to all the clients. We noticed that sometimes a message is published twice, once by Prod1 and once by Prod2. Have you seen this kind of issue before? Please advise what I can do to remove the duplicated messages.

As far as I understand WSP, it seems the client has established connections with both the DR and the PR. Since a message published on one box is replicated to the other box, both will forward the same message to the client, which is why the duplicated messages show up on the client side.

I don't know why the client router tries to connect to the other box when it has already established a connection with one of them. From reading the code, is it because the connection was broken (Socket.Connected == false)?

Please advise. Thanks in advance.
Coordinator
May 28, 2013 at 5:36 AM
I don't fully follow your issue. I thought you were now using WSP version 3.0? If so, then there is no longer a "primary" or "secondary". You would set up your two servers to both be hubs and in the same group. Are you running your Prod1 and Prod2 in active/active mode or active/passive mode?

It doesn't sound right that the client connected with both servers. This should never happen. When you say the client is getting the same event twice, did you look at the two events and verify they have the same originating router name? I would think it's more likely that both Prod1 and Prod2 are publishing events which look the same.
May 28, 2013 at 6:23 AM
Keithh,

We are using 2.0.

We had blocked the server from forwarding subscription events to the client. On the surface we thought forwarding was unnecessary, since we only publish messages from the server to the client. We suspect the issue was introduced by that change. The client receives nothing from the server for 5 minutes, so the socket times out, and the client keeps trying to re-establish the connection. That creates the chance for the client to end up connected to both Prod1 and Prod2.

What do you think about this?

Thanks
Coordinator
May 28, 2013 at 1:57 PM
Be sure to specify in the config file for the client that numConnections = 1. Then there is no possibility of connecting to both servers.
May 28, 2013 at 2:23 PM
Keithh,
Thanks for the quick answer.
Actually, we set numConnections = 2. Do you think that creates the chance that the client may connect to both servers?

    for (i = 0; i + parentConnections.Count < parentRoute.NumConnections; i++)
    {
        if (string.IsNullOrEmpty(parentRoute.RouterName) == true)
        {
            continue;
        }

        Socket parentSocket = OpenParentSocket();
        // ...
    }
Does this code cause the client to connect to different servers if we have two client connections?


Thanks
Coordinator
May 28, 2013 at 5:20 PM
If you have a VIP which the load balancer can resolve to multiple servers then using more than one connection will cause issues.
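
A minimal sketch of why this happens, assuming the load balancer hands each new TCP connection to the next server in turn (round robin). Real load balancers may use other policies, and RoundRobinVip and VipDemo are hypothetical names for illustration only, not part of WSP:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a VIP: each new connection goes to the next backend.
public sealed class RoundRobinVip
{
    private readonly string[] backends;
    private int next;

    public RoundRobinVip(params string[] backends)
    {
        if (backends == null || backends.Length == 0)
        {
            throw new ArgumentException("At least one backend is required.", nameof(backends));
        }
        this.backends = backends;
    }

    // Returns the name of the backend that receives the next new connection.
    public string Connect()
    {
        string chosen = backends[next];
        next = (next + 1) % backends.Length;
        return chosen;
    }
}

public static class VipDemo
{
    // Opens numConnections connections through the VIP and returns
    // the set of distinct backends the client ends up attached to.
    public static ISet<string> ParentsReached(RoundRobinVip vip, int numConnections)
    {
        var reached = new HashSet<string>();
        for (int i = 0; i < numConnections; i++)
        {
            reached.Add(vip.Connect());
        }
        return reached;
    }
}
```

Under this assumption, VipDemo.ParentsReached(new RoundRobinVip("Prod1", "Prod2"), 2) contains both servers, so the client would receive every replicated event from each of them, while a count of 1 reaches exactly one server.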
May 29, 2013 at 2:15 AM
Edited May 29, 2013 at 2:15 AM
Keithh,

If we migrate to WSP 3.0, will this issue be fixed?

Thanks,
Coordinator
May 29, 2013 at 5:15 AM
You don't need to migrate to 3.0 to solve this. In your config file in the clientRoleInfo section, set numConnections="1" for the parent router definition line. This should solve your problem.
May 29, 2013 at 6:28 AM
Keithh,

Got your point.

From reading the code, I noticed the following: when we configure the client to use the VIP to connect to its parent and start the client router, ReceiveCallback is invoked every 5 minutes with a connection reset socket error. The thread then stops, and the client tries to connect to its parent again. This is not observed when the client connects to the server router directly. Do you have any idea about this?
    private static void ReceiveCallback(IAsyncResult ar)
    {
        // ...
        bytesRead = socket.EndReceive(ar, out socketError);

        if (socketError != SocketError.Success)
        {
            //EventLog.WriteEntry("WspEventRouter", "Receive failed with bad return code: " + socketError.ToString(), EventLogEntryType.Warning);

            state.receiveDone.Set();
            return;
        }
        // ...
    }
Coordinator
May 29, 2013 at 6:36 AM
When you connect via a load balancer, the physical connection is from the client to the load balancer. There is then a separate connection from the load balancer to the server. You should check the settings on your load balancer to see what would affect a long running persisted connection. It sounds like there might be a timeout with the connection being closed.
May 29, 2013 at 7:43 AM
Yes, 5 minutes is the value configured for the socket timeout.
My question is why no such socket timeout is observed when the client connects to the server router directly. Is it because the response from the server is dropped when the VIP is used, so nothing is received within 5 minutes?

thanks
Coordinator
May 29, 2013 at 1:49 PM
There is no timeout for sockets when you connect directly to the server. You might be able to configure your load balancer to disable the timeout for the port you're using. I don't know anything about your load balancer though.
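
If the load balancer timeout cannot be changed, one client-side option that might help is TCP keep-alive on the socket, so the connection is not completely silent while idle. The following is only a sketch: it uses the standard .NET Socket calls on Windows, KeepAliveSketch is a hypothetical helper that is not part of WSP, and whether keep-alive probes count as activity depends on the load balancer:

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper, not part of WSP.
public static class KeepAliveSketch
{
    // Builds the 12-byte Winsock tcp_keepalive structure:
    // on/off flag, idle time before the first probe (ms), interval between probes (ms).
    public static byte[] BuildKeepAliveValues(uint idleMs, uint intervalMs)
    {
        byte[] values = new byte[12];
        BitConverter.GetBytes(1u).CopyTo(values, 0);          // 1 = keep-alive enabled
        BitConverter.GetBytes(idleMs).CopyTo(values, 4);      // idle time in milliseconds
        BitConverter.GetBytes(intervalMs).CopyTo(values, 8);  // probe interval in milliseconds
        return values;
    }

    // Enables keep-alive on an existing socket (Windows-specific IOControl code).
    public static void EnableKeepAlive(Socket socket, uint idleMs, uint intervalMs)
    {
        socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
        socket.IOControl(IOControlCode.KeepAliveValues, BuildKeepAliveValues(idleMs, intervalMs), null);
    }
}
```

For a 5 minute idle timeout, a call such as KeepAliveSketch.EnableKeepAlive(parentSocket, 60000, 10000) would send the first probe after one minute of silence, well inside the timeout window.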
May 29, 2013 at 2:26 PM
Keithh,

Can you show me where the code is for the logic you mentioned, "There is no timeout for sockets when you connect directly to the server"? I want to learn the details. We don't just use WSP; we also want to learn from what it does well.

How do you configure load balancing for all the routers in your datacenter? You mentioned in another thread that you are also using a VIP to achieve load balancing. Do you just set numConnections = 1, or just set -1 as the timeout?

Thanks a lot