Infinispan client fault-tolerance mechanism

codingparty 2015-07-13


Assumptions:
         1. Infinispan version: 7.2.3.Final
         2. Client configuration: defaults
         3. Cluster mode: distributed
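Overview of the fault-tolerance mechanism:
         1. The Infinispan client actively fetches the server list from the server side and opens connections to those servers.
         2. When a request from the client to the server fails with org.infinispan.client.hotrod.exceptions.TransportException, the client retries, that is, it resends the request. The number of retries is governed by infinispan.client.hotrod.max_retries (infinispan.client.hotrod.max_retries=10 here).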
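
To make the retry limit concrete, here is a minimal sketch of how max_retries translates into attempts. It assumes the loop condition retryCount <= maxRetries from the shouldRetry() excerpt in the walkthrough, and it reads the property from a plain java.util.Properties object. RetryBudgetSketch and its methods are hypothetical names used only for illustration; nothing here calls the Hot Rod client.

```java
import java.util.Properties;

// Hypothetical stand-alone model of the retry budget; not Infinispan code.
class RetryBudgetSketch {

    static final String MAX_RETRIES_KEY = "infinispan.client.hotrod.max_retries";

    // Reads max_retries from the given properties, falling back to 10,
    // the value quoted in the overview.
    static int maxRetries(Properties props) {
        return Integer.parseInt(props.getProperty(MAX_RETRIES_KEY, "10"));
    }

    // Counts how many times the loop body runs when every attempt fails,
    // using the condition retryCount <= maxRetries.
    static int attemptsWhenAlwaysFailing(int maxRetries) {
        int attempts = 0;
        int retryCount = 0;
        while (retryCount <= maxRetries) {
            attempts++;      // one request sent (and assumed to fail)
            retryCount++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MAX_RETRIES_KEY, "3");
        int max = maxRetries(props);
        System.out.println("max_retries=" + max
                + " -> attempts=" + attemptsWhenAlwaysFailing(max));
    }
}
```

Because the comparison is <= rather than <, a setting of max_retries=3 allows four attempts in this model: the initial request plus three retries.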
Walking through the mechanism in code:
          1. Check whether a retry is needed
                 Note: the first pass through org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute() is the normal (non-retry) attempt.
                A: org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute() {
                        int retryCount = 0;
                        Set<SocketAddress> failedServers = null;
                        // check whether another attempt is allowed
                        while (shouldRetry(retryCount)) {
                            ...
                        }
                        ...
                }

                B: org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.shouldRetry(int retryCount) {
                        // compare the attempt counter against the configured max_retries
                        return retryCount <= transportFactory.getMaxRetries();
                }
          2. Obtain a connection for the given key
                A: org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute() {
                        int retryCount = 0;
                        Set<SocketAddress> failedServers = null;
                        while (shouldRetry(retryCount)) {
                            Transport transport = null;
                            try {
                                // obtain a connection
                                transport = getTransport(retryCount, failedServers);
                                return executeOperation(transport);
                            } catch (TransportException te) {
                                ...
                            }...
                        }
                        ...
                }
                B: org.infinispan.client.hotrod.impl.operations.AbstractKeyOperation.getTransport(int retryCount, Set<SocketAddress> failedServers) {
                        if (retryCount == 0) {
                            // first attempt: obtain a connection for the given key
                            return transportFactory.getTransport(key, failedServers, cacheName);
                        } else {
                            ...
                        }
                }
          3. On an exception, record the address of the node that failed
                A: org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute() {
                        int retryCount = 0;
                        Set<SocketAddress> failedServers = null;
                        while (shouldRetry(retryCount)) {
                            ...
                            try {
                                ...
                            } catch (TransportException te) {
                                // record the address of the node that failed
                                if (failedServers == null) {
                                    failedServers = new HashSet<SocketAddress>();
                                }
                                failedServers.add(te.getServerAddress());
                                if (transport != null) {
                                    ...
                                    // discard the connection that raised the exception
                                    transportFactory.invalidateTransport(te.getServerAddress(), transport);
                                }
                            }...
                        }
                        ...
                }
          4. Count the attempts
                A: org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute() {
                        int retryCount = 0;
                        Set<SocketAddress> failedServers = null;
                        while (shouldRetry(retryCount)) {
                            ...
                            // count this attempt
                            retryCount++;
                        }
                        ...
                }
          5. On a retry (retryCount > 0), obtain a connection through the load-balancing strategy
                A: org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute() {
                        int retryCount = 0;
                        Set<SocketAddress> failedServers = null;
                        while (shouldRetry(retryCount)) {
                            Transport transport = null;
                            try {
                                // obtain a connection
                                transport = getTransport(retryCount, failedServers);
                                return executeOperation(transport);
                            } catch (TransportException te) {
                                ...
                            }...
                        }
                        ...
                }
                B: org.infinispan.client.hotrod.impl.operations.AbstractKeyOperation.getTransport(int retryCount, Set<SocketAddress> failedServers) {
                        if (retryCount == 0) {
                            ...
                        } else {
                            // retry: obtain a connection without regard to the key
                            return transportFactory.getTransport(failedServers, cacheName);
                        }
                }
                C: org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory.getTransport(Set<SocketAddress> failedServers, byte[] cacheName) {
                        SocketAddress server;
                        synchronized (lock) {
                            server = getNextServer(failedServers, cacheName);
                        }
                        // borrow a connection to that node from the pool
                        return borrowTransportFromPool(server);
                }
                D: org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory.getNextServer(Set<SocketAddress> failedServers, byte[] cacheName) {
                        FailoverRequestBalancingStrategy balancer = getOrCreateIfAbsentBalancer(cacheName);
                        // pick a node address according to the balancing strategy
                        SocketAddress server = balancer.nextServer(failedServers);
                        ...
                        return server;
                }
                E: org.infinispan.client.hotrod.impl.transport.tcp.RoundRobinBalancingStrategy.nextServer(Set<SocketAddress> failedServers) {
                        for (int i = 0;; ++i) {
                            SocketAddress server = getServerByIndex(index++);
                            ...
                            if (index >= servers.length)
                                index = 0;
                            // skip nodes that have already failed
                            if (failedServers == null || !failedServers.contains(server) || i >= failedServers.size()) {
                                return server;
                            }
                        }
                }
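
The five steps can be tied together in a small stand-alone simulation. This is a sketch under stated assumptions, not the client's implementation: servers are plain strings rather than SocketAddress instances, SimulatedTransportException stands in for TransportException, and FailoverSketch, RoundRobin, send and execute are hypothetical names. How the sketch reports exhausted retries (an IllegalStateException) is also an assumption. The nextServer() logic mirrors the excerpt in E, including the i >= failedServers.size() guard that keeps the loop from spinning forever when every server has failed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Stand-alone model of the failover path; all names are hypothetical.
class FailoverSketch {

    // Stand-in for TransportException: remembers which server failed.
    static final class SimulatedTransportException extends RuntimeException {
        final String serverAddress;

        SimulatedTransportException(String serverAddress) {
            super("cannot reach " + serverAddress);
            this.serverAddress = serverAddress;
        }
    }

    // Round-robin picker modelled on the nextServer() excerpt.
    static final class RoundRobin {
        private final String[] servers;
        private int index = 0;

        RoundRobin(String... servers) {
            this.servers = servers;
        }

        String nextServer(Set<String> failedServers) {
            for (int i = 0;; ++i) {
                String server = servers[index++];
                if (index >= servers.length) {
                    index = 0;
                }
                // Skip failed servers, but stop filtering after
                // failedServers.size() skips so the loop always ends.
                if (failedServers == null || !failedServers.contains(server)
                        || i >= failedServers.size()) {
                    return server;
                }
            }
        }
    }

    // Simulated request: fails if the target server is down.
    static String send(String server, Set<String> downServers) {
        if (downServers.contains(server)) {
            throw new SimulatedTransportException(server);
        }
        return "handled by " + server;
    }

    // Retry loop: the first attempt goes to the key owner,
    // later attempts ask the balancer for another server.
    static String execute(String keyOwner, RoundRobin balancer,
                          Set<String> downServers, int maxRetries) {
        int retryCount = 0;
        Set<String> failedServers = null;
        while (retryCount <= maxRetries) {
            String server = (retryCount == 0)
                    ? keyOwner
                    : balancer.nextServer(failedServers);
            try {
                return send(server, downServers);
            } catch (SimulatedTransportException te) {
                if (failedServers == null) {
                    failedServers = new HashSet<>();
                }
                failedServers.add(te.serverAddress);
            }
            retryCount++;
        }
        // Assumption: the sketch signals exhaustion this way.
        throw new IllegalStateException("all attempts failed");
    }

    public static void main(String[] args) {
        RoundRobin balancer = new RoundRobin("A", "B", "C");
        Set<String> down = new HashSet<>(Arrays.asList("A"));
        System.out.println(execute("A", balancer, down, 10));
    }
}
```

With servers A, B and C, key owner A, and A marked as down, the first attempt fails on A, A is added to failedServers, and the retry is routed to B by the round-robin picker, so main prints "handled by B".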
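Summary:
         1. Under the mechanism above, for the client to keep working when any node in the cluster goes down, the retry count must be at least 3 (infinispan.client.hotrod.max_retries=3).
         2. The org.infinispan.client.hotrod.exceptions.TransportException discussed here is raised when the server terminates the connection during an otherwise normal client/server exchange. A failure in which the client never manages to send the request to the server is a different case and makes the request fail immediately. In particular, when the client obtains a connection to a node that is already down and the exchange ends in a timeout exception, no retry takes place, so the fault-tolerance mechanism is largely ineffective when a node is down.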
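
The second point rests on a general property of retry loops: only the exception type named in the catch clause triggers another attempt, and anything else leaves the loop at once. The sketch below illustrates that property in plain Java. It makes no claim about which exception the Hot Rod client actually raises on a timeout; RetriedFailure, TimeoutFailure and SelectiveRetrySketch are hypothetical names chosen for the illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model of a retry loop that catches only one failure type.
class SelectiveRetrySketch {

    // Stand-in for a failure the loop retries on.
    static final class RetriedFailure extends RuntimeException {}

    // Stand-in for a failure the loop does not catch.
    static final class TimeoutFailure extends RuntimeException {}

    // Runs the operation, retrying only on RetriedFailure.
    // The counter records how many attempts were made.
    static <T> T execute(Supplier<T> operation, int maxRetries,
                         AtomicInteger attempts) {
        RetriedFailure last = null;
        for (int retryCount = 0; retryCount <= maxRetries; retryCount++) {
            attempts.incrementAndGet();
            try {
                return operation.get();
            } catch (RetriedFailure e) {
                last = e; // caught: the loop goes around again
            }
            // TimeoutFailure is not caught, so it propagates immediately.
        }
        throw new IllegalStateException("retries exhausted", last);
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        try {
            SelectiveRetrySketch.<String>execute(() -> {
                throw new TimeoutFailure();
            }, 10, attempts);
        } catch (TimeoutFailure e) {
            System.out.println("gave up after " + attempts.get() + " attempt(s)");
        }
    }
}
```

In this model an operation that always throws TimeoutFailure is attempted exactly once even with maxRetries set to 10, whereas one that always throws RetriedFailure is attempted eleven times before the loop gives up.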