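Memcached (MC for short) is widely used across the Internet industry and is one of the most basic infrastructure components. However, MC mget (fetching multiple values in one call) has long been a hard problem: our requirement is that mget performance should come as close as possible to an ordinary memcache get. This article uses a piece of pseudocode to show how to implement mget with performance close to a single-value get, and discusses some problems this design ran into in production.
Scenario
Before getting into the topic, consider one question: why do we need MC mget at all? Doesn't Redis already provide a rich set of data structures such as list, hashset, hashtable, and zset? The answer starts from our own use case. After logging in, a user updates their own status and also fetches the statuses of the people they follow. Updating one's own status is a single MC set. The list of followed users can be read from Redis, where the key is the user's uid and the value is the list of followed users. Fetching the status of one followed user is a single MC get keyed by that user's uid, with O(1) time complexity. The straightforward approach is a for loop in the program that gets each followed user's status from MC in turn, which is O(n) overall. When the follow list grows to 2000 entries and each MC get takes 2~5 ms on average, this sequential loop takes 4~10 s (2000 × 2~5 ms), which is completely unacceptable. How do we solve this?
Implementing mget with NIO: executing MC gets concurrently
danga.memcached 2.0.1 already implements mget on top of NIO, but its implementation has some problems; see http://blog.csdn.net/e_wsq/article/details/7876801. The mget pseudocode is as follows: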
- private final class Conn {
- public ByteBuffer outgoing;
- // Use a list of ByteBuffers to hold the data read from MC
- public List<ByteBuffer> incoming = new ArrayList<ByteBuffer>();
-
- public Conn(Selector selector) {
- channel = getSock().getChannel();
- channel.configureBlocking( false );
- channel.register( selector, SelectionKey.OP_WRITE | SelectionKey.OP_READ, this );
-
- outgoing = ByteBuffer.wrap( request.append( "\r\n" ).toString().getBytes() );
- }
-
- public boolean isFinished() {
- // true once the terminating "END\r\n" has been received
- }
-
- public ByteBuffer getBuffer() {
- int last = incoming.size()-1;
- if ( last >= 0 && incoming.get( last ).hasRemaining() ) {
- return incoming.get( last );
- }
- else {
- ByteBuffer newBuf = ByteBuffer.allocate( 8192 );
- incoming.add( newBuf );
- return newBuf;
- }
- }
- }
-
- public Object getMulti() throws Exception {
- selector = Selector.open();
- Conn conn = new Conn(selector);
-
- try {
- while ( timeRemaining > 0 ) {
- int n = selector.select( timeRemaining );
- if ( n > 0 ) {
- Iterator<SelectionKey> it = selector.selectedKeys().iterator();
- while ( it.hasNext() ) {
- SelectionKey key = it.next();
- it.remove();
-
- if ( key.isReadable() )
- readResponse( key );
- else if ( key.isWritable() )
- writeRequest( key );
- }
- }
- else {
- // error...
- }
-
- timeRemaining = timeout - (SystemTimer.currentTimeMillis() - startTime);
- }
- }
- finally {
- selector.close();
- }
- }
-
- public void writeRequest( SelectionKey key ) throws IOException {
- ByteBuffer buf = ((Conn) key.attachment()).outgoing;
- SocketChannel sc = (SocketChannel)key.channel();
-
- if ( buf.hasRemaining() ) {
- sc.write( buf );
- }
-
- if ( !buf.hasRemaining() ) {
- // switching to read mode for server
- key.interestOps( SelectionKey.OP_READ );
- }
- }
-
- public void readResponse( SelectionKey key ) throws IOException {
- Conn conn = (Conn)key.attachment();
- ByteBuffer buf = conn.getBuffer();
- int count = conn.channel.read( buf );
- if ( count > 0 ) {
- if ( log.isDebugEnabled() )
- log.debug( "read " + count + " from " + conn.channel.socket().getInetAddress() );
-
- if ( conn.isFinished() ) {
- ...
- return;
- }
- }
- }
The pseudocode mainly shows the NIO logic. The benefit of a concurrent mget is obvious, but this code has several clear pitfalls.
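Before turning to those pitfalls, one detail the pseudocode leaves open is isFinished(). In the memcached text protocol a get response is terminated by "END\r\n", and because getBuffer() spreads the response over several ByteBuffers, the marker may straddle two of them. The following is a minimal sketch of one way to check for it; the class name EndMarker and the method endsWithEnd are hypothetical and are not part of danga.memcached.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class EndMarker {
    private static final byte[] END = "END\r\n".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns true when the bytes received so far end with "END\r\n".
     * Assumption: every buffer is still in write mode, i.e. position()
     * equals the number of bytes filled, as it is right after channel.read(buf).
     */
    static boolean endsWithEnd(List<ByteBuffer> incoming) {
        int matched = END.length - 1; // index into END, walking backwards
        for (int i = incoming.size() - 1; i >= 0 && matched >= 0; i--) {
            ByteBuffer buf = incoming.get(i);
            // Absolute get(int) does not move the buffer's position.
            for (int pos = buf.position() - 1; pos >= 0 && matched >= 0; pos--) {
                if (buf.get(pos) != END[matched]) {
                    return false;
                }
                matched--;
            }
        }
        return matched < 0; // all five marker bytes were matched
    }
}
```

A suffix check like this can be fooled by a stored value whose data happens to end in "END\r\n" exactly at a read boundary. A stricter parser would follow the byte counts in each VALUE header instead; that is left out here to keep the sketch short.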
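Pitfalls in the mget pseudocode
1. The "Too many open files" pitfall
Every call to getMulti executes Selector.open(). On Linux, Selector.open() opens a pipe, that is, a pair of file descriptors (see http://blog.csdn.net/haoel/article/details/2224055). When subsequent IO is slow, the Selector cannot be closed promptly, so a large number of pipes pile up and eventually cause the "Too many open files" error. The usual NIO pattern is to keep a single global selector; after a new channel is registered, calling selector.wakeup() is enough.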
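The single-selector pattern can be sketched roughly as follows. This is an illustration under assumptions, not the code we run in production: the class SharedSelectorLoop, its Registration helper, and the use of a Consumer<SelectionKey> as the key attachment are all hypothetical names chosen for the example. Callers on any thread queue a channel and call wakeup(); only the selector thread touches the Selector itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

final class SharedSelectorLoop implements Runnable {
    /** A queued registration request (hypothetical helper type). */
    private static final class Registration {
        final SelectableChannel channel;
        final int ops;
        final Consumer<SelectionKey> handler;

        Registration(SelectableChannel channel, int ops, Consumer<SelectionKey> handler) {
            this.channel = channel;
            this.ops = ops;
            this.handler = handler;
        }
    }

    private final Selector selector; // opened once, shared by every request
    private final Queue<Registration> pending = new ConcurrentLinkedQueue<>();
    private volatile boolean running = true;

    SharedSelectorLoop() throws IOException {
        this.selector = Selector.open();
    }

    /** Safe to call from any thread: queue the channel, then wake the loop. */
    void register(SelectableChannel channel, int ops, Consumer<SelectionKey> handler) {
        pending.add(new Registration(channel, ops, handler));
        selector.wakeup();
    }

    /** Asks the loop to stop; the selector thread closes the selector itself. */
    void shutdown() {
        running = false;
        selector.wakeup();
    }

    @Override
    @SuppressWarnings("unchecked")
    public void run() {
        try {
            while (running) {
                selector.select(1000L);
                // Perform queued registrations on the selector thread.
                Registration r;
                while ((r = pending.poll()) != null) {
                    r.channel.configureBlocking(false);
                    r.channel.register(selector, r.ops, r.handler);
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isValid()) {
                        ((Consumer<SelectionKey>) key.attachment()).accept(key);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                selector.close();
            } catch (IOException ignored) {
                // nothing useful to do while shutting down
            }
        }
    }
}
```

With this shape, the number of selectors (and therefore pipes) stays constant no matter how many getMulti calls are in flight; each call only adds and later cancels its own SelectionKey.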
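2. The infinite-loop pitfall
Java 6 NIO has two well-known bugs: http://bugs./view_bug.do?bug_id=6693490 and http://bugs./bugdatabase/view_bug.do?bug_id=6403933. In short, a Selector should return in only two situations: when a network event occurs or when the timeout expires. Sometimes, however, the Selector returns without any selected SelectionKey; this is a Java 6 NIO bug. The mget pseudocode above does not handle this case and can easily end up spinning in an infinite loop. We can borrow MINA's workaround; the pseudocode is as follows: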
- long t0 = System.currentTimeMillis();
- int selected = select(1000L);
- long t1 = System.currentTimeMillis();
- long delta = (t1 - t0);
-
- if ((selected == 0) && !wakeupCalled.get() && (delta < 100)) {
- // Last chance : the select() may have been
- // interrupted because we have had an closed channel.
- if (isBrokenConnection()) {
- LOG.warn("Broken connection");
-
- // we can reselect immediately
- // set back the flag to false
- wakeupCalled.getAndSet(false);
-
- continue;
- } else {
- LOG.warn("Create a new selector. Selected is 0, delta = " + (t1 - t0));
- // Ok, we are hit by the nasty epoll
- // spinning.
- // Basically, there is a race condition
- // which causes a closing file descriptor not to be
- // considered as available as a selected channel, but
- // it stopped the select. The next time we will
- // call select(), it will exit immediately for the same
- // reason, and do so forever, consuming 100%
- // CPU.
- // We have to destroy the selector, and
- // register all the socket on a new one.
- registerNewSelector();
- }
-
- // Set back the flag to false
- wakeupCalled.getAndSet(false);
-
- // and continue the loop
- continue;
- }
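This code is very clear. The trigger condition is that select returned 0, wakeup() was not called, the connection is not broken, and the call returned in less than 100 ms; in that case it is treated as a hit of the Java NIO bug. The fix is to rebuild the selector. Another example worth consulting is Jetty's workaround: http://wiki./Jetty/Feature/JVM_NIO_Bug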
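The MINA excerpt calls registerNewSelector() without showing it. A minimal sketch of what such a rebuild step might look like is given below. The class SelectorRebuilder and both method names are hypothetical, and the 100 ms window simply mirrors the condition in the excerpt above; this is not MINA's or Jetty's actual implementation.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class SelectorRebuilder {
    /** Assumed window, mirroring the "delta < 100" check in the excerpt. */
    static final long SPIN_WINDOW_MILLIS = 100;

    /** True when a select() call looks like a premature, empty return. */
    static boolean looksLikeSpin(int selected, boolean wakeupCalled, long elapsedMillis) {
        return selected == 0 && !wakeupCalled && elapsedMillis < SPIN_WINDOW_MILLIS;
    }

    /**
     * Opens a fresh selector, moves every valid registration over to it
     * (same interest ops, same attachment), then closes the old selector.
     */
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue; // channel already closed or key already cancelled
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();
            channel.register(fresh, ops, attachment);
        }
        old.close();
        return fresh;
    }
}
```

In a select loop this would be used along the lines of `if (SelectorRebuilder.looksLikeSpin(n, woken, t1 - t0)) selector = SelectorRebuilder.rebuild(selector);`, ideally only after the broken-connection check that the MINA code performs first.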