Introduction

I started programming with Java in 1999, after fifteen years with C and C++. I thought myself fairly competent at C-style memory management, using coding practices such as pointer handoffs, and tools such as Purify. I couldn't remember the last time I had a memory leak. So it was with some measure of disdain that I approached Java's automatic memory management … and quickly fell in love. I hadn't realized just how much mental effort was expended in memory management, until I didn't have to do it any more. And then I met my first OutOfMemoryError.

I can't remember what caused that first error, and I certainly didn't resolve it using reference objects. They didn't enter my toolbox until about a year later, when I was writing a server-side database cache and tried using soft references to limit the cache size. It turned out they weren't too useful there, either, for reasons that I'll discuss below. But once reference objects were in my toolbox, I found plenty of other uses for them, and gained a better understanding of the JVM as well.

The Java Heap and Object Life Cycle

For a C++ programmer new to Java, the relationship between stack and heap can be hard to grasp. In C++, objects may be created on the heap using the new operator, or on the stack, as in the following declaration:

    Integer foo = Integer(1);

Java, unlike C++, stores all objects on the heap, and requires the new operator to create them. [...]

    public static void foo(String bar) {
        Integer baz = new Integer(bar);
    }

The diagram below shows the relationship between the heap and stack for this method. The stack is divided into “frames,” which contain the parameters and local variables for each method in the call tree. Those variables that point to objects (in this case, the parameter bar and the local variable baz) [...]

Now look more closely at the first line of foo() [...]

That's the “happy path.” There are several not-so-happy paths, and the one that we care about is when the [...]

Garbage Collection

While Java gives you a [...]

The garbage collector goes to work when the program tries to create a new object and there isn't enough space for it in the heap.
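The requesting thread is suspended while the collector looks through the heap, trying to find objects that are no longer actively used by the program, and reclaiming their space. If the collector is unable to free up enough space, and the JVM is unable to expand the heap, the new operator fails with an OutOfMemoryError.

Mark-Sweep

One of the enduring myths of Java revolves around the garbage collector. Many people believe that the JVM keeps a reference count for each object, and the collector only looks at objects whose reference count is zero. In reality, the JVM uses a technique known as “mark-sweep.” The idea behind mark-sweep garbage collection is simple: every object that can't be reached by the program is garbage, and is eligible for collection. Mark-sweep collection has the following phases:

1. Mark: starting from the "root" references, the collector walks the object graph and marks every object that it reaches.
2. Sweep: anything that was not marked is unreachable, and therefore garbage; its space is reclaimed (objects that define a finalizer are first queued for finalization).
3. Compact (optional): some collectors then move the surviving objects together to coalesce the free space.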
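
A toy model can make the mark and sweep steps concrete. The sketch below is an illustration only, not how any JVM is implemented: Node and ToyMarkSweep are hypothetical names, the "heap" is just a list, and the roots are passed in explicitly. It also shows that two objects referring only to each other are still garbage when no root can reach them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a heap object: a name plus outgoing references.
class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

class ToyMarkSweep {
    // Mark phase: walk the graph from the roots, recording every node reached.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    // Sweep phase: anything in the heap that was not marked is garbage.
    static List<Node> sweep(List<Node> heap, Set<Node> marked) {
        List<Node> garbage = new ArrayList<>();
        for (Node n : heap) {
            if (!marked.contains(n)) garbage.add(n);
        }
        return garbage;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node live = new Node("live");
        Node a = new Node("A");
        Node b = new Node("B");
        root.refs.add(live);
        a.refs.add(b);                    // A and B refer to each other,
        b.refs.add(a);                    // but no root can reach them
        List<Node> heap = Arrays.asList(root, live, a, b);
        for (Node n : sweep(heap, mark(Arrays.asList(root)))) {
            System.out.println("garbage: " + n.name);
        }
    }
}
```

Running main reports A and B as garbage, while root and live survive because they were marked.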
So what are the "roots"? In a simple Java application, they're method arguments and local variables (stored on the stack), the operands of the currently executing expression (also stored on the stack), and static class member variables.

In programs that use their own classloaders, such as app-servers, the picture gets muddy: only classes loaded by the system classloader (the loader used by the JVM when it starts) contain root references. Any classloaders that the application creates are themselves subject to collection, once there are no more references to them. This is what allows app-servers to hot-deploy: they create a separate classloader for each deployed application, and let go of the classloader reference when the application is undeployed or redeployed.

It's important to understand root references, because they define what a "strong" reference is: if you can follow a chain of references from a root to a particular object, then that object is "strongly" referenced. It will not be collected.

So, returning to method foo() [...]

Now consider the following:

    LinkedList foo = new LinkedList();
    foo.add(new Integer(123));

Variable foo [...]

You may be wondering what happens if you have a circular reference: object A contains a reference to object B, which contains a reference back to A. The answer is that a mark-sweep collector isn't fooled: if neither A nor B can be reached by a chain of strong references, then they're eligible for collection.

Finalizers

C++ allows objects to define a destructor method: when the object goes out of scope or is explicitly deleted, its destructor is called to clean up the resources it used. For most objects, this means explicitly releasing any memory that the object allocated with new [...]

However, memory isn't the only resource that might need to be cleaned up.
Consider [...]

Any object can have a finalizer; all you have to do is declare the finalize() method:

    protected void finalize() throws Throwable {
        // clean up your object here
    }

While finalizers seem like an easy way to clean up after yourself, they do have some serious limitations. First, you should never rely on them for anything important, since an object's finalizer may never be called: the application might exit before the object is eligible for garbage collection. There are some other, more subtle problems with finalizers, but I'll hold off on these until we get to phantom references.

Object Life Cycle (without Reference Objects)

Putting it all together, an object's life can be summed up by the simple picture below: it's created, it's used, it becomes eligible for collection, and eventually it's collected. The shaded area represents the time during which the object is "strongly reachable," a term that becomes important by comparison with the reachability provided by reference objects.

Enter Reference Objects

JDK 1.2 introduced the java.lang.ref package, and with it three new stages in the object life cycle: softly reachable, weakly reachable, and phantom reachable. [...]
As you might guess, adding three new optional states to the object life-cycle diagram makes for a mess. Although the documentation indicates a logical progression from strongly reachable through soft, weak, and phantom, to reclaimed, the actual progression depends on what reference objects your program creates. If you create a [...]

It's also important to understand that not all objects are attached to reference objects; in fact, very few of them should be. A reference object is a layer of indirection: you go through the reference object to reach the referent, and clearly you don't want that layer of indirection throughout your code. Most programs, in fact, will use reference objects to access a relatively small number of the objects that the program creates.

References and Referents

A reference object is a layer of indirection between your program code and some other object, called a referent. Each reference object is constructed around its referent, and the referent cannot be changed. The reference object provides the get() method to retrieve its referent; once the referent has been collected, get() returns null. To use a reference properly, you need code like the following:

    SoftReference<List<Foo>> ref = new SoftReference<List<Foo>>(new LinkedList<Foo>());

    // somewhere else in your code, you create a Foo that you want to add to the list
    List<Foo> list = ref.get();
    if (list != null) {
        list.add(foo);
    } else {
        // list is gone; do whatever is appropriate
    }

Or in words:

1. Always check whether the referent is null: the garbage collector can clear the reference at any time.
2. To use the referent, you must first hold a strong reference to it (here, the local variable list).
3. You must also hold a strong reference to the reference object itself; if the reference object is collected, it can no longer lead you to the referent.
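
The pattern is: fetch the referent into a local (strong) variable, test it for null, and only then use it. It can be exercised deterministically by clearing the reference by hand, which stands in for what the collector would do. This is a minimal sketch; ReferentAccess and addIfPresent are names made up for the example.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

class ReferentAccess {
    // Adds an item through the reference; returns false if the referent is gone.
    static boolean addIfPresent(SoftReference<List<String>> ref, String item) {
        List<String> list = ref.get();    // strong reference for the duration of the call
        if (list == null) {
            return false;                 // reference was cleared; caller decides what to do
        }
        list.add(item);
        return true;
    }

    public static void main(String[] args) {
        SoftReference<List<String>> ref =
            new SoftReference<List<String>>(new ArrayList<String>());

        System.out.println(addIfPresent(ref, "first"));   // referent still present

        ref.clear();                                      // simulate the collector clearing it
        System.out.println(addIfPresent(ref, "second"));  // referent is gone
    }
}
```

Reference.clear() is the same operation the collector performs, so the second call takes the "list is gone" branch without needing to provoke a real collection.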
Also remember that soft, weak, and phantom references only come into play when there are no more strong references to the referent. They exist to let you hold onto objects past the point where they'd normally become food for the garbage collector. This may seem like a strange thing to do: if you no longer hold a strong reference, why would you care about the object? The reason depends on the specific reference type.

Soft References

We'll start to answer that question with soft references. If an object is the referent of a SoftReference [...]

The JDK documentation says that soft references are appropriate for a memory-sensitive cache: each of the cached objects is accessed through a SoftReference [...]

To be useful in this role, however, the cached objects need to be pretty large, on the order of several kilobytes each. That might be the case if you're implementing a fileserver that expects the same files to be retrieved on a regular basis, or if you have large object graphs that need to be cached. But if your objects are small, then you'll have to clear a lot of them to make a difference, and the reference objects will add overhead to the whole process.
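
As a rough illustration of the memory-sensitive cache that the JDK documentation describes, the sketch below keeps each value behind a SoftReference and reloads it when the reference has been cleared. SoftCache and its loader function are hypothetical names, not part of the JDK, and a production cache would also need to remove map entries whose references have been cleared.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache: values are held only through soft references.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref != null) ? ref.get() : null;   // may be null if the collector cleared it
        if (value == null) {
            value = loader.apply(key);                // miss, or cleared: load again
            map.put(key, new SoftReference<V>(value));
        }
        return value;                                 // caller now holds a strong reference
    }
}

class SoftCacheDemo {
    public static void main(String[] args) {
        // assumed workload: large values, as the text recommends
        SoftCache<String, byte[]> cache = new SoftCache<>(name -> new byte[64 * 1024]);
        byte[] first = cache.get("report.pdf");
        byte[] second = cache.get("report.pdf");
        System.out.println(first == second);
    }
}
```

While the caller holds a strong reference to the value, the second lookup returns the same array. Once only the soft reference remains, the JVM may clear it under memory pressure, and the next get() reloads the value.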
Soft Reference as Circuit Breaker

A better use of soft references is to provide a "circuit breaker" for memory allocation: put a soft reference between your code and the memory it allocates, and you avoid the dreaded OutOfMemoryError. [...]

For example, if you write a lot of JDBC code, you might have a method like the following to process query results in a generic way and ensure that the ResultSet is properly closed. [...]

    public static List<List<Object>> processResults(ResultSet rslt) throws SQLException {
        try {
            List<List<Object>> results = new LinkedList<List<Object>>();
            ResultSetMetaData meta = rslt.getMetaData();
            int colCount = meta.getColumnCount();

            while (rslt.next()) {
                List<Object> row = new ArrayList<Object>(colCount);
                for (int ii = 1 ; ii <= colCount ; ii++)
                    row.add(rslt.getObject(ii));
                results.add(row);
            }
            return results;
        } finally {
            closeQuietly(rslt);
        }
    }

The answer, of course, is an OutOfMemoryError. [...]

At this point, you may wonder: who cares? The query is going to abort in either case, so why not just let the out-of-memory error do the job? The answer is that your application may not be the only thing affected. If you're running on an application server, your memory usage could take down other applications. Even in an unshared environment, a circuit breaker improves the robustness of your application, because it confines the problem and gives you a chance to recover and continue.
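
The circuit-breaker idea can be sketched without a database. The example below is a stand-in for the JDBC method, under stated assumptions: it reads rows from an Iterator rather than a ResultSet, and CircuitBreaker, collect, and the body of TooManyResultsException are invented for illustration. The accumulated list is reachable only through a SoftReference while each row is being built, so if memory runs short the collector can reclaim the list and the method fails with an ordinary exception.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical exception type; only the name is borrowed from the article.
class TooManyResultsException extends RuntimeException {
    TooManyResultsException(int rowCount) {
        super("results were reclaimed after " + rowCount + " rows");
    }
}

class CircuitBreaker {
    static List<List<Object>> collect(Iterator<Object[]> rows) {
        // the only long-lived path to the results is this soft reference
        SoftReference<List<List<Object>>> ref =
            new SoftReference<List<List<Object>>>(new LinkedList<List<Object>>());
        int rowCount = 0;

        while (rows.hasNext()) {
            rowCount++;
            // the allocation-heavy work happens with no strong reference to the results
            List<Object> row = new ArrayList<Object>(Arrays.asList(rows.next()));

            List<List<Object>> results = ref.get();
            if (results == null) {
                throw new TooManyResultsException(rowCount);
            }
            results.add(row);
            results = null;   // drop the strong reference before the next iteration
        }

        List<List<Object>> results = ref.get();
        if (results == null) {
            throw new TooManyResultsException(rowCount);
        }
        return results;
    }
}
```

A caller can catch TooManyResultsException and report the problem or retry with a narrower query, which is the recovery opportunity that an OutOfMemoryError rarely leaves.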
To create the circuit breaker, the first thing you need to do is wrap the results list in a SoftReference:

    SoftReference<List<List<Object>>> ref
        = new SoftReference<List<List<Object>>>(new LinkedList<List<Object>>());

And then, as you iterate through the results, create a strong reference to the list only when you need to update it:

    while (rslt.next()) {
        rowCount++;
        // store the row data (build row exactly as in the original loop)

        List<List<Object>> results = ref.get();
        if (results == null)
            throw new TooManyResultsException(rowCount);
        else
            results.add(row);

        results = null;
    }

This works because almost all of the method's memory allocation happens in two places: the call to next() and the loop that calls getObject(). [...]

While those expensive operations happen, the only reference to the list is via the SoftReference. [...]

Once the expensive operations complete, you can hold a strong reference to the list with relative impunity. However, note that I use a LinkedList [...]

Also note that I set the results variable to null [...]

Soft References Aren't A Silver Bullet

While soft references can prevent many out-of-memory conditions, they can't prevent all of them. The problem is this: in order to actually use a soft reference, you have to create a strong reference to the referent. To add a row to the results, we need to have a reference to the actual list. During the time we hold that strong reference, we are at risk for an out-of-memory error.

The goal with a circuit breaker is to minimize the window during which it's useless: the time that you hold a strong reference to the object, and perhaps more important, the amount of allocation that happens during this time. In our case, we confine the strong reference to adding a row to the results, and we use a LinkedList [...]

And I want to repeat that, while I hold the strong reference in a variable that quickly goes out of scope, the language spec says nothing about the JVM being required to clear variables that go out of scope. And as of this writing, the Oracle/OpenJDK JVM does not do so. If I didn't explicitly clear the results variable [...]

Finally, think carefully about non-obvious strong references.
For example, you might want to add a circuit breaker while constructing XML documents using the DOM. However, each node in a DOM holds a reference to its parent, in effect holding a reference to every other node in the tree. And if you use a recursive call to build that document, your stack might be full of references to individual nodes.

Weak References

A weak reference, as its name suggests, is a reference object that doesn't put up a fight when the garbage collector comes knocking. If there are no strong or soft references to the referent, it's all but guaranteed to be collected. So what's the use? There are in fact two main uses: associating objects that have no inherent relationship, and reducing duplication via a canonicalizing map.

The Problem With ObjectOutputStream

As an example of the first case, I'm going to look at object serialization, which doesn't use weak references. [...] But when you look at the stream specification, you see that there is in fact a relationship: in order to preserve object identity, the output stream associates a unique identifier with each object written, and subsequent requests to write the object instead write the identifier. This feature is absolutely critical to the stream's ability to serialize arbitrary object graphs: if it weren't present, a self-referential graph would turn into an infinite stream of bytes.

To implement this feature, both streams need to maintain a strong reference to every object written to the stream. For the programmer who decides to use object streams as an easy way to layer a messaging protocol onto a socket connection, this is a problem: messages are assumed transient, but the streams hold them in memory. Sooner or later, the program runs out of memory (unless the programmer knows to call reset() on the output stream).

Such non-inherent relationships are surprisingly common: they exist whenever the programmer needs to maintain context essential for the use of an object.
Sometimes, as in the case of a servlet [...]

Weak references provide a way to maintain such associations while letting the garbage collector do its work: the weak reference remains valid only as long as there are also strong references. Returning to the object stream example, if you're using the stream for messaging, the message will be eligible for collection as soon as it's written. On the other hand, a stream used for RMI access to a long-lived data structure would maintain its sense of identity.

Unfortunately, although the object stream protocol was updated with the 1.2 JDK, and weak references were added at the same time, the JDK developers didn't choose to combine them. So remember to call reset().

Eliminating Duplicate Data with Canonicalizing Maps

Object streams notwithstanding, I don't believe there are many cases where you should associate two objects that don't have an inherent relationship. And some of the examples that I've seen, such as Swing listeners that clean up after themselves, seem more like hacks than valid design choices.

In my opinion, the best use of weak references is to implement a canonicalizing map, a mechanism to ensure that only one instance of a value object exists at a time. A simple canonicalizing map works by using the same object as key and value: you probe the map with an arbitrary instance, and if there's already a value in the map, you return it. If there's no value in the map, you store the instance that was passed in (and return it). Of course, this only works for objects that can be used as map keys. Here's how we might implement String.intern() with such a map:

    private Map<String,String> _map = new HashMap<String,String>();

    public synchronized String intern(String str) {
        if (_map.containsKey(str))
            return _map.get(str);
        _map.put(str, str);
        return str;
    }

This implementation is fine if you have a small number of strings to intern, perhaps within a single method that processes a file.
However, let's say that you're writing a long-running application that has to process input from multiple sources, which contain a wide range of strings but still have a high level of duplication. For example, a server that processes uploaded files of postal address data: there will be lots of entries for New York City, not so many for Temperanceville VA. You would want to eliminate duplication of the former, but not hold onto the latter any longer than necessary.

This is where a canonicalizing map with weak references helps: it allows you to keep a canonical instance only so long as some code in the program is using it. After the last strong reference disappears, the canonical string will be collected. If the string is seen again later, it becomes the new canonical instance.

To improve our canonicalizer, we can replace HashMap with a WeakHashMap:

    private Map<String,WeakReference<String>> _map
        = new WeakHashMap<String,WeakReference<String>>();

    public synchronized String intern(String str) {
        WeakReference<String> ref = _map.get(str);
        String s2 = (ref != null) ? ref.get() : null;
        if (s2 != null)
            return s2;

        _map.put(str, new WeakReference<String>(str));
        return str;
    }

First thing to notice is that, while the map's key is a String, its value is a WeakReference<String>. [...]

Second, note the process for returning a string: first we retrieve the weak reference. If it exists, then we retrieve the referent. But we have to check that object as well. It's possible that the reference is sitting in the map but is already cleared. Only if the referent is not null do we return it; otherwise we consider the passed-in string to be the new canonical version.

Thirdly, note that I've synchronized the intern() method. [...]

Finally, the documentation for WeakHashMap [...]

Reference Queues

While testing a reference for null [...]

The better solution is a reference queue: you associate a reference with a queue at construction time, and the reference will be put on the queue after it has been cleared. To discover which references have been cleared, you poll the queue.
This can be done with a background thread, but it's often simpler to poll the queue at the time you create new references (this is what WeakHashMap does).

Reference queues are most often used with phantom references, described below, but can be used with any reference type. The following code is an example with weak references: it creates a bunch of buffers, accessed via a WeakReference, and after each creation polls the queue to see which references have been cleared:

    public static void main(String[] argv) throws Exception {
        Set<WeakReference<byte[]>> refs = new HashSet<WeakReference<byte[]>>();
        ReferenceQueue<byte[]> queue = new ReferenceQueue<byte[]>();

        for (int ii = 0 ; ii < 1000 ; ii++) {
            WeakReference<byte[]> ref = new WeakReference<byte[]>(new byte[1000000], queue);
            System.err.println(ii + ": created " + ref);
            refs.add(ref);

            Reference<? extends byte[]> r2;
            while ((r2 = queue.poll()) != null) {
                System.err.println("cleared " + r2);
                refs.remove(r2);
            }
        }
    }

As always, there are things to note about this code. First, although we're creating WeakReference instances, the queue's poll() method hands back the more general Reference<? extends byte[]>. [...]

Second is that we must hold a strong reference to the reference objects themselves. The reference object knows about the queue, but the queue doesn't know about the reference until it's enqueued. If we didn't maintain the strong reference to the reference object, it would itself be collected, and never added to the queue. I use a HashSet to hold them, removing each reference once it comes off the queue.

Phantom References

Phantom references differ from soft and weak references in that they're not used to access their referents. Instead, their sole purpose is to tell you when their referent has already been collected. While this seems rather pointless, it actually allows you to perform resource cleanup with more flexibility than you get from finalizers.

The Trouble With Finalizers

Back in the description of object life cycle, I mentioned that finalizers have subtle problems that make them unsuitable for cleaning up non-memory resources. There are also a couple of non-subtle problems, which I'll cover here for completeness and then promptly ignore.

[...]
Now that those are out of the way, I believe the real problem with finalizers is that they introduce a gap between the time that the garbage collector first identifies an object for collection and the time that its memory is actually reclaimed, because finalization happens on its own thread, independent of the garbage collector's thread. The JVM is guaranteed to perform a full collection before it throws an OutOfMemoryError [...]

The following program demonstrates this behavior: each object has a finalizer that sleeps for half a second. Not much time at all, unless you've got thousands of objects to clean up. Every object goes out of scope immediately after it's created, yet at some point you'll run out of memory (if you want to run this example, I recommend using [...]).

    public class SlowFinalizer {
        public static void main(String[] argv) throws Exception {
            while (true) {
                Object foo = new SlowFinalizer();
            }
        }

        // some member variables to take up space -- approx 200 bytes
        double a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z;

        // and the finalizer, which does nothing but take time
        protected void finalize() throws Throwable {
            try { Thread.sleep(500L); }
            catch (InterruptedException ignored) {}
            super.finalize();
        }
    }

The Phantom Knows

Phantom references allow the application to learn when an object is no longer used, so that the application can clean up the object's non-memory resources. Unlike finalizers, however, the object itself has already been collected by the time the application learns this.

Also unlike finalizers, cleanup is scheduled by the application, not the garbage collector. You might dedicate one or more threads to cleanup, perhaps increasing the number if the number of objects demands it. An alternative (and often simpler) approach is to use an object factory, and clean up after any collected instances before creating a new one.
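
The factory approach might look something like the following sketch. TrackingFactory, register, and drain are hypothetical names chosen for the example. Because a real collection cannot be forced on demand, the demo calls enqueue() by hand to stand in for the collector; in a real program the collector would enqueue the reference once the tracked object became unreachable.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashMap;
import java.util.Map;

// Hypothetical factory-side bookkeeping: one cleanup action per tracked object.
class TrackingFactory {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // The map keeps the reference objects strongly reachable until they are processed.
    private final Map<Reference<?>, Runnable> cleanups = new HashMap<>();

    // Tracks obj. The cleanup action must not refer to obj itself,
    // or obj would never become unreachable.
    synchronized Reference<?> register(Object obj, Runnable cleanup) {
        drain();   // clean up after collected instances before tracking a new one
        Reference<?> ref = new PhantomReference<Object>(obj, queue);
        cleanups.put(ref, cleanup);
        return ref;
    }

    // Runs the cleanup for every reference enqueued so far; returns how many ran.
    synchronized int drain() {
        int count = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Runnable cleanup = cleanups.remove(ref);
            if (cleanup != null) {
                cleanup.run();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        TrackingFactory factory = new TrackingFactory();
        Object resource = new Object();
        Reference<?> ref = factory.register(resource, () -> System.out.println("cleaned up"));

        System.out.println(ref.get());        // a phantom reference never exposes its referent
        ref.enqueue();                        // stand-in for the collector enqueuing the reference
        System.out.println(factory.drain());  // number of cleanup actions that ran
    }
}
```

The cleanup runs on whichever thread calls register() or drain(), so the application, not a finalizer thread, decides when and where the work happens.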
The key point to understand about phantom references is that you can't use the reference to access the object: get() always returns null. [...]

Implementing a Connection Pool with Phantom References

Database connections are one of the most precious resources in any application: they take time to establish, and database servers place strict limits on the number of simultaneous open connections that they'll accept. For all that, programmers are remarkably careless with them, sometimes opening a new connection for every query and either forgetting to close it or not closing it in a finally block.

Rather than allow the application to open direct connections to the database, most application server deployments use a connection pool: the pool maintains a (normally fixed) set of open connections, and hands them to the program as needed. Production-quality pools provide several ways to prevent connection leaks, including timeouts (to identify queries that run excessively long) and recovery of connections that are left for the garbage collector. This latter feature serves as a great example of phantom references.
To make it work, the [...]

The least interesting part of the pool is the PooledConnection class:

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import java.sql.Connection;
    import java.sql.SQLException;

    public class PooledConnection implements InvocationHandler {
        private ConnectionPool _pool;
        private Connection _cxt;

        public PooledConnection(ConnectionPool pool, Connection cxt) {
            _pool = pool;
            _cxt = cxt;
        }

        private Connection getConnection() {
            try {
                if ((_cxt == null) || _cxt.isClosed())
                    throw new RuntimeException("Connection is closed");
            } catch (SQLException ex) {
                throw new RuntimeException("unable to determine if underlying connection is open", ex);
            }
            return _cxt;
        }

        public static Connection newInstance(ConnectionPool pool, Connection cxt) {
            return (Connection)Proxy.newProxyInstance(
                       PooledConnection.class.getClassLoader(),
                       new Class[] { Connection.class },
                       new PooledConnection(pool, cxt));
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            // if calling close() or isClosed(), invoke our implementation
            // (one straightforward way to dispatch: match on the method name)
            if (method.getName().equals("close")) {
                close();
                return null;
            }
            if (method.getName().equals("isClosed")) {
                return isClosed();
            }
            // otherwise, invoke the passed method on the delegate
            try {
                return method.invoke(getConnection(), args);
            } catch (InvocationTargetException ex) {
                throw ex.getCause();   // rethrow what the delegate actually threw
            }
        }

        private void close() throws SQLException {
            if (_cxt != null) {
                _pool.releaseConnection(_cxt);
                _cxt = null;
            }
        }

        private boolean isClosed() throws SQLException {
            return (_cxt == null) || (_cxt.isClosed());
        }
    }

The most important thing to note is that [...]

So now let's turn our attention to the pool itself, starting with the objects it uses to manage connections.
    private Queue<Connection> _pool = new LinkedList<Connection>();

    private ReferenceQueue<Object> _refQueue = new ReferenceQueue<Object>();

    private IdentityHashMap<Object,Connection> _ref2Cxt = new IdentityHashMap<Object,Connection>();
    private IdentityHashMap<Connection,Object> _cxt2Ref = new IdentityHashMap<Connection,Object>();

Available connections are initialized when the pool is constructed and stored in _pool. [...]

As I've said before, the actual database connection will be wrapped in a PooledConnection [...]

    private synchronized Connection wrapConnection(Connection cxt) {
        Connection wrapped = PooledConnection.newInstance(this, cxt);
        PhantomReference<Connection> ref = new PhantomReference<Connection>(wrapped, _refQueue);
        _cxt2Ref.put(cxt, ref);
        _ref2Cxt.put(ref, cxt);
        System.err.println("Acquired connection " + cxt);
        return wrapped;
    }

The counterpart of wrapConnection() is releaseConnection(), which comes in two variants. The first is called by PooledConnection when the application explicitly closes the connection:

    synchronized void releaseConnection(Connection cxt) {
        Object ref = _cxt2Ref.remove(cxt);
        _ref2Cxt.remove(ref);
        _pool.offer(cxt);
        System.err.println("Released connection " + cxt);
    }

The other variant is called using the phantom reference; it's the “sad path,” followed when the application doesn't remember to close the connection. In this case, all we've got is the phantom reference, and we need to use the mapping to retrieve the actual connection (which is then returned to the pool using the first variant).

    private synchronized void releaseConnection(Reference<?> ref) {
        Connection cxt = _ref2Cxt.remove(ref);
        if (cxt != null)
            releaseConnection(cxt);
    }

There is one edge case: what happens if the reference gets enqueued after the application has called close()? In that case the first variant has already removed the mapping, so the lookup returns null and the second variant does nothing.

OK, you've seen the low-level code, now it's time for the only method that the application will call:

    public Connection getConnection() throws SQLException {
        while (true) {
            synchronized (this) {
                if (_pool.size() > 0)
                    return wrapConnection(_pool.remove());
            }
            tryWaitingForGarbageCollector();
        }
    }

The happy path for getConnection() is that a connection is available in _pool, in which case it is removed, wrapped, and returned. [...]

Before following that path, I want to talk about synchronization.
Clearly, all access to the internal data structures must be synchronized, because multiple threads may attempt to get or return connections concurrently. As long as there are connections in _pool [...]

So, what happens if we call getConnection() and the pool is empty? [...]

    private void tryWaitingForGarbageCollector() {
        try {
            Reference<?> ref = _refQueue.remove(100);
            if (ref != null)
                releaseConnection(ref);
        } catch (InterruptedException ignored) {
            // we have to catch this exception, but it provides no information here
            // a production-quality pool might use it as part of an orderly shutdown
        }
    }

This function highlights another set of conflicting goals: we don't want to waste time if there aren't any enqueued references, but we also don't want to spin in a tight loop in which we repeatedly check _pool and _refQueue. The 100-millisecond timeout passed to remove() is the compromise.

The Trouble with Phantom References

Several pages back, I noted that finalizers are not guaranteed to be called. Neither are phantom references guaranteed to be enqueued, and for the same reasons: if the collector doesn't run, unreachable objects aren't collected, and references to those objects won't be enqueued. Consider a program that did nothing but call getConnection() [...]

There are, of course, ways to resolve this problem. One of the simplest is to call System.gc() [...]

That doesn't mean that you should ignore phantom references and just use a finalizer. In the case of a connection pool, for example, you might want to explicitly shut down the pool and close all of the underlying connections. You could do that with finalizers, but would need just as much bookkeeping as with phantom references. In that case, the additional control that you get with references (versus an arbitrary finalization thread) makes them a better choice.

A Final Thought: Sometimes You Just Need More Memory

While reference objects are a tremendously useful tool to manage your memory consumption, sometimes they're not sufficient and sometimes they're overkill. For example, let's say that you're building a large object graph, containing data that you read from the database.
While you could use soft references as a circuit breaker for the read, and weak references to canonicalize that data, ultimately your program requires a certain amount of memory to run. If you can't give it enough to actually accomplish any work, it doesn't matter how robust your error recovery is.

Your first response to an OutOfMemoryError [...]

During development, you should specify a large heap size (1 GB or more if you have the physical memory) and pay careful attention to how much memory is actually used [...]

The bottom line is that you need to understand your applications. A canonicalizing map won't help you if you don't have duplication. Soft references won't help if you expect to execute multi-million row queries on a regular basis. But in the situations where they can be used, reference objects are often life savers.

Additional Information

You can download the sample programs for this article:
The “string canonicalizer” class is available from SourceForge, licensed under Apache 2.0.

Sun has many articles on tuning their JVM's memory management. This article is an excellent introduction, and provides links to additional documentation.

Brian Goetz has a great column on the IBM developerWorks site, "Java Theory and Practice." A few years ago, he wrote columns on using both soft and weak references. These articles go into depth on some of the topics that I simply skimmed over, such as using [...]

Reference: http:///index.php?page=java.refobj