I was working with NetBeans the other day, debugging some of my
own code. I had been running the debugger repeatedly, tracing through
the code over and again. NetBeans gradually slowed down, and stepping
through lines of code became downright painful. I finally stopped and
restarted it.
(I wrote this last Spring, in March 2004 I think, but never posted it because I hadn’t set up a blog on my site. My current kind and convenient hosts don’t have it set up, I didn’t want to futz so…older stuff goes up for now, till I am caught up. Hence the old NB release referenced here.)
Some time later, maybe the next day, it occurred to me that this might be due
to a memory limitation–that perhaps NB was running out of memory and the
garbage collector was working overtime trying to free up some small
portion near the top that wasn’t spoken for. Sure enough, some time in
the past I had capped the memory at 96M in my configuration settings.
Increasing it to 128M cleared the problem, at least for the time being.
I’m not writing to bag on NB. It’s a good tool–the 3.6 release is
particularly good–and I use it, as well as jEdit, regularly. What I
started to think about was the problem with memory usage in Java
applications, particularly GUI applications using the Swing toolkit.
The thought I had was that, on the one hand, Java frees us from
worrying about memory management, which is great. On the other hand,
perhaps this apparent freedom leads us to treat memory too
casually–particularly, as an undifferentiated, massive pool from which
we can draw without restraint.
If you are reading this blog, you probably don’t need to be told of
the advantages of memory management in Java. But just to point out–we
have safe access to arrays, protection from invalid pointer references,
automatic memory allocation and deallocation. This is all good. It
seems to save a lot of time. The garbage collectors have gotten much
better over the last few years, and the dreaded “GC! Stop and put down
your weapons!” pause is less noticeable even in complex GUI apps these
days.
On the other hand, we now create programs within a safe
environment–an environment which, while simpler and less dangerous
than the one we used to live in, also insulates us from the realities of the
world our programs run in. These realities cannot be done away with
just by locking the door and taping the windows. There are costs to
using memory on most operating systems that most programmers develop
for these days. There is a cost, in time, to allocate memory, track it,
and release it; there are actual limitations to physical memory; and
virtual memory on disk is incredibly expensive to use
(performance-wise). But we are isolated from these costs because in the
sandbox where we live, they are invisible. We have to look for them,
test for them, probe for their presence. When they stick their ugly
heads up, as happened with me, tracking down the root cause of the problem
is daunting, given the growing complexity of our applications.
Not impossible, just very time consuming.
What makes this even worse is that in using Java, a great deal of
our power comes not from the expressiveness of the language itself,
but from the large, and growing, library of packages we have access to.
This includes not only the large and impressive JDK, but a multitude of
free software, open source and commercial packages we may end up
pulling into our projects. It takes time enough to find and learn how
to use any of these packages. In almost all of those I can think of,
there is nothing in the documentation related to how much memory any
given class will use, and, by extension, no information about how much
a combination of classes in that package will use. I have a 10KB XML
file. How much memory will it take when loaded into an XML DOM? What
about using toolkit X versus toolkit Y? What about if I include or
exclude comments from the XML file? Do I have options for reducing the
footprint at all? In my experience, we just don’t know. At best we
might get a general comment on a toolkit’s readme, something like,
“Memory footprint reduced 10% in this release.”
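About the best you can do today is answer the question empirically: snapshot the heap before and after parsing and eyeball the difference. A minimal sketch (the class name and sample XML are my own invention, and the measurement is rough at best, since the collector runs when it pleases):

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomFootprint {
    // Rough heap snapshot; System.gc() is only a hint, so treat the
    // resulting delta as an estimate, not a measurement.
    static long usedHeap() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config><item>a</item><item>b</item></config>";
        long before = usedHeap();
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        long after = usedHeap();
        System.out.println("root element: "
                + doc.getDocumentElement().getTagName());
        System.out.println("child elements: "
                + doc.getDocumentElement().getElementsByTagName("item").getLength());
        System.out.println("approximate heap delta (bytes): " + (after - before));
    }
}
```

Swap in a real 10KB file and toolkit X versus toolkit Y and you at least get a crude comparison, which is more than the documentation gives you.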
And yet the tools for figuring this out, for making the problem
visible, are employed after the fact–memory profilers. Some of the new
information available from the VM in JDK 1.5 might make this process a
little easier.
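For instance, the new java.lang.management package in 1.5 lets a program inspect its own heap from the inside, without an external profiler attached. A minimal sketch (class name is mine):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapSnapshot {
    public static void main(String[] args) {
        // The platform MemoryMXBean reports live heap statistics.
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        System.out.println("heap used (bytes): " + heap.getUsed());
        System.out.println("heap max (bytes): " + heap.getMax()); // -1 if undefined
        System.out.println("positive used: " + (heap.getUsed() > 0));
    }
}
```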
I suggest the root problem is that inherent in the language design
is a message: don’t worry about it. Don’t worry about how memory is
allocated, by which process, using which API call in the O/S. Don’t
worry about how much space a given class will take. Just start coding.
When you need an object, just instantiate the class. Once your program
is written, you can handle major memory problems by just increasing the
size of the heap, or by running a profiler and punishing the worst
offenders.
To make matters worse, current coding trends often
recommend that we use caching to improve performance across an
application. So not only do we not know how much memory we are
using–we grab it and hold on to it for the long term. I suspect this
is the design feature of NB that aggravated my problem. The garbage
collector was doing its job–there was just barely any memory it could
free up, because most of it was spoken for, and would not be released.
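One partial remedy, if a library chooses to use it, is to cache through soft references, which the collector is permitted to clear when memory runs short–so the cache cooperates with GC instead of fighting it. A minimal sketch (the class and keys are invented for illustration):

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

public class SoftCache {
    // Values are held softly: the GC may reclaim them under memory pressure.
    private final Map<String, SoftReference<byte[]>> cache =
            new HashMap<String, SoftReference<byte[]>>();

    public void put(String key, byte[] value) {
        cache.put(key, new SoftReference<byte[]>(value));
    }

    // Returns null if the key was never cached, or if GC reclaimed the value.
    public byte[] get(String key) {
        SoftReference<byte[]> ref = cache.get(key);
        return (ref == null) ? null : ref.get();
    }

    public static void main(String[] args) {
        SoftCache c = new SoftCache();
        c.put("page1", new byte[1024]);
        System.out.println("hit while memory is plentiful: "
                + (c.get("page1") != null));
        System.out.println("miss for unknown key: " + (c.get("page2") == null));
    }
}
```

The price is that a cache entry can silently vanish, so callers must always be prepared to rebuild the value on a miss.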
So I was thinking of this as a modern version of “The Tragedy of the
Commons”. I won’t repeat that chestnut here. The point is that memory
is a shared resource, in two senses. First, it is shared between
programs running on your PC, and with the OS itself. Second, it is
shared between you (writing your program), other people you are coding
with, and every person who had a hand in all those libraries and
toolkits you are using. All of them are drawing from the same pool of
memory. All of them are acting as if, in general, there were no real
cost to using that memory. And even if they did think about it, we
have no way of knowing.
My general thought here is not that there is a fundamental problem
in Java’s memory management model, just that it gives us a false sense
of complete isolation and freedom. We are not completely isolated or
completely free from worrying about memory. It’s similar to what Joel
Spolsky calls “leaky abstractions”. JDBC doesn’t isolate us from
differences in database engines. You could write completely generic SQL
using JDBC (I think), but in the real world you find you can’t–you have
to optimize for access paths to tables, just to take one example, and
that may take advantage of indexing features available on one database
platform but not another. Your code is pretty portable, but will run
differently on another RDBMS, because the JDBC abstraction is “leaky”:
in this case, the underlying RDBMS shows through in how the application
performs.
So I think the danger is this illusion of freedom. What’s not clear
to me is whether this problem would be less prevalent if my code
were sprinkled with calls to allocate and deallocate memory (thus
reminding me of what I was using).
Actually, as I write this, I have few ideas for how
to properly name the problem, much less suggest an alternative or a
solution. The problems seem to include:
- Without extensive runtime profiling, I have no idea how much memory my application will require at startup, using various configuration parameters, on different JVMs, etc.
- Without extensive runtime profiling, I have no idea how much memory classes, or combinations of classes, will use. That includes my own classes and those in many other packages I reference in my application, as well as all the classes referenced indirectly by those packages, which I know nothing about.
- I have few options for controlling memory use once I find a problem. Repeat after me: “I will not recode javax.swing.text to be more memory efficient, I will not recode…” and so on.
- If I do accurately profile an application–or, if I do profile a single class–I have no idea how much of the memory use is data dependent, and how much is data independent.

and so on.
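Even the per-class question has no better answer today than an empirical one: allocate a pile of instances and divide the heap delta by the count. A rough sketch (GC-timing-dependent, and ArrayList is just a convenient sample class; substitute any class you care about):

```java
import java.util.ArrayList;

public class PerInstanceEstimate {
    // System.gc() is only a hint, so the delta is an estimate at best.
    static long usedHeap() {
        System.gc();
        return Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory();
    }

    public static void main(String[] args) {
        int n = 100000;
        Object[] hold = new Object[n]; // allocated before the first snapshot
        long before = usedHeap();
        for (int i = 0; i < n; i++) {
            hold[i] = new ArrayList(); // keep each instance reachable
        }
        long after = usedHeap();
        System.out.println("approx bytes per instance: " + ((after - before) / n));
        System.out.println("instances held: " + hold.length);
    }
}
```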
Finally, it’s possible that even if I did have all this information,
it’s mostly pointless. Outside of very memory-constrained applications,
we usually don’t care unless our users are impacted (by being able to
run fewer apps side by side without performance degradation due to GC
and virtual memory swapping), or unless we are in danger of running out
of memory.
(originally posted as this JRoller entry)