When running Java-based jobs (or other multi-threaded jobs) on the physnodes, you may encounter problems with memory allocation: the JVM fails to start up, reporting a lack of memory. This is not related to the actual memory usage of your program, but to the way in which the Sun Grid Engine handles memory allocation. The problem can be solved with suitable configuration options.
Memory allocation in Java
The Java Virtual Machine (JVM) and the underlying C library, glibc, allocate a potentially large amount of virtual memory on startup. Note that this is virtual memory (allocated address space) and not resident memory (actual usage of RAM). On 64-bit machines, virtual memory is normally a very cheap resource. However, the physnodes restrict jobs in their use of virtual memory, not resident memory; hence virtual memory allocation needs to be controlled.
One main point is the Java heap, i.e., the main area where Java objects are stored. The JVM allocates virtual memory for the maximum allowed heap space on startup. The default value for this is determined by the total amount of RAM in the machine, which is not a useful choice in a cluster environment. Rather, the maximum heap size should be specified explicitly with the "-Xmx..." startup parameter (for example "-Xmx100m" for 100 MB; note that there is no "=" between the option and the size).
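To see which heap limit a JVM has actually picked up, a small check along the following lines can be run inside a job. This is only a sketch: the class name HeapInfo is made up for illustration, and the value reported can be slightly lower than the "-Xmx" setting, depending on the garbage collector in use.

```java
// Hypothetical helper, not part of any cluster tooling.
class HeapInfo {
    /** Maximum heap the JVM will try to use, in MiB (reflects -Xmx if it was given). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Without -Xmx, this value is derived from the machine's total RAM.
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        // The core count influences how many GC threads the JVM starts by default.
        System.out.println("Visible cores:  " + Runtime.getRuntime().availableProcessors());
    }
}
```

Running it once without options and once with, e.g., "java -Xmx100m HeapInfo" should show the difference between the RAM-based default and the explicit limit.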
Another (perhaps unexpected) item is address space allocation for threads. For each thread created, glibc can reserve a separate memory pool ("arena") of 64 MB of address space. Java internally creates a number of threads on startup: about 10 for internal use, and further threads for the garbage collector (possibly 10 or more, depending on the number of processor cores). With 20 such threads, for example, this alone can add up to more than 1 GB of virtual memory. To limit this effect, consider:
- Setting the environment variable MALLOC_ARENA_MAX=4. This affects glibc and will limit the number of "arenas" allocated.
- Limiting the number of GC threads, using the "-XX:ParallelGCThreads=..." startup parameter.
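To confirm that these two settings have reached the JVM inside a job, a short diagnostic like the one below could be used. It is a sketch under assumptions: the class name JvmSettingsCheck and its helper methods are hypothetical, and it only reports what the environment and the JVM itself expose; it does not measure virtual memory.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic, for illustration only.
class JvmSettingsCheck {
    /** Value of MALLOC_ARENA_MAX as seen by this process, or a marker if unset. */
    static String arenaMax() {
        String value = System.getenv("MALLOC_ARENA_MAX");
        return value == null ? "(not set)" : value;
    }

    /** True if any JVM argument starts with the given prefix, e.g. "-Xmx". */
    static boolean hasFlag(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the JVM reports as having been passed to it at startup.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("MALLOC_ARENA_MAX:        " + arenaMax());
        System.out.println("-Xmx given:              " + hasFlag(jvmArgs, "-Xmx"));
        System.out.println("-XX:ParallelGCThreads=:  " + hasFlag(jvmArgs, "-XX:ParallelGCThreads="));
    }
}
```

If either flag is reported as missing, the options were probably not passed through by the job script to the java command.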
An example job descriptor file
The following job descriptor file would be suitable for a non-memory-intensive, non-threaded Java job (max. 100 MB heap space).