[Novalug] a system hw / engineering question

Megan Larko larkoc at iges.org
Tue Oct 9 13:26:57 EDT 2007


Happy Tuesday Folks,

Although I have assembled my own machines from hardware chosen to spec, 
I have never actually designed or engineered any computer hw components. 
That said, please be patient with me if you choose to respond to this 
query:

At my job we have several computers running RHEL5 and Fedora Core 6. 
There is a Lustre file system with user data connected to the 
master/head nodes as well as to the compute nodes (accessed by users via 
the torque scheduler).  The network interconnect is InfiniBand.

A user submits a job via torque to the compute nodes (cn#) requesting 8 
processors and 16GB of RAM.  If the requested processors are all on a 
single cn, the job fails with a message that it does not have 
sufficient memory resources.  Each cn has 32GB of RAM in it.  If the 
exact same script/code is submitted to two cns, requesting 4 CPUs per cn 
and 16GB of memory, the job runs, but it uses more wallclock time per 
step.
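
To make the numbers concrete, here is a rough sketch of the arithmetic. 
It assumes the 16GB is a job-wide total that is split evenly across the 
nodes used; how torque actually interprets the request has not been 
confirmed here, and the function name is made up for illustration.

```python
# Sketch only: assumes the 16GB request is a job-wide total divided
# evenly across nodes. That is an assumption, not confirmed torque behavior.

NODE_RAM_GB = 32  # each cn has 32GB, per the description above


def per_node_demand_gb(total_request_gb, nodes):
    """Memory each node must supply if the request is split evenly."""
    return total_request_gb / nodes


# The two layouts described above: 8 CPUs on one cn, or 4 CPUs on each of two.
for nodes, cpus_per_node in [(1, 8), (2, 4)]:
    need = per_node_demand_gb(16, nodes)
    fits = need <= NODE_RAM_GB
    print(f"{nodes} cn x {cpus_per_node} CPUs: {need:.0f}GB per cn, fits: {fits}")
```

Under that assumption both layouts fit within 32GB on paper, which is 
why the failure on a single cn is puzzling.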

The CPUs involved are dual-core Opterons and dual-core Xeons, all of 
which run between 2.0GHz and 2.4GHz.

ASCII Diagram of motherboard (Tyan for AMD, Asus for Intel) layout:

                                       --------
                                       --------
                                       --------
                                       --------
    | | | |                             XX
    | | | |                             XX
    | | | |  XX
    | | | |  XX

             XX                    XX  | | | |
             XX                    XX  | | | |
       --------                        | | | |
       --------                        | | | |
       --------
       --------

...where...
  dashed lines indicate memory slots (fully populated)
  XX symbol indicates CPU hw

I am guessing (really genuinely guessing) that if the user's job is 
using both cores of a dual-core CPU, then its access to memory is 
coupled to the DIMMs positioned close to that CPU, and as such each 
core of the dual-core CPU would have to share that memory resource.  If 
the user's job uses only one core of the CPU, then that one core 
(assuming no other jobs on the box at the time) would have access to 
the full population of memory seated next to it.  IOW one core has 
access to all of my dashed lines, while running both cores on a single 
CPU means sharing (splitting??) that dashed-line memory access.  So for 
an apparently memory-intensive job, the user is better off using only 
one core of each CPU and extending the job over more cn's than trying 
to run on all cores of a multi-core CPU.

Is this a reasonable guess?  Could the problem perhaps lie elsewhere, 
such as in shared L1 and L2 cache on the physical CPUs?
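
One way to test the guess on a running cn is to see whether Linux 
reports memory as attached to separate NUMA nodes.  The sketch below 
reads the per-node meminfo files under /sys/devices/system/node, which 
the kernel exposes when it sees more than one memory node; a box that 
shows a single node (or none) is not splitting memory by CPU as far as 
the kernel is concerned.  The function names are made up for 
illustration, and this only reports totals, not what torque enforces.

```python
import glob
import re


def parse_node_meminfo(text):
    """Return {field: kB} parsed from the text of one nodeN/meminfo file."""
    fields = {}
    for line in text.splitlines():
        # Lines look like: "Node 0 MemTotal:       16777216 kB"
        m = re.match(r"Node\s+\d+\s+(\w+):\s+(\d+)\s+kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def numa_memory_report(pattern="/sys/devices/system/node/node*/meminfo"):
    """Map each NUMA node name to its (MemTotal, MemFree) in kB."""
    report = {}
    for path in sorted(glob.glob(pattern)):
        node = path.split("/")[-2]  # e.g. "node0"
        with open(path) as fh:
            info = parse_node_meminfo(fh.read())
        report[node] = (info.get("MemTotal"), info.get("MemFree"))
    return report


if __name__ == "__main__":
    for node, (total, free) in numa_memory_report().items():
        print(node, "total kB:", total, "free kB:", free)
```

If numactl is installed, "numactl --hardware" prints similar 
per-node totals without any scripting.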

We are considering purchasing a Quad-core Intel 5335 (771 socket) in the 
very near future.  If I am going to see many job failures because of 
insufficient-memory errors, I may push for no more than dual-core in our 
system.  Would changing to 2GB DIMMs and giving a cn 64GB (max the 
boards can recognize) be a reasonable action to pair with the purchase 
of a Quad-core processor?
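
For comparison, here is a small sketch of memory available per core 
under each option.  The socket count of 4 is an assumption read off the 
four XX symbols in the diagram above (and the 8-processor request 
fitting on one cn); a quad-core board may well have a different socket 
count, so treat SOCKETS as a placeholder.

```python
# Sketch only: SOCKETS = 4 is an assumption taken from the diagram,
# not a confirmed spec for either the current or the proposed boards.
SOCKETS = 4


def gb_per_core(node_ram_gb, sockets, cores_per_socket):
    """RAM per core if a node's memory is divided evenly among its cores."""
    return node_ram_gb / (sockets * cores_per_socket)


options = [
    ("dual-core, 32GB", 32, 2),
    ("quad-core, 32GB", 32, 4),
    ("quad-core, 64GB", 64, 4),
]
for label, ram_gb, cores in options:
    print(f"{label}: {gb_per_core(ram_gb, SOCKETS, cores):.1f}GB per core")
```

With the same number of sockets, going quad-core at 32GB halves the 
memory per core, and 64GB brings it back to the dual-core figure.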

Okay hardware and engineering gurus, strut your stuff!!

Thanks in advance,
megan



