[Novalug] a system hw / engineering question
Megan Larko
larkoc at iges.org
Tue Oct 9 13:26:57 EDT 2007
Happy Tuesday Folks,
Although I have assembled my own hardware to spec, I have
never actually designed/engineered any computer hw components. That
said, please be patient with me if you choose to respond to this query:
At my job we have several computers running RHEL5 and Fedora Core 6.
There is a Lustre file system with user data connected to the
master/head nodes as well as to the compute nodes (accessed by users via
the Torque scheduler). The network interconnect is InfiniBand.
A user submits a job via Torque to the compute nodes (cn#) requesting 8
processors and 16GB of RAM. If the requested processors are all on a
single cn, the job fails with a message that it does not have
sufficient memory resources. Each cn has 32GB of RAM in it. If the
exact same script/code is submitted to two cns, requesting 4 CPUs per cn
and 16GB of memory, the job runs, but it uses more wallclock time per step.
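For what it is worth, here is the capacity arithmetic as a minimal Python
sketch. It assumes (and I am not certain of this) that the 16GB request is
a total for the whole job, split evenly across the nodes it lands on. The
function names are made up for illustration, and OS overhead and other
jobs on the node are ignored.

```python
# Rough capacity check for the two submissions described above.
# Assumption: the memory request is a job-wide total, divided evenly
# across nodes. OS overhead and other jobs on the node are ignored.

NODE_RAM_GB = 32  # RAM installed in each compute node (from this post)


def per_node_demand_gb(total_request_gb, nodes):
    """Memory each node must supply if the request is split evenly."""
    return total_request_gb / nodes


def fits(total_request_gb, nodes, node_ram_gb=NODE_RAM_GB):
    """True if the per-node share is no larger than the installed RAM."""
    return per_node_demand_gb(total_request_gb, nodes) <= node_ram_gb


if __name__ == "__main__":
    # (nodes, processors per node) for the failing and the working case
    for nodes, ppn in [(1, 8), (2, 4)]:
        need = per_node_demand_gb(16, nodes)
        print(f"{nodes} node(s) x {ppn} procs: {need:.0f}GB per node, "
              f"fits in {NODE_RAM_GB}GB: {fits(16, nodes)}")
```

Under that assumption both layouts fit within 32GB per node on paper, which
is why the failure on a single cn puzzles me.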
The CPUs involved are dual-core Opterons and dual-core Xeons, all of which
run between 2.0GHz and 2.4GHz.
ASCII Diagram of motherboard (Tyan for AMD, Asus for Intel) layout:
--------
--------
--------
--------
| | | | XX
| | | | XX
| | | | XX
| | | | XX
XX XX | | | |
XX XX | | | |
-------- | | | |
-------- | | | |
--------
--------
...where...
dashed lines indicate memory slots (fully populated)
XX symbol indicates CPU hw
I am guessing (really genuinely guessing) that if the user's job is using
both cores of a dual-core CPU then its access to memory is coupled to
those DIMMs positioned close to that CPU, and as such each virtual
CPU (core) of a dual-core, for example, would have to share that memory
resource. If the user's job uses only 1 core of the CPU then that
one core (assuming no other jobs on the box at the time) would have
access to the full population of memory seated next to it. IOW one
core has access to all of my dashed lines, and running both cores on a
single CPU has to share (split??) that dashed-line memory access. So
the user is better off using only part (one core) of the CPU and
extending the job over more cns than trying to run on a multi-core CPU
for an apparently memory-intensive job.
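One way to test this guess might be to look at how the kernel itself
groups memory. The following is a minimal Python sketch, assuming the
compute nodes expose the usual Linux sysfs files under
/sys/devices/system/node/ (I have not confirmed this on our cns). The
helper names are hypothetical. It reports total and free memory per
memory node as the kernel sees it.

```python
import glob
import re

# Lines in /sys/devices/system/node/nodeN/meminfo look like:
#   Node 0 MemTotal:       16777216 kB
_LINE = re.compile(r"Node\s+(\d+)\s+(\w+):\s+(\d+)\s+kB")


def parse_node_meminfo(text):
    """Return {field_name: kilobytes} for the kB fields in one file."""
    fields = {}
    for match in _LINE.finditer(text):
        fields[match.group(2)] = int(match.group(3))
    return fields


def numa_summary(pattern="/sys/devices/system/node/node*/meminfo"):
    """Return {node_name: (MemTotal_kB, MemFree_kB)} for each node found."""
    summary = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            info = parse_node_meminfo(fh.read())
        node_name = path.split("/")[-2]  # e.g. "node0"
        summary[node_name] = (info.get("MemTotal"), info.get("MemFree"))
    return summary


if __name__ == "__main__":
    for node, (total_kb, free_kb) in numa_summary().items():
        print(f"{node}: total={total_kb} kB free={free_kb} kB")
```

If a box lists several nodes, each with only a share of the 32GB, that
would at least be consistent with memory being tied to a particular CPU.
If it lists a single node holding everything, the guess above would look
less likely for that box.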
Is this a reasonable guess? Could the problem perhaps lie elsewhere
such as shared L1 and L2 cache on the physical CPUs?
We are considering purchasing a Quad-core Intel 5335 (771 socket) in the
very near future. If I am going to see many job failures because of
insufficient-memory errors I may push for no more than dual-core in our
system. Would changing to 2GB DIMMs and giving a cn 64GB (the max the
boards can recognize) be a reasonable action to pair with the purchase
of a Quad-core processor?
Okay hardware and engineering gurus, strut your stuff!!
Thanks in advance,
megan