RZ AMD
Hardware
- 256 total CPUs and 512 GB total RAM:
- 64 Sun Fire V40z (linuxoc00-63), each with 4 × 2.2 GHz Opteron 848 and 8 GB RAM
Links
- Rechenzentrum: primer • queues
- Status: overall load • individual loads
- Reviews: LinuxWorld • Anandtech
Sample SGE batch job file
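#!/usr/bin/ksh
# Run on Linux hosts
#$ -l ostype=linux
# Request 8 slots in the V40z MPI parallel environment
#$ -pe mpi_linux_amd64_v40z 8
# Hard limits: 5 minutes of wall-clock time, 400 MB of virtual memory
#$ -l h_rt=0:05:00 -l h_vmem=400M
# Job name "test"; write stdout and stderr together to run.qout
#$ -o run.qout -j y -N test
# Set up the MPICH environment for 32-bit Linux binaries
. mpich.init -arch LINUX32
# Start in the directory the job was submitted from
cd $SGE_O_WORKDIR
mpirun -np $NSLOTS -machinefile ${SGE_MACHINES}.mpich $HOME/bin/i386/xns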
Parallel environments
A list of available parallel environments can be obtained with qconf -spl. Of particular interest is mpi_linux_amd64_v40z_je4tasks, which forces SGE to allocate CPUs only in sets of 4 that together make up a whole node. This can increase queueing time, but it reduces competition with other jobs for a node's resources.
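Since mpi_linux_amd64_v40z_je4tasks hands out CPUs only in whole-node sets of 4, it can help to check a slot count before putting it in the job file. The following is a minimal sketch, assuming that slot requests for this environment should be multiples of 4 (the source does not state this explicitly); nodes_for_slots is a hypothetical helper, not part of SGE.

```shell
#!/bin/sh
# Hypothetical helper: how many 4-CPU nodes a slot request would occupy.
# Assumption: requests for the je4tasks environment must be multiples of 4.
CPUS_PER_NODE=4

nodes_for_slots() {
    slots=$1
    # Reject empty input, zero, leading zeros, and anything non-numeric.
    case $slots in
        ''|0*|*[!0-9]*)
            echo "slot count must be a positive integer" >&2
            return 1
            ;;
    esac
    # Reject counts that would leave a node only partly used.
    if [ $((slots % CPUS_PER_NODE)) -ne 0 ]; then
        echo "slot count $slots is not a multiple of $CPUS_PER_NODE" >&2
        return 1
    fi
    echo $((slots / CPUS_PER_NODE))
}
```

For example, nodes_for_slots 8 reports 2 nodes, which would correspond to the job file line #$ -pe mpi_linux_amd64_v40z_je4tasks 8, while nodes_for_slots 6 is rejected.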
Growing pains
XNS is not very well-behaved on the AMD cluster, crashing randomly with messages resembling:
p0_2336: (175.094079) net_recv failed for fd = 12
p0_2336: p4_error: net_recv read, errno = : 104
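On Linux, errno 104 is ECONNRESET (connection reset by peer), so the message suggests that a socket between two MPI processes was closed unexpectedly. A minimal sketch for spotting such crashes in a job's output file follows; count_p4_errors is a hypothetical helper, and run.qout is the output file named in the sample job file above.

```shell
#!/bin/sh
# Hypothetical helper: count lines containing a p4_error in a job output file.
count_p4_errors() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # grep -c prints the number of matching lines; it exits with status 1
    # when there are none, so "|| true" keeps the function's status at 0.
    grep -c 'p4_error' "$file" || true
}
```

Running count_p4_errors run.qout after a job finishes prints the number of p4_error lines; a nonzero count indicates that the run was hit by this failure.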




