
RCNP Supercomputer Users' Guide

Ver.1.0 [3 Oct 2003, H.Matsufuru]


This guide summarizes the fundamentals of using the RCNP supercomputer system.

Structure of supercomputer system

The supercomputer system, an NEC SX-5, is managed in collaboration with the Cybermedia Center (CMC) and the Institute of Laser Engineering (ILE). Of the eight nodes in total, six are installed at CMC, and one each at RCNP and ILE. Users belonging to RCNP can use the node at RCNP (the RCNP node) and 1/6 of the total budget for the six nodes at CMC.

Each node contains 16 processors. The resources allocated to RCNP are described in the following sections.


Login

For cooperative management with CMC and ILE, the account names of RCNP users are of the form rcnpXXXX (XXXX is a four-digit number). You can find the corresponding username on the RCNP generic machine (senri) with the sx2name command:

   %  sx2name rcnpXXXX

RCNP users can log in either to the front-end machines or to the interactive processors of the RCNP node (sx57). You are recommended to use the front-end machines, in order to reduce the load on the vector processors.

On these machines you can develop programs, compile them, and submit jobs to the queues. In both cases the home directory is shared and physically resides on the local disk of the RCNP node.

Front-end machines (CMC):
First, log in to the common host:

% ssh -l rcnpXXXX login.hpc.cmc.osaka-u.ac.jp
After your password is verified, you are asked which system to log in to; choose front02 or front03.
These machines provide a development environment (cross-compiler, analyzers, and so on), and you can also submit jobs from them.

RCNP node (interactive) = sx57:
Hostname: sx5.rcnp.osaka-u.ac.jp
These CPUs can also be used for test-class jobs. Note that the cross-compiler on the front end is faster than the self-compiler on the SX-5. You can log in to sx57 only via the RCNP generic computer system (senri/saho).

To access either the front-end or the RCNP generic systems from outside networks, you need to use ssh.

You can also access the files via NFS from the RCNP generic system (senri).
On senri, your sx5 home and work directories are mounted as /home.sx5/rcnpXXXX and /work.sx5/rcnpXXXX, respectively. (For each disk, see the next section.)


Changing password

To change your password on the supercomputer, use the yppasswd command on sx57.


Disk systems

The following disk systems are available to RCNP users.

MP(sx57)        MP(cmc)         MP(ibm)        hosted  speed(sx57)  speed(cmc)  size    quota     comment
/sx/rcnp/home   /sx/rcnp/home   /home.super    sx57    high         mid         1TB     5GB/user
/sx/rcnp/work   /sx/rcnp/work   /work.super    sx57    high         mid         2TB     none
/sx/rcnp/data   -               /data.super    ibm     slow         -           3TB     none
/sx/rcnp/data2  -               /data2.super   fss     slow         -           5.5TB   none
/sx/rcnp/data3  -               /data3.super   fss     slow         -           2.3TB   none      only for Kanazawa-U
/sxshort/rcnp   /sxshort/rcnp   -              CMC     mid          high        8TB     none

(MP = mount point on each system; "-" means not available there.)

/sxshort/rcnp is shared by all users of the three centers, and its data have a lifetime of 2 weeks.

Under each of the above directories, a per-user directory (rcnpXXXX) is created.

Since all these disks are shared resources, please delete unnecessary files promptly.

The cluster size of /sx/rcnp/work is 4 MB. Since disk usage is measured in units of 4 MB, even a very small file occupies at least 4 MB. Please keep small files in your home directory or on the disks of senri. Even on the home disk the cluster size is not so small; you are recommended to keep your files on senri or to bundle them with the tar command.
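For example, a directory of small output files can be bundled into a single tar archive before being kept on the work disk, so that each small file does not consume a whole 4 MB cluster (the file names below are only illustrative):

```shell
# Create some small files (illustrative names), then bundle them into
# one archive so each file does not occupy a full 4 MB cluster.
mkdir -p results
echo "run 1" > results/out1.txt
echo "run 2" > results/out2.txt

tar cf results.tar results/   # create the archive
tar tf results.tar            # list its contents to verify
```

To recover the files later, extract with `tar xf results.tar`.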

Some of the disks (see the table above) can be accessed from the RCNP generic system senri via NFS. Conversely, the file systems on senri cannot be accessed from the SX.


Compile

Programs are compiled by the self-compiler on sx57 (interactive mode), or by cross-compiler on the frontend machines.

The use of the cross-compiler is recommended. Please use the self-compiler only for verifying the consistency of results or in case of cross-compiler bugs: the compiler on sx57 is slow and thus obstructs other jobs.

Available languages are FORTRAN90, HPF, C, and C++. Please note that FORTRAN77 is not supported.

For parallel programming, MPI is available for more than 16 processors; MPI, HPF, and OpenMP are available for 16 or fewer processors (within one node).

Fortran90:

The compile commands are f90 on sx57 and sxf90 on the front end. For other languages, or for automatic parallelization, please consult the manuals on the CMC site.

MPI:

The compile commands are mpif90 on sx57 and sxmpif90 on the front end.
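As a sketch of typical invocations (the exact optimization flags depend on the SX compiler release; only the standard -o output option is assumed here, and the file names are illustrative):

```shell
# On the front end: cross-compile a Fortran90 program for the SX-5.
# Use f90 / mpif90 instead when compiling natively on sx57.
sxf90 -o myprog myprog.f90

# Cross-compile an MPI program the same way:
sxmpif90 -o mympi mympi.f90
```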


Queue system

RCNP-node

All queue classes are for single-processor computation only.

queue  CPU    time      memory (default/max)  notice
RS     1-2    5 min     2/16                  shares CPUs with interactive mode
RM     14-16  2 hours   2/8                   -
RL     14-16  10 hours  2/32                  -

There is no restriction on the number of jobs per user.

The job classes RM and RL share CPUs 14-16. These two job classes use the allocated processor exclusively until the whole computation finishes. The system selects the number of running jobs by watching their performance, so sometimes more jobs than the number of CPUs are running.

The total memory size is 128 GB. If you request a large amount of memory, the priority of your job may be lowered. You are recommended to specify the memory size to be used, if possible. A rough estimate of your program's size can be obtained with the size command; after execution on the SX-5, the precise size appears in the log file of your run.

Once a memory size is specified for a job (including the default value), the queue system reserves that amount of memory. If you specify too large a value, succeeding jobs may not be able to start due to ``lack of memory'', so please specify an appropriate size. Conversely, if the job uses more memory than specified, it fails to run.

CMC-node

Parallel computation is done in the CMC queue classes. Please refer to the CMC pages.

Queuing jobs

The following is written for NQS II, in operation since August 2003. Command options and scripts based on the previous NQS also work if you use the capitalized commands, for example QSTAT instead of qstat.

To submit an execution script to NQS, use the qsub command on the front ends or sx57:

% qsub script.sh
where script.sh is an execution script file.

The status of your jobs can be checked with the qstat or jobr commands. The latter is convenient for watching the status of all submitted jobs at once.

To kill a submitted job, use the qdel command:

% qdel job-id
job-id is the ID number of the submitted job, found with the qstat or jobr commands. For example, the job-id of a job submitted to the RCNP node has the form XXXXX.rcnp (XXXXX is some number).

Fair-share scheduling

To carry out fair execution of jobs for all users, ``fair-share'' scheduling is introduced. The system determines the priority of a submitted job according to the resources the user has consumed so far: users who have not used much in a certain recent period are given high priority.


Example of execution script

The following is an example of an execution script to be submitted to NQS, in this case to the RM class of the RCNP node.

-------------------------------------------------------------------

#!/bin/csh
#PBS -q RM@rcnp <-- class of queue
#PBS -o data <-- output log-file
#PBS -l memsz_job=1gb,cputim_job=2:00:00 <-- maximum memory size, CPU time

setenv F_FILEINF YES <-- To get information of file I/O
setenv F_PROGINF YES <-- To get information of run of program

cd ~/WORKING_DIR/ <-- specify working directory

./a.out

-------------------------------------------------------------------
The text to the right of each ``<--'' is a comment; do not include it in the actual script. The settings in the first three #PBS lines can also be given as qsub command options at submission time.

If your job uses more memory than the default setting, the size must be specified explicitly.

The ``-l'' option (#PBS -l memsz_job=1gb, etc.) must be written on a single line.

Since collecting program information does not disturb performance, it is convenient to leave these settings turned on.
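As noted above, the first three #PBS directives can instead be given on the qsub command line. A sketch, assuming the command-line option spellings match the #PBS directives in the script:

```shell
# Equivalent submission with the queue, log file, and resource limits
# given as qsub options instead of #PBS lines in the script.
qsub -q RM@rcnp -o data -l memsz_job=1gb,cputim_job=2:00:00 script.sh
```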


Library, applications

Please refer to the relevant documents on the RCNP and CMC web sites.


Consulting service

The help desk is open, but inquiries to it must be written in Japanese; if you prefer to use English, please use the consult address below.
URL of help desk:

http://helpdesk.center.osaka-u.ac.jp/

Concerning RCNP equipment and the network, as for the RCNP generic system, you can submit your question to

consult@rcnp.osaka-u.ac.jp
by e-mail. Some inquiries sent to consult may be asked to be resubmitted to the help desk.

Update history


Computer and Network Group, Research Center for Nuclear Physics, Osaka University 2003