OpenPBS script samples, well worth consulting for reference
"Using your account"
"Login without using password"Users can generate an authentication key to login to ABACUS from another UNIX machine without using the password. The authentication key is different for each machine, each pair of machines need to set it up individually. Suppose a user named "foobar" wants to login to ABACUS from another UNIX machine monolith, follow these steps, monolith:~% ssh-keygen -t rsa Generating public/private rsa key pair. Enter file in which to save the key (/home/foobar/.ssh/id_rsa): Enter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /home/foobar/.ssh/id_rsa. Your public key has been saved in /home/foobar/.ssh/id_rsa.pub. The key fingerprint is: 0c:44:8c:3e:b9:b4:20:e3:83:4b:19:d9:54:cf:65:35 foobar@monolithPlease note, when the system prompts for passphrase, just enter, don't type any passphrase. monolith:~% cd .ssh monolith:~/.ssh% scp id_rsa.pub abacus:On ABACUS, [foobar@head ~]$ cd .sshIf the file authorized_keys does not already exist, [foobar@head .ssh]$ touch authorized_keys [foobar@head .ssh]$ cat ~/id_rsa.pub >> authorized_keysNow, user foobar can login to ABACUS from monolith without typing the password, monolith:~% ssh foobar@abacus "Using a job queuing system"TORQUE/PBS and Maui were installed on ABACUS for batch processing. The Portable Batch System, PBS, is a workload management system for Linux clusters. It supplies command to submit, monitor, and delete jobs. It has the following components. Job Server - also called pbs_server provides the basic batch services such as receiving/creating a batch job, modifying the job, protecting the job against system crashes, and running the job. Job Executor - a daemon (pbs_mom) that actually places the job into execution when it receives a copy of the job from the Job Server, and returns the job's output to the user. Job Scheduler - a daemon that contains the site's policy controlling which job is run and where and when it is run. 
PBS allows each site to create its own Scheduler; the Maui Scheduler is used on ABACUS. Below are the steps needed to run a user job.
PBS Options

Below are some of the commonly used PBS options in a job script file. The options start with "#PBS".

Option                       Description
======                       ===========
#PBS -N MyJob                Assigns a job name. The default is the name of the PBS job script.
#PBS -l nodes=4:ppn=2        The number of nodes and processors per node.
#PBS -q queuename            Assigns the queue your job will use.
#PBS -l walltime=01:00:00    The maximum wall-clock time during which this job can run.
#PBS -o mypath/my.out        The path and file name for standard output.
#PBS -e mypath/my.err        The path and file name for standard error.
#PBS -j oe                   Merges the standard error stream with the standard output stream of the job.
#PBS -W stagein=file_list    Copies the listed files onto the execution host before the job starts.
#PBS -W stageout=file_list   Copies the listed files from the execution host after the job completes.
#PBS -m b                    Sends mail to the user when the job begins.
#PBS -m e                    Sends mail to the user when the job ends.
#PBS -m a                    Sends mail to the user when the job aborts (with an error).
#PBS -m ba                   Combines several mail events in one option; otherwise only the last -m option takes effect.
#PBS -r n                    Indicates that the job should not be rerun if it fails.
#PBS -V                      Exports all environment variables to the job.

Job Script Example

A job script may consist of PBS directives, comments, and executable statements. A PBS directive provides a way of specifying job attributes in addition to the command-line options.
For example, a simple job script named geo1.bash contains the following lines:

#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -V
PBS_O_WORKDIR=/home/huang/temp
myPROG='/home/huang/software/nwchem-4.7/bin/LINUX64_x86_64/nwchem'
myARGS='/home/huang/software/tce-test/geo-0.98.nw'
cd $PBS_O_WORKDIR
$myPROG $myARGS >& out1

An example that runs a job on a specific node contains the following lines:

#!/bin/bash
#PBS -l nodes=node035:ppn=1
#PBS -V
PBS_O_WORKDIR=/home/huang/temp
myPROG='/home/huang/software/nwchem-4.7/bin/LINUX64_x86_64/nwchem'
myARGS='/home/huang/software/tce-test/geo-0.98.nw'
cd $PBS_O_WORKDIR
$myPROG $myARGS >& out1

Another example, an MPI job script named geo2.bash, contains the following lines:

#!/bin/bash
#PBS -l nodes=4:ppn=4
#PBS -V
NCPUS=16
PBS_O_WORKDIR=/home/huang/temp
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE > .machinefile
myPROG='/home/huang/software/nwchem-4.7/bin/LINUX64_x86_64/nwchem_mpi'
myARGS='/home/huang/software/tce-test/geo-0.98.nw'
MPIRUN='/opt/mpich.pgi/bin/mpirun'
$MPIRUN -np $NCPUS -machinefile .machinefile $myPROG $myARGS >& out2

The above job script templates should be adapted to the needs of your job. You only need to change the contents of the variables PBS_O_WORKDIR, myPROG, and myARGS.
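As a complementary sketch, several of the options from the table above can be combined in one script. The job name, resource requests, and the trivial payload here are illustrative only, not a recommended configuration:

```shell
#!/bin/bash
#PBS -N MyJob                 # illustrative job name
#PBS -l nodes=2:ppn=2         # 2 nodes, 2 processors per node
#PBS -l walltime=01:00:00     # one hour wall-clock limit
#PBS -j oe                    # merge stderr into stdout
#PBS -m bae                   # mail on begin, abort, and end
#PBS -V                       # export the submitting shell's environment

# PBS starts the job in the home directory; change to the
# directory the job was submitted from.
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname) with $(wc -l < "$PBS_NODEFILE") processors"
```

The "#PBS" lines are ordinary shell comments, so the same file runs unchanged both under PBS and directly in a shell for testing.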
Use the qsub command to submit the job:

qsub geo2.bash

PBS assigns the job a unique job identifier once it is submitted (e.g. 70.head). This job identifier is used to monitor the status of the job later. After a job has been queued, it is selected for execution based on the time it has been in the queue, its wall-clock time limit, and the number of processors it requests.
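Since qsub prints the job identifier on standard output, it can be captured in a shell variable and reused by the monitoring commands. A sketch, using geo2.bash from above:

```shell
JOBID=$(qsub geo2.bash)     # e.g. 70.head
echo "Submitted as $JOBID"

# The leading number alone also identifies the job:
JOBNUM=${JOBID%%.*}         # strips everything from the first dot on
echo "Job number: $JOBNUM"

qstat -f "$JOBID"           # full status of just this job
```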
Below are the PBS commands for monitoring a job:
Command      Function
=======      ========
qstat -a     Check the status of jobs, queues, and the PBS server.
qstat -f     Get all the information about a job, i.e. resources requested, resource limits, owner, source, destination, queue, etc.
qdel JobID   Delete a job from the queue.
qhold JobID  Hold a job if it is in the queue.
qrls JobID   Release a job from hold.

There are also some quite useful Maui commands for monitoring a job:

Command          Description
=======          ===========
showq            Show a detailed list of submitted jobs.
showbf           Show the free resources (time and processors) available at the moment.
checkjob JobID   Show a detailed description of the job JobID.
showstart JobID  Give an estimate of the expected start time of the job JobID.

For example, to check the status of a job:

qstat -f 70.head

or

checkjob 70.head

"File backup"

File systems on the head node are backed up to tape drives once a week. An incremental backup of the /home file system to another Linux machine is done daily. Users are also encouraged to back up their files to another system or to removable media themselves for safety. For example, to copy files over to another UNIX/Linux machine, users can use the rsync or scp commands. To copy files over to their PCs, users can use 'SSH Secure File Transfer Client'.

"Using the cluster in a courteous way"

You might be wondering why your jobs sometimes run slowly. There are numerous possible explanations for abacus's performance, but system load and the NFS file system are the two most common causes.