nqsrn(1)
NQSRN - Network Queueing System Release 3.36 Release Notes.
DESCRIPTION
The Network Queueing System (NQS) provides the means to
submit batch jobs to local and remote Unix and VMS machines.
These are the release notes for Release 3.36 of NQS at
Monsanto, and include information on all changes made to
NQS since the release of Version 3.35. The release notes
for earlier versions of NQS at Monsanto are appended to this
document.
Version 3.36 consists of several small fixes and
enhancements to release 3.35. Here are the major changes
since release 3.35:
This version of NQS now supports Solaris. Thanks to Don
Hjort (dhjort@fhcrc.org) for the help.
There is now some support for compute servers of varying
compute power. Previously the load balancing code assumed
that the servers all had the same compute power. Now it is
possible through the "help set server" command in qmgr to
define the relative performance of servers to the scheduler.
The scheduler will then take that information into account
when assigning jobs to servers. See the qmgr man pages for
more information.
I received more fixes to the DEC OSF/1 code from Bruno Wolff
III (bruno@alpha2.csd.uwm.edu) which I have added to the
code.
A fix was made to correct problems relating to interpreting
times and the Daylight time / Standard time switch. It was
discovered in the fall of 1993 when the date of the switch
did not match when NQS thought it should be. This release
will be out in time to see how it works in 1994.
This version now supports IRIX 5.2 and this is the default
version.
The code for determining if an inbound load balanced queue
should reject or permit a request has been improved.
The qalter utility has been added. See the man page for
more information.
When the nqsdaemon sends a broadcast message, it now sets up
the broadcast child as its own session and does not wait for
it.
The -S switch has been added to qacct to create summary
The printing of debug messages has been reviewed so that no
messages are printed if the debug level is set to 0. Of
course, the cure for excessive log messages is to set
/dev/null as the log file.
A file descriptor leak was fixed by changing establish.c so
that the socket is closed if there are any errors.
The QSUB_QUEUE environment variable has been added which
provides the name of the execution queue to the running
request.
NQS Version 3.35 Release Notes
The main enhancement in this release is the ability to stage
a release of NQS. A common problem is that long running NQS
jobs prevent upgrading to a newer version easily. This
feature allows one to place a new release of NQS in a
staging area. NQS checks when a job completes to determine
if there is no other activity, and if not, will shut itself
down, install the new version and restart itself. This is
useful when it is not urgent to install a new release of
NQS, and one does not want to disable the queues.
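The completion-time check described above can be sketched as
follows. This is a minimal Python illustration, not the NQS
implementation: the function name, the count of running requests,
and the three step names are hypothetical stand-ins for whatever
NQS does internally.

```python
from typing import List


def on_request_complete(staged_release_present: bool,
                        running_requests: int) -> List[str]:
    """Return the steps to take when a request finishes.

    Assumption: "no other activity" is modeled as zero running requests.
    """
    if staged_release_present and running_requests == 0:
        # Idle, and a new release is waiting in the staging area.
        return ["shutdown", "install", "restart"]
    # Otherwise keep running; the check repeats at the next completion.
    return []


print(on_request_complete(True, 0))
print(on_request_complete(True, 2))
```

Because the check runs only when a job completes, a staged release
is installed at the first idle moment without the queues being
disabled.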
Here is a summary of other changes since Version 3.34:
The makefiles were reorganized, with the site-specific
defines at the top and the architecture-specific defines
further down.
Qmgr set pipe_client now checks to see if the pipeclient
file exists and is a regular file and will not allow you to
change it if it isn't.
Fixed a small, but embarrassing, security hole in qdel.
The $(NQS_SPOOL)/private directory has been made mode 700
rather than 755. This increases the security of the NQS
spool area. For existing sites you should manually do a
chmod on this directory.
Added the -u flag for qcat to indicate the owner of the
request.
The snap file is now written to the current working
directory unless an absolute path is given.
The entry for the qmgr command "delete request" was removed,
since it didn't work anyway.
Determined by trial and error the maximum CPU time allowed
If you attempt to set a limit higher than this, the message
"Overflow or semantic error in cpu time limit." will be
returned.
Remote request completion messages to the NQS scheduler are
now not checked for a valid username, as not all users will
have accounts on the scheduler machine.
Fixed bugs with processing load information and setting
scheduler for mids greater than 0x7fffffff. Thanks to
Ryoichi Shibano (shibano@bsd2.kbnes.nec.co.jp).
NQS Version 3.34 Release Notes
The main change in this release is the addition of two
enhancements supplied by Chuck Keagle at Boeing. The first
is the ability to assign Machine IDs implicitly based on the
IP address of the hosts. This makes it easier to do
distributed management of NQS. The second change makes it
easier to install the parts of NQS into locations other than
the standard (/usr/lib/nqs, /usr/spool/nqs, etc.). Thanks
to Chuck and Boeing for making these enhancements available.
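To make the idea of an implicit Machine ID concrete, here is one
plausible mapping, sketched in Python. It is an assumption made
purely for illustration: these notes do not state how NQS derives
the ID from the IP address, and the function name implicit_mid is
hypothetical.

```python
import ipaddress


def implicit_mid(ip: str) -> int:
    """Map a dotted-quad IPv4 address to an unsigned 32-bit integer.

    Assumption: this mapping is illustrative only; the release notes
    do not say how NQS actually derives the Machine ID.
    """
    return int(ipaddress.IPv4Address(ip))


print(hex(implicit_mid("192.0.2.1")))
```

Under this mapping, any address whose first octet is 128 or higher
yields a value above 0x7fffffff, so the result has to be handled as
an unsigned quantity.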
Other changes include:
Added support for DEC's OSF for DECstation AXP machines
courtesy of Claus Kalle, University of Cologne
(Kalle@rrz.Uni-Koeln.DE).
The problem with IRIX and NIS has been fixed, so that one no
longer has to link against old versions of the IRIX
libraries.
Added the ability to indicate a token in the stdout and/or
stderr name of the request which would be replaced by the
request number. This token is defined as "#".
Fixed the problem where qsub would drop the last character
of the request name when the request name started with a
digit.
Added the "unimplemented feature" of qresume/qsuspend -a.
Changed the utilities to print out the NQS version, rather
than a version of their own.
Documented nmapmgr by creating nmapmgr.1m.
Documented qmsg by creating qmsg.1.
Changed nmapmgr to run the CREATE command implicitly if the
database has not already been created.
Tom Schwab of Gesellschaft fuer Schwerionenforschung mbH
(schwab@rzri6b.gsi.de).
Qstat now returns 0 if all queues are found, and 1 if there
is any error.
Qstat @local-machine gets the status of the local machine
instead of returning the error "Queue: @local-machine does
not exist".
Accounting file now records the request name.
There have been several changes to qacct: enabled the -a
(after), -b (before), -i (input file), and -o (output file)
switches. Added request name to the Request Start message
and the report listing. Added the -s (summary) and -r
(rollover) processing.
NQS now does not count queued requests when deciding whether
to allow a request to be transferred when a queue is load
balanced inbound.
Fixed bug where excess TCP/IP ports would be kept open after
sending messages to the NQS Scheduler.
Time zones with long names were causing problems at certain
locations, so the default time zone buffer in nqs_boot.c was
increased in size.
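The "#" token for stdout and stderr names, listed among the 3.34
changes above, can be illustrated with a minimal Python sketch.
The function name expand_request_token is hypothetical, and
replacing every "#" rather than only the first is an assumption;
the notes do not say which NQS does.

```python
def expand_request_token(filename: str, request_number: int,
                         token: str = "#") -> str:
    """Replace each occurrence of the token with the request number.

    Assumption: every occurrence is replaced, not just the first.
    """
    return filename.replace(token, str(request_number))


print(expand_request_token("job.o#", 1234))
```

A name without the token is returned unchanged, so existing
requests are unaffected.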
NQS Version 3.31 Release Notes
Changes to User Commands
New utility -- Qcat
The Qcat utility has been provided which allows one to
view the spooled NQS log file, error file, or input
script file.
New utility -- Qacct
The Qacct utility prints out the NQS accounting
information in a summary format.
Qsub changes
Requests now default to not restartable. The -rs
switch has been added to indicate that a request is
restartable.
Qstat changes
Qstat -l now indicates whether a job is restartable or
recoverable.
Each user can specify a list of systems to check when
running the "qstat -d" command. Qstat now checks for a
file in the user's home directory called .qstat with
If present, this file will supersede the system-wide
file.
Qlimit changes
Qlimit now can print limit information for remote
systems and the -v switch has been added to print out
the version number.
Broadcast messages
A user can override the system default list of systems
to which the broadcast message is sent. The request
termination processing code now first checks for a file
in the user's home directory called .nqs-domain with
the same format as the /usr/lib/nqs/nqs-domain file.
If this file is present in the user's home directory on
the execution machine, broadcast messages will be sent
to this list of systems rather than the default list.
Changes to System Manager Commands
Accounting changes
NQS now writes startup and shutdown messages to the
accounting log. The request completion message now
includes the completion time.
Logfile handling changes
The log file /usr/lib/nqs/logfile is no longer
automatically created when NQS starts up. The logfile
is not truncated when opened, so that successive
restarts will append to the logfile. If this is not
the desired behavior, you must manually delete the
logfile before starting up NQS. A sample script is
provided in support/nqs_log_rollover to roll over the
NQS log files daily and delete the files older than 7
days.
Load Balancing Changes
Load Balanced outbound processing was fixed to round
robin by position in destination list, not by machine
ID. Load Balance inbound processing was modified to
consider complex user limits and complex run limits.
Scheduling Changes
The NQS scheduling algorithms have been changed. The
scheduler now checks the list of waiting jobs before
any jobs are scheduled. Previously, waiting jobs had
to be triggered by a timer, which meant that the system
may be idle waiting for the timer to expire. Since NQS
now distinguishes between jobs actually waiting for a
certain time and jobs waiting for resources, it can
schedule jobs more promptly with less waste of system
resources.
the node that does the scheduling for all the nodes in
the NQS "cluster". This node should do the load
balancing for the other nodes. It is defined by setting
the scheduler using the new Qmgr command "set
scheduler". The system will start up a new NQS daemon
called the loaddaemon which will send load information
to the scheduler node at intervals. The interval
defaults to 3 minutes, but can be modified using the
"set default load_interval" Qmgr command.
The NQS network protocol has been extended with two new
message types. The first is
the remote request completion message and the second is
the load message. These are directed to the scheduler,
so as long as a scheduler is not defined, the protocol
is the same as earlier versions.
Delivery of Jobs to Pipe Queue Destinations
The way jobs are delivered to pipe queue destinations
may be confusing. Here is a short explanation:
If a pipe queue has multiple destinations but is not
load balanced outbound, the same destination will be
chosen every time. This destination will be the queue
on the system with the lowest mid. This may or may not
be the first destination in the list when you do a
qstat -x.
If a pipe queue has multiple destinations and is load
balanced outbound but is not on the scheduler machine,
then the destinations will be chosen using a round
robin algorithm. The sequence number of the request
will be divided by the number of destinations, and the
remainder is the index into the list of the
destinations.
If the pipe queue has multiple destinations, is load
balanced outbound, and is on the scheduler machine,
then the load information is taken into account when
NQS attempts to deliver the job to a machine.
Assuming that load information is available for all of
the machines, the destinations are ordered by load
(number of jobs and secondarily the 5 minute load
average) and NQS attempts to deliver the job to the
most lightly loaded machine.
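The three cases above can be summarized in a short Python sketch.
It is an illustration under stated assumptions, not the NQS code:
the Destination record, its field names, and the example queue
names are hypothetical, and load information is assumed to be
available for every destination.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Destination:
    queue: str      # hypothetical name such as "batch@alpha"
    mid: int        # machine ID of the destination host
    jobs: int       # number of jobs reported for that host
    load5: float    # 5 minute load average reported for that host


def choose_destination(dests: List[Destination], sequence_number: int,
                       lb_out: bool, on_scheduler: bool) -> Destination:
    """Pick the destination that would be tried first."""
    if not dests:
        raise ValueError("pipe queue has no destinations")
    if not lb_out:
        # Not load balanced outbound: always the lowest mid.
        return min(dests, key=lambda d: d.mid)
    if not on_scheduler:
        # Round robin: remainder of sequence number / destination count.
        return dests[sequence_number % len(dests)]
    # On the scheduler: fewest jobs first, then lowest load average.
    return min(dests, key=lambda d: (d.jobs, d.load5))


dests = [Destination("batch@beta", 20, 3, 0.5),
         Destination("batch@alpha", 10, 3, 1.2),
         Destination("batch@gamma", 30, 1, 2.0)]
print(choose_destination(dests, 101, False, False).queue)
print(choose_destination(dests, 101, True, False).queue)
print(choose_destination(dests, 101, True, True).queue)
```

Note that in the first case the choice depends on the mid, not on
the position in the list, which is why it may differ from the first
destination shown by qstat -x.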
The load balancing code is admittedly elementary. It
does not distinguish processor speed, memory, or any
other characteristic of machines. In addition, the
scalability of the implementation has not been tested.
The compute cluster used here is on the order of 4-10
environment is probably not great. I would hesitate to
guess for performance for clusters of 20 or more
machines served by a single scheduler.
Check for a default path
Under some circumstances the default path set up by NQS
for a request may not be appropriate. The NQS shepherd
processing has been changed to check for a file called
/etc/environment to set up the default path. If there
is a file /etc/environment (present in AIX 3.2), and it
has a line indicating the path with the format:
PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin
then this path will be used for the requests.
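A minimal Python sketch of this lookup is shown below. The function
names and the fallback value are hypothetical, and taking the first
line that begins with PATH= is an assumption; the notes describe
only the file and the line format.

```python
from typing import Iterable, Optional


def path_from_environment(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the first PATH= line, or None if absent."""
    for line in lines:
        line = line.strip()
        if line.startswith("PATH="):
            return line[len("PATH="):]
    return None


def default_path(env_file: str = "/etc/environment",
                 fallback: str = "/bin:/usr/bin") -> str:
    """Use the PATH from env_file if present, else the fallback.

    Assumption: the fallback value is a placeholder, not the NQS default.
    """
    try:
        with open(env_file) as handle:
            found = path_from_environment(handle)
    except OSError:
        found = None
    return found if found else fallback


print(path_from_environment(["TZ=CST6CDT", "PATH=/usr/bin:/etc"]))
```

On systems without /etc/environment the open fails and the fallback
is used, so the behavior is unchanged there.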
Platforms supported
HP 9000/700 series machines running HPUX systems are
supported in this release. Sun support seems to be
improved, but I don't have reliable access to a Sun
machine, so I cannot be sure.
New qmgr commands
Here are the new qmgr commands added in 3.31:
set scheduler <node>
This command sets the scheduler to the indicated
node. This means that request completion messages
and load messages (if the load daemon is running)
will be sent to this system.
set no_scheduler
This command indicates that there is no scheduler.
set default load_interval
This command indicates the number of minutes
between load messages sent to the scheduler.
set load daemon=(file-spec)
This command indicates the full file specification
for the load daemon.
set no_load_daemon
This command indicates that there is no load
daemon specified.
start load_daemon
This command indicates that the load daemon is to
be started, and is to send load messages to the
scheduler indicated.
stop load_daemon
This command terminates the load daemon.
This command causes qmgr to echo every line of
input. Its primary use is to echo command lines
in regression testing scripts.
Bug Fixes
Fixed problem with incorrect start times with qstat -m.
Fixed problem on SGI where qstat local and remote showed
different start times.
Fixed bugs in Qsuspend / Qresume processing.
Fixed the queue ordering for waiting jobs to strictly FIFO.
When running the test version, give the processes the
appropriate names (This does not work on SGI platforms).
An NQS manager can now suspend or resume other users' jobs.
NQS now tolerates whitespace between the hostname and
username in the .rhosts file.
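The whitespace tolerance mentioned in the last fix can be
illustrated with a small Python sketch. The function name is
hypothetical and the parsing shown is an assumption about the
general host-then-username layout, not the NQS parser itself.

```python
from typing import Optional, Tuple


def parse_rhosts_line(line: str) -> Optional[Tuple[str, Optional[str]]]:
    """Split a .rhosts line into (hostname, username or None).

    str.split() with no argument accepts any run of spaces or tabs,
    so the amount of whitespace between the fields does not matter.
    """
    fields = line.split()
    if not fields:
        return None  # blank line
    host = fields[0]
    user = fields[1] if len(fields) > 1 else None
    return host, user


print(parse_rhosts_line("alpha.example.com \t  jdoe"))
```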
NQS Version 3.20 Release Notes
An additional concept has been added to this version of NQS.
This is the concept of an NQS "domain". An NQS domain is all of
the machines which submit NQS jobs among each other. It
includes the compute servers and the personal workstations
of a group. This concept has been utilized in the use of
broadcast messages, as indicated below.
Here is a list of the changes and enhancements:
Queue complexes
NQS managers can now associate several queues and
create a user run limit and a total run limit for the
queues as a group.
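As a rough illustration of how the two complex limits interact,
here is a minimal Python sketch. The function name and the
dictionary of running counts are hypothetical; the sketch assumes
the counts cover all queues in the complex taken together.

```python
from typing import Dict


def complex_allows_start(user: str, running_by_user: Dict[str, int],
                         user_limit: int, run_limit: int) -> bool:
    """Check both complex limits before starting one more request.

    running_by_user counts running requests per user across every
    queue in the complex (an assumption of this sketch).
    """
    total_running = sum(running_by_user.values())
    user_running = running_by_user.get(user, 0)
    return total_running < run_limit and user_running < user_limit


running = {"ann": 2, "bob": 1}
print(complex_allows_start("ann", running, user_limit=2, run_limit=5))
print(complex_allows_start("bob", running, user_limit=2, run_limit=5))
```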
Move queue and requests
NQS managers can move one or more queued requests from
one queue to another. In the same way they can move
all queued requests in a queue to another queue.
Hold and Release requests
Queued requests can be held, which means that they will
not be scheduled to run until manually released. Qhold
and Qrls are the new commands which implement these
functions.
Suspend and Resume requests
On SGI computers one can suspend running requests and
Qresume are the new commands which implement these
functions.
Modify requests
NQS managers can modify the time limit, nice value, and
request memory limit of queued requests using Qmgr.
Load balancing
Simple load balancing has been implemented by allowing
pipe queues to be designated load-balanced inbound or
outbound. Load-balanced inbound means that the pipe
queue will not accept requests unless it can
immediately run in one of its destination queues.
Load-balanced outbound means that the pipe queue will
round-robin on its destinations rather than always
selecting the first destination in the list.
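The load-balanced inbound rule can be sketched in Python as below.
The BatchQueue record and its fields are hypothetical, and "can
immediately run" is modeled simply as a free run slot in at least
one destination queue, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchQueue:
    name: str        # hypothetical queue name
    run_limit: int   # maximum number of running requests
    running: int     # requests currently running


def accepts_inbound(destinations: List[BatchQueue]) -> bool:
    """Accept a request only if some destination has a free run slot."""
    return any(q.running < q.run_limit for q in destinations)


print(accepts_inbound([BatchQueue("fast", 2, 2), BatchQueue("slow", 1, 0)]))
print(accepts_inbound([BatchQueue("fast", 2, 2)]))
```

Later releases refine this test: the 3.31 notes add complex user
and run limits to the inbound check, and the 3.34 notes say queued
requests are not counted.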
Log file return
NQS will make every effort to return the log file. If
it cannot return the file to the designated location,
NQS will try to deliver the log file to the user's home
directory on the execution machine. If NQS cannot
deliver the file there, it will move it to the
directory /usr/spool/nqs/dump/<username> and send a
mail message to that effect. The user will have to
move the log file out manually.
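The fallback order described above can be sketched in Python. This
is an illustration only: the function name is hypothetical, the
deliver callback stands in for whatever copy mechanism NQS uses,
and keeping the same base file name at each location is an
assumption.

```python
import posixpath
from typing import Callable, Tuple


def return_log_file(designated: str, home_dir: str, username: str,
                    deliver: Callable[[str], bool],
                    dump_root: str = "/usr/spool/nqs/dump") -> Tuple[str, bool]:
    """Return (final location, mail_needed) for a request's log file.

    deliver(path) is a caller-supplied function that tries to place
    the file at path and reports success (an assumption of this sketch).
    """
    name = posixpath.basename(designated)
    if deliver(designated):
        return designated, False
    home_copy = posixpath.join(home_dir, name)
    if deliver(home_copy):
        return home_copy, False
    # Last resort: the per-user dump directory, with mail to the user.
    return posixpath.join(dump_root, username, name), True


print(return_log_file("/scratch/out/job.o12", "/home/jdoe", "jdoe",
                      lambda path: False))
```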
Qstat changes
There is a new Qstat switch, -c. This prints out
information on queue complexes. In addition, the sense
of the -d switch is changed slightly. It now checks
all of the queues in the NQS domain rather than all
systems which are destinations for the pipe queues.
Qsub changes
There are two new Qsub switches, -bb and -be, which
indicate that a broadcast message is to be written to
the user's terminal(s) when the job begins and ends,
respectively. The message indicates the time, the
request, and the system on which it is running. The
message is delivered to all terminals within the NQS
domain. If one does not want the message displayed on
a particular window, use the mesg -n command in that
window to turn the messages off.
Qmgr changes
Qmgr now accepts single commands from the command line
and returns to the shell. Qmgr has numerous command
additions. Notable is the addition of the snap
function to print out the current NQS configuration,
and start nqs which starts up NQS. The Qmgr help screen
New Qmgr commands are
Create complex
Delete complex
Set complex run_limit
Set complex user_limit
Move request
Move queue
Modify request
Set global_pipe_limit
Set queue lb_in
Set queue lb_out
Set queue pipeonly
Set queue nolb_in
Set queue nolb_out
Set queue nopipeonly
Snap
Start nqs
NQS Version 3.0 Release Notes
There are several changes which are general and apply to all
parts of NQS. These are:
1. Queue names can be a maximum of 31 characters in size.
2. Request names can be a maximum of 63 characters in
size.
3. Several new limits have been added to queues. For
further information, see the Qmgr and Qsub descriptions
below. The new limits are:
Corefile limit
Data limit
Nice value limit
Per process cpu limit
Per process memory limit
Per process permfile limit
Stack limit
Working set limit
4. The -v switch has been added to utilities to print
their current version.
Information on changes to specific programs follows. The man
pages for these utilities have been updated to indicate the
changes.
Qdel
The main change to Qdel is the addition of the -r and -c
switches. The -r switch allows you to indicate a request
name pattern to delete. The -c switch, which is only valid
with -r, indicates that the user is to be prompted to
confirm each deletion. Qdel will print the name of the
request matching the pattern and query the user whether the
request is to be deleted or not, or to quit.
Qsub
Qsub has been enhanced with the -d switch, which indicates
that the script file is to be deleted after spooling. This
is valuable for temporary scripts that can easily be
recreated.
In addition, the switches associated with the new limits
above are now effective. The Qsub switches and their
corresponding limits are as follows:
-lc Corefile limit
-ld Data limit
-ln Nice value limit
-lt Per process cpu limit
-lm Per process memory limit
-ls Per process permfile limit
-lw Working set limit
Qstat
Qstat has been extensively changed from the original Cosmic
release. A new summary format has been made the default,
and several additional switches have been added to indicate
which requests are to be displayed. See the man pages on
Qstat for further information.
Qmgr
The Qmgr utility has been modified in several ways. Here
are the changes, and the commands that are affected:
1. Set a user limit for a queue (create batch_queue,
set user_limit). This allows the system manager to
set a limit on the number of jobs that an
individual user can run in a queue at the same
time.
2. Set a non-degrading priority on a queue (set ndp).
This sets a non-degrading priority on a queue to
determine the effective priority of requests in
that queue.
3. Set a global batch limit (set global_batch_limit).
This determines the maximum number of jobs that
in queues
4. The commands which modify the above limits on requests
are now effective. The commands are:
set corefile_limit=(limit) queue
set data_limit=(limit) queue
set nice_value_limit=(nice-value) queue
set per_process cpu_limit=(limit) queue
set per_process memory_limit=(limit) queue
set per_process permfile_limit=(limit) queue
set working_set_limit=(limit) queue
SEE ALSO
nqs(1), nqsconfig(1), nqsgs(1), qacct(1), qcat(1), qdel(1),
qdev(1), qhold(1), qlimit(1), qmgr(1), qpr(1), qresume(1),
qrls(1), qstat(1), qsub(1), and qsuspend(1) in the NPSN UNIX
System Administrator Reference Manual.
NQS HISTORY
Origin: Sterling Software Incorporated
May 1986 - Brent Kingsbury, Sterling Software
Original release.
August, 1994 - John Roman, Monsanto Company
Release 3.36.