UPSMON(8)
=========

NAME
----

upsmon - UPS monitor and shutdown controller

SYNOPSIS
--------

*upsmon* -h

*upsmon* -c 'command' [-P 'pid']

*upsmon* [-D] [-F | -B] [-K] [-p] [-u 'user']

DESCRIPTION
-----------

*upsmon* is the client process that is responsible for the most important part
of UPS monitoring -- shutting down the system when the power goes out.  It
can call out to other helper programs for notification purposes during
power events.

upsmon can monitor multiple systems using a single process.  Every UPS
that is defined in the linkman:upsmon.conf[5] configuration file is assigned
a power value and a type (*primary* or *secondary*).

OPTIONS
-------

*-c* 'command'::
Send the command 'command' to the existing upsmon process.  Valid
commands are:

*fsd*;; shutdown all primary-mode UPSes (use with caution)

*stop*;; stop monitoring and exit

*reload*;; reread linkman:upsmon.conf[5] configuration file.  See
"reloading nuances" below if this doesn't work.

*-P* 'pid'::
Send the command signal above using specified PID number, rather than
consulting the PID file.  This can help define service units which
start main `upsmon` as a foreground process so it does not have to
rely on a PID file.

*-D*::
Raise the debugging level.  upsmon will run in the foreground by default,
and will print information on stdout about the monitoring process.
Use this option multiple times for more details.

*-F*::
upsmon will run in the foreground, regardless of debugging settings.

*-B*::
upsmon will run in the background, regardless of debugging settings.

*-K*::
Test for the shutdown flag.  If it exists and contains the magic string
from upsmon, then upsmon will exit with `EXIT_SUCCESS`.  Any other condition
will make upsmon exit with `EXIT_FAILURE`.
+
You can test for a successful exit from `upsmon -K` in your shutdown
scripts to know when to call linkman:upsdrvctl[8] to shut down the UPS.

*-p*::
Run privileged all the time.  Normally upsmon will split into two
processes.  The majority of the code runs as an unprivileged user, and
only a tiny stub runs as `root` to eventually call `SHUTDOWNCMD` if needed.
This switch will disable that mode, and run the old "all root all the time"
system.
+
This is not the recommended mode, and you should not use this unless you
have a very good reason.

*-u* 'user'::
Set the user for the unprivileged monitoring process.  This has no effect
when using '-p'.
+
The default 'user' is set at build configuration time with
`configure --with-user=...`.  Typically this is 'nobody' (if not
otherwise configured), which is far from ideal, so most packaged
distributions will probably have a specific 'nut' user for this task.
If your notification scripts need to run as a specific user, set it here.
+
You can also set this in the linkman:upsmon.conf[5] file with the
`RUN_AS_USER` directive.

COMMON OPTIONS
--------------

*-h*::
Show the command-line help message.

*-V*::
Show NUT version banner.  More details may be available if you also
`export NUT_DEBUG_LEVEL=1` or greater verbosity level.

*-W* 'secs'::
Set the timeout for initial network connections (by default they are
indefinitely non-blocking, or until the system interrupts the attempt).
Overrides the optional `NUT_DEFAULT_CONNECT_TIMEOUT` environment variable.

UPS DEFINITIONS
---------------

In the linkman:upsmon.conf[5], you must specify at least one UPS that will
be monitored.  Use the MONITOR directive like this:

	MONITOR 'system' 'powervalue' 'username' 'password' 'type'

The 'system' refers to a linkman:upsd[8] server, in the form
+upsname[@hostname[:port]]+.  The default 'hostname' is "localhost",
and default 'port' is '3493'.  Some examples follow:

 - "su700@mybox" means a UPS called "su700" on a system called "mybox".
This is the normal form.

 - "fenton@bigbox:5678" is a UPS called "fenton" on a system called
"bigbox" which runs linkman:upsd[8] on port "5678".

The 'powervalue' refers to how many power supplies on this client system are
being driven this UPS.  This is typically set to '1', but see the section
on power values below.

The 'username' is a section in your linkman:upsd.users[5] file.
Whatever password you set in that section must match the 'password'
set in this file for the corresponding data server instance.

The type set in that section of the data server configuration must also match
the 'type' here -- *primary* or *secondary* (consider it an "upsmon role").
In general, a "primary" client process is one running on the system with the
UPS actually plugged into a serial or USB port, and a "secondary" is drawing
power from the UPS but can't talk to it directly.
See the section on UPS types for more.

NOTIFY EVENTS
-------------

*upsmon* senses several events as it monitors each UPS.  They are called
"notify events", as they can be used to tell the users and admins about the
change in status.  See the additional NOTIFY-related sections below for
information on customizing the delivery of these messages.

*ONLINE*::
The UPS is back on line.

*ONBATT*::
The UPS is on battery.

*LOWBATT*::
The UPS battery is low (as determined by the driver).

*FSD*::
The UPS has been commanded into the "forced shutdown" mode.

*COMMOK*::
Communication with the UPS has been established.

*COMMBAD*::
Communication with the UPS was just lost.

*SHUTDOWN*::
The local system is being shut down.

*REPLBATT*::
The UPS needs to have its battery replaced.

*NOCOMM*::
The UPS can't be contacted for monitoring.

*NOPARENT*::
`upsmon` parent process died - shutdown impossible.

*CAL*::
UPS calibration in progress.

*NOTCAL*::
UPS calibration finished.

*OFF*::
UPS administratively OFF or asleep.

*NOTOFF*::
UPS no longer administratively OFF or asleep.

*BYPASS*::
UPS on bypass (powered, not protecting).

*NOTBYPASS*::
UPS no longer on bypass.

*ECO*::
UPS in ECO or similar mode (as defined and named by vendor); other
names include High Efficiency mode and Energy Saver System.
+
For example, Eaton docs define High Efficiency mode as a sort of
hardware-monitored bypass with switch-over under 10ms into Online
mode in case of troubles (so what was known as Line-Interactive
in the older days), which can be *chosen* instead of always running
in a dual-conversion mode (safer, but more wasteful and with a toll
on battery life). Older devices only implemented one or the other.

*NOTECO*::
UPS no longer in ECO mode (see above).

*ALARM*::
UPS has one or more active alarms (look at `ups.alarm` for details).

*NOTALARM*::
UPS is no longer in an alarm state (no active alarms).

*OVER*::
UPS is overloaded.

*NOTOVER*::
UPS is no longer overloaded.

*TRIM*::
UPS is trimming incoming voltage.

*NOTTRIM*::
UPS is no longer trimming incoming voltage.

*BOOST*::
UPS is boosting incoming voltage.

*NOTBOOST*::
UPS is no longer boosting incoming voltage.

*OTHER*::
UPS has at least one unclassified status token.

*NOTOTHER*::
UPS has no unclassified status tokens anymore.

*SUSPEND_STARTING*::
OS is entering sleep/suspend/hibernate mode.

*SUSPEND_FINISHED*::
OS just finished sleep/suspend/hibernate mode, de-activating
obsolete UPS readings to avoid an unfortunate shutdown.


NOTIFY COMMAND
--------------

In linkman:upsmon.conf[5], you can configure a program called the NOTIFYCMD
that will handle events that occur.

The syntax is: +NOTIFYCMD+ "'path to program'"

Example:

 - `NOTIFYCMD "/usr/local/bin/notifyme"`

Remember to wrap the path in "double quotes" if it contains any spaces.
It should probably not rely on receiving any command-line arguments.

The program you run as your NOTIFYCMD can use the environment variables
NOTIFYTYPE and UPSNAME to know what has happened and on which UPS.  It
also receives the notification message (see below) as the first (and
only) argument, so you can deliver a pre-formatted message too.

Note that the NOTIFYCMD will only be called for a given event when you set
the EXEC flag by using the notify flags, as detailed below.

NOTIFY FLAGS
------------

By default, all notify events (see above) generate a global message
(wall) to all users, plus they are logged via the syslog.
Except for Windows where upsmon only writes to the syslog by default.
You can change this with the NOTIFYFLAG directive in the configuration file:

The syntax is: +NOTIFYFLAG+ 'notifytype' 'flags'

Examples:

 - `NOTIFYFLAG ONLINE SYSLOG`

 - `NOTIFYFLAG ONBATT SYSLOG+WALL`

 - `NOTIFYFLAG LOWBATT SYSLOG+WALL+EXEC`

The flags that can be set on a given notify event are:

*SYSLOG*::
Write this message to the syslog.

*WALL*::
Send this message to all users on the system via linkmanext:wall[1].

*EXEC*::
Execute the NOTIFYCMD.

*IGNORE*::
Don't do anything.  If you use this, don't use any of the other flags.

You can mix these flags; for example,  `SYSLOG+WALL+EXEC` does all three
for a given event.

NOTIFY MESSAGES
---------------

upsmon comes with default messages for each of the NOTIFY events.  These
can be changed with the NOTIFYMSG directive.

The syntax is: +NOTIFYMSG+ 'type' "'message'"

Examples:

 - `NOTIFYMSG ONLINE "UPS %s is getting line power"`

 - `NOTIFYMSG ONBATT "Someone pulled the plug on %s"`

The first instance of `%s` is replaced with the identifier of the UPS that
generated the event.  These messages are used when sending walls to the
users directly from upsmon, and are also passed to the NOTIFYCMD.

NOTE: Certain notifications, such as `NOTIFY_ALARM` and `NOTIFY_OTHER`,
can accept a second instance of `%s` that would be replaced with the
alarm text or the unrecognized token, respectively.

POWER VALUES
------------

The "current overall power value" is the sum of all UPSes that are
currently able to supply power to the system hosting upsmon.  Any
UPS that is either on line or just on battery contributes to this
number.  If a UPS is critical (on battery and low battery) or has been
put into "forced shutdown" mode, it no longer contributes.

The syntax is: +MONITOR+ 'upsname' 'powervalue' 'username' 'password' 'type'

A "power value" on a MONITOR line in the config file is the number of
power supplies that the UPS runs on the current system.

* Normally, you only have one power supply, so it will be set to '1'.
+
Example:

 - `MONITOR myups@myhost 1 username mypassword primary`

* On a large server with redundant power supplies, the power value for a UPS
may be greater than 1.  You may also have more than one of them defined.
+
Examples for a server with four power supply modules and two UPSes (each
feeding two power supplies of that server):

 - `MONITOR ups-alpha@myhost 2 username mypassword primary`

 - `MONITOR ups-beta@myhost 2 username mypassword primary`

* You can also set the power value for a UPS to '0' if it does not supply any
power to that system.  This is generally used when you want to use the
upsmon notification features for a UPS even though it's not actually
powering the system that hosts this instance of the upsmon client.
+
Don't set this to "primary" unless you really want to power this UPS off
when this instance of upsmon needs to shut down for its own reasons.
+
Example:

 - `MONITOR faraway@anotherbox 0 username mypassword secondary`

The "minimum power value" is the number of power supplies that must be
receiving power in order to keep this `upsmon` client's computer running.

The syntax is: +MINSUPPLIES+ 'value'

* Typical PCs only have 1, so most users will leave this at the default.
+
Example:

 - `MINSUPPLIES 1`

* If you have a server or similar system with redundant power, then this
value will usually be set higher.  One that requires three power supplies
out of 4 to be running at all times would simply set it to 3.
+
--
Example:

 - `MINSUPPLIES 3`
--
+
When the current overall healthy UPS-protected external power value drops
below the minimum required power value, `upsmon` starts the shutdown sequence.
This design allows you to lose some of your power supplies in a redundant
power environment without bringing down the entire system, while still
working properly for smaller systems.

UPS TYPES
---------

*upsmon* and linkman:upsd[8] don't always run on the same system.  When they
do, any UPSes that are directly attached to the upsmon host should be
monitored in "primary" mode.  This makes upsmon take charge of that equipment,
and it will wait for the "secondary" systems to disconnect before shutting
down the local system.  This allows the distant systems (monitoring over
the network) to shut down cleanly before `upsdrvctl shutdown` runs locally
and turns them all off.

When upsmon runs as a secondary, it is relying on the distant system to tell
it about the state of the UPS.  When that UPS goes critical (on battery
and low battery), it immediately invokes the local shutdown command.  This
needs to happen quickly.  Once all secondaries disconnect from the distant
linkman:upsd[8] server, its primary-mode upsmon will start its own shutdown
process.  Your secondary systems must all quiesce and shut down before the
primary turns off the shared power source, or filesystem damage may result.

upsmon deals with secondaries that get wedged, hang, or otherwise fail to
gracefully disconnect from linkman:upsd[8] in a timely manner with the
HOSTSYNC timer.  During a shutdown situation, the primary upsmon will give
up after this interval and it will shut down anyway.  This keeps the primary
from sitting there forever (which would endanger that host) if a secondary
should break somehow.  This defaults to 15 seconds.

If your primary system is shutting down too quickly, set the FINALDELAY
interval to something greater than the default 15 seconds.  Don't set
this too high, or your UPS battery may run out of power before the
primary upsmon process shuts down that system.

TIMED SHUTDOWNS
---------------

For those rare situations where the shutdown process can't be completed
between the time that low battery is signalled and the UPS actually powers
off the load, use the linkman:upssched[8] helper program.  You can use it
along with upsmon to schedule a shutdown based on the "on battery" event.
upssched can then come back to upsmon to initiate the shutdown once it has
run on battery too long.

This can be complicated and messy, so stick to the default critical UPS
handling if you can.

REDUNDANT POWER SUPPLIES
------------------------

If you have more than one power supply for redundant power, you may also
have more than one UPS feeding your computer.  upsmon can handle this.  Be
sure to set the UPS power values appropriately and the MINSUPPLIES value
high enough so that it keeps running until it really does need to shut
down.

For example, the HP NetServer LH4 by default has 3 power supplies
installed, with one bay empty.  It has two power cords, one per side of
the box.  This means that one power cord powers two power supply bays,
and that you can only have two UPSes supplying power.

Connect UPS "alpha" to the cord feeding two power supplies, and UPS
"beta" to the cord that feeds the third and the empty slot.  Define alpha
as a powervalue of 2, and beta as a powervalue of 1.  Set the MINSUPPLIES
to 2.

When alpha goes on battery, your current overall power value will stay
at 3, as it's still supplying power.  However, once it goes critical (on
battery and low battery), it will stop contributing to the current overall
power value.  That means the value will be 1 (beta alone), which is less
than 2.  That is insufficient to run the system, and upsmon will invoke
the shutdown sequence.

However, if beta goes critical, subtracting its contribution will take the
current overall value from 3 to 2.  This is just high enough to satisfy
the minimum, so the system will continue running as before.  If beta
returns later, it will be re-added and the current value will go back to
3.  This allows you to swap out UPSes, change a power configuration, or
whatever, as long as you maintain the minimum power value at all times.

MIXED OPERATIONS
----------------

Besides being able to monitor multiple UPSes, upsmon can also monitor them
as different roles.  If you have a system with multiple power supplies
serviced by separate UPS batteries, it's possible to be a primary on one
UPS and a secondary on the other.  This usually happens when you run
out of serial or USB ports and need to do the monitoring through another
system nearby.

This is also complicated, especially when it comes time to power down a
UPS that has gone critical but doesn't supply the local system.  You can
do this with some scripting magic in your notify command script, but it's
beyond the scope of this manual.

FORCED SHUTDOWNS
----------------

When upsmon is forced to bring down the local system, it sets the
"FSD" (forced shutdown) flag on any UPSes that it is running in primary
mode.  This is used to synchronize secondary systems in the event that
a primary which is otherwise OK needs to be brought down due to some
pressing event on the UPS manager system.

You can manually invoke this mode on the system with primary-mode upsmon
by starting another copy of the program with `-c fsd` command line argument.
This is useful when you want to initiate a shutdown before the critical
stage through some external means, such as linkman:upssched[8].

WARNING: Please note that by design, since we require power-cycling the
load and don't want some systems to be powered off while others remain
running if the "wall power" returns at the wrong moment as usual, the "FSD"
flag can not be removed from the data server unless its daemon is restarted.
If we do take the first step in critical mode, then we normally intend to go
all the way -- shut down all the servers gracefully, and power down the UPS.

Keep in mind that some UPS devices and corresponding drivers would also latch
or otherwise resurface the "FSD" state again even if "wall power" is available,
but the remaining battery charge is below a threshold configured as "safe" in
the device (usually if you manually power on the UPS after a long power outage).
This is by design of respective UPS vendors, since in such situation they
can not guarantee that if a new power outage happens, their UPS would safely
shut down your systems again. So it is deemed better and safer to stay dark
until batteries become sufficiently charged.

When it is time to shut down, upsmon creates POWERDOWNFLAG to
communicate to the operating system that the UPS should be commanded
off late in the shutdown sequence.  This file is removed if present
when upsmon starts, so that the next normal shutdown does not cause
the UPS to be turned off.  (The file can't in general be removed
during shutdown because the filesystem might be read only.  If the
file is in a RAM-backed filesystem, the it won't be present and the
check to remove it won't fire.)

SIMULATING POWER FAILURES
-------------------------

To test a synchronized shutdown without pulling the plug on your UPS(es),
you need only set the forced shutdown (FSD) flag on them.  You can do this
by calling upsmon again to set the flag, i.e.:

	:; upsmon -c fsd

After that, the primary and the secondary will do their usual shutdown
sequence as if the battery had gone critical, while you can time how long
it takes for them.  This is much easier on your UPS equipment, and it beats
crawling under a desk to find the plug.

Note you can also use a dummy SHUTDOWNCMD setting to just report that the
systems would shut down at this point, without actually disrupting their work.

WARNING: After such "dummy" experiments you may have to restart the NUT data
server `upsd` to clear its "FSD" flag for the devices and clients involved,
and make sure no files named by `POWERDOWNFLAG` option (e.g. `/etc/killpower`)
remain on the `upsmon primary` systems under test.

DEAD UPSES
----------

In the event that upsmon can't reach linkman:upsd[8], it declares that UPS
"dead" after some interval controlled by DEADTIME in the
linkman:upsmon.conf[5].  If this happens while that UPS was last known to be
on battery, it is assumed to have gone critical and no longer contributes
to the overall power value.

upsmon will alert you to a UPS that can't be contacted for monitoring
with a "NOCOMM" notifier by default every 300 seconds.  This can be
changed with the NOCOMMWARNTIME setting.

Also upsmon normally reports polling failures for each device that are in place
for each POLLFREQ loop (e.g. "Data stale" or "Driver not connected") to
system log as configured.  If your devices are expected to be AWOL for an
extended timeframe, you can use POLLFAIL_LOG_THROTTLE_MAX to reduce the
stress on syslog traffic and storage, by posting these messages only once
in every several loop cycles, and when the error condition has changed or
cleared.  A negative value means standard behavior (log on every loop,
effectively same as when `max=1`), and a zero value means to never repeat
the message (log only on start and end/change of the failure state).

Note that this throttle only applies to one latest-active error state per
monitored device.

UPS ALARMS
----------

UPS manufacturers and UPS drivers may implement device-specific alarms to
notify the user about potentially severe conditions of the UPS. The status
"ALARM" will be set on such UPS as a common denominator for the different
implementations throughout the various UPS drivers and generally anytime
that a "ups.alarm" variable is seen reported by the specific UPS driver.

Alarms can be acted upon by the user using the "ALARM" and "NOTALARM" notifiers,
which are reported by the `upsmon` when the "ALARM" status occurs or disappears.

As alarms raised by the UPS are usually severe in nature, the `upsmon` watches
a UPS in such a state more closely, bumping up the polling frequency as needed.

When connection loss occurs in such an alarm state, the `upsmon` will by default
consider the volatile UPS as critical/dead and this may lead to false shutdowns
if the actual alarm was in fact mundane in nature (e.g. caused by a HE/ECO mode).
This can be changed by utilizing the `ALARMCRITICAL` setting within `upsmon.conf`.

RELOADING NUANCES
-----------------

upsmon usually gives up root powers for the process that does most of
the work, including handling signals like SIGHUP to reload the configuration
file.  This means your linkman:upsmon.conf[5] file must be readable by
the non-root account that upsmon switches to.

If you want reloads to work, upsmon must run as some user that has
permissions to read the configuration file.  We recommend making a new
user just for this purpose, as making the file readable by "nobody"
(the default user) would be a bad idea; packages typically ship with
a `nut` or `ups` user to run NUT daemon services.

See the `RUN_AS_USER` section in linkman:upsmon.conf[5] for more on this topic.

Additionally, you can't change the `SHUTDOWNCMD` or `POWERDOWNFLAG`
definitions with a reload due to the split-process model.  If you change
those values, you *must* stop upsmon and start it back up.  upsmon
will warn you in the syslog if you make changes to either of those
values during a reload.

ENVIRONMENT VARIABLES
---------------------

*NUT_DEBUG_LEVEL* sets default debug verbosity if no *-D* arguments
were provided on command line, but does not request that the daemon
runs in foreground mode.

*NUT_CONFPATH* is the path name of the directory that contains
`upsmon.conf` and other configuration files.  If this variable is not set,
*upsmon* uses a built-in default, which is often `/usr/local/ups/etc`.

*NUT_QUIET_INIT_UPSNOTIFY=true* can be used to prevent daemons which can
notify service management frameworks (such as systemd) about passing
their lifecycle milestones from emitting such notifications (including
those about lack of system support for such modern features, once per run).

*NUT_QUIET_INIT_BANNER=true* can be used to suppress NUT tool name and
version banner. NOT recommended for services due to adverse troubleshooting
impact, but may be helpful in shell profiles or scripts which process NUT
tool outputs.

FILES
-----

linkman:upsmon.conf[5]

SEE ALSO
--------

Server:
~~~~~~~

linkman:upsd[8]

Clients:
~~~~~~~~

linkman:upsc[8], linkman:upscmd[8],
linkman:upsrw[8], linkman:upsmon[8]

CGI programs:
~~~~~~~~~~~~~

linkman:upsset.cgi[8], linkman:upsstats.cgi[8], linkman:upsimage.cgi[8]

Internet resources:
~~~~~~~~~~~~~~~~~~~

The NUT (Network UPS Tools) home page: https://www.networkupstools.org/
