Big Brother - Help
A Web-based Systems and Network Monitoring and Notification System
Order of severity
Serious Trouble
No report lately
May need attention
All is well
All connections are checked every 5 minutes
Pager Codes
The administrator will be notified when conditions merit. The numeric message
is formatted as follows: [3 DIGIT CODE] [IP-ADDRESS]
- 100 - Disk Error. Disk is over 95% full...
- 200 - CPU Error. CPU load average is unacceptably high.
- 300 - Process Error. An important process has died.
- 400 - Message file contains a serious error.
- 500 - Network error, can't connect to that IP address.
- 600 - Web server HTTP error - server is down.
- 7-- - Generic server error - 7 followed by the server port number, e.g. 721 = ftp down
- 800 - DNS server on that machine is down
- 911 - User Page. Message is phone number to call back.
- 999 - The host reporting an error could not be found in the etc/bb-hosts file.
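As an illustration of the code list above, a numeric page could be translated back into words with a small shell function. This is only a sketch: decode_pager_code and describe_page are hypothetical helpers, not part of Big Brother, and the wording of each description is shortened from the list.

```shell
#!/bin/sh
# Hypothetical helper (not part of Big Brother): map a pager code
# to a short description taken from the list above.
decode_pager_code() {
    code=$1
    case "$code" in
        100) echo "Disk error" ;;
        200) echo "CPU error" ;;
        300) echo "Process error" ;;
        400) echo "Message file error" ;;
        500) echo "Network error" ;;
        600) echo "HTTP error" ;;
        800) echo "DNS error" ;;
        911) echo "User page" ;;
        999) echo "Host not found in etc/bb-hosts" ;;
        # 7 followed by a port number: strip the leading 7
        7?*) echo "Generic server error on port ${code#7}" ;;
        *)   echo "Unknown code" ;;
    esac
}

# Hypothetical helper: a page arrives as [3 DIGIT CODE] [IP-ADDRESS]
describe_page() {
    echo "$2: $(decode_pager_code "$1")"
}

describe_page 721 192.168.1.10
```

The final line prints the address followed by the decoded meaning of code 721, i.e. a generic server error on port 21 (ftp).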
Severe Conditions
Most severe conditions result in the administrator being notified. These include loss of
network connectivity, loss of HTTP access, and disk conditions over 95% full, since these
can result in a system hang. Furthermore, any "NOTICE" messages in
the message file cause a notification, since they may signal a disk fault.
Under these circumstances, the screen should turn red. Click on the corresponding red dot
for additional information about the condition.
If Big Brother has not noticed a severe situation that is occurring, use the PAGE/ACK
button on the main screen to notify the administrator manually.
Warning Conditions
These include HTTP server errors, disks 90-94% full, the death of important
processes, and "WARNING" messages in the system logs.
The screen should turn yellow if this is the most severe situation at the time.
Click on the corresponding yellow dot for additional information, and notify
the administrator manually if necessary.
No Report Warnings
Each report is checked for freshness. If any report is more than 30 minutes old, it is marked
with a purple dot, and the screen turns purple, assuming that it is the most serious situation
at the time.
These may be the result of heavily loaded systems, but may also indicate a more
serious loss of communication within the Big Brother system itself.
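The freshness rule can be sketched as follows. This is not the code Big Brother uses: report_color and STALE_AFTER are hypothetical names, and times are assumed to be plain counts of seconds.

```shell
#!/bin/sh
# Assumption: 30 minutes expressed in seconds.
STALE_AFTER=1800

# report_color NOW REPORT_TIME CURRENT_COLOR
# Prints purple when the report is more than 30 minutes old,
# otherwise prints the color the report already had.
report_color() {
    age=$(( $1 - $2 ))
    if [ "$age" -gt "$STALE_AFTER" ]; then
        echo purple
    else
        echo "$3"
    fi
}

report_color 10000 8000 yellow   # age is 2000 seconds: prints purple
```

A report that is exactly 30 minutes old keeps its color, since the text above says "more than 30 minutes".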
System Information
Click on any server name for additional details about the machine. Information
about all components is available, including serial numbers, partition sizes,
SCSI addresses, and the physical locations of the devices. This
information lives in the www/notes directory.
General Information
The current status of any individual component is always available by clicking the appropriate
dot in the display matrix. You may have to hit Reload to get the most recent entry.
Occasionally the screen changes color for CPU or HTTP warnings. These can usually be disregarded
since Big Brother has been instructed to be very sensitive during this initial test.
Similarly, Internet connections may turn yellow when the network
is heavily loaded. Although this should be checked out, it is usually not a
problem unless the whole Internet section goes yellow.
Big Brother Column Information
conn
The conn column denotes the ping check performed periodically. This
code is located in bb-network.sh.
nntp
The nntp column denotes the nntp check performed periodically. This
code is located in bb-network.sh. It makes sure the news server is
alive and well.
cpu
The cpu column denotes the cpu check performed periodically. This figure
is the 5 minute load average, which is the second of the load average
values reported by the 'uptime' command. The code for this test is located
in bb-local.sh.
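A minimal sketch of pulling the 5 minute figure out of a line of 'uptime' output is shown here. The function name five_min_load is hypothetical and this is not the code in bb-local.sh; it also assumes the load figures use a period as the decimal separator.

```shell
#!/bin/sh
# five_min_load: read one line of uptime output on stdin and print
# the second load average value (the 5 minute average).
five_min_load() {
    # Drop everything up to "load average:" (or "load averages:"),
    # turn commas into spaces, then take the second field.
    sed -e 's/.*load averages*: *//' -e 's/,/ /g' | awk '{ print $2 }'
}

# Sample line with a fixed, made-up uptime reading
echo " 10:15am  up 12 days,  3:02,  4 users,  load average: 0.42, 1.37, 0.95" | five_min_load
```

On a live system the input would come from the uptime command itself, for example: uptime | five_min_load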
disk
The disk column denotes the disk check performed periodically. This
test is just the 'df' command, with the fullest disk being reported.
The warning amount is 90% by default, and the system is set to panic at
95%. These values are set in $BBHOME/etc/bbdef.sh and may be changed.
The code for the disk test lives in bb-local.sh. You may also set
warning/panic levels individually in the etc/bb-dftab file. See
etc/bb-dftab.INFO.
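The disk test described above might look roughly like this. It is a sketch under several assumptions: DFWARN and DFPANIC are hypothetical variable names (the real settings live in etc/bbdef.sh), the input is in the portable 'df -P' layout with the capacity in the fifth field, exactly 95% is treated as a panic, and green stands for "all is well".

```shell
#!/bin/sh
DFWARN=90     # hypothetical name for the warning level
DFPANIC=95    # hypothetical name for the panic level

# fullest_disk: read df -P style output on stdin and print
# "PERCENT MOUNTPOINT" for the fullest filesystem.
fullest_disk() {
    awk 'NR > 1 {
             pct = $5
             sub(/%/, "", pct)
             pct += 0
             if (mnt == "" || pct > max) { max = pct; mnt = $6 }
         }
         END { print max + 0, mnt }'
}

# disk_color PERCENT: map a percentage to a display color.
disk_color() {
    if [ "$1" -ge "$DFPANIC" ]; then
        echo red
    elif [ "$1" -ge "$DFWARN" ]; then
        echo yellow
    else
        echo green
    fi
}

disk_color 92   # between the warning and panic levels
```

On a live system the two pieces could be combined as: df -P | fullest_disk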
dns
The dns column verifies the status of the DNS server on that machine.
The test is basically an nslookup with the server name and IP address
as arguments.
ftp
The ftp column denotes the ftp check performed periodically. This
code is located in bb-network.sh. It is part of the new group of
generic server tests performed. To test this service on a given
machine, just include 'ftp' on the line in the bb-hosts file.
http
The http column denotes the http check performed periodically. This
code is located in bb-network.sh. It will return OK if the server is
there and does not return a string containing the word 'Error'. This
test should be more rigorous. Note that password-protected pages are
reported as errors when they shouldn't be.
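The rule above can be sketched as a filter over whatever the server sent back. http_color is a hypothetical name, not the code in bb-network.sh. The color mapping is an assumption based on the Severe and Warning sections: no response at all is treated as red, and a response containing 'Error' as yellow.

```shell
#!/bin/sh
# http_color: read a server response on stdin and print a color.
http_color() {
    body=$(cat)
    if [ -z "$body" ]; then
        echo red        # assumption: nothing came back, server is down
    elif printf '%s\n' "$body" | grep -q 'Error'; then
        echo yellow     # assumption: the server answered with an error
    else
        echo green
    fi
}

printf 'HTTP/1.0 200 OK\n' | http_color
```

Because the match is a plain search for the word 'Error', any page whose response happens to contain that word is flagged, which is why the test is described as not rigorous enough.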
msgs
The msgs column denotes the msgs check performed periodically. This
code is located in bb-local.sh. Only NOTICE and WARNING conditions are
considered. Note that a NOTICE condition will cause a notification (code red)
whereas a WARNING just turns the screen yellow. There is no way to
turn these messages off, short of clearing out the messages file
manually or modifying the tags from WARNING to wARNING and NOTICE to
nOTICE. You may also introduce tags in the etc/bbdef.sh file in
the PAGEMSG and MSGS variables.
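A rough sketch of the tag scan follows. It is not the code in bb-local.sh. The values and exact meaning of the two variables are assumptions: here MSGS is taken to list every tag that is watched and PAGEMSG the tags that also cause a notification. msgs_color is a hypothetical name.

```shell
#!/bin/sh
MSGS="NOTICE WARNING"   # assumed: every tag that is watched
PAGEMSG="NOTICE"        # assumed: tags that also cause a notification

# msgs_color FILE: print red, yellow or green for a messages file.
msgs_color() {
    file=$1
    for tag in $PAGEMSG; do
        if grep -F -q -- "$tag" "$file"; then
            echo red
            return
        fi
    done
    for tag in $MSGS; do
        if grep -F -q -- "$tag" "$file"; then
            echo yellow
            return
        fi
    done
    echo green
}
```

The search is case-sensitive, so in this sketch a line edited to read wARNING or nOTICE no longer matches, as described above.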
pop3
The pop3 column denotes the pop3 check performed periodically. This
is part of the generic test code in bb-network.sh. It checks that
the pop3 server is alive and well. To test a machine for the pop3
server, put the word 'pop3' on that server's line in the bb-hosts file.
You may have to put pop-3 instead on certain platforms. Check /etc/services
for the correct spelling.
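One way to check the spelling is to look the name up in the services file. pop3_name is a hypothetical helper, and it assumes the usual services layout with the service name in the first field.

```shell
#!/bin/sh
# pop3_name [FILE]: print whichever of pop3 / pop-3 appears first
# as a service name in FILE (default /etc/services).
pop3_name() {
    awk '$1 == "pop3" || $1 == "pop-3" { print $1; exit }' "${1:-/etc/services}"
}
```

Running pop3_name with no argument prints the name to use on the bb-hosts line, or nothing if neither spelling is listed.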
procs
The procs column denotes the procs check performed periodically. This
code is located in bb-local.sh. It makes sure that the processes defined
in etc/bbdef.sh in the PROCS variable exist on the local machine. If
a process does not exist, and it has been defined in the PAGEPROCS
variable, then the code is red and a notification is sent out. The ps command
is used to get a current process listing.
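The check can be sketched like this. It is not the code in bb-local.sh: procs_color is a hypothetical name, the process names are example values rather than defaults, and treating a missing process that is not in PAGEPROCS as yellow is an assumption drawn from the Warning Conditions section.

```shell
#!/bin/sh
PROCS="sendmail cron"    # example values only
PAGEPROCS="sendmail"     # example values only

# procs_color: read a process listing (such as ps output) on stdin.
procs_color() {
    listing=$(cat)
    color=green
    for p in $PROCS; do
        if ! printf '%s\n' "$listing" | grep -F -q -- "$p"; then
            color=yellow          # assumption: missing but not paged
            for q in $PAGEPROCS; do
                if [ "$p" = "$q" ]; then
                    echo red      # missing and listed in PAGEPROCS
                    return
                fi
            done
        fi
    done
    echo "$color"
}

printf 'root 101 /usr/sbin/cron\n' | procs_color   # sendmail is missing
```

The substring match is deliberately simple; a process whose name merely contains one of the listed names would also count as running.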
smtp
The smtp column denotes the smtp check performed periodically. This
is part of the generic server test code located in bb-network.sh. It
makes sure that the SMTP process (usually sendmail) is alive and well.
Copyright © 1997-1999 The MacLawran Group Inc - All Rights Reserved