Friday 30 August 2013

Nagios: Monitor your Network Infrastructure

You can monitor your firewall, routers, network switches etc. using Nagios. These days most switches and routers support SNMP, so you can monitor port status with the check_snmp plugin and bandwidth with the check_mrtgtraf plugin, which reads MRTG logs. To monitor your firewall you need to install an additional plugin.

I assume you have already installed and configured Nagios on the Nagios monitoring server. If not, follow the instructions here. Once your Nagios server is ready, follow these steps to monitor your network infrastructure.

1. Enable the switch configuration file in nagios.cfg

Edit the Nagios configuration file and uncomment the switch.cfg line (remove the leading # sign).

# vim /usr/local/nagios/etc/nagios.cfg

cfg_file=/usr/local/nagios/etc/objects/switch.cfg

2. Define hosts for Switch/Router/Firewall

Open the configuration file and change the host_name, alias, and address fields to appropriate values for the switch.

# vim /usr/local/nagios/etc/objects/switch.cfg

# Define the switch that we'll be monitoring

define host{
        use                    generic-switch                      ; Inherit default values from a template
        host_name       catalyst-4500                       ; The name we're giving to this switch
        alias                  Cisco Catalyst 4500 Switch  ; A longer name associated with the switch
        address             192.168.1.195                         ; IP address of the switch
        hostgroups       switches                                 ; Host groups this switch is associated with
        }

Add similar host definitions for your router and firewall, changing the host_name, alias, and address fields to appropriate values.

3. Monitoring services for Switch/Router/Firewall

Add the following service definition to monitor packet loss and round trip average between the Nagios host and the switch every 5 minutes under normal conditions.

# Create a service to PING to switch

define service{
        use                             generic-service ; Inherit values from a template
        host_name               catalyst-4500   ; The name of the host the service is associated with
        service_description  PING                ; The service description
        check_command     check_ping!200.0,20%!600.0,60%  ; The command used to monitor the service
        normal_check_interval   5               ; Check the service every 5 minutes under normal conditions
        retry_check_interval       1               ; Re-check the service every minute until its final/hard state is determined
        }

This service will be:
CRITICAL if the round trip average (RTA) is greater than 600 milliseconds or the packet loss is 60% or more 
WARNING if the RTA is greater than 200 ms or the packet loss is 20% or more
OK if the RTA is less than 200 ms and the packet loss is less than 20%
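The three rules above can be sketched as a small shell function. This is only an illustration of the thresholds as worded in the list (RTA strictly greater than, packet loss equal or greater), not the plugin's own code; classify_ping is a name I made up.

```shell
#!/bin/sh
# classify_ping RTA_MS LOSS_PCT
# Hypothetical helper mirroring check_ping!200.0,20%!600.0,60%
classify_ping() {
    awk -v rta="$1" -v loss="$2" 'BEGIN {
        if (rta > 600 || loss >= 60)      print "CRITICAL"
        else if (rta > 200 || loss >= 20) print "WARNING"
        else                              print "OK"
    }'
}

classify_ping 35.2 0    # prints OK
classify_ping 250 0     # prints WARNING (RTA above 200 ms)
classify_ping 100 60    # prints CRITICAL (packet loss at 60%)
```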

# Monitor uptime via SNMP

define service{
        use                        generic-service ; Inherit values from a template
        host_name                  catalyst-4500
        service_description     Uptime
        check_command         check_snmp!-C public -o sysUpTime.0
        }

# Monitor Port 1 status via SNMP

define service{
        use                               generic-service ; Inherit values from a template
        host_name                 catalyst-4500
        service_description     Port 1 Link Status
        check_command        check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
        }
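To see what Nagios would run for these two SNMP services, you can expand the check_command by hand. The sketch below assumes the stock check_snmp command definition from the sample commands.cfg ($USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$) and plugins installed under /usr/local/nagios/libexec. Both are assumptions, so compare with your own commands.cfg. The function snmp_cmdline is hypothetical and only prints the command lines.

```shell
#!/bin/sh
# Assumed locations; override from the environment if yours differ
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"
SWITCH="${SWITCH:-192.168.1.195}"

# snmp_cmdline ARGS
# Prints the command line for "check_snmp!ARGS", assuming the stock
# definition: $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
snmp_cmdline() {
    printf '%s/check_snmp -H %s %s\n' "$PLUGIN_DIR" "$SWITCH" "$1"
}

snmp_cmdline '-C public -o sysUpTime.0'
snmp_cmdline '-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB'
```

Pasting one of the printed lines into a shell on the Nagios server is a quick way to confirm the community string and OID before Nagios schedules the check.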

Repeat this procedure for the router as well. To monitor the firewall you'll need to download the appropriate plugin and define the services. If you are using a Cisco ASA you can download the plugin from here.

4. Monitor your Bandwidth

You need to install MRTG if you want to monitor bandwidth usage on your switches or routers. Nagios can then alert you when traffic rates exceed the thresholds you specify, using the check_mrtgtraf plugin. The MRTG log file path in the definition below should point to the MRTG log file on your system.

# Monitor bandwidth via MRTG logs

define service{
        use                             generic-service ; Inherit values from a template
        host_name               catalyst-4500
        service_description  Port 1 Bandwidth Usage
        check_command        check_local_mrtgtraf!/var/lib/mrtg/192.168.1.195_1.log!AVG!1000000,1000000!5000000,5000000!10
        }

In the example above, the "/var/lib/mrtg/192.168.1.195_1.log" argument passed to the check_local_mrtgtraf command tells the plugin which MRTG log file to read. The "AVG" argument tells it to use average bandwidth statistics. The "1000000,1000000" argument gives the warning thresholds (in bytes) for incoming and outgoing traffic rates, and "5000000,5000000" gives the critical thresholds (in bytes) for incoming and outgoing traffic rates. The "10" argument causes the plugin to return a CRITICAL state if the MRTG log file is older than 10 minutes (it should be updated every 5 minutes).
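The "!"-separated arguments are easier to read when split and labelled. Here is a minimal sketch, with explain_mrtgtraf as a hypothetical helper name. It reads each threshold pair as incoming,outgoing, which is how the check_mrtgtraf plugin documents its -w and -c options.

```shell
#!/bin/sh
# explain_mrtgtraf CHECK_COMMAND
# Splits a check_local_mrtgtraf check_command on "!" and labels each field.
explain_mrtgtraf() {
    IFS='!' read -r name logfile mode warn crit age <<EOF
$1
EOF
    echo "command:           $name"
    echo "MRTG log file:     $logfile"
    echo "aggregation:       $mode"
    echo "warning  in/out:   ${warn%,*} / ${warn#*,}"
    echo "critical in/out:   ${crit%,*} / ${crit#*,}"
    echo "max log age (min): $age"
}

explain_mrtgtraf 'check_local_mrtgtraf!/var/lib/mrtg/192.168.1.195_1.log!AVG!1000000,1000000!5000000,5000000!10'
```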

5. Verify configuration and restart Nagios.

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

# /etc/init.d/nagios restart
Stopping nagios:                                           [  OK  ]
Starting nagios:                                            [  OK  ]
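It is safer to restart only when the verification passes, since Nagios will not start with a broken configuration. A small sketch of that idea follows. The function name safe_restart and the three overridable variables are my own, and the default paths are the ones used in this post.

```shell
#!/bin/sh
# Defaults taken from the commands above; override if your layout differs
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-/etc/init.d/nagios restart}"

# Restart Nagios only if the pre-flight check (-v) exits successfully
safe_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        $RESTART_CMD
    else
        echo "Configuration check failed; Nagios not restarted" >&2
        return 1
    fi
}

# Usage (as root): safe_restart
```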

Note 1:
If you want to monitor all the ports of the switch, add an entry for each port when defining the services.

check_command         check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB, -o ifOperStatus.2 -r 1 -m RFC1213-MIB, -o ifOperStatus.3 -r 1 -m RFC1213-MIB ...
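An alternative I find easier to read in the web interface is one service per port, so each port gets its own status line. The sketch below generates those definitions with a loop. gen_port_services is a hypothetical helper, and it assumes the interface indexes on your switch simply run 1..N, which you should confirm with snmpwalk first.

```shell
#!/bin/sh
# gen_port_services HOST PORT_COUNT
# Prints one "Port N Link Status" service definition per port (1..PORT_COUNT).
gen_port_services() {
    host=$1
    count=$2
    port=1
    while [ "$port" -le "$count" ]; do
        cat <<EOF
define service{
        use                     generic-service
        host_name               $host
        service_description     Port $port Link Status
        check_command           check_snmp!-C public -o ifOperStatus.$port -r 1 -m RFC1213-MIB
        }

EOF
        port=$((port + 1))
    done
}

gen_port_services catalyst-4500 3
```

Redirecting the output (for example with >> /usr/local/nagios/etc/objects/switch.cfg) appends the generated services to the switch configuration.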

Note 2:
You can monitor your router/firewall using SNMP if you know the relevant object identifier (OID) on the router/firewall, which you can find using snmpwalk.

snmpwalk -v1 -c public 192.168.1.205 -m ALL .1

Here 192.168.1.205 is the IP address of your router/firewall.

Note 3:
You can monitor your remote Linux/Windows hosts using SNMP, but I'm not sure of the reliability of SNMP. One reason is that SNMP runs over UDP, which does not guarantee delivery, and the other is that no acknowledgement is defined for SNMP traps.

Note 4:
There are a few occasions where we prefer UDP over TCP, especially when we don't require any acknowledgement or when losing a few packets doesn't make any difference.

1. UDP can be used for broadcast and multicast, which TCP doesn't support.

2. UDP is faster: there is no acknowledgement and no need to resend lost packets, which is why it is widely used for videoconferencing.

Monday 5 August 2013

Nagios: Monitor your remote Windows host using NSClient++

We can monitor private services like CPU load, memory, hard disk usage etc. of remote Windows systems using NSClient++. Public services like HTTP, SMTP, FTP etc. can be monitored by plugins already installed on the Nagios server. The check_nt plugin, which is already installed on the Nagios server, is used to communicate with NSClient++.

I assume you have already installed and configured Nagios on the Nagios monitoring server. If not, follow the instructions here. Once your Nagios server is ready, follow these steps to monitor your Windows system.

1. Prerequisites.

2. Install NSClient++ on your Windows system.

3. Define new hosts and services for your Windows system.

4. Restart Nagios services

1. Prerequisites.

a) Open the Nagios configuration file, remove the leading pound (#) sign from the following line, then save the file and exit (:wq! in vim).

# vim /usr/local/nagios/etc/nagios.cfg

cfg_file=/usr/local/nagios/etc/objects/windows.cfg

b) Make sure the windows-server template is enabled 

# vim /usr/local/nagios/etc/objects/templates.cfg

# Windows host definition template - This is NOT a real host, just a template!

define host{
                name                                  windows-server               ; The name of this host template
                use                                     generic-host       ; Inherit default values from the generic-host template
                check_period                   24x7                        ; By default, Windows servers are monitored round the clock
                check_interval                 5                               ; Actively check the server every 5 minutes
                retry_interval                   1                               ; Schedule host check retries at 1 minute intervals
                max_check_attempts     10                          ; Check each server 10 times (max)
                check_command             check-host-alive; Default command to check if servers are "alive"
                notification_period         24x7                       ; Send notification out at any time - day or night
                notification_interval       30                           ; Resend notifications every 30 minutes
                notification_options       d,r                          ; Only send notifications for specific host states
                contact_groups               admins                   ; Notifications get sent to the admins by default
                hostgroups                       windows-servers ; Host groups that Windows servers should be a member of
                register                              0                              ; DONT REGISTER THIS - ITS JUST A TEMPLATE
        }

c) Make sure that the check_nt command is defined in /usr/local/nagios/etc/objects/commands.cfg

# vim /usr/local/nagios/etc/objects/commands.cfg

# 'check_nt' command definition
define command{
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
        }

2. Install NSClient++ on your Windows system.

To monitor private services of a Windows system, you’ll need to install the NSClient++ addon.

a) Download and install the appropriate NSCP-0.4.1 from NSClient++ Project.

b) Make sure the NSClient++ service is allowed to interact with the desktop 

Run "services.msc". Double click on the NSClient++ service, select the 'Log On' tab and then select the check-box that says “Allow service to interact with desktop”. Apply the changes.



c) Modify nsclient.ini
  • Edit the nsclient.ini file (C:\Program Files\NSClient++\nsclient.ini) and make the following changes.
  • Uncomment all the modules listed in the [modules] section, except for CheckWMI.dll and RemoteConfiguration.dll.
  • Uncomment the ’allowed_hosts’ option in the [/settings/default] section. Specify the IP address of the Nagios server on this line, or leave it blank to allow all hosts to connect.
  • You can also specify a password for clients by changing the ’password’ option in the [/settings/default] section.
          ; PASSWORD - Password used to authenticate against server
          password = ******
  • Make sure the ’port’ option in the [/settings/NSClient/server] section is uncommented and set to ’12489’ (the default port). 
         ; PORT NUMBER - Port to use for check_nt.
         port = 12489

d) Start the NSClient++ service

Run "C:\Program Files\NSClient++\nscp.exe" /start. Whenever you make changes to the nsclient.ini file you need to restart the NSClient++ service.

3. Define new hosts and services for your Windows system.

Now it’s time to add some object definitions to your Nagios configuration files in order to monitor the new Windows machine. Open the windows.cfg file for editing.

# vim /usr/local/nagios/etc/objects/windows.cfg

Add a new host definition for the Windows machine that you’re going to monitor. If this is the first Windows machine you’re monitoring, you can simply modify the sample host definition in windows.cfg. Change the host_name, alias, and address fields to appropriate values for the Windows box.

# Define a host for the Windows machine we'll be monitoring
# Change the host_name, alias, and address to fit your situation

define host{
        use             windows-server  ; Inherit default values from a template
        host_name       winserver       ; The name we're giving to this host
        alias           Windows Server  ; A longer name associated with the host
        address         192.168.1.2     ; IP address of the host
        }

Now you can add some service definitions (to the same configuration file) in order to tell Nagios to monitor different aspects of the Windows machine. If this is the first Windows machine you’re monitoring, you can simply modify the sample service definitions in windows.cfg.

Add the following service definition to monitor the version of the NSClient++ addon that is running on the Windows server. This is useful when it comes time to upgrade your Windows servers to a newer version of the addon, as you’ll be able to tell which Windows machines still need to be upgraded to the latest version of NSClient++. 
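A minimal sketch of such a definition, assuming the check_nt CLIENTVERSION variable and the same generic-service template used by the other services in this post. I print it from a small shell helper (nt_service, a name I made up) so the output can be appended to windows.cfg.

```shell
#!/bin/sh
# nt_service HOST DESCRIPTION CHECK_COMMAND
# Prints a Nagios service definition for a check_nt based check.
nt_service() {
    cat <<EOF
define service{
        use                     generic-service
        host_name               $1
        service_description     $2
        check_command           $3
        }
EOF
}

nt_service winserver 'NSClient++ Version' 'check_nt!CLIENTVERSION'
```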

Add the following service definition to monitor the uptime of the Windows server. 

# Create a service for monitoring the uptime of the server
# Change the host_name to match the name of the host you defined above

define service{
        use                             generic-service
        host_name                winserver
        service_description   Uptime
        check_command      check_nt!UPTIME
        }

Add the following service definition to monitor the CPU utilization on the Windows server and generate a CRITICAL alert if the 5-minute CPU load is 90% or more or a WARNING alert if the 5-minute load is 80% or greater.

# Create a service for monitoring CPU load
# Change the host_name to match the name of the host you defined above

define service{
        use                     generic-service
        host_name               winserver
        service_description     CPU Load
        check_command           check_nt!CPULOAD!-l 5,80,90
        }

Add the following service definition to monitor memory usage on the Windows server and generate a CRITICAL alert if memory usage is 90% or more or a WARNING alert if memory usage is 80% or greater.

# Create a service for monitoring memory usage
# Change the host_name to match the name of the host you defined above

define service{
        use                               generic-service
        host_name                 winserver
        service_description     Memory Usage
        check_command       check_nt!MEMUSE!-w 80 -c 90
        }

Add the following service definition to monitor usage of the C:\ drive on the Windows server and generate a CRITICAL alert if disk usage is 90% or more or a WARNING alert if disk usage is 80% or greater.

# Create a service for monitoring C:\ disk usage
# Change the host_name to match the name of the host you defined above

define service{
        use                     generic-service
        host_name               winserver
        service_description     C:\ Drive Space
        check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90
        }

That’s it for now. You’ve added some basic services that should be monitored on the Windows box. Save the configuration file. 

Password Protection

If you specified a password in the NSClient++ configuration file on the Windows machine, you’ll need to modify the check_nt command definition to include the password. Open the commands.cfg file for editing.

# vim /usr/local/nagios/etc/objects/commands.cfg

Change the definition of the check_nt command to include the "-s <PASSWORD>" argument (where PASSWORD is the password you specified on the Windows machine) like this:

# 'check_nt' command definition
define command{
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s <PASSWORD> -v $ARG1$ $ARG2$
        }
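Before restarting Nagios it is worth checking from the Nagios server that the port and password are accepted. A rough sketch follows, assuming the plugin lives in /usr/local/nagios/libexec. probe_nsclient and the NSCLIENT_PASSWORD variable are hypothetical names of mine, not part of Nagios.

```shell
#!/bin/sh
# Assumed plugin location; override CHECK_NT if yours differs
CHECK_NT="${CHECK_NT:-/usr/local/nagios/libexec/check_nt}"
# Hypothetical variable holding the password set in nsclient.ini (may be empty)
NSCLIENT_PASSWORD="${NSCLIENT_PASSWORD:-}"

# probe_nsclient HOST
# Asks NSClient++ on HOST for its version, passing -s only if a password is set.
probe_nsclient() {
    if [ ! -x "$CHECK_NT" ]; then
        echo "check_nt not found at $CHECK_NT" >&2
        return 3    # 3 = UNKNOWN in the Nagios plugin convention
    fi
    if [ -n "$NSCLIENT_PASSWORD" ]; then
        "$CHECK_NT" -H "$1" -p 12489 -s "$NSCLIENT_PASSWORD" -v CLIENTVERSION
    else
        "$CHECK_NT" -H "$1" -p 12489 -v CLIENTVERSION
    fi
}

# Usage: NSCLIENT_PASSWORD='your-password' probe_nsclient 192.168.1.2
```

If the probe prints a version string, the same password in the check_nt command definition should work for the services defined earlier.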

4. Restart Nagios services

Verify the nagios configuration files as shown below.

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Restart nagios as shown below.

# service nagios restart

Note: Replace "winserver" in the example definitions above with the name you specified in the host_name directive of the host definition you added.