Saturday, 30 April 2022

Developing a Network Monitoring and Data Analysis Tool using Python

Introduction

In an earlier post on System Administration Made Easy – Automating the Directory Service using Python and LDAP, I mentioned that I once developed a Network Monitoring Tool similar to Nagios, but more versatile. It is more versatile because I have full control over which metrics I monitor, and how I monitor them. In a nutshell, the system can do as you wish. It is bespoke in the true sense of the word. That is the beauty of programming.

It is always fun and rewarding when you see your system being used and appreciated. That, in itself, is motivating and gives you more energy to do more, even though the effort that goes into such a system is usually quite demanding.

With hindsight, and as I grow wiser every day, I can now redevelop the system in a matter of days to conform to current standards and best practice. In addition, it is now possible to have a real-time system that updates without refreshing the page. Previously, due to my knowledge limitations, it was only possible for the system to send status updates via email at designated times of the day. I can say with confidence that, with the new developments, the system is now enterprise-ready to analyze the vast data available on a corporate network.


The Product

The final product looks like this with dummy (hypothetical) data




Once you load the above page in your browser, the metrics update automatically at a defined interval (e.g. every 5 minutes). If there is an issue with a server (or a network device, for that matter), the row for that device can be flagged (in red, for instance) to attract your attention. The system monitors issues such as a server going offline, or disk space or memory running out.
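
As a rough illustration of the flagging idea, the check behind a red row could be sketched as below. The thresholds, the `Snapshot` class and the `issues` function are hypothetical names invented for this sketch; in the code shared later, only the online status drives the row colour.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; tune these for your own environment.
MIN_FREE_DISK_GB = 10.0
MIN_FREE_MEMORY_MB = 512
MAX_CPU_LOAD_PERCENT = 90


@dataclass
class Snapshot:
    """One reading for one device (a simplified stand-in for the real model)."""
    online: bool
    free_disk_gb: float
    free_memory_mb: int
    cpu_load_percent: int


def issues(snapshot: Snapshot) -> List[str]:
    """Return the reasons a device row should be flagged (empty list = healthy)."""
    if not snapshot.online:
        return ["offline"]  # nothing else can be measured on an offline device
    found = []
    if snapshot.free_disk_gb < MIN_FREE_DISK_GB:
        found.append("low disk space")
    if snapshot.free_memory_mb < MIN_FREE_MEMORY_MB:
        found.append("low memory")
    if snapshot.cpu_load_percent > MAX_CPU_LOAD_PERCENT:
        found.append("high CPU load")
    return found


reading = Snapshot(online=True, free_disk_gb=4.2, free_memory_mb=2048, cpu_load_percent=35)
print(issues(reading))  # prints ['low disk space']
```

A row would then be flagged whenever the returned list is not empty, and the list itself could be shown in the Notes column.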

The system is at its best when displayed on a large wall screen. If the IT team sits in one room, the screen is very handy because the team will notice immediately if something goes wrong with any server or network device.

Some obvious use cases of this approach are: flight schedules (arrivals/departures), forex rates by forex bureau, weather data by location, etc. Any data that changes regularly and must be watched at a glance can be presented in the same way.


Implementation and Potential

To implement the monitoring system, knowledge of the following is assumed as a minimum: Python, JavaScript, HTML, and Network Administration. Any Python web framework such as Flask, Django, Bottle, etc. would work. Obviously, adequate programming experience is a must.

I used the Flask micro-framework in my case, meaning that the templates are designed using the Jinja2 templating engine. No big deal here … you can still properly package the code that I will share later and use it with any other MVC framework. Knowledge of the MVC development pattern makes it easy to hop from one MVC framework to another.

Currently, only Windows networks are supported because I am using the Windows Management Instrumentation library. Extending to other OS environments should be a piece of cake if you have the requisite skills.
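
To give a feel for what a non-Windows extension might start from, here is a minimal sketch that collects a few comparable metrics for the local machine using only the Python standard library. It is not a replacement for the WMI code below: the function name `local_metrics` is hypothetical, it only covers the machine it runs on, and memory and CPU load are left out because the standard library has no portable call for them (a third-party library such as psutil is one option there).

```python
import os
import platform
import shutil
from datetime import datetime


def local_metrics(path: str = os.path.abspath(os.sep)) -> dict:
    """Collect a few metrics for the local machine, on any OS."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    gib = 1024 ** 3
    return {
        "record_name": platform.node(),
        "os_version": (platform.system(), platform.release()),
        # (drive, free GB, total GB), mirroring the tuple layout used for disks below
        "disk_space": (path, round(usage.free / gib, 2), round(usage.total / gib, 2)),
        "cpu_count": os.cpu_count(),
        "last_updated": datetime.now().strftime("%H:%M:%S"),
    }


print(local_metrics())
```

Monitoring remote non-Windows machines would additionally need a transport (an agent, SSH, SNMP and so on), which is outside the scope of this sketch.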

If integrated with LDAP, then we can unlock all possibilities, meaning that the questions in the mind of the Network Admin will be a breeze to answer.

The system can also be extended to monitor corporate databases. Imagine if you have 20 different databases that you need to monitor. Just define the metrics, and then leave the system to do the rest for you. Of course, you will have to invest in the development, which is quite easy once you master the game.


The Code

Before we proceed, note that the base data (only id and name) is currently stored in a JSON file (devices.json) as follows, though one could instead fetch the data dynamically from Active Directory using LDAP.

{
	"devices": [
		{"id":"1", "record_name": "Server1"}
		, {"id":"2", "record_name": "Server2"}
		, {"id":"3", "record_name": "Server3"}
		, {"id":"4", "record_name": "Server4"}
		, {"id":"5", "record_name": "Server5"}
		, {"id":"6", "record_name": "Server6"}
		, {"id":"7", "record_name": "Server7"}
		, {"id":"8", "record_name": "Server8"}
		, {"id":"9", "record_name": "Server9"}
		, {"id":"10", "record_name": "Server10"}
		, {"id":"11", "record_name": "Server11"}
		, {"id":"12", "record_name": "Server12"}
		, {"id":"13", "record_name": "Server13"}
		, {"id":"14", "record_name": "Server14"}
	]
}
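
Reading that file back is straightforward. The sketch below is one possible way to load it defensively; the helper name `load_devices` is hypothetical, and the demo writes a throwaway file so that it can run on its own.

```python
import json
import os
import tempfile


def load_devices(path):
    """Return a list of (id, record_name) tuples from a devices.json-style file."""
    with open(path, encoding="utf-8") as file:
        data = json.load(file)
    devices = []
    for entry in data.get("devices", []):
        # Skip malformed entries rather than failing the whole page load.
        if "id" in entry and "record_name" in entry:
            devices.append((int(entry["id"]), entry["record_name"]))
    return devices


# Demo with a throwaway file that mimics the structure above.
sample = {"devices": [{"id": "1", "record_name": "Server1"},
                      {"id": "2", "record_name": "Server2"}]}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "devices.json")
    with open(demo_path, "w", encoding="utf-8") as file:
        json.dump(sample, file)
    print(load_devices(demo_path))  # prints [(1, 'Server1'), (2, 'Server2')]
```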


We then proceed to the Class definition (Model)

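import json
import os
from datetime import datetime

import pythoncom  # part of pywin32; initialises COM for the current thread
import wmi        # third-party Windows Management Instrumentation wrapper

# `app` (the Flask application) and `basedir` are defined elsewhere in the project


class Device: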

    def __init__(self, id, record_name):
        self.id = id
        self.record_name = record_name

    def device_status(self, refresh):
        # dynamically create attributes; similar to setattr(object, name, value)
        # use Windows Management Instrumentation (WMI) logic inside

        #c = wmi.WMI() if (computer == localComputer) else wmi.WMI(computer=computer, user=user, password=password, find_classes=False)
        pythoncom.CoInitialize()
        c = wmi.WMI()

        #driveType = {0: 'Unknown', 1: 'No Root Directory', 2: 'Removable Disk', 3: 'Local', 4: 'Network Drive', 5: 'Compact Disc', 6: 'RAM Disk'}

        self.record_name = 'Server' + str(self.id) #c.query('SELECT * FROM Win32_ComputerSystem')[0].Name
        self.os_version = [(o.Caption, o.Version) for o in c.Win32_OperatingSystem()][0]
        self.uptime = None

        if not refresh: #refresh is False only for the first time, otherwise True
            self.disk_space, self.memory, self.processes, self.last_updated, self.notes, self.online = ('*',) * 6 # placeholders until the first refresh
        else:
            self.disk_space = [(disk.Caption, round(float(disk.FreeSpace)/pow(1024.0, 3), 2), round(float(disk.Size)/pow(1024.0, 3), 2)) for disk in sorted(c.Win32_LogicalDisk(DriveType=3), key=lambda x: x.Caption, reverse=False) if (disk.FreeSpace is not None)][0:5] #only first 5 disks; each tuple is (drive, free GB, total GB)
            self.memory = [(mem.AvailableMBytes, str(mem.PercentCommittedBytesInUse)+'%') for mem in c.Win32_PerfFormattedData_PerfOS_Memory(['AvailableMBytes', 'PercentCommittedBytesInUse'])][0] #SystemCodeTotalBytes
            #self.memory = int([mem.AvailableBytes for mem in c.Win32_PerfFormattedData_PerfOS_Memory()][0]) #SystemCodeTotalBytes
            cpu_load = [cpu.LoadPercentage for cpu in c.Win32_Processor()]
            self.cpu_load = int(sum(cpu_load) / len(cpu_load))  # avg all cores/processors
            self.processes = '*' # [(process.Name, process.PageFileUsage) for process in sorted(c.Win32_Process(['Name', 'PageFileUsage']), key=lambda x: x.PageFileUsage, reverse=True)][0:2] # [s.Caption for s in c.Win32_Service(StartMode='Auto', State='Stopped') if not 'Delayed' in s.StartMode]
            self.last_updated = datetime.now().strftime('%H:%M:%S')
            self.notes = 'notes' + str(self.id) #todo
            self.online = False if int(self.id)%5==0 else True #todo

            secs_up = [int(w.SystemUpTime) for w in c.Win32_PerfFormattedData_PerfOS_System()][0]
            hours_up = secs_up / 3600
            #days_up = hours_up/24

            if (hours_up > 168 ): #1 week has 168 hrs
                uptime = str(int(hours_up/168)) + 'w, ' + str(int((hours_up%168)/24)) + 'd'
            elif (hours_up > 24 and hours_up <= 168): #days
                uptime = str(int(hours_up/24)) + 'd, ' + str(int(hours_up%24)) + 'h'
            else: #hours
                uptime = str(int(hours_up)) + 'h'

            self.uptime = uptime
            self.last_reboot = [w.LastBootUpTime[6:8] + '/' + w.LastBootUpTime[4:6] + '/' + w.LastBootUpTime[0:4] + ', ' + w.LastBootUpTime[8:10] + ':' + w.LastBootUpTime[10:12] for w in c.Win32_OperatingSystem()][0]

        #add to json file
        with open(os.path.join(basedir, app.config['DICTIONARY_FOLDER'], 'device_status.json'), 'r+') as file:
            # format - {"Server1": {"version": "version1", "uptime": 11}}
            device_dict = {self.record_name: {'version': self.os_version, 'uptime': self.uptime}}
            data = json.load(file) # same as json.loads(file.read()); load reads from a file, loads from a string
            data.update(device_dict) # adds the key if it is new, otherwise overwrites the existing entry
            file.seek(0)
            file.truncate() # without truncating, leftover characters (e.g. an extra '}') can remain at the end of the file, leading to malformed JSON
            json.dump(data, file, indent=3)


    # Convert Python list of objects to JSON
    def dump(self):
        return {"DeviceList":
                    {'id': self.id,
                    'record_name': self.record_name,
                    'os_version': self.os_version,
                    'online': self.online,
                    'disk_space': self.disk_space,
                    'memory': self.memory,
                    'processes': self.processes,
                    'last_updated': self.last_updated,
                    'notes': self.notes,
                    'uptime': self.uptime,
                    'last_reboot': self.last_reboot,
                    'cpu_load': self.cpu_load
                    }
                }


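The uptime and last-reboot formatting inside device_status can be tried out without a Windows machine if it is pulled into plain functions. The sketch below mirrors the logic above; the function names `format_uptime` and `format_last_reboot` are hypothetical, and the sample timestamp is made up in the `yyyymmddHHMMSS.mmmmmm+UUU` layout that the slicing above assumes.

```python
from datetime import datetime


def format_uptime(seconds_up: int) -> str:
    """Turn seconds of uptime into 'Nw, Nd', 'Nd, Nh' or 'Nh', as in device_status."""
    hours_up = seconds_up / 3600
    if hours_up > 168:  # 1 week has 168 hours
        return f"{int(hours_up // 168)}w, {int(hours_up % 168 // 24)}d"
    if hours_up > 24:
        return f"{int(hours_up // 24)}d, {int(hours_up % 24)}h"
    return f"{int(hours_up)}h"


def format_last_reboot(wmi_timestamp: str) -> str:
    """Turn 'yyyymmddHHMM...' into 'dd/mm/yyyy, HH:MM' (same output as the slicing above)."""
    parsed = datetime.strptime(wmi_timestamp[:12], "%Y%m%d%H%M")
    return parsed.strftime("%d/%m/%Y, %H:%M")


print(format_uptime(200 * 3600))                        # prints 1w, 1d
print(format_uptime(30 * 3600))                         # prints 1d, 6h
print(format_last_reboot("20220430083015.500000+180"))  # prints 30/04/2022, 08:30
```

Parsing with strptime instead of slicing has the side benefit of raising an error on a malformed timestamp rather than silently producing a wrong date.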

Then we go to the Controller
def update_devices(refresh):

    #define devices object
    with open(os.path.join(basedir, app.config['DICTIONARY_FOLDER'], f'devices.json')) as file:
        device_dict = json.load(file) #deserialising json into dict

    devices = []
    for device in device_dict.get('devices'):
        devices.append(Device(device['id'], device['record_name']))

    for device in devices:
        device.device_status(refresh) #device_status is an instance method

    return devices

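update_devices queries the devices one after another, which is fine for 14 servers but may become slow on a larger network. One possible variation, sketched below with a thread pool from the standard library, is to poll them concurrently. The `poll` function is a hypothetical stand-in for `device.device_status(refresh)` that merely sleeps, and `poll_all` and `max_workers=8` are arbitrary choices for the sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def poll(device_id):
    """Hypothetical stand-in for Device.device_status(); sleeps to mimic a slow query."""
    time.sleep(0.1)
    return device_id, "ok"


def poll_all(device_ids, max_workers=8):
    """Poll every device on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input ids
        return list(pool.map(poll, device_ids))


results = poll_all(range(1, 15))
print(results[0])  # prints (1, 'ok')
```

With real WMI calls, COM has to be initialised in every thread that uses it; device_status already calls pythoncom.CoInitialize() at the start, which should cover the worker threads too, but that is worth testing before relying on it.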

@app.route('/refresh_devices/', methods=['GET'])
@login_required
@logging
#@exception_handler
def refresh_devices():
    result = update_devices(True) #list of objects
    #### Convert single Python object to JSON
    # result = json.dumps(result.__dict__) # where object.__dict__ is the dictionary representation of Python object
    #### Convert Python list of objects to JSON
    result = json.dumps([r.dump() for r in result], indent=3) #dump() is an instance method (user-defined)
    return result

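To see what the browser actually receives from this route, the serialisation step can be run in isolation. The sketch below uses a hypothetical `DeviceStub` class with only three attributes in place of the full Device model, and a made-up OS version tuple.

```python
import json


class DeviceStub:
    """Hypothetical, cut-down stand-in for the Device class above."""

    def __init__(self, id, record_name, os_version):
        self.id = id
        self.record_name = record_name
        self.os_version = os_version  # a (caption, version) tuple, as in device_status

    def dump(self):
        return {"DeviceList": {"id": self.id,
                               "record_name": self.record_name,
                               "os_version": self.os_version}}


devices = [DeviceStub("1", "Server1", ("Example OS", "1.0")),
           DeviceStub("2", "Server2", ("Example OS", "1.0"))]

payload = json.dumps([d.dump() for d in devices], indent=3)
decoded = json.loads(payload)

print(decoded[1]["DeviceList"]["record_name"])  # prints Server2
print(decoded[1]["DeviceList"]["os_version"])   # prints ['Example OS', '1.0']
```

Note that Python tuples come out as JSON arrays, so on the JavaScript side os_version, disk_space and memory arrive as arrays, which show up comma-separated when assigned to a table cell.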

@app.route('/list_devices/', methods=['GET'])
@login_required
@logging
@exception_handler
@admin_only  # disable this for non-admin-restricted functions, and instead test for roles inside function
def list_devices():
    """
    List all returned records and display in template
    """
    devices = update_devices(False) #list of objects
    number_records = len(devices)

    return render_template('tableDevicesDiv.html', rows_per_page=session['rows_per_page'], whichList='Devices', title='eToolbox', devices=devices, numberRecords=number_records)



Then the View (template)
<!-- Copyright - Richard Orama, richardorama0@gmail.com - 2022 -->

{% extends "base.html" %}

{% block tableDevicesDiv %}

		<script type="text/javascript" charset="utf-8">
			window.onload = function() { //first wait till loaded, otherwise JQuery wont be available yet
				sessionStorage.setItem("whichList", "{{whichList}}");
				speechMessage("Devices page");
			}
		</script>

		<!-- load columns here sparingly. it can slow down system if you load unnecessarily -->

		<div class="table-responsive listTableDiv" id="mainTableDiv">
			<!-- table behavior below is erratic. specifying class="table-lg"  makes a small table instead -->
			<table class="table-sm listTable" id='mainTableDevice'>
				<thead class='highlight'>
					<tr class="noSelectedRow">
						<th style='width: 50px;' align='right'> <a id='btnAddDevice' class="fa fa-plus noSpinner" onclick="addDevice();" style='cursor: pointer; color: blue; text-decoration: none;' title='Add Device'></a> </th>
						<th style="width: 50px;">ID</th>
						<th>Name</th>
						<th style="width: 15%;">OS Version</th>
						<th style="width: 15%;">Disk Space</th>
						<th>Memory</th>
						<th>Processes</th>
						<th>Last Updated</th>
						<th>Notes</th>
						<th style='display: none;'></th>
						<th id='actions' style='width: 40px; text-align: center;' colspan='1'>Actions</th> <!-- set  and colspan to match number of VISIBLE columns in actions below -->
					</tr>
				</thead>
				<tbody id="mainTableTbody" class="listTableTbody">
					{% for device in devices %}
						{% set id=device['id'] %}
						{% set record_name=device['record_name'] %}
						{% set os_version=device['os_version'] %}
						{% set online=device['online'] %}
						{% set disk_space=device['disk_space'] %}
						{% set memory=device['memory'] %}
						{% set processes=device['processes'] %}
						{% set uptime=device['uptime'] %}
						{% set last_reboot=device['last_reboot'] %}
						{% set cpu_load=device['cpu_load'] %}
						{% set last_updated=device['last_updated'] %}
						{% set notes=device['notes'] %}
						{% set enabled=True %}

						{# note: to suppress None from appearing in table, use "value if value" as below #}

						<tr data-id={{id}} data-desc="{{record_name}}" data-model="Device" last_updated="{{last_updated}}" data-enabled="{{enabled}}" data-online="{{online}}" data-uptime="{{uptime}}" data-last_reboot="{{last_reboot}}" data-cpu_load="{{cpu_load}}" {% if not online %} style="color: red;" {% endif %} ondblclick="if (!loggedIn('{{current_user.email}}')) return false; viewEditDevice($(this).index(), {{id}}, '{{current_user.email}}', 'Edit', false)" title='{{uptime}}--{{last_reboot}} Double-click to edit {{id}}'>
							<td style='width: 50px;' align='right'> {{ loop.index }} </td>
							<td name="id" style="width: 50px;"> {{ id if id }}</td>
							<td name="record_name"> {{ record_name if record_name }}</td>
							<td name="os_version" style="width: 15%;"> {{ os_version if os_version }} </td>
							<td name="disk_space" style="width: 15%;"> {{ disk_space if disk_space }} </td>
							<td name="memory"> {{ memory if memory }} </td>
							<td name="processes"> {{ processes if processes }} </td>
							<td name="last_updated"> {{ last_updated if last_updated }} </td>
							<td name="notes"> {{ notes if notes }} </td>
							<!-- all links with prompts s.a. Sure to delete, etc must not start spinning immediately (in case it is cancelled). Therefore add class noSpinner -->
							<td style="width: 40px; text-align: center;"><a class="fa fa-eye" style="color:blue; cursor: pointer;" data-id={{id}} onclick="viewEditDevice($(this).parent().parent().index(), {{id}}, '{{current_user.email}}', 'View', false)" title="View id: {{id}} {{record_name}}"></a></td>
						</tr>

					{% endfor %}
				</tbody>
			</table>
		</div>


{% endblock %}


And finally, the JavaScript handler

document.addEventListener("DOMContentLoaded", (event) => {  //$(document).ready(function() {

	var timeInterval = 60000 * 5; //every 5 minutes

	intervalLogout = setInterval(function() {
    	refresh_devices(); //refresh devices
	}, timeInterval);
});	


const refresh_devices = () => {

	fetch(`/refresh_devices/`, { //trailing slash matches the Flask route and avoids a redirect
		method: 'GET', // or 'PUT'
		headers: {'Content-Type': 'application/json', 'X-Source-From-Fetch': 'True', 'X-Exempt-Spinner': 'True', },
		//body: JSON.stringify(data), //required for POST
	})
	.then(response => response.json()) //de-serialize/parse json to javascript object, can now access using response["key"] or response.key
	.then(data => {
		try {
			// update the device values without refreshing page
			// access attributes like data[0]["DeviceList"]["memory"];
			//loop over table rows and update row based on data

			var tbl = dqsa("#mainTableDevice tbody tr td:nth-child(2)");
			tbl.forEach(e => {

				//find object with the Id (content of the given cell - second column)

				// let myObj = data.find(obj => {
				// 	// Returns the object where the given property has some value
				// 	return parseInt(obj["DeviceList"]["id"]) === parseInt(e.textContent.toString());
				// })

				let myObj = data.find(obj => parseInt(obj["DeviceList"]["id"]) === parseInt(e.textContent));
				//let myObj = data.find(item => parseInt(item["DeviceList"]["id"]) === parseInt(e.textContent))
				if (!myObj) return; //no matching device in the response; leave this row unchanged
				const parentRow = e.parentElement;

				parentRow.setAttribute("data-online", myObj["DeviceList"]["online"]); //alert(parentRow.getAttribute("data-online"));
				parentRow.setAttribute("data-uptime", myObj["DeviceList"]["uptime"]);
				parentRow.setAttribute("data-last_reboot", myObj["DeviceList"]["last_reboot"]);
				parentRow.setAttribute("data-cpu_load", myObj["DeviceList"]["cpu_load"]);

				//if (parentRow.getAttribute("data-online") == "True") parentRow.style.color = "black";
				if (myObj["DeviceList"]["online"]) parentRow.style.color = "black";
				else parentRow.style.color = "red";

				//parentRow.classList.add("selectedRow");
				// use innerHTML instead of textContent to allow for html formatting
				parentRow.cells[1].innerHTML = myObj["DeviceList"]["id"];
				parentRow.cells[2].innerHTML = myObj["DeviceList"]["record_name"];
				parentRow.cells[3].innerHTML = myObj["DeviceList"]["os_version"];
				parentRow.cells[4].innerHTML = myObj["DeviceList"]["disk_space"];
				parentRow.cells[5].innerHTML = myObj["DeviceList"]["memory"];
				parentRow.cells[6].innerHTML = myObj["DeviceList"]["processes"];
				parentRow.cells[7].innerHTML = myObj["DeviceList"]["last_updated"];
				parentRow.cells[8].innerHTML = myObj["DeviceList"]["notes"];
				//dqsa("#mainTableDevice tbody tr").forEach(e => { e.classList.remove("selectedRow") });
			});
			speechMessage("Devices ", "refreshed");
		}
		catch(error) {alert(`File: ${error.fileName} \n Line No: ${error.lineNumber} \n Message: ${error.message}`);
		} //string is not JSON object
	})
	.catch((error) => {
		console.error('Error:', error);
	});
}


The above code, when put together properly with frontend components such as Bootstrap, will generate the product shown earlier. A proper understanding of the WMI code in the instance methods of the class definition is paramount.