9
CREATING AN ASSET AND VULNERABILITY DATABASE

In this chapter, you’ll learn to get the outputs from OpenVAS and Nmap into your Mongo database. You’ll start by exploring some general practices for working with data. Next, you’ll look at the XML output formats of both tools and learn to select the specific data fields you’re interested in. Then you’ll walk through a few Python scripts that will collect all this data, generate Mongo documents, and insert those documents into the database.

Preparing the Database

To design a database, you’ll need to understand your desired outcomes and the analysis you’ll have to do to achieve those outcomes. Then you can think about the data you’ll have to collect and the model you need to make those analyses possible.

In this situation, you want a more secure environment to improve your vulnerability posture. To make that improvement, you need information about the hosts you’re trying to secure, and that information comes in two varieties: persistent and dynamic.

Persistent information doesn’t change (or rarely changes); dynamic information changes frequently. Your environment dictates how a data point is categorized. For example, in some networks, IP addresses are statically assigned, but in others, dynamic host configuration protocol (DHCP) might assign a different address after every reboot or even daily. Persistent data will be collected once and updated as needed. Dynamic data associated with a device will be updated every time a vulnerability scan is run against that device.

Table 9-1 describes the host-based data that we’ll collect from the Nmap and OpenVAS scans and how it’s categorized.

Table 9-1: Relevant Host Data

Persistent data:
- Hostname: If available. In some cases, multiple hostnames might be reported.
- MAC address: If available.
- IP address: IPv4 address. If your environment uses IPv6, you can modify the scripts accordingly to capture this information.
- OS / CPE: The OS version that was detected, including CPE if available.

Dynamic data:
- Vulnerabilities: Includes details reported by OpenVAS and a reference to the cve-search CVE entry.
- Ports: Ports that are open (listening for incoming data), including number, protocol, and detected service.
- Last scan date: Automatically generated.

In the record (document) for each host, you’ll include a vulnerability identifier for each vulnerability discovered on that host. You can use this identifier to relate hosts to specific vulnerability and exploit information. A good deal of data related to each vulnerability is stored in its own dataset (collection, in Mongo parlance). This means that you’ll have a collection of hosts and a collection of vulnerabilities with mappings from the former to the latter.

In other words, vulnerability information is orthogonal to hosts: one host might have one or more vulnerabilities, but each vulnerability is a data item, which you can relate to one or more hosts. The same is true of exploits. Table 9-2 contains the vulnerability data you’ll collect in the scripts later in the chapter.
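The host-to-vulnerability mapping described here can be sketched with plain Python dictionaries before any database is involved. The field names below mirror the documents shown later in Listings 9-1 and 9-3, but the values are invented for illustration:

```python
# A host document carries OID references; the vulnerability "collection"
# carries one record per unique OID. (Sample data is hypothetical.)
host = {
    "ip": "10.0.1.1",
    "oids": [
        {"proto": "tcp", "oid": "1.3.6.1.4.1.25623.1.0.80091", "port": "general"},
    ],
}

vulnerabilities = [
    {"oid": "1.3.6.1.4.1.25623.1.0.80091", "name": "Example finding", "cvss": 5.0},
    {"oid": "1.3.6.1.4.1.25623.1.0.105354", "name": "Unrelated finding", "cvss": 10.0},
]

def vulns_for_host(host, vulnerabilities):
    """Resolve a host's OID references to full vulnerability records."""
    wanted = {entry["oid"] for entry in host["oids"]}
    return [v for v in vulnerabilities if v["oid"] in wanted]

matched = vulns_for_host(host, vulnerabilities)
```

In Mongo, the same resolution happens with a query against the vulnerabilities collection for each OID listed in the host document.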

Table 9-2: Relevant Vulnerability Data

- CVE/BID ID: CVE or Bugtraq ID for the vulnerability. This is an industry-standard identifier.
- Date reported: When the vulnerability was first reported—by the vendor, by a third party, or by active exploitation.
- Affected software: Names and CPEs of software (or OS) affected by the vulnerability.
- CVSS: CVSS score for the vulnerability.
- Description: Free-form text description of the vulnerability.
- Advisory URLs: URLs pointing to advisories about the vulnerability that might contain more information.
- Update URLs: URLs pointing to update information for addressing the vulnerability.

Understanding the Database Structure

Although MongoDB can accept unstructured data, your scripting and data analysis will be much easier if you have some idea of the data types you’re trying to capture and how you want to structure them in your Mongo documents. If you’re using a relational database, such as MySQL, this step is absolutely essential, because you can’t insert data into an empty database with no structure.

RELATIONAL VS. NON-RELATIONAL DATABASES

A full discussion of the difference between database types could be its own substantial volume and is beyond the scope of this book. But I’ll provide a crash course.

A relational database is what most people think of when you talk about databases: it’s a database that contains tables consisting of rows and columns of structured data. Each row is identified by a unique key. Connections between tables are made by sharing key values, so a value in one table might point to an entire row of data in another table—hence the name “relational.”

Databases that don’t share this structure are non-relational databases. A wide variety of databases exist under this heading, including MongoDB. These non-relational databases might be as simple as a list of key-value pairs, but they can also allow you to arbitrarily structure data.

Let’s look at an example. A relational database might have a table labeled NAME with columns FirstName, MiddleName, and LastName, like so:

FirstName    MiddleName    LastName
---------    ----------    ---------
Andrew       Philip        Magnusson
Jorge        Luis          Borges

But a rigid structure like this doesn’t always make sense: the standard first-middle-last paradigm can be difficult to map onto names in other cultures. For instance, Jorge Luis Borges, the Argentine author, had several more names: Jorge Francisco Isidoro Luis Borges Acevedo. In MongoDB, you might have a collection (roughly analogous to a table) called NAME that can include all sorts of name structures in their own documents (roughly analogous to data rows), for instance:

{
    "FirstName":"Jorge",
    "MiddleNames": ["Francisco","Isidoro","Luis"],
    "LastNames":["Borges","Acevedo"]
}

A simpler name might just have FirstName and a single value in LastNames:

{
    "FirstName":"Alexander",
    "LastNames":"Lovelace"
}

Each type of database has its advantages. Relational databases have a predefined data structure that makes queries, indexing, and database maintenance very fast, but at a cost: the structure is rigid and cannot be changed without major effort, especially once the database is being used in a production environment. Non-relational databases let you define your data structure more loosely and change it as you go, offering flexibility. But they’ll never be quite as fast, and you can’t count on specific data fields existing in your documents. In the preceding NAME data example, you’ll need to make sure your code is sufficiently robust to not crash when a MiddleNames field isn’t present.

Now that you have an idea of the kinds of data you want to collect, you can start building your database structure. In this section, we’ll look at how you might represent this data in Mongo or in SQL formats.

Listing 9-1 shows an example of host data in JavaScript Object Notation (JSON) format, which is representative of the MongoDB internal data structure. Strictly speaking, Mongo stores its data in binary JSON (BSON), a more compact way of representing the same data as JSON. But for the purposes of interacting with Mongo, you’ll use JSON.

{
    "_id" : ObjectId("57d734bf1d41c8b71daaee0e"),
    "mac" : {
        "vendor" : "Apple",
        "addr" : "6C:70:9F:D0:31:6F"
    },
    "ip" : "10.0.1.1",
    "ports" : [
        {
            "state" : "open",
            "port" : "53",
            "proto" : "tcp",
            "service" : "domain"
        },
--snip--
    ],
    "updated" : ISODate("2020-01-05T02:19:11.966Z"),
    "hostnames" : [
        "airport",
        "airport.local"
    ],
    "os" : [
        {
            "cpe" : [
                "cpe:/o:netbsd:netbsd:5"
            ],
            "osname" : "NetBSD 5.0 - 5.99.5",
            "accuracy" : "100"
        }
    ],
    "oids" : [
        {
            "proto" : "tcp",
            "oid" : "1.3.6.1.4.1.25623.1.0.80091",
            "port" : "general"
        }
    ]
}

Listing 9-1: An example JSON host description document

NOTE

In the JSON format, keys and values are delineated by double quotes. A key is a unique string that labels the following value. A value can be a simple string, a nested JSON document (delineated by curly brackets), or a list of strings or nested documents (surrounded by square brackets). This format lets you build a sophisticated data structure that is easy to parse and traverse.

In Listing 9-1, the _id field, auto-generated by Mongo, uniquely identifies the document within the database. The value of the mac field is a nested document that contains the MAC address and the MAC vendor. The ports key contains a list of documents that each contain information about an open port. Because a host often has different hostnames depending on which one you ask—domain name system (DNS) servers might use one name and a NetBIOS lookup something else—hostnames is a list instead of a single value. The oids key contains a list of documents containing an OID, a protocol, and a port that OID was detected on. The OID is a unique vulnerability identifier generated by OpenVAS that you’ll use to map vulnerabilities to hosts. In the vulnerabilities collection, there will be one unique document (representing a specific vulnerability) for each unique OID.
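Because the document is plain JSON, traversing it takes only standard tools. Here is a minimal sketch using Python’s json module against a trimmed version of Listing 9-1 (Mongo-specific types like ObjectId and ISODate are omitted, since plain JSON has no equivalent):

```python
import json

# A trimmed, valid-JSON version of the host document in Listing 9-1.
doc = json.loads('''
{
  "ip": "10.0.1.1",
  "mac": {"vendor": "Apple", "addr": "6C:70:9F:D0:31:6F"},
  "ports": [
    {"state": "open", "port": "53", "proto": "tcp", "service": "domain"}
  ],
  "hostnames": ["airport", "airport.local"],
  "oids": [
    {"proto": "tcp", "oid": "1.3.6.1.4.1.25623.1.0.80091", "port": "general"}
  ]
}
''')

# Nested documents become dicts; lists become lists.
vendor = doc["mac"]["vendor"]
open_tcp = [p["port"] for p in doc["ports"]
            if p["proto"] == "tcp" and p["state"] == "open"]
```

The same traversal patterns apply when pymongo hands you documents back from a query, since it returns them as Python dicts.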

SQL TABLE STRUCTURE

If you’re using SQL, you’ll need to know the data type you’ll be storing for each data field to define your database tables. Here are example table definitions you might use in SQL. Keep in mind that the definitions in Listing 9-2 are not perfectly optimized and there are ways to improve the structure for large databases—see “Customize It” on page 86 for more. (The following definitions are for MySQL; you might need to adjust for other SQL flavors.)

 CREATE TABLE hosts
       (macid CHAR(17), macvendor VARCHAR(25),
       ip VARCHAR(15), hostname VARCHAR(100),
       updated DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
       id INT AUTO_INCREMENT PRIMARY KEY);
 CREATE TABLE ports
       (id INT AUTO_INCREMENT PRIMARY KEY,
       host_id INT NOT NULL, state VARCHAR(6),
       port INT, protocol VARCHAR(3), service VARCHAR(25),
       FOREIGN KEY(host_id) REFERENCES hosts(id));
 CREATE TABLE os
       (id INT AUTO_INCREMENT PRIMARY KEY, host_id INT NOT NULL,
       cpe VARCHAR(50), osname VARCHAR(50), accuracy INT,
       FOREIGN KEY(host_id) REFERENCES hosts(id));
 CREATE TABLE hostoid
       (id INT AUTO_INCREMENT PRIMARY KEY,
       oid_id INT NOT NULL, host_id INT NOT NULL,
       FOREIGN KEY(oid_id) REFERENCES oids(id),
       FOREIGN KEY(host_id) REFERENCES hosts(id));

Listing 9-2: MySQL table definitions for host data

These commands will create tables in an existing SQL database. Because SQL doesn’t nest data directly, as you can do in Mongo, you’ll need to split your data into multiple tables. Here you have table definitions for hosts, ports, and OS information, and a table mapping hosts to OIDs. You’ll need to use keys to bring it all together.

A key in SQL uniquely identifies an individual record in a specific table; other tables can then refer back to it using foreign keys. For example, the ports table has a field (host_id) that is explicitly defined as a foreign key referencing the id field in the hosts table. This key lets you query the database and find port information for a specific host. The same lines appear in the os definition, and all together the three tables are linked to provide direct access to all the persistent host information you need. An updated field is automatically changed every time a hosts record is changed.
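To see the foreign key relationship in action without standing up a MySQL server, here is a sketch using Python’s built-in sqlite3 module, with the hosts and ports tables from Listing 9-2 adapted to SQLite syntax and simplified (the sample row values are invented):

```python
import sqlite3

# In-memory SQLite stand-in for the hosts/ports schema in Listing 9-2.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE hosts
    (id INTEGER PRIMARY KEY AUTOINCREMENT,
     ip VARCHAR(15), hostname VARCHAR(100))""")
cur.execute("""CREATE TABLE ports
    (id INTEGER PRIMARY KEY AUTOINCREMENT,
     host_id INTEGER NOT NULL, state VARCHAR(6),
     port INTEGER, protocol VARCHAR(3), service VARCHAR(25),
     FOREIGN KEY(host_id) REFERENCES hosts(id))""")

cur.execute("INSERT INTO hosts (ip, hostname) VALUES ('10.0.1.1', 'airport')")
host_id = cur.lastrowid  # the key the ports table will point back to
cur.execute("INSERT INTO ports (host_id, state, port, protocol, service) "
            "VALUES (?, 'open', 53, 'tcp', 'domain')", (host_id,))

# The foreign key lets us join port rows back to their host.
rows = cur.execute("""SELECT hosts.ip, ports.port, ports.service
                      FROM ports JOIN hosts ON ports.host_id = hosts.id
                      WHERE hosts.ip = '10.0.1.1'""").fetchall()
```

The JOIN on host_id is exactly the lookup described above: given a host, find its port records.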

Now let’s look at the vulnerability data in Listing 9-3. The JSON for a vulnerability document contains all the information listed in Table 9-2, as well as some extra fields reported by the OpenVAS scanner. If space is at a premium, you might not need to record all this information in your database. But if you have the space, it can’t hurt to keep it around for future use.

{
    "_id" : ObjectId("57fc2c891d41c805cf22111b"),
    "oid" : "1.3.6.1.4.1.25623.1.0.105354",
    "summary" : "The remote GSA is prone to a default account authentication
                bypass vulnerability.",
    "cvss" : 10,
    "vuldetect" : "Try to login with default credentials.",
    "solution_type" : "Workaround",
    "qod_type" : "exploit",
    "bid" : "NOBID",
    "threat" : "High",
    "description" : null,
    "proto" : "tcp",
    "insight" : "It was possible to login with default
                credentials: admin/admin",
    "family" : "Default Accounts",
    "solution" : "Change the password.",
    "xref" : "NOXREF",
    "port" : "443",
    "impact" : "This issue may be exploited by a remote attacker to gain
               access to sensitive information or modify system
               configuration.",
    "cve" : [
        "NOCVE"
    ],
    "name" : "GSA Default Admin Credentials",
    "updated" : ISODate("2016-10-11T00:04:25.596Z"),
    "cvss_base_vector" : "AV:N/AC:L/Au:N/C:C/I:C/A:C"
}

Listing 9-3: An example JSON vulnerability description document

As with Listing 9-1, much of this data is fairly self-explanatory, but there are a few important points to note. The oid value can be added directly to the list of oids in the host document: each vulnerability that OpenVAS finds on a host will be given a separate OID. So the OID will be put into the host document, and the details of the OID will be in the vulnerability collection. When you need to report on vulnerabilities found on a given host, you’ll first retrieve the host record and then retrieve the records associated with any OIDs recorded for that host. The cve key has a list as its value, because individual vulnerabilities are often associated with more than one CVE. In this example, the only CVE reported is NOCVE, which is a standard placeholder when MITRE has not assigned a CVE ID to a vulnerability.

As you proceed through the examples that follow, consider how the scripts build documents in Mongo and make any necessary adjustments for your own needs.

Customize It

Think carefully about your needs and customize the information you collect accordingly. For example, let’s say your organization has specific VLANs set up for different purposes. You can add a key-value pair in your Mongo database to specify which VLAN a host is located on, or customize your scripts to determine the VLAN from the host’s IP address, to facilitate analysis of which hosts are on which network segments. Depending on your network configuration, this might require consulting external systems or databases.

Return to your data definitions as you think about the new information you’re gathering and how you want to store it. If you’re using an unstructured database like Mongo, it’s easy enough—just add key-value pairs. But if you’re using SQL, you’ll need to reconfigure your database to define those new data fields.

If you’re using SQL, you can optimize the table structure to save some space when handling large datasets. For example, instead of a ports table that maps directly to hosts records using records like port, service, protocol, host ID, you could have a third table, host-to-port mappings, so any one port record could map to any number of hosts, and vice versa. Make a hosts table with host ID and other fields; a ports table with port, service, protocol records; and a host-port table with port ID and host ID records. In small environments, the difference is minimal, but in larger organizations, the space savings could be substantial.

Getting Nmap into the Database

Now you’ll start connecting the tools you’ve deployed. First, you’ll need to write a script to input your Nmap scan data into the Mongo database.

Defining the Requirements

As with any script, decide what you want your ingestion script to accomplish: namely, collecting the host data laid out in Table 9-1. We’ll start this process by looking at the Nmap XML output to define which portions of it are important.

Listing 9-4 shows a segment of XML output from Nmap (run with the OS detection flag -O).

  --snip--
  <host starttime="1473621609" endtime="1473627403"><status state="up"
  reason="arp-response" reason_ttl="0"/>
 <address addr="10.0.1.4" addrtype="ipv4"/>
 <address addr="B8:E8:56:15:68:20" addrtype="mac" vendor="Apple"/>
 <hostnames>
  </hostnames>
 <ports><extraports state="filtered" count="997">
  <extrareasons reason="no-responses" count="997"/>
  </extraports>
  <port protocol="tcp" portid="22"><state state="open" reason="syn-ack" 
  reason_ttl="64"/><service name="ssh" method="table" conf="3"/></port>
  --snip--
  </ports>
 <os><portused state="open" proto="tcp" portid="22"/>
  <osmatch name="Apple Mac OS X 10.10.2 (Darwin 14.1.0)" 
  accuracy="100" line="4734">
  <osclass type="general purpose" vendor="Apple" osfamily="Mac OS X" osgen="10.10.X" accuracy="100"><cpe>cpe:/o:apple:mac_os_x:10.10.2</cpe>
  </osclass>
  </osmatch>
  <osmatch name="Apple Mac OS X 10.7.0 (Lion) - 10.10 (Yosemite) 
  or iOS 4.1 - 8.3 (Darwin 10.0.0 - 14.5.0)" accuracy="100" line="6164">
  --snip--
  <osclass type="phone" vendor="Apple" osfamily="iOS" osgen="4.X" 
 accuracy="100"><cpe>cpe:/o:apple:iphone_os:4</cpe></osclass>
  <osclass type="phone" vendor="Apple" osfamily="iOS" osgen="5.X" 
  accuracy="100"><cpe>cpe:/o:apple:iphone_os:5</cpe></osclass>
  --snip--

Listing 9-4: Excerpt from Nmap XML output

We need to parse out the IP address; MAC address; hostname (if it’s available—it’s not here); each open port with its protocol, port number, state (open or closed), and a guess at the service running on that port; and OS matches. The MAC address also depends on availability: if the destination host is more than one hop from the scanner, the MAC address likely belongs to a router or switch rather than the actual host.

We record all the returned osmatch values along with corresponding CPE labels and accuracy tags as a list to reflect Nmap’s uncertainty about the match. In this example, multiple CPEs are reported as 100 percent accuracy matches; when you produce a report for this host, you’ll have to report all of them, report none, or choose one based on other criteria.
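Extracting the osmatch list can be sketched with Python’s standard ElementTree parser. The fragment below is trimmed from Listing 9-4 to just the fields we parse:

```python
import xml.etree.ElementTree as ET

# A minimal fragment modeled on Listing 9-4.
xml = '''<host>
  <os>
    <osmatch name="Apple Mac OS X 10.10.2 (Darwin 14.1.0)" accuracy="100">
      <osclass type="general purpose" vendor="Apple" accuracy="100">
        <cpe>cpe:/o:apple:mac_os_x:10.10.2</cpe>
      </osclass>
    </osmatch>
  </os>
</host>'''

host = ET.fromstring(xml)
os_list = []
for osmatch in host.find("os").findall("osmatch"):
    # each osmatch may carry several osclass/cpe children
    cpes = [osclass.findtext("cpe") for osclass in osmatch.findall("osclass")]
    os_list.append({"osname": osmatch.get("name"),
                    "accuracy": osmatch.get("accuracy"),
                    "cpe": cpes})
```

The resulting list of dicts is the same shape as the os field in the host document from Listing 9-1.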

You need to associate all this information with a single host document and distinguish that host document using a field that’s present in every scan result. The hostname and MAC address might not be present or accurate, so we use the IP address. If IP addresses change regularly in your DHCP environment, a Windows NetBIOS name might be a better choice, provided you can guarantee continuity; NetBIOS names must be unique within a Windows domain.

You must also decide whether you want to create new host documents for each scan or update existing documents. The reason is that in most use cases, only some of the data changes from scan to scan—most prominently, the list of vulnerabilities. It will save time and effort to update an existing document with the new information you collect.
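The update-or-insert decision can be shown with a pure-Python stand-in for the hosts collection. This is only a sketch of the logic (the real script issues the equivalent pymongo calls against a live database):

```python
# In-memory stand-in for the Mongo hosts collection.
hosts = []

def upsert_host(hosts, new_doc):
    """Overwrite the document for new_doc['ip'] if one exists, else insert it."""
    for i, existing in enumerate(hosts):
        if existing["ip"] == new_doc["ip"]:
            hosts[i] = {**existing, **new_doc}  # analogous to Mongo's $set
            return "updated"
    hosts.append(new_doc)
    return "inserted"

first = upsert_host(hosts, {"ip": "10.0.1.4", "ports": [{"port": "22"}]})
second = upsert_host(hosts, {"ip": "10.0.1.4",
                             "ports": [{"port": "22"}, {"port": "80"}]})
```

A second scan of the same IP replaces the dynamic fields rather than creating a duplicate document, which is exactly the behavior you want when only some data changes between scans.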

Building the Script

In Listing 9-5, IP addresses are authoritative, and new data for existing hosts will overwrite old data. Of course, your requirements might differ. The script iterates through an Nmap output XML file and inserts relevant information into a Mongo database.

All the required information for a host is contained within a host tag, so you need a simple loop to find each host tag, pull the appropriate subtags, and then do a Mongo document insert for each host into the Mongo hosts database.

#!/usr/bin/env python3

from xml.etree.cElementTree import iterparse
from pymongo import MongoClient
import datetime, sys

client = MongoClient('mongodb://localhost:27017')
db = client['vulnmgt']

def usage():
    print('''
Usage: $ nmap-insert.py <infile>
    ''')

def main():
    if len(sys.argv) < 2:  # no input file specified
        usage()
        exit(0)

    infile = open(sys.argv[1], 'r')

    for event, elem in iterparse(infile):
        if elem.tag == "host":
            # add some defaults in case these come up empty
            macaddr = {}
            hostnames = []
            os = []
            ipaddr = None
            # all addresses: IPv4, IPv6 (if it exists), MAC
            for addr in elem.findall("address"):
                addrtype = addr.get("addrtype")
                if addrtype == "ipv4":
                    ipaddr = addr.get("addr")
                if addrtype == "mac":  # there are two useful things to get here
                    macaddr = {"addr": addr.get("addr"),
                               "vendor": addr.get("vendor")}

            # hostname elements are nested inside a hostnames block
            for host in elem.findall("hostnames/hostname"):
                hostnames += [{"name": host.get("name"),
                               "type": host.get("type")}]

            # OS detection
            # We will be conservative and put it all in there.
            osnode = elem.find("os")  # absent if -O wasn't used
            oslist = osnode.findall("osmatch") if osnode is not None else []
            for oseach in oslist:
                cpelist = []
                for osclass in oseach.findall("osclass"):
                    cpelist.append(osclass.findtext("cpe"))
                os += [{"osname": oseach.get("name"),
                        "accuracy": oseach.get("accuracy"),
                        "cpe": cpelist}]

            ports = []
            for port in elem.find("ports").findall("port"):
                ports += [{"proto": port.get("protocol"),
                           "port": port.get("portid"),
                           "state": port.find("state").get("state"),
                           "service": port.find("service").get("name")
                           }]
            elem.clear()

            host = {"ip": ipaddr,
                    "hostnames": hostnames,
                    "mac": macaddr,
                    "ports": ports,
                    "os": os,
                    "updated": datetime.datetime.utcnow()
                    }
            if db.hosts.count_documents({'ip': ipaddr}) > 0:
                db.hosts.update_one(
                        {"ip": ipaddr},
                        {"$set": host}
                        )
            else:
                db.hosts.insert_one(host)
    infile.close()  # We're done.

main()

Listing 9-5: The nmap-insert.py code listing for Nmap database insertion

We import iterparse from the xml library for the XML parsing, MongoClient from pymongo to interact with the database, and datetime and sys for generating the current date and reading command line arguments, respectively. Fill in your Mongo server IP and database information.

We encapsulate the main logic in a main() function that we call at the end of the script listing. This function first opens the input file, which is passed as an argument to the script, loops through every XML element, and gathers details from each host element. Then it inserts or updates a Mongo document with that information for each host. The script takes the IP address as the canonical identifier of the host, creates a new document if the IP doesn’t exist yet, and updates the existing document if it does. Once the script runs out of XML to parse, it closes the input file and exits.

Customize It

If your organization uses IPv6, or if IPv6 has finally taken over when you’re reading this, record the IPv6 address instead of ignoring it as Listing 9-5 does. Keep in mind that IPv6 is no more authoritative than IPv4; a single host might have multiple (and changing!) IPv6 addresses.
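One way to keep IPv6 addresses is to extend the address loop rather than skipping the ipv6 address type; because a host may have several, collect them in a list. The XML fragment below is hypothetical but follows Nmap’s address element format:

```python
import xml.etree.ElementTree as ET

# Hypothetical host element with IPv4, IPv6, and MAC addresses,
# in the format Nmap uses for its address elements.
xml = '''<host>
  <address addr="10.0.1.4" addrtype="ipv4"/>
  <address addr="fe80::1" addrtype="ipv6"/>
  <address addr="B8:E8:56:15:68:20" addrtype="mac" vendor="Apple"/>
</host>'''

host = ET.fromstring(xml)
ipaddr = None
ip6addrs = []  # a host might report multiple (and changing!) IPv6 addresses
for addr in host.findall("address"):
    addrtype = addr.get("addrtype")
    if addrtype == "ipv4":
        ipaddr = addr.get("addr")
    elif addrtype == "ipv6":
        ip6addrs.append(addr.get("addr"))
```

You could then store ip6addrs as an additional list field in the host document.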

You can modify the script in Listing 9-5 to capture more (or all) of the Nmap output for the database. Say you want to track Nmap’s run statistics. Listing 9-6 shows a Python snippet that parses the runstats XML block; you might place it alongside the if elem.tag == "host": test inside the parsing loop.

         if elem.tag == "runstats":
             finished = elem.find("finished")
             hosts = elem.find("hosts")
             elapsed = finished.get("elapsed")
             summary = finished.get("summary")
             hostsUp = hosts.get("up")
             hostsDown = hosts.get("down")
             hostsTotal = hosts.get("total")

Listing 9-6: Python snippet to parse runstats block

You can also add your own key-value pairs to the Mongo document. For example, a high-value host tag will help prioritize vulnerabilities, because vulnerabilities on high-value hosts are more pressing than those on normal or low-value systems.

You can write one script to collect the data and insert it into the database. Although this is more complicated to build, it simplifies scheduling because you only need to run one script rather than several. (Hint: use the -oX - flags to send the Nmap output to STDOUT, which you can then pipe into the insertion script.)

Instead of manually parsing the XML, you can use libraries. Two Python modules are available to control Nmap and parse its results: python-nmap and python-libnmap. Try both to see which you prefer.

BUILD OR BORROW?

A well-known dilemma you might face is whether you should write your own code or use existing Python modules to do much of the work for you. Using existing modules eliminates a lot of manual work, but it leads to more software packages to think about and keep up-to-date. In this script, I chose manual coding and more control, but both options are perfectly legitimate.

Getting OpenVAS into the Database

Once you’ve extracted the relevant Nmap data and inserted it into the database, the next step is to do the same for OpenVAS.

Defining the Requirements

OpenVAS organizes its output by result: any finding, from service detection to a specific vulnerability. Listing 9-7 shows a result for CVE-2016-2183 and CVE-2016-6329 on host 10.0.1.21.

--snip--
<result id="a3e8107e-0e6c-49b0-998b-739ef8ae0949">
      <name>SSL/TLS: Report Vulnerable Cipher Suites for HTTPS</name>
        <comment/>
        <creation_time>2017-12-29T19:06:23Z</creation_time>
        <modification_time>2017-12-29T19:06:23Z</modification_time>
        <user_tags>
          <count>0</count>
        </user_tags>
      <host>10.0.1.21<asset asset_id="5b8d8ed0-e0b1-42e0-
        b164-d464bc779941"/></host>
      <port>4000/tcp</port>
        <nvt oid="1.3.6.1.4.1.25623.1.0.108031">
          <type>nvt</type>
        <name>SSL/TLS: Report Vulnerable Cipher Suites for HTTPS</name>
          <family>SSL and TLS</family>
        <cvss_base>5.0</cvss_base>
        <cve>CVE-2016-2183, CVE-2016-6329</cve>
        <bid>NOBID</bid>
        <xref>URL:https://bettercrypto.org/, URL:https://mozilla.github.io
          /server-side-tls/ssl-config-generator/, URL:https://sweet32.info/
          </xref>
        <tags>cvss_base_vector=AV:N/AC:L/Au:N/C:P/I:N/A:N|summary=This 
          routine reports all SSL/TLS cipher suites accepted by a 
          service where attack vectors exists only on HTTPS services.
          |solution=The configuration of this service should be changed so 
          that it does not accept the listed cipher suites anymore.
         
Please see the references for more resources supporting you with this task.
|insight=These rules are applied for the evaluation of the vulnerable 
cipher suites:

  - 64-bit block cipher 3DES vulnerable to the SWEET32 attack (CVE-2016-2183).
  |affected=Services accepting vulnerable SSL/TLS cipher suites via HTTPS.
  |solution_type=Mitigation|qod_type=remote_app</tags>
        <cert>
           <cert_ref id="CB-K17/1980" type="CERT-Bund"/>
           <cert_ref id="CB-K17/1871" type="CERT-Bund"/>
           <cert_ref id="CB-K17/1803" type="CERT-Bund"/>
           <cert_ref id="CB-K17/1753" type="CERT-Bund"/>
--snip--
     </result>
--snip--
</results>

Listing 9-7: Example result block from an OpenVAS XML scan report

So what information do you want from this scan? Recall that Table 9-2 identified relevant vulnerability data: CVE, BID, date reported, affected software, CVSS, description, advisory URLs, and update URLs. OpenVAS reports most of this information, as well as the host and port associated with the finding.

The cert section includes links to known computer emergency response team (CERT) advisories. Although the sample script in Listing 9-8 ignores this section, parse this data if it’s important to you.
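The tags element in Listing 9-7 packs several of these fields into one pipe-delimited string of key=value pairs. A sketch of unpacking it (the sample string is abridged from that listing):

```python
# Abridged tags value from Listing 9-7: key=value pairs joined with |.
tags = ("cvss_base_vector=AV:N/AC:L/Au:N/C:P/I:N/A:N"
        "|summary=This routine reports all SSL/TLS cipher suites accepted"
        " by a service"
        "|solution_type=Mitigation|qod_type=remote_app")

result = {}
for item in tags.split("|"):
    # split on the first = only, since values may contain = themselves
    tagname, tagvalue = item.split("=", 1)
    result[tagname] = tagvalue
```

This is the same technique the insertion script in Listing 9-8 uses to flatten the tags block into individual document fields.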

Mapping Vulnerabilities to Hosts

Most important is how you’ll structure all the data. The two different sets of data, vulnerabilities and hosts, have implicit mappings between them: host A has vulnerabilities X, Y, Z. Vulnerability X is on hosts A, B, C. There are two obvious ways to represent this mapping. Each host can have a list of vulnerabilities it’s subject to: host A would have a vulnerabilities list of X, Y, Z within its structure. Alternatively, you could use the same mapping on the vulnerability side. Vulnerability X would have a host tag that contains the list of hosts A, B, C.

Both options are valid, and both are one-sided. If you store the data with the vulnerabilities, host reporting is difficult: you have to look for all the places that host A appears across the entire vulnerability database. The reverse is true if you store all the vulnerability IDs with the hosts. In addition, if you store both mappings in both places, you risk ending up with stale or orphaned mappings. Choose one depending on whether you want to more easily report on vulnerabilities (and which hosts are affected) or hosts (and which vulnerabilities they have).
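The two representations are also interconvertible: given a host-side mapping, you can derive the vulnerability-side one. A quick sketch, using placeholder host and vulnerability names:

```python
# Host-side mapping: each host lists its vulnerabilities.
host_to_vulns = {
    "hostA": ["X", "Y", "Z"],
    "hostB": ["X"],
}

# Invert it into the vulnerability-side mapping: each vulnerability
# lists its affected hosts.
vuln_to_hosts = {}
for host, vulns in host_to_vulns.items():
    for vuln in vulns:
        vuln_to_hosts.setdefault(vuln, []).append(host)
```

Because the inversion is cheap, storing only one direction and deriving the other at reporting time avoids the stale-mapping problem mentioned above.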

The script in Listing 9-8 goes with option 1: embedding vulnerability identifiers in each host document. Host documents are likely to be longer-lived than vulnerability documents. If your hosts are subject to a regular patching regimen (which I understand is a tall order in some organizations), the host documents will remain for the long term. But the vulnerability documents, because they’re patched on a regular basis, will age out of the database, using scripts you’ll see in Chapter 10. If this assumption doesn’t hold in your organization, you might want to use option 2.

MAPPINGS IN SQL

There is a third mapping option that makes sense when you’re using SQL. If you store the mappings in a third table with 1:1 mappings of hosts and vulnerabilities, you only need to search in one place for both kinds of reporting. For example, “host A has vulnerability X” would be one record. Another would be “host A has vulnerability Y” and so on. When reporting, you’d first find the mappings you’re interested in (“all vulnerabilities on host A”) and then use the other two tables to flesh out the details of host A and its vulnerabilities X, Y, and Z. In a schematic, it would look something like this:

  1. Query mapping table: find all records pertaining to host A and gather associated vulnerabilities in set B.
  2. Query host table: find details of host A.
  3. Query vulnerabilities table: find all records pertaining to vulnerability set B.

Experienced SQL users do this with JOIN statements in a single query; amateurs like me find it easier to run a few queries in sequence. This third solution is an example of database normalization. For more information, consult Wikipedia or your nearest computer bookshelf.
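The three-query schematic above can be run against a minimal SQLite stand-in for the three tables, using Python’s built-in sqlite3 module (schema and sample values are simplified for illustration):

```python
import sqlite3

# Minimal versions of the hosts, vulnerabilities, and mapping tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE hosts (id INTEGER PRIMARY KEY, ip TEXT)")
cur.execute("CREATE TABLE vulns (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE hostvuln (host_id INTEGER, vuln_id INTEGER)")
cur.execute("INSERT INTO hosts VALUES (1, '10.0.1.1')")
cur.executemany("INSERT INTO vulns VALUES (?, ?)",
                [(1, "vuln X"), (2, "vuln Y"), (3, "vuln Z")])
cur.executemany("INSERT INTO hostvuln VALUES (?, ?)", [(1, 1), (1, 2), (1, 3)])

# 1. Query the mapping table: vulnerabilities pertaining to host 1.
vuln_ids = [r[0] for r in cur.execute(
    "SELECT vuln_id FROM hostvuln WHERE host_id = 1 ORDER BY vuln_id")]
# 2. Query the host table: details of host 1.
ip = cur.execute("SELECT ip FROM hosts WHERE id = 1").fetchone()[0]
# 3. Query the vulnerabilities table: details of that vulnerability set.
placeholders = ",".join("?" * len(vuln_ids))
names = [r[0] for r in cur.execute(
    "SELECT name FROM vulns WHERE id IN (%s) ORDER BY id" % placeholders,
    vuln_ids)]
```

An experienced SQL user would collapse steps 1 and 3 into a single JOIN, but the sequential version mirrors the schematic step for step.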

Building the Script

The script in Listing 9-8 iterates over the result tags, pulls the relevant data, and then sends that data to the database, keying off the OID as an authoritative vulnerability identifier, as discussed earlier in the section “Understanding the Database Structure” on page 81.

To build the vulnerability mapping, you must parse through the entire set of returned documents and build a list of which vulnerabilities apply to which hosts. Then replace the previous vulnerability list for each host with the new list.

  #!/usr/bin/env python3

  from xml.etree.ElementTree import iterparse  # cElementTree is deprecated
  from pymongo import MongoClient
  import datetime, sys

  client = MongoClient('mongodb://localhost:27017')
  db = client['vulnmgt']

  # host-to-OIDs map
  oidList = {}

  def usage():
      print('''
  Usage: $ openvas-insert.py <infile>
      ''')

  def main():
      if len(sys.argv) < 2:  # no input file specified
          usage()
          exit(0)

      infile = open(sys.argv[1], 'r')

      for event, elem in iterparse(infile):
          if elem.tag == "result":
              result = {}

              ipaddr = elem.find("host").text
              (port, proto) = elem.find("port").text.split('/')
              result['port'] = port
              result['proto'] = proto
              nvtblock = elem.find("nvt")  # a bunch of stuff is in here

              oid = nvtblock.get("oid")
              result['oid'] = oid
              result['name'] = nvtblock.find("name").text
              result['family'] = nvtblock.find("family").text

              # skip informational results with a CVSS score of 0
              cvss = float(nvtblock.find("cvss_base").text)
              if cvss == 0:
                  continue
              result['cvss'] = cvss

              # these fields might contain one or more comma-separated values
              result['cve'] = nvtblock.find("cve").text.split(", ")
              result['bid'] = nvtblock.find("bid").text.split(", ")
              result['xref'] = nvtblock.find("xref").text.split(", ")

              # tags is a free-form, pipe-delimited list of key=value pairs
              tags = nvtblock.find("tags").text.split("|")
              for item in tags:
                  (tagname, tagvalue) = item.split("=", 1)
                  result[tagname] = tagvalue
              result['threat'] = elem.find("threat").text
              result['updated'] = datetime.datetime.utcnow()
              elem.clear()

              if db.vulnerabilities.count_documents({'oid': oid}) == 0:
                  db.vulnerabilities.insert_one(result)

              if ipaddr not in oidList.keys():
                  oidList[ipaddr] = []
              oidList[ipaddr].append({'proto': proto, 'port': port,
                                      'oid': oid})

      # once all results are parsed, add or update the host documents
      for ipaddress in oidList.keys():
          if db.hosts.count_documents({'ip': ipaddress}) == 0:
              db.hosts.insert_one({'ip': ipaddress,
                                   'mac': {'addr': "", 'vendor': "Unknown"},
                                   'ports': [],
                                   'hostnames': [],
                                   'os': [],
                                   'updated': datetime.datetime.utcnow(),
                                   'oids': oidList[ipaddress]})
          else:
              db.hosts.update_one({'ip': ipaddress},
                                  {'$set': {'updated':
                                            datetime.datetime.utcnow(),
                                            'oids': oidList[ipaddress]}})

      infile.close()  # we're done

  main()

Listing 9-8: The openvas-insert.py code listing for OpenVAS database insertion

As with nmap-insert.py (Listing 9-5), you iterate over each result, collecting the information you require. First, you get the vulnerable host’s IP address. Next, from subtags of the nvt tag, you get the OID (to identify the vulnerability); the CVSS score (ignoring any vulnerabilities with a CVSS score of 0); and the cve, bid, and xref fields (which contain one or more comma-separated values). Then you get the key-value pairs from tags, a free-form section in each vulnerability record that separates keys and values using a pipe character (|). Because you can’t know ahead of time what will or won’t be in that field, this script simply parses all key-value pairs and adds them as is into the Mongo vulnerabilities document along with the other data. If the vulnerability already exists in the vulnerabilities database, the script doesn’t insert anything.

Then you add or update an entry in the host-to-vulnerability map oidList for the host with information on each vulnerability found on that host. Once you’re finished going through all the vulnerabilities, you can use that map to add OIDs to each affected host document in hosts by looping through the dictionary you created previously.

Customize It

If you find other information in the OpenVAS scan results useful, store that too. You could even store the entirety of the scan report data in a Mongo document. But you’d probably want to parse out items like the tags section into separate sections first. If you choose to go this route, it will take up a lot more space!

Because the OpenVAS results overlap substantially with the Nmap results, I skipped importing anything (like open ports) that would duplicate the Nmap data. You might instead want to supplement (or overwrite) the Nmap results, or use OpenVAS alone.

If you’re interested in searching for specific vulnerability categories, you can expand the cvss_base_vector tag before creating the Mongo document (for example, "access vector": "remote", "confidentiality impact": "high", and so on) by parsing this field in a similar way to the tags field, separating keys and values by the : and / characters.
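A minimal sketch of that expansion might look like the following. The metric names and value labels are taken from the CVSSv2 specification; the exact field names you store are up to you:

```python
# Map CVSSv2 base-vector metrics to readable key-value pairs.
# These labels follow the CVSSv2 spec; the stored field names
# (access_vector and so on) are my own choice, not the book's.
METRICS = {
    "AV": ("access_vector", {"L": "local", "A": "adjacent", "N": "network"}),
    "AC": ("access_complexity", {"H": "high", "M": "medium", "L": "low"}),
    "Au": ("authentication", {"M": "multiple", "S": "single", "N": "none"}),
    "C": ("confidentiality_impact",
          {"N": "none", "P": "partial", "C": "complete"}),
    "I": ("integrity_impact", {"N": "none", "P": "partial", "C": "complete"}),
    "A": ("availability_impact",
          {"N": "none", "P": "partial", "C": "complete"}),
}

def expand_cvss_vector(vector):
    """Turn 'AV:N/AC:L/...' into {'access_vector': 'network', ...}."""
    result = {}
    for item in vector.split("/"):
        key, value = item.split(":")
        if key in METRICS:
            name, values = METRICS[key]
            result[name] = values.get(value, value)
    return result

print(expand_cvss_vector("AV:N/AC:L/Au:N/C:P/I:P/A:P"))
```

You could merge the returned dictionary into the result document just before inserting it, which would let you run Mongo queries like “all remotely exploitable vulnerabilities with complete confidentiality impact.”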

Listing 9-8 uses OIDs as a unique identifier rather than BID/CVE IDs because not all scan results have the latter IDs but every result has an OID. Unfortunately, this leads to another problem: OpenVAS uses the same OID to track multiple instances of the same test on a host. For instance, it might run the “service detection” test on every open port on a host and report them all with the same OID but different descriptions. By only storing one document per OID, this script will overwrite these colliding reports. But it appears that this only occurs in low-importance (0.0 severity) tests, so I ignored those results entirely. Doing so might not suit your use case. I also considered (but eventually decided against) creating a hash from certain values—say, OID, summary, port, and description—and using that as the unique identifier. That way the script would only store a single instance of any given test result but not lose any data. If these low-severity test results are important in your environment, consider replacing OID with hash as the authoritative vulnerability identifier.
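If you do want the hash-based identifier described above, a sketch might look like this. The field names mirror the result dictionary built in Listing 9-8, but the choice of fields to hash (and SHA-256 as the digest) is my own assumption:

```python
import hashlib

def result_hash(result):
    """Build a unique identifier from OID, summary, port, and description,
    so repeated runs of the same test get distinct database keys."""
    parts = [result.get('oid', ''), result.get('summary', ''),
             result.get('port', ''), result.get('description', '')]
    return hashlib.sha256('|'.join(parts).encode('utf-8')).hexdigest()

# Same OID on two different ports yields two distinct identifiers:
a = result_hash({'oid': '1.3.6.1.4.1.25623.1.0.100', 'port': '22'})
b = result_hash({'oid': '1.3.6.1.4.1.25623.1.0.100', 'port': '80'})
print(a != b)
```

You’d then key the vulnerabilities collection on this hash instead of the bare OID, at the cost of storing near-duplicate documents for each port a test fires on.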

Similarly, my solution to the issue of stale vulnerability mappings—deleting all old mappings per host and replacing them—might not work in your environment, especially if you’re pulling vulnerability data from multiple scanners. If you add a scanner tag to the vulnerability mappings in your host documents, you can delete only the appropriate mappings when importing new scan results.
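In plain Python terms, tagging each mapping with its source and then replacing only that scanner’s mappings might look like the following. The scanner key is an addition of mine, not part of Listing 9-8; with pymongo, a $pull on the embedded oids array would achieve the same effect server-side:

```python
# Existing mappings on a host document, each tagged with its source scanner.
# (The 'scanner' key is an assumed extra field, not part of Listing 9-8.)
old_oids = [
    {'proto': 'tcp', 'port': '22', 'oid': 'oid-100', 'scanner': 'openvas'},
    {'proto': 'tcp', 'port': '80', 'oid': 'oid-200', 'scanner': 'nessus'},
]

# Fresh results from a new OpenVAS scan of the same host.
new_openvas = [
    {'proto': 'tcp', 'port': '443', 'oid': 'oid-300', 'scanner': 'openvas'},
]

# Drop only the stale OpenVAS mappings; keep the other scanners' data.
merged = [m for m in old_oids if m['scanner'] != 'openvas'] + new_openvas
print(merged)
```

The Nessus entry survives the merge, while the stale OpenVAS mapping is replaced by the new one.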

Summary

In this chapter, you took your first steps toward building a working vulnerability management system. Congratulations! With the Nmap and OpenVAS data coming into your database, you can start generating simple reports that provide insight into your organization’s current vulnerability status.

But before creating reports, you need to pause and do a few maintenance tasks. In the next chapter, you’ll explore ways to improve your database structure and search time with indexes. You’ll also write a script that automatically ages out old data to ensure that your reporting contains only fresh, actionable vulnerability information.
