3 HTTP CLIENTS AND REMOTE INTERACTION WITH TOOLS

In Chapter 2, you learned how to harness the power of TCP with various techniques for creating usable clients and servers. This is the first in a series of chapters that explores a variety of protocols on higher layers of the OSI model. Because of its prevalence on networks, its affiliation with relaxed egress controls, and its general flexibility, let’s begin with HTTP.

This chapter focuses on the client side. It will first introduce you to the basics of building and customizing HTTP requests and receiving their responses. Then you’ll learn how to parse structured response data so the client can interrogate the information to determine actionable or relevant data. Finally, you’ll learn how to apply these fundamentals by building HTTP clients that interact with a variety of security tools and resources. The clients you develop will query and consume the APIs of Shodan, Bing, and Metasploit and will search and parse document metadata in a manner similar to the metadata search tool FOCA.

HTTP Fundamentals with Go

Although you don’t need a comprehensive understanding of HTTP, you should know some fundamentals before you get started.

First, HTTP is a stateless protocol: the server doesn’t inherently maintain state and status for each request. Instead, state is tracked through a variety of means, which may include session identifiers, cookies, HTTP headers, and more. The client and servers have a responsibility to properly negotiate and validate this state.
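
To make the idea of client-tracked state concrete, here is a minimal sketch that uses the standard net/http/cookiejar package. It assumes a local test server (built with net/http/httptest) that hands out a session cookie; the cookie name, its value, and the helper names are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
)

// newSessionServer returns a local test server that issues a "session"
// cookie on the first request and greets the client on later ones.
func newSessionServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		c, err := r.Cookie("session")
		if err != nil {
			// No cookie yet: the server hands out an identifier.
			http.SetCookie(w, &http.Cookie{Name: "session", Value: "abc123", Path: "/"})
			fmt.Fprint(w, "new session")
			return
		}
		fmt.Fprintf(w, "welcome back %s", c.Value)
	}))
}

// fetch issues a GET with the given client and returns the body as a string.
func fetch(c *http.Client, target string) (string, error) {
	resp, err := c.Get(target)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// twoRequests sends two GET requests through a client that has a cookie jar,
// so the cookie set by the first response is replayed on the second request.
func twoRequests() (first, second string, err error) {
	srv := newSessionServer()
	defer srv.Close()

	jar, err := cookiejar.New(nil)
	if err != nil {
		return "", "", err
	}
	client := &http.Client{Jar: jar}

	if first, err = fetch(client, srv.URL); err != nil {
		return "", "", err
	}
	second, err = fetch(client, srv.URL)
	return first, second, err
}

func main() {
	first, second, err := twoRequests()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(first)
	fmt.Println(second)
}
```

Without the Jar field set, the client would forget the cookie and the server would treat both requests as new sessions, which is what stateless means in practice.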

Second, communications between clients and servers can occur either synchronously or asynchronously, but they operate on a request/response cycle. You can include several options and headers in the request in order to influence the behavior of the server and to create usable web applications. Most commonly, servers host files that a web browser renders to produce a graphical, organized, and stylish representation of the data. But the endpoint can serve arbitrary data types. APIs commonly communicate via more structured data encodings, such as XML, JSON, or MessagePack. In some cases, the data retrieved may be in binary format, representing an arbitrary file type for download.

Finally, Go contains convenience functions so you can quickly and easily build and send HTTP requests to a server and subsequently retrieve and process the response. Through some of the mechanisms you’ve learned in previous chapters, you’ll find that the conventions for handling structured data prove extremely convenient when interacting with HTTP APIs.

Calling HTTP APIs

Let’s begin the HTTP discussion by examining basic requests. Go’s net/http standard package contains several convenience functions to quickly and easily send POST, GET, and HEAD requests, which are arguably the most common HTTP verbs you’ll use. These functions take the following forms:

Get(url string) (resp *Response, err error)
Head(url string) (resp *Response, err error)
Post(url string, bodyType string, body io.Reader) (resp *Response, err error)

Each function takes—as a parameter—the URL as a string value and uses it for the request’s destination. The Post() function is slightly more complex than the Get() and Head() functions. Post() takes two additional parameters: bodyType, which is a string value that you use for the Content-Type HTTP header (commonly application/x-www-form-urlencoded) of the request body, and an io.Reader, which you learned about in Chapter 2.

You can see a sample implementation of each of these functions in Listing 3-1. (All the code listings at the root location of / exist under the provided GitHub repo, https://github.com/blackhat-go/bhg/.) Note that the POST request creates the request body from form values and sets the Content-Type header. In each case, you must close the response body after you’re done reading data from it.

r1, err := http.Get("http://www.google.com/robots.txt")
// Read response body. Not shown.
defer r1.Body.Close()
r2, err := http.Head("http://www.google.com/robots.txt")
// Read response body. Not shown.
defer r2.Body.Close()
form := url.Values{}
form.Add("foo", "bar")
r3, err := http.Post❶(
    "https://www.google.com/robots.txt",
 ❷ "application/x-www-form-urlencoded",
    strings.NewReader(form.Encode()❸),
)
// Read response body. Not shown.
defer r3.Body.Close()

Listing 3-1: Sample implementations of the Get(), Head(), and Post() functions (/ch-3/basic/main.go)

The POST function call ❶ follows the fairly common pattern of setting the Content-Type to application/x-www-form-urlencoded ❷, while URL-encoding form data ❸.

Go has an additional POST request convenience function, called PostForm(), which removes the tediousness of setting those values and manually encoding every request; you can see its syntax here:

func PostForm(url string, data url.Values) (resp *Response, err error)

If you want to substitute the PostForm() function for the Post() implementation in Listing 3-1, you use something like the bold code in Listing 3-2.

form := url.Values{}
form.Add("foo", "bar")
r3, err := http.PostForm("https://www.google.com/robots.txt", form)
// Read response body and close.

Listing 3-2: Using the PostForm() function instead of Post() (/ch-3/basic/main.go)
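
If you want to convince yourself that the two calls are equivalent, the following sketch sends the same form both ways to a local test server that echoes what it received. The echo server and the helper names are assumptions made for this illustration; they are not part of the book's repository.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// echoServer returns a local test server that replies with the request's
// Content-Type header and body, separated by a "|".
func echoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		fmt.Fprintf(w, "%s|%s", r.Header.Get("Content-Type"), body)
	}))
}

// readAndClose drains and closes a response body, returning it as a string.
func readAndClose(resp *http.Response) (string, error) {
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// postBothWays sends the same form with Post() and with PostForm() and
// returns what the server saw in each case.
func postBothWays() (viaPost, viaPostForm string, err error) {
	srv := echoServer()
	defer srv.Close()

	form := url.Values{}
	form.Add("foo", "bar")

	// Post(): you set the content type and encode the form yourself.
	r1, err := http.Post(srv.URL, "application/x-www-form-urlencoded", strings.NewReader(form.Encode()))
	if err != nil {
		return "", "", err
	}
	if viaPost, err = readAndClose(r1); err != nil {
		return "", "", err
	}

	// PostForm(): the content type and encoding are handled for you.
	r2, err := http.PostForm(srv.URL, form)
	if err != nil {
		return "", "", err
	}
	viaPostForm, err = readAndClose(r2)
	return viaPost, viaPostForm, err
}

func main() {
	a, b, err := postBothWays()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(a)
	fmt.Println(b)
}
```

Both lines of output should be identical, showing the same Content-Type and the same URL-encoded body.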

Unfortunately, no convenience functions exist for other HTTP verbs, such as PATCH, PUT, or DELETE. You’ll use these verbs mostly to interact with RESTful APIs, which employ general guidelines on how and why a server should use them; but nothing is set in stone, and HTTP is like the Old West when it comes to verbs. In fact, we’ve often toyed with the idea of creating a new web framework that exclusively uses DELETE for everything. We’d call it DELETE.js, and it would be a top hit on Hacker News for sure. By reading this, you’re agreeing not to steal this idea!

Generating a Request

To generate a request with one of these verbs, you can use the NewRequest() function to create the Request struct, which you’ll subsequently send using the Client type’s Do() method. We promise that it’s simpler than it sounds. The function prototype for http.NewRequest() is as follows:

func NewRequest(❶method, ❷url string, ❸body io.Reader) (req *Request, err error)

You need to supply the HTTP verb ❶ and destination URL ❷ to NewRequest() as the first two string parameters. Much like the first POST example in Listing 3-1, you can optionally supply the request body by passing in an io.Reader as the third and final parameter ❸.

Listing 3-3 shows a call without an HTTP body—a DELETE request.

req, err := http.NewRequest("DELETE", "https://www.google.com/robots.txt", nil)
var client http.Client
resp, err := client.Do(req)
// Read response body and close.

Listing 3-3: Sending a DELETE request (/ch-3/basic/main.go)

Now, Listing 3-4 shows a PUT request with an io.Reader body (a PATCH request looks similar).

form := url.Values{}
form.Add("foo", "bar")
var client http.Client
req, err := http.NewRequest(
    "PUT",
    "https://www.google.com/robots.txt",
    strings.NewReader(form.Encode()),
)
resp, err := client.Do(req)
// Read response body and close.

Listing 3-4: Sending a PUT request (/ch-3/basic/main.go)
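
Listings 3-3 and 3-4 leave out error handling and never set a Content-Type, which NewRequest() does not do for you. The sketch below fills in those gaps under a few assumptions: it talks to a local httptest server rather than a real site, and the User-Agent value, the timeout, and the helper names are hypothetical choices made for the example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
	"time"
)

// sendPut builds a PUT request with a form body, sets headers on it, sends
// it, and returns the status code and response body.
func sendPut(target string) (int, string, error) {
	form := url.Values{}
	form.Add("foo", "bar")

	req, err := http.NewRequest(http.MethodPut, target, strings.NewReader(form.Encode()))
	if err != nil {
		return 0, "", err
	}
	// Headers are set on the request before it is sent.
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	req.Header.Set("User-Agent", "bhg-example/0.1") // hypothetical value

	// A timeout keeps the call from hanging forever on an unresponsive host.
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, "", err
	}
	return resp.StatusCode, string(body), nil
}

// newEchoServer returns a local test server that replies with the request
// method, Content-Type, and body.
func newEchoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		fmt.Fprintf(w, "%s %s %s", r.Method, r.Header.Get("Content-Type"), body)
	}))
}

// demo runs sendPut against the local echo server.
func demo() (int, string, error) {
	srv := newEchoServer()
	defer srv.Close()
	return sendPut(srv.URL)
}

func main() {
	code, body, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(code, body)
}
```

The same pattern applies to PATCH and DELETE: only the method constant and, where relevant, the body change.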

The standard Go net/http library contains several functions that you can use to manipulate the request before it’s sent to the server. You’ll learn some of the more relevant and applicable variants as you work through practical examples throughout this chapter. But first, we’ll show you how to do something meaningful with the HTTP response that the server returns.

Using Structured Response Parsing

In the previous section, you learned the mechanisms for building and sending HTTP requests in Go. Each of those examples glossed over response handling, essentially ignoring it for the time being. But inspecting various components of the HTTP response is a crucial aspect of any HTTP-related task, like reading the response body, accessing cookies and headers, or simply inspecting the HTTP status code.

Listing 3-5 refines the GET request in Listing 3-1 to display the status code and response body—in this case, Google’s robots.txt file. It uses the ioutil.ReadAll() function to read data from the response body, does some error checking, and prints the HTTP status code and response body to stdout.

❶ resp, err := http.Get("https://www.google.com/robots.txt")
   if err != nil {
       log.Panicln(err)
   }
   // Print HTTP Status
   fmt.Println(resp.Status❷)

   // Read and display response body
   body, err := ioutil.ReadAll(resp.Body❸)
   if err != nil {
       log.Panicln(err)
   }
   fmt.Println(string(body))
❹ resp.Body.Close()

Listing 3-5: Processing the HTTP response body (/ch-3/basic/main.go)

Once you receive your response, named resp ❶ in the above code, you can retrieve the status string (for example, 200 OK) by accessing the exported Status field ❷; not shown in our example, there is a similar StatusCode field that holds only the integer portion of the status.

The Response type contains an exported Body field ❸, which is of type io.ReadCloser. An io.ReadCloser is an interface that combines io.Reader with io.Closer, an interface that requires the implementation of a Close() function to close the reader and perform any cleanup. The details are somewhat inconsequential; just know that after reading the data from an io.ReadCloser, you’ll need to call the Close() function ❹ on the response body. Using defer to close the response body is a common practice; this ensures that the body is closed before the enclosing function returns.

Now, run the program to see the status and response body:

$ go run main.go
200 OK
User-agent: *
Disallow: /search
Allow: /search/about
Disallow: /sdch
Disallow: /groups
Disallow: /index.html?
Disallow: /?
Allow: /?hl=
Disallow: /?hl=*&
Allow: /?hl=*&gws_rd=ssl$
Disallow: /?hl=*&*&gws_rd=ssl
--snip--
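
As a variation on Listing 3-5, the following sketch uses the StatusCode field and a deferred Close() so that the body is released on every return path. It is only an illustration: it queries a local httptest server instead of Google, the helper names are hypothetical, and io.ReadAll (available since Go 1.16) stands in for ioutil.ReadAll.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchBody GETs target and returns the body only when the server
// answers with 200 OK.
func fetchBody(target string) (string, error) {
	resp, err := http.Get(target)
	if err != nil {
		return "", err
	}
	// Deferring right after the error check closes the body on every
	// return path below.
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// newRobotsServer returns a local test server that serves /robots.txt and
// answers 404 for every other path.
func newRobotsServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/robots.txt", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "User-agent: *\nDisallow: /search\n")
	})
	return httptest.NewServer(mux)
}

// demo fetches an existing and a missing path, returning the body of the
// first and the error message of the second.
func demo() (string, string) {
	srv := newRobotsServer()
	defer srv.Close()

	body, err := fetchBody(srv.URL + "/robots.txt")
	if err != nil {
		return "", err.Error()
	}
	_, err = fetchBody(srv.URL + "/missing")
	if err == nil {
		return body, ""
	}
	return body, err.Error()
}

func main() {
	body, missing := demo()
	fmt.Print(body)
	fmt.Println(missing)
}
```

Checking the status code before reading matters because http.Get() reports an error only for transport-level failures; a 404 or 500 still comes back as a normal response.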

If you encounter a need to parse more structured data—and it’s likely that you will—you can read the response body and decode it by using the conventions presented in Chapter 2. For example, imagine you’re interacting with an API that communicates using JSON, and one endpoint—say, /ping—returns the following response indicating the server state:

{"Message":"All is good with the world","Status":"Success"}

You can interact with this endpoint and decode the JSON message by using the program in Listing 3-6.

   package main

   import (
       "encoding/json"
       "log"
       "net/http"
   )
❶ type Status struct {
       Message string
       Status  string
   }

   func main() {
    ❷ res, err := http.Post(
           "http://IP:PORT/ping",
           "application/json",
           nil,
       )
       if err != nil {
           log.Fatalln(err)
       }

       var status Status
    ❸ if err := json.NewDecoder(res.Body).Decode(&status); err != nil {
           log.Fatalln(err)
       }
       defer res.Body.Close()
       log.Printf("%s -> %s\n", status.Status❹, status.Message❺)
   }

Listing 3-6: Decoding a JSON response body (/ch-3/basic-parsing/main.go)

The code begins by defining a struct called Status ❶, which contains the expected elements from the server response. The main() function first sends the POST request ❷ and then decodes the response body ❸. After doing so, you can query the Status struct as you normally would, by accessing the exported fields Status ❹ and Message ❺.

This process of parsing structured data types is consistent across other encoding formats, like XML or even binary representations. You begin the process by defining a struct to represent the expected response data and then decoding the data into that struct. The details and actual implementation of parsing other formats will be left up to you to determine.
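
As one possible illustration, here is the same Status type decoded from XML instead of JSON. The XML document, including its Response root element, is an assumption invented for this sketch, and a strings.Reader stands in for the HTTP response body.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Status mirrors the JSON example; the xml tags name the child elements.
type Status struct {
	Message string `xml:"Message"`
	Status  string `xml:"Status"`
}

// sample is a hypothetical XML rendering of the /ping response.
const sample = `<Response><Message>All is good with the world</Message><Status>Success</Status></Response>`

// decodeStatus reads XML from r (for example, an HTTP response body) and
// decodes it into a Status value.
func decodeStatus(r io.Reader) (Status, error) {
	var s Status
	err := xml.NewDecoder(r).Decode(&s)
	return s, err
}

// decodeSample decodes the built-in sample document.
func decodeSample() (Status, error) {
	return decodeStatus(strings.NewReader(sample))
}

func main() {
	s, err := decodeSample()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s -> %s\n", s.Status, s.Message)
}
```

Apart from the package name and the struct tags, the decoding call is the same shape as the JSON version in Listing 3-6.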

The next sections will apply these fundamental concepts to assist you in building tools to interact with third-party APIs for the purpose of enhancing adversarial techniques and reconnaissance.

Building an HTTP Client That Interacts with Shodan

Prior to performing any authorized adversarial activities against an organization, any good attacker begins with reconnaissance. Typically, this starts with passive techniques that don’t send packets to the target; that way, detection of the activity is next to impossible. Attackers use a variety of sources and services—including social networks, public records, and search engines—to gain potentially useful information about the target.

It’s absolutely incredible how seemingly benign information becomes critical when environmental context is applied during a chained attack scenario. For example, a web application that discloses verbose error messages may, alone, be considered low severity. However, if the error messages disclose the enterprise username format, and if the organization uses single-factor authentication for its VPN, those error messages could increase the likelihood of an internal network compromise through password-guessing attacks.

Maintaining a low profile while gathering the information ensures that the target’s awareness and security posture remain neutral, increasing the likelihood that your attack will be successful.

Shodan (https://www.shodan.io/), self-described as “the world’s first search engine for internet-connected devices,” facilitates passive reconnaissance by maintaining a searchable database of networked devices and services, including metadata such as product names, versions, locale, and more. Think of Shodan as a repository of scan data, even if it does much, much more.

Reviewing the Steps for Building an API Client

In the next few sections, you’ll build an HTTP client that interacts with the Shodan API, parsing the results and displaying relevant information. First, you’ll need a Shodan API key, which you get after you register on Shodan’s website. At the time of this writing, the fee is fairly nominal for the lowest tier, which offers adequate credits for individual use, so go sign up for that. Shodan occasionally offers discounted pricing, so monitor it closely if you want to save a few bucks.

Now, get your API key from the site and set it as an environment variable. The following examples will work as-is only if you save your API key as the variable SHODAN_API_KEY. Refer to your operating system’s user manual, or better yet, look at Chapter 1 if you need help setting the variable.

Before working through the code, understand that this section demonstrates how to create a bare-bones implementation of a client—not a fully featured, comprehensive implementation. However, the basic scaffolding you’ll build now will allow you to easily extend the demonstrated code to implement other API calls as you may need.

The client you build will implement two API calls: one to query subscription credit information and the other to search for hosts that contain a certain string. You use the latter call to identify hosts, for example, by finding ports or operating systems that match a certain product.

Luckily, the Shodan API is straightforward, producing nicely structured JSON responses. This makes it a good starting point for learning API interaction. Here is a high-level overview of the typical steps for preparing and building an API client:

1. Review the service’s API documentation.
2. Design a logical structure for the code in order to reduce complexity and repetition.
3. Define request or response types, as necessary, in Go.
4. Create helper functions and types to facilitate simple initialization, authentication, and communication to reduce verbose or repetitive logic.
5. Build the client that interacts with the API consumer functions and types.

We won’t explicitly call out each step in this section, but you should use this list as a map to guide your development. Start by quickly reviewing the API documentation on Shodan’s website. The documentation is minimal but provides everything needed to create a client program.

Designing the Project Structure

When building an API client, you should structure it so that the function calls and logic stand alone. This allows you to reuse the implementation as a library in other projects. That way, you won’t have to reinvent the wheel in the future. Building for reusability slightly changes a project’s structure. For the Shodan example, here’s the project structure:

$ tree github.com/blackhat-go/bhg/ch-3/shodan
github.com/blackhat-go/bhg/ch-3/shodan
|---cmd
|   |---shodan
|       |---main.go
|---shodan
    |---api.go
    |---host.go
    |---shodan.go

The main.go file defines package main and serves primarily as a consumer of the API you’ll build; in this case, you use it to interact with your client implementation.

The files in the shodan directory—api.go, host.go, and shodan.go—define package shodan, which contains the types and functions necessary for communication to and from Shodan. This package will become your stand-alone library that you can import into various projects.

Cleaning Up API Calls

When you perused the Shodan API documentation, you may have noticed that every exposed function requires you to send your API key. Although you certainly can pass that value around to each consumer function you create, that repetitive task becomes tedious. The same can be said for either hardcoding or handling the base URL (https://api.shodan.io/). For example, defining your API functions, as in the following snippet, requires you to pass in the token and URL to each function, which isn’t very elegant:

func APIInfo(token, url string) { --snip-- }
func HostSearch(token, url string) { --snip-- }

Instead, opt for a more idiomatic solution that allows you to save keystrokes while arguably making your code more readable. To do this, create a shodan.go file and enter the code in Listing 3-7.

   package shodan

❶ const BaseURL = "https://api.shodan.io"

❷ type Client struct {
       apiKey string
   }

❸ func New(apiKey string) *Client {
       return &Client{apiKey: apiKey}
   }

Listing 3-7: Shodan Client definition (/ch-3/shodan/shodan/shodan.go)

The Shodan URL is defined as a constant value ❶; that way, you can easily access and reuse it within your implementing functions. If Shodan ever changes the URL of its API, you’ll have to make the change at only this one location in order to correct your entire codebase. Next, you define a Client struct, used for maintaining your API token across requests ❷. Finally, the code defines a New() helper function, taking the API token as input and creating and returning an initialized Client instance ❸. Now, rather than creating your API code as arbitrary functions, you create them as methods on the Client struct, which allows you to interrogate the instance directly rather than relying on overly verbose function parameters. You can change your API function calls, which we’ll discuss momentarily, to the following:

func (s *Client) APIInfo() { --snip-- }
func (s *Client) HostSearch() { --snip-- }

Since these are methods on the Client struct, you can retrieve the API key through s.apiKey and retrieve the URL through BaseURL. The only prerequisite to calling the methods is that you create an instance of the Client struct first. You can do this with the New() helper function in shodan.go.

Querying Your Shodan Subscription

Now you’ll start the interaction with Shodan. Per the Shodan API documentation, the call to query your subscription plan information is as follows:

https://api.shodan.io/api-info?key={YOUR_API_KEY}

The response returned resembles the following structure. Obviously, the values will differ based on your plan details and remaining subscription credits.

{
 "query_credits": 56,
 "scan_credits": 0,
 "telnet": true,
 "plan": "edu",
 "https": true,
 "unlocked": true
}

First, in api.go, you’ll need to define a type that can be used to unmarshal the JSON response to a Go struct. Without it, you won’t be able to process or interrogate the response body. In this example, name the type APIInfo:

type APIInfo struct {
    QueryCredits int    `json:"query_credits"`
    ScanCredits  int    `json:"scan_credits"`
    Telnet       bool   `json:"telnet"`
    Plan         string `json:"plan"`
    HTTPS        bool   `json:"https"`
    Unlocked     bool   `json:"unlocked"`
}

The awesomeness that is Go makes that structure and JSON alignment a joy. As shown in Chapter 1, you can use some great tooling to “automagically” parse JSON, populating the fields for you. For each exported field on the struct, you explicitly define the JSON element name with struct tags so you can ensure that data is mapped and parsed properly.

Next you need to implement the function in Listing 3-8, which makes an HTTP GET request to Shodan and decodes the response into your APIInfo struct:

func (s *Client) APIInfo() (*APIInfo, error) {
    res, err := http.Get(fmt.Sprintf("%s/api-info?key=%s", BaseURL, s.apiKey))❶
    if err != nil {
        return nil, err
    }
    defer res.Body.Close()

    var ret APIInfo
    if err := json.NewDecoder(res.Body).Decode(&ret)❷; err != nil {
        return nil, err
    }
    return &ret, nil
}

Listing 3-8: Making an HTTP GET request and decoding the response (/ch-3/shodan/shodan/api.go)

The implementation is short and sweet. You first issue an HTTP GET request to the /api-info resource ❶. The full URL is built using the BaseURL global constant and s.apiKey. You then decode the response into your APIInfo struct ❷ and return it to the caller.

Before writing code that utilizes this shiny new logic, build out a second, more useful API call—the host search—which you’ll add to host.go. The request and response, according to the API documentation, is as follows:

https://api.shodan.io/shodan/host/search?key={YOUR_API_KEY}&query={query}&facets={facets}

{
    "matches": [
    {
        "os": null,
        "timestamp": "2014-01-15T05:49:56.283713",
        "isp": "Vivacom",
        "asn": "AS8866",
        "hostnames": [ ],
        "location": {
            "city": null,
            "region_code": null,
            "area_code": null,
            "longitude": 25,
            "country_code3": "BGR",
            "country_name": "Bulgaria",
            "postal_code": null,
            "dma_code": null,
            "country_code": "BG",
            "latitude": 43
        },
        "ip": 3579573318,
        "domains": [ ],
        "org": "Vivacom",
        "data": "@PJL INFO STATUS CODE=35078 DISPLAY=\"Power Saver\" ONLINE=TRUE",
        "port": 9100,
        "ip_str": "213.91.244.70"
    },
    --snip--
    ],
    "facets": {
        "org": [
        {
            "count": 286,
            "value": "Korea Telecom"
        },
        --snip--
        ]
    },
    "total": 12039
}

Compared to the initial API call you implemented, this one is significantly more complex. Not only does the request take multiple parameters, but the JSON response contains nested data and arrays. For the following implementation, you’ll ignore the facets option and data, and instead focus on performing a string-based host search to process only the matches element of the response.

As you did before, start by building the Go structs to handle the response data; enter the types in Listing 3-9 into your host.go file.

type HostLocation struct {
    City         string  `json:"city"`
    RegionCode   string  `json:"region_code"`
    AreaCode     int     `json:"area_code"`
    Longitude    float32 `json:"longitude"`
    CountryCode3 string  `json:"country_code3"`
    CountryName  string  `json:"country_name"`
    PostalCode   string  `json:"postal_code"`
    DMACode      int     `json:"dma_code"`
    CountryCode  string  `json:"country_code"`
    Latitude     float32 `json:"latitude"`
}

type Host struct {
    OS        string       `json:"os"`
    Timestamp string       `json:"timestamp"`
    ISP       string       `json:"isp"`
    ASN       string       `json:"asn"`
    Hostnames []string     `json:"hostnames"`
    Location  HostLocation `json:"location"`
    IP        int64        `json:"ip"`
    Domains   []string     `json:"domains"`
    Org       string       `json:"org"`
    Data      string       `json:"data"`
    Port      int          `json:"port"`
    IPString  string       `json:"ip_str"`
}

type HostSearch struct {
    Matches []Host `json:"matches"`
}

Listing 3-9: Host search response data types (/ch-3/shodan/shodan/host.go)

The code defines three types:

HostSearch Used for parsing the matches array

Host Represents a single matches element

HostLocation Represents the location element within the host
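
A short sketch can show how encoding/json treats data shaped like the sample response. It uses a hypothetical, trimmed-down match type rather than the full Host type, and decodes a few fields copied from the sample above: JSON keys with no matching field are skipped, null leaves the Go zero value in place, and the integer ip value is the same address as ip_str.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
	"net"
)

// match keeps only a few fields of a "matches" element (hypothetical type).
type match struct {
	OS       string `json:"os"`
	IP       int64  `json:"ip"`
	Port     int    `json:"port"`
	IPString string `json:"ip_str"`
}

// sample holds a few fields copied from the response shown earlier.
const sample = `{"os": null, "isp": "Vivacom", "ip": 3579573318, "port": 9100, "ip_str": "213.91.244.70"}`

// ipFromInt converts the integer form of an IPv4 address to dotted-quad
// notation by writing it out as four big-endian bytes.
func ipFromInt(n int64) string {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, uint32(n))
	return net.IP(b).String()
}

// decodeSample decodes the sample into the reduced match type.
func decodeSample() (match, error) {
	var m match
	err := json.Unmarshal([]byte(sample), &m)
	return m, err
}

func main() {
	m, err := decodeSample()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// "isp" has no field in match and is ignored; "os" is null, so OS
	// stays an empty string.
	fmt.Printf("os=%q port=%d\n", m.OS, m.Port)
	fmt.Println(ipFromInt(m.IP), m.IPString)
}
```

If you need to tell a null apart from an empty string, a pointer field such as *string is one option, at the cost of nil checks wherever the field is used.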

Notice that the types may not define all response fields. Go handles this elegantly, allowing you to define structures with only the JSON fields you care about. Therefore, the code will parse the JSON just fine while staying short, because it includes only the fields that are most relevant to the example. To initialize and populate the struct, you’ll define the function in Listing 3-10, which is similar to the APIInfo() method you created in Listing 3-8.

func (s *Client) HostSearch(q string❶) (*HostSearch, error) {
    res, err := http.Get( ❷
        fmt.Sprintf("%s/shodan/host/search?key=%s&query=%s", BaseURL, s.apiKey, q),
    )
    if err != nil {
        return nil, err
    }
    defer res.Body.Close()

    var ret HostSearch
    if err := json.NewDecoder(res.Body).Decode(&ret)❸; err != nil {
        return nil, err
    }

    return &ret, nil
}

Listing 3-10: Decoding the host search response body (/ch-3/shodan/shodan/host.go)

The flow and logic are exactly like the APIInfo() method, except that you take the search query string as a parameter ❶, issue the call to the /shodan/host/search endpoint while passing the search term ❷, and decode the response into the HostSearch struct ❸.
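
One caveat: Listing 3-10 places the search term into the URL verbatim, so a query containing spaces or characters such as & would produce a malformed request. A possible refinement, sketched here with a hypothetical searchURL() helper, is to build the query string with url.Values so that every value is escaped.

```go
package main

import (
	"fmt"
	"net/url"
)

// baseURL matches the BaseURL constant from Listing 3-7.
const baseURL = "https://api.shodan.io"

// searchURL builds the host search URL, escaping both the API key and the
// query so that spaces and reserved characters survive the trip.
func searchURL(apiKey, query string) string {
	v := url.Values{}
	v.Set("key", apiKey)
	v.Set("query", query)
	// Encode() sorts by key and percent-encodes each value.
	return baseURL + "/shodan/host/search?" + v.Encode()
}

func main() {
	fmt.Println(searchURL("KEY", "apache country:US&x=1"))
}
```

The result could then be passed to http.Get() in place of the fmt.Sprintf() call.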

You repeat this process of structure definition and function implementation for each API service you want to interact with. Rather than wasting precious pages here, we’ll jump ahead and show you the last step of the process: creating the client that uses your API code.

Creating a Client

You’ll use a minimalistic approach to create your client: take a search term as a command line argument and then call the APIInfo() and HostSearch() methods, as in Listing 3-11.

func main() {
    if len(os.Args) != 2 {
        log.Fatalln("Usage: shodan searchterm")
    }
    apiKey := os.Getenv("SHODAN_API_KEY")❶
    s := shodan.New(apiKey)❷
    info, err := s.APIInfo()❸
    if err != nil {
        log.Panicln(err)
    }
    fmt.Printf(
        "Query Credits: %d\nScan Credits:  %d\n\n",
        info.QueryCredits,
        info.ScanCredits)

    hostSearch, err := s.HostSearch(os.Args[1])❹
    if err != nil {
        log.Panicln(err)
    }
 ❺ for _, host := range hostSearch.Matches {
        fmt.Printf("%18s%8d\n", host.IPString, host.Port)
    }
}

Listing 3-11: Consuming and using the shodan package (/ch-3/shodan/cmd/shodan/main.go)

Start by reading your API key from the SHODAN_API_KEY environment variable ❶. Then use that value to initialize a new Client struct ❷, s, subsequently using it to call your APIInfo() method ❸. Call the HostSearch() method, passing in a search string captured as a command line argument ❹. Finally, loop through the results to display the IP and port values for those services matching the query string ❺. The following output shows a sample run, searching for the string tomcat:

$ SHODAN_API_KEY=YOUR-KEY go run main.go tomcat
Query Credits: 100
Scan Credits:  100

    185.23.138.141    8081
   218.103.124.239    8080
     123.59.14.169    8081
      177.6.80.213    8181
    142.165.84.160   10000
--snip--

You’ll want to add error handling and data validation to this project, but it serves as a good example for fetching and displaying Shodan data with your new API. You now have a working codebase that can be easily extended to support and test the other Shodan functions.
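
As a starting point for that error handling, the sketch below rejects an empty API key and treats any non-200 status as an error instead of trying to decode the body. It departs from the chapter's code in ways that are assumptions for illustration only: the Client carries a baseURL field so that it can be pointed at a local httptest server, New() returns an error, and the fake server's behavior is invented rather than taken from Shodan's documentation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// Client holds the API key plus a base URL (an addition for this sketch).
type Client struct {
	apiKey  string
	baseURL string
}

// APIInfo is a reduced version of the type defined earlier in the chapter.
type APIInfo struct {
	QueryCredits int `json:"query_credits"`
	ScanCredits  int `json:"scan_credits"`
}

// New validates its input before returning a Client.
func New(apiKey, baseURL string) (*Client, error) {
	if apiKey == "" {
		return nil, errors.New("shodan: empty API key")
	}
	return &Client{apiKey: apiKey, baseURL: baseURL}, nil
}

// APIInfo refuses to decode the body unless the server answered 200 OK.
func (s *Client) APIInfo() (*APIInfo, error) {
	res, err := http.Get(s.baseURL + "/api-info?key=" + url.QueryEscape(s.apiKey))
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()

	if res.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("shodan: unexpected status %s", res.Status)
	}
	var ret APIInfo
	if err := json.NewDecoder(res.Body).Decode(&ret); err != nil {
		return nil, err
	}
	return &ret, nil
}

// fakeShodan is a stand-in server: it accepts the key "good" and answers
// 401 for anything else.
func fakeShodan() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("key") != "good" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, `{"query_credits": 100, "scan_credits": 100}`)
	}))
}

// demo exercises the empty-key, bad-key, and good-key paths.
func demo() (int, string, string) {
	srv := fakeShodan()
	defer srv.Close()

	var emptyMsg, badMsg string
	if _, err := New("", srv.URL); err != nil {
		emptyMsg = err.Error()
	}

	bad, _ := New("bad", srv.URL)
	if _, err := bad.APIInfo(); err != nil {
		badMsg = err.Error()
	}

	good, _ := New("good", srv.URL)
	info, err := good.APIInfo()
	if err != nil {
		return 0, badMsg, emptyMsg
	}
	return info.QueryCredits, badMsg, emptyMsg
}

func main() {
	credits, badMsg, emptyMsg := demo()
	fmt.Println(credits)
	fmt.Println(badMsg)
	fmt.Println(emptyMsg)
}
```

In main.go, the equivalent check would be to stop early when os.Getenv("SHODAN_API_KEY") returns an empty string.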

Interacting with Metasploit

Metasploit is a framework used to perform a variety of adversarial techniques, including reconnaissance, exploitation, command and control, persistence, lateral network movement, payload creation and delivery, privilege escalation, and more. Even better, the community version of the product is free, runs on Linux and macOS, and is actively maintained. Essential for any adversarial engagement, Metasploit is a fundamental tool used by penetration testers, and it exposes a remote procedure call (RPC) API to allow remote interaction with its functionality.

In this section, you’ll build a client that interacts with a remote Metasploit instance. Much like the Shodan code you built, the Metasploit client you develop won’t cover a comprehensive implementation of all available functionality. Rather, it will be the foundation upon which you can extend additional functionality as needed. We think you’ll find the implementation more complex than the Shodan example, making the Metasploit interaction a more challenging progression.

Setting Up Your Environment

Before you proceed with this section, download and install the Metasploit community edition if you don’t already have it. Start the Metasploit console as well as the RPC listener through the msgrpc module in Metasploit. Then set the server host—the IP on which the RPC server will listen—and a password, as shown in Listing 3-12.

$ msfconsole
msf > load msgrpc Pass=s3cr3t ServerHost=10.0.1.6
[*] MSGRPC Service:  10.0.1.6:55552
[*] MSGRPC Username: msf
[*] MSGRPC Password: s3cr3t
[*] Successfully loaded plugin: msgrpc

Listing 3-12: Starting Metasploit and the msgrpc server

To make the code more portable and avoid hardcoding values, set the following environment variables to the values you defined for your RPC instance. This is similar to what you did for the Shodan API key in “Creating a Client.”

$ export MSFHOST=10.0.1.6:55552
$ export MSFPASS=s3cr3t

You should now have Metasploit and the RPC server running.

Because the details on exploitation and Metasploit use are beyond the scope of this book,¹ let’s assume that through pure cunning and trickery you’ve already compromised a remote Windows system and you’ve leveraged Metasploit’s Meterpreter payload for advanced post-exploitation activities. Here, your efforts will instead focus on how you can remotely communicate with Metasploit to list and interact with established Meterpreter sessions. As we mentioned before, this code is a bit more cumbersome, so we’ll purposely pare it back to the bare minimum—just enough for you to take the code and extend it for your specific needs.

Follow the same project roadmap as the Shodan example: review the Metasploit API, lay out the project in library format, define data types, implement client API functions, and, finally, build a test rig that uses the library.

First, review the Metasploit API developer documentation at Rapid7’s official website (https://metasploit.help.rapid7.com/docs/rpc-api/). The functionality exposed is extensive, allowing you to do just about anything remotely that you could through local interaction. Unlike Shodan, which uses JSON, Metasploit communicates using MessagePack, a compact and efficient binary format. Because Go doesn’t contain a standard MessagePack package, you’ll use a full-featured community implementation. Install it by executing the following from the command line:

$ go get gopkg.in/vmihailenco/msgpack.v2

In the code, you’ll refer to the implementation as msgpack. Don’t worry too much about the details of the MessagePack spec. You’ll see shortly that you’ll need to know very little about MessagePack itself to build a working client. Go is great because it hides a lot of these details, allowing you to instead focus on business logic. What you need to know are the basics of annotating your type definitions in order to make them “MessagePack-friendly.” Beyond that, the code to initiate encoding and decoding is identical to other formats, such as JSON and XML.

Next, create your directory structure. For this example, you use only two Go files:

$ tree github.com/blackhat-go/bhg/ch-3/metasploit-minimal
github.com/blackhat-go/bhg/ch-3/metasploit-minimal
|---client
|   |---main.go
|---rpc
    |---msf.go

The msf.go file resides within the rpc package, and you’ll use client/main.go to implement and test the library you build.

Defining Your Objective

Now, you need to define your objective. For the sake of brevity, you’ll implement only the code needed to issue a single RPC call: the session.list method from the Metasploit developer documentation, which retrieves a listing of current Meterpreter sessions. The request format is defined as follows:

[ "session.list", "token" ]

This is minimal; it expects to receive the name of the method to invoke and a token. The token value is a placeholder. If you read through the documentation, you’ll find that this is an authentication token, issued upon successful login to the RPC server. The response returned from Metasploit for the session.list method follows this format:

{
"1" => {
    'type' => "shell",
    "tunnel_local" => "192.168.35.149:44444",
    "tunnel_peer" => "192.168.35.149:43886",
    "via_exploit" => "exploit/multi/handler",
    "via_payload" => "payload/windows/shell_reverse_tcp",
    "desc" => "Command shell",
    "info" => "",
    "workspace" => "Project1",
    "target_host" => "",
    "username" => "root",
    "uuid" => "hjahs9kw",
    "exploit_uuid" => "gcprpj2a",
    "routes" => [ ]
    }
}

This response is returned as a map: the Meterpreter session identifiers are the keys, and the session detail is the value.

Let’s build the Go types to handle both the request and response data. Listing 3-13 defines the sessionListReq and SessionListRes.

❶ type sessionListReq struct {
    ❷ _msgpack struct{} `msgpack:",asArray"`
       Method   string
       Token    string
   }

❸ type SessionListRes struct {
       ID          uint32 `msgpack:",omitempty"`❹
       Type        string `msgpack:"type"`
       TunnelLocal string `msgpack:"tunnel_local"`
       TunnelPeer  string `msgpack:"tunnel_peer"`
       ViaExploit  string `msgpack:"via_exploit"`
       ViaPayload  string `msgpack:"via_payload"`
       Description string `msgpack:"desc"`
       Info        string `msgpack:"info"`
       Workspace   string `msgpack:"workspace"`
       SessionHost string `msgpack:"session_host"`
       SessionPort int    `msgpack:"session_port"`
       Username    string `msgpack:"username"`
       UUID        string `msgpack:"uuid"`
       ExploitUUID string `msgpack:"exploit_uuid"`
}

Listing 3-13: Metasploit session list type definitions (/ch-3/metasploit-minimal/rpc/msf.go)

You use the request type, sessionListReq ❶, to serialize structured data to the MessagePack format in a manner consistent with what the Metasploit RPC server expects—specifically, with a method name and token value. Notice that there aren’t any descriptors for those fields. The data is passed as an array, not a map, so rather than expecting data in key/value format, the RPC interface expects the data as a positional array of values. This is why you omit annotations for those properties—no need to define the key names. However, by default, a structure will be encoded as a map with the key names deduced from the property names. To disable this and force the encoding as a positional array, you add a special field named _msgpack that utilizes the asArray descriptor ❷, to explicitly instruct an encoder/decoder to treat the data as an array.
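
To see what the asArray descriptor changes on the wire, the following sketch hand-encodes the session.list request using only the fixarray, fixmap, and fixstr header bytes from the MessagePack specification. The helper names (fixStr, asArray, asMap) are hypothetical and handle only short strings and small collections; the map keys assume the default of using the Go field names. In the real client, the msgpack package does all of this for you.

```go
package main

import "fmt"

// fixStr encodes a string shorter than 32 bytes as a MessagePack fixstr:
// one header byte (0xa0 | length) followed by the raw bytes.
func fixStr(s string) []byte {
	if len(s) > 31 {
		panic("fixStr handles only strings shorter than 32 bytes")
	}
	return append([]byte{0xa0 | byte(len(s))}, s...)
}

// asArray encodes up to 15 short strings as a positional fixarray
// (header byte 0x90 | element count).
func asArray(values ...string) []byte {
	if len(values) > 15 {
		panic("asArray handles at most 15 elements")
	}
	out := []byte{0x90 | byte(len(values))}
	for _, v := range values {
		out = append(out, fixStr(v)...)
	}
	return out
}

// asMap encodes up to 15 key/value pairs as a fixmap (header byte
// 0x80 | pair count). This is the shape a struct takes when it is
// encoded as a map rather than as an array.
func asMap(pairs [][2]string) []byte {
	if len(pairs) > 15 {
		panic("asMap handles at most 15 pairs")
	}
	out := []byte{0x80 | byte(len(pairs))}
	for _, p := range pairs {
		out = append(out, fixStr(p[0])...)
		out = append(out, fixStr(p[1])...)
	}
	return out
}

func main() {
	arr := asArray("session.list", "token")
	m := asMap([][2]string{{"Method", "session.list"}, {"Token", "token"}})
	fmt.Printf("array (%d bytes): % x\n", len(arr), arr)
	fmt.Printf("map   (%d bytes): % x\n", len(m), m)
}
```

The array form starts with 0x92 (a two-element array) and carries only the values, while the map form starts with 0x82 and repeats every key name, which is not the layout the RPC server expects for requests.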

The SessionListRes type ❸ contains a one-to-one mapping between response field and struct properties. The data, as shown in the preceding example response, is essentially a nested map. The outer map is the session identifier to session details, while the inner map is the session details, represented as key/value pairs. Unlike the request, the response isn’t structured as a positional array, but each of the struct properties uses descriptors to explicitly name and map the data to and from Metasploit’s representation. The code includes the session identifier as a property on the struct. However, because the actual value of the identifier is the key value, this will be populated in a slightly different manner, so you include the omitempty descriptor ❹ to make the data optional so that it doesn’t impact encoding or decoding. This flattens the data so you don’t have to work with nested maps.

Retrieving a Valid Token

Now, you have only one thing outstanding. You have to retrieve a valid token value to use for that request. To do so, you’ll issue a login request for the auth.login() API method, which expects the following:

["auth.login", "username", "password"]

You need to replace the username and password values with what you used when loading the msfrpc module in Metasploit during initial setup (recall that you set them as environment variables). Assuming authentication is successful, the server responds with the following message, which contains an authentication token you can use for subsequent requests.

{ "result" => "success", "token" => "a1a1a1a1a1a1a1a1" }

An authentication failure produces the following response:

{
    "error" => true,
    "error_class" => "Msf::RPC::Exception",
    "error_message" => "Invalid User ID or Password"
}

For good measure, let’s also create functionality to expire the token by logging out. The request takes the method name, the authentication token, and a third optional parameter that you’ll ignore because it’s unnecessary for this scenario:

[ "auth.logout", "token", "logoutToken"]

A successful response looks like this:

{ "result" => "success" }

Defining Request and Response Methods

Much as you structured the Go types for the session.list() method’s request and response, you need to do the same for both auth.login() and auth.logout() (see Listing 3-14). The same reasoning applies as before, using descriptors to force requests to be serialized as arrays and for the responses to be treated as maps:

type loginReq struct {
    _msgpack struct{} `msgpack:",asArray"`
    Method   string
    Username string
    Password string
}

type loginRes struct {
    Result       string `msgpack:"result"`
    Token        string `msgpack:"token"`
    Error        bool   `msgpack:"error"`
    ErrorClass   string `msgpack:"error_class"`
    ErrorMessage string `msgpack:"error_message"`
}

type logoutReq struct {
    _msgpack    struct{} `msgpack:",asArray"`
    Method      string
    Token       string
    LogoutToken string
}

type logoutRes struct {
    Result string `msgpack:"result"`
}

Listing 3-14: Login and logout Metasploit type definition (/ch-3/metasploit-minimal/rpc/msf.go)

It’s worth noting that Go dynamically deserializes the login response, populating only the fields present, which means you can represent both successful and failed logins by using a single struct format.
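
The same behavior can be observed with the standard library alone. The sketch below uses encoding/json as a stand-in for MessagePack (an assumption made so the example needs no third-party package) and decodes both a success and a failure payload into one struct. The helper name decodeLogin is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// loginRes mirrors the struct from Listing 3-14, with json tags standing
// in for the msgpack tags.
type loginRes struct {
	Result       string `json:"result"`
	Token        string `json:"token"`
	Error        bool   `json:"error"`
	ErrorClass   string `json:"error_class"`
	ErrorMessage string `json:"error_message"`
}

// decodeLogin decodes a payload into a fresh loginRes. Fields that are
// absent from the payload keep their zero values.
func decodeLogin(payload string) (loginRes, error) {
	var res loginRes
	err := json.Unmarshal([]byte(payload), &res)
	return res, err
}

func main() {
	ok, _ := decodeLogin(`{"result":"success","token":"a1a1a1a1a1a1a1a1"}`)
	bad, _ := decodeLogin(`{"error":true,"error_class":"Msf::RPC::Exception","error_message":"Invalid User ID or Password"}`)
	fmt.Printf("success: token=%q error=%v\n", ok.Token, ok.Error)
	fmt.Printf("failure: token=%q error=%v message=%q\n", bad.Token, bad.Error, bad.ErrorMessage)
}
```

In the failure case the Token field is simply left as an empty string, so callers can tell the two outcomes apart by checking Error or Token.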

Creating a Configuration Struct and an RPC Method

In Listing 3-15, you take the defined types and actually use them, creating the necessary methods to issue RPC commands to Metasploit. Much as in the Shodan example, you also define an arbitrary type for maintaining pertinent configuration and authentication information. That way, you won’t have to explicitly and repeatedly pass in common elements such as host, port, and authentication token. Instead, you’ll use the type and build methods on it so that data is implicitly available.

type Metasploit struct {
    host  string
    user  string
    pass  string
    token string
}

func New(host, user, pass string) *Metasploit {
    msf := &Metasploit{
        host: host,
        user: user,
        pass: pass,
    }

    return msf
}

Listing 3-15: Metasploit client definition (/ch-3/metasploit-minimal/rpc/msf.go)

Now you have a struct and, for convenience, a function named New() that initializes and returns a new struct.

Performing Remote Calls

You can now build methods on your Metasploit type in order to perform the remote calls. To prevent extensive code duplication, in Listing 3-16, you start by building a method that performs the serialization, deserialization, and HTTP communication logic. Then you won’t have to include this logic in every RPC function you build.

func (msf *Metasploit) send(req interface{}, res interface{})❶ error {
    buf := new(bytes.Buffer)
    // Encode the request struct as MessagePack and surface any encoding failure.
 ❷ if err := msgpack.NewEncoder(buf).Encode(req); err != nil {
        return err
    }
 ❸ dest := fmt.Sprintf("http://%s/api", msf.host)
    r, err := http.Post(dest, "binary/message-pack", buf)❹
    if err != nil {
        return err
    }
    defer r.Body.Close()

    // res already holds a pointer to the caller's value, so pass it as is.
    if err := msgpack.NewDecoder(r.Body).Decode(res)❺; err != nil {
        return err
    }

    return nil
}

Listing 3-16: Generic send() method with reusable serialization and deserialization (/ch-3/metasploit-minimal/rpc/msf.go)

The send() method receives request and response parameters of type interface{} ❶. Using this interface type allows you to pass any request struct into the method, and subsequently serialize and send the request to the server. Rather than explicitly returning the response, you’ll use the res interface{} parameter to populate its data by writing a decoded HTTP response to its location in memory.

Next, use the msgpack library to encode the request ❷. The logic to do this matches that of other standard, structured data types: first create an encoder via NewEncoder() and then call the Encode() method. This populates the buf variable with MessagePack-encoded representation of the request struct. Following the encoding, you build the destination URL by using the data within the Metasploit receiver, msf ❸. You use that URL and issue a POST request, explicitly setting the content type to binary/message-pack and setting the body to the serialized data ❹. Finally, you decode the response body ❺. As alluded to earlier, the decoded data is written to the memory location of the response interface that was passed into the method. The encoding and decoding of data is done without ever needing to explicitly know the request or response struct types, making this a flexible, reusable method.
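
If you want to experiment with this pattern without a Metasploit instance, the following sketch reproduces it against a throwaway local test server. It is an illustration under stated assumptions: encoding/json stands in for MessagePack, and the names send (as a plain function) and echoMethod are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// send encodes req, POSTs it to url, and decodes the reply into res.
// res must be a pointer so the decoder can write into the caller's value.
func send(url string, req interface{}, res interface{}) error {
	buf := new(bytes.Buffer)
	if err := json.NewEncoder(buf).Encode(req); err != nil {
		return err
	}
	r, err := http.Post(url, "application/json", buf)
	if err != nil {
		return err
	}
	defer r.Body.Close()
	return json.NewDecoder(r.Body).Decode(res)
}

// echoMethod starts a test server that replies with the first element of
// the posted array, calls send against it, and returns the decoded reply.
func echoMethod(method, token string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var in []string
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil || len(in) == 0 {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		json.NewEncoder(w).Encode(map[string]string{"method": in[0]})
	}))
	defer srv.Close()

	var res struct {
		Method string `json:"method"`
	}
	if err := send(srv.URL, []string{method, token}, &res); err != nil {
		return "", err
	}
	return res.Method, nil
}

func main() {
	got, err := echoMethod("session.list", "token")
	fmt.Println(got, err)
}
```

Because send() never names a concrete request or response type, the same function serves every call the library makes; only the structs passed in change.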

In Listing 3-17, you can see the meat of the logic in all its glory.

func (msf *Metasploit) Login()❶ error {
    ctx := &loginReq{
        Method:   "auth.login",
        Username: msf.user,
        Password: msf.pass,
    }
    var res loginRes
    if err := msf.send(ctx, &res)❷; err != nil {
        return err
    }
    msf.token = res.Token
    return nil
}

func (msf *Metasploit) Logout()❸ error {
    ctx := &logoutReq{
        Method:      "auth.logout",
        Token:       msf.token,
        LogoutToken: msf.token,
    }
    var res logoutRes
    if err := msf.send(ctx, &res)❹; err != nil {
        return err
    }
    msf.token = ""
    return nil
}

func (msf *Metasploit) SessionList()❺ (map[uint32]SessionListRes, error) {
    req := &sessionListReq{Method: "session.list", Token: msf.token}
 ❻ res := make(map[uint32]SessionListRes)
    if err := msf.send(req, &res)❼; err != nil {
        return nil, err
    }

 ❽ for id, session := range res {
        session.ID = id
        res[id] = session
    }
    return res, nil
}

Listing 3-17: Metasploit API calls implementation (/ch-3/metasploit-minimal/rpc/msf.go)

You define three methods: Login() ❶, Logout() ❸, and SessionList() ❺. Each method uses the same general flow: create and initialize a request struct, create the response struct, and call the helper function ❷❹❼ to send the request and receive the decoded response. The Login() and Logout() methods manipulate the token property. The only significant difference between method logic appears in the SessionList() method, where you define the response as a map[uint32]SessionListRes ❻ and loop over that response to flatten the map ❽, setting the ID property on the struct rather than maintaining a map of maps.
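
The flattening step is easy to get subtly wrong, because the values produced by ranging over a map are copies. The standalone sketch below isolates that loop; the session type is a trimmed-down, hypothetical stand-in for SessionListRes.

```go
package main

import "fmt"

// session is a trimmed-down stand-in for SessionListRes.
type session struct {
	ID   uint32
	Info string
}

// flatten copies each map key into the ID field of its value. Values
// obtained by ranging over a map are copies, so the updated struct must
// be written back to the map for the change to persist.
func flatten(in map[uint32]session) map[uint32]session {
	for id, s := range in {
		s.ID = id
		in[id] = s
	}
	return in
}

func main() {
	sessions := flatten(map[uint32]session{
		1: {Info: "first"},
		7: {Info: "second"},
	})
	fmt.Println(sessions[1].ID, sessions[7].ID)
}
```

Omitting the write-back (in[id] = s) would leave every ID at zero, since only the loop's local copy would have been modified.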

Remember that the session.list() RPC function requires a valid authentication token, meaning you have to log in before the SessionList() method call will succeed. Listing 3-17 uses the Metasploit receiver struct to access a token, which isn’t a valid value yet; it’s an empty string. Since the code you’re developing here isn’t fully featured, you could just explicitly include a call to your Login() method from within the SessionList() method, but for each additional authenticated method you implement, you’d have to check for the existence of a valid authentication token and make an explicit call to Login(). This isn’t great coding practice because you’d spend a lot of time repeating logic that you could write, say, as part of a bootstrapping process.

You’ve already implemented a function, New(), designed to be used for bootstrapping, so patch up that function to see what a new implementation looks like when including authentication as part of the process (see Listing 3-18).

func New(host, user, pass string) (*Metasploit, error)❶ {
    msf := &Metasploit{
        host: host,
        user: user,
        pass: pass,
    }

    if err := msf.Login()❷; err != nil {
        return nil, err
    }

    return msf, nil
}

Listing 3-18: Initializing the client with an embedded Metasploit login (/ch-3/metasploit-minimal/rpc/msf.go)

The patched-up code now includes an error as part of the return value set ❶. This is to alert on possible authentication failures. The logic also adds an explicit call to the Login() method ❷. As long as the Metasploit struct is instantiated using this New() function, your authenticated method calls will now have access to a valid authentication token.
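
One gap is worth noting: as written, Login() in Listing 3-17 does not inspect the Error field, so a rejected login still decodes cleanly and leaves the token empty without returning an error. A minimal sketch of a guard follows. The helper name checkLogin is hypothetical and not part of the library in this chapter, and the struct is repeated (without tags) only so the example compiles on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// loginRes repeats the fields of the response type from Listing 3-14.
type loginRes struct {
	Result       string
	Token        string
	Error        bool
	ErrorClass   string
	ErrorMessage string
}

// checkLogin turns a decoded login response into a Go error when the
// server reported a failure or did not return a token.
func checkLogin(res loginRes) error {
	if res.Error {
		return fmt.Errorf("login failed: %s (%s)", res.ErrorMessage, res.ErrorClass)
	}
	if res.Token == "" {
		return errors.New("login failed: no token in response")
	}
	return nil
}

func main() {
	fmt.Println(checkLogin(loginRes{Result: "success", Token: "a1a1a1a1a1a1a1a1"}))
	fmt.Println(checkLogin(loginRes{
		Error:        true,
		ErrorClass:   "Msf::RPC::Exception",
		ErrorMessage: "Invalid User ID or Password",
	}))
}
```

Calling a check like this inside Login(), before storing the token, would let New() report bad credentials at startup rather than at the first authenticated call.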

Creating a Utility Program

Nearing the end of this example, your last effort is to create the utility program that implements your shiny new library. Enter the code in Listing 3-19 into client/main.go, run it, and watch the magic happen.

package main

import (
    "fmt"
    "log"
    "os"

    "github.com/blackhat-go/bhg/ch-3/metasploit-minimal/rpc"
)

func main() {
    host := os.Getenv("MSFHOST")
    pass := os.Getenv("MSFPASS")
    user := "msf"

    if host == "" || pass == "" {
        log.Fatalln("Missing required environment variable MSFHOST or MSFPASS")
    }
    msf, err := rpc.New(host, user, pass)❶
    if err != nil {
        log.Panicln(err)
    }
 ❷ defer msf.Logout()

    sessions, err := msf.SessionList()❸
    if err != nil {
        log.Panicln(err)
    }
    fmt.Println("Sessions:")
 ❹ for _, session := range sessions {
        fmt.Printf("%5d  %s\n", session.ID, session.Info)
    }
}

Listing 3-19: Consuming our msfrpc package (/ch-3/metasploit-minimal/client/main.go)

First, bootstrap the RPC client and initialize a new Metasploit struct ❶. Remember, you just updated this function to perform authentication during initialization. Next, ensure you do proper cleanup by issuing a deferred call to the Logout() method ❷. This will run when the main function returns or panics, as it does after log.Panicln(); it would not run if the program exited through os.Exit() or log.Fatalln(). You then issue a call to the SessionList() method ❸ and iterate over that response to list out the available Meterpreter sessions ❹.
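
Deferred calls run when the enclosing function returns or panics, but not when the process ends through os.Exit(), which log.Fatalln() calls. One common arrangement, sketched below with hypothetical names (run and cleanup stand in for the client logic and msf.Logout()), moves the work into a helper so that cleanup has finished before main chooses an exit status.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// run holds the program logic so that its deferred calls complete before
// main decides whether to exit with a failure status.
func run(fail bool, cleanup func()) error {
	defer cleanup() // stands in for msf.Logout()
	if fail {
		return errors.New("session listing failed")
	}
	return nil
}

func main() {
	err := run(false, func() { fmt.Println("logged out") })
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // safe here: run's deferred cleanup has already executed
	}
}
```

With this layout the token is expired on both the success and failure paths, and the process can still signal failure to the shell.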

That was a lot of code, but fortunately, implementing other API calls should be substantially less work since you’ll just be defining request and response types and building the library method to issue the remote call. Here’s sample output produced directly from our client utility, showing one established Meterpreter session:

$ go run main.go
Sessions:
    1 WIN-HOME\jsmith @ WIN-HOME

There you have it. You’ve successfully created a library and client utility to interact with a remote Metasploit instance to retrieve the available Meterpreter sessions. Next, you’ll venture into search engine response scraping and document metadata parsing.

Parsing Document Metadata with Bing Scraping

As we stressed in the Shodan section, relatively benign information, when viewed in the correct context, can prove to be critical, increasing the likelihood that your attack against an organization succeeds. Details such as employee names, phone numbers, email addresses, and client software versions are often the most highly regarded because they provide concrete or actionable information that attackers can directly exploit or use to craft attacks that are more effective and highly targeted. One such source of information, popularized by a tool named FOCA, is document metadata.

Applications store arbitrary information within the structure of a file saved to disk. In some cases, this can include geographical coordinates, application versions, operating system information, and usernames. Better yet, search engines contain advanced query filters that allow you to retrieve specific files for an organization. The remainder of this chapter focuses on building a tool that scrapes—or as my lawyer calls it, indexes—Bing search results to retrieve a target organization’s Microsoft Office documents, subsequently extracting relevant metadata.

Setting Up the Environment and Planning

Before diving into the specifics, we’ll start by stating the objectives. First, you’ll focus solely on Office Open XML documents—those ending in xlsx, docx, pptx, and so on. Although you could certainly include legacy Office data types, the binary formats make them exponentially more complicated, increasing code complexity and reducing readability. The same can be said for working with PDF files. Also, the code you develop won’t handle Bing pagination, instead only parsing initial page search results. We encourage you to build this into your working example and explore file types beyond Open XML.

Why not just use the Bing Search APIs for building this, rather than doing HTML scraping? Because you already know how to build clients that interact with structured APIs. There are practical use cases for scraping HTML pages, particularly when no API exists. Rather than rehashing what you already know, we’ll take this as an opportunity to introduce a new method of extracting data. You’ll use an excellent package, goquery, which mimics the functionality of jQuery, a JavaScript library that includes an intuitive syntax to traverse HTML documents and select data within. Start by installing goquery:

$ go get github.com/PuerkitoBio/goquery

Fortunately, that’s the only prerequisite software needed to complete the development. You’ll use standard Go packages to interact with Open XML files. These files, despite their file type suffix, are ZIP archives that, when extracted, contain XML files. The metadata is stored in two files within the docProps directory of the archive:

$ unzip test.xlsx
$ tree
--snip--
|---docProps
|   |---app.xml
|   |---core.xml
--snip--

The core.xml file contains the author information as well as modification details. It’s structured as follows:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
                   xmlns:dc="http://purl.org/dc/elements/1.1/"
                   xmlns:dcterms="http://purl.org/dc/terms/"
                   xmlns:dcmitype="http://purl.org/dc/dcmitype/"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <dc:creator>Dan Kottmann</dc:creator>❶
    <cp:lastModifiedBy>Dan Kottmann</cp:lastModifiedBy>❷
    <dcterms:created xsi:type="dcterms:W3CDTF">2016-12-06T18:24:42Z</dcterms:created>
    <dcterms:modified xsi:type="dcterms:W3CDTF">2016-12-06T18:25:32Z</dcterms:modified>
</cp:coreProperties>

The creator ❶ and lastModifiedBy ❷ elements are of primary interest. These fields contain employee names or usernames that you can use in a social-engineering or password-guessing campaign.

The app.xml file contains details about the application type and version used to create the Open XML document. Here’s its structure:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
            xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
    <Application>Microsoft Excel</Application>❶
    <DocSecurity>0</DocSecurity>
    <ScaleCrop>false</ScaleCrop>
    <HeadingPairs>
        <vt:vector size="2" baseType="variant">
            <vt:variant>
                <vt:lpstr>Worksheets</vt:lpstr>
            </vt:variant>
            <vt:variant>
                <vt:i4>1</vt:i4>
            </vt:variant>
        </vt:vector>
    </HeadingPairs>
    <TitlesOfParts>
        <vt:vector size="1" baseType="lpstr">
            <vt:lpstr>Sheet1</vt:lpstr>
        </vt:vector>
    </TitlesOfParts>
    <Company>ACME</Company>❷
    <LinksUpToDate>false</LinksUpToDate>
    <SharedDoc>false</SharedDoc>
    <HyperlinksChanged>false</HyperlinksChanged>
    <AppVersion>15.0300</AppVersion>❸
</Properties>

You’re primarily interested in just a few of those elements: Application ❶, Company ❷, and AppVersion ❸. The version itself doesn’t obviously correlate to the Office version name, such as Office 2013, Office 2016, and so on, but a logical mapping does exist between that field and the more readable, commonly known alternative. The code you develop will maintain this mapping.

Defining the metadata Package

In Listing 3-20, define the Go types that correspond to these XML datasets in a new package named metadata and put the code in a file named openxml.go—one type for each XML file you wish to parse. Then add a data mapping and convenience function for determining the recognizable Office version that corresponds to the AppVersion.

type OfficeCoreProperty struct {
    XMLName        xml.Name `xml:"coreProperties"`
    Creator        string   `xml:"creator"`
    LastModifiedBy string   `xml:"lastModifiedBy"`
}

type OfficeAppProperty struct {
    XMLName     xml.Name `xml:"Properties"`
    Application string   `xml:"Application"`
    Company     string   `xml:"Company"`
    Version     string   `xml:"AppVersion"`
}

var OfficeVersions❶ = map[string]string{
    "16": "2016",
    "15": "2013",
    "14": "2010",
    "12": "2007",
    "11": "2003",
}

func (a *OfficeAppProperty) GetMajorVersion()❷ string {
    tokens := strings.Split(a.Version, ".")❸

    if len(tokens) < 2 {
        return "Unknown"
    }
    v, ok := OfficeVersions❹ [tokens[0]]
    if !ok {
        return "Unknown"
    }
    return v
}

Listing 3-20: Open XML type definition and version mapping (/ch-3/bing-metadata/metadata/openxml.go)

After you define the OfficeCoreProperty and OfficeAppProperty types, define a map, OfficeVersions, that maintains a relationship of major version numbers to recognizable release years ❶. To use this map, define a method, GetMajorVersion(), on the OfficeAppProperty type ❷. The method splits the XML data’s AppVersion value to retrieve the major version number ❸, subsequently using that value and the OfficeVersions map to retrieve the release year ❹.
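
The lookup is easy to exercise in isolation. The following sketch restates the same logic as a plain function (majorVersion and the lowercase officeVersions are hypothetical names used to keep the example self-contained) and runs it against a few inputs, including ones that fall through to "Unknown".

```go
package main

import (
	"fmt"
	"strings"
)

// officeVersions maps major version numbers to release years, as in
// Listing 3-20.
var officeVersions = map[string]string{
	"16": "2016",
	"15": "2013",
	"14": "2010",
	"12": "2007",
	"11": "2003",
}

// majorVersion mirrors GetMajorVersion() as a plain function: it splits
// on the dot and looks up the part before it.
func majorVersion(appVersion string) string {
	tokens := strings.Split(appVersion, ".")
	if len(tokens) < 2 {
		return "Unknown"
	}
	if v, ok := officeVersions[tokens[0]]; ok {
		return v
	}
	return "Unknown"
}

func main() {
	for _, v := range []string{"15.0300", "12.0000", "16", "9.0"} {
		fmt.Printf("%-8s -> %s\n", v, majorVersion(v))
	}
}
```

Note that a value with no dot, such as "16", is reported as "Unknown" because of the length check, even though 16 is a key in the map.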

Mapping the Data to Structs

Now that you’ve built the logic and types to work with and inspect the XML data of interest, you can create the code that reads the appropriate files and assigns the contents to your structs. To do this, define NewProperties() and process() functions, as shown in Listing 3-21.

func NewProperties(r *zip.Reader) (*OfficeCoreProperty, *OfficeAppProperty, error) {❶
    var coreProps OfficeCoreProperty
    var appProps OfficeAppProperty

    for _, f := range r.File {❷
        switch f.Name {❸
        case "docProps/core.xml":
            if err := process(f, &coreProps)❹; err != nil {
                return nil, nil, err
            }
        case "docProps/app.xml":
            if err := process(f, &appProps)❺; err != nil {
                return nil, nil, err
            }
        default:
            continue
        }
    }
    return &coreProps, &appProps, nil
}

func process(f *zip.File, prop interface{}) error {❻
    rc, err := f.Open()
    if err != nil {
        return err
    }
    defer rc.Close()

    if err := ❼xml.NewDecoder(rc).Decode(prop); err != nil {
        return err
    }
    return nil
}

Listing 3-21: Processing Open XML archives and embedded XML documents (/ch-3/bing-metadata/metadata/openxml.go)

The NewProperties() function accepts a *zip.Reader, which represents an io.Reader for ZIP archives ❶. Using the zip.Reader instance, iterate through all the files in the archive ❷, checking the filenames ❸. If a filename matches one of the two property filenames, call the process() function ❹❺, passing in the file and the arbitrary structure type you wish to populate—either OfficeCoreProperty or OfficeAppProperty.

The process() function accepts two parameters: a *zip.File and an interface{} ❻. Similar to the Metasploit tool you developed, this code accepts a generic interface{} type to allow for the file contents to be assigned into any data type. This increases code reuse because there’s nothing type-specific within the process() function. Within the function, the code reads the contents of the file and unmarshals the XML data into the struct ❼.
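
You can try this flow without downloading a real document. The sketch below builds a tiny ZIP archive in memory containing only docProps/core.xml, then reads it back through zip.NewReader and decodes the XML. The names coreProps, buildArchive, and readCreator are hypothetical, and the archive is a stand-in for a genuine Office file rather than a valid one.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/xml"
	"fmt"
)

// coreProps matches elements by local name; the cp: and dc: namespace
// prefixes do not need to appear in the struct tags.
type coreProps struct {
	XMLName        xml.Name `xml:"coreProperties"`
	Creator        string   `xml:"creator"`
	LastModifiedBy string   `xml:"lastModifiedBy"`
}

const coreXML = `<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Dan Kottmann</dc:creator>
  <cp:lastModifiedBy>Dan Kottmann</cp:lastModifiedBy>
</cp:coreProperties>`

// buildArchive writes a single docProps/core.xml entry into an in-memory ZIP.
func buildArchive() ([]byte, error) {
	buf := new(bytes.Buffer)
	zw := zip.NewWriter(buf)
	w, err := zw.Create("docProps/core.xml")
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte(coreXML)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readCreator opens the archive from memory and decodes core.xml.
func readCreator(archive []byte) (string, error) {
	r, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return "", err
	}
	for _, f := range r.File {
		if f.Name != "docProps/core.xml" {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return "", err
		}
		defer rc.Close()
		var p coreProps
		if err := xml.NewDecoder(rc).Decode(&p); err != nil {
			return "", err
		}
		return p.Creator, nil
	}
	return "", fmt.Errorf("docProps/core.xml not found")
}

func main() {
	archive, err := buildArchive()
	if err != nil {
		panic(err)
	}
	creator, err := readCreator(archive)
	fmt.Println(creator, err)
}
```

Working from a byte slice here mirrors what the Bing client does later, where the downloaded response body is held in memory and wrapped in a zip.Reader.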

Searching and Receiving Files with Bing

You now have all the code necessary to open, read, parse, and extract Office Open XML documents, and you know what you need to do with the file. Now, you need to figure out how to search for and retrieve files by using Bing. Here’s the plan of action you should follow:

1. Submit a search request to Bing with proper filters to retrieve targeted results.
2. Scrape the HTML response, extracting the HREF (link) data to obtain direct URLs for documents.
3. Submit an HTTP request for each direct document URL.
4. Parse the response body to create a zip.Reader.
5. Pass the zip.Reader into the code you already developed to extract metadata.

The following sections discuss each of these steps in order.

The first order of business is to build a search query template. Much like Google, Bing contains advanced query parameters that you can use to filter search results on numerous variables. Most of these filters are submitted in a filter_type: value format. Without explaining all the available filter types, let’s instead focus on what helps you achieve your goal. The following list contains the three filters you’ll need. Note that you could use additional filters, but at the time of this writing, they behave somewhat unpredictably.

site Used to filter the results to a specific domain

filetype Used to filter the results based on resource file type

instreamset Used to filter the results to include only certain file extensions

An example query to retrieve docx files from nytimes.com would look like this:

site:nytimes.com && filetype:docx && instreamset:(url title):docx

After submitting that query, take a peek at the resulting URL in your browser. It should resemble Figure 3-1. Additional parameters may appear after this, but they’re inconsequential for this example, so you can ignore them.
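
If you want to reproduce that URL without a browser, a short sketch like the following builds it with the standard library. The function name searchURL is hypothetical, and a browser may encode some characters differently (for example, leaving colons unescaped), so treat the printed form as one valid encoding rather than an exact match for what Bing displays.

```go
package main

import (
	"fmt"
	"net/url"
)

// searchURL builds the Bing query for a domain and file type, then
// escapes it for use as the value of the q parameter.
func searchURL(domain, filetype string) string {
	q := fmt.Sprintf(
		"site:%s && filetype:%s && instreamset:(url title):%s",
		domain, filetype, filetype)
	return "http://www.bing.com/search?q=" + url.QueryEscape(q)
}

func main() {
	fmt.Println(searchURL("nytimes.com", "docx"))
	// prints http://www.bing.com/search?q=site%3Anytimes.com+%26%26+filetype%3Adocx+%26%26+instreamset%3A%28url+title%29%3Adocx
}
```

Escaping matters here because the raw query contains ampersands, which would otherwise be read as separators between URL parameters.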

Now that you know the URL and parameter format, you can see the HTML response, but first you need to determine where in the Document Object Model (DOM) the document links reside. You can do this by viewing the source code directly, or limit the guesswork and just use your browser’s developer tools. The following image shows the full HTML element path to the desired HREF. You can use the element inspector, as in Figure 3-1, to quickly select the link to reveal its full path.

Figure 3-1: A browser developer tool showing the full element path

With that path information, you can use goquery to systematically pull all data elements that match an HTML path. Enough talk! Listing 3-22 puts it all together: retrieving, scraping, parsing, and extracting. Save this code to main.go.

❶ func handler(i int, s *goquery.Selection) {
       url, ok := s.Find("a").Attr("href")❷
       if !ok {
           return
       }

       fmt.Printf("%d: %s\n", i, url)
       res, err := http.Get(url)❸
       if err != nil {
           return
       }
       // Close the body on every return path, including a failed read.
       defer res.Body.Close()
       buf, err := ioutil.ReadAll(res.Body)❹
       if err != nil {
           return
       }

       r, err := zip.NewReader(bytes.NewReader(buf)❺, int64(len(buf)))
       if err != nil {
           return
       }

       cp, ap, err := metadata.NewProperties(r)❻
       if err != nil {
           return
       }

       log.Printf(
           "%25s %25s - %s %s\n",
           cp.Creator,
           cp.LastModifiedBy,
           ap.Application,
           ap.GetMajorVersion())
   }

   func main() {
       if len(os.Args) != 3 {
           log.Fatalln("Missing required argument. Usage: main.go domain ext")
       }
       domain := os.Args[1]
       filetype := os.Args[2]

    ❼ q := fmt.Sprintf(
           "site:%s && filetype:%s && instreamset:(url title):%s",
           domain,
           filetype,
           filetype)
    ❽ search := fmt.Sprintf("http://www.bing.com/search?q=%s", url.QueryEscape(q))
       doc, err := goquery.NewDocument(search)❾
       if err != nil {
           log.Panicln(err)
       }

       s := "html body div#b_content ol#b_results li.b_algo div.b_title h2"
    ❿ doc.Find(s).Each(handler)
  }

Listing 3-22: Scraping Bing results and parsing document metadata (/ch-3/bing-metadata/client/main.go)

You create two functions. The first, handler(), accepts a goquery.Selection instance ❶ (in this case, it will be populated with an anchor HTML element) and finds and extracts the href attribute ❷. This attribute contains a direct link to the document returned from the Bing search. Using that URL, the code then issues a GET request to retrieve the document ❸. Assuming no errors occur, you then read the response body ❹, leveraging it to create a zip.Reader ❺. Recall that the function you created earlier in your metadata package, NewProperties(), expects a zip.Reader. Now that you have the appropriate data type, pass it to that function ❻, and properties are populated from the file and printed to your screen.

The main() function bootstraps and controls the whole process; you pass it the domain and file type as command line arguments. The function then uses this input data to build the Bing query with the appropriate filters ❼. The filter string is encoded and used to build the full Bing search URL ❽. The search request is sent using the goquery.NewDocument() function, which implicitly makes an HTTP GET request and returns a goquery-friendly representation of the HTML response document ❾. This document can be inspected with goquery. Finally, use the HTML element selector string you identified with your browser developer tools to find and iterate over matching HTML elements ❿. For each matching element, a call is made to your handler() function.

A sample run of the code produces output similar to the following:

$ go run main.go nytimes.com docx
0: http://graphics8.nytimes.com/packages/pdf/2012NAIHSAnnualHIVReport041713.docx
2020/12/21 11:53:50     Jonathan V. Iralu     Dan Frosch - Microsoft Macintosh Word 2010
1: http://www.nytimes.com/packages/pdf/business/Announcement.docx
2020/12/21 11:53:51     agouser               agouser - Microsoft Office Outlook 2007
2: http://www.nytimes.com/packages/pdf/business/DOCXIndictment.docx
2020/12/21 11:53:51     AGO                   Gonder, Nanci - Microsoft Office Word 2007
3: http://www.nytimes.com/packages/pdf/business/BrownIndictment.docx
2020/12/21 11:53:51     AGO                   Gonder, Nanci - Microsoft Office Word 2007
4: http://graphics8.nytimes.com/packages/pdf/health/Introduction.docx
2020/12/21 11:53:51     Oberg, Amanda M       Karen Barrow - Microsoft Macintosh Word 2010

You can now search for and extract document metadata for all Open XML files while targeting a specific domain. We encourage you to expand on this example to include logic to navigate multipage Bing search results, to include other file types beyond Open XML, and to enhance the code to concurrently download the identified files.

Summary

This chapter introduced you to fundamental HTTP concepts in Go, which you used to create usable tools that interact with remote APIs and scrape arbitrary HTML data. In the next chapter, you’ll continue with the HTTP theme by learning to create servers rather than clients.
