Automating Penetration Testing in a CI/CD Pipeline (Part 2)

Continuous Security: Security in the Continuous Delivery Pipeline is a series of articles addressing security concerns and testing in the Continuous Delivery pipeline. This is the sixth article in the series.

In the first post, we discussed what OWASP ZAP is, how it’s installed and how to automate that installation with Ansible. This second article of three drills down into how to use the ZAP server created in Part 1 to penetration test your web-based application.

Penetration Test Script

If you recall the flow diagram (below) from the first post, we will need a way to talk to ZAP so that it can trigger a test against our application. To do this we’ll use the available ZAP API and wrap up the API in a Python script. The script will allow us to specify our ZAP server, target application server, trigger each phase of the penetration test and report our results.

[Figure: ZAP basic CI/CD flow diagram]

The core of the ZAP API is to open our proxy, access the target application, spider the application, run an automated scan against it and fetch the results. This can be accomplished with just a handful of commands; however, our goal is to eventually get this bound into a CI/CD environment, so the script will have to be more versatile than a handful of commands.

The Python ZAP API can be easily installed via pip:

pip install python-owasp-zap-v2.4

We’ll start by breaking down what was outlined in the above paragraph. For learning purposes, these commands can easily be run from the Python command line.

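from zapv2 import ZAPv2

# target_application_url and zap_hostname_or_ip are placeholders for your own hosts
target = "http://%s" % target_application_url
zap = ZAPv2(proxies={'http': "http://%s" % zap_hostname_or_ip,
                     'https': "https://%s" % zap_hostname_or_ip})
zap.urlopen(target)                  # access the target through the ZAP proxy
spider_id = zap.spider.scan(target)  # spider the application
zap.spider.status(spider_id)
# when status is >= 100, the spider has completed and we can run our scan
scan_id = zap.ascan.scan(target)     # run the automated scan
zap.ascan.status(scan_id)
# when status is >= 100, the scan has completed and we can fetch results
print(zap.core.alerts())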

This snippet will print our results straight to STDOUT in a mostly human readable format. To integrate this into an automated environment, we can change our output to JSON and accept parameters for the ZAP host name and the target URL. The following script takes the above commands and adds the features just mentioned.
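The script itself is not reproduced in this text. As a rough sketch of what such a wrapper could look like (not the author's script), the version below polls the spider and the scan until each reports 100 and then writes the alerts to a JSON file. The function names, the polling interval and the `--output` option are assumptions made for illustration; `--zap-host` and `--target` mirror the invocation shown in this article, and the proxy URLs follow the snippet above.

```python
import argparse
import json
import sys
import time


def wait_for(status_fn, scan_id, poll_seconds=2):
    """Poll a ZAP status call until it reports 100 (percent complete)."""
    while int(status_fn(scan_id)) < 100:
        time.sleep(poll_seconds)


def run_pen_test(zap, target, poll_seconds=2):
    """Open the target via ZAP, spider it, scan it and return the alerts."""
    zap.urlopen(target)
    wait_for(zap.spider.status, zap.spider.scan(target), poll_seconds)
    sys.stdout.write("Spider completed\n")
    wait_for(zap.ascan.status, zap.ascan.scan(target), poll_seconds)
    sys.stdout.write("Info: Scan completed; writing results.\n")
    return zap.core.alerts()


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run a ZAP scan against a target")
    parser.add_argument("--zap-host", required=True, help="ZAP host and port")
    parser.add_argument("--target", required=True, help="target URL, with protocol")
    parser.add_argument("--output", default="results.json")  # assumed option
    args = parser.parse_args(argv)

    # Imported here so the helpers above can be exercised without ZAP installed.
    from zapv2 import ZAPv2
    zap = ZAPv2(proxies={"http": "http://%s" % args.zap_host,
                         "https": "https://%s" % args.zap_host})
    alerts = run_pen_test(zap, args.target)
    with open(args.output, "w") as f:
        json.dump(alerts, f, indent=2)


if __name__ == "__main__" and len(sys.argv) > 1:
    main()  # only runs when command-line options are supplied
```

Keeping the ZAP client as a parameter of `run_pen_test` means the scan sequence can be tried against a stand-in object before pointing it at a real ZAP server.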

The script can be called as follows:

./ --zap-host --target

Take note, the server that is launching our penetration test does not need to run ZAP itself, nor does it need to run the application we wish to run our pen test against.

Let’s set up a very simple web-based application that we can test against. This isn’t a real-world example, but it works well for the scope of this article. We’ll use Flask, a lightweight Python web framework, to run a basic application that simply displays what was typed into the form field once submitted. The script can be downloaded here.

First Flask needs to be installed and the server started with the following:

pip install flask

The server will run on port 5000 over HTTP. Using the example command above, we’ll run our ZAP penetration test against it like so:

/ --zap-host --target
Spider completed
Info: Scan completed; writing results.

Please note that the ZAP host is simply a hostname (or IP address) and a port, while the target must specify the protocol, either ‘http’ or ‘https’.

The ‘’ script is just one example of the many ways OWASP ZAP can be used in an automated manner. Tests can also be written to integrate Firefox (with ZAP as its proxy) and Selenium to mimic user interaction with your application. These could also be run from the same script in addition to the existing tests.

Scan and Report the Results

The ZAP API will return results to the ‘’ script, which in turn will write them to a JSON file, ‘results.json’. These results could easily be scanned for risk severities with a command such as “grep -ie ‘high’ -e ‘medium’ results.json”. However, this gives us little granularity in determining which tests are reporting errors or whether they are critical enough to fail an entire build pipeline.
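For comparison, a slightly more structured version of that grep can be sketched in Python. It assumes results.json holds a JSON list of alert objects, each with ‘alert’ and ‘risk’ keys (the same keys the step code later in this article reads). The function names and the sample alerts are made up for this example.

```python
import json
from collections import Counter


def count_risks(alerts, levels=("High", "Medium")):
    """Count alerts per risk level, keeping only the levels of interest."""
    counts = Counter(alert.get("risk") for alert in alerts)
    return {level: counts[level] for level in levels if counts[level]}


def load_alerts(path="results.json"):
    """Read the alert list written by the scan script."""
    with open(path) as f:
        return json.load(f)


# Sample data standing in for real scanner output (invented for illustration).
sample = [
    {"alert": "Cross-Domain JavaScript Source File Inclusion", "risk": "Low"},
    {"alert": "X-Frame-Options Header Not Set", "risk": "Medium"},
    {"alert": "X-Content-Type-Options Header Missing", "risk": "Low"},
]
print(count_risks(sample))
```

Like the grep, this only counts severities. It still says nothing about which kind of finding should be allowed to fail a build.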

This is where a tool called Behave comes into play. Behave is a Python test framework whose scenarios are written in Gherkin, a language that lets the user describe tests in a very human readable format.

Behave can be easily installed with pip:

pip install behave

Once Behave is installed, our test scenarios are placed into a feature file. For this example, we can create a file called ‘pen_test.feature’ containing one scenario.

Feature: Pen test the Application
  Scenario: The application should not contain Cross Domain Scripting vulnerabilities
    Given we have valid json alert output
    When there is a cross domain source inclusion vulnerability
    Then none of these risk levels should be present
      | risk |
      | Medium |
      | High |

The above scenario gets broken down into steps. The ‘Given’, ‘When’ and ‘Then’ lines each correspond to a piece of Python code that tests that statement. The ‘risk’ portion is a table that will be passed to our ‘Then’ step. The scenario can be read as “If the scanner produced valid JSON, succeed if there are no cross-domain vulnerabilities or only ones with a severity below ‘Medium’, such as ‘Low’.”
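Stripped of the test framework, the ‘When’ and ‘Then’ steps boil down to a filter over the alert list. The sketch below shows that logic on its own, using the same regular expression as the step code further down and a few invented sample alerts. The function name `failing_alerts` is hypothetical.

```python
import re

# Matches alert names that begin with "cross-domain", "cross site" and so on.
PATTERN = re.compile(r'cross(?:-|\s+)(?:domain|site)', re.IGNORECASE)


def failing_alerts(alerts, forbidden_risks):
    """Return cross-domain/cross-site alerts whose risk level is forbidden."""
    matches = [a for a in alerts if PATTERN.match(a['alert'])]
    return [a for a in matches if a['risk'] in forbidden_risks]


# Invented sample alerts, using the 'alert' and 'risk' keys the steps rely on.
sample = [
    {'alert': 'Cross-Domain JavaScript Source File Inclusion', 'risk': 'Low'},
    {'alert': 'Cross Site Scripting (Reflected)', 'risk': 'High'},
    {'alert': 'X-Frame-Options Header Not Set', 'risk': 'Medium'},
]
for alert in failing_alerts(sample, ['Medium', 'High']):
    print('%s: %s' % (alert['risk'], alert['alert']))
```

Because `match` anchors at the start of the string, only alert names that begin with the pattern are selected. The ‘Medium’ header alert is ignored here because it is not a cross-domain finding.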

With the feature file in place, each step must now be written. A directory must be created called ‘steps’. Inside the ‘steps’ directory we create a file with the same name as the feature file but with a ‘.py’ extension instead of a ‘.feature’ extension. The following example contains the code for each step above to produce a valid test scenario.

import json
import re
import sys

from behave import *

results_file = 'results.json'

@given('we have valid json alert output')
def step_impl(context):
    with open(results_file, 'r') as f:
        try:
            context.alerts = json.load(f)
        except Exception as e:
            sys.stdout.write('Error: Invalid JSON in %s: %s\n' %
                             (results_file, e))
            assert False

@when('there is a cross domain source inclusion vulnerability')
def step_impl(context):
    pattern = re.compile(r'cross(?:-|\s+)(?:domain|site)', re.IGNORECASE)
    matches = list()

    for alert in context.alerts:
        if pattern.match(alert['alert']) is not None:
            matches.append(alert)
    context.matches = matches
    assert True

@then('none of these risk levels should be present')
def step_impl(context):
    high_risks = list()

    risk_list = list()
    for row in context.table:
        risk_list.append(row['risk'])

    for alert in context.matches:
         if alert['risk'] in risk_list:
             if not any(n['alert'] == alert['alert'] for n in high_risks):
                 high_risks.append(dict({'alert': alert['alert'],
                                          'risk': alert['risk']}))

    if len(high_risks) > 0:
        sys.stderr.write("The following alerts failed:\n")
        for risk in high_risks:
            sys.stderr.write("\t%-5s: %s\n" % (risk['alert'], risk['risk']))
        assert False

    assert True

To run the above test, simply type ‘behave’ from the command line.

Feature: Pen test the Application # pen_test.feature:1

  Scenario: The application should not contain Cross Domain Scripting vulnerabilities # pen_test.feature:7
    Given we have valid json alert output # steps/ 0.001s
    When there is a cross domain source inclusion vulnerability # steps/ 0.000s
    Then none of these risk levels should be present # steps/ 0.000s
      | risk |
      | Medium |
      | High |

1 feature passed, 0 failed, 0 skipped
1 scenario passed, 0 failed, 0 skipped
3 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.001s

We can clearly see what was run and the result of each step. If this were run from a Jenkins server, the return code would be read and the job would succeed. If a step fails, behave will return non-zero, triggering Jenkins to fail the job. If the job fails, it’s up to the developer to investigate the pipeline, find the point where it failed, log in to the Jenkins server and view the console output to see which test failed. This is far from ideal. Instead, we can tell behave to produce JSON so that another script can consume it, reformat it into something an existing reporting mechanism can use and upload it to a central location.

To change behave’s behavior to dump JSON:

behave --no-summary --format json.pretty > behave_results.json

A reporting script can either read the behave_results.json file or read from STDIN, piped directly from behave. We’ll discuss this further in the follow-up post.
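As a rough idea of what such a reporting script might do, the sketch below reads behave’s JSON from a file or from STDIN and lists the steps that did not pass. The field names (`elements`, `steps`, `result`, `status`) are assumed from the layout of behave’s JSON formatter output and should be checked against the version in use. The function names and the sample data are invented for this example.

```python
import json
import sys


def failed_steps(features):
    """Yield (feature, scenario, step, status) for each step that did not pass."""
    for feature in features:
        for scenario in feature.get("elements", []):
            for step in scenario.get("steps", []):
                # Assumption: steps that never ran carry no "result" entry.
                status = step.get("result", {}).get("status", "skipped")
                if status != "passed":
                    yield (feature.get("name"), scenario.get("name"),
                           "%s %s" % (step.get("keyword"), step.get("name")),
                           status)


def load_report(path=None):
    """Load behave JSON from a file path, or from STDIN when no path is given."""
    if path is None:
        return json.load(sys.stdin)
    with open(path) as f:
        return json.load(f)


# Hand-written sample in the assumed layout, standing in for real behave output.
sample = [{
    "name": "Pen test the Application",
    "elements": [{
        "name": "The application should not contain Cross Domain Scripting vulnerabilities",
        "steps": [
            {"keyword": "Given", "name": "we have valid json alert output",
             "result": {"status": "passed"}},
            {"keyword": "Then", "name": "none of these risk levels should be present",
             "result": {"status": "failed"}},
        ],
    }],
}]
for row in failed_steps(sample):
    print(" | ".join(row))
```

With real output, this would be called as `failed_steps(load_report('behave_results.json'))`, or with `load_report()` when behave’s output is piped in.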


If you’ve been following along since the first post, we have learned how to set up our own ZAP service, have the ZAP service penetration test a target web application and examine the results. This may be a suitable scenario for many systems. However, integrating this into a full CI/CD pipeline would be the optimal and most efficient use of this.

In part three we will delve into how to fully integrate ZAP so that not only will your application involve user, acceptance and capacity testing, it will now pass through security testing before reaching your end users.

