Comprehensive Recon Guide

A practitioner’s reference for web reconnaissance — attack surface discovery, subdomain enumeration, live host probing, content discovery, JS mining, cloud asset hunting, automation, and continuous monitoring. Compiled from 23 research sources.


Table of Contents

  1. Fundamentals
  2. Scope & Target Profiling
  3. Subdomain Enumeration
  4. DNS Brute Force & Permutation
  5. Live Host Discovery & HTTP Probing
  6. Port Scanning
  7. URL & Endpoint Crawling
  8. JavaScript Analysis
  9. Content & Directory Discovery
  10. Parameter Discovery
  11. Technology Fingerprinting
  12. Cloud Asset Discovery
  13. GitHub & Code Leak Hunting
  14. ASN & Infrastructure Expansion
  15. Wordlist Resources
  16. Automation Pipelines
  17. Continuous Monitoring
  18. Real-World Recon Wins
  19. Quick Reference

1. Fundamentals

Recon is 80% of offensive security. The researchers who earn six figures aren’t running more tools than everyone else — they’re running them in smarter pipelines, feeding the output of one into the next, and manually reviewing the long tail that automation misses. Every hour spent deepening the asset inventory pays off when hunting begins: more subdomains mean more parameters, more endpoints, more code paths, and more chances for a bug nobody else has seen.

The three classes of recon:

Class         Description                                      Example
Passive       No packets sent to target — only public sources  crt.sh, Shodan, Chaos, Wayback, Google dorks
Active        Direct interaction with target infrastructure    DNS brute force, HTTP probing, port scans, content fuzzing
Semi-active   Targets third parties that hold target data      GitHub scraping, pastebin scraping, archive.org

Passive first, then active. Passive sources give you free intel with zero detection risk and zero scope violations. Active enumeration should only begin after passive has been exhausted — you use passive subdomains as seeds for permutation, passive URLs as seeds for parameter mining, and passive tech stack data to choose the right active wordlist.

The recon pipeline (end-to-end):

Seed domains
   passive + active enumeration
Subdomains
   dnsx resolve + httpx probe
Live hosts
   naabu/masscan port scan
Open services
   katana + waybackurls + gau crawl
URLs
   unfurl / gf / LinkFinder
Parameters, endpoints, secrets
   ffuf / nuclei / manual review
Attack surface map

Mindset rules:

  • Scope is not a limit — it’s a filter. Always map the entire organization first, then filter to what’s in-scope.
  • Everything is resumable. Keep recon state in files, diff against yesterday’s run, and alert on new assets.
  • Automation finds the obvious; manual review finds the bounty. Eyeball every new subdomain at least once.
  • Save raw outputs. The dataset you enumerate today is worth re-running against tomorrow’s wordlists.

2. Scope & Target Profiling

Before any enumeration, you need to know what you’re looking at and what you’re allowed to touch.

Scope intake checklist

Item                                   Why it matters
In-scope domains (exact vs wildcard)   Determines which subdomains are eligible
Out-of-scope carveouts                 Avoid N/A submissions and bans
Allowed testing types                  Active scanning is forbidden on many programs
Rate limiting rules                    Saves you from getting blocked mid-recon
Accepted vulnerability classes         Don’t hunt bugs that get auto-closed
Third-party service rules              SendGrid, Intercom, Zendesk often out of scope

Target profiling sources

  • crt.sh — certificate transparency, gives wildcard certs and sibling domains
  • BGPView / bgp.he.net — find company ASN and all IP ranges owned
  • SecurityTrails / DNSDumpster — historical DNS records
  • Whoxy / WhoisXML — reverse WHOIS lookup across TLDs
  • LinkedIn / Crunchbase — subsidiaries, acquisitions, product names that become subdomains
  • GitHub org page — gives you the company’s org name which feeds dorking
  • Trademark filings — sometimes leak internal project codenames

Seed expansion

A single apex domain is rarely the whole story. Before enumeration, expand seeds via:

# Find other domains owned by the same org
amass intel -org "Target Corp"
amass intel -whois -d target.com
amass intel -asn 13335
amass intel -cidr 192.0.2.0/24

# Reverse whois via viewdns.info or whoxy
curl -s "https://api.whoxy.com/?key=$KEY&reverse=whois&email=admin@target.com"

Feed the resulting domain list into the rest of the pipeline as a single flat file (seeds.txt).


3. Subdomain Enumeration

Subdomain enumeration is the bedrock of recon. Every additional subdomain is a new host with its own code paths, its own auth model, its own tech stack, and its own chance of being forgotten by the dev team. Treat this phase like an exhaustive search — pull from as many independent sources as possible, dedupe, and re-resolve.

Passive tools

Tool                Strengths                                    Command
subfinder           Fast, 30+ passive sources, API-key aware     subfinder -d target.com -all -silent
amass (passive)     Deepest source coverage, graph storage       amass enum -passive -d target.com
assetfinder         Minimal, fast, good for pipelines            assetfinder --subs-only target.com
chaos               ProjectDiscovery’s curated dataset           chaos -d target.com -silent
crt.sh              CT log scraper, finds wildcard certs         curl -s "https://crt.sh/?q=%25.target.com&output=json"
github-subdomains   Scrapes subdomains from GitHub code search   github-subdomains -d target.com -t $TOKEN
bbot                Swiss army knife, 80+ modules                bbot -t target.com -f subdomain-enum

Subfinder in depth

subfinder is the default passive tool — it’s fast, supports API key configuration for premium sources, and outputs one subdomain per line for easy piping.

# Basic
subfinder -d target.com -silent -o subs.txt

# All sources (slower but more thorough)
subfinder -d target.com -all -recursive -silent

# From multiple domains
subfinder -dL seeds.txt -all -silent -o subs.txt

# JSON output for enrichment
subfinder -d target.com -oJ -o subs.json

# Pipe directly into httpx
subfinder -d target.com -silent | httpx -silent

Configure API keys in ~/.config/subfinder/provider-config.yaml — Chaos, SecurityTrails, Censys, Shodan, VirusTotal, GitHub, BinaryEdge, and WhoisXML all materially improve coverage.
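The provider config is plain YAML mapping each source name to a list of keys. A minimal sketch (all key values below are placeholders, not real credentials; Censys takes the form API_ID:SECRET in a single entry):

```yaml
# ~/.config/subfinder/provider-config.yaml
shodan:
  - SHODAN_API_KEY_PLACEHOLDER
securitytrails:
  - SECURITYTRAILS_KEY_PLACEHOLDER
censys:
  - CENSYS_API_ID_PLACEHOLDER:CENSYS_SECRET_PLACEHOLDER
github:
  - ghp_GITHUB_TOKEN_PLACEHOLDER
chaos:
  - CHAOS_KEY_PLACEHOLDER
```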

Amass in depth

amass is slower than subfinder but pulls from more sources and supports active enumeration, alteration generation, and graph-based asset tracking.

# Passive
amass enum -passive -d target.com -o amass.txt

# Active (adds DNS brute, zone walks, name alterations)
amass enum -active -d target.com -brute -o amass.txt

# Include unresolvable (internal leak hints)
amass enum -d target.com -include-unresolvable

# Bulk scan multiple domains
amass enum -df seeds.txt -o amass-multi.txt

# Track changes over time
amass track -d target.com -dir recon-data

# Visualize as graph
amass viz -d3 -dir recon-data

# Intel pivots
amass intel -org "Target Corp"
amass intel -asn 13335
amass intel -addr 192.0.2.10
amass intel -cidr 192.0.2.0/24
amass intel -whois -d target.com

Amass supports external datasource modules via config (Netlas, SecurityTrails, Shodan, Censys) — always populate the config for real coverage.

Certificate transparency

CT logs record every publicly trusted TLS certificate ever issued, which means every subdomain that ever received a valid cert. This is one of the cleanest passive sources.

# crt.sh basic
curl -s "https://crt.sh/?q=%25.target.com&output=json" \
  | jq -r '.[].name_value' \
  | sed 's/\*\.//g' \
  | sort -u > crtsh.txt

# Sub-subdomains (two levels deep)
curl -s "https://crt.sh/?q=%25.%25.target.com&output=json" | jq -r '.[].name_value'

# Find email addresses in CT
curl -s "https://crt.sh/?q=%25@target.com&output=json" | jq -r '.[].name_value'

# Alternative: Censys, Facebook CT, Google CT API

Merging sources

Always run multiple tools and merge — no single source has full coverage.

{
  subfinder -d target.com -all -silent
  assetfinder --subs-only target.com
  amass enum -passive -d target.com
  chaos -d target.com -silent
  curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g'
} | sort -u > all-subs.txt

4. DNS Brute Force & Permutation

Passive sources miss internal-only subdomains and anything that never got a public cert. Brute force and permutation fill the gap.

puredns

puredns wraps massdns with accurate wildcard filtering — the gold standard for brute forcing.

# Brute force with a wordlist
puredns bruteforce wordlist.txt target.com \
  --resolvers resolvers.txt \
  --write results.txt

# Resolve a list of guessed names
puredns resolve candidates.txt --resolvers resolvers.txt

# Resolvers file: public-dns.info/nameservers-all.txt

Permutation (alterations)

Given known subdomains, generate plausible variants and resolve them. Frequently surfaces dev-api, staging-portal, api-v2 variants that nobody lists publicly.

# dnsgen — pattern-based mutations
cat subs.txt | dnsgen - | puredns resolve -r resolvers.txt

# altdns
altdns -i subs.txt -o altdns.txt -w words.txt -r -s results.txt

# gotator — pattern permutator
gotator -sub subs.txt -perm words.txt -depth 1 -numbers 5 | puredns resolve

Good permutation wordlists: best-dns-wordlist.txt from Assetnote, dnsgen built-ins, and a custom list of your target’s product names.
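If a dedicated permutator isn’t at hand, the common prefix/suffix patterns are a few lines of shell. A sketch (the subs.txt / words.txt filenames and output patterns are illustrative; it assumes one subdomain per line with at least one label before the apex):

```shell
#!/usr/bin/env bash
# Generate prefix/suffix permutations of known subdomains.
# Usage: gen_perms subs.txt words.txt > candidates.txt
gen_perms() {
  local subs=$1 words=$2
  while read -r sub; do
    # split first label from the rest: api.target.com -> "api" + "target.com"
    local label=${sub%%.*} domain=${sub#*.}
    while read -r w; do
      echo "$w-$label.$domain"   # dev-api.target.com
      echo "$label-$w.$domain"   # api-dev.target.com
      echo "$w.$sub"             # dev.api.target.com
    done < "$words"
  done < "$subs" | sort -u
}
if [ $# -ge 2 ]; then gen_perms "$@"; fi
```

Pipe the output into puredns resolve to drop the names that don’t exist.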

Resolvers

Always use a curated, validated resolver list. Bad resolvers cause false positives.

# Fetch and validate resolvers
wget https://raw.githubusercontent.com/trickest/resolvers/main/resolvers.txt
dnsvalidator -tL resolvers.txt -threads 200 -o validated.txt

5. Live Host Discovery & HTTP Probing

A subdomain list is useless until you know which hosts are live and what they serve.

dnsx — DNS resolution at scale

# Resolve and drop dead entries
cat subs.txt | dnsx -silent -a -resp > resolved.txt

# Only return live subdomains
dnsx -l subs.txt -silent > live-dns.txt

# Wildcard detection
dnsx -l subs.txt -wd target.com -silent

# Retrieve CNAME chain (finds takeover candidates)
dnsx -l subs.txt -cname -resp -silent

# Grab multiple record types
dnsx -l subs.txt -a -aaaa -cname -mx -ns -txt -silent -json

httpx — HTTP/HTTPS probing

httpx is the bridge between DNS and HTTP-layer recon — it probes, fingerprints, and grabs metadata in a single pass.

# Basic alive check
cat live-dns.txt | httpx -silent > alive.txt

# Rich metadata (title, status, tech, IP, CDN)
httpx -l live-dns.txt -title -sc -td -ip -cdn -server -silent -o probe.txt

# Multiple ports
httpx -l live-dns.txt -p 80,443,8080,8443,8000,3000,5000,9000 -silent

# Screenshot every live host
httpx -l live-dns.txt -screenshot -silent

# Filter by status code
httpx -l live-dns.txt -mc 200,301,302,401,403 -silent

# Match content type
httpx -l live-dns.txt -mct "application/json"

# Tech detection (Wappalyzer dataset)
httpx -l live-dns.txt -td -silent -json | jq '.tech'

# Favicon hash for clustering (mmh3)
httpx -l live-dns.txt -favicon -silent

The favicon hash is particularly useful — Shodan indexes favicon hashes, so if httpx returns a hash you can pivot in Shodan to find every other host on the internet serving the same application.


6. Port Scanning

Web apps on port 80/443 are the obvious targets, but devs love to run admin panels, debug dashboards, and internal APIs on unusual ports.

naabu — fast SYN/CONNECT scanner

# Top 1000 ports
naabu -l alive.txt -top-ports 1000 -silent

# Full range
naabu -host target.com -p - -rate 5000

# Pipe into httpx
naabu -l alive.txt -top-ports 1000 -silent | httpx -silent

masscan — internet-scale SYN scanner

# Full port sweep on a CIDR
sudo masscan -p1-65535 192.0.2.0/24 --rate=10000 -oG masscan.out

# Specific high-value ports across large ranges
sudo masscan -p22,80,443,3306,5432,6379,8080,8443,9200,27017 \
  192.0.2.0/16 --rate=20000

nmap — deep service/version detection

After masscan/naabu give you the open ports, use nmap for service detection.

# Service version + default scripts on discovered ports
nmap -sV -sC -p 22,80,443,8080 -iL hosts.txt -oA nmap-scan

# Top ports with aggressive OS detection
nmap -A -T4 --top-ports 1000 -iL hosts.txt

# Vulnerability scripts
nmap -sV --script vuln -iL hosts.txt

rustscan

rustscan is a fast port scanner that pipes its results directly into nmap for service detection — good for single-target deep dives.

rustscan -a target.com --ulimit 5000 -- -sV -sC

Ports worth always scanning

22    SSH
80    HTTP
443   HTTPS
2375  Docker API (unauthenticated)
3000  Grafana, Node dev
3306  MySQL
3389  RDP
5000  Flask dev
5432  Postgres
5601  Kibana
6379  Redis
7001  WebLogic
8000  HTTP alt
8080  HTTP alt / Jenkins
8443  HTTPS alt
8888  Jupyter
9000  SonarQube / PHP-FPM
9200  Elasticsearch
9090  Prometheus
11211 Memcached
15672 RabbitMQ management
27017 MongoDB

7. URL & Endpoint Crawling

URLs are where the vulnerabilities live. You want three sources running in parallel: a passive archive (Wayback / Common Crawl), an active crawler (katana), and a URL extractor from JS.

katana — modern active crawler

# Standard crawl
katana -u https://target.com -d 5 -silent

# From a list
katana -list alive.txt -d 3 -silent -o urls.txt

# Headless with JS rendering
katana -u https://target.com -headless -system-chrome -silent

# Dedupe URLs that differ only by query params, scope to the root domain
katana -u https://target.com -iqp -fs rdn -silent

# Parse JS for endpoints, output JSONL
katana -u https://target.com -jc -jsonl -silent -o urls.json

waybackurls — historical URLs

# Basic
echo target.com | waybackurls > wayback.txt

# From all subdomains
cat subs.txt | waybackurls | sort -u > wayback.txt

gau — getallurls (Wayback + CommonCrawl + OTX + URLScan)

echo target.com | gau --threads 10 > gau.txt

# Filter by extension
echo target.com | gau --blacklist png,jpg,gif,css,woff | sort -u

hakrawler

echo https://target.com | hakrawler -d 3 -subs

Merging crawl sources

{
  cat alive.txt | katana -silent -d 3
  cat subs.txt | waybackurls
  cat subs.txt | gau --threads 5
} | sort -u > all-urls.txt

# Filter to interesting URLs
cat all-urls.txt | grep -E "\.(json|xml|js|php|aspx|jsp|env|bak|config|sql)$"
cat all-urls.txt | grep -E "api/|admin|internal|debug|swagger|graphql"

gf — pattern classifier

gf applies named regex patterns to URL lists to quickly bucket potential bug candidates.

cat all-urls.txt | gf ssrf     > ssrf-candidates.txt
cat all-urls.txt | gf xss      > xss-candidates.txt
cat all-urls.txt | gf sqli     > sqli-candidates.txt
cat all-urls.txt | gf redirect > redirect-candidates.txt
cat all-urls.txt | gf lfi      > lfi-candidates.txt
cat all-urls.txt | gf idor     > idor-candidates.txt
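gf patterns are ultimately curated regexes; if gf isn’t installed, a rough grep -E equivalent gets you started. The patterns below are simplified approximations for illustration, not gf’s actual definitions:

```shell
#!/usr/bin/env bash
# Rough URL bucketing by parameter name, approximating a few gf patterns.
# Writes *-candidates.txt files into the current directory.
classify() {
  local urls=$1
  grep -Ei '[?&](url|uri|dest|redirect|next|callback|return(_?url)?)=' "$urls" > redirect-candidates.txt || true
  grep -Ei '[?&](file|path|page|include|template|doc)=' "$urls" > lfi-candidates.txt || true
  grep -Ei '[?&](id|user_?id|account|order|invoice)=[0-9]+' "$urls" > idor-candidates.txt || true
}
```

Usage: classify all-urls.txt — the || true keeps empty buckets from aborting a set -e pipeline.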

8. JavaScript Analysis

Modern apps hide half their API surface inside JavaScript bundles. Every JS file is a map of internal endpoints, buried parameters, legacy routes, hardcoded tokens, and feature flags.

Extract all JS files

# From Wayback
echo target.com | waybackurls | grep -Ei "\.js(\?|$)" | sort -u > js.txt

# From katana
katana -u https://target.com -silent | grep -Ei "\.js(\?|$)" | sort -u >> js.txt

# Verify alive
cat js.txt | httpx -mc 200 -silent > live-js.txt

LinkFinder — endpoint extraction

# Single file
python3 linkfinder.py -i https://target.com/app.js -o cli

# Batch
while read url; do
  python3 linkfinder.py -i "$url" -o cli
done < live-js.txt | sort -u > endpoints.txt

xnLinkFinder

xnLinkFinder is LinkFinder’s successor — it handles minified bundles, recursive crawls, and depth-based discovery.

xnLinkFinder -i live-js.txt -sf target.com -d 3 -o endpoints.txt

SecretFinder — secret detection in JS

python3 SecretFinder.py -i https://target.com/app.js -o cli

while read url; do
  python3 SecretFinder.py -i "$url" -o cli 2>/dev/null
done < live-js.txt | tee secrets.txt

Manual grep patterns

Automation misses clever obfuscation. Fetch the JS and grep for the classics:

curl -s https://target.com/app.js | grep -oE \
  "(api[_-]?key|apikey|secret|token|password|passwd|bearer|aws_access|private_key)[\"':= ]+[a-zA-Z0-9/+=_-]{16,}"

# Hardcoded IPs and internal hosts
curl -s https://target.com/app.js | grep -oE "(https?://)?[a-z0-9.-]*\.(internal|corp|local|intranet)"

# Feature flags
curl -s https://target.com/app.js | grep -oE '"[a-z_]*_(enabled|flag|debug)"'

# Route definitions
curl -s https://target.com/app.js | grep -oE '"/api/[a-zA-Z0-9/_-]+"'

JSLuice

JSLuice is a newer tool that parses JS with a proper AST rather than regex, catching dynamic route construction that regex-based tools miss.

cat live-js.txt | while read url; do
  curl -s "$url" | jsluice urls
done | sort -u > jsluice-urls.txt

9. Content & Directory Discovery

Directory brute forcing finds the files that aren’t linked anywhere — backups, configs, admin pages, staging artifacts, .git directories, and the /old/ folder devs promised to delete.

ffuf — fuzzing swiss army knife

# Basic directory fuzz
ffuf -u https://target.com/FUZZ -w wordlist.txt -mc 200,301,302,403 -o ffuf.json

# With extensions
ffuf -u https://target.com/FUZZ -w raft-medium-words.txt \
  -e .php,.bak,.old,.zip,.tar.gz,.env,.config \
  -mc 200,301,302,403

# Virtual host fuzzing
ffuf -u https://target.com -H "Host: FUZZ.target.com" -w subs.txt -fs 1234

# Recursive
ffuf -u https://target.com/FUZZ -w wordlist.txt -recursion -recursion-depth 2

# Parameter fuzzing
ffuf -u https://target.com/api?FUZZ=test -w params.txt -fs 0

# Rate limited
ffuf -u https://target.com/FUZZ -w wordlist.txt -rate 50

feroxbuster

Recursive content discoverer with smart filtering — great for deep dives.

feroxbuster -u https://target.com -w wordlist.txt -x php,html,txt,bak -d 3

# From a list of targets
feroxbuster --stdin -w wordlist.txt < alive.txt

# Filter by response size
feroxbuster -u https://target.com -w raft-large.txt -S 0,1234

gobuster

gobuster dir -u https://target.com -w wordlist.txt -x php,html,txt
gobuster vhost -u https://target.com -w subs.txt
gobuster dns -d target.com -w dns-wordlist.txt

Discovery strategy

  1. Start small — run raft-small-words.txt before raft-large; it calibrates your response baselines cheaply.
  2. Chain extensions — if you get hits at /backup, re-fuzz with backup-specific extensions.
  3. Recurse manually — only recurse into directories that return 200/301, not 403/404.
  4. Auto-calibrate — always use -ac in ffuf or filter by response length to dodge soft-404s.
  5. Filter on words — ffuf’s -fw filters by word count, often a tighter signal than response length.

10. Parameter Discovery

Hidden parameters are one of the highest-ROI recon findings — they unlock debug modes, admin toggles, SSRF sinks, and auth bypass.

Arjun

# Basic
arjun -u https://target.com/api/users

# With wordlist
arjun -u https://target.com/api/users -w params.txt

# From URL list
arjun -i urls.txt

# Methods
arjun -u https://target.com/api -m GET,POST

# Output
arjun -u https://target.com -oJ arjun.json

ParamSpider

python3 paramspider.py -d target.com -o params.txt

x8

x8 is a fast parameter miner written in Rust with a large built-in wordlist.

x8 -u https://target.com/api -w params.txt

High-signal parameter names to always test

debug, test, admin, internal, trace, verbose
callback, jsonp, returnUrl, redirect, next, url, uri, path, file
id, userId, user_id, account, tenant, org, orgId
template, view, include, load, src, source, dest, target, action, cmd
token, auth, api_key, apikey, access_token, sso
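One way to operationalize the list above: cross each live endpoint with each candidate parameter name to build a probe list for ffuf or a manual pass. A sketch (filenames and the dummy value "1" are illustrative):

```shell
#!/usr/bin/env bash
# Cross endpoints with candidate parameter names into probe URLs.
# Usage: build_probes endpoints.txt params.txt > probes.txt
build_probes() {
  local endpoints=$1 params=$2
  while read -r url; do
    local sep='?'
    case "$url" in *\?*) sep='&' ;; esac   # append correctly if the URL already has a query string
    while read -r p; do
      echo "${url}${sep}${p}=1"
    done < "$params"
  done < "$endpoints"
}
```

Diff the responses against a baseline request — a changed status, length, or word count on one of these probes usually means the parameter is live.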

11. Technology Fingerprinting

Knowing the stack narrows your attack surface — CVEs, default paths, framework-specific tricks, and auth bypass techniques all depend on what’s running.

httpx tech detect

httpx -l alive.txt -td -server -title -sc -silent -json | jq '.tech'

whatweb

whatweb -a 3 https://target.com
whatweb -i alive.txt --log-json=whatweb.json

Wappalyzer CLI

wappalyzer https://target.com

Custom fingerprinting via headers/favicon

Indicator                 What it reveals
X-Powered-By              Framework (Express, PHP version, ASP.NET)
Server                    Web server (nginx version, Apache, IIS)
Set-Cookie names          PHPSESSID, JSESSIONID, XSRF-TOKEN, laravel_session
X-Generator               CMS (Drupal, WordPress, TYPO3)
Favicon mmh3 hash         Pivot across Shodan to find every similar deploy
/robots.txt               Exposed paths, site generator hints
/sitemap.xml              Content structure
/security.txt             Bug bounty program contact
Error page fingerprints   Stack traces leak versions

Shodan & Censys pivots

After fingerprinting, pivot via Shodan to find every host on the internet running the same application.

# Shodan by favicon hash
shodan search "http.favicon.hash:-1234567890"

# By HTTP title
shodan search 'http.title:"Jenkins"'

# By SSL cert subject
shodan search 'ssl.cert.subject.cn:target.com'

# Censys
censys search 'services.tls.certificates.leaf_data.subject.common_name: target.com'

12. Cloud Asset Discovery

Cloud storage misconfigurations remain one of the fastest paths to a critical-severity finding — exposed S3 buckets, readable Azure blobs, and world-writable GCS objects are still common.

S3 bucket discovery

# Generate permutations
cat <<EOF > bucket-perms.txt
target
target-dev
target-prod
target-staging
target-backup
target-backups
target-assets
target-media
target-uploads
target-logs
target-data
target-db
target-internal
target-private
target-public
target-test
target-qa
EOF

# Test each
while read b; do
  aws s3 ls "s3://$b" --no-sign-request 2>&1 | grep -v NoSuchBucket | grep -v AllAccessDisabled
done < bucket-perms.txt
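The static list above generalizes: generate the candidates from keywords and a suffix list instead of writing them out by hand. A sketch (the suffix set is the common one; extend it with the target’s product names):

```shell
#!/usr/bin/env bash
# Generate bucket-name candidates: keyword, keyword-suffix, keywordsuffix, suffix-keyword.
gen_buckets() {
  local suffixes="dev prod staging backup backups assets media uploads logs data db internal private public test qa"
  for kw in "$@"; do
    echo "$kw"
    for s in $suffixes; do
      echo "$kw-$s"
      echo "$kw$s"
      echo "$s-$kw"
    done
  done | sort -u
}
gen_buckets target targetcorp > bucket-perms.txt
```

Feed bucket-perms.txt into the aws s3 ls loop above, or into s3scanner / cloud_enum.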

S3Scanner

s3scanner scan --bucket-file bucket-perms.txt
s3scanner scan --bucket target-backups --dump

cloud_enum

Covers S3, Azure, and GCS in one pass.

python3 cloud_enum.py -k target -k targetcorp -k target-internal

Azure blob storage

# Azure blob URL pattern
https://<storage-account>.blob.core.windows.net/<container>/<blob>

# Enumerate containers
curl -s "https://target.blob.core.windows.net/?comp=list" | xmllint --format -

# List blobs in a container
curl -s "https://target.blob.core.windows.net/container?restype=container&comp=list"

GCP bucket enumeration

# Public read check
curl -s "https://storage.googleapis.com/storage/v1/b/target-bucket" | jq .

# List objects
curl -s "https://storage.googleapis.com/storage/v1/b/target-bucket/o" | jq .

CloudFail (DNS/database-based origin discovery)

CloudFail pulls old DNS records and database leaks to bypass Cloudflare and find the real origin IP of a proxied site.

python3 cloudfail.py -t target.com

Bucket wordlists

  • Assetnote wordlists (wordlists.assetnote.io) — cloud-s3-bucket-names.txt
  • SecLists: Discovery/Cloud/

13. GitHub & Code Leak Hunting

GitHub is a minefield of hardcoded credentials. The org’s public repos are the obvious place, but the real gold is in employee personal repos and in deleted-but-not-purged commits.

GitHub dorking

org:target password
org:target api_key
org:target aws_access_key_id
org:target BEGIN RSA
org:target smtp
org:target "internal-api"
org:target filename:.env
org:target filename:config
org:target extension:sql
org:target extension:pem

GitGot (Bishop Fox)

Semi-automated, feedback-driven code search — suppresses already-reviewed hits so you focus on new matches.

gitgot -q target.com
gitgot -q "target api_key" -o gitgot-results.json

trufflehog — high-entropy secret scanning

# Scan a repo
trufflehog git https://github.com/target/repo

# Scan an org
trufflehog github --org=target --token=$GITHUB_TOKEN

# Verified secrets only (no false positives)
trufflehog github --org=target --only-verified

gitleaks

gitleaks detect --source . --report-format json --report-path gitleaks.json
gitleaks detect --source https://github.com/target/repo

github-subdomains

Scrapes GitHub code for mentions of subdomains — a passive subdomain source most hunters skip.

github-subdomains -d target.com -t $GITHUB_TOKEN -o gh-subs.txt

What to look for

  • .env files (check for AWS, Stripe, Twilio, SendGrid keys)
  • config.yml / config.json / settings.py with DB connection strings
  • Dockerfile with ARG values hardcoding secrets
  • CI/CD YAML files (GitHub Actions, CircleCI) with plaintext tokens
  • Private keys in commit history
  • References to internal hostnames (*.corp.target.com)
  • Historical commits — secrets are often “fixed” in a later commit but still in history
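The history point is worth automating: grep the full patch stream for secret-looking added lines. The function below works on any diff stream, so you can pipe `git log --all -p` into it — the patterns are a minimal illustrative set, far smaller than what trufflehog or gitleaks ship with:

```shell
#!/usr/bin/env bash
# Flag secret-looking added lines in a diff/patch stream (e.g. git log --all -p).
scan_history() {
  grep -E '^\+' \
    | grep -Ev '^\+\+\+' \
    | grep -E '(AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----|(api[_-]?key|secret|token|password)["=: ]+[A-Za-z0-9/+=_-]{16,})'
}
```

Usage: git log --all -p | scan_history — this catches keys committed in 2021 and “deleted” in 2022, which current-state scanners skip.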

14. ASN & Infrastructure Expansion

Most hunters stop at the subdomain list. Elite hunters expand into IP space and find the forgotten servers that nobody maps to DNS anymore.

Find the ASN

# From a known IP
whois -h whois.cymru.com 192.0.2.10

# Via bgpview
curl -s "https://api.bgpview.io/ip/192.0.2.10" | jq .

# Interactive: bgp.he.net

Enumerate all IP ranges for an ASN

# bgpview
curl -s "https://api.bgpview.io/asn/AS13335/prefixes" \
  | jq -r '.data.ipv4_prefixes[].prefix' > asn-ranges.txt

# amass intel
amass intel -asn 13335 > asn-hosts.txt

Probe everything in the range

# Port scan the range
sudo masscan -iL asn-ranges.txt -p80,443,8080,8443 --rate=10000 -oG masscan.out

# HTTP probe
awk '/open/{print $2}' masscan.out | httpx -silent -title -sc -ip

Reverse DNS across the range

# PTR lookups across a CIDR
prips 192.0.2.0/24 | dnsx -ptr -resp-only

Why this works

Corporate infra expansion happens faster than DNS hygiene. You’ll regularly find:

  • Acquired companies whose old infra still runs on the parent’s ASN
  • Staging servers assigned IPs but never given DNS
  • Legacy admin panels on forgotten boxes
  • Dev environments behind no auth, exposed via direct IP

15. Wordlist Resources

Wordlists are the difference between finding /admin and finding /admin-backup-2019.zip. Use multiple, rotate them, and grow your own.

Core wordlist repos

Repo                                            What it contains
SecLists (danielmiessler/SecLists)              The canonical source — subdomains, content, params, fuzzing, passwords, payloads
Assetnote wordlists (wordlists.assetnote.io)    Data-mined from Common Crawl, hugely higher signal than generic lists
OneListForAll                                   Merged, deduped megalist
fuzzdb                                          Legacy but still has unique patterns
Jhaddix all.txt                                 Classic subdomain brute list

Key SecLists files

Discovery/DNS/
  subdomains-top1million-5000.txt
  subdomains-top1million-110000.txt
  dns-Jhaddix.txt
  bitquark-subdomains-top100000.txt

Discovery/Web-Content/
  raft-small-words.txt
  raft-medium-words.txt
  raft-large-words.txt
  common.txt
  big.txt
  directory-list-2.3-medium.txt
  api/api-endpoints.txt

Discovery/Web-Content/CMS/
  wordpress.fuzz.txt
  joomla.fuzz.txt
  drupal.fuzz.txt

Fuzzing/
  LFI/
  XSS/
  SQLi/

Passwords/
  Common-Credentials/

Assetnote wordlist highlights

best-dns-wordlist.txt              # 9M real subdomains, mined from CT/CC
httparchive_directories_*.txt      # Directories seen in HTTP Archive
httparchive_files_*.txt            # Files seen in HTTP Archive
cloud-s3-bucket-names.txt          # Real S3 bucket names
parameters_top_1M.txt              # Real HTTP parameters

Building your own

After a few engagements, the best wordlist is your own cumulative corpus:

# Collect every URL you've ever crawled
cat recon/*/urls.txt | unfurl paths | awk -F/ '{for(i=2;i<=NF;i++)print $i}' \
  | sort | uniq -c | sort -rn > my-dirs.txt
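If unfurl isn’t on the box, plain awk can do the same extraction — split each URL on "/", skip the scheme and host fields, and strip query strings. A sketch (assumes one absolute URL per line):

```shell
#!/usr/bin/env bash
# Count path segments across a URL list, most common first.
# Reads files given as arguments, or stdin if none.
path_segments() {
  awk -F/ '{
    # fields: $1="https:", $2="", $3=host, $4.. = path segments
    for (i = 4; i <= NF; i++) if ($i != "") {
      split($i, a, "?"); print a[1]   # drop any query string
    }
  }' "$@" | sort | uniq -c | sort -rn
}
```

Usage: cat recon/*/urls.txt | path_segments > my-dirs.txt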

16. Automation Pipelines

At some point you stop running commands and start running pipelines. Recon frameworks chain every tool above into a single invocation and output a structured asset inventory.

bbot (Black Box Operations Tool)

bbot is the most capable modern recon framework — 80+ modules, event-driven, produces graph output.

# Full subdomain enum
bbot -t target.com -f subdomain-enum

# Web spider + HTTP probe + tech detect
bbot -t target.com -f web-basic

# Everything (expensive)
bbot -t target.com -f subdomain-enum,web-basic,cloud-enum -o bbot-out/

recon-ng

Modular Metasploit-style recon framework with a marketplace of modules.

recon-ng
> marketplace install all
> workspaces create target
> modules load recon/domains-hosts/hackertarget
> options set SOURCE target.com
> run

ReconFTW

All-in-one bash pipeline that wraps subfinder, amass, httpx, nuclei, ffuf, and dozens more.

./reconftw.sh -d target.com -r    # recon only
./reconftw.sh -d target.com -a    # full scan

GarudRecon / subdomainx / Striker / ReconDog

Smaller curated wrappers around the ProjectDiscovery stack — useful as reference implementations when building your own.

Custom bash pipeline (minimal example)

#!/usr/bin/env bash
set -euo pipefail
TARGET=$1
OUT=recon/$TARGET
mkdir -p "$OUT"

# 1. Subdomains
{
  subfinder -d "$TARGET" -all -silent
  assetfinder --subs-only "$TARGET"
  chaos -d "$TARGET" -silent
  curl -s "https://crt.sh/?q=%25.$TARGET&output=json" \
    | jq -r '.[].name_value' 2>/dev/null | sed 's/\*\.//g'
} | sort -u > "$OUT/subs.txt"

# 2. Resolve
dnsx -l "$OUT/subs.txt" -silent > "$OUT/resolved.txt"

# 3. HTTP probe
httpx -l "$OUT/resolved.txt" -silent -title -sc -td -ip -json \
  > "$OUT/httpx.json"
jq -r '.url' "$OUT/httpx.json" > "$OUT/alive.txt"

# 4. Port scan
naabu -l "$OUT/alive.txt" -top-ports 1000 -silent > "$OUT/ports.txt"

# 5. Crawl
{
  katana -list "$OUT/alive.txt" -silent -d 3
  cat "$OUT/subs.txt" | waybackurls
  cat "$OUT/subs.txt" | gau --threads 5
} | sort -u > "$OUT/urls.txt"

# 6. JS files
grep -Ei "\.js(\?|$)" "$OUT/urls.txt" | httpx -mc 200 -silent > "$OUT/js.txt"

# 7. gf classification
for pattern in ssrf xss sqli redirect lfi idor; do
  gf "$pattern" < "$OUT/urls.txt" > "$OUT/gf-$pattern.txt" || true
done

# 8. Nuclei scan
nuclei -l "$OUT/alive.txt" -severity low,medium,high,critical -silent \
  -o "$OUT/nuclei.txt"

echo "Done. Results in $OUT/"

Pipeline principles

  1. Idempotent — rerunning should update, not duplicate
  2. Resumable — each step writes a file that the next step reads
  3. Rate-limited — respect program rules by default
  4. Diff-friendly — always sort -u output so you can diff across runs
  5. Resource-capped — run expensive tools in parallel with xargs -P or GNU parallel, but cap concurrency
  6. Logged — capture stdout and stderr per-tool for later debugging
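Principle 6 in practice: a tiny wrapper that captures each tool’s stdout and stderr to per-tool log files (the logs/ directory name is arbitrary):

```shell
#!/usr/bin/env bash
# Run a tool, capturing stdout/stderr to logs/<name>.out and logs/<name>.err.
logged() {
  local name=$1; shift
  mkdir -p logs
  "$@" > "logs/$name.out" 2> "logs/$name.err"
}
```

Usage: logged subfinder subfinder -d target.com -all -silent — when a pipeline run misbehaves at 3am, the .err files tell you which tool choked.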

17. Continuous Monitoring

Recon isn’t a one-shot operation. New subdomains, new endpoints, and new open ports appear daily. The hunters who earn the most set up continuous recon and get alerted when something new shows up.

Nightly diff pattern

#!/usr/bin/env bash
TARGET=$1
TODAY=$(date +%F)
OUT=recon/$TARGET/$TODAY
PREV=$(ls -1 "recon/$TARGET" | grep -v "$TODAY" | tail -1)

mkdir -p "$OUT"
subfinder -d "$TARGET" -all -silent | sort -u > "$OUT/subs.txt"

if [[ -n "$PREV" && -f "recon/$TARGET/$PREV/subs.txt" ]]; then
  comm -13 "recon/$TARGET/$PREV/subs.txt" "$OUT/subs.txt" > "$OUT/new.txt"
  if [[ -s "$OUT/new.txt" ]]; then
    # Slack incoming webhooks expect a JSON payload
    curl -s -X POST -H 'Content-type: application/json' \
      --data "{\"text\": \"New subs for $TARGET: $(tr '\n' ' ' < "$OUT/new.txt")\"}" \
      "$SLACK_WEBHOOK"
  fi
fi

Run via cron or GitHub Actions on a schedule (hourly for active programs, daily for slow-moving targets).
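The schedule line itself, for crontab (script path, log path, and times are examples — adjust per target):

```shell
# Nightly at 02:30; for an hourly cadence use: 30 * * * *
30 2 * * * /home/user/recon/nightly-diff.sh target.com >> /home/user/recon/cron.log 2>&1
```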

What to monitor

Signal                                            Action
New subdomain                                     Immediate triage — often a fresh deploy with bugs
New open port on known host                       Service enumeration, version check
New JS file or bundle hash change                 Re-run LinkFinder, diff endpoint list
New nuclei finding                                Triage the specific template
Cert transparency alert                           Certstream-based live feed
New public repo in GitHub org                     Scan for secrets
DNS CNAME change                                  Takeover check
HTTP response hash change on login/admin pages    Manual review

Tools for continuous monitoring

  • axiom — distribute recon across cloud workers
  • interlace — parallelize any CLI tool across a target list
  • notify (ProjectDiscovery) — multi-channel output for any pipeline
  • certstream — real-time CT log feed
  • GitHub webhooks — push notifications on new repos/commits

Notify example

subfinder -d target.com -all -silent \
  | anew subs.txt \
  | notify -silent -bulk -provider slack

anew only emits newly seen lines, so notify only fires on genuinely new subdomains.
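anew’s core behavior — append lines not already in the state file, emit only the new ones — is a few lines of awk if you want to avoid the dependency. A sketch:

```shell
#!/usr/bin/env bash
# Minimal anew: print stdin lines absent from the state file, and append them to it.
# Usage: some_tool | mini_anew subs.txt | notify ...
mini_anew() {
  local state=$1
  touch "$state"
  # first pass loads the state file into seen[]; second pass (stdin, "-")
  # prints lines that are neither in the state file nor duplicated on stdin
  awk -v st="$state" 'FILENAME == st { seen[$0]=1; next } !seen[$0] && !dup[$0]++' "$state" - \
    | tee -a "$state"
}
```

Unlike anew, this sketch doesn’t lock the file, so don’t run two instances against the same state concurrently.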


18. Real-World Recon Wins

Actual case studies drawn from public writeups that hinged on recon quality, not exploit cleverness.

JavaScript endpoint → $25K

A researcher grepped an obfuscated webpack bundle and found a reference to /api/v2/internal/users. The endpoint was reachable without authentication and returned the full user database. Total time to bug: 45 minutes. The app had passed a third-party pentest six months earlier.

Lesson: always pull the source maps (.js.map) when present — they reverse the minification and hand you the original file structure.

ASN expansion → SQL injection bounty

Hunter started with 50 in-scope subdomains. Pulled the company ASN, enumerated IP ranges via bgpview, and probed with httpx. Found 500+ live hosts including a forgotten admin panel on an unlisted IP. The panel was vulnerable to classic SQL injection. Critical-severity payout.

Lesson: the scope may say “*.target.com”, but the company often owns entire IP blocks — check the program rules; many allow testing any asset the company owns.

S3 bucket → 2M user PII leak

Generated bucket name permutations (company-backup, company-backups, company-backup-prod) and tested each with aws s3 ls --no-sign-request. company-backup-prod returned a directory listing. It contained a full user database dump in SQL format. $50K critical bounty.

Lesson: bucket name permutation is low-effort, high-reward. Always run it as part of initial recon.
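The permutation step from this case is trivial to script; the suffix list below is illustrative, not exhaustive:

```shell
# emit candidate S3 bucket names for a company keyword
# (extend the suffix list with region/env markers as needed)
bucket_perms() {
  local company="$1"
  for suffix in "" -backup -backups -backup-prod -prod -dev -staging -assets -logs; do
    printf '%s%s\n' "$company" "$suffix"
  done
}
```

Feed each candidate to the unsigned listing check: `bucket_perms acme | while read -r b; do aws s3 ls "s3://$b" --no-sign-request >/dev/null 2>&1 && echo "OPEN: $b"; done`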

Wayback Machine → auth bypass

Old Wayback snapshot from 2018 showed a /debug/users?bypass=1 endpoint. The endpoint was removed from the current site but the route handler was still mounted. Hitting it directly returned the admin UI. Critical severity.

Lesson: routes outlive the UI that references them. waybackurls + httpx on every historical URL is cheap and frequently pays.

Favicon hash pivot → exposed Jenkins

Researcher grabbed the favicon hash of the target’s build system, queried Shodan for the same hash, and found an additional Jenkins instance on a random IP outside the scope’s DNS. The instance had an anonymous build execution bug because it had never been upgraded. RCE, $20K.

Lesson: favicon pivoting finds assets that DNS never advertised.

GitHub commit history → AWS keys

Secret scanning in a current repo showed nothing — but git log --all on the repo history showed a commit from 2021 where a dev accidentally committed .env and “deleted” it the next day. The keys still worked. Full AWS account takeover. Max-severity report.

Lesson: current-state scanning misses historical secrets. Always scan the full git history.
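The full-history scan generalizes: `git grep` accepts a list of revisions, so every commit ever made can be searched for AWS-style key IDs (`AKIA` is the standard access key prefix). A sketch — for large repos, trufflehog or gitleaks do this properly:

```shell
# search every revision in history for AWS access key IDs; current-state
# scanners miss keys that were committed and later "deleted"
git_history_secrets() {
  git grep -I -E 'AKIA[0-9A-Z]{16}' $(git rev-list --all)
}
```

Output lines are prefixed `<rev>:<path>:`, so a hit immediately tells you which commit to inspect with `git show`.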

Subdomain permutation → internal admin

admin.target.com was in scope. Permutation with dnsgen generated admin-legacy.target.com which resolved. It was the pre-migration admin panel, still running, still authenticating against the old LDAP, with a test account nobody had deleted. Full admin. $30K.

Lesson: dev-, -old, -legacy, -v1, -staging, -internal permutations consistently find forgotten infra.
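Those affixes are easy to splice in yourself when dnsgen isn't handy — a heavily simplified sketch (the real tool also mutates numbers and inserts at every label position):

```shell
# dnsgen-style permutation, simplified: splice common affixes into each
# known subdomain label, then pipe the candidates to a resolver
perms() {
  local domain="$1"
  while IFS= read -r sub; do
    local label="${sub%.$domain}"
    for affix in dev old legacy v1 staging internal; do
      printf '%s-%s.%s\n' "$label" "$affix" "$domain"
      printf '%s-%s.%s\n' "$affix" "$label" "$domain"
    done
  done
}
```

Usage: `cat subs.txt | perms target.com | dnsx -silent` — only the candidates that actually resolve come out the other end.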


19. Quick Reference

Install the core stack (Go tools)

# ProjectDiscovery toolkit
go install github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest
go install github.com/projectdiscovery/httpx/cmd/httpx@latest
go install github.com/projectdiscovery/dnsx/cmd/dnsx@latest
go install github.com/projectdiscovery/naabu/v2/cmd/naabu@latest
go install github.com/projectdiscovery/katana/cmd/katana@latest
go install github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest
go install github.com/projectdiscovery/chaos-client/cmd/chaos@latest
go install github.com/projectdiscovery/notify/cmd/notify@latest

# Other Go tools
go install github.com/tomnomnom/assetfinder@latest
go install github.com/tomnomnom/waybackurls@latest
go install github.com/tomnomnom/anew@latest
go install github.com/tomnomnom/gf@latest
go install github.com/tomnomnom/unfurl@latest
go install github.com/hakluke/hakrawler@latest
go install github.com/lc/gau/v2/cmd/gau@latest
go install github.com/ffuf/ffuf/v2@latest
go install github.com/OJ/gobuster/v3@latest

# Rust / Python
cargo install feroxbuster
pip install arjun xnLinkFinder

One-liner pipelines

# Subs → alive → screenshots
subfinder -d target.com -all -silent \
  | dnsx -silent \
  | httpx -silent -screenshot

# Subs → crawl → JS → endpoints
subfinder -d target.com -all -silent \
  | httpx -silent \
  | katana -silent -d 3 \
  | grep -iE "\.js(\?.*)?$" \
  | xargs -I{} python3 linkfinder.py -i {} -o cli

# Subs → archived URLs → SSRF candidates
# (waybackurls takes bare domains on stdin, so no httpx step here)
subfinder -d target.com -all -silent \
  | waybackurls \
  | gf ssrf > ssrf-candidates.txt

# Every URL ever crawled on every sub
subfinder -d target.com -silent \
  | gau --threads 10 --blacklist png,jpg,gif,css,woff \
  | sort -u > urls.txt

Tool family tally

| Family | Canonical tool(s) |
|---|---|
| Subdomain passive | subfinder, amass, assetfinder, chaos, crt.sh |
| Subdomain brute | puredns, shuffledns |
| Permutation | dnsgen, altdns, gotator |
| Resolution | dnsx, massdns |
| HTTP probe | httpx |
| Port scan | naabu, masscan, nmap, rustscan |
| Crawling | katana, hakrawler, gospider |
| Archive | waybackurls, gau |
| JS analysis | LinkFinder, xnLinkFinder, SecretFinder, jsluice |
| Content discovery | ffuf, feroxbuster, gobuster, dirsearch |
| Parameters | arjun, paramspider, x8 |
| Fingerprinting | whatweb, wappalyzer, httpx -td |
| Cloud | cloud_enum, s3scanner, CloudFail |
| GitHub | trufflehog, gitleaks, gitgot, github-subdomains |
| Vuln scan | nuclei |
| Notification | notify, anew |
| Orchestration | bbot, reconftw, recon-ng |

Recon checklist (per engagement)

[ ] Scope captured in a file
[ ] Seed domains expanded via reverse whois / ASN
[ ] Passive subdomain enum run (subfinder, amass, chaos, crt.sh)
[ ] Active subdomain enum run (puredns brute + permutation)
[ ] All subdomains resolved via dnsx
[ ] All live hosts probed via httpx with -td
[ ] Screenshots captured for visual triage
[ ] Port scan run (naabu top-1000 minimum)
[ ] Crawl complete (katana + waybackurls + gau)
[ ] JS files extracted and mined with LinkFinder
[ ] Secrets scan run on JS
[ ] Parameter discovery run (arjun) on high-value endpoints
[ ] gf patterns applied to URL corpus
[ ] Cloud buckets permuted and tested
[ ] GitHub org scanned with trufflehog
[ ] Nuclei baseline scan run
[ ] Nightly diff pipeline set up
[ ] All raw outputs archived for future re-use

Closing Notes

Recon compounds. Every engagement you do adds to your personal corpus — subdomains seen, parameters seen, directory names seen, tech stacks fingerprinted. The hunters at the top of the leaderboards aren’t running secret tools; they’re running the same tools on datasets refined by years of prior hunts. Start the corpus now, automate the nightly diff, and treat every new asset as a fresh mini-engagement.

The bug isn’t in the tool you ran. It’s in the asset you didn’t know existed.