Comprehensive Recon Guide#
A practitioner’s reference for web reconnaissance — attack surface discovery, subdomain enumeration, live host probing, content discovery, JS mining, cloud asset hunting, automation, and continuous monitoring. Compiled from 23 research sources.
Table of Contents#
- Fundamentals
- Scope & Target Profiling
- Subdomain Enumeration
- DNS Brute Force & Permutation
- Live Host Discovery & HTTP Probing
- Port Scanning
- URL & Endpoint Crawling
- JavaScript Analysis
- Content & Directory Discovery
- Parameter Discovery
- Technology Fingerprinting
- Cloud Asset Discovery
- GitHub & Code Leak Hunting
- ASN & Infrastructure Expansion
- Wordlist Resources
- Automation Pipelines
- Continuous Monitoring
- Real-World Recon Wins
- Quick Reference
1. Fundamentals#
Recon is 80% of offensive security. The researchers who earn six figures aren’t running more tools than everyone else — they’re running them in smarter pipelines, feeding the output of one into the next, and manually reviewing the long tail that automation misses. Every hour spent deepening the asset inventory pays off when hunting begins: more subdomains means more parameters, more endpoints, more code paths, more chances for a bug nobody else has seen.
The three classes of recon:
| Class | Description | Example |
|---|---|---|
| Passive | No packets sent to target — only public data sources | crt.sh, Shodan, Chaos, Wayback, Google dorks |
| Active | Direct interaction with target infrastructure | DNS brute force, HTTP probing, port scans, content fuzzing |
| Semi-active | Targets third parties that hold target data | GitHub scraping, pastebin scraping, archive.org |
Passive first, then active. Passive sources give you free intel with zero detection risk and zero scope violations. Active enumeration should only begin after passive has been exhausted — you use passive subdomains as seeds for permutation, passive URLs as seeds for parameter mining, and passive tech stack data to choose the right active wordlist.
The recon pipeline (end-to-end):
Seed domains
↓ passive + active enumeration
Subdomains
↓ dnsx resolve + httpx probe
Live hosts
↓ naabu/masscan port scan
Open services
↓ katana + waybackurls + gau crawl
URLs
↓ unfurl / gf / LinkFinder
Parameters, endpoints, secrets
↓ ffuf / nuclei / manual review
Attack surface map
Mindset rules:
- Scope is not a limit — it’s a filter. Always map the entire organization first, then filter to what’s in-scope.
- Everything is resumable. Persist recon state to flat files, diff each run against the last, and alert on new assets.
- Automation finds the obvious; manual review finds the bounty. Eyeball every new subdomain at least once.
- Save raw outputs. The dataset you enumerate today is worth re-running against tomorrow’s wordlists.
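The diff rule above needs nothing beyond coreutils; a minimal sketch using `comm` on two sorted runs (file names and domains are illustrative):

```shell
# Two consecutive runs of the same enumeration, sorted as comm requires
printf 'api.example.com\nwww.example.com\n' > yesterday.txt
printf 'api.example.com\ndev.example.com\nwww.example.com\n' > today.txt

# -13 suppresses lines unique to yesterday and lines common to both,
# leaving only assets that appeared since the last run
comm -13 yesterday.txt today.txt
# -> dev.example.com
```

This is the primitive that anew and the continuous-monitoring scripts later in this guide build on.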
2. Scope & Target Profiling#
Before any enumeration, you need to know what you’re looking at and what you’re allowed to touch.
Scope intake checklist#
| Item | Why it matters |
|---|---|
| In-scope domains (exact vs wildcard) | Determines which subdomains are eligible |
| Out-of-scope carveouts | Avoid N/A submissions and bans |
| Allowed testing types | Active scanning forbidden on many programs |
| Rate limiting rules | Saves you from getting blocked mid-recon |
| Accepted vulnerability classes | Don’t hunt bugs that get auto-closed |
| Third-party service rules | SendGrid, Intercom, Zendesk often out of scope |
Target profiling sources#
- crt.sh — certificate transparency, gives wildcard certs and sibling domains
- BGPView / bgp.he.net — find company ASN and all IP ranges owned
- SecurityTrails / DNSDumpster — historical DNS records
- Whoxy / WhoisXML — reverse WHOIS lookup across TLDs
- LinkedIn / Crunchbase — subsidiaries, acquisitions, product names that become subdomains
- GitHub org page — gives you the company’s org name which feeds dorking
- Trademark filings — sometimes leak internal project codenames
Seed expansion#
A single apex domain is rarely the whole story. Before enumeration, expand seeds via:
# Find other domains owned by the same org
amass intel -org "Target Corp"
amass intel -whois -d target.com
amass intel -asn 13335
amass intel -cidr 192.0.2.0/24
# Reverse whois via viewdns.info or whoxy
curl -s "https://api.whoxy.com/?key=$KEY&reverse=whois&email=admin@target.com"
Feed the resulting domain list into the rest of the pipeline as a single flat file (seeds.txt).
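The intel sources above return mixed-case names, wildcard entries, and duplicates, so normalize before writing seeds.txt; a small sketch (domains are placeholders):

```shell
# Raw output from amass intel / reverse whois, with typical noise
printf 'Target.com\n*.target.com\ntarget.com\ntarget.co.uk\n' > raw-seeds.txt

# Lowercase, strip wildcard prefixes, dedupe into the flat seed file
tr 'A-Z' 'a-z' < raw-seeds.txt | sed 's/^\*\.//' | sort -u > seeds.txt
```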
3. Subdomain Enumeration#
Subdomain enumeration is the bedrock of recon. Every additional subdomain is a new host with its own code paths, its own auth model, its own tech stack, and its own chance of being forgotten by the dev team. Treat this phase like an exhaustive search — pull from as many independent sources as possible, dedupe, and re-resolve.
Passive tools#
| Tool | Strengths | Command |
|---|---|---|
| subfinder | Fast, 30+ passive sources, API-key aware | subfinder -d target.com -all -silent |
| amass (passive) | Deepest source coverage, graph storage | amass enum -passive -d target.com |
| assetfinder | Minimal, fast, good for pipelines | assetfinder --subs-only target.com |
| chaos (ProjectDiscovery) | ProjectDiscovery’s curated dataset | chaos -d target.com -silent |
| crt.sh | CT log scraper, finds wildcard certs | curl -s "https://crt.sh/?q=%25.target.com&output=json" |
| github-subdomains | Scrapes subdomains from GitHub code search | github-subdomains -d target.com -t $TOKEN |
| bbot | Swiss army knife, 80+ modules | bbot -t target.com -f subdomain-enum |
Subfinder in depth#
subfinder is the default passive tool — it’s fast, supports API key configuration for premium sources, and outputs one subdomain per line for easy piping.
# Basic
subfinder -d target.com -silent -o subs.txt
# All sources (slower but more thorough)
subfinder -d target.com -all -recursive -silent
# From multiple domains
subfinder -dL seeds.txt -all -silent -o subs.txt
# JSON output for enrichment
subfinder -d target.com -oJ -o subs.json
# Pipe directly into httpx
subfinder -d target.com -silent | httpx -silent
Configure API keys in ~/.config/subfinder/provider-config.yaml — Chaos, SecurityTrails, Censys, Shodan, VirusTotal, GitHub, BinaryEdge, and WhoisXML all materially improve coverage.
Amass in depth#
amass is slower than subfinder but pulls from more sources and supports active enumeration, alteration generation, and graph-based asset tracking.
# Passive
amass enum -passive -d target.com -o amass.txt
# Active (adds DNS brute, zone walks, name alterations)
amass enum -active -d target.com -brute -o amass.txt
# Include unresolvable (internal leak hints)
amass enum -d target.com -include-unresolvable
# Bulk scan multiple domains
amass enum -df seeds.txt -o amass-multi.txt
# Track changes over time
amass track -d target.com -dir recon-data
# Visualize as graph
amass viz -d3 -dir recon-data
# Intel pivots
amass intel -org "Target Corp"
amass intel -asn 13335
amass intel -addr 192.0.2.10
amass intel -cidr 192.0.2.0/24
amass intel -whois -d target.com
Amass supports external datasource modules via config (Netlas, SecurityTrails, Shodan, Censys) — always populate the config for real coverage.
Certificate transparency#
CT logs record every publicly trusted TLS certificate ever issued, which means every subdomain that ever got a valid cert. This is one of the cleanest passive sources.
# crt.sh basic
curl -s "https://crt.sh/?q=%25.target.com&output=json" \
| jq -r '.[].name_value' \
| sed 's/\*\.//g' \
| sort -u > crtsh.txt
# Deeper levels (sub.sub.target.com)
curl -s "https://crt.sh/?q=%25.%25.target.com&output=json" | jq -r '.[].name_value'
# Find email addresses in CT
curl -s "https://crt.sh/?q=%25@target.com&output=json" | jq -r '.[].name_value'
# Alternative: Censys, Facebook CT, Google CT API
Merging sources#
Always run multiple tools and merge — no single source has full coverage.
{
subfinder -d target.com -all -silent
assetfinder --subs-only target.com
amass enum -passive -d target.com
chaos -d target.com -silent
curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g'
} | sort -u > all-subs.txt
4. DNS Brute Force & Permutation#
Passive sources miss internal-only subdomains and anything that never got a public cert. Brute force and permutation fill the gap.
puredns#
puredns wraps massdns with accurate wildcard filtering — the gold standard for brute forcing.
# Brute force with a wordlist
puredns bruteforce wordlist.txt target.com \
--resolvers resolvers.txt \
--write results.txt
# Resolve a list of guessed names
puredns resolve candidates.txt --resolvers resolvers.txt
# Resolvers file: public-dns.info/nameservers-all.txt
Permutation (alterations)#
Given known subdomains, generate plausible variants and resolve them. Frequently surfaces dev-api, staging-portal, api-v2 variants that nobody lists publicly.
# dnsgen — pattern-based mutations
cat subs.txt | dnsgen - | puredns resolve -r resolvers.txt
# altdns
altdns -i subs.txt -o altdns.txt -w words.txt -r -s results.txt
# gotator — pattern permutator
gotator -sub subs.txt -perm words.txt -depth 1 -numbers 5 | puredns resolve
Good permutation wordlists: best-dns-wordlist.txt from Assetnote, dnsgen built-ins, and a custom list of your target’s product names.
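The core of what dnsgen and gotator do is simple enough to sketch in bash; this toy version only emits dash and dot joins on the first label (the real tools apply many more patterns):

```shell
printf 'api.target.com\nportal.target.com\n' > subs.txt
printf 'dev\nstaging\n' > words.txt

while read -r sub; do
  label=${sub%%.*}           # first label, e.g. "api"
  rest=${sub#*.}             # remainder, e.g. "target.com"
  while read -r w; do
    echo "$w-$label.$rest"   # dev-api.target.com
    echo "$w.$sub"           # dev.api.target.com
    echo "$label-$w.$rest"   # api-dev.target.com
  done < words.txt
done < subs.txt | sort -u > candidates.txt
# feed candidates.txt to puredns resolve as above
```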
Resolvers#
Always use a curated, validated resolver list. Bad resolvers cause false positives.
# Fetch and validate resolvers
wget https://raw.githubusercontent.com/trickest/resolvers/main/resolvers.txt
dnsvalidator -tL resolvers.txt -threads 200 -o validated.txt
5. Live Host Discovery & HTTP Probing#
A subdomain list is useless until you know which hosts are live and what they serve.
dnsx — DNS resolution at scale#
# Resolve and drop dead entries
cat subs.txt | dnsx -silent -a -resp > resolved.txt
# Only return live subdomains
dnsx -l subs.txt -silent > live-dns.txt
# Wildcard detection
dnsx -l subs.txt -wd target.com -silent
# Retrieve CNAME chain (finds takeover candidates)
dnsx -l subs.txt -cname -resp -silent
# Grab multiple record types
dnsx -l subs.txt -a -aaaa -cname -mx -ns -txt -silent -json
httpx — HTTP/HTTPS probing#
httpx is the bridge between DNS and HTTP-layer recon — it probes, fingerprints, and grabs metadata in a single pass.
# Basic alive check
cat live-dns.txt | httpx -silent > alive.txt
# Rich metadata (title, status, tech, IP, CDN)
httpx -l live-dns.txt -title -sc -td -ip -cdn -server -silent -o probe.txt
# Multiple ports
httpx -l live-dns.txt -p 80,443,8080,8443,8000,3000,5000,9000 -silent
# Screenshot every live host
httpx -l live-dns.txt -screenshot -silent
# Filter by status code
httpx -l live-dns.txt -mc 200,301,302,401,403 -silent
# Show content type, grep for JSON APIs
httpx -l live-dns.txt -ct -silent | grep "application/json"
# Tech detection (Wappalyzer dataset)
httpx -l live-dns.txt -td -silent -json | jq '.tech'
# Favicon hash for clustering (mmh3)
httpx -l live-dns.txt -favicon -silent
The favicon hash is particularly useful — Shodan indexes favicon hashes, so if httpx returns a hash you can pivot in Shodan to find every other host on the internet serving the same application.
6. Port Scanning#
Web apps on port 80/443 are the obvious targets, but devs love to run admin panels, debug dashboards, and internal APIs on unusual ports.
naabu — fast SYN/CONNECT scanner#
# Top 1000 ports
naabu -l alive.txt -top-ports 1000 -silent
# Full range
naabu -host target.com -p - -rate 5000
# Pipe into httpx
naabu -l alive.txt -top-ports 1000 -silent | httpx -silent
masscan — internet-scale SYN scanner#
# Full port sweep on a CIDR
sudo masscan -p1-65535 192.0.2.0/24 --rate=10000 -oG masscan.out
# Specific high-value ports across large ranges
sudo masscan -p22,80,443,3306,5432,6379,8080,8443,9200,27017 \
192.0.2.0/16 --rate=20000
nmap — deep service/version detection#
After masscan/naabu give you the open ports, use nmap for service detection.
# Service version + default scripts on discovered ports
nmap -sV -sC -p 22,80,443,8080 -iL hosts.txt -oA nmap-scan
# Top ports with aggressive OS detection
nmap -A -T4 --top-ports 1000 -iL hosts.txt
# Vulnerability scripts
nmap -sV --script vuln -iL hosts.txt
rustscan#
rustscan is a fast port scanner that feeds its results directly into nmap for service detection — good for single-target deep dives.
rustscan -a target.com --ulimit 5000 -- -sV -sC
Ports worth always scanning#
22 SSH
80 HTTP
443 HTTPS
2375 Docker API (unauthenticated)
3000 Grafana, Node dev
3306 MySQL
3389 RDP
5000 Flask dev
5432 Postgres
5601 Kibana
6379 Redis
7001 WebLogic
8000 HTTP alt
8080 HTTP alt / Jenkins
8443 HTTPS alt
8888 Jupyter
9000 SonarQube / PHP-FPM
9090 Prometheus
9200 Elasticsearch
11211 Memcached
15672 RabbitMQ management
27017 MongoDB
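Keep that list as a two-column text file and you can build the scanner argument on the fly; a sketch (file name is illustrative, list truncated):

```shell
cat > highvalue-ports.txt <<'EOF'
2375 Docker API
6379 Redis
9200 Elasticsearch
EOF

# First column -> comma-joined list for naabu/nmap -p
PORTS=$(awk '{print $1}' highvalue-ports.txt | tr '\n' ',' | sed 's/,$//')
echo "$PORTS"
# -> 2375,6379,9200
# then: naabu -l alive.txt -p "$PORTS" -silent
```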
7. URL & Endpoint Crawling#
URLs are where the vulnerabilities live. You want three sources running in parallel: a passive archive (Wayback / Common Crawl), an active crawler (katana), and a URL extractor from JS.
katana — modern active crawler#
# Standard crawl
katana -u https://target.com -d 5 -silent
# From a list
katana -list alive.txt -d 3 -silent -o urls.txt
# Headless with JS rendering
katana -u https://target.com -headless -system-chrome -silent
# Dedupe query-param variants, scope crawl to the root domain
katana -u https://target.com -iqp -fs rdn -silent
# JSONL output, with JS parsing enabled (-jc)
katana -u https://target.com -jc -jsonl -silent -o urls.json
waybackurls — historical URLs#
# Basic
echo target.com | waybackurls > wayback.txt
# From all subdomains
cat subs.txt | waybackurls | sort -u > wayback.txt
gau — getallurls (Wayback + CommonCrawl + OTX + URLScan)#
echo target.com | gau --threads 10 > gau.txt
# Filter by extension
echo target.com | gau --blacklist png,jpg,gif,css,woff | sort -u
hakrawler#
echo https://target.com | hakrawler -d 3 -subs
Merging crawl sources#
{
cat alive.txt | katana -silent -d 3
cat subs.txt | waybackurls
cat subs.txt | gau --threads 5
} | sort -u > all-urls.txt
# Filter to interesting URLs
cat all-urls.txt | grep -E "\.(json|xml|js|php|aspx|jsp|env|bak|config|sql)$"
cat all-urls.txt | grep -E "api/|admin|internal|debug|swagger|graphql"
gf — pattern classifier#
gf applies named regex patterns to URL lists to quickly bucket potential bug candidates.
cat all-urls.txt | gf ssrf > ssrf-candidates.txt
cat all-urls.txt | gf xss > xss-candidates.txt
cat all-urls.txt | gf sqli > sqli-candidates.txt
cat all-urls.txt | gf redirect > redirect-candidates.txt
cat all-urls.txt | gf lfi > lfi-candidates.txt
cat all-urls.txt | gf idor > idor-candidates.txt
8. JavaScript Analysis#
Modern apps hide half their API surface inside JavaScript bundles. Every JS file is a map of internal endpoints, buried parameters, legacy routes, hardcoded tokens, and feature flags.
Extract all JS files#
# From Wayback
echo target.com | waybackurls | grep -Ei "\.js(\?|$)" | sort -u > js.txt
# From katana
katana -u https://target.com -silent | grep -Ei "\.js(\?|$)" | sort -u >> js.txt
# Verify alive
cat js.txt | httpx -mc 200 -silent > live-js.txt
LinkFinder — endpoint extraction#
# Single file
python3 linkfinder.py -i https://target.com/app.js -o cli
# Batch
while read url; do
python3 linkfinder.py -i "$url" -o cli
done < live-js.txt | sort -u > endpoints.txt
xnLinkFinder#
xnLinkFinder is a spiritual successor to LinkFinder — it handles minified bundles, recursive crawls, and depth-based discovery.
xnLinkFinder -i live-js.txt -sf target.com -d 3 -o endpoints.txt
SecretFinder — secret detection in JS#
python3 SecretFinder.py -i https://target.com/app.js -o cli
while read url; do
python3 SecretFinder.py -i "$url" -o cli 2>/dev/null
done < live-js.txt | tee secrets.txt
Manual grep patterns#
Automation misses clever obfuscation. Fetch the JS and grep for the classics:
curl -s https://target.com/app.js | grep -oE \
"(api[_-]?key|apikey|secret|token|password|passwd|bearer|aws_access|private_key)[\"':= ]+[a-zA-Z0-9/+=_-]{16,}"
# Hardcoded IPs and internal hosts
curl -s https://target.com/app.js | grep -oE "(https?://)?[a-z0-9.-]*\.(internal|corp|local|intranet)"
# Feature flags
curl -s https://target.com/app.js | grep -oE '"[a-z_]*_(enabled|flag|debug)"'
# Route definitions
curl -s https://target.com/app.js | grep -oE '"/api/[a-zA-Z0-9/_-]+"'
JSLuice#
JSLuice is a newer tool that parses JS with a proper AST rather than regex, catching dynamic route construction that regex-based tools miss.
cat live-js.txt | while read url; do
curl -s "$url" | jsluice urls
done | sort -u > jsluice-urls.txt
9. Content & Directory Discovery#
Directory brute forcing finds the files that aren’t linked anywhere — backups, configs, admin pages, staging artifacts, .git directories, and the /old/ folder devs promised to delete.
ffuf — fuzzing swiss army knife#
# Basic directory fuzz
ffuf -u https://target.com/FUZZ -w wordlist.txt -mc 200,301,302,403 -o ffuf.json
# With extensions
ffuf -u https://target.com/FUZZ -w raft-medium-words.txt \
-e .php,.bak,.old,.zip,.tar.gz,.env,.config \
-mc 200,301,302,403
# Virtual host fuzzing
ffuf -u https://target.com -H "Host: FUZZ.target.com" -w subs.txt -fs 1234
# Recursive
ffuf -u https://target.com/FUZZ -w wordlist.txt -recursion -recursion-depth 2
# Parameter fuzzing
ffuf -u https://target.com/api?FUZZ=test -w params.txt -fs 0
# Rate limited
ffuf -u https://target.com/FUZZ -w wordlist.txt -rate 50
feroxbuster#
Recursive content discoverer with smart filtering — great for deep dives.
feroxbuster -u https://target.com -w wordlist.txt -x php,html,txt,bak -d 3
# From a list of targets
feroxbuster --stdin -w wordlist.txt < alive.txt
# Filter by response size
feroxbuster -u https://target.com -w raft-large.txt -S 0,1234
gobuster#
gobuster dir -u https://target.com -w wordlist.txt -x php,html,txt
gobuster vhost -u https://target.com -w subs.txt
gobuster dns -d target.com -w dns-wordlist.txt
Discovery strategy#
- Start small — run `raft-small-words.txt` before `raft-large` to calibrate response baselines.
- Chain extensions — if you get hits at `/backup`, re-fuzz with backup-specific extensions.
- Recurse manually — only recurse into directories that return 200/301, not 403/404.
- Auto-calibrate — always use `-ac` in ffuf or filter by response length to dodge soft-404s.
- Match response words — `-fw` in ffuf filters by word count, often tighter than length.
10. Parameter Discovery#
Hidden parameters are one of the highest-ROI recon findings — they unlock debug modes, admin toggles, SSRF sinks, and auth bypass.
Arjun#
# Basic
arjun -u https://target.com/api/users
# With wordlist
arjun -u https://target.com/api/users -w params.txt
# From URL list
arjun -i urls.txt
# Methods
arjun -u https://target.com/api -m GET,POST
# Output
arjun -u https://target.com -oJ arjun.json
ParamSpider#
python3 paramspider.py -d target.com -o params.txt
x8#
x8 is a fast parameter discovery tool written in Rust that finds hidden parameters by diffing page responses.
x8 -u https://target.com/api -w params.txt
High-signal parameter names to always test#
debug, test, admin, internal, trace, verbose
callback, jsonp, returnUrl, redirect, next, url, uri, path, file
id, userId, user_id, account, tenant, org, orgId
template, view, include, load, src, source, dest, target, action, cmd
token, auth, api_key, apikey, access_token, sso
11. Technology Fingerprinting#
Knowing the stack narrows your attack surface — CVEs, default paths, framework-specific tricks, and auth bypass tricks all depend on what’s running.
httpx tech detect#
httpx -l alive.txt -td -server -title -sc -silent -json | jq '.tech'
whatweb#
whatweb -a 3 https://target.com
whatweb -i alive.txt --log-json=whatweb.json
Wappalyzer CLI#
wappalyzer https://target.com
Custom fingerprinting via headers/favicon#
| Indicator | What it reveals |
|---|---|
| X-Powered-By | Framework (Express, PHP version, ASP.NET) |
| Server | Web server (nginx version, Apache, IIS) |
| Set-Cookie names | PHPSESSID, JSESSIONID, XSRF-TOKEN, laravel_session |
| X-Generator | CMS (Drupal, WordPress, TYPO3) |
| Favicon mmh3 hash | Pivot across Shodan to find every similar deploy |
| /robots.txt | Exposed paths, site generator hints |
| /sitemap.xml | Content structure |
| /security.txt | Bug bounty program contact |
| Error page fingerprints | Stack traces leak versions |
Shodan & Censys pivots#
After fingerprinting, pivot via Shodan to find every host on the internet running the same application.
# Shodan by favicon hash
shodan search "http.favicon.hash:-1234567890"
# By HTTP title
shodan search 'http.title:"Jenkins"'
# By SSL cert subject
shodan search 'ssl.cert.subject.cn:target.com'
# Censys
censys search 'services.tls.certificates.leaf_data.subject.common_name: target.com'
12. Cloud Asset Discovery#
Cloud storage misconfigurations remain one of the fastest paths to a critical-severity finding — exposed S3 buckets, readable Azure blobs, and world-writable GCS objects are still common.
S3 bucket discovery#
# Generate permutations
cat <<EOF > bucket-perms.txt
target
target-dev
target-prod
target-staging
target-backup
target-backups
target-assets
target-media
target-uploads
target-logs
target-data
target-db
target-internal
target-private
target-public
target-test
target-qa
EOF
# Test each
while read b; do
aws s3 ls "s3://$b" --no-sign-request 2>&1 | grep -v NoSuchBucket | grep -v AllAccessDisabled
done < bucket-perms.txt
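Rather than hand-typing the here-doc, generate the permutations; a sketch crossing a base name with common environment suffixes (extend both lists with the target's product names):

```shell
base="target"
for env in "" -dev -prod -staging -backup -backups -assets -media \
           -uploads -logs -data -db -internal -private -public -test -qa; do
  echo "${base}${env}"
done > bucket-perms.txt

# Dotted and reversed forms show up too
for env in dev prod staging; do
  echo "${base}.${env}"
  echo "${env}-${base}"
done >> bucket-perms.txt
```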
S3Scanner#
s3scanner scan --bucket-file bucket-perms.txt
s3scanner scan --bucket target-backups --dump
cloud_enum#
Covers S3, Azure, and GCS in one pass.
python3 cloud_enum.py -k target -k targetcorp -k target-internal
Azure blob storage#
# Azure blob URL pattern
https://<storage-account>.blob.core.windows.net/<container>/<blob>
# Enumerate containers
curl -s "https://target.blob.core.windows.net/?comp=list" | xmllint --format -
# List blobs in a container
curl -s "https://target.blob.core.windows.net/container?restype=container&comp=list"
GCP bucket enumeration#
# Public read check
curl -s "https://storage.googleapis.com/storage/v1/b/target-bucket" | jq .
# List objects
curl -s "https://storage.googleapis.com/storage/v1/b/target-bucket/o" | jq .
CloudFail (DNS/database-based origin discovery)#
CloudFail mines historical DNS records and leaked-database dumps to bypass Cloudflare and find the real origin IP of a proxied site.
python3 cloudfail.py -t target.com
Bucket wordlists#
- Assetnote wordlists: `wordlists.assetnote.io` — `cloud-s3-bucket-names.txt`
- SecLists: `Discovery/Cloud/`
13. GitHub & Code Leak Hunting#
GitHub is riddled with hardcoded credentials. The org’s public repos are the obvious place, but the real gold is in employee personal repos and in deleted-but-not-purged commits.
GitHub dorking#
org:target password
org:target api_key
org:target aws_access_key_id
org:target BEGIN RSA
org:target smtp
org:target "internal-api"
org:target filename:.env
org:target filename:config
org:target extension:sql
org:target extension:pem
GitGot (Bishop Fox)#
Semi-automated, feedback-driven code search — suppresses already-reviewed hits so you focus on new matches.
gitgot -q target.com
gitgot -q "target api_key" -o gitgot-results.json
trufflehog — high-entropy secret scanning#
# Scan a repo
trufflehog git https://github.com/target/repo
# Scan an org
trufflehog github --org=target --token=$GITHUB_TOKEN
# Verified secrets only (no false positives)
trufflehog github --org=target --only-verified
gitleaks#
gitleaks detect --source . --report-format json --report-path gitleaks.json
gitleaks detect --source https://github.com/target/repo
github-subdomains#
Scrapes GitHub code for mentions of subdomains — a passive subdomain source most hunters skip.
github-subdomains -d target.com -t $GITHUB_TOKEN -o gh-subs.txt
What to look for#
- `.env` files (check for AWS, Stripe, Twilio, SendGrid keys)
- `config.yml` / `config.json` / `settings.py` with DB connection strings
- `Dockerfile` with `ARG` values hardcoding secrets
- CI/CD YAML files (GitHub Actions, CircleCI) with plaintext tokens
- Private keys in commit history
- References to internal hostnames (`*.corp.target.com`)
- Historical commits — secrets are often “fixed” in a later commit but still in history
14. ASN & Infrastructure Expansion#
Most hunters stop at the subdomain list. Elite hunters expand into IP space and find the forgotten servers that nobody maps to DNS anymore.
Find the ASN#
# From a known IP
whois -h whois.cymru.com 192.0.2.10
# Via bgpview
curl -s "https://api.bgpview.io/ip/192.0.2.10" | jq .
# Interactive: bgp.he.net
Enumerate all IP ranges for an ASN#
# bgpview
curl -s "https://api.bgpview.io/asn/AS13335/prefixes" \
| jq -r '.data.ipv4_prefixes[].prefix' > asn-ranges.txt
# amass intel
amass intel -asn 13335 > asn-hosts.txt
Probe everything in the range#
# Port scan the range
sudo masscan -iL asn-ranges.txt -p80,443,8080,8443 --rate=10000 -oG masscan.out
# HTTP probe
awk '/open/{print $4}' masscan.out | httpx -silent -title -sc -ip
Reverse DNS across the range#
# PTR lookups across a CIDR
prips 192.0.2.0/24 | dnsx -ptr -resp-only
Why this works#
Corporate infra expansion happens faster than DNS hygiene. You’ll regularly find:
- Acquired companies whose old infra still runs on the parent’s ASN
- Staging servers assigned IPs but never given DNS
- Legacy admin panels on forgotten boxes
- Dev environments behind no auth, exposed via direct IP
15. Wordlist Resources#
Wordlists are the difference between finding /admin and finding /admin-backup-2019.zip. Use multiple, rotate them, and grow your own.
Core wordlist repos#
| Repo | What it contains |
|---|---|
| SecLists (danielmiessler/SecLists) | The canonical source — subdomains, content, params, fuzzing, passwords, payloads |
| Assetnote wordlists (wordlists.assetnote.io) | Data-mined from Common Crawl — far higher signal than generic lists |
| OneListForAll | Merged, deduped megalist |
| fuzzdb | Legacy but still has unique patterns |
| Jhaddix all.txt | Classic subdomain brute list |
Key SecLists files#
Discovery/DNS/
subdomains-top1million-5000.txt
subdomains-top1million-110000.txt
dns-Jhaddix.txt
bitquark-subdomains-top100000.txt
Discovery/Web-Content/
raft-small-words.txt
raft-medium-words.txt
raft-large-words.txt
common.txt
big.txt
directory-list-2.3-medium.txt
api/api-endpoints.txt
Discovery/Web-Content/CMS/
wordpress.fuzz.txt
joomla.fuzz.txt
drupal.fuzz.txt
Fuzzing/
LFI/
XSS/
SQLi/
Passwords/
Common-Credentials/
Assetnote wordlist highlights#
best-dns-wordlist.txt # 9M real subdomains, mined from CT/CC
httparchive_directories_*.txt # Directories seen in HTTP Archive
httparchive_files_*.txt # Files seen in HTTP Archive
cloud-s3-bucket-names.txt # Real S3 bucket names
parameters_top_1M.txt # Real HTTP parameters
Building your own#
After a few engagements, the best wordlist is your own cumulative corpus:
# Collect every URL you've ever crawled
cat recon/*/urls.txt | unfurl paths | awk -F/ '{for(i=2;i<=NF;i++)print $i}' \
| sort | uniq -c | sort -rn > my-dirs.txt
16. Automation Pipelines#
At some point you stop running commands and start running pipelines. Recon frameworks chain every tool above into a single invocation and output a structured asset inventory.
bbot (Black Box Operations Tool)#
bbot is the most capable modern recon framework — 80+ modules, event-driven, produces graph output.
# Full subdomain enum
bbot -t target.com -f subdomain-enum
# Web spider + HTTP probe + tech detect
bbot -t target.com -f web-basic
# Everything (expensive)
bbot -t target.com -f subdomain-enum,web-basic,cloud-enum -o bbot-out/
recon-ng#
Modular Metasploit-style recon framework with a marketplace of modules.
recon-ng
> marketplace install all
> workspaces create target
> modules load recon/domains-hosts/hackertarget
> options set SOURCE target.com
> run
ReconFTW#
All-in-one bash pipeline that wraps subfinder, amass, httpx, nuclei, ffuf, and dozens more.
./reconftw.sh -d target.com -r # recon only
./reconftw.sh -d target.com -a # full scan
GarudRecon / subdomainx / Striker / ReconDog#
Smaller curated wrappers around the ProjectDiscovery stack — useful as reference implementations when building your own.
Custom bash pipeline (minimal example)#
#!/usr/bin/env bash
set -euo pipefail
TARGET=$1
OUT=recon/$TARGET
mkdir -p "$OUT"
# 1. Subdomains
{
subfinder -d "$TARGET" -all -silent
assetfinder --subs-only "$TARGET"
chaos -d "$TARGET" -silent
curl -s "https://crt.sh/?q=%25.$TARGET&output=json" \
| jq -r '.[].name_value' 2>/dev/null | sed 's/\*\.//g'
} | sort -u > "$OUT/subs.txt"
# 2. Resolve
dnsx -l "$OUT/subs.txt" -silent > "$OUT/resolved.txt"
# 3. HTTP probe
httpx -l "$OUT/resolved.txt" -silent -title -sc -td -ip -json \
> "$OUT/httpx.json"
jq -r '.url' "$OUT/httpx.json" > "$OUT/alive.txt"
# 4. Port scan
naabu -l "$OUT/alive.txt" -top-ports 1000 -silent > "$OUT/ports.txt"
# 5. Crawl
{
katana -list "$OUT/alive.txt" -silent -d 3
cat "$OUT/subs.txt" | waybackurls
cat "$OUT/subs.txt" | gau --threads 5
} | sort -u > "$OUT/urls.txt"
# 6. JS files
grep -Ei "\.js(\?|$)" "$OUT/urls.txt" | httpx -mc 200 -silent > "$OUT/js.txt"
# 7. gf classification
for pattern in ssrf xss sqli redirect lfi idor; do
gf "$pattern" < "$OUT/urls.txt" > "$OUT/gf-$pattern.txt" || true
done
# 8. Nuclei scan
nuclei -l "$OUT/alive.txt" -severity low,medium,high,critical -silent \
-o "$OUT/nuclei.txt"
echo "Done. Results in $OUT/"
Pipeline principles#
- Idempotent — rerunning should update, not duplicate
- Resumable — each step writes a file that the next step reads
- Rate-limited — respect program rules by default
- Diff-friendly — always sort -u output so you can diff across runs
- Resource-capped — run expensive tools in parallel with `xargs -P` or GNU parallel, but cap concurrency
- Logged — capture stdout and stderr per-tool for later debugging
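The resource cap is a single xargs flag; a sketch with echo standing in for the expensive tool:

```shell
printf 'a.target.com\nb.target.com\nc.target.com\n' > hosts-sample.txt

# At most 4 concurrent invocations, one host per invocation
xargs -P 4 -n 1 echo scanning < hosts-sample.txt
# swap "echo scanning" for the real tool or a per-host wrapper script
```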
17. Continuous Monitoring#
Recon isn’t a one-shot operation. New subdomains, new endpoints, and new open ports appear daily. The hunters who earn the most set up continuous recon and get alerted when something new shows up.
Nightly diff pattern#
#!/usr/bin/env bash
TARGET=$1
TODAY=$(date +%F)
OUT=recon/$TARGET/$TODAY
PREV=$(ls -1 "recon/$TARGET" 2>/dev/null | grep -v "$TODAY" | tail -1)
mkdir -p "$OUT"
subfinder -d "$TARGET" -all -silent | sort -u > "$OUT/subs.txt"
if [[ -n "$PREV" && -f "recon/$TARGET/$PREV/subs.txt" ]]; then
  comm -13 "recon/$TARGET/$PREV/subs.txt" "$OUT/subs.txt" > "$OUT/new.txt"
  if [[ -s "$OUT/new.txt" ]]; then
    # Slack incoming webhooks expect a JSON payload
    curl -s -X POST -H 'Content-type: application/json' \
      --data "{\"text\": \"New subs for $TARGET: $(tr '\n' ' ' < "$OUT/new.txt")\"}" \
      "$SLACK_WEBHOOK"
  fi
fi
Run via cron or GitHub Actions on a schedule (hourly for active programs, daily for slow-moving targets).
What to monitor#
| Signal | Action |
|---|---|
| New subdomain | Immediate triage — often a fresh deploy with bugs |
| New open port on known host | Service enumeration, version check |
| New JS file or bundle hash change | Re-run LinkFinder, diff endpoint list |
| New nuclei finding | Triage the specific template |
| Cert transparency alert | Certstream-based live feed |
| GitHub new public repo in org | Scan for secrets |
| DNS CNAME change | Takeover check |
| HTTP response hash change on login/admin pages | Manual review |
Tools for continuous monitoring#
- axiom — distribute recon across cloud workers
- interlace — parallelize any CLI tool across a target list
- notify (ProjectDiscovery) — multi-channel output for any pipeline
- certstream — real-time CT log feed
- GitHub webhooks — push notifications on new repos/commits
Notify example#
subfinder -d target.com -all -silent \
| anew subs.txt \
| notify -silent -bulk -provider slack
anew only emits newly seen lines, so notify only fires on genuinely new subdomains.
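If anew isn't installed, its core behavior is two coreutils calls; a toy equivalent (the helper name is illustrative, and it skips anew's within-batch dedup):

```shell
# Append lines not already in the state file; emit only the new ones
append_new() {   # usage: ... | append_new state.txt
  touch "$1"
  grep -Fxv -f "$1" - | tee -a "$1"
}

printf 'a.target.com\n' > seen.txt
printf 'a.target.com\nb.target.com\n' | append_new seen.txt
# -> b.target.com   (and seen.txt now holds both lines)
```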
18. Real-World Recon Wins#
Actual case studies drawn from public writeups that hinged on recon quality, not exploit cleverness.
JavaScript endpoint → $25K#
A researcher grepped an obfuscated webpack bundle and found a reference to /api/v2/internal/users. The endpoint was reachable without authentication and returned the full user database. Total time to bug: 45 minutes. The app had passed a third-party pentest six months earlier.
Lesson: always pull the source maps (.js.map) when present — they reverse the minification and hand you the original file structure.
ASN expansion → SQL injection bounty#
Hunter started with 50 in-scope subdomains. Pulled the company ASN, enumerated IP ranges via bgpview, and probed with httpx. Found 500+ live hosts including a forgotten admin panel on an unlisted IP. The panel was vulnerable to classic SQL injection. Critical-severity payout.
Lesson: the scope may list only *.target.com, but the company often owns entire IP blocks. Check the program rules; many programs allow any asset the company owns.
S3 bucket → 2M user PII leak#
Generated bucket name permutations (company-backup, company-backups, company-backup-prod) and tested each with aws s3 ls s3://<bucket> --no-sign-request. company-backup-prod returned a directory listing. It contained a full user database dump in SQL format. $50K critical bounty.
Lesson: bucket name permutation is low-effort, high-reward. Always run it as part of initial recon.
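The permutation step can be sketched in a few lines of shell. The company name and suffix list below are assumptions for illustration; tools like cloud_enum and s3scanner ship far larger mutation lists:

```shell
#!/bin/sh
# Sketch of S3 bucket-name permutation. COMPANY and the suffix list are
# illustrative assumptions; real mutation lists are much larger.
set -eu
COMPANY="company"

for suffix in backup backups backup-prod data dev staging assets logs; do
  for name in "$COMPANY-$suffix" "$COMPANY$suffix" "$suffix-$COMPANY"; do
    echo "$name"
  done
done | sort -u > bucket-candidates.txt

# Each candidate would then be probed unauthenticated, e.g.:
#   while read -r b; do aws s3 ls "s3://$b" --no-sign-request; done < bucket-candidates.txt
wc -l < bucket-candidates.txt
```

The probing loop is left commented out so the sketch stays offline; run it only against in-scope targets.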
Wayback Machine → auth bypass#
Old Wayback snapshot from 2018 showed a /debug/users?bypass=1 endpoint. The endpoint was removed from the current site but the route handler was still mounted. Hitting it directly returned the admin UI. Critical severity.
Lesson: routes outlive the UI that references them. waybackurls + httpx on every historical URL is cheap and frequently pays.
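The same historical-URL sweep can be done without waybackurls by querying the Wayback CDX API directly. A sketch with a placeholder domain; the fetch-and-probe step is left as a comment so the snippet stays offline:

```shell
#!/bin/sh
# Sketch: assemble a Wayback CDX API query that lists every archived URL
# for a domain and its subdomains. DOMAIN is a placeholder.
set -eu
DOMAIN="target.com"

CDX_URL="https://web.archive.org/cdx/search/cdx?url=*.${DOMAIN}/*&output=text&fl=original&collapse=urlkey"
echo "$CDX_URL" > cdx-query.txt
cat cdx-query.txt
# Fetch and probe with:
#   curl -s "$CDX_URL" | sort -u | httpx -silent
```

collapse=urlkey dedupes near-identical captures server-side, and fl=original returns just the URL column, so the output pipes straight into httpx.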
Favicon hash pivot → exposed Jenkins#
Researcher grabbed the favicon hash of the target’s build system, queried Shodan for the same hash, and found an additional Jenkins instance on an IP that no in-scope DNS record pointed to. The instance allowed anonymous build execution because it had never been upgraded. RCE, $20K.
Lesson: favicon pivoting finds assets that DNS never advertised.
GitHub commit history → AWS keys#
Secret scanning in a current repo showed nothing — but git log --all on the repo history showed a commit from 2021 where a dev accidentally committed .env and “deleted” it the next day. The keys still worked. Full AWS account takeover. Max-severity report.
Lesson: current-state scanning misses historical secrets. Always scan the full git history.
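The history-wide scan reduces to one git invocation. The sketch below builds a throwaway demo repo with a committed-then-deleted fake key (AWS's documented example key ID) so the technique is reproducible; trufflehog and gitleaks do this properly with entropy checks and hundreds of patterns:

```shell
#!/bin/sh
# Sketch: scan full git history (not only the current tree) for AWS key
# IDs. The demo repo and fake key are illustrative.
set -eu

# --- demo repo with a committed-then-"deleted" secret ---
dir=$(mktemp -d); cd "$dir"
git init -q
git -c user.email=a@b -c user.name=demo commit -q --allow-empty -m init
echo 'AWS_KEY=AKIAIOSFODNN7EXAMPLE' > .env
git add .env
git -c user.email=a@b -c user.name=demo commit -q -m 'oops'
git rm -q .env
git -c user.email=a@b -c user.name=demo commit -q -m 'remove .env'

# --- the scan: -p prints every patch ever committed, so deleted
# files and reverted hunks are still visible ---
git log --all -p | grep -Eo 'AKIA[0-9A-Z]{16}' | sort -u > found-keys.txt
cat found-keys.txt
```

Even though the current tree has no .env, the key surfaces from the 'oops' commit's patch.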
Subdomain permutation → internal admin#
admin.target.com was in scope. Permutation with dnsgen generated admin-legacy.target.com which resolved. It was the pre-migration admin panel, still running, still authenticating against the old LDAP, with a test account nobody had deleted. Full admin. $30K.
Lesson: dev-, -old, -legacy, -v1, -staging, -internal permutations consistently find forgotten infra.
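A dnsgen-style permutation pass is easy to sketch: splice common affixes into each known subdomain's leftmost label. Seed list and affix list below are illustrative; resolve the output with puredns or dnsx before probing:

```shell
#!/bin/sh
# Sketch of dnsgen-style permutation. Seeds and affixes are illustrative;
# candidates must still be resolved (puredns/dnsx) before any probing.
set -eu

printf 'admin.target.com\napi.target.com\n' > seeds.txt   # demo seeds

while read -r host; do
  sub=${host%%.*}      # leftmost label, e.g. "admin"
  rest=${host#*.}      # remainder,      e.g. "target.com"
  for affix in dev old legacy v1 staging internal; do
    echo "$sub-$affix.$rest"
    echo "$affix-$sub.$rest"
    echo "$sub$affix.$rest"
  done
done < seeds.txt | sort -u > permutations.txt

wc -l < permutations.txt
```

With 2 seeds, 6 affixes, and 3 splice forms this emits 36 candidates, including admin-legacy.target.com from the case above.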
19. Quick Reference#
Install the core stack (Go tools)#
# ProjectDiscovery toolkit
go install github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest
go install github.com/projectdiscovery/httpx/cmd/httpx@latest
go install github.com/projectdiscovery/dnsx/cmd/dnsx@latest
go install github.com/projectdiscovery/naabu/v2/cmd/naabu@latest
go install github.com/projectdiscovery/katana/cmd/katana@latest
go install github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest
go install github.com/projectdiscovery/chaos-client/cmd/chaos@latest
go install github.com/projectdiscovery/notify/cmd/notify@latest
# Other Go tools
go install github.com/tomnomnom/assetfinder@latest
go install github.com/tomnomnom/waybackurls@latest
go install github.com/tomnomnom/anew@latest
go install github.com/tomnomnom/gf@latest
go install github.com/tomnomnom/unfurl@latest
go install github.com/hakluke/hakrawler@latest
go install github.com/lc/gau/v2/cmd/gau@latest
go install github.com/ffuf/ffuf/v2@latest
go install github.com/OJ/gobuster/v3@latest
# Rust / Python
cargo install feroxbuster
pip install arjun xnLinkFinder
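After installing, a quick PATH sanity check catches tools whose install silently failed (commonly because $GOPATH/bin or ~/.local/bin is not on PATH). A sketch; the tool list mirrors the install block above:

```shell
#!/bin/sh
# Sketch: verify the recon stack is on PATH after install. Missing tools
# are printed, not fatal, so the whole gap list is visible at once.
for t in subfinder httpx dnsx naabu katana nuclei chaos notify \
         assetfinder waybackurls anew gf unfurl hakrawler gau \
         ffuf gobuster feroxbuster arjun; do
  command -v "$t" >/dev/null 2>&1 || echo "missing: $t"
done
```

command -v is POSIX and works for binaries, scripts, and shell functions alike, unlike which.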
One-liner pipelines#
# Subs → alive → screenshots
subfinder -d target.com -all -silent \
| dnsx -silent \
| httpx -silent -screenshot
# Subs → crawl → JS → endpoints
subfinder -d target.com -all -silent \
| httpx -silent \
| katana -silent -d 3 \
| grep -E "\.js$" \
| xargs -I{} python3 linkfinder.py -i {} -o cli
# Subs → params → SSRF candidates
subfinder -d target.com -all -silent \
| httpx -silent \
| waybackurls \
| gf ssrf > ssrf-candidates.txt
# Every URL ever crawled on every sub
subfinder -d target.com -silent \
| gau --threads 10 --blacklist png,jpg,gif,css,woff \
| sort -u > urls.txt
Tool family tally#
| Family | Canonical tool(s) |
|---|---|
| Subdomain passive | subfinder, amass, assetfinder, chaos, crt.sh |
| Subdomain brute | puredns, shuffledns |
| Permutation | dnsgen, altdns, gotator |
| Resolution | dnsx, massdns |
| HTTP probe | httpx |
| Port scan | naabu, masscan, nmap, rustscan |
| Crawling | katana, hakrawler, gospider |
| Archive | waybackurls, gau |
| JS analysis | LinkFinder, xnLinkFinder, SecretFinder, jsluice |
| Content discovery | ffuf, feroxbuster, gobuster, dirsearch |
| Parameters | arjun, paramspider, x8 |
| Fingerprinting | whatweb, wappalyzer, httpx -td |
| Cloud | cloud_enum, s3scanner, CloudFail |
| GitHub | trufflehog, gitleaks, gitgot, github-subdomains |
| Vuln scan | nuclei |
| Orchestration | bbot, reconftw, recon-ng |
| Notification / diffing | notify, anew |
Recon checklist (per engagement)#
- [ ] Scope captured in a file
- [ ] Seed domains expanded via reverse whois / ASN
- [ ] Passive subdomain enum run (subfinder, amass, chaos, crt.sh)
- [ ] Active subdomain enum run (puredns brute + permutation)
- [ ] All subdomains resolved via dnsx
- [ ] All live hosts probed via httpx with -td
- [ ] Screenshots captured for visual triage
- [ ] Port scan run (naabu top-1000 minimum)
- [ ] Crawl complete (katana + waybackurls + gau)
- [ ] JS files extracted and mined with LinkFinder
- [ ] Secrets scan run on JS
- [ ] Parameter discovery run (arjun) on high-value endpoints
- [ ] gf patterns applied to URL corpus
- [ ] Cloud buckets permuted and tested
- [ ] GitHub org scanned with trufflehog
- [ ] Nuclei baseline scan run
- [ ] Nightly diff pipeline set up
- [ ] All raw outputs archived for future re-use
Closing Notes#
Recon compounds. Every engagement you do adds to your personal corpus — subdomains seen, parameters seen, directory names seen, tech stacks fingerprinted. The hunters at the top of the leaderboards aren’t running secret tools; they’re running the same tools on datasets refined by years of prior hunts. Start the corpus now, automate the nightly diff, and treat every new asset as a fresh mini-engagement.
The bug isn’t in the tool you ran. It’s in the asset you didn’t know existed.