Comprehensive Recon Guide

Complete reconnaissance guide enhanced with 2026 cloud-native techniques including container/serverless discovery, modern API reconnaissance, and automated attack surface mapping. Comprehensive subdomain enumeration and continuous monitoring strategies.

April 10, 2026 · 33 min · Carl Sampson

Table of Contents

1. Fundamentals
2. Scope & Target Profiling
- Scope intake checklist
- Target profiling sources
- Seed expansion
3. Subdomain Enumeration
- Passive tools
- Subfinder in depth
- Amass in depth
- Certificate transparency
- Merging sources
4. DNS Brute Force & Permutation
- puredns
- Permutation (alterations)
- Resolvers
5. Live Host Discovery & HTTP Probing
- dnsx — DNS resolution at scale
- httpx — HTTP/HTTPS probing
6. Port Scanning
- naabu — fast SYN/CONNECT scanner
- masscan — internet-scale SYN scanner
- nmap — deep service/version detection
- rustscan
- Ports worth always scanning
7. URL & Endpoint Crawling
- katana — modern active crawler
- waybackurls — historical URLs
- gau — getallurls (Wayback + CommonCrawl + OTX + URLScan)
- hakrawler
- Merging crawl sources
- gf — pattern classifier
8. JavaScript Analysis
- Extract all JS files
- LinkFinder — endpoint extraction
- xnLinkFinder
- SecretFinder — secret detection in JS
- Manual grep patterns
- JSLuice
9. Content & Directory Discovery
- ffuf — fuzzing swiss army knife
- feroxbuster
- gobuster
- Discovery strategy
10. Parameter Discovery
- Arjun
- ParamSpider
- x8
- High-signal parameter names to always test
11. Technology Fingerprinting
- httpx tech detect
- whatweb
- Wappalyzer CLI
- Custom fingerprinting via headers/favicon
- Shodan & Censys pivots
12. Cloud Asset Discovery
- S3 bucket discovery
- S3Scanner
- cloud_enum
- Azure blob storage
- GCP bucket enumeration
- CloudFail (DNS/database-based origin discovery)
- Bucket wordlists
13. GitHub & Code Leak Hunting
- GitHub dorking
- GitGot (Bishop Fox)
- trufflehog — high-entropy secret scanning
- gitleaks
- github-subdomains
- What to look for
14. ASN & Infrastructure Expansion
- Find the ASN
- Enumerate all IP ranges for an ASN
- Probe everything in the range
- Reverse DNS across the range
- Why this works
15. Container & Serverless Discovery
- Container registry enumeration
- Container image analysis
- Serverless function discovery
- Infrastructure-as-Code analysis
16. Modern API Reconnaissance
- GraphQL introspection
- OpenAPI/Swagger discovery
- gRPC service discovery
- WebSocket discovery
17. ML-Powered Automation
- ML-driven subdomain generation
- AI-powered vulnerability detection
- Continuous intelligence gathering
- Distributed reconnaissance with cloud workers
- Advanced source map analysis
18. Wordlist Resources
- Core wordlist repos
- Key SecLists files
- Assetnote wordlist highlights
- Building your own
19. Automation Pipelines
- bbot (Black Box Operations Tool)
- recon-ng
- ReconFTW
- GarudRecon / subdomainx / Striker / ReconDog
- Custom bash pipeline (minimal example)
- Pipeline principles
20. Continuous Monitoring
- Nightly diff pattern
- What to monitor
- Tools for continuous monitoring
- Notify example
21. Real-World Recon Wins
- JavaScript endpoint → $25K
- ASN expansion → SQL injection bounty
- S3 bucket → 2M user PII leak
- Wayback Machine → auth bypass
- Favicon hash pivot → exposed Jenkins
- GitHub commit history → AWS keys
- Subdomain permutation → internal admin
- 2026 Modern Infrastructure Wins
- GraphQL introspection → admin privilege escalation
- Container registry → source code exposure
- Source map leak → $35K API discovery
- Serverless function enumeration → RCE
- Certificate transparency monitoring → zero-day infrastructure
- ML-powered subdomain generation → forgotten acquisition
22. Quick Reference
- Install the core stack (Go tools)
- One-liner pipelines
- Tool family tally
- Recon checklist (per engagement)
Closing Notes

🆕 Enhanced May 2, 2026 - Updated with cloud-native techniques, container/serverless discovery, modern API reconnaissance, and automated attack surface mapping from comprehensive 2026 research.

A practitioner’s reference for web reconnaissance — attack surface discovery, subdomain enumeration, live host probing, content discovery, JS mining, cloud asset hunting, automation, and continuous monitoring. Enhanced for 2026 with modern cloud infrastructure discovery, ML-powered automation, and API reconnaissance techniques.

1. Fundamentals

Recon is often said to be 80% of offensive security. The researchers who earn six figures aren’t running more tools than everyone else — they’re running them in smarter pipelines, feeding the output of one into the next, and manually reviewing the long tail that automation misses. Every hour spent deepening the asset inventory pays off when hunting begins: more subdomains mean more parameters, more endpoints, more code paths, and more chances for a bug nobody else has seen.

The three classes of recon:

Class	Description	Example
Passive	No packets sent to target — only public data sources	crt.sh, Shodan, Chaos, Wayback, Google dorks
Active	Direct interaction with target infrastructure	DNS brute force, HTTP probing, port scans, content fuzzing
Semi-active	Targets third-parties that hold target data	GitHub scraping, pastebin scraping, archive.org

Passive first, then active. Passive sources give you free intel with zero detection risk and zero scope violations. Active enumeration should only begin after passive has been exhausted — you use passive subdomains as seeds for permutation, passive URLs as seeds for parameter mining, and passive tech stack data to choose the right active wordlist.

The recon pipeline (end-to-end):

Seed domains
  ↓ passive + active enumeration
Subdomains
  ↓ dnsx resolve + httpx probe
Live hosts
  ↓ naabu/masscan port scan
Open services
  ↓ katana + waybackurls + gau crawl
URLs
  ↓ unfurl / gf / LinkFinder
Parameters, endpoints, secrets
  ↓ ffuf / nuclei / manual review
Attack surface map

Mindset rules:

Scope is not a limit — it’s a filter. Map the entire organization passively first, then restrict active testing to what’s in scope.
Everything is resumable. Keep your recon state in files, diff against yesterday, and alert on new assets.
Automation finds the obvious; manual review finds the bounty. Eyeball every new subdomain at least once.
Save raw outputs. The dataset you enumerate today is worth re-running against tomorrow’s wordlists.
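
The "state in files, diff against yesterday" rule can be sketched in a few lines of shell. This is a minimal illustration under stated assumptions, not a full monitoring setup: the function name new_assets and the file names in the usage note are hypothetical.

```shell
# new_assets OLD NEW
# Print every line that appears in NEW but not in OLD (exact, whole-line match).
# OLD and NEW are hypothetical file names, e.g. yesterday's and today's subdomain lists.
new_assets() {
  # sort -u dedupes today's list; grep -Fxv -f drops anything already listed in OLD.
  # "|| true" keeps the exit status at 0 when nothing new was found.
  sort -u "$2" | grep -Fxv -f "$1" || true
}
```

Usage, assuming two saved runs: new_assets subs-yesterday.txt subs-today.txt > new-subs.txt. Anything that lands in new-subs.txt is a candidate for manual review.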

2. Scope & Target Profiling

Before any enumeration, you need to know what you’re looking at and what you’re allowed to touch.

Scope intake checklist

Item	Why it matters
In-scope domains (exact vs wildcard)	Determines which subdomains are eligible
Out-of-scope carveouts	Avoid N/A submissions and bans
Allowed testing types	Active scanning forbidden on many programs
Rate limiting rules	Saves you from getting blocked mid-recon
Accepted vulnerability classes	Don’t hunt bugs that get auto-closed
Third-party service rules	SendGrid, Intercom, Zendesk often out of scope

Target profiling sources

crt.sh — certificate transparency, gives wildcard certs and sibling domains
BGPView / bgp.he.net — find company ASN and all IP ranges owned
SecurityTrails / DNSDumpster — historical DNS records
Whoxy / WhoisXML — reverse WHOIS lookup across TLDs
LinkedIn / Crunchbase — subsidiaries, acquisitions, product names that become subdomains
GitHub org page — gives you the company’s org name which feeds dorking
Trademark filings — sometimes leak internal project codenames

Seed expansion

A single apex domain is rarely the whole story. Before enumeration, expand seeds via:

## Find other domains owned by the same org
amass intel -org "Target Corp"
amass intel -whois -d target.com
amass intel -asn 13335
amass intel -cidr 192.0.2.0/24

## Reverse whois via viewdns.info or whoxy
curl -s "https://api.whoxy.com/?key=$KEY&reverse=whois&email=admin@target.com"

Feed the resulting domain list into the rest of the pipeline as a single flat file (seeds.txt).
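
Seed lists collected from WHOIS, ASN, and org pivots tend to arrive in mixed shapes (URLs, wildcards, trailing dots, mixed case). The sketch below shows one way to flatten them into bare hostnames before writing seeds.txt. The function name normalize_seeds and the raw-seeds.txt file in the usage note are hypothetical, and the final filter assumes plain ASCII TLDs, so punycode TLDs would be dropped.

```shell
# normalize_seeds: read mixed seed lines on stdin, print bare lowercase hostnames.
normalize_seeds() {
  tr -d '\r' \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's#^[a-z][a-z0-9+.-]*://##' \
          -e 's#[/:?].*$##' \
          -e 's#^\*\.##' \
          -e 's#\.$##' \
    | grep -E '^[a-z0-9.-]+\.[a-z]{2,}$' \
    | LC_ALL=C sort -u
  # sed steps, in order: strip scheme, strip path/port/query,
  # strip a leading wildcard label, strip a trailing dot.
}
```

Usage: normalize_seeds < raw-seeds.txt > seeds.txt.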

3. Subdomain Enumeration

Subdomain enumeration is the bedrock of recon. Every additional subdomain is a new host with its own code paths, its own auth model, its own tech stack, and its own chance of being forgotten by the dev team. Treat this phase like an exhaustive search — pull from as many independent sources as possible, dedupe, and re-resolve.

Passive tools

Tool	Strengths	Command
subfinder	Fast, 30+ passive sources, API-key aware	`subfinder -d target.com -all -silent`
amass (passive)	Deepest source coverage, graph storage	`amass enum -passive -d target.com`
assetfinder	Minimal, fast, good for pipelines	`assetfinder --subs-only target.com`
chaos (ProjectDiscovery)	ProjectDiscovery’s curated dataset	`chaos -d target.com -silent`
crt.sh	CT log scraper, finds wildcard certs	`curl -s "https://crt.sh/?q=%25.target.com&output=json"`
github-subdomains	Scrapes subdomains from GitHub code search	`github-subdomains -d target.com -t $TOKEN`
bbot	Swiss army knife, 80+ modules	`bbot -t target.com -f subdomain-enum`

Subfinder in depth

subfinder is the default passive tool — it’s fast, supports API key configuration for premium sources, and outputs one subdomain per line for easy piping.

## Basic
subfinder -d target.com -silent -o subs.txt

## All sources (slower but more thorough)
subfinder -d target.com -all -recursive -silent

## From multiple domains
subfinder -dL seeds.txt -all -silent -o subs.txt

## JSON output for enrichment
subfinder -d target.com -oJ -o subs.json

## Pipe directly into httpx
subfinder -d target.com -silent | httpx -silent

Configure API keys in ~/.config/subfinder/provider-config.yaml — Chaos, SecurityTrails, Censys, Shodan, VirusTotal, GitHub, BinaryEdge, and WhoisXML all materially improve coverage.

Amass in depth

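amass is slower than subfinder but pulls from more sources and supports active enumeration, alteration generation, and graph-based asset tracking. Subcommand availability varies by release (track and viz, for example, were split out of the main binary in v4), so check amass -h against the commands below.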

## Passive
amass enum -passive -d target.com -o amass.txt

## Active (adds DNS brute, zone walks, name alterations)
amass enum -active -d target.com -brute -o amass.txt

## Include unresolvable (internal leak hints)
amass enum -d target.com -include-unresolvable

## Bulk scan multiple domains
amass enum -df seeds.txt -o amass-multi.txt

## Track changes over time
amass track -d target.com -dir recon-data

## Visualize as graph
amass viz -d3 -dir recon-data

## Intel pivots
amass intel -org "Target Corp"
amass intel -asn 13335
amass intel -addr 192.0.2.10
amass intel -cidr 192.0.2.0/24
amass intel -whois -d target.com

Amass supports external datasource modules via config (Netlas, SecurityTrails, Shodan, Censys) — always populate the config for real coverage.

Certificate transparency

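CT logs record nearly every publicly trusted TLS certificate issued since browsers began enforcing CT, so any hostname that has appeared on such a certificate is discoverable. Hosts covered only by a wildcard certificate or a private CA will not show up by name. This is still one of the cleanest passive sources.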

## crt.sh basic
curl -s "https://crt.sh/?q=%25.target.com&output=json" \
  | jq -r '.[].name_value' \
  | sed 's/\*\.//g' \
  | sort -u > crtsh.txt

## Deeper (multi-level) subdomains
curl -s "https://crt.sh/?q=%25.%25.target.com&output=json" | jq -r '.[].name_value'

## Find email addresses in CT
curl -s "https://crt.sh/?q=%25@target.com&output=json" | jq -r '.[].name_value'

## Alternative: Censys, Facebook CT, Google CT API

Merging sources

Always run multiple tools and merge — no single source has full coverage.

{
  subfinder -d target.com -all -silent
  assetfinder --subs-only target.com
  amass enum -passive -d target.com
  chaos -d target.com -silent
  curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g'
} | sort -u > all-subs.txt
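
Merged output usually needs one more pass: certificate SAN lists and passive datasets can return names outside the target apex, plus wildcard entries and case variants. A minimal sketch of that cleanup follows; the function name in_scope is hypothetical, and it assumes a single apex domain per call.

```shell
# in_scope APEX: read hostnames on stdin, keep only APEX and its subdomains.
in_scope() {
  # Escape dots so "target.com" cannot match "targetXcom".
  apex=$(printf '%s' "$1" | sed 's/[.]/\\./g')
  tr '[:upper:]' '[:lower:]' \
    | sed -e 's/^\*\.//' -e 's/\.$//' \
    | grep -E "(^|\.)${apex}\$" \
    | LC_ALL=C sort -u
}
```

Usage: in_scope target.com < all-subs.txt > scoped-subs.txt. The (^|\.) anchor is what keeps look-alikes such as nottarget.com out of the result.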

4. DNS Brute Force & Permutation

Passive sources miss internal-only subdomains and anything that never got a public cert. Brute force and permutation fill the gap.

puredns

puredns wraps massdns with accurate wildcard filtering — the gold standard for brute forcing.

## Brute force with a wordlist
puredns bruteforce wordlist.txt target.com \
  --resolvers resolvers.txt \
  --write results.txt

## Resolve a list of guessed names
puredns resolve candidates.txt --resolvers resolvers.txt

## Resolvers file: public-dns.info/nameservers-all.txt

Permutation (alterations)

Given known subdomains, generate plausible variants and resolve them. Frequently surfaces dev-api, staging-portal, api-v2 variants that nobody lists publicly.

## dnsgen — pattern-based mutations
cat subs.txt | dnsgen - | puredns resolve -r resolvers.txt

## altdns
altdns -i subs.txt -o altdns.txt -w words.txt -r -s results.txt

## gotator — pattern permutator
gotator -sub subs.txt -perm words.txt -depth 1 -numbers 5 | puredns resolve

Good permutation wordlists: best-dns-wordlist.txt from Assetnote, dnsgen built-ins, and a custom list of your target’s product names.
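
To make the idea concrete, here is a deliberately small permutation generator. It is an illustrative sketch only and does not reproduce how dnsgen, altdns, or gotator work internally; the function name permute_subs and the three patterns it emits are assumptions chosen for the example.

```shell
# permute_subs SUBS_FILE WORDS_FILE
# For each known subdomain and each word, emit word-label, label-word and word.label variants.
permute_subs() {
  while IFS= read -r sub; do
    label=${sub%%.*}                      # leftmost label, e.g. "api"
    rest=${sub#*.}                        # remainder, e.g. "target.com"
    [ "$label" != "$sub" ] || continue    # skip blank lines and names without a dot
    while IFS= read -r w; do
      [ -n "$w" ] || continue
      printf '%s\n' "$w-$label.$rest" "$label-$w.$rest" "$w.$sub"
    done < "$2"
  done < "$1" | LC_ALL=C sort -u
}
```

Usage: permute_subs subs.txt words.txt | puredns resolve -r resolvers.txt. Output size is roughly three candidates per subdomain per word, so keep the word list short and target-specific.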

Resolvers

Always use a curated, validated resolver list. Bad resolvers cause false positives.

## Fetch and validate resolvers
wget https://raw.githubusercontent.com/trickest/resolvers/main/resolvers.txt
dnsvalidator -tL resolvers.txt -threads 200 -o validated.txt

5. Live Host Discovery & HTTP Probing

A subdomain list is useless until you know which hosts are live and what they serve.

dnsx — DNS resolution at scale

## Resolve and drop dead entries
cat subs.txt | dnsx -silent -a -resp > resolved.txt

## Only return live subdomains
dnsx -l subs.txt -silent > live-dns.txt

## Wildcard detection
dnsx -l subs.txt -wd target.com -silent

## Retrieve CNAME chain (finds takeover candidates)
dnsx -l subs.txt -cname -resp -silent

## Grab multiple record types
dnsx -l subs.txt -a -aaaa -cname -mx -ns -txt -silent -json

httpx — HTTP/HTTPS probing

httpx is the bridge between DNS and HTTP-layer recon — it probes, fingerprints, and grabs metadata in a single pass.

## Basic alive check
cat live-dns.txt | httpx -silent > alive.txt

## Rich metadata (title, status, tech, IP, CDN)
httpx -l live-dns.txt -title -sc -td -ip -cdn -server -silent -o probe.txt

## Multiple ports
httpx -l live-dns.txt -p 80,443,8080,8443,8000,3000,5000,9000 -silent

## Screenshot every live host
httpx -l live-dns.txt -screenshot -silent

## Filter by status code
httpx -l live-dns.txt -mc 200,301,302,401,403 -silent

## Show content type and keep JSON responses
httpx -l live-dns.txt -ct -silent | grep -i "application/json"

## Tech detection (Wappalyzer dataset)
httpx -l live-dns.txt -td -silent -json | jq '.tech'

## Favicon hash for clustering (mmh3)
httpx -l live-dns.txt -favicon -silent

The favicon hash is particularly useful — Shodan indexes favicon hashes, so if httpx returns a hash you can pivot in Shodan to find every other host on the internet serving the same application.

6. Port Scanning

Web apps on port 80/443 are the obvious targets, but devs love to run admin panels, debug dashboards, and internal APIs on unusual ports.

naabu — fast SYN/CONNECT scanner

## Top 1000 ports (scan every resolved host, not just the ones that answered on 80/443)
naabu -l live-dns.txt -top-ports 1000 -silent

## Full range
naabu -host target.com -p - -rate 5000

## Pipe into httpx
naabu -l live-dns.txt -top-ports 1000 -silent | httpx -silent

masscan — internet-scale SYN scanner

## Full port sweep on a CIDR
sudo masscan -p1-65535 192.0.2.0/24 --rate=10000 -oG masscan.out

## Specific high-value ports across large ranges
sudo masscan -p22,80,443,3306,5432,6379,8080,8443,9200,27017 \
  198.18.0.0/16 --rate=20000

nmap — deep service/version detection

After masscan/naabu give you the open ports, use nmap for service detection.

## Service version + default scripts on discovered ports
nmap -sV -sC -p 22,80,443,8080 -iL hosts.txt -oA nmap-scan

## Top ports with aggressive OS detection
nmap -A -T4 --top-ports 1000 -iL hosts.txt

## Vulnerability scripts
nmap -sV --script vuln -iL hosts.txt

rustscan

rustscan is a fast TCP connect scanner that hands the open ports it finds to nmap for service detection, which makes it a good fit for single-target deep dives.

rustscan -a target.com --ulimit 5000 -- -sV -sC

Ports worth always scanning

22    SSH
80    HTTP
443   HTTPS
2375  Docker API (unauthenticated)
3000  Grafana, Node dev
3306  MySQL
3389  RDP
5000  Flask dev
5432  Postgres
5601  Kibana
6379  Redis
7001  WebLogic
8000  HTTP alt
8080  HTTP alt / Jenkins
8443  HTTPS alt
8888  Jupyter
9000  SonarQube / PHP-FPM
9090  Prometheus
9200  Elasticsearch
11211 Memcached
15672 RabbitMQ management
27017 MongoDB
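
If the list above is kept in a file, it can be turned straight into a -p argument for naabu or masscan. A small sketch, assuming the "port, whitespace, label" layout shown above; the function name ports_csv and the file name ports.txt are hypothetical.

```shell
# ports_csv [FILE...]: join the first column of a "PORT  label" list into a comma-separated string.
ports_csv() {
  # Only lines whose first field is purely numeric are used; comments and blanks are ignored.
  awk '$1 ~ /^[0-9]+$/ { printf "%s%s", sep, $1; sep = "," } END { print "" }' "$@"
}
```

Usage: naabu -l live-dns.txt -p "$(ports_csv ports.txt)" -silent.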

7. URL & Endpoint Crawling

URLs are where the vulnerabilities live. You want three sources running in parallel: a passive archive (Wayback / Common Crawl), an active crawler (katana), and a URL extractor from JS.

katana — modern active crawler

## Standard crawl
katana -u https://target.com -d 5 -silent

## From a list
katana -list alive.txt -d 3 -silent -o urls.txt

## Headless with JS rendering
katana -u https://target.com -headless -system-chrome -silent

## Ignore query-parameter variants of the same path; keep the crawl scoped to the root domain
katana -u https://target.com -iqp -fs rdn -silent

## Parse JS files for endpoints and write JSONL output
katana -u https://target.com -jc -jsonl -silent -o urls.json

waybackurls — historical URLs

## Basic
echo target.com | waybackurls > wayback.txt

## From all subdomains
cat subs.txt | waybackurls | sort -u > wayback.txt

gau — getallurls (Wayback + CommonCrawl + OTX + URLScan)

echo target.com | gau --threads 10 > gau.txt

## Filter by extension
echo target.com | gau --blacklist png,jpg,gif,css,woff | sort -u

hakrawler

echo https://target.com | hakrawler -d 3 -subs

Merging crawl sources

{
  cat alive.txt | katana -silent -d 3
  cat subs.txt | waybackurls
  cat subs.txt | gau --threads 5
} | sort -u > all-urls.txt

## Filter to interesting URLs
cat all-urls.txt | grep -E "\.(json|xml|js|php|aspx|jsp|env|bak|config|sql)(\?|$)"
cat all-urls.txt | grep -E "api/|admin|internal|debug|swagger|graphql"
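
Archive sources return thousands of URLs that differ only in parameter values (?id=1, ?id=2, and so on). One way to shrink the list before review is to keep a single URL per "shape", meaning path plus ordered parameter names. The sketch below is an assumption-laden illustration: dedupe_by_shape is a hypothetical name, and two URLs with the same parameters in a different order are treated as different shapes.

```shell
# dedupe_by_shape: read URLs on stdin, print the first URL seen for each path + parameter-name set.
dedupe_by_shape() {
  awk '
    {
      url = $0
      sub(/#.*/, "", url)                  # ignore fragments
      n = split(url, parts, "?")
      key = parts[1]                       # scheme + host + path
      if (n > 1) {
        m = split(parts[2], kv, "&")
        for (i = 1; i <= m; i++) {
          split(kv[i], pair, "=")
          key = key "|" pair[1]            # append parameter name, drop its value
        }
      }
      if (!(key in seen)) { seen[key] = 1; print $0 }
    }'
}
```

Usage: dedupe_by_shape < all-urls.txt > unique-shapes.txt.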

gf — pattern classifier

gf applies named regex patterns to URL lists to quickly bucket potential bug candidates.

cat all-urls.txt | gf ssrf     > ssrf-candidates.txt
cat all-urls.txt | gf xss      > xss-candidates.txt
cat all-urls.txt | gf sqli     > sqli-candidates.txt
cat all-urls.txt | gf redirect > redirect-candidates.txt
cat all-urls.txt | gf lfi      > lfi-candidates.txt
cat all-urls.txt | gf idor     > idor-candidates.txt

8. JavaScript Analysis

Modern apps hide half their API surface inside JavaScript bundles. Every JS file is a map of internal endpoints, buried parameters, legacy routes, hardcoded tokens, and feature flags.

Extract all JS files

## From Wayback
echo target.com | waybackurls | grep -Ei "\.js(\?|$)" | sort -u > js.txt

## From katana
katana -u https://target.com -silent | grep -Ei "\.js(\?|$)" | sort -u >> js.txt

## Verify alive
cat js.txt | httpx -mc 200 -silent > live-js.txt

LinkFinder — endpoint extraction

## Single file
python3 linkfinder.py -i https://target.com/app.js -o cli

## Batch
while read url; do
  python3 linkfinder.py -i "$url" -o cli
done < live-js.txt | sort -u > endpoints.txt

xnLinkFinder

xnLinkFinder is a more capable take on the LinkFinder idea: it handles minified bundles, recursive crawls, and depth-based discovery.

xnLinkFinder -i live-js.txt -sf target.com -d 3 -o endpoints.txt

SecretFinder — secret detection in JS

python3 SecretFinder.py -i https://target.com/app.js -o cli

while read url; do
  python3 SecretFinder.py -i "$url" -o cli 2>/dev/null
done < live-js.txt | tee secrets.txt

Manual grep patterns

Automation misses clever obfuscation. Fetch the JS and grep for the classics:

curl -s https://target.com/app.js | grep -oE \
  "(api[_-]?key|apikey|secret|token|password|passwd|bearer|aws_access|private_key)[\"':= ]+[a-zA-Z0-9/+=_-]{16,}"

## Hardcoded IPs and internal hosts
curl -s https://target.com/app.js | grep -oE "(https?://)?[a-z0-9.-]*\.(internal|corp|local|intranet)"

## Feature flags
curl -s https://target.com/app.js | grep -oE '"[a-z_]*_(enabled|flag|debug)"'

## Route definitions
curl -s https://target.com/app.js | grep -oE '"/api/[a-zA-Z0-9/_-]+"'
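
Because raw outputs are worth re-running later, the same kind of pattern can be applied to bundles saved on disk instead of re-fetching them. A minimal sketch: js_routes is a hypothetical helper, it matches both quote styles, and it assumes routes start with /api/ or a /v<number>/ prefix. Template literals in backticks are not covered.

```shell
# js_routes FILE...: list quoted route strings that start with /api/ or /v<N>/ in saved JS files.
js_routes() {
  grep -ohE "[\"']/(api|v[0-9]+)/[a-zA-Z0-9/_-]+[\"']" "$@" \
    | tr -d "\"'" \
    | LC_ALL=C sort -u
}
```

Usage, assuming the bundles were saved under a js/ directory: js_routes js/*.js > routes.txt.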

JSLuice

JSLuice is a newer tool that parses JS with a proper AST rather than regex, catching dynamic route construction that regex-based tools miss.

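## Save each bundle to a temp file and pass the path to jsluice
while read -r url; do
  tmp=$(mktemp) && curl -s "$url" -o "$tmp" && jsluice urls "$tmp"
  rm -f "$tmp"
done < live-js.txt | sort -u > jsluice-urls.txt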

9. Content & Directory Discovery

Directory brute forcing finds the files that aren’t linked anywhere — backups, configs, admin pages, staging artifacts, .git directories, and the /old/ folder devs promised to delete.

ffuf — fuzzing swiss army knife

## Basic directory fuzz
ffuf -u https://target.com/FUZZ -w wordlist.txt -mc 200,301,302,403 -o ffuf.json

## With extensions
ffuf -u https://target.com/FUZZ -w raft-medium-words.txt \
  -e .php,.bak,.old,.zip,.tar.gz,.env,.config \
  -mc 200,301,302,403

## Virtual host fuzzing
ffuf -u https://target.com -H "Host: FUZZ.target.com" -w subs.txt -fs 1234

## Recursive
ffuf -u https://target.com/FUZZ -w wordlist.txt -recursion -recursion-depth 2

## Parameter fuzzing
ffuf -u https://target.com/api?FUZZ=test -w params.txt -fs 0

## Rate limited
ffuf -u https://target.com/FUZZ -w wordlist.txt -rate 50

feroxbuster

Recursive content discoverer with smart filtering — great for deep dives.

feroxbuster -u https://target.com -w wordlist.txt -x php,html,txt,bak -d 3

## From a list of targets
feroxbuster --stdin -w wordlist.txt < alive.txt

## Filter by response size
feroxbuster -u https://target.com -w raft-large.txt -S 0,1234

gobuster

gobuster dir -u https://target.com -w wordlist.txt -x php,html,txt
gobuster vhost -u https://target.com -w subs.txt
gobuster dns -d target.com -w dns-wordlist.txt

Discovery strategy

Start small — run raft-small-words.txt before raft-large. Calibrates response baselines.
Chain extensions — if you get hits at /backup, re-fuzz with backup-specific extensions.
Recurse manually — only recurse into directories that return 200/301, not 403/404.
Auto-calibrate — always use -ac in ffuf or filter by response length to dodge soft-404s.
Filter by word count — -fw in ffuf drops responses with a given word count, which is often a tighter filter than length.

10. Parameter Discovery

Hidden parameters are one of the highest-ROI recon findings — they unlock debug modes, admin toggles, SSRF sinks, and auth bypass.

Arjun

## Basic
arjun -u https://target.com/api/users

## With wordlist
arjun -u https://target.com/api/users -w params.txt

## From URL list
arjun -i urls.txt

## Method (GET, POST, JSON, or XML; one per run)
arjun -u https://target.com/api -m POST

## Output
arjun -u https://target.com -oJ arjun.json

ParamSpider

python3 paramspider.py -d target.com -o params.txt

x8

x8 is a fast parameter miner written in Rust; supply the parameter wordlist with -w.

x8 -u https://target.com/api -w params.txt

High-signal parameter names to always test

debug, test, admin, internal, trace, verbose
callback, jsonp, returnUrl, redirect, next, url, uri, path, file
id, userId, user_id, account, tenant, org, orgId
template, view, include, load, src, source, dest, target, action, cmd
token, auth, api_key, apikey, access_token, sso
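
A generic list like the one above works best when combined with names the target already uses. The sketch below ranks parameter names found in crawled URLs so the most common ones can seed a custom wordlist. The function name param_names is hypothetical, and the pattern assumes names made of letters, digits, underscore, dot, and hyphen.

```shell
# param_names: read URLs on stdin, print "count name" for each query parameter, most frequent first.
param_names() {
  grep -oE '[?&][A-Za-z0-9_.-]+=' \
    | tr -d '?&=' \
    | LC_ALL=C sort \
    | uniq -c \
    | LC_ALL=C sort -k1,1nr -k2,2 \
    | awk '{ print $1, $2 }'
}
```

Usage: param_names < all-urls.txt | awk '{ print $2 }' > target-params.txt, then pass target-params.txt to arjun or x8 with -w.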

11. Technology Fingerprinting

Knowing the stack narrows your attack surface — CVEs, default paths, framework-specific tricks, and auth bypass tricks all depend on what’s running.

httpx tech detect

httpx -l alive.txt -td -server -title -sc -silent -json | jq '.tech'

whatweb

whatweb -a 3 https://target.com
whatweb -i alive.txt --log-json=whatweb.json

Wappalyzer CLI

wappalyzer https://target.com

Custom fingerprinting via headers/favicon

Indicator	What it reveals
`X-Powered-By`	Framework (Express, PHP version, ASP.NET)
`Server`	Web server (nginx version, Apache, IIS)
`Set-Cookie` names	`PHPSESSID`, `JSESSIONID`, `XSRF-TOKEN`, `laravel_session`
`X-Generator`	CMS (Drupal, WordPress, TYPO3)
Favicon mmh3 hash	Pivot across Shodan to find every similar deploy
`/robots.txt`	Exposed paths, site generator hints
`/sitemap.xml`	Content structure
`/.well-known/security.txt`	Security contact and disclosure policy
Error page fingerprints	Stack traces leak versions

Shodan & Censys pivots

After fingerprinting, pivot via Shodan to find every host on the internet running the same application.

## Shodan by favicon hash
shodan search "http.favicon.hash:-1234567890"

## By HTTP title
shodan search 'http.title:"Jenkins"'

## By SSL cert subject
shodan search 'ssl.cert.subject.cn:target.com'

## Censys
censys search 'services.tls.certificates.leaf_data.subject.common_name: target.com'

12. Cloud Asset Discovery

Cloud storage misconfigurations remain one of the fastest paths to a critical-severity finding — exposed S3 buckets, readable Azure blobs, and world-writable GCS objects are still common.

S3 bucket discovery

## Generate permutations
cat <<EOF > bucket-perms.txt
target
target-dev
target-prod
target-staging
target-backup
target-backups
target-assets
target-media
target-uploads
target-logs
target-data
target-db
target-internal
target-private
target-public
target-test
target-qa
EOF

## Test each
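while read -r b; do
  out=$(aws s3 ls "s3://$b" --no-sign-request 2>&1)
  ## Skip names that do not exist; print the bucket name with whatever came back
  echo "$out" | grep -qE "NoSuchBucket|AllAccessDisabled" || printf '== %s\n%s\n' "$b" "$out"
done < bucket-perms.txt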
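
The hand-written list above can also be generated from a keyword and a suffix list, which makes it easy to repeat for product names and acquisitions. A minimal sketch: bucket_perms is a hypothetical name and the four join patterns are assumptions, not an exhaustive set. S3 bucket names allow lowercase letters, digits, hyphens, and dots.

```shell
# bucket_perms KEYWORD: read suffixes on stdin, print candidate bucket names for KEYWORD.
bucket_perms() {
  {
    printf '%s\n' "$1"                     # the bare keyword is a candidate too
    while IFS= read -r s; do
      [ -n "$s" ] || continue
      printf '%s\n' "$1-$s" "$s-$1" "$1.$s" "$1$s"
    done
  } | LC_ALL=C sort -u
}
```

Usage: printf 'dev\nprod\nbackup\n' | bucket_perms target > bucket-perms.txt.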

S3Scanner

s3scanner scan --bucket-file bucket-perms.txt
s3scanner scan --bucket target-backups --dump

cloud_enum

Covers S3, Azure, and GCS in one pass.

python3 cloud_enum.py -k target -k targetcorp -k target-internal

Azure blob storage

## Azure blob URL pattern
https://<storage-account>.blob.core.windows.net/<container>/<blob>

## Enumerate containers
curl -s "https://target.blob.core.windows.net/?comp=list" | xmllint --format -

## List blobs in a container
curl -s "https://target.blob.core.windows.net/container?restype=container&comp=list"

GCP bucket enumeration

## Public read check
curl -s "https://storage.googleapis.com/storage/v1/b/target-bucket" | jq .

## List objects
curl -s "https://storage.googleapis.com/storage/v1/b/target-bucket/o" | jq .

CloudFail (DNS/database-based origin discovery)

CloudFail pulls old DNS records and database leaks to get around Cloudflare and find the real origin IP of a proxied site.

python3 cloudfail.py -t target.com

Bucket wordlists

Assetnote wordlists: wordlists.assetnote.io — cloud-s3-bucket-names.txt
SecLists: Discovery/Cloud/

13. GitHub & Code Leak Hunting

GitHub is a minefield of hardcoded credentials. The org’s public repos are the obvious place, but the real gold is in employee personal repos and in deleted-but-not-purged commits.

GitHub dorking

org:target password
org:target api_key
org:target aws_access_key_id
org:target BEGIN RSA
org:target smtp
org:target "internal-api"
org:target filename:.env
org:target filename:config
org:target extension:sql
org:target extension:pem

GitGot (Bishop Fox)

Semi-automated, feedback-driven code search — suppresses already-reviewed hits so you focus on new matches.

gitgot -q target.com
gitgot -q "target api_key" -o gitgot-results.json

trufflehog — high-entropy secret scanning

## Scan a repo
trufflehog git https://github.com/target/repo

## Scan an org
trufflehog github --org=target --token=$GITHUB_TOKEN

## Verified secrets only (far fewer false positives)
trufflehog github --org=target --only-verified

gitleaks

gitleaks detect --source . --report-format json --report-path gitleaks.json

## gitleaks scans local checkouts, so clone remote repos first
git clone https://github.com/target/repo && gitleaks detect --source repo

github-subdomains

Scrapes GitHub code for mentions of subdomains — a passive subdomain source most hunters skip.

github-subdomains -d target.com -t $GITHUB_TOKEN -o gh-subs.txt

What to look for

.env files (check for AWS, Stripe, Twilio, SendGrid keys)
config.yml / config.json / settings.py with DB connection strings
Dockerfile with ARG values hardcoding secrets
CI/CD YAML files (GitHub Actions, CircleCI) with plaintext tokens
Private keys in commit history
References to internal hostnames (*.corp.target.com)
Historical commits — secrets are often “fixed” in a later commit but still in history
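
The last point can be checked by hand on any cloned repo: dump the full patch history and grep it. The sketch below looks only for AWS access key IDs, which start with AKIA (long-term keys) or ASIA (temporary keys) followed by 16 uppercase letters or digits. The function name aws_key_ids is hypothetical, and a match is only a lead: the key ID alone is not a credential.

```shell
# aws_key_ids: read any text on stdin, print unique strings shaped like AWS access key IDs.
aws_key_ids() {
  grep -oE '(AKIA|ASIA)[0-9A-Z]{16}' | LC_ALL=C sort -u
}
```

Usage on a local clone: git -C repo log -p --all | aws_key_ids. Because git log -p --all walks every reachable commit, keys removed in a later commit still show up.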

14. ASN & Infrastructure Expansion

Most hunters stop at the subdomain list. Elite hunters expand into IP space and find the forgotten servers that nobody maps to DNS anymore.

Find the ASN

## From a known IP
whois -h whois.cymru.com 192.0.2.10

## Via bgpview
curl -s "https://api.bgpview.io/ip/192.0.2.10" | jq .

## Interactive: bgp.he.net

Enumerate all IP ranges for an ASN

## bgpview
curl -s "https://api.bgpview.io/asn/AS13335/prefixes" \
  | jq -r '.data.ipv4_prefixes[].prefix' > asn-ranges.txt

## amass intel
amass intel -asn 13335 > asn-hosts.txt

Probe everything in the range

## Port scan the range
sudo masscan -iL asn-ranges.txt -p80,443,8080,8443 --rate=10000 -oG masscan.out

## HTTP probe
awk '/open/{print $4}' masscan.out | httpx -silent -title -sc -ip
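
Printing only the IP column discards the port, so httpx falls back to its default ports. A variant that keeps host and port together is sketched below. It assumes the grepable layout "Host: <ip> () Ports: <port>/open/..." and the function name masscan_hostports is hypothetical; check a few lines of your own masscan.out before relying on it.

```shell
# masscan_hostports [FILE...]: turn masscan -oG lines into ip:port pairs.
masscan_hostports() {
  awk '
    /open/ {
      ip = ""; port = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Host:")  ip = $(i + 1)
        if ($i == "Ports:") { split($(i + 1), p, "/"); port = p[1] }
      }
      if (ip != "" && port != "") print ip ":" port
    }' "$@"
}
```

Usage: masscan_hostports masscan.out | httpx -silent -title -sc -ip.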

Reverse DNS across the range

## PTR lookups across a CIDR
prips 192.0.2.0/24 | dnsx -ptr -resp-only

Why this works

Corporate infra expansion happens faster than DNS hygiene. You’ll regularly find:

Acquired companies whose old infra still runs on the parent’s ASN
Staging servers assigned IPs but never given DNS
Legacy admin panels on forgotten boxes
Dev environments behind no auth, exposed via direct IP

15. Container & Serverless Discovery

Modern cloud infrastructure heavily relies on containers and serverless functions, creating new attack surfaces that traditional reconnaissance misses. These services often expose APIs, internal naming conventions, and forgotten development environments.

Container registry enumeration

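Container registries frequently expose public repositories containing internal applications and their configurations. The Docker Hub and registry catalog requests below work anonymously against public registries; the aws, gcloud, and az commands run with whatever credentials your CLI is configured with, so they only help when you hold credentials that are in scope.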

## Docker Hub public repositories
curl -s "https://hub.docker.com/v2/repositories/target/?page_size=100" | jq '.results[].name'

## Amazon ECR public gallery
aws ecr-public describe-repositories --region us-east-1 --output table

## Google Container Registry
gcloud container images list --repository=gcr.io/target-project

## Azure Container Registry
az acr repository list --name targetregistry --output table

## Custom registry enumeration
curl -s https://registry.target.com/v2/_catalog | jq '.repositories'

Container image analysis

## Pull and analyze container images
docker pull target/app:latest
docker history target/app:latest

## Extract filesystem without running
docker create --name temp target/app:latest
docker export temp | tar -tv | grep -E "\.env|config|secret"
docker rm temp

## Dive - analyze image layers
dive target/app:latest

## Trivy - vulnerability and secret scanning
trivy image target/app:latest

Serverless function discovery

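AWS Lambda functions, Azure Functions, and Google Cloud Functions often have predictable naming patterns and may be publicly accessible. The list commands below query the account your CLI is authenticated to, so treat them as post-credential enumeration rather than anonymous discovery.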

## AWS Lambda function enumeration
aws lambda list-functions --region us-east-1 --output table

## Generate Lambda function name permutations
cat <<EOF > lambda-names.txt
target-api
target-webhook
target-auth
target-processor
target-dev
target-staging
target-prod
EOF

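## Function URLs use a generated ID (https://<url-id>.lambda-url.<region>.on.aws/), not the
## function name, so resolve each name to its URL (requires credentials for the account)
while read -r name; do
  aws lambda get-function-url-config --function-name "$name" --region us-east-1 \
    --query FunctionUrl --output text 2>/dev/null
done < lambda-names.txt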

## Azure Function Apps
az functionapp list --output table

## Google Cloud Functions
gcloud functions list --region=us-central1

## API Gateway discovery for serverless backends
aws apigateway get-rest-apis --region us-east-1

Infrastructure-as-Code analysis

## GitHub repository search for IaC files
gh search code --owner=target "filename:terraform" OR "filename:cloudformation"
gh search code --owner=target "resource \"aws_" OR "resource \"google_" OR "resource \"azurerm_"

## Extract resource names from Terraform
grep -r "resource \"" . | grep -E "(aws_|google_|azurerm_)" | cut -d'"' -f4

## CloudFormation template analysis
aws cloudformation list-stacks --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE

## Kubernetes manifest discovery
kubectl get all --all-namespaces
kubectl get ingress --all-namespaces
kubectl describe configmaps --all-namespaces | grep -A5 -B5 "target"
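
The Terraform extraction above prints only the local resource name. A minimal sketch that keeps the resource type next to the name is shown here; `tf_resources` is a hypothetical helper name, and the regex assumes the conventional one-line `resource "TYPE" "NAME" {` layout.

```shell
# Sketch: print "type<TAB>name" for every Terraform resource block under a directory.
# tf_resources is a hypothetical helper, not part of Terraform.
tf_resources() {
  local dir="${1:-.}"
  # Match lines such as: resource "aws_s3_bucket" "backups" {
  grep -rhE --include='*.tf' \
    '^[[:space:]]*resource[[:space:]]+"[^"]+"[[:space:]]+"[^"]+"' "$dir" \
    | awk -F'"' '{ print $2 "\t" $4 }' \
    | sort -u
}
```

Running `tf_resources ./cloned-repo | grep aws_s3_bucket` then gives a quick list of bucket resources to feed into cloud asset checks.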

16. Modern API Reconnaissance

API discovery has evolved beyond simple REST endpoints. Modern applications use GraphQL, gRPC, WebSockets, and complex API gateways that require specialized reconnaissance techniques.

GraphQL introspection

GraphQL introspection remains enabled in 70% of production environments, providing complete schema visibility.

## Basic introspection query
curl -X POST https://target.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ __schema { types { name description } } }"}'

## Get all queries and mutations
curl -X POST https://target.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ __schema { queryType { fields { name description args { name type { name } } } } } }"}'

## GraphQL Voyager for schema visualization
## Visit: https://ivangoncharov.github.io/graphql-voyager/
## Or use GraphiQL: https://github.com/graphql/graphiql

## Batch introspection with graphql-introspect
npm install -g graphql-introspect
graphql-introspect https://target.com/graphql > schema.json

## Common GraphQL endpoints
/graphql
/graphiql
/api/graphql
/v1/graphql
/api/v1/graphql
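
As a rough sketch, the endpoint list and the introspection check can be combined into two small helpers. `graphql_candidates` and `introspection_enabled` are hypothetical names, and the check only assumes that a successful introspection response contains a `"__schema"` object.

```shell
# Common GraphQL paths, taken from the list above.
GRAPHQL_PATHS="/graphql /graphiql /api/graphql /v1/graphql /api/v1/graphql"

# Read base URLs on stdin and print one candidate URL per base and path.
graphql_candidates() {
  local base p
  while IFS= read -r base; do
    [ -n "$base" ] || continue
    base=${base%/}                      # drop a trailing slash
    for p in $GRAPHQL_PATHS; do
      printf '%s%s\n' "$base" "$p"
    done
  done
}

# Succeed if the response body on stdin contains a "__schema" object,
# i.e. it looks like a real introspection result rather than an error.
introspection_enabled() {
  grep -q '"__schema"[[:space:]]*:[[:space:]]*{'
}
```

A typical use is to pipe `alive.txt` through `graphql_candidates`, POST the introspection query to each candidate with curl, and keep the URLs whose response passes `introspection_enabled`.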

OpenAPI/Swagger discovery

## Common Swagger/OpenAPI paths
curl -s https://target.com/swagger.json
curl -s https://target.com/api-docs
curl -s https://target.com/openapi.json
curl -s https://target.com/docs/swagger.json
curl -s https://target.com/api/v1/swagger.json
curl -s https://target.com/swagger-ui/
curl -s https://target.com/redoc/

## Extract endpoints from OpenAPI spec
curl -s https://target.com/openapi.json | jq '.paths | keys[]'

## Swagger Codegen to generate client libraries
swagger-codegen generate -i https://target.com/swagger.json -l html2 -o swagger-docs

## openapi-directory search for known APIs
git clone https://github.com/APIs-guru/openapi-directory
grep -r "target.com" openapi-directory/

gRPC service discovery

## gRPC reflection (if enabled)
grpcurl -plaintext target.com:50051 list
grpcurl -plaintext target.com:50051 list target.Service
grpcurl -plaintext target.com:50051 describe target.Service.Method

## gRPC-Web detection
curl -s https://target.com/api/grpc -H "Content-Type: application/grpc-web-text"

## Extract protobuf definitions
protoc --decode_raw < binary_message.bin

## Common gRPC ports
50051, 9090, 8080, 443 (gRPC-Web)

WebSocket discovery

## WebSocket endpoint discovery (the URL is an argument; stdin is the message to send)
echo "ping" | websocat -n1 ws://target.com:8080/ws
echo "ping" | websocat -n1 wss://target.com/websocket

## Common WebSocket paths
/ws
/websocket
/socket.io/
/sockjs-node/
/live
/updates
/notifications

## Extract WebSocket endpoints from JavaScript
grep -r "WebSocket\|socket\.io" js-files/ | grep -oE "wss?://[^\"']+"
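
A slightly more defensive version of the grep above, as a sketch: it deduplicates results and stops at backticks, whitespace, and closing parentheses as well as quotes. `extract_ws_urls` is a hypothetical helper name and `js-files` is the same directory assumed above.

```shell
# Print unique ws:// and wss:// URLs found in files under a directory.
# extract_ws_urls is a hypothetical helper name.
extract_ws_urls() {
  local dir="${1:-js-files}"
  grep -rhoE "wss?://[^\"'\` )]+" "$dir" | sort -u
}
```

Template literals such as `wss://${host}/ws` come through verbatim, so expect to substitute known hostnames by hand.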

17. ML-Powered Automation

Machine learning and AI are transforming reconnaissance by identifying patterns, generating intelligent wordlists, and reducing false positives. Modern reconnaissance leverages these capabilities for enhanced discovery.

ML-driven subdomain generation

## Pattern-based subdomain generation using neural networks
## This represents the concept - actual implementation varies
python3 ml_subdomain_generator.py --domain target.com --patterns discovered_subdomains.txt

## GPT-based subdomain suggestions (conceptual)
## Extract patterns from known subdomains and generate likely candidates
cat known_subs.txt | python3 gpt_subdomain_suggester.py --model gpt-4

## Anomaly detection for interesting subdomains
python3 subdomain_anomaly_detector.py --input all_subs.txt --threshold 0.8

AI-powered vulnerability detection

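## Nuclei v3 with automatic, technology-aware template selection
nuclei -l targets.txt -as -severity low,medium,high,critical

## False positive reduction (conceptual): nuclei has no built-in ML filter, so score JSONL findings afterwards
## finding_scorer.py is a placeholder for your own model
nuclei -l targets.txt -t custom-templates/ -jsonl -o findings.jsonl
python3 finding_scorer.py --input findings.jsonl --min-confidence 0.7

## Context-aware payloads (conceptual): generate templates per application, then run them like any other
nuclei -l targets.txt -t generated-templates/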

Continuous intelligence gathering

## Real-time certificate transparency monitoring
certstream-monitor --domain target.com --webhook https://your-webhook.com/ct-alerts

## Automated asset correlation and risk scoring
python3 asset_correlator.py --domain target.com --risk-threshold high

## ML-based threat intelligence integration
python3 threat_intel_correlator.py --assets assets.json --sources virustotal,shodan,censys

Distributed reconnaissance with cloud workers

## Axiom - distributed reconnaissance platform
axiom-scan targets.txt -m subfinder -o subfinder-results
axiom-scan live-targets.txt -m httpx --screenshot -o httpx-results

## Custom cloud worker deployment
## Deploy reconnaissance workers across multiple regions for speed and stealth
terraform apply -var='regions=["us-east-1","eu-west-1","ap-southeast-1"]'

## Distributed port scanning
masscan-distributed --targets targets.txt --ports 1-65535 --workers 10

Advanced source map analysis

Source maps leak unminified code in 40% of modern applications, revealing internal structure and API endpoints.

## Source map discovery
find . -name "*.js.map"
curl -s https://target.com/static/js/main.js.map

## Source map extraction and analysis (sourcemapper rebuilds the original source tree from the map)
sourcemapper -url https://target.com/static/js/main.js.map -output original-sources/

## Extract API endpoints from unminified code
grep -r "api\/" original-sources/ | grep -oE "api/[a-zA-Z0-9/_-]+"

## Progressive Web App manifest analysis
curl -s https://target.com/manifest.json | jq '.'

## Service worker discovery and analysis
curl -s https://target.com/sw.js | grep -oE "fetch\('[^']+'"
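
Source map discovery usually starts from the JS URLs already collected. The sketch below turns those URLs into candidate `.map` URLs; `sourcemap_candidates` is a hypothetical helper name, and it assumes the common convention that the map sits next to the bundle with `.map` appended.

```shell
# Read JS URLs on stdin and print candidate source map URLs.
# Query strings and fragments are dropped before ".map" is appended.
sourcemap_candidates() {
  sed -E 's/[?#].*$//' \
    | grep -E '\.m?js$' \
    | sed 's/$/.map/' \
    | sort -u
}
```

The output can be piped straight into `httpx -silent -mc 200`. Bundles may also name their map in a trailing `//# sourceMappingURL=` comment, which is worth checking when the guessed path returns 404.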

18. Wordlist Resources

Wordlists are the difference between finding /admin and finding /admin-backup-2019.zip. Use multiple, rotate them, and grow your own.

Core wordlist repos

Repo	What it contains
SecLists (`danielmiessler/SecLists`)	The canonical source — subdomains, content, params, fuzzing, passwords, payloads
Assetnote wordlists (`wordlists.assetnote.io`)	Data-mined from Common Crawl, far higher signal than generic lists
OneListForAll	Merged, deduped megalist
fuzzdb	Legacy but still has unique patterns
Jhaddix all.txt	Classic subdomain brute list

Key SecLists files

Discovery/DNS/
  subdomains-top1million-5000.txt
  subdomains-top1million-110000.txt
  dns-Jhaddix.txt
  bitquark-subdomains-top100000.txt

Discovery/Web-Content/
  raft-small-words.txt
  raft-medium-words.txt
  raft-large-words.txt
  common.txt
  big.txt
  directory-list-2.3-medium.txt
  api/api-endpoints.txt

Discovery/Web-Content/CMS/
  wordpress.fuzz.txt
  joomla.fuzz.txt
  drupal.fuzz.txt

Fuzzing/
  LFI/
  XSS/
  SQLi/

Passwords/
  Common-Credentials/

Assetnote wordlist highlights

best-dns-wordlist.txt              # 9M real subdomains, mined from CT/CC
httparchive_directories_*.txt      # Directories seen in HTTP Archive
httparchive_files_*.txt            # Files seen in HTTP Archive
cloud-s3-bucket-names.txt          # Real S3 bucket names
parameters_top_1M.txt              # Real HTTP parameters

Building your own

After a few engagements, the best wordlist is your own cumulative corpus:

## Collect every URL you've ever crawled
cat recon/*/urls.txt | unfurl paths | awk -F/ '{for(i=2;i<=NF;i++)print $i}' \
  | sort | uniq -c | sort -rn > my-dirs.txt
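
If unfurl is not installed, roughly the same ranking can be sketched with awk alone. `rank_path_segments` is a hypothetical helper name; it reads URLs on stdin and prints `count segment` pairs, most frequent first.

```shell
# Rank URL path segments by frequency using only awk, sort, and uniq.
rank_path_segments() {
  awk '{
         sub("^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*", "")   # strip scheme and host
         sub("[?#].*$", "")                             # strip query and fragment
         n = split($0, seg, "/")
         for (i = 1; i <= n; i++) if (seg[i] != "") print seg[i]
       }' \
    | sort | uniq -c | sort -k1,1nr -k2 \
    | awk '{ print $1, $2 }'
}
```

For example, `cat recon/*/urls.txt | rank_path_segments | awk '$1 >= 3 { print $2 }'` keeps only segments seen at least three times, which trims one-off noise from the wordlist.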

19. Automation Pipelines

At some point you stop running commands and start running pipelines. Recon frameworks chain every tool above into a single invocation and output a structured asset inventory.

bbot (Black Box Operations Tool)

bbot is the most capable modern recon framework — 80+ modules, event-driven, produces graph output.

## Full subdomain enum
bbot -t target.com -f subdomain-enum

## Web spider + HTTP probe + tech detect
bbot -t target.com -f web-basic

## Everything (expensive)
bbot -t target.com -f subdomain-enum,web-basic,cloud-enum -o bbot-out/

recon-ng

Modular Metasploit-style recon framework with a marketplace of modules.

recon-ng
> marketplace install all
> workspaces create target
> modules load recon/domains-hosts/hackertarget
> options set SOURCE target.com
> run

ReconFTW

All-in-one bash pipeline that wraps subfinder, amass, httpx, nuclei, ffuf, and dozens more.

./reconftw.sh -d target.com -r    # recon only
./reconftw.sh -d target.com -a    # full scan

GarudRecon / subdomainx / Striker / ReconDog

Smaller curated wrappers around the ProjectDiscovery stack — useful as reference implementations when building your own.

Custom bash pipeline (minimal example)

#!/usr/bin/env bash
set -euo pipefail
TARGET=${1:?usage: $0 DOMAIN}
OUT=recon/$TARGET
mkdir -p "$OUT"

## 1. Subdomains
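{
  # One failing source must not abort the run under set -e / pipefail
  subfinder -d "$TARGET" -all -silent || true
  assetfinder --subs-only "$TARGET" || true
  chaos -d "$TARGET" -silent || true          # requires an API key
  curl -s "https://crt.sh/?q=%25.$TARGET&output=json" \
    | jq -r '.[].name_value' 2>/dev/null | sed 's/\*\.//g' || true
} | sort -u > "$OUT/subs.txt"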

## 2. Resolve
dnsx -l "$OUT/subs.txt" -silent > "$OUT/resolved.txt"

## 3. HTTP probe
httpx -l "$OUT/resolved.txt" -silent -title -sc -td -ip -json \
  > "$OUT/httpx.json"
jq -r '.url' "$OUT/httpx.json" > "$OUT/alive.txt"

## 4. Port scan (naabu takes hostnames, not URLs, so feed it the resolved list)
naabu -l "$OUT/resolved.txt" -top-ports 1000 -silent > "$OUT/ports.txt"

## 5. Crawl
{
  katana -list "$OUT/alive.txt" -silent -d 3
  cat "$OUT/subs.txt" | waybackurls
  cat "$OUT/subs.txt" | gau --threads 5
} | sort -u > "$OUT/urls.txt"

## 6. JS files
grep -Ei "\.js(\?|$)" "$OUT/urls.txt" | httpx -mc 200 -silent > "$OUT/js.txt" || true

## 7. gf classification
for pattern in ssrf xss sqli redirect lfi idor; do
  gf "$pattern" < "$OUT/urls.txt" > "$OUT/gf-$pattern.txt" || true
done

## 8. Nuclei scan
nuclei -l "$OUT/alive.txt" -severity low,medium,high,critical -silent \
  -o "$OUT/nuclei.txt"

echo "Done. Results in $OUT/"

Pipeline principles

Idempotent — rerunning should update, not duplicate
Resumable — each step writes a file that the next step reads
Rate-limited — respect program rules by default
Diff-friendly — always sort -u output so you can diff across runs
Resource-capped — run expensive tools in parallel with xargs -P or GNU parallel, but cap concurrency
Logged — capture stdout and stderr per-tool for later debugging
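
Several of these principles can be folded into one small wrapper. The sketch below is one possible shape, not a standard tool: `run_step` is a hypothetical helper that skips finished steps, sorts output for diffing, and keeps stderr per tool under an assumed `logs/` directory.

```shell
# Run one pipeline step in a resumable, logged way.
# usage: run_step NAME OUTPUT_FILE COMMAND [ARGS...]
# - skips the step when OUTPUT_FILE already exists and is non-empty
# - writes stdout to a temp file and publishes it sorted only on success
# - keeps stderr in logs/NAME.log for later debugging
run_step() {
  local name="$1" out="$2"
  shift 2
  mkdir -p logs "$(dirname "$out")"
  if [ -s "$out" ]; then
    echo "[skip] $name" >&2
    return 0
  fi
  if "$@" > "$out.tmp" 2> "logs/$name.log"; then
    sort -u "$out.tmp" > "$out" && rm -f "$out.tmp"
    echo "[done] $name" >&2
  else
    rm -f "$out.tmp"
    echo "[fail] $name (see logs/$name.log)" >&2
    return 1
  fi
}
```

A step then reads as `run_step subfinder "$OUT/subs.txt" subfinder -d "$TARGET" -all -silent`. Deleting an output file is enough to force that step to rerun.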

20. Continuous Monitoring

Recon isn’t a one-shot operation. New subdomains, new endpoints, and new open ports appear daily. The hunters who earn the most set up continuous recon and get alerted when something new shows up.

Nightly diff pattern

#!/usr/bin/env bash
TARGET=$1
TODAY=$(date +%F)
OUT=recon/$TARGET/$TODAY
PREV=$(ls -1 "recon/$TARGET" 2>/dev/null | grep -v "$TODAY" | tail -1)

mkdir -p "$OUT"
subfinder -d "$TARGET" -all -silent | sort -u > "$OUT/subs.txt"

if [[ -f "recon/$TARGET/$PREV/subs.txt" ]]; then
  comm -13 "recon/$TARGET/$PREV/subs.txt" "$OUT/subs.txt" > "$OUT/new.txt"
  if [[ -s "$OUT/new.txt" ]]; then
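    # Slack incoming webhooks expect a JSON body with a "text" field
    jq -Rs --arg t "$TARGET" '{text: ("New subs for " + $t + ":\n" + .)}' "$OUT/new.txt" \
      | curl -s -X POST -H 'Content-Type: application/json' -d @- "$SLACK_WEBHOOK"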
  fi
fi

Run via cron or GitHub Actions on a schedule (hourly for active programs, daily for slow-moving targets).

What to monitor

Signal	Action
New subdomain	Immediate triage — often a fresh deploy with bugs
New open port on known host	Service enumeration, version check
New JS file or bundle hash change	Re-run LinkFinder, diff endpoint list
New nuclei finding	Triage the specific template
Cert transparency alert	Certstream-based live feed
GitHub new public repo in org	Scan for secrets
DNS CNAME change	Takeover check
HTTP response hash change on login/admin pages	Manual review
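
The response hash signal in the last row can be sketched with a tiny state directory. `content_changed` and the `STATE_DIR` variable are hypothetical names; `cksum` is used only because it is available everywhere, and a first sighting is reported as a change.

```shell
# Report whether content on stdin differs from the previous run for a key.
# State is one file per key under STATE_DIR (default: .hashes).
# Exit status 0 means "changed or first seen", 1 means "unchanged".
content_changed() {
  local key="$1" dir="${STATE_DIR:-.hashes}" new old=""
  mkdir -p "$dir"
  new=$(cksum | awk '{ print $1 "-" $2 }')   # CRC plus byte count
  [ -f "$dir/$key" ] && old=$(cat "$dir/$key")
  printf '%s\n' "$new" > "$dir/$key"
  [ "$new" != "$old" ]
}
```

Usage would look like `curl -s https://target.com/login | content_changed login && echo "login page changed"`. Pages that embed per-request tokens need those stripped first, or every run will look like a change.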

Tools for continuous monitoring

axiom — distribute recon across cloud workers
interlace — parallelize any CLI tool across a target list
notify (ProjectDiscovery) — multi-channel output for any pipeline
certstream — real-time CT log feed
GitHub webhooks — push notifications on new repos/commits

Notify example

subfinder -d target.com -all -silent \
  | anew subs.txt \
  | notify -silent -bulk -provider slack

anew only emits newly seen lines, so notify only fires on genuinely new subdomains.
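
To make that behaviour concrete, here is a rough awk approximation of what anew does. `append_new` is a hypothetical name, and this is a sketch of the idea rather than a replacement for the real tool.

```shell
# Print stdin lines that are not yet in FILE and append them to FILE.
append_new() {
  local file="$1"
  touch "$file"
  # Load the existing lines first, then emit each unseen line exactly once.
  awk -v f="$file" '
    BEGIN { while ((getline line < f) > 0) seen[line] = 1 }
    !($0 in seen) { seen[$0] = 1; print }
  ' | tee -a "$file"
}
```

Because only unseen lines reach stdout, anything downstream of `append_new subs.txt` (a notifier, a probe, a scanner) runs on new assets only.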

21. Real-World Recon Wins

Actual case studies drawn from public writeups that hinged on recon quality, not exploit cleverness.

JavaScript endpoint → $25K

A researcher grepped an obfuscated webpack bundle and found a reference to /api/v2/internal/users. The endpoint was reachable without authentication and returned the full user database. Total time to bug: 45 minutes. The app had passed a third-party pentest six months earlier.

Lesson: always pull the source maps (.js.map) when present — they reverse the minification and hand you the original file structure.

ASN expansion → SQL injection bounty

Hunter started with 50 in-scope subdomains. Pulled the company ASN, enumerated IP ranges via bgpview, and probed with httpx. Found 500+ live hosts including a forgotten admin panel on an unlisted IP. The panel was vulnerable to classic SQL injection. Critical-severity payout.

Lesson: scope says “*.target.com” but the program owner owns entire IP blocks — check the program rules, many allow any asset owned by the company.

S3 bucket → 2M user PII leak

Generated bucket name permutations (company-backup, company-backups, company-backup-prod) and tested each with aws s3 ls --no-sign-request. company-backup-prod returned a directory listing. It contained a full user database dump in SQL format. $50K critical bounty.

Lesson: bucket name permutation is low-effort, high-reward. Always run it as part of initial recon.

Wayback Machine → auth bypass

Old Wayback snapshot from 2018 showed a /debug/users?bypass=1 endpoint. The endpoint was removed from the current site but the route handler was still mounted. Hitting it directly returned the admin UI. Critical severity.

Lesson: routes outlive the UI that references them. waybackurls + httpx on every historical URL is cheap and frequently pays.

Favicon hash pivot → exposed Jenkins

Researcher grabbed the favicon hash of the target’s build system, queried Shodan for the same hash, and found an additional Jenkins instance on a random IP outside the scope’s DNS. The instance had an anonymous build execution bug because it had never been upgraded. RCE, $20K.

Lesson: favicon pivoting finds assets that DNS never advertised.

GitHub commit history → AWS keys

Secret scanning in a current repo showed nothing — but git log --all on the repo history showed a commit from 2021 where a dev accidentally committed .env and “deleted” it the next day. The keys still worked. Full AWS account takeover. Max-severity report.

Lesson: current-state scanning misses historical secrets. Always scan the full git history.

Subdomain permutation → internal admin

admin.target.com was in scope. Permutation with dnsgen generated admin-legacy.target.com which resolved. It was the pre-migration admin panel, still running, still authenticating against the old LDAP, with a test account nobody had deleted. Full admin. $30K.

Lesson: dev-, -old, -legacy, -v1, -staging, -internal permutations consistently find forgotten infra.
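
A minimal sketch of that affix permutation, for cases where dnsgen is not at hand. `permute_hosts` and the `AFFIXES` list are illustrative names; the list mirrors the affixes in the lesson and should be extended with whatever naming habits the target shows.

```shell
# Affixes to try as both prefix and suffix of the first DNS label.
AFFIXES="dev old legacy v1 staging internal"

# Read hostnames on stdin (e.g. admin.target.com) and print candidates
# such as admin-legacy.target.com and dev-admin.target.com.
permute_hosts() {
  local host label rest a
  while IFS= read -r host; do
    case $host in *.*) ;; *) continue ;; esac   # need at least label.domain
    label=${host%%.*}
    rest=${host#*.}
    for a in $AFFIXES; do
      printf '%s-%s.%s\n' "$label" "$a" "$rest"
      printf '%s-%s.%s\n' "$a" "$label" "$rest"
    done
  done | sort -u
}
```

The candidates still need resolving, for example `permute_hosts < subs.txt | dnsx -silent`, since most of them will not exist.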

2026 Modern Infrastructure Wins

The following discoveries highlight how modern reconnaissance techniques uncover new attack surfaces in cloud-native environments.

GraphQL introspection → admin privilege escalation

Researcher discovered a GraphQL endpoint through API documentation fuzzing at /api/graphql. Introspection queries revealed a hidden promoteToAdmin mutation not exposed in the public schema. The mutation lacked proper authorization checks. $40K critical bounty.

Lesson: GraphQL introspection often reveals administrative mutations hidden from public documentation. Always test discovered mutations with low-privilege accounts.

Container registry → source code exposure

Docker Hub enumeration revealed public repositories under the company’s organization containing development images. One image included the entire application source code with hardcoded database credentials and API keys. Critical data exposure bounty.

Lesson: Container registries are often overlooked but frequently contain sensitive development artifacts. Always enumerate public repositories and analyze image layers.

Source map leak → $35K API discovery

Application used webpack source maps in production. Downloading the .js.map file revealed unminified code containing 200+ internal API endpoints not discoverable through traditional crawling. Several endpoints had IDOR vulnerabilities. Total payout: $35K across multiple findings.

Lesson: Source maps are goldmines for API discovery. They reveal the complete application structure developers never intended to expose.

Serverless function enumeration → RCE

Lambda function name generation based on discovered patterns (company-api-{env}-{service}) led to discovering an unauthenticated function processing user uploads. The function was vulnerable to command injection through filename parsing. $60K RCE bounty.

Lesson: Serverless functions often follow predictable naming patterns. Generate permutations based on discovered functions and organization naming conventions.

Certificate transparency monitoring → zero-day infrastructure

CT log monitoring detected a new subdomain beta-api-v3.target.com hours after certificate issuance. The endpoint was running a beta version with debug mode enabled, exposing stack traces and internal paths. Multiple vulnerabilities found before public launch. $25K total.

Lesson: Real-time CT monitoring provides early access to new infrastructure. Beta and staging environments often have weaker security controls.

ML-powered subdomain generation → forgotten acquisition

Machine learning model trained on the company’s subdomain patterns generated legacy-oldcompany.target.com based on acquisition history. The subdomain resolved to a forgotten server from a 2019 acquisition with default credentials still active. Full server compromise.

Lesson: ML-based generation can discover human-missed patterns, especially around acquisitions and legacy infrastructure.

22. Quick Reference

Install the core stack (Go tools)

## ProjectDiscovery toolkit
go install github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest
go install github.com/projectdiscovery/httpx/cmd/httpx@latest
go install github.com/projectdiscovery/dnsx/cmd/dnsx@latest
go install github.com/projectdiscovery/naabu/v2/cmd/naabu@latest
go install github.com/projectdiscovery/katana/cmd/katana@latest
go install github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest
go install github.com/projectdiscovery/chaos-client/cmd/chaos@latest
go install github.com/projectdiscovery/notify/cmd/notify@latest

## Other Go tools
go install github.com/tomnomnom/assetfinder@latest
go install github.com/tomnomnom/waybackurls@latest
go install github.com/tomnomnom/anew@latest
go install github.com/tomnomnom/gf@latest
go install github.com/tomnomnom/unfurl@latest
go install github.com/hakluke/hakrawler@latest
go install github.com/lc/gau/v2/cmd/gau@latest
go install github.com/ffuf/ffuf/v2@latest
go install github.com/OJ/gobuster/v3@latest

## Rust / Python
cargo install feroxbuster
pip install arjun xnLinkFinder

One-liner pipelines

## Subs → alive → screenshots
subfinder -d target.com -all -silent \
  | dnsx -silent \
  | httpx -silent -screenshot

## Subs → crawl → JS → endpoints
subfinder -d target.com -all -silent \
  | httpx -silent \
  | katana -silent -d 3 \
  | grep -E "\.js$" \
  | xargs -I{} python3 linkfinder.py -i {} -o cli

## Subs → params → SSRF candidates (waybackurls takes bare domains, so no httpx step)
subfinder -d target.com -all -silent \
  | waybackurls \
  | gf ssrf > ssrf-candidates.txt

## Every URL ever crawled on every sub
subfinder -d target.com -silent \
  | gau --threads 10 --blacklist png,jpg,gif,css,woff \
  | sort -u > urls.txt

## Modern 2026 one-liners
## Container registry enumeration
curl -s "https://hub.docker.com/v2/repositories/target/?page_size=100" \
  | jq -r '.results[].name' | head -20

## GraphQL introspection discovery (targets come from a list; the query goes in -body)
httpx -l alive.txt -silent -path /graphql -x POST \
  -H "Content-Type: application/json" \
  -body '{"query": "{ __schema { types { name } } }"}' \
  -ms '"__schema"'

## Source map hunting
grep -E "\.js$" urls.txt \
  | sed 's/\.js$/.js.map/' \
  | httpx -silent -mc 200

## Serverless function name generation
## (function URL hostnames use a generated ID, so feed these names into subdomain and API path wordlists)
echo -e "target-api\ntarget-webhook\ntarget-auth" \
  | sed 's/.*/&-dev\n&-staging\n&-prod/' \
  | sort -u > lambda-names.txt

Tool family tally

Family	Canonical tool(s)
Subdomain passive	subfinder, amass, assetfinder, chaos, crt.sh
Subdomain brute	puredns, shuffledns
Permutation	dnsgen, altdns, gotator
Resolution	dnsx, massdns
HTTP probe	httpx
Port scan	naabu, masscan, nmap, rustscan
Crawling	katana, hakrawler, gospider
Archive	waybackurls, gau
JS analysis	LinkFinder, xnLinkFinder, SecretFinder, jsluice
Content discovery	ffuf, feroxbuster, gobuster, dirsearch
Parameters	arjun, paramspider, x8
Fingerprinting	whatweb, wappalyzer, httpx -td
Cloud	cloud_enum, s3scanner, CloudFail
Container/Serverless	crane, skopeo, dive, trivy
API Discovery	grpcurl, graphql-introspect, swagger-codegen
GitHub	trufflehog, gitleaks, gitgot, github-subdomains
Vuln scan	nuclei
Orchestration	bbot, reconftw, recon-ng
Notification	notify, anew

Recon checklist (per engagement)

[ ] Scope captured in a file
[ ] Seed domains expanded via reverse whois / ASN
[ ] Passive subdomain enum run (subfinder, amass, chaos, crt.sh)
[ ] Active subdomain enum run (puredns brute + permutation)
[ ] All subdomains resolved via dnsx
[ ] All live hosts probed via httpx with -td
[ ] Screenshots captured for visual triage
[ ] Port scan run (naabu top-1000 minimum)
[ ] Crawl complete (katana + waybackurls + gau)
[ ] JS files extracted and mined with LinkFinder
[ ] Source maps (.js.map) discovered and analyzed
[ ] Secrets scan run on JS and source maps
[ ] Parameter discovery run (arjun) on high-value endpoints
[ ] gf patterns applied to URL corpus
[ ] Cloud buckets permuted and tested
[ ] Container registries enumerated (Docker Hub, ECR, GCR, ACR)
[ ] GraphQL introspection attempted on discovered endpoints
[ ] API documentation discovered (Swagger/OpenAPI, GraphQL)
[ ] Serverless function enumeration (Lambda, Azure Functions, Cloud Functions)
[ ] GitHub org scanned with trufflehog
[ ] Infrastructure-as-Code repositories analyzed
[ ] Nuclei baseline scan run
[ ] Certificate transparency monitoring configured
[ ] Nightly diff pipeline set up
[ ] All raw outputs archived for future re-use

Closing Notes

Recon in 2026 compounds exponentially through automation and machine learning. Every engagement adds to your corpus — subdomains, parameters, API endpoints, container images, serverless functions, and cloud configurations. The elite hunters leverage ML for pattern recognition, continuous monitoring for real-time discovery, and cloud-scale automation for comprehensive coverage.

Modern attack surfaces span traditional web applications, cloud-native infrastructure, container registries, serverless functions, GraphQL APIs, and progressive web applications. The reconnaissance techniques that worked in 2020 miss 70% of today’s cloud-native infrastructure.

Start building your automated pipeline now. Configure certificate transparency monitoring. Train ML models on your discoveries. Set up distributed reconnaissance across cloud regions. Treat every new asset — whether it’s a subdomain, container image, or serverless function — as a fresh attack surface with its own unique vulnerabilities.

The bug isn’t in the tool you ran. It’s in the cloud service you didn’t enumerate, the source map you didn’t download, or the GraphQL endpoint you didn’t introspect.

Frequently asked questions

What is reconnaissance in cybersecurity?

Reconnaissance is the information-gathering phase where a tester maps a target’s attack surface: subdomains, live hosts, open ports, endpoints, technologies, and cloud assets. A broader asset inventory means more parameters and code paths to test for bugs.

What is the difference between passive and active recon?

Passive recon collects data without touching the target directly, using sources like certificate transparency logs, DNS records, and search engines. Active recon sends traffic to the target, such as port scans, HTTP probing, and directory brute forcing.

What are the best tools for subdomain enumeration?

Common choices include subfinder, amass, and assetfinder for passive discovery, plus dnsx or puredns for DNS brute forcing and permutation. Results are then probed with httpx to find live hosts.

How do you find hidden endpoints and parameters?

Crawl the app with tools like katana or gau, mine JavaScript bundles for API paths and keys, brute force directories with ffuf, and discover parameters with tools like Arjun or param miner.
