Comprehensive OSINT Guide
Enhanced May 2, 2026 - Updated with AI-assisted intelligence and blockchain analysis, including enhanced social media techniques and modern OSINT automation from comprehensive 2026 research.
A practitioner’s reference for Open Source Intelligence: methodology, collection disciplines, tooling, pivoting techniques, and operational security. Enhanced with 2026 AI-assisted techniques and emerging platform intelligence. Compiled from 200+ research sources and enhanced through automated analysis of current OSINT developments.
Table of Contents
- Fundamentals
- The OSINT Lifecycle
- People OSINT (HUMINT/SOCMINT)
- Company & Corporate OSINT
- Infrastructure & Network OSINT
- Domain, DNS & Certificate Intel
- Social Media Intelligence
- 2026 Enhanced Social Media Intelligence
- Geolocation & Imagery (GEOINT)
- Breach, Leak & Paste Intel
- Metadata Extraction
- Code & Repository OSINT
- Dark Web & Threat Intel
- IoT & Device Discovery
- Automation & Visualization
- Cloud & Modern Infrastructure Intelligence
- Blockchain & Financial Intelligence 2026
- AI-Assisted OSINT 2026
- Anti-Detection & Privacy Evasion
- Continuous Monitoring & Threat Hunting
- Operational Security
- Legal & Ethical Considerations
- Quick Reference
- Tools Reference
1. Fundamentals
Open Source Intelligence (OSINT) is the discipline of collecting, correlating, and analyzing information that is publicly or legally available to produce actionable intelligence. “Open source” does not mean “easy” or “low value”; it means no clandestine collection is involved. The sources are lawful: the skill lies in knowing where to look, how to pivot, and how to assemble fragments into a coherent picture.
Why it matters:
| Use case | Practitioners |
|---|---|
| Adversary reconnaissance | Red teams, pentesters, bug bounty hunters |
| Attack surface management | Blue teams, security engineers, CISOs |
| Threat intelligence | SOC analysts, CTI teams, IR responders |
| Fraud and KYC investigation | Financial crime analysts, compliance |
| Journalism and research | Investigative reporters, academic researchers |
| Law enforcement | Missing persons, criminal investigations |
| Due diligence | M&A, investor research, hiring |
| Personal self-defense | Privacy audits, stalker detection |
Core principles:
- Every fact is a pivot. An email address is not an endpoint; it is a seed for breach lookups, social profile enumeration, domain registrations, and search engine dorks.
- Triangulate before trusting. Any single source can be wrong, stale, or planted. Cross-reference at least two independent sources before treating a data point as confirmed.
- Document as you go. If you cannot reproduce a finding in six months, it did not happen. Screenshot, hash, archive.
- Stay passive until you must be active. The default mode is observation. Only escalate to direct interaction when the intelligence you need cannot be harvested from existing public records.
- Scope creep kills investigations. Define the question up front and resist chasing shiny tangents unless they directly serve the objective.
Passive vs. active collection:
| | Passive | Active |
|---|---|---|
| Contact with target | None; consult third-party data only | Direct queries against target infrastructure |
| Detection risk | Near zero | Logs, rate limits, WAF alerts |
| Data freshness | Can be stale (days to years) | Real-time |
| Examples | crt.sh, Shodan, archive.org, Google dorks | Nmap scan, directory brute-force, HTTP probing |
| When used | Always first; enumerate scope and context | After passive exhausted, to confirm/expand |
The cardinal rule: finish passive recon before touching the target. Anything you can learn from Censys or certificate transparency logs is something you do not need to poke a production server for.
2. The OSINT Lifecycle
Every investigation, whether a two-hour recon sprint or a month-long deep-dive, follows the same phases. Discipline here separates practitioners from tourists.
Phase 1: Planning & Requirements
Write down the question. Who is the target? What decisions will the intelligence support? What is in scope, and what is off-limits? What is the deadline? What format does the deliverable take? Investigations without a defined question wander forever.
Phase 2: Collection
Gather raw data from identified sources. The temptation is to start here; resist it until the plan is clear. Collection spans subdomains, WHOIS records, social profiles, PDF metadata, breach dumps, code repos, certificate logs, historical archives, and more. Keep raw artifacts separate from processed notes.
Phase 3: Processing
Normalize the data. Dedupe subdomains, resolve hostnames to IPs, extract EXIF from images, parse PDFs for authors. Convert everything into a form you can query and pivot against. A messy collection phase dies here.
Phase 4: Analysis
Turn data into intelligence. Correlate findings across sources: the email on the WHOIS record matches the Gravatar on GitHub, which matches a LinkedIn photo, which matches a conference speaker bio. Link analysis tools (Maltego, spreadsheets, link graphs) help surface non-obvious connections.
Phase 5: Dissemination
Deliver findings in the format the consumer expects: a bug bounty report, a pentest recon appendix, an executive briefing, a due diligence memo. Include provenance for every claim: where it came from, when it was collected, and how confident you are.
Phase 6: Feedback
Does the intelligence answer the question? What was missed? What should have been collected sooner? Feed lessons back into Phase 1 for the next engagement.
3. People OSINT (HUMINT/SOCMINT)
People investigations map a target’s digital footprint: identifiers, aliases, affiliations, locations, and relationships. The process is iterative: each fact opens new pivots.
Starting identifiers
| Seed | Immediate pivots |
|---|---|
| Full name | Search engines, LinkedIn, Wikipedia, voter rolls, academic directories |
| Email | HaveIBeenPwned, Hunter.io, Gravatar, breach dumps, domain WHOIS, Google |
| Username | Sherlock, WhatsMyName, Namechk, Maigret |
| Phone number | PhoneInfoga, truecaller, reverse lookup, Telegram/WhatsApp checks |
| Profile photo | Reverse image search (Google, Yandex, TinEye, PimEyes) |
| Employer | LinkedIn, press releases, company filings |
| Address | Property records, voter rolls, Google Maps Street View |
The Maltego-style pivot graph
Treating a person investigation as a graph (nodes = identifiers/entities, edges = “associated with”) prevents losing track of where each fact originated. A typical pivot chain from a Maltego-style workflow:
- Name → search page titles, Wikipedia, personal website
- Personal website → footer emails, phone numbers, historical WHOIS (DomainTools)
- WHOIS email → other domains registered with same email (reverse WHOIS via WhoXY)
- Social handles (`marc_clotet`, `marcclotetoficial`) → Instagram, Twitter, Facebook profiles
- Mutual followers/following → close contacts, private accounts
- Affiliated company (mentioned in bio) → corporate registry → officers → other affiliated parties
- Historical Hotmail address uncovered via DomainTools → Pipl person search → age, relatives, locations
- Phone number → messaging app profile photos, account discovery
Each step is a pivot from a confirmed entity to new related entities. Maltego Transforms automate the individual hops; you can run the same workflow manually with curl, whois, and careful note-taking.
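The graph framing above can be kept in a few lines of code when Maltego is not available. The following is a minimal sketch, not a Maltego equivalent: the `PivotGraph` class, its method names, and the sample identifiers are all hypothetical and chosen only for illustration. Each edge stores the source of the link, so the provenance of any finding can be replayed later.

```python
from collections import deque


class PivotGraph:
    """Undirected graph of identifiers; each edge records where the link came from."""

    def __init__(self):
        self.edges = {}  # node -> {neighbor: source note}

    def link(self, a, b, source):
        """Record that a and b are associated, and which source showed it."""
        self.edges.setdefault(a, {})[b] = source
        self.edges.setdefault(b, {})[a] = source

    def provenance(self, start, goal):
        """Return the (from, to, source) hops connecting two entities, or None."""
        parent = {start: None}
        queue = deque([start])
        while queue:  # breadth-first search gives the shortest pivot chain
            node = queue.popleft()
            if node == goal:
                break
            for nxt in self.edges.get(node, {}):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        if goal not in parent:
            return None
        hops, node = [], goal
        while parent[node] is not None:
            prev = parent[node]
            hops.append((prev, node, self.edges[prev][node]))
            node = prev
        return list(reversed(hops))


# Illustrative data only; none of these identifiers come from a real case.
g = PivotGraph()
g.link("name:Jane Doe", "site:janedoe.example", "search engine result")
g.link("site:janedoe.example", "email:jane@janedoe.example", "page footer")
g.link("email:jane@janedoe.example", "domain:other.example", "reverse WHOIS")

for hop in g.provenance("name:Jane Doe", "domain:other.example"):
    print(hop)
```

Storing the source on the edge rather than on the node keeps the answer to "how do we know these two things are related?" attached to the relationship itself.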
Username enumeration
Most people reuse handles across platforms. Tools that check hundreds of sites in parallel:
- Sherlock: Python tool checking 300+ social networks for a username
- WhatsMyName: web/CLI tool with a community-maintained JSON list of sites
- Maigret: fork of Sherlock with richer profile data extraction
- Namechk / KnowEm: brand/username availability checkers repurposed for OSINT
A hit on a niche forum is often worth more than another Twitter account, because niche forums surface real interests, writing samples, and contact patterns.
Email enumeration and validation
- Hunter.io: finds email addresses by domain, infers patterns (`first.last@`, `flast@`), verifies deliverability
- Email permutator: generates plausible addresses from a name plus domain
- HaveIBeenPwned: shows which breaches an email appears in, which in turn reveals services used
- Gravatar: `https://gravatar.com/<md5(email)>.json` returns the profile if one is registered
- Epieos / Holehe: check dozens of services for account registration without triggering password reset emails
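Two of the items above are mechanical enough to sketch: permuting a name into candidate addresses and building the Gravatar lookup URL. This is a rough illustration under stated assumptions. The function names are hypothetical, the pattern list is a small sample rather than an exhaustive one, and the hash normalization (trim, lowercase, MD5) follows the URL shape quoted above.

```python
import hashlib


def email_candidates(first, last, domain):
    """Generate plausible addresses from a name plus domain (sample patterns only)."""
    first, last, domain = first.strip().lower(), last.strip().lower(), domain.lower()
    local_parts = [
        f"{first}.{last}",    # first.last@
        f"{first[0]}{last}",  # flast@
        f"{first}{last}",
        f"{first}_{last}",
        f"{last}.{first}",
        first,
    ]
    seen = []
    for local in local_parts:
        address = f"{local}@{domain}"
        if address not in seen:  # keep order, drop duplicates
            seen.append(address)
    return seen


def gravatar_profile_url(email):
    """Build the Gravatar profile URL from the MD5 of the trimmed, lowercased email."""
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    return f"https://gravatar.com/{digest}.json"


for candidate in email_candidates("Jane", "Doe", "example.com"):
    print(candidate, gravatar_profile_url(candidate))
```

The sketch only builds URLs; fetching them is a live query to a third party and should be paced accordingly.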
Phone numbers
- PhoneInfoga: country, carrier, line type, breach hits
- Truecaller: crowdsourced caller ID; risky, because the lookup itself is disclosed to Truecaller
- Messaging apps: adding a number to contacts often reveals the registered profile name and avatar (opsec-heavy; use a burner)
Reverse image search
- Google Images / Google Lens: best for product, landmark, and Western content
- Yandex Images: still the strongest for faces and people
- TinEye: best for finding the original source and earliest occurrence
- PimEyes / FaceCheck: facial recognition across the open web (paid, ethically fraught)
- Bing Visual Search: decent for products and landmarks
4. Company & Corporate OSINT
Company investigations combine infrastructure recon with corporate filings, personnel mapping, and vendor/technology fingerprinting.
Corporate identity sources
| Source | Data |
|---|---|
| OpenCorporates | Global company registry metadata: officers, addresses, status |
| SEC EDGAR | US public company filings (10-K, 10-Q, insider transactions) |
| Companies House (UK) | Officers, filings, beneficial owners |
| Bureau van Dijk / Orbis | Paid, comprehensive global company intelligence |
| Dun & Bradstreet | Business credit, corporate family trees |
| Crunchbase / PitchBook | Funding, investors, board members (paid tiers for depth) |
| LinkedIn company pages | Headcount, departments, employee list |
| Full Contact / Clearbit | Enrichment APIs: size, industry, tech stack, key people |
Subsidiary and domain discovery
Large companies have sprawling digital footprints. Start from the primary name and expand:
- Reverse WHOIS (WhoXY, DomainTools): find all domains registered to the same name or email. Remember: WhoXY requires exact-string matches, so “Blizzard Entertainment” and “Blizzard Entertainment, Inc” will return different sets.
- Trademark search: USPTO and EUIPO filings reveal product codenames and subsidiaries.
- Press releases and SEC filings: mention subsidiary names that never appear on the website.
- Job postings: often mention internal tool names, cloud providers, and office locations.
Employee enumeration
- LinkedIn: `site:linkedin.com "Acme Corp"` via Google reveals public profiles even without login
- Hunter.io / phonebook.cz: bulk email harvest by domain
- GitHub: commits by `@company.com` email addresses expose engineers
- Conference talks, CVE credits, paper authorship: list specialists
- RocketReach, Lusha, Apollo: sales tools repurposed for contact discovery (paid)
Technology fingerprinting
Knowing the stack narrows exploitation research later:
- BuiltWith / Wappalyzer: web stack detection from rendered HTML and headers
- Shodan / Censys: banner grabs reveal server software and versions
- DNS records: MX (`pphosted.com` = Proofpoint), SPF (`spf.protection.outlook.com` = O365), CNAMEs revealing CDN/CMS
- JavaScript bundles: library imports, API endpoints, third-party integrations
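The DNS-record fingerprinting above reduces to suffix matching on hostnames that have already been collected. The sketch below assumes the MX targets and SPF TXT strings are in hand (from any passive DNS source); the `PROVIDER_HINTS` table and both function names are hypothetical, and the table holds only a handful of illustrative entries, two of them taken from the examples in this section.

```python
# Suffix -> provider hints. Illustrative, not a complete fingerprint database.
PROVIDER_HINTS = {
    "pphosted.com": "Proofpoint",
    "protection.outlook.com": "Microsoft 365",
    "google.com": "Google Workspace",
    "mimecast.com": "Mimecast",
}


def classify_mail_host(hostname):
    """Map an MX target or SPF include domain to a provider, or None if unknown."""
    host = hostname.strip().rstrip(".").lower()
    for suffix, provider in PROVIDER_HINTS.items():
        if host == suffix or host.endswith("." + suffix):
            return provider
    return None


def spf_includes(txt_record):
    """Extract the domains named by include: mechanisms in an SPF record."""
    return [
        term.split(":", 1)[1].lower()
        for term in txt_record.split()
        if term.lower().startswith("include:")
    ]


spf = "v=spf1 include:spf.protection.outlook.com include:mail.vendor.example -all"
for domain in spf_includes(spf):
    print(domain, "->", classify_mail_host(domain))
```

Unknown include domains are worth as much attention as the known ones: each is a third-party sender the organization has authorized.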
5. Infrastructure & Network OSINT
This is where OSINT crosses most directly into security recon. The goal is to enumerate every externally reachable asset and catalog what is running on it, without touching the target.
IP space and ASN
- ARIN / RIPE / APNIC / LACNIC / AFRINIC RDAP: WHOIS for IP blocks, netblock ownership
- bgp.he.net: AS number lookups, peering relationships, announced prefixes
- ipinfo.io / ipdata: enrichment APIs with geolocation, ASN, org
- RIPEstat: authoritative routing, abuse contacts, historical data
A company that owns its own ASN signals maturity and gives you a clean IP perimeter. A company entirely on cloud (all AWS/GCP/Azure) means you map their domains back to cloud ranges instead.
Search engines for infrastructure
These are the indispensable tools. None of them touch the target; they query pre-indexed scan data.
| Tool | Best for | Free tier |
|---|---|---|
| Shodan | Banners, service versions, SCADA/ICS, webcams, IoT, vulnerability filters (vuln:) | Limited queries, paid plans unlock filters |
| Censys | Certificate search, service fingerprinting, precise field queries | 250 searches/month |
| Netlas.io | Domains, IPs, WHOIS, DNS combined; Maltego integration | 50 searches/day, 2500 results/month |
| FOFA | Chinese alternative, strong for APAC infrastructure | Limited |
| ZoomEye | Another Chinese alternative | Limited |
| BinaryEdge | Scans, leaked databases, risk scoring | Paid |
| GreyNoise | Classifies “background noise” IPs to filter scan traffic | Community tier |
| Hunter.how | Cyberspace search engine | Limited |
Shodan query patterns:
hostname:example.com
ip:203.0.113.0/24
port:22 country:US
product:"nginx"
vuln:CVE-2021-44228
org:"Acme Corp"
ssl:"example.com"
Censys field queries:
parsed.names: example.com
services.service_name: "HTTP" and location.country: "United States"
services.tls.certificates.leaf_data.names: "*.example.com"
autonomous_system.asn: 13335
Cloud asset discovery
Most modern targets live in public cloud. Mapping cloud assets:
- S3 buckets: use `awscli` or boto3 (checks made with any authenticated AWS identity surface more than unauthenticated HTTP probes). Bucket names must be globally unique; try company-name variants with common prefixes/suffixes: `qa`, `dev`, `staging`, `prod`, `bak`, `backup`, `logs`, `assets`, `uat`, `legacy`, `internal`, `public`, `private`, `docs`.
- Digital Ocean Spaces: same API shape as S3, separate namespace to enumerate.
- Azure blob storage: `<name>.blob.core.windows.net`
- GCS buckets: `<name>.storage.googleapis.com`
- Firebase databases: `<name>.firebaseio.com`
- Dangling CNAMEs: records pointing to deleted cloud resources are ripe for subdomain takeover. The can-i-take-over-xyz repo catalogs the fingerprints.
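The bucket-name guessing described above is a simple permutation exercise. The following sketch only generates candidate names; it makes no network requests. The affix list is the one given in this section, while the function name, the separator set, and the length filter (3 to 63 characters, the S3 naming limit) are illustrative choices.

```python
from itertools import product

# Affixes taken from the list in this section.
AFFIXES = [
    "qa", "dev", "staging", "prod", "bak", "backup", "logs",
    "assets", "uat", "legacy", "internal", "public", "private", "docs",
]


def bucket_candidates(company, separators=("-", "", ".")):
    """Return sorted candidate bucket names built from a company name and affixes."""
    base = company.strip().lower().replace(" ", "")
    names = {base}
    for affix, sep in product(AFFIXES, separators):
        names.add(f"{base}{sep}{affix}")  # suffix form, e.g. acme-dev
        names.add(f"{affix}{sep}{base}")  # prefix form, e.g. dev-acme
    # Keep only names within the 3..63 character bucket-name length limit.
    return sorted(n for n in names if 3 <= len(n) <= 63)


candidates = bucket_candidates("acme")
print(len(candidates), candidates[:5])
```

Checking whether a candidate exists is an active step against the cloud provider, so it belongs after scope has been confirmed.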
Historical data
- Wayback Machine (archive.org): snapshots of old pages, forgotten endpoints, `robots.txt` evolution, admin panel references
- CommonCrawl: bulk web archive suitable for scripted search
- SecurityTrails: historical DNS records, WHOIS changes, subdomain discovery
- DomainTools: historical WHOIS (the closest thing to a time machine for registration data)
- Google Cache: the cached view has been largely retired, but is still useful when present
Historical data is often more valuable than current data. A subdomain that vanished last year may still point at a forgotten S3 bucket. An old WHOIS record may contain an admin’s personal email that was scrubbed from the current record.
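Historical URL discovery can be scripted against the Wayback Machine's CDX index. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the query parameters shown (`url`, `output`, `fl`, `collapse`, `limit`) are the commonly documented CDX options, and the parser expects the JSON output form in which the first row is a header. Fetching is left out so the example stays offline.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain, limit=500):
    """Build a CDX query listing archived URLs under a domain, one row per unique URL."""
    params = {
        "url": f"{domain}/*",
        "output": "json",
        "fl": "timestamp,original,statuscode",
        "collapse": "urlkey",
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx(body):
    """Turn the CDX JSON array (header row first) into a list of dicts."""
    rows = json.loads(body)
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


# Hand-written sample response, not real archive data.
sample = json.dumps([
    ["timestamp", "original", "statuscode"],
    ["20190101000000", "http://example.com/admin/", "200"],
    ["20200101000000", "http://example.com/robots.txt", "200"],
])
print(cdx_query_url("example.com"))
for row in parse_cdx(sample):
    print(row["timestamp"], row["original"])
```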
6. Domain, DNS & Certificate Intel
Domain-level intel is the connective tissue of infrastructure OSINT.
Subdomain enumeration
Passive sources pull from pre-indexed databases β no traffic to the target:
- crt.sh: free, no account required (heavy use gets throttled), queries Certificate Transparency logs. Every publicly trusted TLS cert is logged, so issuing a cert for `hr.example.com` is enough to expose that subdomain even before it goes live.
- Certificate Transparency (`transparencyreport.google.com`): Google’s CT aggregator
- VirusTotal: passive DNS from submissions
- SecurityTrails / DNSDumpster / Netcraft: historical DNS aggregators
- Subfinder: orchestrates queries against 30+ passive sources
- Amass (intel/enum passive mode): OWASP’s enumeration framework
- Assetfinder: tomnomnom’s lightweight passive finder
- chaos-client: ProjectDiscovery’s Chaos dataset
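Certificate Transparency results usually need cleanup before they are usable as a subdomain list. The sketch below assumes crt.sh's JSON output, in which each entry carries a `name_value` field holding one or more newline-separated names; the function names are hypothetical and the sample response is hand-written. It strips wildcards, drops out-of-scope names, and dedupes.

```python
import json


def crtsh_url(domain):
    """Build a crt.sh query for every certificate name under a domain (%25 is an encoded %)."""
    return f"https://crt.sh/?q=%25.{domain}&output=json"


def names_from_crtsh(body, domain):
    """Extract unique in-scope hostnames from a crt.sh JSON response body."""
    domain = domain.lower()
    found = set()
    for entry in json.loads(body):
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # a wildcard cert still proves the parent name exists
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)


# Hand-written sample shaped like a crt.sh response.
sample = json.dumps([
    {"name_value": "hr.example.com\n*.dev.example.com"},
    {"name_value": "HR.example.com"},
    {"name_value": "unrelated.example.org"},
])
print(crtsh_url("example.com"))
print(names_from_crtsh(sample, "example.com"))
```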
Active enumeration adds brute-force, permutation, and zone walking:
- puredns / shuffledns: mass DNS resolution with wildcards filtered
- altdns / dnsgen: permutation wordlists from discovered names
- dnsrecon: zone transfers, cache snooping
- massdns: raw resolver for large lists
Active validation
Once you have a list of candidate names, resolve and probe them:
- dnsx: bulk resolution, filtering by record type
- httpx: probes live HTTP(S) services, captures titles, tech stacks, status codes
- aquatone: visual recon; screenshots and clusters subdomains by similarity
- gowitness / eyewitness: alternative screenshotters
DNS record mining
Beyond A/CNAME, records leak intel:
| Record | Intel |
|---|---|
| MX | Email provider (Google, Microsoft, Proofpoint, Mimecast) |
| TXT | SPF records list third-party senders (Marketo, Salesforce, Zendesk); DKIM selectors hint at tooling |
| SRV | Exposes specific services (XMPP, SIP, LDAP) |
| CAA | Allowed certificate authorities |
| NS | DNS provider (Route53, Cloudflare, NS1) |
| SOA | Admin email, zone refresh parameters |
Weak or missing SPF/DMARC (e.g. v=DMARC1; p=none) signals exploitable email spoofing potential. DKIMValidator is a classic utility for testing DMARC alignment without interacting with the target infrastructure.
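Judging a DMARC record is a matter of splitting its tags and reading the policy. The following is a small sketch that works on a TXT string already retrieved from `_dmarc.<domain>`; the function names and the wording of the findings are hypothetical, and only the `v` and `p` tags are examined.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a dict of tag -> value."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_weakness(record):
    """Return a short finding if the policy does not enforce, else None."""
    if not record:
        return "no DMARC record published"
    tags = parse_dmarc(record)
    if tags.get("v", "").upper() != "DMARC1":
        return "record is not valid DMARC"
    policy = tags.get("p", "").lower()
    if policy == "none":
        return "policy is monitor-only (p=none)"
    if policy in ("quarantine", "reject"):
        return None
    return "missing or unrecognized policy tag"


print(dmarc_weakness("v=DMARC1; p=none"))
print(dmarc_weakness("v=DMARC1; p=reject; rua=mailto:dmarc@example.com"))
```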
WHOIS
- Current WHOIS: often redacted under GDPR for individuals, still useful for corporate registrants
- Historical WHOIS: DomainTools, WhoisXML API, Whoxy; the unredacted gold
- Reverse WHOIS: find all domains sharing a registrant email, name, phone, or organization
7. Social Media Intelligence
SOCMINT is high-signal but high-noise. Treat public social media as a window into a target’s relationships, routines, locations, and interests, and treat closely curated accounts (execs, celebrities) as performative artifacts, not ground truth.
Platform-by-platform quick reference
| Platform | High-value intel |
|---|---|
| LinkedIn | Employer history, skills, internal tool mentions, team maps, location |
| Twitter/X | Writing style, real-time location, device fingerprints (Twitter for iPhone etc.), interests, connections |
| Facebook | Family, relationships, check-ins, photos, events, groups |
| Instagram | Geolocation from photos, friend networks, routines, physical spaces |
| TikTok | Schedules, location context, behavioral patterns |
| Reddit | Long-form writing, niche communities, real-world interests |
| GitHub | Code, commits, emails, working hours, associated accounts |
| Strava/fitness apps | Routines, home location, military base exposure |
| Telegram | Phone number → profile → channels joined |
| Discord | Real-time presence, community affiliations |
Techniques
- Close contact mapping: mutual followers on Instagram, Facebook friend overlap, Twitter interaction graphs
- Temporal analysis: timestamp clusters reveal timezone, sleep schedule, work hours
- Linguistic fingerprinting: consistent phrasing across accounts links aliases
- Photo OSINT: backgrounds reveal location, device clock shows timezone, reflections leak environment
- Story/ephemeral content: archive quickly; gone in 24 hours
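Temporal analysis can start with nothing more than a histogram of posting hours. The sketch below is a rough heuristic, not a validated method: it buckets timestamps by UTC hour and finds the quietest contiguous window, which is a plausible (but unconfirmed) sleep period. The function names and the 8-hour window width are assumptions, and naive timestamps are treated as UTC.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_histogram(iso_timestamps):
    """Count posts per UTC hour from ISO 8601 strings such as 2026-01-05T13:30:00+00:00."""
    counts = Counter()
    for ts in iso_timestamps:
        dt = datetime.fromisoformat(ts)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        counts[dt.astimezone(timezone.utc).hour] += 1
    return [counts.get(hour, 0) for hour in range(24)]


def quietest_window(histogram, width=8):
    """Return (start_hour_utc, post_count) for the least active window of `width` hours."""
    best_start, best_total = 0, None
    for start in range(24):
        total = sum(histogram[(start + i) % 24] for i in range(width))  # wraps past midnight
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Synthetic data: one post per hour except 05:00-12:59 UTC.
active_hours = list(range(13, 24)) + list(range(0, 5))
posts = [f"2026-01-05T{hour:02d}:30:00+00:00" for hour in active_hours]
start, total = quietest_window(hourly_histogram(posts))
print(f"quietest 8h window starts at {start:02d}:00 UTC with {total} posts")
```

A quiet window starting at 05:00 UTC is consistent with several timezones, so the result narrows candidates rather than naming one; corroborate with language, location tags, or work-hour patterns.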
Tools
- Sherlock / WhatsMyName / Maigret: username across platforms
- Osintgram: Instagram enumeration (rate-limited; may violate ToS)
- Twint / snscrape: Twitter scraping without API (fragile post-API lockdown)
- Social Analyzer: API and CLI for social profile discovery
- Social-Searcher: keyword monitoring across platforms
- Blackbird: username/email search across 500+ sites
- IntelX: indexed social content and leaked data
Note: platform APIs and terms have tightened significantly. Many classic scraping tools are in a state of perpetual repair. Always check recency.
8. 2026 Enhanced Social Media Intelligence
The social media intelligence landscape has evolved dramatically. Modern OSINT practitioners must adapt to new platforms, enhanced privacy controls, AI-generated content detection, and API restrictions. This section covers the latest developments and techniques for 2026.
TikTok Intelligence & Short-Form Video Analysis
TikTok has become a critical intelligence source, particularly for demographic research, trend analysis, and geopolitical monitoring. However, it presents unique challenges due to algorithm-driven content delivery and mobile-first design.
TikTok OSINT Techniques:
- Hashtag intelligence: tracking viral hashtags reveals emerging trends, social movements, and coordinated campaigns
- Sound/music tracking: original sounds often contain location or identity markers
- Duet/collaboration mapping: relationship analysis through video responses
- Geofence analysis: location-based content clustering
- Temporal analysis: posting patterns reveal timezone and routine information
Tools and Methods:
- TikTok Creative Center: official analytics for hashtag and trend analysis
- TikTok Ads Library: reveals promoted content and targeting demographics
- Browser automation: careful scraping with anti-detection measures
- Manual collection: screenshot and archive approach for sensitive investigations
Discord & Gaming Platform Intelligence
Discord has evolved beyond gaming into a primary communication platform for various communities, making it valuable for community mapping and real-time intelligence.
Discord OSINT Vectors:
- Server discovery: public server directories reveal community interests
- User ID decoding: Discord IDs are snowflakes that embed a creation timestamp, so an ID reveals when the account, server, or message was created
- Webhook monitoring: tracking automated posts from other platforms
- Voice channel analysis: real-time conversation monitoring (with consent/legal authorization)
- Bot interaction: custom bots can reveal user activity patterns
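Account creation timing can be read directly from a Discord ID, because each ID is a snowflake whose upper bits hold milliseconds since the Discord epoch (the first second of 2015, UTC). A minimal sketch of the decoding follows; the function name is hypothetical, and the sample IDs are constructed for illustration rather than taken from real accounts.

```python
from datetime import datetime, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z in Unix milliseconds


def snowflake_created_at(snowflake):
    """Decode the creation time embedded in a Discord snowflake ID."""
    # The low 22 bits hold worker, process, and increment fields; the rest is the timestamp.
    ms = (int(snowflake) >> 22) + DISCORD_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Constructed example: an ID minted exactly one day after the Discord epoch.
one_day_ms = 24 * 60 * 60 * 1000
example_id = one_day_ms << 22
print(example_id, snowflake_created_at(example_id).isoformat())
```

The same decoding applies to message, channel, and server IDs, which makes it useful for dating content even when the client hides timestamps.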
Operational Considerations:
- Discord has strong anti-scraping measures; prefer manual collection
- Server invites often expire or are single-use
- User privacy settings can hide activity from non-friends
Telegram Intelligence & Channel Analysis
Telegram’s channels and groups provide rich intelligence, particularly for threat research and information operations tracking.
Telegram OSINT Techniques:
- Channel subscriber analysis: public member counts and growth tracking
- Cross-platform content tracking: identifying content origins and propagation
- File metadata analysis: documents and media shared in channels
- Forward chain analysis: tracking message propagation across channels
- Bot intelligence: automated accounts reveal operational patterns
Tools and Resources:
- Telegram Web: browser-based access for collection and archival
- Telethon: Python library for programmatic access (requires API credentials)
- TGStat: public Telegram analytics platform
- Manual archival: screenshot and download approach for legal compliance
Deepfake & Synthetic Media Detection
As AI-generated content proliferates, distinguishing authentic from synthetic media becomes critical for intelligence validation.
Detection Techniques:
- Technical analysis: compression artifacts, consistency in lighting/shadows
- Behavioral analysis: micro-expressions, blinking patterns, speech patterns
- Metadata examination: generation software signatures
- Cross-reference verification: comparing against known authentic content
- AI-assisted detection: tools like Deepware Scanner, Microsoft Video Authenticator
Red Flags for Synthetic Content:
- Inconsistent lighting or shadows across the image/video
- Blurring or pixelation around face/hair boundaries
- Unnatural eye movement or blinking patterns
- Audio-visual synchronization issues
- Metadata inconsistencies or missing camera information
Anti-Detection & Privacy-Aware Collection
Modern social platforms employ sophisticated detection mechanisms. Collection methods must evolve to remain effective while respecting platform terms and legal boundaries.
Advanced Anti-Detection Techniques:
- Residential proxy rotation: avoiding data center IP detection
- Browser fingerprint randomization: Canvas, WebGL, and font fingerprint management
- Human-like timing patterns: randomized delays and interaction simulation
- Session management: cookie rotation and header consistency
- API quota management: distributed collection across multiple authenticated sessions
Platform-Specific Considerations:
| Platform | Primary Detection Methods | Recommended Approach |
|---|---|---|
| TikTok | Device fingerprinting, behavioral analysis | Mobile emulation, manual collection |
| | API rate limiting, IP blocking | Residential proxies, authenticated sessions |
| Discord | Bot detection, server monitoring | Manual interaction, webhook monitoring |
| Telegram | API restrictions, flood protection | Official client, rate-limited automation |
9. Geolocation & Imagery (GEOINT)
Determining where a photo, video, or person is located from visual evidence.
Classic technique stack
- Shadow analysis: sun angle constrains latitude and time of day (SunCalc, suncalc.org)
- Landmark identification: monuments, logos, business signage
- Language and script: signage language narrows region
- Vegetation: tree species and agriculture indicate climate zone
- Vehicle makes and license plate formats: country/region disambiguation
- Electrical plug shapes and pole construction: power grid standards vary by region
- Road markings: lane widths, stripe patterns, sign shapes (MUTCD vs. Vienna Convention)
- Architecture: roofing styles, window frames, construction materials
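The shadow analysis step rests on one piece of trigonometry: on level ground, the sun's elevation is the arctangent of an object's height over the length of its shadow. The sketch below shows only that step; the function name is hypothetical, and both measurements must be in the same unit (real or relative, such as pixels along comparable axes).

```python
import math


def sun_elevation_deg(object_height, shadow_length):
    """Estimate sun elevation in degrees from an object's height and its shadow length."""
    if object_height <= 0 or shadow_length <= 0:
        raise ValueError("height and shadow length must be positive")
    return math.degrees(math.atan2(object_height, shadow_length))


# A pole whose shadow is as long as the pole is tall implies a 45 degree sun.
print(round(sun_elevation_deg(2.0, 2.0), 1))
```

The resulting angle is then compared in SunCalc against candidate locations and dates to see which combinations produce that elevation at a plausible time of day.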
Reverse image search
Run the same image through all of these, since coverage varies wildly:
- Google Images / Lens
- Yandex Images: still the best for Russian/Eastern European and general face matching
- TinEye: best for finding originals and earliest occurrences
- Bing Visual Search
- Baidu: better for Chinese content
AI-assisted geolocation
Modern multimodal models can synthesize the classic technique stack in seconds. The Hackers Arise walkthrough demonstrates using custom GPTs like GeoGuessr GPT for first-pass geolocation: upload an image, ask where it was taken. These models do not do reverse image search; they visually reason over architectural, vegetation, and signage cues. They are often wrong on specifics but provide a valuable starting framework of observations (“the road signs suggest Cyrillic-script Eastern Europe; the utility pole style matches post-Soviet construction”).
Practitioners should treat AI guesses as hypotheses, not conclusions, and verify every claim against ground-truth imagery.
Mapping and imagery sources
- Google Maps / Earth Pro: Street View, historical imagery, 3D buildings
- Yandex Maps / Mapillary / KartaView: alternative street-level imagery, stronger coverage in some regions
- Sentinel Hub / EO Browser: free satellite imagery (Sentinel-2, Landsat)
- Planet Labs: commercial high-cadence satellite imagery
- OpenStreetMap: community mapping with extractable POI data
- Overpass Turbo: query OSM for arbitrary features (e.g. “all churches in this bounding box with a spire over 30m”)
- Wikimapia: crowd-sourced photo-annotated POI database
Video and live stream OSINT
- EarthCam / Insecam: aggregated public webcams (many unintentional)
- Windy.com: live webcams for weather
- YouTube geosearch tools: find videos shot within a geographic radius
- FlightRadar24 / ADS-B Exchange: real-time civilian aircraft tracking
- MarineTraffic / VesselFinder: real-time ship AIS data
- RailSense / similar: train tracking by region
EXIF and video metadata
Photos from phones and GPS-equipped cameras often embed GPS coordinates unless location tagging is disabled or the metadata is stripped. Most social platforms strip EXIF, but platforms that preserve it (Flickr, some forums, raw email attachments) can hand-deliver the answer. Tools: exiftool, ExifTool Online, Jeffrey's Image Metadata Viewer.
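EXIF stores GPS positions as degrees, minutes, and seconds plus a hemisphere reference, while mapping tools expect signed decimal degrees. The conversion is sketched below; the function name is hypothetical and the input values are assumed to have been read out already (for example with exiftool).

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    if ref.upper() in ("S", "W"):  # south and west are negative
        value = -value
    return value


lat = dms_to_decimal(40, 26, 46, "N")
lon = dms_to_decimal(79, 58, 56, "W")
print(f"{lat:.6f}, {lon:.6f}")
```

The printed pair can be pasted straight into Google Maps or OpenStreetMap.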
9. Breach, Leak & Paste Intel
Credentials, PII, and internal data exposed through historical breaches and paste sites are a cornerstone of offensive OSINT.
Breach lookup services
- HaveIBeenPwned: free for non-commercial use; reveals which breaches an email appears in. Also exposes pastes containing the email. Pwned Passwords lets you check whether a specific password has been seen in any breach without sending the password (k-anonymity via SHA-1 prefix).
- Dehashed: paid, searchable index of actual credential content
- IntelligenceX / IntelX: indexed breach and leak content, darknet sources
- LeakCheck / Snusbase / LeakPeek: commercial breach databases
- Breach-parse / h8mail: local tools for searching personal breach archives
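The k-anonymity scheme behind Pwned Passwords works by sending only the first five hex characters of the password's SHA-1 hash and matching the remainder locally. The sketch below shows the client-side half under stated assumptions: the helper names are hypothetical, the response is assumed to be lines of `SUFFIX:COUNT`, and the sample body is hand-written so the example runs offline.

```python
import hashlib


def split_hash(password):
    """Return (prefix, suffix): the first 5 hex chars of the SHA-1 and the remaining 35."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def range_url(prefix):
    """URL of the range endpoint; only the 5-character prefix ever leaves the machine."""
    return f"https://api.pwnedpasswords.com/range/{prefix}"


def count_in_response(suffix, body):
    """Look for our suffix in a SUFFIX:COUNT response body; 0 means not found."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


prefix, suffix = split_hash("password")
# Hand-written sample body; the counts are made up for illustration.
sample_body = f"0018A45C4D1DEF81644B54AB7F969B88D65:3\n{suffix}:42\n"
print(range_url(prefix), count_in_response(suffix, sample_body))
```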
Operational notes
- HIBP tells you an email was in Collection #1, but not the password. Commercial services provide the cleartext, if ethically acceptable for your engagement.
- “Sensitive” breach flags (Ashley Madison, etc.) require judgment: referencing them in a client deliverable is frequently inappropriate even when technically accurate.
- Breach data ages: a password from 2013 is probably not current, but hints at password patterns and reveals services the user has engaged with.
- Pastes live and die quickly. If a paste URL 404s, check Google cache and Wayback Machine immediately.
Paste sites and dumps
- Pastebin: classic source, still productive
- Ghostbin / Hastebin / Rentry / Privatebin: newer alternatives
- GitHub Gist: frequently overlooked; indexed by Google (`site:gist.github.com`)
- Telegram channels: many dump channels operate exclusively on Telegram
- Darknet forums: BreachForums, XSS, Exploit; require careful opsec
10. Metadata Extraction
Documents, images, and files published by a target frequently leak internal usernames, software versions, file paths, and timestamps.
Document metadata
- exiftool: the canonical CLI tool; handles EXIF, XMP, IPTC, PDF metadata, Office documents
- FOCA (Fingerprinting Organizations with Collected Archives): downloads documents from a target domain, extracts metadata in bulk, builds org charts from author fields
- metagoofil: FOCA-alike in Python, uses Google/Bing to find documents by filetype on a target domain
- PDFiD / peepdf: PDF internals inspection
- oletools: OLE/Office document internals
- mat2: metadata anonymization tool; useful for understanding what it strips and therefore what is leaked
Google dorks for document hunts
site:example.com filetype:pdf
site:example.com filetype:xlsx
site:example.com filetype:docx
site:example.com ext:doc OR ext:docx OR ext:xls OR ext:xlsx
site:example.com "for internal use only"
What metadata reveals
| Field | Leak |
|---|---|
| Author | Internal username (often the domain login) |
| Creation software | Microsoft Office 2016, LibreOffice 7.4 → software inventory |
| Last modified by | Another internal user |
| Printer | Printer model and possibly IP |
| Revision history | Earlier drafts, collaborators |
| Embedded images | Secondary EXIF data |
| Hyperlinks | Internal SharePoint/intranet URLs |
| File paths | C:\Users\jdoe\Documents\... reveals username |
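The file-path leak in the last row is easy to harvest in bulk once metadata has been dumped to text. A minimal sketch follows, assuming the metadata values are already available as strings; the function name and the regular expression are hypothetical and cover only the common Windows, Linux, and macOS home-directory layouts.

```python
import re

# Matches C:\Users\<name>\, /home/<name>/ and /Users/<name>/ style paths.
USER_PATH = re.compile(r"(?:[A-Za-z]:\\Users\\|/home/|/Users/)([^\\/]+)")


def usernames_from_paths(values):
    """Collect the unique account names that appear in home-directory paths."""
    found = set()
    for value in values:
        for match in USER_PATH.finditer(value):
            found.add(match.group(1))
    return sorted(found)


metadata_values = [
    r"C:\Users\jdoe\Documents\report.docx",
    "/home/asmith/drafts/q3.pdf",
    "Microsoft Office 2016",
]
print(usernames_from_paths(metadata_values))
```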
11. Code & Repository OSINT
Source code hosting platforms are a gold mine. Every commit is a historical record, and secrets leak constantly.
GitHub search techniques
Surface leaked credentials and sensitive content:
org:acmecorp password
org:acmecorp apikey
"@acmecorp.com" password
filename:.env acmecorp
filename:config.yml acmecorp
"BEGIN RSA PRIVATE KEY" acmecorp
extension:sql acmecorp INSERT INTO users
Note that GitHub’s secret scanning reports many leaked tokens to their issuers, who often revoke them automatically, so old dumps may hold stale credentials. They remain useful for mapping the services in use.
Tools
- gitleaks: scans repos and Git history for secret patterns
- trufflehog: entropy-based secret detection, supports GitHub org scanning
- git-secrets: AWS Labs tool; primarily for preventing commits but usable for audit
- gitrob: catalogs secrets across an organization’s public repos
- github-dorks / gh-dork: curated dork lists
- GitHound / GitMiner: deep search across public GitHub
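Entropy-based detection, as mentioned for trufflehog above, flags strings whose characters are too evenly distributed to be ordinary words. The sketch below illustrates the idea only and is not how any particular tool is implemented: the function names, the 20-character minimum, and the 4.0 bits-per-character threshold are all illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of a string in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def high_entropy_tokens(text, min_length=20, threshold=4.0):
    """Return whitespace-separated tokens that are long and random-looking."""
    return [
        token for token in text.split()
        if len(token) >= min_length and shannon_entropy(token) >= threshold
    ]


# Made-up strings: a random-looking token next to a long but repetitive one.
line = "key aB3dE5gH7jK9mN1pQ2rS4t password=aaaaaaaaaaaaaaaaaaaaaaaa"
print(high_entropy_tokens(line))
```

Entropy alone produces false positives on hashes, UUIDs, and minified code, which is why pattern-based scanners such as gitleaks are usually run alongside it.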
Pivoting from a single repo
- Commit metadata: author email, name, timestamps (working hours)
- `.github/CODEOWNERS`: team structure
- Issue comments: internal tool names, vendors, ticket systems
- PR reviewers: collaboration networks
- Starred/forked repos: interests, technology exposure
- GitHub Pages: hosted sites under `<user>.github.io` often have separate content
Beyond GitHub
- GitLab.com: same techniques, smaller dork coverage
- Bitbucket: less searchable but still scannable
- Self-hosted instances: Gitea, Forgejo, cgit; find via Shodan (`http.title:"Gitea"`)
- DockerHub: images often ship with embedded secrets or leaked file paths
- npm / PyPI / crates.io: package authors, private package mentions in public packages
12. Dark Web & Threat Intel
Aggregators and commercial platforms fold darknet content, malware telemetry, and threat actor intelligence into the OSINT pipeline.
Platforms
- Intel 471: cybercriminal forum and actor intelligence
- Recorded Future: broad threat intel with OSINT and closed-source blend
- CloudSEK (XVigil): external threat monitoring, brand exposure, dark web
- Flashpoint: illicit community monitoring
- DarkOwl: darknet content search
- ShadowDragon (SocialNet, etc.): investigative toolkits with 200+ data sources integrated
- ZeroFox: brand protection, social and dark web
- Digital Shadows / ReliaQuest: digital risk protection
- Maltego + Transform Hub: glue for integrating many of the above
Threat intel feeds
- MISP: open-source threat intelligence sharing platform
- AlienVault OTX: free community threat exchange
- abuse.ch (URLhaus, MalwareBazaar, ThreatFox, Feodo Tracker): free high-quality IoC feeds
- VirusTotal Intelligence: paid search over submitted samples, URLs, domains
- GreyNoise: distinguishes targeted scans from internet background noise
13. IoT & Device Discovery
Specialized search for internet-connected devices and sensors, from industrial control systems to smart home devices.
- Shodan: still the best for ICS/SCADA (port:502, port:102, category:ics)
- Censys: complementary coverage
- ZoomEye: strong APAC IoT coverage
- Thingful: the “search engine for the Internet of Things”; aggregates public IoT sensor data (air quality, weather, energy, transport) across millions of devices globally, suitable for environmental research and urban analytics
- Kamerka: geolocation-focused ICS/IoT scanner using Shodan/BinaryEdge data
- Insecam: lists public webcams (many with default credentials)
These tools are powerful for researchers mapping exposure and for defenders cataloging their own attack surface. They are equally abused by attackers, so defenders should track their own presence in them.
24. Tools Reference
A consolidated lookup of the tools practitioners reach for. The overlap between “OSINT tool” and “recon tool” is large; most of these appear repeatedly in the source surveys.
Frameworks and aggregators
| Tool | Purpose |
|---|---|
| Maltego | Graph-based link analysis, Transform Hub with 70+ data sources, the standard for investigations that must produce a visual link chart |
| SpiderFoot | Automated OSINT framework, 200+ modules, web UI, runs scheduled scans, correlates findings |
| recon-ng | Framework with Metasploit-style module system for recon workflows |
| theHarvester | Email, subdomain, employee name enumeration from search engines and PGP servers |
| OSINT Framework (osintframework.com) | Curated web directory of tools by category; not a scanner, but the best starting map of the ecosystem |
| IntelTechniques (OSINT Techniques) | Michael Bazzell’s methodology and tool collection |
Maltego
- Model: graph of entities (Person, Domain, IP, Email, etc.) connected by relationships. Transforms run against an entity to produce related entities.
- Data sources: the Transform Hub integrates DomainTools, Shodan, Pipl, OpenCorporates, Censys, Have I Been Pwned, Vetric, Netlas, IBM Watson, and many more. Many are paid.
- Use cases: person of interest investigations, corporate link analysis, threat actor attribution, fraud networks.
- Typical workflow: seed with names, domains, or emails → run passive Transforms → pivot on interesting results → prune noise → export as report or visual graph. A complete person investigation can move from a name to Wikipedia to personal website to historical WHOIS to personal email to person profile (age, relatives) in a handful of Transform runs.
SpiderFoot
- Model: modular scanner with 200+ modules, each tapping a specific data source. Configure the target and scan profile, run, review.
- Data sources: Shodan, VirusTotal, HIBP, SecurityTrails, HackerTarget, crt.sh, Censys, IntelX, and many more (some require API keys).
- Use cases: baseline external exposure audit, continuous monitoring, bug bounty asset discovery, threat investigation.
- Strengths: fire-and-forget automation, depth of coverage, built-in correlation rules that highlight interesting findings across modules.
theHarvester
- Model: CLI tool that queries search engines, DNS sources, and PGP key servers for emails, subdomains, IPs, and employee names.
- Sources: Google, Bing, DuckDuckGo, LinkedIn, Baidu, crt.sh, Shodan, Censys, and many more.
- Typical invocation: theHarvester -d example.com -l 500 -b all
- Strengths: simple, scriptable, pairs well with automation pipelines.
recon-ng
- Model: Metasploit-style framework (workspaces, modules, options). Modules fetch specific data types into a workspace database.
- Strengths: good persistence of results across sessions, scriptable, reasonable module coverage for core recon tasks.
- Typical flow: workspaces create acme → add seed domains → run recon/domains-hosts/* modules → export.
Sherlock
- Purpose: username enumeration across 300+ social sites.
- Typical invocation: python3 sherlock jdoe
- Strengths: fast, easy, no API keys. Good for alias discovery.
- Caveats: false positives on generic 200 responses; validate manually.
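The false-positive caveat can be made concrete with a small triage helper. This is a sketch under assumptions, not how Sherlock itself decides: the function name `classify_profile_hit`, the marker strings, and the verdict labels are all hypothetical, and the caller is expected to supply the HTTP status and body it has already fetched.

```python
def classify_profile_hit(
    status: int,
    body: str,
    username: str,
    not_found_markers: tuple[str, ...] = ("not found", "doesn't exist"),
) -> str:
    """Triage one username lookup result into a rough verdict."""
    if status == 404:
        return "absent"
    if status != 200:
        return "unknown"  # redirects, rate limits, and blocks need a retry
    lowered = body.lower()
    # Many sites answer 200 with a "no such user" page (a soft 404).
    if any(marker in lowered for marker in not_found_markers):
        return "absent"
    # The username appearing in the page is weak evidence the profile exists.
    if username.lower() in lowered:
        return "likely"
    return "unverified"


if __name__ == "__main__":
    print(classify_profile_hit(200, "<h1>User not found</h1>", "jdoe"))
    print(classify_profile_hit(200, "<title>jdoe (@jdoe)</title>", "jdoe"))
```

Anything other than "absent" still deserves a manual look; the point is only to sort the queue so generic 200 pages do not crowd out real hits.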
Shodan
- Purpose: search engine over internet-connected service banners. Queries scan data, not live services.
- Filters: port:, product:, version:, org:, hostname:, country:, vuln:, category:, ssl:, http.title:, http.html:
- CLI: shodan host 1.2.3.4, shodan search 'apache country:US', shodan download, shodan parse
- Best for: attack surface snapshots, finding forgotten assets, identifying vulnerable software at scale.
Censys
- Purpose: internet-wide scan data with particular strength in TLS certificates and precise field queries.
- Query language: Lucene-style with parsed fields, e.g. services.service_name: "HTTP" and parsed.names: example.com
- Strengths: certificate history, subdomain discovery via cert parsed names, strong API.
- Free tier: 250 web searches/month; API access requires a paid plan.
Specialized tools referenced across the surveys
| Category | Tools |
|---|---|
| Subdomain enum | Subfinder, Amass, Assetfinder, chaos-client, Findomain, Sublist3r |
| HTTP probing | httpx, aquatone, gowitness, EyeWitness |
| URL discovery | waybackurls, gau, katana, hakrawler, gospider |
| Port scanning | Nmap, Masscan, RustScan, naabu |
| Content discovery | ffuf, gobuster, feroxbuster, dirsearch |
| Email hunting | Hunter.io, theHarvester, phonebook.cz, Clearbit, Skymem |
| Username hunting | Sherlock, WhatsMyName, Maigret, Namechk, Holehe |
| Image search | Google Lens, Yandex, TinEye, PimEyes |
| Metadata | exiftool, FOCA, metagoofil, mat2 |
| Phone | PhoneInfoga |
| Breach | HaveIBeenPwned, Dehashed, h8mail, IntelX |
| Geolocation | SunCalc, Overpass Turbo, Mapillary, GeoGuessr GPT |
| Visualization | Maltego, Gephi, yEd |
| IoT | Shodan, Censys, Thingful, Kamerka |
| Dark web | IntelX, DarkOwl, Ahmia |
| Continuous monitoring | SpiderFoot, Recon-ng, custom crons, ShadowDragon |
Commercial platforms
The surveys repeatedly reference a cluster of commercial OSINT/threat intel platforms for enterprise use: Maltego, ShadowDragon, Recorded Future, Intel 471, Flashpoint, CloudSEK XVigil, ZeroFox, DarkOwl, SpiderFoot HX, Babel Street, Dataminr, Palantir Gotham. These bundle data access, analyst tooling, and curated feeds at cost. Free alternatives exist for most individual capabilities; the commercial value is integration, freshness, and support.
15. Automation & Visualization
Manual OSINT is unsustainable past a few targets. Automation and visualization amplify the analyst.
Automation patterns
- Scripts orchestrating free tools: a shell script that runs subfinder → httpx → nuclei → Slack notify gives continuous monitoring on a cron
- Recon-ng workspaces: persistent state across sessions
- SpiderFoot scans: scheduled or triggered by webhook
- Custom Python pipelines: requests, beautifulsoup, platform APIs, networkx for graphs
- Jupyter notebooks: for exploratory analysis with inline visualization
One practitioner-authored pipeline (ODIN) strings together WHOIS, reverse WHOIS, subdomain discovery, DNS records, Shodan, RDAP, email harvesting, breach lookups, paste searches, and bucket hunting into a single run against a target name and primary domain, producing a structured report. The underlying techniques are the ones in this guide; the automation just glues them together.
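The "glue" idea can be sketched in a few lines. This is not ODIN's code; it is a generic stage runner that assumes each tool reads targets on stdin and writes one result per line on stdout. The names `run_tool` and `run_pipeline` are hypothetical, and the runner is injectable so the chain can be exercised without the tools installed.

```python
import shlex
import subprocess
from typing import Callable

# A runner takes an argv list plus stdin text and returns the tool's stdout.
Runner = Callable[[list[str], str], str]


def run_tool(argv: list[str], stdin_text: str) -> str:
    """Run one CLI stage, feeding it text on stdin and returning stdout."""
    done = subprocess.run(
        argv, input=stdin_text, capture_output=True, text=True, check=True
    )
    return done.stdout


def run_pipeline(stages: list[str], seed: str, runner: Runner = run_tool) -> list[str]:
    """Pipe the seed through each stage in order; return unique output lines."""
    data = seed
    for stage in stages:
        data = runner(shlex.split(stage), data)
    return sorted({line.strip() for line in data.splitlines() if line.strip()})


if __name__ == "__main__":
    # Stand-in runner so the example runs without any recon tools installed.
    def echo_runner(argv: list[str], text: str) -> str:
        return text + f"{argv[0]}.example.com\n"

    print(run_pipeline(["stage-one -silent", "stage-two"], "example.com\n", echo_runner))
```

With the real runner, a call such as `run_pipeline(["subfinder -silent", "httpx -silent"], "example.com\n")` would chain the two tools, provided both are installed and behave as assumed above.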
Visualization
Human eyes excel at spotting patterns in graphs that are invisible in tables.
- Maltego: the reference tool for investigative link analysis
- Gephi: open-source network visualization for large graphs
- yEd: free diagramming with auto-layout for medium graphs
- Neo4j: graph database for queryable link analysis at scale
- D3.js / vis.js / cytoscape.js: web-based custom visualizations
- Kibana / Grafana: dashboards for continuous OSINT feeds
16. Cloud & Modern Infrastructure Intelligence
Modern infrastructure spans multiple cloud providers, microservices architectures, and API ecosystems. Traditional network reconnaissance must evolve to address containerized applications, serverless functions, and cloud-native technologies.
Cloud Asset Discovery 2026
Multi-Cloud Enumeration:
- AWS: S3 buckets, CloudFront distributions, API Gateway endpoints, Lambda function URLs
- Azure: Blob storage, Azure Functions, Application Gateway, API Management
- Google Cloud: Cloud Storage buckets, Cloud Functions, Cloud Run services, API Gateway
- Specialized clouds: DigitalOcean Spaces, Vultr Object Storage, Linode buckets
Advanced Cloud OSINT Techniques:
- DNS CNAME analysis: cloud service providers reveal architecture through DNS records
- TLS certificate enumeration: cloud load balancer certificates expose internal service names
- API endpoint discovery: GraphQL introspection, REST API documentation leaks
- Container registry scanning: public Docker Hub, Quay.io, ECR repositories
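DNS CNAME analysis, for instance, often comes down to suffix matching on the record target. The sketch below uses a small, non-exhaustive suffix table; the name `classify_cname` and the sample hostnames are hypothetical, and regional or newer endpoint formats would need extra entries.

```python
from typing import Optional

# Illustrative, non-exhaustive map of CNAME target suffixes to cloud services.
CLOUD_SUFFIXES = {
    ".s3.amazonaws.com": "AWS S3",
    ".cloudfront.net": "AWS CloudFront",
    ".elb.amazonaws.com": "AWS Elastic Load Balancing",
    ".blob.core.windows.net": "Azure Blob Storage",
    ".azurewebsites.net": "Azure App Service",
    ".storage.googleapis.com": "Google Cloud Storage",
    ".run.app": "Google Cloud Run",
}


def classify_cname(target: str) -> Optional[str]:
    """Return the cloud service a CNAME target points at, if recognized."""
    host = target.rstrip(".").lower()  # drop the trailing root dot from DNS output
    for suffix, service in CLOUD_SUFFIXES.items():
        if host.endswith(suffix):
            return service
    return None


if __name__ == "__main__":
    for target in ("d111111abcdef8.cloudfront.net.", "assets.example.com"):
        print(target, "->", classify_cname(target))
```

Running this over every CNAME found during subdomain enumeration gives a quick per-provider inventory.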
Tools for Cloud Intelligence:
- cloud_enum: multi-cloud asset discovery tool
- CloudMapper: AWS security analysis and visualization
- ScoutSuite: multi-cloud security auditing
- Pacu: AWS exploitation framework (for authorized testing)
- Prowler: cloud security best practices scanner
API & Microservices Intelligence
Modern applications expose intelligence through API endpoints, often with insufficient access controls.
API Discovery Methods:
- Documentation leaks: Swagger/OpenAPI specs in public repositories
- GraphQL introspection: enabled by default in many implementations
- REST API enumeration: common endpoints, version discovery
- Webhook analysis: third-party integrations reveal internal architecture
- Mobile app reverse engineering: APK/IPA analysis for API endpoints
API Testing for OSINT:
## GraphQL introspection
curl -X POST -H "Content-Type: application/json" \
-d '{"query":"query IntrospectionQuery { __schema { types { name } } }"}' \
https://target.com/graphql
## REST API discovery
ffuf -w api-wordlist.txt -u https://target.com/api/FUZZ
## Swagger documentation discovery
curl https://target.com/swagger.json
curl https://target.com/v1/swagger.json
curl https://target.com/api/docs
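Once an introspection response has been saved, the type names can be pulled out with a few lines of Python. This sketch assumes the response has the shape returned by the query above (`data.__schema.types[].name`); the function name `schema_type_names` is hypothetical.

```python
import json


def schema_type_names(response_text: str) -> list[str]:
    """List application-defined type names from a GraphQL introspection response."""
    payload = json.loads(response_text)
    # A refused introspection query usually returns "errors" and no "data".
    schema = (payload.get("data") or {}).get("__schema") or {}
    types = schema.get("types") or []
    # Names starting with "__" are GraphQL's own introspection types.
    return sorted(t["name"] for t in types if not t["name"].startswith("__"))


if __name__ == "__main__":
    sample = '{"data":{"__schema":{"types":[{"name":"Query"},{"name":"User"},{"name":"__Type"}]}}}'
    print(schema_type_names(sample))
```

Type names such as `InternalUser` or `AdminMutation` are often the first hint at functionality that the public documentation does not mention.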
Container & DevOps Intelligence
Containerized applications leave traces across registries, orchestration platforms, and CI/CD systems.
Container Registry Analysis:
- Docker Hub: public repositories reveal internal project names and configurations
- Quay.io: Red Hat’s container registry
- GitHub Container Registry: packages linked to repositories
- ECR/ACR/GCR: cloud-specific registries sometimes publicly accessible
DevOps Pipeline Intelligence:
- CI/CD artifacts: build logs, deployment scripts, environment variables
- Infrastructure as Code: Terraform, CloudFormation, Kubernetes manifests
- Secret management: HashiCorp Vault, AWS Secrets Manager exposure
- Monitoring endpoints: Prometheus, Grafana, ELK stack dashboards
Kubernetes OSINT:
- Service discovery: DNS enumeration for cluster services
- Ingress analysis: public-facing service mapping
- ConfigMap/Secret enumeration: exposed configuration data
- Pod security context analysis: privilege escalation vectors
17. Blockchain & Financial Intelligence 2026
Cryptocurrency and blockchain technology have matured into critical intelligence domains. Modern financial investigations require understanding of DeFi protocols, NFT ecosystems, and privacy-preserving cryptocurrencies.
Blockchain Analysis Fundamentals
Core Concepts:
- Address clustering: grouping addresses controlled by the same entity
- Transaction flow analysis: following funds through multiple transactions
- Exchange attribution: identifying centralized exchange addresses
- Mixing service detection: identifying privacy-enhancing transaction patterns
- Smart contract analysis: understanding DeFi protocol interactions
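Address clustering is commonly introduced through the common-input-ownership heuristic: addresses spent together as inputs of one transaction are assumed to share an owner. The sketch below applies that heuristic with a union-find structure. It is a teaching example, not how any commercial platform works; the name `cluster_addresses` and the single-letter addresses are hypothetical, and CoinJoin transactions deliberately break the assumption.

```python
def cluster_addresses(transactions: list[list[str]]) -> list[set[str]]:
    """Group addresses that ever appear together as inputs of a transaction."""
    parent: dict[str, str] = {}

    def find(addr: str) -> str:
        """Return the cluster representative, shortening the path on the way."""
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    for inputs in transactions:
        if not inputs:
            continue
        root = find(inputs[0])
        for addr in inputs[1:]:
            parent[find(addr)] = root  # merge this address's cluster into root

    clusters: dict[str, set[str]] = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())


if __name__ == "__main__":
    # Each inner list is the set of input addresses of one transaction.
    txs = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
    print(sorted(sorted(c) for c in cluster_addresses(txs)))
```

Because B co-spends with both A and C, all three end up in one cluster even though A and C never appear in the same transaction.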
Major Blockchain Analysis Platforms:
- Chainalysis: professional blockchain analytics (law enforcement/compliance focus)
- Elliptic: crypto compliance and investigation platform
- CipherTrace: cryptocurrency AML and investigation tools
- Crystal Blockchain: transaction monitoring and investigation
- OXT: Bitcoin transaction analysis and privacy research
DeFi & Smart Contract Intelligence
Decentralized Finance (DeFi) protocols create complex financial relationships visible on-chain but requiring specialized analysis.
DeFi Investigation Techniques:
- Liquidity pool analysis: tracking funds in Uniswap, SushiSwap, Curve pools
- Yield farming tracking: following assets through lending protocols
- DAO governance participation: voting patterns reveal stakeholder relationships
- Flash loan analysis: identifying sophisticated financial attacks
- Cross-chain bridge monitoring: tracking assets between blockchains
Smart Contract OSINT:
// Contract verification on Etherscan reveals:
// - Source code and comments
// - Constructor parameters
// - Transaction history
// - Token transfers and interactions
Tools for DeFi Analysis:
- Dune Analytics: blockchain data queries and dashboards
- Nansen: on-chain analytics with entity labeling
- DeBank: DeFi portfolio and protocol tracking
- Zerion: DeFi asset tracking across protocols
- Token Sniffer: smart contract security analysis
Privacy Coin & Mixer Analysis
Privacy-focused cryptocurrencies and mixing services require specialized investigation techniques.
Privacy Coin Challenges:
- Monero (XMR): ring signatures, stealth addresses, RingCT
- Zcash (ZEC): zk-SNARKs, shielded transactions
- Dash: CoinJoin implementation, masternodes
- Beam/Grin: Mimblewimble protocol
Mixer & Tumbler Analysis:
- Bitcoin mixers: CoinJoin, Wasabi Wallet, Samourai Whirlpool
- Ethereum mixers: Tornado Cash, Aztec Protocol
- Cross-chain mixers: THORChain, Secret Network bridges
- Pattern analysis: timing, amounts, address reuse
Investigation Techniques:
- Input/output analysis: correlating mixer inputs with outputs
- Timing correlation: deposit/withdrawal pattern analysis
- Amount correlation: unique transaction amounts through mixers
- Change address analysis: non-mixed outputs reveal identity
- Network analysis: IP addresses, Tor exit nodes
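Timing and amount correlation can be combined into one candidate-matching pass. The sketch assumes amounts are integers in the chain's smallest unit and that the service deducts at most a small percentage fee. The `Transfer` type, the `correlate` function, and both thresholds are hypothetical defaults. Fixed-denomination mixers defeat the amount check by design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    txid: str
    amount: int      # smallest unit (e.g. satoshi) to avoid float rounding
    timestamp: int   # Unix seconds


def correlate(
    deposits: list[Transfer],
    withdrawals: list[Transfer],
    fee_tolerance: float = 0.01,
    max_delay: int = 86_400,
) -> list[tuple[str, str, int]]:
    """Pair deposits with later withdrawals of nearly the same amount."""
    candidates = []
    for dep in deposits:
        for wd in withdrawals:
            delay = wd.timestamp - dep.timestamp
            if not 0 < delay <= max_delay:
                continue  # withdrawal must follow the deposit within the window
            shortfall = dep.amount - wd.amount
            if 0 <= shortfall <= dep.amount * fee_tolerance:
                candidates.append((dep.txid, wd.txid, delay))
    # Shortest delay first: the tightest timing match is reviewed first.
    return sorted(candidates, key=lambda c: c[2])


if __name__ == "__main__":
    deposits = [Transfer("d1", 123_450_000, 1_000)]
    withdrawals = [
        Transfer("w1", 123_000_000, 5_000),
        Transfer("w2", 50_000_000, 6_000),
    ]
    print(correlate(deposits, withdrawals))
```

Every pair returned is a lead to corroborate with other evidence, since unrelated users can coincidentally move similar amounts in the same window.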
Cryptocurrency Threat Intelligence
Blockchain data provides unique intelligence for cybercrime investigation and threat actor tracking.
Ransomware Payment Tracking:
- Payment address monitoring: tracking ransom payments to known groups
- Infrastructure correlation: linking payment addresses to command & control
- Attribution through payments: identifying affiliates and infrastructure providers
- Recovery operations: coordinating with exchanges for asset freezing
Scam & Fraud Detection:
- Ponzi scheme patterns: payment structures reveal fraudulent operations
- Exit scam prediction: liquidity drainage patterns in DeFi protocols
- Phishing address monitoring: detecting credential theft operations
- Social engineering campaigns: crypto-based advance fee fraud
18. AI-Assisted OSINT 2026
AI integration has fundamentally transformed OSINT workflows in 2026. Modern practitioners leverage large language models, computer vision, and automated analysis pipelines to process vast amounts of data while maintaining human verification standards.
Where AI helps
- Image analysis: a multimodal model can enumerate visible clues (signage, architecture, vegetation, vehicles) and propose geolocations in seconds
- Document summarization: long PDFs, financial filings, court documents
- Translation and transliteration: foreign-language sources at scale
- Link extraction: pulling structured entities (names, dates, orgs) from unstructured text
- Writing style analysis: comparing two corpora for likely authorship
- Code understanding: interpreting obfuscated JS, reverse engineering APIs
- Query generation: proposing Google dorks, Shodan filters, or Censys queries from natural-language intent
Where AI fails
- Hallucinated facts: models confidently fabricate names, dates, and attributions
- Stale training data: nothing past the cutoff
- Confirmation bias: will happily pretend to “find” what you ask for
- Source attribution: outputs typically lack provenance
The rule: AI outputs are hypotheses. Every claim must be independently verified against a primary source before it enters a deliverable.
Specific tools and workflows
- GeoGuessr GPT and similar custom GPTs: image geolocation first-pass
- ChatGPT / Claude with vision: general image and document analysis
- Recon agents: emerging autonomous agents that chain passive recon tools (early stage; reliability is poor)
- AI-powered dark web monitoring: vendors offering semantic search over crawled forum content
- AI entity extraction (IBM Watson NLU, spaCy, transformer-based NER): scalable entity extraction from corpora
2026 AI Integration Advances
Multimodal Analysis Workflows:
- Image geolocation with custom GPTs: tools like GeoGuessr GPT synthesize architectural, vegetation, and signage cues to propose locations
- Video content analysis: frame-by-frame analysis for facial recognition, scene understanding, and temporal pattern detection
- Audio processing: voice identification, accent analysis, background noise geolocation
- Document intelligence: automated extraction of entities, relationships, and anomalies from large document corpora
AI-Powered Correlation Engines:
- Cross-platform entity linking: automated identification of the same person across multiple social platforms
- Behavioral pattern recognition: identifying sockpuppet accounts through writing style and interaction patterns
- Network analysis: AI-driven identification of influence networks and coordination patterns
- Threat actor attribution: correlating tactics, techniques, and procedures across campaigns
Commercial AI-OSINT Platforms (2026):
- Maltego AI Transforms: natural language queries converted to graph operations
- SpiderFoot AI: automated correlation and anomaly detection across 300+ data sources
- ShadowDragon AI: behavioral analysis and entity resolution across social platforms
- Recorded Future NLP: threat intelligence extraction from unstructured sources
- Intel 471 AI: dark web content analysis and threat actor tracking
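Emerging AI-OSINT Tools (the command names and flags below illustrate the shape of this tool category; verify that a given tool exists before relying on it):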
## AI-powered subdomain discovery
chaos-recon -d example.com --ai-analysis
## LLM-assisted Google dorking
osint-gpt "find exposed documents for [company]"
## AI image analysis for GEOINT
geolocation-ai image.jpg --confidence-threshold 0.8
## Automated social media correlation
socmint-correlator --target "john_doe" --platforms all --ai-clustering
AI Ethics & Verification Standards
Verification Protocols:
- Primary source confirmation: every AI-generated lead must be verified against original sources
- Confidence scoring: assign reliability scores to AI outputs (1-10 scale)
- Human-in-the-loop validation: critical decisions require human analyst approval
- Audit trails: document AI tool usage and decision points for legal proceedings
- Bias awareness: understand training data limitations and cultural biases
AI Failure Modes in OSINT:
- Hallucination validation: cross-reference AI claims with multiple independent sources
- Temporal accuracy: verify information currency, especially for rapidly changing situations
- Cultural context: AI may misinterpret region-specific social cues and communication patterns
- Privacy boundary confusion: AI may not distinguish between public and private information appropriately
19. Anti-Detection & Privacy Evasion
Modern targets employ sophisticated counter-surveillance measures. OSINT practitioners must understand both detection mechanisms and evasion techniques to maintain operational security while gathering intelligence.
Attribution Avoidance 2026
Advanced Browser Fingerprinting Defenses:
- Canvas fingerprint randomization: tools like FingerprintSwitcher alter canvas rendering
- WebGL spoofing: GPU fingerprint modification to avoid device identification
- Font enumeration protection: limiting font list exposure to reduce uniqueness
- Screen resolution spoofing: randomizing reported screen dimensions
- Timezone manipulation: masking location through timezone randomization
Network-Level Anti-Detection:
- Residential proxy networks: services like Bright Data, Oxylabs for legitimate IP rotation
- Mobile carrier proxies: 4G/5G connections to simulate mobile device access
- Tor with additional layers: VPN → Tor → VPN configurations for maximum anonymity
- DNS over HTTPS/TLS: encrypted DNS to prevent ISP monitoring
- Traffic pattern normalization: human-like timing and interaction patterns
Platform-Specific Evasion:
| Platform | Detection Method | Evasion Technique |
|---|---|---|
| | Login pattern analysis | Gradual engagement, authentic session times |
| | Device fingerprinting | Mobile app simulation, varied access patterns |
| | API rate limiting | Multiple authenticated accounts, distributed collection |
| Twitter/X | Behavioral analysis | Human-like interaction patterns, content engagement |
| TikTok | Device binding | Mobile emulation, app store download simulation |
Counter-Intelligence Awareness
Indicators of Target Awareness:
- Honeypot content: deliberately planted false information to detect collection
- Access pattern changes: sudden privacy setting modifications across platforms
- Canary tokens: embedded tracking pixels in documents or profiles
- Legal threats: cease and desist letters indicating detected investigation
- Technical countermeasures: IP blocking, CAPTCHA implementation, rate limiting
Operational Security Failures:
- Account linking: using same recovery email across sock puppet accounts
- Timing correlation: consistent daily access patterns revealing timezone
- Payment attribution: subscription payments linking to real identity
- Social graph exposure: accidentally connecting sock puppet to real social network
- Metadata leakage: device fingerprints, location data in uploaded content
20. Continuous Monitoring & Threat Hunting
Passive one-time collection has evolved into continuous intelligence operations. Modern OSINT practitioners implement persistent monitoring systems that detect changes and emerging threats automatically.
Automated Collection Pipelines
Infrastructure Monitoring:
- Subdomain discovery automation: daily runs of subfinder, amass, and certificate transparency monitoring
- Port scan automation: scheduled Nmap/Masscan against discovered assets
- Web application monitoring: httpx probing with screenshot capture for visual changes
- DNS monitoring: tracking record changes, new subdomains, certificate updates
- Cloud asset monitoring: S3 bucket enumeration, cloud storage discovery
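The core of any of these monitors is a diff between the previous run and the current one. A minimal sketch, assuming each run produces a newline-separated host list; the names `load_hosts` and `diff_snapshots` and the sample hosts are hypothetical.

```python
def load_hosts(text: str) -> set[str]:
    """Normalize a newline-separated host list (case, trailing dots, blanks)."""
    return {
        line.strip().lower().rstrip(".")
        for line in text.splitlines()
        if line.strip()
    }


def diff_snapshots(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Report hosts that appeared or disappeared between two runs."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


if __name__ == "__main__":
    yesterday = load_hosts("www.example.com\nvpn.example.com\n")
    today = load_hosts("WWW.example.com.\nstaging.example.com\n")
    print(diff_snapshots(yesterday, today))
```

Alerting only on the "added" list keeps daily noise low, while the "removed" list is worth a weekly review for decommissioned assets whose DNS records may be left dangling.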
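Social Media Monitoring Frameworks (the commands below sketch the shape of such monitors; the tool names and flags are illustrative, so substitute whatever your stack provides):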
## Automated social media monitoring
socialscan-monitor --target "company_name" \
--platforms twitter,linkedin,instagram,tiktok \
--keywords "data breach,security incident,insider threat" \
--alert-webhook https://alerts.company.com/webhook
## Continuous username monitoring
sherlock-monitor --usernames user_list.txt \
--new-platforms-only \
--notification slack://webhook_url
## Brand mention tracking
mention-tracker --brand "AcmeCorp" \
--sentiment-analysis \
--geographic-clustering \
--alert-threshold negative
Dark Web & Breach Monitoring:
- Paste site monitoring: automated scanning of Pastebin, Ghostbin, Hastebin
- Dark web forum tracking: monitoring threat actor forums for organization mentions
- Credential monitoring: automated breach database queries for employee emails
- Ransomware tracking: monitoring leak sites for organization data
- Marketplace surveillance: tracking sale of organizational data or access
Threat Intelligence Integration
MISP Integration for OSINT:
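## OSINT → MISP integration example (PyMISP object API)
from pymisp import MISPEvent, PyMISP

def create_osint_event(misp_url, misp_key, domain, findings):
    """Create one MISP event holding the subdomains found for a domain."""
    misp = PyMISP(misp_url, misp_key, ssl=True)
    event = MISPEvent()
    event.info = f"OSINT findings for {domain}"
    event.distribution = 0      # 0 = Your organisation only
    event.threat_level_id = 2   # 2 = Medium
    event.analysis = 0          # 0 = Initial
    ## Add discovered subdomains as attributes
    for subdomain in findings.get("subdomains", []):
        event.add_attribute(
            type="hostname",
            value=subdomain,
            comment="Discovered via automated OSINT pipeline",
        )
    ## Push the event and its attributes to the server in one request
    return misp.add_event(event, pythonify=True)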
Automated Correlation & Analysis:
- IOC enrichment: automatic lookup of discovered indicators in threat intelligence feeds
- Attribution scoring: machine learning models for threat actor correlation
- Campaign tracking: linking infrastructure across multiple investigations
- Predictive analysis: identifying likely future targets based on infrastructure patterns
Continuous OSINT Operations
Operational Frameworks:
- Collection automation: scheduled data gathering from all configured sources
- Processing pipelines: normalize and deduplicate collected intelligence
- Analysis automation: ML-driven pattern recognition and anomaly detection
- Alerting systems: configurable notifications for high-priority findings
- Response integration: automatic ticket creation and team notifications
Metrics & KPIs for OSINT Programs:
- Coverage metrics: percentage of digital footprint under monitoring
- Detection time: time from exposure to discovery and alerting
- False positive rates: accuracy of automated detection systems
- Attribution confidence: reliability of threat actor identification
- Response time: speed of investigation team response to alerts
21. Operational Security
OSINT is only passive if you do it right. Sloppy operators leak as much as they collect. Whether you are a defender running recon on your own company, an investigator looking into hostile actors, or a researcher probing sensitive communities, the target should never learn you were looking.
Attribution risks
- IP address: the target’s analytics and logs capture it
- User-Agent: fingerprints browser, OS, sometimes tool
- Account identity: logging into LinkedIn to view a profile attaches your real name
- Cookies / localStorage: cross-session tracking
- Referer headers: leak where you clicked from
- DNS lookups: your ISP sees every domain you resolve
- Browser fingerprint: canvas, fonts, screen size, timezone
- TLS JA3/JA4: tooling-specific TLS fingerprints
- Timing patterns: your working hours reveal your timezone
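To see why TLS fingerprints identify tooling, consider how a JA3 value is built: five ClientHello fields are joined into a string and hashed with MD5. The sketch below follows the published JA3 string layout but omits the GREASE-value filtering that real implementations apply. The function name `ja3_fingerprint` and the sample numbers are hypothetical.

```python
import hashlib


def ja3_fingerprint(
    version: int,
    ciphers: list[int],
    extensions: list[int],
    curves: list[int],
    point_formats: list[int],
) -> tuple[str, str]:
    """Build the JA3 string and its MD5 digest from ClientHello fields."""
    groups = (ciphers, extensions, curves, point_formats)
    # Fields are comma-separated; values inside a field are dash-separated.
    fields = [str(version)] + ["-".join(str(v) for v in group) for group in groups]
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Hypothetical ClientHello values, in the order the client sent them.
    ja3_string, ja3_hash = ja3_fingerprint(771, [4865, 4866], [0, 23], [29], [0])
    print(ja3_string)
    print(ja3_hash)
```

Because the inputs are determined by the TLS library and its configuration, a Python script, a Go scanner, and a browser each present a different hash from the same IP address, which is why changing only the User-Agent does not disguise a tool.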
Layered defenses
- Dedicated investigation VM: never mix with personal or work browsing. Keep it disposable (snapshots, revert after every engagement).
- Separate OS profile or container: at minimum, a segregated browser profile
- VPN or residential proxy: Mullvad, IVPN, Proton VPN, or a commercial residential proxy for sensitive investigations. Know the provider’s logging policy.
- Tor: for the most sensitive operations and dark-web access. Never log into personal accounts over Tor.
- Burner accounts: sock puppets with their own email, phone (VoIP or burner SIM), aged over time, with plausible background activity
- Hardened browser: Firefox with resistFingerprinting enabled, uBlock Origin, Cookie AutoDelete, NoScript; or Tor Browser; or Brave with strict settings
- Screenshot and archive tools with opsec-safe settings: Hunchly is purpose-built for investigators and captures every page automatically, with hash verification
- Separate phone / hardware: for investigations where device fingerprinting matters
- No personal accounts, ever: a single Google login while “just checking something” burns the entire persona
Sock puppet hygiene
- Create accounts well in advance; aged accounts draw less suspicion
- Use non-obvious names; avoid giveaway patterns (sequential usernames, shared avatars)
- Build plausible activity: followers, posts, reactions over weeks or months
- Use a different sock for each investigation; compartmentalize
- Record credentials and backstory in a secure, central store
- Never cross-contaminate between sock, work, and personal identities
- Accept that sock puppets burn; plan for rotation
Hunchly and investigation capture
Hunchly is one of the few tools in the space purpose-built for investigative OSINT capture. It records every page an investigator visits, preserving exact HTML, screenshots, hashes, and a searchable case database. This solves two perennial problems: (1) reproducibility, because you can demonstrate exactly what was on the page when you looked, and (2) note-keeping, because the tool captures in real time instead of after the fact. For any investigation that may be scrutinized (legal, regulatory, publication), capture-by-default tooling is essential.
Safe data handling
- Treat collected PII as sensitive from the moment it arrives
- Encrypt investigation data at rest
- Scrub workstations between engagements if commingling is a risk
- Understand your deliverable’s exposure: who will see this report, and does it contain information that could re-identify protected sources?
- Observe retention limits: delete when no longer needed
22. Legal & Ethical Considerations
OSINT is legal in broad strokes but varied in detail, and ethical only when practiced with judgment.
Legal surface area
- Computer Fraud and Abuse Act (US) and similar: unauthorized access laws. Passive consumption of public data is generally safe; active probing without authorization is not.
- GDPR (EU): applies to processing personal data of EU residents. Investigators must have a lawful basis; “legitimate interest” often applies but must be documented.
- CFAA precedent: scraping public data from websites is generally legal (hiQ v. LinkedIn and progeny), but terms-of-service violations can create civil exposure.
- Platform ToS: scraping LinkedIn, Facebook, Instagram commonly violates ToS even if legal. Accounts can be banned; repeat offenders can face lawsuits.
- Anti-stalking and harassment laws: aggregating public data about an individual can become unlawful harassment depending on intent and jurisdiction.
- Breach data handling: possessing breach data is often legal, but further use (extorting victims, publishing PII) is not.
- Export controls: some OSINT tooling is regulated under dual-use export regimes.
Ethical guardrails
- Purpose test: can you articulate why you need this intelligence and who benefits?
- Proportionality test: is the depth of collection proportional to the stakes?
- Harm test: could publishing this information enable stalking, doxing, or physical harm?
- Consent test: would the subject reasonably expect this information to be collected and used this way?
- Transparency test: could you defend your methodology openly if challenged?
Investigators routinely face situations where the legal answer and the ethical answer diverge. A finding that is legal to discover may be unethical to publish. A technique that is clearly ethical may be restricted by platform ToS. Practitioners who survive long-term in the field develop judgment, not just skills.
Defender’s perspective
Defenders using these techniques against their own organization are on firm legal ground, since you have implicit authorization over your own assets. The real risks are:
- Accidentally probing a third party: vendors, customers, partners, lookalike domains
- Storing personal data of employees: even collected from public sources, it falls under privacy law
- Tipping off attackers: noisy recon against your own infrastructure can alert adversaries that you are looking
23. Quick Reference
The five-minute external exposure check
Run this on your own domain periodically:
## Subdomains
subfinder -d example.com -all -silent | tee subs.txt
cat subs.txt | dnsx -silent | tee live.txt
cat live.txt | httpx -silent -title -tech-detect -status-code
## Certificates
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sort -u
## Shodan
shodan search "ssl:example.com" --fields ip_str,port,product,version
## Historical URLs
echo "example.com" | waybackurls | sort -u
## Leaked secrets on GitHub
## Manual: https://github.com/search?q=%22example.com%22+password&type=code
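The crt.sh step above tends to return wildcard entries, mixed case, and several names packed into one `name_value` field. The following is a minimal Python sketch of cleaning that output before merging it with `subs.txt`. It assumes the JSON has already been downloaded and parsed, and that `name_value` separates multiple names with newlines; the function name `normalize_ct_names` and the sample records are illustrative, not part of any tool.

```python
def normalize_ct_names(records, apex):
    """Flatten crt.sh-style records into sorted, unique hostnames under apex."""
    apex = apex.lower().strip(".")
    names = set()
    for record in records:
        # Assumption: one name_value may hold several names, one per line.
        for raw in record.get("name_value", "").splitlines():
            name = raw.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            # Keep only the apex itself and true subdomains of it.
            if name == apex or name.endswith("." + apex):
                names.add(name)
    return sorted(names)


# Hypothetical sample data shaped like the crt.sh JSON output.
sample = [
    {"name_value": "*.example.com\nwww.example.com"},
    {"name_value": "API.example.com"},
    {"name_value": "example.com.evil.net"},  # lookalike, must be rejected
]
print(normalize_ct_names(sample, "example.com"))
```

The suffix check matters for the third-party risk noted in the defender's perspective: a substring match would pull in lookalike domains that you are not authorized to probe.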
Seed-to-report pivot map
NAME ──┬──▶ search engines ──▶ Wikipedia, personal sites
       ├──▶ LinkedIn ──▶ employer, history
       ├──▶ socials ──▶ aliases ──▶ Sherlock ──▶ more platforms
       └──▶ images ──▶ reverse search ──▶ more accounts
EMAIL ─┬──▶ HIBP ──▶ breach list ──▶ services used
       ├──▶ Hunter.io ──▶ company patterns ──▶ more employees
       ├──▶ Gravatar ──▶ profile image
       ├──▶ Google ──▶ forum posts, paste hits
       └──▶ historical WHOIS ──▶ owned domains
DOMAIN ┬──▶ crt.sh ──▶ subdomains
       ├──▶ subfinder/amass ──▶ more subdomains
       ├──▶ whoxy ──▶ reverse WHOIS ──▶ related domains
       ├──▶ Shodan hostname: ──▶ services
       ├──▶ DNS ──▶ MX/TXT ──▶ vendors
       └──▶ wayback ──▶ historical endpoints
IP ────┬──▶ Shodan/Censys ──▶ services, vulns
       ├──▶ RDAP ──▶ owner, netblock
       ├──▶ reverse DNS ──▶ hostnames
       └──▶ bgp.he.net ──▶ ASN ──▶ more IPs
IMAGE ─┬──▶ Google Lens/Yandex/TinEye ──▶ source
       ├──▶ exiftool ──▶ GPS, camera, timestamp
       ├──▶ AI analysis ──▶ location hypothesis
       └──▶ visual clues ──▶ landmark/sign/architecture
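One way to make the pivot map actionable is to hold it as data, so each investigation can list the chains it still has to run. The sketch below encodes only the DOMAIN branch of the map as an adjacency dictionary and walks it to produce every seed-to-result chain. The names `PIVOT_MAP` and `pivot_paths` are hypothetical, and the walk assumes the map has no cycles.

```python
# DOMAIN branch of the pivot map, as an adjacency dictionary.
PIVOT_MAP = {
    "DOMAIN": ["crt.sh", "subfinder/amass", "whoxy",
               "Shodan hostname:", "DNS", "wayback"],
    "crt.sh": ["subdomains"],
    "subfinder/amass": ["more subdomains"],
    "whoxy": ["reverse WHOIS"],
    "reverse WHOIS": ["related domains"],
    "Shodan hostname:": ["services"],
    "DNS": ["MX/TXT"],
    "MX/TXT": ["vendors"],
    "wayback": ["historical endpoints"],
}


def pivot_paths(graph, seed):
    """Return every chain from seed to a leaf, in the order listed in graph."""
    paths = []
    stack = [[seed]]
    while stack:
        path = stack.pop()
        children = graph.get(path[-1], [])
        if not children:
            paths.append(path)  # reached a result node
            continue
        # Push in reverse so chains come out in map order.
        for child in reversed(children):
            stack.append(path + [child])
    return paths


for chain in pivot_paths(PIVOT_MAP, "DOMAIN"):
    print(" -> ".join(chain))
```

Adding the other seed types is a matter of extending the dictionary; a result node such as "related domains" could also point back to "DOMAIN", but that would need a visited set to avoid looping.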
Common Google dorks
site:target.com filetype:pdf
site:target.com ext:doc OR ext:docx OR ext:xls OR ext:xlsx
site:target.com inurl:admin
site:target.com intitle:"index of"
site:target.com "password" OR "confidential"
site:github.com "target.com"
site:linkedin.com/in "Target Corp"
site:pastebin.com "target.com"
site:s3.amazonaws.com target
"@target.com"
intext:"@target.com" site:pastebin.com
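The dorks above are templates with `target.com` as a placeholder. A small Python sketch like the following can fill them in for a real domain and produce search URLs to open by hand. The template list is a subset of the dorks above; `build_dorks` and `search_url` are illustrative names, and the Google search URL format is an assumption that may change.

```python
from urllib.parse import quote_plus

# Subset of the dorks listed above, with the domain as a placeholder.
DORK_TEMPLATES = [
    'site:{domain} filetype:pdf',
    'site:{domain} inurl:admin',
    'site:{domain} intitle:"index of"',
    'site:github.com "{domain}"',
    'site:pastebin.com "{domain}"',
    '"@{domain}"',
]


def build_dorks(domain, templates=DORK_TEMPLATES):
    """Fill each template with a normalized domain."""
    domain = domain.strip().lower()
    return [template.format(domain=domain) for template in templates]


def search_url(dork):
    """Build a URL for manual review in a browser (assumed URL format)."""
    return "https://www.google.com/search?q=" + quote_plus(dork)


for dork in build_dorks("Example.org"):
    print(dork, "|", search_url(dork))
```

Opening the URLs manually rather than fetching them in a loop keeps the workflow within the platform ToS concerns discussed earlier.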
Common Shodan queries
hostname:target.com
ssl:"target.com"
org:"Target Corp"
port:3389 country:US org:"Target Corp"
http.title:"Login" hostname:target.com
product:"nginx" version:"1.18.0" hostname:target.com
vuln:CVE-2023-1234
has_screenshot:true port:5900
category:ics country:US
Checklist: before declaring recon complete
- All known domains and subdomains enumerated from at least three passive sources
- Certificate transparency logs checked for the last 90 days
- Historical WHOIS reviewed for original/hidden contact data
- Wayback Machine checked for historical endpoints and scrubbed content
- Shodan and Censys both queried for hostname and org
- Cloud bucket namespaces checked (S3, Spaces, Azure, GCS)
- GitHub/GitLab/Bitbucket searched for leaked secrets and configs
- Employee emails and usernames harvested
- Key employees’ breach exposure checked
- Metadata extracted from published documents
- DNS records analyzed for third-party vendors (SPF/MX/CNAME)
- Dangling DNS records screened for takeover potential
- All findings documented with source URL, timestamp, and confidence level
- Raw artifacts archived separately from analysis notes
- Opsec review: no personal accounts touched, no direct target interaction beyond what’s documented
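The documentation items in this checklist (source URL, timestamp, confidence level) are easier to enforce when every finding goes through one record type. The following is a minimal sketch of such a record in Python. The field names, the three confidence labels, and the JSON Lines export are assumptions for illustration, not a standard format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

CONFIDENCE_LEVELS = ("low", "medium", "high")  # assumed scale


def _utc_now():
    """Timestamp in ISO 8601, always in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class Finding:
    claim: str
    source_url: str
    confidence: str
    collected_at: str = field(default_factory=_utc_now)

    def __post_init__(self):
        # Reject findings that cannot be traced or graded.
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence level: {self.confidence!r}")
        if not self.source_url.startswith(("http://", "https://")):
            raise ValueError("source_url must be an http(s) URL")


def to_jsonl(findings):
    """Serialize findings as one JSON object per line."""
    return "\n".join(json.dumps(asdict(f), sort_keys=True) for f in findings)


example = Finding(
    claim="staging.example.com appears in certificate transparency logs",
    source_url="https://crt.sh/?q=%25.example.com",
    confidence="medium",
)
print(to_jsonl([example]))
```

Rejecting a record at creation time is a deliberate choice here: a finding without a source or a confidence grade never reaches the report, which is cheaper than auditing for gaps afterwards.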
Closing notes
OSINT rewards patience and punishes shortcuts. The tools listed here will all be different in two years: platforms will lock down, APIs will change, services will die, and new ones will appear. What persists is the methodology: ask a clear question, collect broadly, process rigorously, analyze honestly, cite meticulously, and protect the investigation from blowback. Every identifier is a pivot. Every fact needs a source. Every finding needs a second source.
The defender’s version of this guide is the same document read sideways: every technique an attacker can use to map your external footprint is a technique you should be running against yourself, on a schedule, with alerts. The asymmetry between attackers and defenders collapses when defenders start doing their own OSINT first.