Comprehensive OSINT Guide

Complete OSINT guide with 412 insights, enhanced with 2026 AI-assisted intelligence gathering, including blockchain analysis, enhanced social media techniques, and modern OSINT automation. Covers operational security and methodology in depth.

May 2, 2026 · 43 min · Carl Sampson

Table of Contents

1. Fundamentals
2. The OSINT Lifecycle
- Phase 1: Planning & Requirements
- Phase 2: Collection
- Phase 3: Processing
- Phase 4: Analysis
- Phase 5: Dissemination
- Phase 6: Feedback
3. People OSINT (HUMINT/SOCMINT)
- Starting identifiers
- The Maltego-style pivot graph
- Username enumeration
- Email enumeration and validation
- Phone numbers
- Reverse image search
4. Company & Corporate OSINT
- Corporate identity sources
- Subsidiary and domain discovery
- Employee enumeration
- Technology fingerprinting
5. Infrastructure & Network OSINT
- IP space and ASN
- Search engines for infrastructure
- Cloud asset discovery
- Historical data
6. Domain, DNS & Certificate Intel
- Subdomain enumeration
- Active validation
- DNS record mining
- WHOIS
7. Social Media Intelligence
- Platform-by-platform quick reference
- Techniques
- Tools
8. 2026 Enhanced Social Media Intelligence
- TikTok Intelligence & Short-Form Video Analysis
- Discord & Gaming Platform Intelligence
- Telegram Intelligence & Channel Analysis
- Deepfake & Synthetic Media Detection
- Anti-Detection & Privacy-Aware Collection
9. Geolocation & Imagery (GEOINT)
- Classic technique stack
- Reverse image search
- AI-assisted geolocation
- Mapping and imagery sources
- Video and live stream OSINT
- EXIF and video metadata
9. Breach, Leak & Paste Intel
- Breach lookup services
- Operational notes
- Paste sites and dumps
10. Metadata Extraction
- Document metadata
- Google dorks for document hunts
- What metadata reveals
11. Code & Repository OSINT
- GitHub search techniques
- Tools
- Pivoting from a single repo
- Beyond GitHub
12. Dark Web & Threat Intel
- Platforms
- Threat intel feeds
13. IoT & Device Discovery
24. Tools Reference
- Frameworks and aggregators
- Maltego
- SpiderFoot
- theHarvester
- recon-ng
- Sherlock
- Shodan
- Censys
- Specialized tools referenced across the surveys
- Commercial platforms
15. Automation & Visualization
- Automation patterns
- Visualization
16. Cloud & Modern Infrastructure Intelligence
- Cloud Asset Discovery 2026
- API & Microservices Intelligence
- Container & DevOps Intelligence
17. Blockchain & Financial Intelligence 2026
- Blockchain Analysis Fundamentals
- DeFi & Smart Contract Intelligence
- Privacy Coin & Mixer Analysis
- Cryptocurrency Threat Intelligence
18. AI-Assisted OSINT 2026
- Where AI helps
- Where AI fails
- Specific tools and workflows
- 2026 AI Integration Advances
- AI Ethics & Verification Standards
19. Anti-Detection & Privacy Evasion
- Attribution Avoidance 2026
- Counter-Intelligence Awareness
20. Continuous Monitoring & Threat Hunting
- Automated Collection Pipelines
- Threat Intelligence Integration
- Continuous OSINT Operations
21. Operational Security
- Attribution risks
- Layered defenses
- Sock puppet hygiene
- Hunchly and investigation capture
- Safe data handling
22. Legal & Ethical Considerations
- Legal surface area
- Ethical guardrails
- Defender’s perspective
23. Quick Reference
- The five-minute external exposure check
- Seed-to-report pivot map
- Common Google dorks
- Common Shodan queries
- Checklist: before declaring recon complete
Closing notes

🆕 Enhanced May 2, 2026: updated with AI-assisted intelligence, blockchain analysis, enhanced social media techniques, and modern OSINT automation from 2026 research.

A practitioner’s reference for Open Source Intelligence — methodology, collection disciplines, tooling, pivoting techniques, and operational security. Enhanced with 2026 AI-assisted techniques and emerging platform intelligence. Compiled from 200+ research sources and enhanced through automated analysis of current OSINT developments.

1. Fundamentals

Open Source Intelligence (OSINT) is the discipline of collecting, correlating, and analyzing information that is publicly or legally available to produce actionable intelligence. “Open source” does not mean “easy” or “low value” — it means no clandestine collection is involved. The sources are lawful: the skill lies in knowing where to look, how to pivot, and how to assemble fragments into a coherent picture.

Why it matters:

Use case	Practitioners
Adversary reconnaissance	Red teams, pentesters, bug bounty hunters
Attack surface management	Blue teams, security engineers, CISOs
Threat intelligence	SOC analysts, CTI teams, IR responders
Fraud and KYC investigation	Financial crime analysts, compliance
Journalism and research	Investigative reporters, academic researchers
Missing persons and criminal investigations	Law enforcement
Due diligence	M&A, investor research, hiring
Personal self-defense	Privacy audits, stalker detection

Core principles:

Every fact is a pivot. An email address is not an endpoint — it is a seed for breach lookups, social profile enumeration, domain registrations, and search engine dorks.
Triangulate before trusting. Any single source can be wrong, stale, or planted. Cross-reference at least two independent sources before treating a data point as confirmed.
Document as you go. If you cannot reproduce a finding in six months, it did not happen. Screenshot, hash, archive.
Stay passive until you must be active. The default mode is observation. Only escalate to direct interaction when the intelligence you need cannot be harvested from existing public records.
Scope creep kills investigations. Define the question up front and resist chasing shiny tangents unless they directly serve the objective.

Passive vs. active collection:

	Passive	Active
Contact with target	None — consult third-party data only	Direct queries against target infrastructure
Detection risk	Near zero	Logs, rate limits, WAF alerts
Data freshness	Can be stale (days to years)	Real-time
Examples	crt.sh, Shodan, archive.org, Google dorks	Nmap scan, directory brute-force, HTTP probing
When used	Always first; enumerate scope and context	After passive exhausted, to confirm/expand

The cardinal rule: finish passive recon before touching the target. Anything you can learn from Censys or certificate transparency logs is something you do not need to poke a production server for.

2. The OSINT Lifecycle

Every investigation, whether a two-hour recon sprint or a month-long deep-dive, follows the same phases. Discipline here separates practitioners from tourists.

Phase 1: Planning & Requirements

Write down the question. Who is the target? What decisions will the intelligence support? What is in scope, and what is off-limits? What is the deadline? What format does the deliverable take? Investigations without a defined question wander forever.

Phase 2: Collection

Gather raw data from identified sources. The temptation is to start here — resist it until the plan is clear. Collection spans subdomains, WHOIS records, social profiles, PDF metadata, breach dumps, code repos, certificate logs, historical archives, and more. Keep raw artifacts separate from processed notes.

Phase 3: Processing

Normalize the data. Dedupe subdomains, resolve hostnames to IPs, extract EXIF from images, parse PDFs for authors. Convert everything into a form you can query and pivot against. A messy collection phase dies here.

Phase 4: Analysis

Turn data into intelligence. Correlate findings across sources: the email on the WHOIS record matches the Gravatar on GitHub, which matches a LinkedIn photo, which matches a conference speaker bio. Link analysis tools (Maltego, spreadsheets, link graphs) help surface non-obvious connections.

Phase 5: Dissemination

Deliver findings in the format the consumer expects. A bug bounty report, a pentest recon appendix, an executive briefing, a due diligence memo. Include provenance for every claim — where it came from, when it was collected, and how confident you are.
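
Provenance is easier to include in a deliverable if it is captured at collection time. The following is a minimal sketch of one way to do that; the `Finding` schema, its field names, and the choice of SHA-256 are illustrative assumptions, not a prescribed format.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Finding:
    """One claim plus the provenance needed to defend it (hypothetical schema)."""
    claim: str
    source_url: str
    confidence: str        # e.g. "low", "medium", "high"
    artifact_sha256: str   # hash of the raw artifact backing the claim
    collected_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_finding(claim: str, source_url: str, raw_artifact: bytes,
                   confidence: str = "low") -> Finding:
    """Hash the raw artifact so the claim is tied to one exact capture."""
    digest = hashlib.sha256(raw_artifact).hexdigest()
    return Finding(claim, source_url, confidence, digest)


if __name__ == "__main__":
    finding = record_finding(
        "hr.example.com appears in certificate logs",
        "https://crt.sh/?q=example.com",
        b"<raw response body as saved to disk>",
        confidence="medium",
    )
    print(finding)
```

Storing the hash next to the claim means a reviewer can later confirm that the archived artifact is the one the claim was based on.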

Phase 6: Feedback

Does the intelligence answer the question? What was missed? What should have been collected sooner? Feed lessons back into Phase 1 for the next engagement.

3. People OSINT (HUMINT/SOCMINT)

People investigations map a target’s digital footprint: identifiers, aliases, affiliations, locations, and relationships. The process is iterative — each fact opens new pivots.

Starting identifiers

Seed	Immediate pivots
Full name	Search engines, LinkedIn, Wikipedia, voter rolls, academic directories
Email	HaveIBeenPwned, Hunter.io, Gravatar, breach dumps, domain WHOIS, Google
Username	Sherlock, WhatsMyName, Namechk, Maigret
Phone number	PhoneInfoga, Truecaller, reverse lookup, Telegram/WhatsApp checks
Profile photo	Reverse image search (Google, Yandex, TinEye, PimEyes)
Employer	LinkedIn, press releases, company filings
Address	Property records, voter rolls, Google Maps Street View

The Maltego-style pivot graph

Treating a person investigation as a graph (nodes = identifiers/entities, edges = “associated with”) prevents losing track of where each fact originated. A typical pivot chain from a Maltego-style workflow:

Name → search page titles, Wikipedia, personal website
Personal website → footer emails, phone numbers, historical WHOIS (DomainTools)
WHOIS email → other domains registered with same email (reverse WHOIS via WhoXY)
Social handles (marc_clotet, marcclotetoficial) → Instagram, Twitter, Facebook profiles
Mutual followers/following → close contacts, private accounts
Affiliated company (mentioned in bio) → corporate registry → officers → other affiliated parties
Historical Hotmail address uncovered via DomainTools → Pipl person search → age, relatives, locations
Phone number → messaging app profile photos, account discovery

Each step is a pivot from a confirmed entity to new related entities. Maltego Transforms automate the individual hops; you can run the same workflow manually with curl, whois, and careful note-taking.
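
The graph framing above can be kept in a few lines of code when no link-analysis tool is at hand. This is a rough sketch, not how Maltego stores data: the `PivotGraph` class, the `type:value` node labels, and the sample identifiers are all hypothetical.

```python
from collections import defaultdict, deque


class PivotGraph:
    """Undirected graph of identifiers; every edge records where it came from."""

    def __init__(self) -> None:
        # node -> {neighbor: note describing the source of the association}
        self.edges = defaultdict(dict)

    def add_pivot(self, src: str, dst: str, source: str) -> None:
        self.edges[src][dst] = source
        self.edges[dst][src] = source

    def path(self, start: str, goal: str) -> list:
        """Breadth-first search for the shortest pivot chain between two nodes."""
        queue = deque([[start]])
        seen = {start}
        while queue:
            chain = queue.popleft()
            if chain[-1] == goal:
                return chain
            for nxt in self.edges[chain[-1]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(chain + [nxt])
        return []


if __name__ == "__main__":
    g = PivotGraph()
    # Placeholder identifiers, not real investigation data.
    g.add_pivot("name:Jane Doe", "site:janedoe.example", "search result")
    g.add_pivot("site:janedoe.example", "email:jane@example.com", "page footer")
    g.add_pivot("email:jane@example.com", "domain:other.example", "reverse WHOIS")
    print(g.path("name:Jane Doe", "domain:other.example"))
```

Asking for the path between the seed and any later finding reproduces the chain of pivots that led there, which is the part that tends to get lost in free-form notes.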

Username enumeration

Most people reuse handles across platforms. Tools that check hundreds of sites in parallel:

Sherlock — Python tool checking 300+ social networks for a username
WhatsMyName — web/CLI tool with a community-maintained JSON list of sites
Maigret — fork of Sherlock with richer profile data extraction
Namechk / KnowEm — brand/username availability checkers repurposed for OSINT

A hit on a niche forum is often worth more than another Twitter account — niche forums surface real interests, writing samples, and contact patterns.

Email enumeration and validation

Hunter.io — finds email addresses by domain, infers patterns (first.last@, flast@), verifies deliverability
Email permutator — generates plausible addresses from a name plus domain
HaveIBeenPwned — reveals which breaches an email appears in (reveals services used)
Gravatar — https://gravatar.com/<md5(email)>.json returns profile if registered
Epieos / Holehe — checks dozens of services for account registration without triggering password reset emails
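
Two of the items above are mechanical enough to sketch: generating plausible addresses from a name plus domain, and building the Gravatar lookup URL from an address. The pattern list is an illustrative assumption (real permutators use many more patterns), and the hash assumes Gravatar's classic scheme of MD5 over the trimmed, lowercased address. Nothing here makes a network request.

```python
import hashlib


def email_permutations(first: str, last: str, domain: str) -> list:
    """Return a few common corporate address patterns (illustrative, not exhaustive)."""
    first, last, domain = first.lower(), last.lower(), domain.lower()
    local_parts = [
        f"{first}.{last}",    # first.last@
        f"{first}{last}",     # firstlast@
        f"{first[0]}{last}",  # flast@
        f"{first}{last[0]}",  # firstl@
        f"{first}_{last}",    # first_last@
        first,                # first@
    ]
    return [f"{lp}@{domain}" for lp in local_parts]


def gravatar_profile_url(email: str) -> str:
    """Build the profile URL: MD5 of the trimmed, lowercased address."""
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    return f"https://gravatar.com/{digest}.json"


if __name__ == "__main__":
    for address in email_permutations("Jane", "Doe", "example.com"):
        print(address, gravatar_profile_url(address))
```

Each candidate still has to be validated against a real source; a generated address is a hypothesis, not a finding.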

Phone numbers

PhoneInfoga — country, carrier, line type, breach hits
Truecaller — crowdsourced caller ID; risky (reveals your query to Truecaller)
Messaging apps — adding a number to contacts often reveals the registered profile name and avatar (opsec-heavy; use burner)

Reverse image search

Google Images / Google Lens — best for product, landmark, and Western content
Yandex Images — best for faces and people; still the strongest for facial matches
TinEye — best for finding the original source and earliest occurrence
PimEyes / FaceCheck — facial recognition across the open web (paid, ethically fraught)
Bing Visual Search — decent for products and landmarks

4. Company & Corporate OSINT

Company investigations combine infrastructure recon with corporate filings, personnel mapping, and vendor/technology fingerprinting.

Corporate identity sources

Source	Data
OpenCorporates	Global company registry metadata — officers, addresses, status
SEC EDGAR	US public company filings (10-K, 10-Q, insider transactions)
Companies House (UK)	Officers, filings, beneficial owners
Bureau van Dijk / Orbis	Paid, comprehensive global company intelligence
Dun & Bradstreet	Business credit, corporate family trees
Crunchbase / PitchBook	Funding, investors, board members (paid tiers for depth)
LinkedIn company pages	Headcount, departments, employee list
FullContact / Clearbit	Enrichment APIs: size, industry, tech stack, key people

Subsidiary and domain discovery

Large companies have sprawling digital footprints. Start from the primary name and expand:

Reverse WHOIS (WhoXY, DomainTools) — find all domains registered to the same name or email. Remember: WhoXY requires exact-string matches, so “Blizzard Entertainment” and “Blizzard Entertainment, Inc” will return different sets.
Trademark search — USPTO, EUIPO filings reveal product codenames and subsidiaries.
Press releases and SEC filings — mention subsidiary names that never appear on the website.
Job postings — often mention internal tool names, cloud providers, and office locations.

Employee enumeration

LinkedIn — site:linkedin.com "Acme Corp" via Google reveals public profiles even without login
Hunter.io / phonebook.cz — bulk email harvest by domain
GitHub — commits by @company.com email addresses expose engineers
Conference talks, CVE credits, paper authorship — lists specialists
RocketReach, Lusha, Apollo — sales tools repurposed for contact discovery (paid)

Technology fingerprinting

Knowing the stack narrows exploitation research later:

BuiltWith / Wappalyzer — web stack detection from rendered HTML and headers
Shodan / Censys — banner grabs reveal server software and versions
DNS records — MX (pphosted.com = Proofpoint), SPF (spf.protection.outlook.com = O365), CNAMEs revealing CDN/CMS
JavaScript bundles — library imports, API endpoints, third-party integrations
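
The DNS fingerprinting idea reduces to substring matching on record values that have already been collected. A minimal sketch follows; the `PROVIDER_HINTS` table is an assumption for illustration (only the Proofpoint and Microsoft 365 entries come from the text above), and a real hint table would be far larger.

```python
# Substring -> vendor. Illustrative table; extend it from your own observations.
PROVIDER_HINTS = {
    "pphosted.com": "Proofpoint",
    "protection.outlook.com": "Microsoft 365",
    "google.com": "Google Workspace",
    "mimecast.com": "Mimecast",
}


def fingerprint_records(records: list) -> set:
    """Return vendors hinted at by MX/SPF/CNAME record values already gathered."""
    found = set()
    for record in records:
        value = record.lower().rstrip(".")
        for needle, provider in PROVIDER_HINTS.items():
            if needle in value:
                found.add(provider)
    return found


if __name__ == "__main__":
    sample = [
        "10 mxa-00000000.gslb.pphosted.com.",               # MX value (placeholder)
        "v=spf1 include:spf.protection.outlook.com -all",   # SPF TXT value
    ]
    print(sorted(fingerprint_records(sample)))
```

The function works on strings rather than doing lookups itself, so the same code applies to passive DNS exports and to live query results.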

5. Infrastructure & Network OSINT

This is where OSINT crosses most directly into security recon. The goal is to enumerate every externally reachable asset and catalog what is running on it — without touching the target.

IP space and ASN

ARIN / RIPE / APNIC / LACNIC / AFRINIC RDAP — WHOIS for IP blocks, netblock ownership
bgp.he.net — AS number lookups, peering relationships, announced prefixes
ipinfo.io / ipdata — enrichment APIs with geoloc, ASN, org
RIPEstat — authoritative routing, abuse contacts, historical data

A company that owns its own ASN signals maturity and gives you a clean IP perimeter. A company entirely on cloud (all AWS/GCP/Azure) means you map their domains back to cloud ranges instead.

Search engines for infrastructure

These are the indispensable tools. None of them touch the target; they query pre-indexed scan data.

Tool	Best for	Free tier
Shodan	Banners, service versions, SCADA/ICS, webcams, IoT, vulnerability filters (`vuln:`)	Limited queries, paid plans unlock filters
Censys	Certificate search, service fingerprinting, precise field queries	250 searches/month
Netlas.io	Domains, IPs, WHOIS, DNS combined; Maltego integration	50 searches/day, 2500 results/month
FOFA	Chinese alternative, strong for APAC infrastructure	Limited
ZoomEye	Another Chinese alternative	Limited
BinaryEdge	Scans, leaked databases, risk scoring	Paid
GreyNoise	Classifies “background noise” IPs to filter scan traffic	Community tier
Hunter.how	Cyberspace search engine	Limited

Shodan query patterns:

hostname:example.com
ip:203.0.113.0/24
port:22 country:US
product:"nginx"
vuln:CVE-2021-44228
org:"Acme Corp"
ssl:"example.com"

Censys field queries:

parsed.names: example.com
services.service_name: "HTTP" and location.country: "United States"
services.tls.certificates.leaf_data.names: "*.example.com"
autonomous_system.asn: 13335

Cloud asset discovery

Most modern targets live in public cloud. Mapping cloud assets:

S3 buckets: use awscli or boto3 with credentials from any AWS account. Buckets open to “any authenticated AWS user” answer signed requests but not unauthenticated HTTP probes, so authenticated checks surface more. Bucket names are globally unique; try company-name variants with common prefixes/suffixes: qa, dev, staging, prod, bak, backup, logs, assets, uat, legacy, internal, public, private, docs.
DigitalOcean Spaces: same API shape as S3, separate namespace to enumerate.
Azure blob storage — <name>.blob.core.windows.net
GCS buckets — <name>.storage.googleapis.com
Firebase databases — <name>.firebaseio.com
Dangling CNAMEs — records pointing to deleted cloud resources are ripe for subdomain takeover. The can-i-take-over-xyz repo catalogs the fingerprints.
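
Candidate generation for the storage services above is simple to script. The sketch below only builds names and hostnames and sends nothing; the affix subset, separators, and function names are assumptions chosen for illustration.

```python
from itertools import product

# Subset of the affixes listed above; extend as needed.
AFFIXES = ["qa", "dev", "staging", "prod", "bak", "backup", "logs", "assets"]
SEPARATORS = ["-", ".", ""]


def bucket_candidates(company: str) -> list:
    """Combine a base name with each affix as both prefix and suffix."""
    base = company.lower().replace(" ", "")
    names = {base}
    for affix, sep in product(AFFIXES, SEPARATORS):
        names.add(f"{base}{sep}{affix}")
        names.add(f"{affix}{sep}{base}")
    return sorted(names)


def candidate_endpoints(name: str) -> dict:
    """Map one candidate name onto the hostname pattern of each service.

    Not every name is valid everywhere (Azure storage account names, for
    example, allow only lowercase letters and digits), so filter per service.
    """
    return {
        "s3": f"https://{name}.s3.amazonaws.com",
        "azure_blob": f"https://{name}.blob.core.windows.net",
        "gcs": f"https://{name}.storage.googleapis.com",
        "firebase": f"https://{name}.firebaseio.com",
    }


if __name__ == "__main__":
    for candidate in bucket_candidates("Acme")[:5]:
        print(candidate_endpoints(candidate)["s3"])
```

Probing the generated hostnames is an active step against the cloud provider, so it belongs after scope and authorization are settled.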

Historical data

Wayback Machine (archive.org) — snapshots of old pages, forgotten endpoints, robots.txt evolution, admin panel references
CommonCrawl — bulk web archive suitable for scripted search
SecurityTrails — historical DNS records, WHOIS changes, subdomain discovery
DomainTools — historical WHOIS (the closest thing to a time machine for registration data)
Google Cache: Google retired its cached-page links in 2024, so treat this as a legacy source and fall back to the Wayback Machine or other search engines' caches

Historical data is often more valuable than current data. A subdomain that vanished last year may still point at a forgotten S3 bucket. An old WHOIS record may contain an admin’s personal email that was scrubbed from the current record.
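
For scripted searches of the Wayback Machine, the CDX index can list every captured URL under a domain. A hedged sketch, assuming the public CDX endpoint and its JSON output format (first row is the header); the chosen fields and limit are illustrative, and the code builds the request and parses a saved response rather than fetching anything.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str, limit: int = 500) -> str:
    """Build a CDX query listing unique archived URLs under a domain."""
    params = {
        "url": f"{domain}/*",                     # everything below the domain
        "output": "json",
        "fl": "timestamp,original,statuscode",    # fields to return
        "collapse": "urlkey",                     # one row per distinct URL
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(payload: str) -> list:
    """Turn the JSON response (header row, then data rows) into dicts."""
    rows = json.loads(payload)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


if __name__ == "__main__":
    print(cdx_query_url("example.com"))
```

Filtering the parsed rows for paths such as admin panels or old API routes gives a list of forgotten endpoints without sending a request to the target.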

6. Domain, DNS & Certificate Intel

Domain-level intel is the connective tissue of infrastructure OSINT.

Subdomain enumeration

Passive sources pull from pre-indexed databases — no traffic to the target:

crt.sh: free, no API key, queries Certificate Transparency logs (heavy queries can time out). Publicly trusted CAs log every certificate they issue, so a cert issued for hr.example.com exposes that subdomain even before the host goes live.
Certificate Transparency (transparencyreport.google.com) — Google’s CT aggregator
VirusTotal — passive DNS from submissions
SecurityTrails / DNSDumpster / Netcraft — historical DNS aggregators
Subfinder — orchestrates queries against 30+ passive sources
Amass (intel/enum passive mode) — OWASP’s enumeration framework
Assetfinder — tomnomnom’s lightweight passive finder
chaos-client — ProjectDiscovery’s Chaos dataset
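
As an example of what these passive sources return, crt.sh can emit JSON in which each entry carries a `name_value` field holding one or more newline-separated names. The sketch below assumes that response shape and the `%.domain` wildcard query form; it builds the URL and extracts names from a saved response without fetching.

```python
import json
from urllib.parse import quote


def crtsh_url(domain: str) -> str:
    """Build a JSON query for all certificates covering names under a domain."""
    return f"https://crt.sh/?q={quote('%.' + domain)}&output=json"


def names_from_crtsh(payload: str, domain: str) -> list:
    """Extract unique in-scope hostnames from a crt.sh JSON response."""
    domain = domain.lower()
    names = set()
    for entry in json.loads(payload):
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().lstrip("*.")  # drop wildcard prefix
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


if __name__ == "__main__":
    print(crtsh_url("example.com"))
```

The suffix check matters: certificates often list names from unrelated domains, and keeping them would widen scope by accident.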

Active enumeration adds brute-force, permutation, and zone walking:

puredns / shuffledns — mass DNS resolution with wildcards filtered
altdns / dnsgen — permutation wordlists from discovered names
dnsrecon — zone transfers, cache snooping
massdns — raw resolver for large lists

Active validation

Once you have a list of candidate names, resolve and probe them:

dnsx — bulk resolution, filtering by record type
httpx — probes live HTTP(S) services, captures titles, tech stacks, status codes
aquatone — visual recon; screenshots and clusters subdomains by similarity
gowitness / eyewitness — alternative screenshotters

DNS record mining

Beyond A/CNAME, records leak intel:

Record	Intel
MX	Email provider (Google, Microsoft, Proofpoint, Mimecast)
TXT	SPF records list third-party senders (Marketo, Salesforce, Zendesk); DKIM selectors hint at tooling
SRV	Exposes specific services (XMPP, SIP, LDAP)
CAA	Allowed certificate authorities
NS	DNS provider (Route53, Cloudflare, NS1)
SOA	Admin email, zone refresh parameters

Weak or missing SPF/DMARC (e.g. v=DMARC1; p=none) signals exploitable email spoofing potential. DKIMValidator is a classic utility for testing DMARC alignment without interacting with the target infrastructure.
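
Reading a DMARC policy is a matter of splitting the TXT value published at `_dmarc.<domain>` into tags. A minimal sketch under that assumption; the function names and the wording of the risk labels are hypothetical, and the record string is expected to come from a lookup done elsewhere.

```python
def parse_dmarc(record: str) -> dict:
    """Split a DMARC TXT value such as 'v=DMARC1; p=none' into a tag dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def spoofing_posture(record) -> str:
    """Summarize how strictly receivers are asked to treat failing mail."""
    if not record:
        return "no DMARC record"
    policy = parse_dmarc(record).get("p", "").lower()
    if policy == "none":
        return "monitor-only (p=none)"
    if policy in ("quarantine", "reject"):
        return f"enforced (p={policy})"
    return "malformed or unknown policy"


if __name__ == "__main__":
    print(spoofing_posture("v=DMARC1; p=none; rua=mailto:dmarc@example.com"))
```

A monitor-only or missing policy is a finding worth reporting even on a defensive engagement, since it describes how spoofed mail would be handled.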

WHOIS

Current WHOIS — often redacted under GDPR for individuals, still useful for corporate registrants
Historical WHOIS — DomainTools, WhoisXML API, Whoxy — the unredacted gold
Reverse WHOIS — find all domains sharing a registrant email, name, phone, or organization

7. Social Media Intelligence

SOCMINT is high-signal but high-noise. Treat public social media as a window into a target’s relationships, routines, locations, and interests, and treat closely curated accounts (execs, celebrities) as performative artifacts, not ground truth.

Platform-by-platform quick reference

Platform	High-value intel
LinkedIn	Employer history, skills, internal tool mentions, team maps, location
Twitter/X	Writing style, real-time location, device fingerprints (Twitter for iPhone etc.), interests, connections
Facebook	Family, relationships, check-ins, photos, events, groups
Instagram	Geolocation from photos, friend networks, routines, physical spaces
TikTok	Schedules, location context, behavioral patterns
Reddit	Long-form writing, niche communities, real-world interests
GitHub	Code, commits, emails, working hours, associated accounts
Strava/fitness apps	Routines, home location, military base exposure
Telegram	Phone number → profile → channels joined
Discord	Real-time presence, community affiliations

Techniques

Close contact mapping — mutual followers on Instagram, Facebook friend overlap, Twitter interaction graphs
Temporal analysis — timestamp clusters reveal timezone, sleep schedule, work hours
Linguistic fingerprinting — consistent phrasing across accounts links aliases
Photo OSINT — backgrounds reveal location, device clock shows timezone, reflections leak environment
Story/ephemeral content — archive quickly; gone in 24 hours
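
Temporal analysis can be sketched as a histogram of posting hours plus a search for the quietest stretch. The example below is a rough heuristic, not a validated method: it assumes timezone-aware timestamps, an 8-hour sleep window, and a local bedtime of 23:00, all of which are illustrative assumptions that will be wrong for shift workers and scheduled posts.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_histogram(timestamps) -> list:
    """Count posts per UTC hour; timestamps must be timezone-aware datetimes."""
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def quietest_window(hist: list, width: int = 8) -> int:
    """Return the UTC start hour of the least active window (wraps past midnight)."""
    best_start, best_total = 0, None
    for start in range(24):
        total = sum(hist[(start + i) % 24] for i in range(width))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start


def guess_utc_offset(hist: list, assumed_local_bedtime: int = 23) -> int:
    """Guess the UTC offset by aligning the quiet window with an assumed bedtime."""
    offset = (assumed_local_bedtime - quietest_window(hist)) % 24
    return offset - 24 if offset > 12 else offset


if __name__ == "__main__":
    # Synthetic activity: posts every hour except 05:00 to 12:59 UTC.
    active = list(range(13, 24)) + list(range(0, 5))
    posts = [datetime(2026, 1, 1, h, tzinfo=timezone.utc) for h in active]
    print(guess_utc_offset(hourly_histogram(posts)))
```

Treat the result as a hypothesis to cross-check against other signals such as language, location tags, or employer.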

Tools

Sherlock / WhatsMyName / Maigret — username across platforms
Osintgram — Instagram enumeration (rate-limited; may violate ToS)
Twint / snscrape — Twitter scraping without API (fragile post-API lockdown)
Social Analyzer — API and CLI for social profile discovery
Social-Searcher — keyword monitoring across platforms
Blackbird — username/email search across 500+ sites
IntelX — indexed social content and leaked data

Note: platform APIs and terms have tightened significantly. Many classic scraping tools are in a state of perpetual repair. Always check recency.

8. 2026 Enhanced Social Media Intelligence

The social media intelligence landscape has evolved dramatically. Modern OSINT practitioners must adapt to new platforms, enhanced privacy controls, AI-generated content detection, and API restrictions. This section covers the latest developments and techniques for 2026.

TikTok Intelligence & Short-Form Video Analysis

TikTok has become a critical intelligence source, particularly for demographic research, trend analysis, and geopolitical monitoring. However, it presents unique challenges due to algorithm-driven content delivery and mobile-first design.

TikTok OSINT Techniques:

Hashtag intelligence — tracking viral hashtags reveals emerging trends, social movements, and coordinated campaigns
Sound/music tracking — original sounds often contain location or identity markers
Duet/collaboration mapping — relationship analysis through video responses
Geofence analysis — location-based content clustering
Temporal analysis — posting patterns reveal timezone and routine information

Tools and Methods:

TikTok Creative Center — official analytics for hashtag and trend analysis
TikTok Ads Library — reveals promoted content and targeting demographics
Browser automation — careful scraping with anti-detection measures
Manual collection — screenshot and archive approach for sensitive investigations

Discord & Gaming Platform Intelligence

Discord has evolved beyond gaming into a primary communication platform for various communities, making it valuable for community mapping and real-time intelligence.

Discord OSINT Vectors:

Server discovery — public server directories reveal community interests
User ID analysis: Discord IDs are snowflakes rather than sequential counters; each one embeds a millisecond creation timestamp, so an ID reveals when the account was created
Webhook monitoring — tracking automated posts from other platforms
Voice channel analysis — real-time conversation monitoring (with consent/legal authorization)
Bot interaction — custom bots can reveal user activity patterns
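
Account creation time can be read directly from a Discord ID. Discord IDs follow the snowflake layout, where the bits above the lowest 22 hold milliseconds since the Discord epoch (the first second of 2015, UTC). A short sketch of the decoding; the function name is an assumption.

```python
from datetime import datetime, timedelta, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z in Unix milliseconds
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snowflake_created_at(snowflake: int) -> datetime:
    """Decode the creation time embedded in a Discord snowflake ID."""
    unix_ms = (snowflake >> 22) + DISCORD_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=unix_ms)


if __name__ == "__main__":
    # Example ID used in Discord's developer documentation.
    print(snowflake_created_at(175928847299117063).isoformat())
```

The same decoding applies to server, channel, and message IDs, which makes it useful for dating a community as well as an account.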

Operational Considerations:

Discord has strong anti-scraping measures; prefer manual collection
Server invites often expire or are single-use
User privacy settings can hide activity from non-friends

Telegram Intelligence & Channel Analysis

Telegram’s channels and groups provide rich intelligence, particularly for threat research and information operations tracking.

Telegram OSINT Techniques:

Channel subscriber analysis — public member counts and growth tracking
Cross-platform content tracking — identifying content origins and propagation
File metadata analysis — documents and media shared in channels
Forward chain analysis — tracking message propagation across channels
Bot intelligence — automated accounts reveal operational patterns

Tools and Resources:

Telegram Web — browser-based access for collection and archival
Telethon — Python library for programmatic access (requires API credentials)
TGStat — public Telegram analytics platform
Manual archival — screenshot and download approach for legal compliance

Deepfake & Synthetic Media Detection

As AI-generated content proliferates, distinguishing authentic from synthetic media becomes critical for intelligence validation.

Detection Techniques:

Technical analysis — compression artifacts, consistency in lighting/shadows
Behavioral analysis — micro-expressions, blinking patterns, speech patterns
Metadata examination — generation software signatures
Cross-reference verification — comparing against known authentic content
AI-assisted detection — tools like Deepware Scanner, Microsoft Video Authenticator

Red Flags for Synthetic Content:

Inconsistent lighting or shadows across the image/video
Blurring or pixelation around face/hair boundaries
Unnatural eye movement or blinking patterns
Audio-visual synchronization issues
Metadata inconsistencies or missing camera information

Anti-Detection & Privacy-Aware Collection

Modern social platforms employ sophisticated detection mechanisms. Collection methods must evolve to remain effective while respecting platform terms and legal boundaries.

Advanced Anti-Detection Techniques:

Residential proxy rotation — avoiding data center IP detection
Browser fingerprint randomization — Canvas, WebGL, and font fingerprint management
Human-like timing patterns — randomized delays and interaction simulation
Session management — cookie rotation and header consistency
API quota management — distributed collection across multiple authenticated sessions

Platform-Specific Considerations:

Platform	Primary Detection Methods	Recommended Approach
TikTok	Device fingerprinting, behavioral analysis	Mobile emulation, manual collection
Instagram	API rate limiting, IP blocking	Residential proxies, authenticated sessions
Discord	Bot detection, server monitoring	Manual interaction, webhook monitoring
Telegram	API restrictions, flood protection	Official client, rate-limited automation

9. Geolocation & Imagery (GEOINT)

Determining where a photo, video, or person is located from visual evidence.

Classic technique stack

Shadow analysis: shadow length and direction give the sun's elevation and azimuth, which constrain latitude and time of day for a known date (SunCalc, suncalc.org)
Landmark identification — monuments, logos, business signage
Language and script — signage language narrows region
Vegetation — tree species and agriculture indicate climate zone
Vehicle makes and license plate formats — country/region disambiguation
Electrical plug shapes and pole construction — power grid standards vary by region
Road markings — lane widths, stripe patterns, sign shapes (MUTCD vs. Vienna Convention)
Architecture — roofing styles, window frames, construction materials
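
The geometry behind shadow analysis is basic trigonometry. A minimal sketch, assuming level ground and an object of known height; the second function uses the standard relation for solar-noon elevation, with the solar declination for the date supplied by the caller. Function names are illustrative.

```python
import math


def sun_elevation_deg(object_height_m: float, shadow_length_m: float) -> float:
    """Sun elevation above the horizon implied by an object and its shadow."""
    if object_height_m <= 0 or shadow_length_m <= 0:
        raise ValueError("height and shadow length must be positive")
    return math.degrees(math.atan2(object_height_m, shadow_length_m))


def noon_elevation_deg(latitude_deg: float, declination_deg: float) -> float:
    """Sun elevation at local solar noon for a latitude and solar declination."""
    return 90.0 - abs(latitude_deg - declination_deg)


if __name__ == "__main__":
    # A 2 m post casting a 2 m shadow implies the sun is about 45 degrees up.
    print(sun_elevation_deg(2.0, 2.0))
    # At an equinox (declination 0), latitude 40 sees a noon sun of 50 degrees.
    print(noon_elevation_deg(40.0, 0.0))
```

An observed elevation higher than the noon value for a candidate latitude and date rules that candidate out, which is often more useful than trying to pin an exact location.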

Reverse image search

Run the same image through all of these — coverage varies wildly:

Google Images / Lens
Yandex Images — still the best for Russian/Eastern European and general face matching
TinEye — best for finding originals and earliest occurrences
Bing Visual Search
Baidu — better for Chinese content

AI-assisted geolocation

Modern multimodal models can synthesize the classic technique stack in seconds. The Hackers Arise walkthrough demonstrates using custom GPTs like GeoGuessr GPT for first-pass geolocation: upload an image, ask where it was taken. These models do not do reverse image search — they visually reason over architectural, vegetation, and signage cues. They are often wrong on specifics but provide a valuable starting framework of observations (“the road signs suggest Cyrillic-script Eastern Europe; the utility pole style matches post-Soviet construction”).

Practitioners should treat AI guesses as hypotheses, not conclusions, and verify every claim against ground-truth imagery.

Mapping and imagery sources

Google Maps / Earth Pro — Street View, historical imagery, 3D buildings
Yandex Maps / Mapillary / KartaView — alternative street-level imagery, stronger coverage in some regions
Sentinel Hub / EO Browser — free satellite imagery (Sentinel-2, Landsat)
Planet Labs — commercial high-cadence satellite imagery
OpenStreetMap — community mapping with extractable POI data
Overpass Turbo — query OSM for arbitrary features (e.g. “all churches in this bounding box with a spire over 30m”)
Wikimapia — crowd-sourced photo-annotated POI database

Video and live stream OSINT

EarthCam / Insecam — aggregated public webcams (many unintentional)
Windy.com — live webcams for weather
YouTube geosearch tools — find videos shot within a geographic radius
FlightRadar24 / ADS-B Exchange — real-time civilian aircraft tracking
MarineTraffic / VesselFinder — real-time ship AIS data
RailSense / similar — train tracking by region

EXIF and video metadata

Photos keep embedded GPS coordinates when the capturing device geotags them (most smartphones with location enabled) and nothing strips the metadata afterwards. Most social platforms strip EXIF on upload, but platforms that preserve it (Flickr, some forums, raw email attachments) can hand-deliver the answer. Tools: exiftool, ExifTool Online, Jeffrey's Image Metadata Viewer.
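
EXIF stores GPS positions as degrees, minutes, and seconds plus a hemisphere reference (N/S/E/W), while mapping tools expect signed decimal degrees. A small conversion sketch; the function name and sample coordinates are illustrative.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert an EXIF-style DMS coordinate to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if ref.upper() in ("S", "W") else value


if __name__ == "__main__":
    lat = dms_to_decimal(40, 26, 46, "N")
    lon = dms_to_decimal(79, 58, 56, "W")
    print(f"{lat:.6f}, {lon:.6f}")
```

The resulting pair can be pasted straight into most mapping services to check the location against street-level imagery.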

9. Breach, Leak & Paste Intel

Credentials, PII, and internal data exposed through historical breaches and paste sites are a cornerstone of offensive OSINT.

Breach lookup services

HaveIBeenPwned — free, non-commercial use; reveals which breaches an email appears in. Also exposes pastes containing the email. Pwned Passwords lets you check whether a specific password has been seen in any breach without sending the password (k-anonymity via SHA-1 prefix).
Dehashed — paid, searchable index of actual credential content
IntelligenceX / IntelX — indexed breach and leak content, darknet sources
LeakCheck / Snusbase / LeakPeek — commercial breach databases
Breach-parse / h8mail — local tools for searching personal breach archives
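
The k-anonymity scheme behind Pwned Passwords works by sending only the first five hex characters of the SHA-1 hash and matching the remainder locally. The sketch below assumes the public range endpoint and its `SUFFIX:COUNT` line format; it computes the pieces and parses a saved response body, and does not perform the request itself.

```python
import hashlib


def split_sha1(password: str) -> tuple:
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def range_url(prefix: str) -> str:
    """Only the prefix leaves the machine; the full hash never does."""
    return f"https://api.pwnedpasswords.com/range/{prefix}"


def count_in_range(range_body: str, suffix: str) -> int:
    """Scan a range response ('SUFFIX:COUNT' per line) for our suffix."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


if __name__ == "__main__":
    prefix, suffix = split_sha1("password")
    print(range_url(prefix), suffix)
```

Because every prefix maps to hundreds of suffixes, the service cannot tell which password was being checked.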

Operational notes

HIBP tells you an email was in Collection #1, but not the password. Commercial services expose the credential content itself; use it only when the engagement's rules and ethics allow.
“Sensitive” breach flags (Ashley Madison, etc.) require judgment — referencing them in a client deliverable is frequently inappropriate even when technically accurate.
Breach data ages: a password from 2013 is probably not current, but hints at password patterns and reveals services the user has engaged with.
Pastes live and die quickly. If a paste URL 404s, check the Wayback Machine and any surviving search-engine cache immediately.

Paste sites and dumps

Pastebin — classic source, still productive
Ghostbin / Hastebin / Rentry / Privatebin — newer alternatives
GitHub Gist — frequently overlooked; indexed by Google (site:gist.github.com)
Telegram channels — many dump channels operate exclusively on Telegram
Darknet forums — BreachForums, XSS, Exploit — require careful opsec

10. Metadata Extraction

Documents, images, and files published by a target frequently leak internal usernames, software versions, file paths, and timestamps.

Document metadata

exiftool — the canonical CLI tool; handles EXIF, XMP, IPTC, PDF metadata, Office documents
FOCA (Fingerprinting Organizations with Collected Archives) — downloads documents from a target domain, extracts metadata in bulk, builds org charts from author fields
metagoofil — FOCA-alike in Python, uses Google/Bing to find documents by filetype on a target domain
PDFiD / peepdf — PDF internals inspection
oletools — OLE/Office document internals
mat2 — metadata anonymization tool; useful for understanding what it strips and therefore what is leaked

Google dorks for document hunts

site:example.com filetype:pdf
site:example.com filetype:xlsx
site:example.com filetype:docx
site:example.com ext:doc OR ext:docx OR ext:xls OR ext:xlsx
site:example.com "for internal use only"
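
When the same hunt is repeated across many domains, the queries above are easy to generate. This is a small sketch, with a hypothetical helper name and an arbitrary default extension list, that builds one query per file type plus a combined OR query.

```python
def document_dorks(domain, extensions=("pdf", "docx", "xlsx", "pptx")):
    """Build one site-restricted query per file extension, plus a combined OR query."""
    per_type = [f"site:{domain} filetype:{ext}" for ext in extensions]
    combined = f"site:{domain} " + " OR ".join(f"ext:{ext}" for ext in extensions)
    return per_type + [combined]

for query in document_dorks("example.com"):
    print(query)
```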

What metadata reveals

Field	Leak
Author	Internal username (often the domain login)
Creation software	Microsoft Office 2016, LibreOffice 7.4 — software inventory
Last modified by	Another internal user
Printer	Printer model and possibly IP
Revision history	Earlier drafts, collaborators
Embedded images	Secondary EXIF data
Hyperlinks	Internal SharePoint/intranet URLs
File paths	`C:\Users\jdoe\Documents\...` reveals username
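
Several of the fields above can be read without any special tooling, because .docx, .xlsx, and .pptx files are ZIP archives that keep author data in docProps/core.xml. The sketch below pulls the author, last editor, and timestamps using only the standard library. The function name and the returned keys are illustrative, and exiftool remains the more complete option.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def core_properties(docx_bytes):
    """Return author, last editor and timestamps from a .docx/.xlsx/.pptx file."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    fields = {
        "author": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
    }
    result = {}
    for key, path in fields.items():
        node = root.find(path, NS)
        result[key] = node.text if node is not None else None
    return result
```

Feeding it the bytes of each downloaded document and tallying the author values gives a quick first pass at internal usernames.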

11. Code & Repository OSINT

Source code hosting platforms are a gold mine. Every commit is a historical record, and secrets leak constantly.

GitHub search techniques

Surface leaked credentials and sensitive content:

org:acmecorp password
org:acmecorp apikey
"@acmecorp.com" password
filename:.env acmecorp
filename:config.yml acmecorp
"BEGIN RSA PRIVATE KEY" acmecorp
extension:sql acmecorp INSERT INTO users

Note that GitHub’s secret scanning reports many leaked tokens to the issuing provider, which often revokes them, so old dumps may hold stale credentials. They are still useful for mapping which services the target uses.

Tools

gitleaks — scans repos and Git history for secret patterns
trufflehog — detector-based secret scanning with optional live verification of found credentials (earlier versions relied on entropy); supports GitHub org scanning
git-secrets — AWS Labs tool; primarily for preventing commits but usable for audit
gitrob — catalogs secrets across an organization’s public repos
github-dorks / gh-dork — curated dork lists
GitHound / GitMiner — deep search across public GitHub
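
Under the hood, most of these scanners start from pattern matching. The sketch below is a deliberately tiny version of that idea: three illustrative regexes applied line by line. The pattern set and names are assumptions chosen for the example and are nowhere near the coverage of gitleaks or trufflehog.

```python
import re

# A small, illustrative pattern set; real scanners ship hundreds of detectors.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"(?i)\b(?:password|passwd|api[_-]?key|secret)\b\s*[:=]\s*['\"]([^'\"]{8,})['\"]"
    ),
}

def scan_text(text):
    """Return (line_number, pattern_name) for each line matching a pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits
```

Expect false positives from the generic pattern; treat every hit as a lead to review by hand.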

Pivoting from a single repo

Commit metadata — author email, name, timestamps (working hours)
.github/CODEOWNERS — team structure
Issue comments — internal tool names, vendors, ticket systems
PR reviewers — collaboration networks
Starred/forked repos — interests, technology exposure
GitHub Pages — hosted sites under <user>.github.io often have separate content
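
Commit timestamps are the easiest of these pivots to quantify. As a rough sketch, the function below takes the output of git log --format='%ae|%aI' (author email and ISO 8601 author date) and counts commits per local hour and per UTC offset. The function name is illustrative, and the result is a hint about working hours and timezone, not proof.

```python
from collections import Counter
from datetime import datetime

def commit_hour_histogram(git_log_output):
    """Count commits per local hour from `git log --format='%ae|%aI'` output.

    %ae is the author email and %aI the strict ISO 8601 author date,
    which keeps the author's own UTC offset.
    """
    hours = Counter()
    offsets = Counter()
    for line in git_log_output.splitlines():
        if "|" not in line:
            continue
        _email, stamp = line.rsplit("|", 1)
        when = datetime.fromisoformat(stamp.strip())
        hours[when.hour] += 1
        offsets[when.strftime("%z")] += 1
    return hours, offsets
```

A cluster of commits between 09:00 and 18:00 at a single offset suggests an office schedule in that timezone; scattered offsets suggest travel or several contributors behind one identity.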

Beyond GitHub

GitLab.com — same techniques, smaller dork coverage
Bitbucket — less searchable but still scannable
Self-hosted instances — Gitea, Forgejo, cgit — find via Shodan (http.title:"Gitea")
DockerHub — images often ship with embedded secrets or leaked file paths
npm / PyPI / crates.io — package authors, private package mentions in public packages

12. Dark Web & Threat Intel

Aggregators and commercial platforms fold darknet content, malware telemetry, and threat actor intelligence into the OSINT pipeline.

Platforms

Intel 471 — cybercriminal forum and actor intelligence
Recorded Future — broad threat intel with OSINT and closed-source blend
CloudSEK (XVigil) — external threat monitoring, brand exposure, dark web
Flashpoint — illicit community monitoring
DarkOwl — darknet content search
ShadowDragon (SocialNet, etc.) — investigative toolkits with 200+ data sources integrated
ZeroFox — brand protection, social and dark web
Digital Shadows / ReliaQuest — digital risk protection
Maltego + Transform Hub — glue for integrating many of the above

Threat intel feeds

MISP — open-source threat intelligence sharing platform
AlienVault OTX — free community threat exchange
abuse.ch (URLhaus, MalwareBazaar, ThreatFox, Feodo Tracker) — free high-quality IoC feeds
VirusTotal Intelligence — paid search over submitted samples, URLs, domains
GreyNoise — distinguishes targeted scans from internet background noise

13. IoT & Device Discovery

Specialized search for internet-connected devices and sensors, from industrial control systems to smart home devices.

Shodan — still the best for ICS/SCADA (port:502, port:102, category:ics)
Censys — complementary coverage
ZoomEye — strong APAC IoT coverage
Thingful — the “search engine for the Internet of Things” — aggregates public IoT sensor data (air quality, weather, energy, transport) across millions of devices globally, suitable for environmental research and urban analytics
Kamerka — geolocation-focused ICS/IoT scanner using Shodan/BinaryEdge data
Insecam — lists public webcams (many with default credentials)

These tools are powerful for researchers mapping exposure and for defenders cataloging their own attack surface. They are equally abused by attackers — defenders should track their own presence in them.

14. Tools Reference

A consolidated lookup of the tools practitioners reach for. The overlap between “OSINT tool” and “recon tool” is large; most of these appear repeatedly in the source surveys.

Frameworks and aggregators

Tool	Purpose
Maltego	Graph-based link analysis, Transform Hub with 70+ data sources, the standard for investigations that must produce a visual link chart
SpiderFoot	Automated OSINT framework, 200+ modules, web UI, runs scheduled scans, correlates findings
recon-ng	Framework with Metasploit-style module system for recon workflows
theHarvester	Email, subdomain, employee name enumeration from search engines and PGP servers
OSINT Framework (osintframework.com)	Curated web directory of tools by category; not a scanner, but the best starting map of the ecosystem
IntelTechniques (OSINT Techniques)	Michael Bazzell’s methodology and tool collection

Maltego

Model: graph of entities (Person, Domain, IP, Email, etc.) connected by relationships. Transforms run against an entity to produce related entities.
Data sources: the Transform Hub integrates DomainTools, Shodan, Pipl, OpenCorporates, Censys, Have I Been Pwned, Vetric, Netlas, IBM Watson, and many more. Many are paid.
Use cases: person of interest investigations, corporate link analysis, threat actor attribution, fraud networks.
Typical workflow: seed with names, domains, or emails → run passive Transforms → pivot on interesting results → prune noise → export as report or visual graph. A complete person investigation can move from a name to Wikipedia to personal website to historical WHOIS to personal email to person profile (age, relatives) in a handful of Transform runs.

SpiderFoot

Model: modular scanner with 200+ modules, each tapping a specific data source. Configure the target and scan profile, run, review.
Data sources: Shodan, VirusTotal, HIBP, SecurityTrails, HackerTarget, crt.sh, Censys, IntelX, and many more (some require API keys).
Use cases: baseline external exposure audit, continuous monitoring, bug bounty asset discovery, threat investigation.
Strengths: fire-and-forget automation, depth of coverage, built-in correlation rules that highlight interesting findings across modules.

theHarvester

Model: CLI tool that queries search engines, DNS sources, and PGP key servers for emails, subdomains, IPs, and employee names.
Sources: Google, Bing, DuckDuckGo, LinkedIn, Baidu, crt.sh, Shodan, Censys, and many more.
Typical invocation: theHarvester -d example.com -l 500 -b all
Strengths: simple, scriptable, pairs well with automation pipelines.

recon-ng

Model: Metasploit-style framework (workspaces, modules, options). Modules fetch specific data types into a workspace database.
Strengths: good persistence of results across sessions, scriptable, reasonable module coverage for core recon tasks.
Typical flow: workspaces create acme → add seed domains → run recon/domains-hosts/* modules → export.

Sherlock

Purpose: username enumeration across 300+ social sites. Typical invocation: sherlock jdoe when installed as a package; older source checkouts used python3 sherlock jdoe.
Strengths: fast, easy, no API keys. Good for alias discovery.
Caveats: false positives on generic 200 responses; validate manually.

Shodan

Purpose: search engine over internet-connected service banners. Queries scan data, not live services.
Filters: port:, product:, version:, org:, hostname:, country:, vuln:, category:, ssl:, http.title:, http.html:
CLI: shodan host 1.2.3.4, shodan search 'apache country:US', shodan download, shodan parse
Best for: attack surface snapshots, finding forgotten assets, identifying vulnerable software at scale.

Censys

Purpose: internet-wide scan data with particular strength in TLS certificates and precise field queries.
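Query language: Lucene-style with parsed fields, for example services.service_name: "HTTP" on the hosts index or parsed.names: example.com on the certificates index. These field names follow the legacy Search schema and differ on the newer Censys Platform.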
Strengths: certificate history, subdomain discovery via cert parsed names, strong API.
Free tier: 250 web searches/month; API access requires a paid plan.

Specialized tools referenced across the surveys

Category	Tools
Subdomain enum	Subfinder, Amass, Assetfinder, chaos-client, Findomain, Sublist3r
HTTP probing	httpx, aquatone, gowitness, EyeWitness
URL discovery	waybackurls, gau, katana, hakrawler, gospider
Port scanning	Nmap, Masscan, RustScan, naabu
Content discovery	ffuf, gobuster, feroxbuster, dirsearch
Email hunting	Hunter.io, theHarvester, phonebook.cz, Clearbit, Skymem
Username hunting	Sherlock, WhatsMyName, Maigret, Namechk, Holehe
Image search	Google Lens, Yandex, TinEye, PimEyes
Metadata	exiftool, FOCA, metagoofil, mat2
Phone	PhoneInfoga
Breach	HaveIBeenPwned, Dehashed, h8mail, IntelX
Geolocation	SunCalc, Overpass Turbo, Mapillary, GeoGuessr GPT
Visualization	Maltego, Gephi, yEd
IoT	Shodan, Censys, Thingful, Kamerka
Dark web	IntelX, DarkOwl, Ahmia
Continuous monitoring	SpiderFoot, Recon-ng, custom crons, ShadowDragon

Commercial platforms

The surveys repeatedly reference a cluster of commercial OSINT/threat intel platforms for enterprise use: Maltego, ShadowDragon, Recorded Future, Intel 471, Flashpoint, CloudSEK XVigil, ZeroFox, DarkOwl, SpiderFoot HX, Babel Street, Dataminr, Palantir Gotham. These bundle data access, analyst tooling, and curated feeds at cost. Free alternatives exist for most individual capabilities; the commercial value is integration, freshness, and support.

15. Automation & Visualization

Manual OSINT is unsustainable past a few targets. Automation and visualization amplify the analyst.

Automation patterns

Scripts orchestrating free tools — a shell script that runs subfinder → httpx → nuclei → slack notify gives continuous monitoring on a cron
Recon-ng workspaces — persistent state across sessions
SpiderFoot scans — scheduled or triggered by webhook
Custom Python pipelines — requests, beautifulsoup, platform APIs, networkx for graphs
Jupyter notebooks — for exploratory analysis with inline visualization

One practitioner-authored pipeline (ODIN) strings together WHOIS, reverse WHOIS, subdomain discovery, DNS records, Shodan, RDAP, email harvesting, breach lookups, paste searches, and bucket hunting into a single run against a target name and primary domain, producing a structured report. The underlying techniques are the ones in this guide; the automation just glues them together.
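
The shell-script pattern can also be expressed in Python when you want error handling and structured output. The sketch below chains commands by feeding each stage's stdout into the next stage's stdin. The stage list assumes subfinder and ProjectDiscovery's httpx are installed and that the flags shown behave as documented; swap in whatever tools you actually run.

```python
import subprocess

def run_pipeline(stages, initial_input=""):
    """Run each command in turn, feeding one stage's stdout to the next stage's stdin."""
    data = initial_input
    for command in stages:
        completed = subprocess.run(
            command, input=data, capture_output=True, text=True, check=True
        )
        data = completed.stdout
    return data.splitlines()

# Assumed invocations: `subfinder -d <domain> -silent` prints one subdomain per line,
# and `httpx -silent` probes the hosts it reads from stdin.
RECON_STAGES = [
    ["subfinder", "-d", "example.com", "-silent"],
    ["httpx", "-silent"],
]
# live_hosts = run_pipeline(RECON_STAGES)  # uncomment once the tools are installed
```

Because check=True raises on a non-zero exit, a broken stage stops the run instead of silently producing an empty report.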

Visualization

Human eyes excel at spotting patterns in graphs that are invisible in tables.

Maltego — the reference tool for investigative link analysis
Gephi — open-source network visualization for large graphs
yEd — free diagramming with auto-layout for medium graphs
Neo4j — graph database for queryable link analysis at scale
D3.js / vis.js / cytoscape.js — web-based custom visualizations
Kibana / Grafana — dashboards for continuous OSINT feeds
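
Getting findings into one of these tools is mostly a matter of writing an edge list. As a minimal sketch, the function below serializes relationships to CSV with Source and Target columns, a layout Gephi's spreadsheet import accepts. The sample findings and the relationship labels are hypothetical.

```python
import csv
import io

def edges_to_csv(edges):
    """Serialize (source, target, relationship) tuples as a Source/Target edge list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Label"])
    for source, target, relation in sorted(set(edges)):
        writer.writerow([source, target, relation])
    return buffer.getvalue()

# Hypothetical findings from an investigation.
findings = [
    ("jdoe@example.com", "example.com", "email_domain"),
    ("example.com", "203.0.113.10", "resolves_to"),
    ("jdoe@example.com", "example.com", "email_domain"),  # duplicate, dropped
]
print(edges_to_csv(findings))
```

Deduplicating before export keeps edge weights honest when the same relationship is rediscovered by several tools.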

16. Cloud & Modern Infrastructure Intelligence

Modern infrastructure spans multiple cloud providers, microservices architectures, and API ecosystems. Traditional network reconnaissance must evolve to address containerized applications, serverless functions, and cloud-native technologies.

Cloud Asset Discovery 2026

Multi-Cloud Enumeration:

AWS — S3 buckets, CloudFront distributions, API Gateway endpoints, Lambda function URLs
Azure — Blob storage, Azure Functions, Application Gateway, API Management
Google Cloud — Cloud Storage buckets, Cloud Functions, Cloud Run services, API Gateway
Specialized clouds — DigitalOcean Spaces, Vultr Object Storage, Linode Object Storage

Advanced Cloud OSINT Techniques:

DNS CNAME analysis — cloud service providers reveal architecture through DNS records
TLS certificate enumeration — cloud load balancer certificates expose internal service names
API endpoint discovery — GraphQL introspection, REST API documentation leaks
Container registry scanning — public Docker Hub, Quay.io, ECR repositories

Tools for Cloud Intelligence:

cloud_enum — multi-cloud asset discovery tool
CloudMapper — AWS security analysis and visualization
ScoutSuite — multi-cloud security auditing
Pacu — AWS exploitation framework (for authorized testing)
Prowler — cloud security best practices scanner

API & Microservices Intelligence

Modern applications expose intelligence through API endpoints, often with insufficient access controls.

API Discovery Methods:

Documentation leaks — Swagger/OpenAPI specs in public repositories
GraphQL introspection — enabled by default in many implementations
REST API enumeration — common endpoints, version discovery
Webhook analysis — third-party integrations reveal internal architecture
Mobile app reverse engineering — APK/IPA analysis for API endpoints

API Testing for OSINT:

# GraphQL introspection
curl -X POST -H "Content-Type: application/json" \
  -d '{"query":"query IntrospectionQuery { __schema { types { name } } }"}' \
  https://target.com/graphql

# REST API discovery
ffuf -w api-wordlist.txt -u https://target.com/api/FUZZ

# Swagger documentation discovery
curl https://target.com/swagger.json
curl https://target.com/v1/swagger.json
curl https://target.com/api/docs

Container & DevOps Intelligence

Containerized applications leave traces across registries, orchestration platforms, and CI/CD systems.

Container Registry Analysis:

Docker Hub — public repositories reveal internal project names and configurations
Quay.io — Red Hat’s container registry
GitHub Container Registry — packages linked to repositories
ECR/ACR/GCR — cloud-specific registries sometimes publicly accessible

DevOps Pipeline Intelligence:

CI/CD artifacts — build logs, deployment scripts, environment variables
Infrastructure as Code — Terraform, CloudFormation, Kubernetes manifests
Secret management — HashiCorp Vault, AWS Secrets Manager exposure
Monitoring endpoints — Prometheus, Grafana, ELK stack dashboards

Kubernetes OSINT:

Service discovery — DNS enumeration for cluster services
Ingress analysis — public-facing service mapping
ConfigMap/Secret enumeration — exposed configuration data
Pod security context analysis — privilege escalation vectors

17. Blockchain & Financial Intelligence 2026

Cryptocurrency and blockchain technology have matured into critical intelligence domains. Modern financial investigations require understanding of DeFi protocols, NFT ecosystems, and privacy-preserving cryptocurrencies.

Blockchain Analysis Fundamentals

Core Concepts:

Address clustering — grouping addresses controlled by the same entity
Transaction flow analysis — following funds through multiple transactions
Exchange attribution — identifying centralized exchange addresses
Mixing service detection — identifying privacy-enhancing transaction patterns
Smart contract analysis — understanding DeFi protocol interactions

Major Blockchain Analysis Platforms:

Chainalysis — professional blockchain analytics (law enforcement/compliance focus)
Elliptic — crypto compliance and investigation platform
CipherTrace — cryptocurrency AML and investigation tools
Crystal Blockchain — transaction monitoring and investigation
OXT — Bitcoin transaction analysis and privacy research

DeFi & Smart Contract Intelligence

Decentralized Finance (DeFi) protocols create complex financial relationships visible on-chain but requiring specialized analysis.

DeFi Investigation Techniques:

Liquidity pool analysis — tracking funds in Uniswap, SushiSwap, Curve pools
Yield farming tracking — following assets through lending protocols
DAO governance participation — voting patterns reveal stakeholder relationships
Flash loan analysis — identifying sophisticated financial attacks
Cross-chain bridge monitoring — tracking assets between blockchains

Smart Contract OSINT:

// Contract verification on Etherscan reveals:
// - Source code and comments
// - Constructor parameters
// - Transaction history
// - Token transfers and interactions

Tools for DeFi Analysis:

Dune Analytics — blockchain data queries and dashboards
Nansen — on-chain analytics with entity labeling
DeBank — DeFi portfolio and protocol tracking
Zerion — DeFi asset tracking across protocols
Token Sniffer — smart contract security analysis

Privacy Coin & Mixer Analysis

Privacy-focused cryptocurrencies and mixing services require specialized investigation techniques.

Privacy Coin Challenges:

Monero (XMR) — ring signatures, stealth addresses, RingCT
Zcash (ZEC) — zk-SNARKs, shielded transactions
Dash — CoinJoin implementation, masternodes
Beam/Grin — Mimblewimble protocol

Mixer & Tumbler Analysis:

Bitcoin mixers — CoinJoin, Wasabi Wallet, Samourai Whirlpool
Ethereum mixers — Tornado Cash, Aztec Protocol
Cross-chain mixers — THORChain, Secret Network bridges
Pattern analysis — timing, amounts, address reuse

Investigation Techniques:

Input/output analysis — correlating mixer inputs with outputs
Timing correlation — deposit/withdrawal pattern analysis
Amount correlation — unique transaction amounts through mixers
Change address analysis — non-mixed outputs reveal identity
Network analysis — IP addresses, Tor exit nodes
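
Timing and amount correlation can be sketched in a few lines. The function below pairs each deposit with later withdrawals of near-equal value inside a time window. The record layout, the 24-hour window, and the tolerance are assumptions for illustration. Fixed-denomination mixers defeat the amount check by design, so the output is a list of leads, not attribution.

```python
from datetime import datetime, timedelta

def correlate_mixer_flows(deposits, withdrawals, window_hours=24, tolerance=0.001):
    """Pair deposits with later withdrawals of near-equal amount inside a time window.

    Each record is (tx_id, amount, timestamp). Returns candidate
    (deposit_id, withdrawal_id) pairs; these are leads, not proof of a link.
    """
    window = timedelta(hours=window_hours)
    candidates = []
    for dep_id, dep_amount, dep_time in deposits:
        for wd_id, wd_amount, wd_time in withdrawals:
            delay = wd_time - dep_time
            close_in_time = timedelta(0) < delay <= window
            close_in_value = abs(dep_amount - wd_amount) <= tolerance
            if close_in_time and close_in_value:
                candidates.append((dep_id, wd_id))
    return candidates
```

In practice the tolerance has to account for mixer fees, and several candidates per deposit are common; each pair still needs corroboration from other signals.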

Cryptocurrency Threat Intelligence

Blockchain data provides unique intelligence for cybercrime investigation and threat actor tracking.

Ransomware Payment Tracking:

Payment address monitoring — tracking ransom payments to known groups
Infrastructure correlation — linking payment addresses to command & control
Attribution through payments — identifying affiliates and infrastructure providers
Recovery operations — coordinating with exchanges for asset freezing

Scam & Fraud Detection:

Ponzi scheme patterns — payment structures reveal fraudulent operations
Exit scam prediction — liquidity drainage patterns in DeFi protocols
Phishing address monitoring — detecting credential theft operations
Social engineering campaigns — crypto-based advance fee fraud

18. AI-Assisted OSINT 2026

AI integration has fundamentally transformed OSINT workflows in 2026. Modern practitioners leverage large language models, computer vision, and automated analysis pipelines to process vast amounts of data while maintaining human verification standards.

Where AI helps

Image analysis — a multimodal model can enumerate visible clues (signage, architecture, vegetation, vehicles) and propose geolocations in seconds
Document summarization — long PDFs, financial filings, court documents
Translation and transliteration — foreign-language sources at scale
Link extraction — pulling structured entities (names, dates, orgs) from unstructured text
Writing style analysis — comparing two corpora for likely authorship
Code understanding — interpreting obfuscated JS, reverse engineering APIs
Query generation — proposing Google dorks, Shodan filters, or Censys queries from natural-language intent

Where AI fails

Hallucinated facts — models confidently fabricate names, dates, and attributions
Stale training data — nothing past the cutoff
Confirmation bias — will happily pretend to “find” what you ask for
Source attribution — outputs typically lack provenance

The rule: AI outputs are hypotheses. Every claim must be independently verified against a primary source before it enters a deliverable.

Specific tools and workflows

GeoGuessr GPT and similar custom GPTs — image geolocation first-pass
ChatGPT / Claude with vision — general image and document analysis
Recon agents — emerging autonomous agents that chain passive recon tools (early stage; reliability is poor)
AI-powered dark web monitoring — vendors offering semantic search over crawled forum content
AI entity extraction (IBM Watson NLU, spaCy, transformer-based NER) — scalable entity extraction from corpora

2026 AI Integration Advances

Multimodal Analysis Workflows:

Image geolocation with custom GPTs — tools like GeoGuessr GPT synthesize architectural, vegetation, and signage cues to propose locations
Video content analysis — frame-by-frame analysis for facial recognition, scene understanding, and temporal pattern detection
Audio processing — voice identification, accent analysis, background noise geolocation
Document intelligence — automated extraction of entities, relationships, and anomalies from large document corpora

AI-Powered Correlation Engines:

Cross-platform entity linking — automated identification of the same person across multiple social platforms
Behavioral pattern recognition — identifying sockpuppet accounts through writing style and interaction patterns
Network analysis — AI-driven identification of influence networks and coordination patterns
Threat actor attribution — correlating tactics, techniques, and procedures across campaigns

Commercial AI-OSINT Platforms (2026):

Maltego AI Transforms — natural language queries converted to graph operations
SpiderFoot AI — automated correlation and anomaly detection across 200+ data sources
ShadowDragon AI — behavioral analysis and entity resolution across social platforms
Recorded Future NLP — threat intelligence extraction from unstructured sources
Intel 471 AI — dark web content analysis and threat actor tracking

Emerging AI-OSINT Tools (command shapes are illustrative; treat the tool names and flags as placeholders and confirm them against whatever tooling you actually install):

# AI-powered subdomain discovery
chaos-recon -d example.com --ai-analysis

# LLM-assisted Google dorking
osint-gpt "find exposed documents for [company]"

# AI image analysis for GEOINT
geolocation-ai image.jpg --confidence-threshold 0.8

# Automated social media correlation
socmint-correlator --target "john_doe" --platforms all --ai-clustering

AI Ethics & Verification Standards

Verification Protocols:

Primary source confirmation — every AI-generated lead must be verified against original sources
Confidence scoring — assign reliability scores to AI outputs (1-10 scale)
Human-in-the-loop validation — critical decisions require human analyst approval
Audit trails — document AI tool usage and decision points for legal proceedings
Bias awareness — understand training data limitations and cultural biases

AI Failure Modes in OSINT:

Hallucination — models assert unsupported facts; cross-reference AI claims with multiple independent sources
Temporal staleness — outputs may be out of date; verify information currency, especially for rapidly changing situations
Cultural context — AI may misinterpret region-specific social cues and communication patterns
Privacy boundary confusion — AI may not distinguish between public and private information appropriately

19. Anti-Detection & Privacy Evasion

Modern targets employ sophisticated counter-surveillance measures. OSINT practitioners must understand both detection mechanisms and evasion techniques to maintain operational security while gathering intelligence.

Attribution Avoidance 2026

Advanced Browser Fingerprinting Defenses:

Canvas fingerprint randomization — tools like FingerprintSwitcher alter canvas rendering
WebGL spoofing — GPU fingerprint modification to avoid device identification
Font enumeration protection — limiting font list exposure to reduce uniqueness
Screen resolution spoofing — randomizing reported screen dimensions
Timezone manipulation — masking location through timezone randomization

Network-Level Anti-Detection:

Residential proxy networks — services like Bright Data, Oxylabs for legitimate IP rotation
Mobile carrier proxies — 4G/5G connections to simulate mobile device access
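Tor with additional layers — a VPN in front of Tor hides Tor use from the local network; stacking further hops (VPN → Tor → VPN) adds complexity and another provider that can be tied to you, so use it only with a clear threat model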
DNS over HTTPS/TLS — encrypted DNS to prevent ISP monitoring
Traffic pattern normalization — human-like timing and interaction patterns

Platform-Specific Evasion:

Platform	Detection Method	Evasion Technique
LinkedIn	Login pattern analysis	Gradual engagement, authentic session times
Facebook	Device fingerprinting	Mobile app simulation, varied access patterns
Instagram	API rate limiting	Multiple authenticated accounts, distributed collection
Twitter/X	Behavioral analysis	Human-like interaction patterns, content engagement
TikTok	Device binding	Mobile emulation, app store download simulation

Counter-Intelligence Awareness

Indicators of Target Awareness:

Honeypot content — deliberately planted false information to detect collection
Access pattern changes — sudden privacy setting modifications across platforms
Canary tokens — embedded tracking pixels in documents or profiles
Legal threats — cease and desist letters indicating detected investigation
Technical countermeasures — IP blocking, CAPTCHA implementation, rate limiting

Operational Security Failures:

Account linking — using same recovery email across sock puppet accounts
Timing correlation — consistent daily access patterns revealing timezone
Payment attribution — subscription payments linking to real identity
Social graph exposure — accidentally connecting sock puppet to real social network
Metadata leakage — device fingerprints, location data in uploaded content

20. Continuous Monitoring & Threat Hunting

Passive one-time collection has evolved into continuous intelligence operations. Modern OSINT practitioners implement persistent monitoring systems that detect changes and emerging threats automatically.

Automated Collection Pipelines

Infrastructure Monitoring:

Subdomain discovery automation — daily runs of subfinder, amass, and certificate transparency monitoring
Port scan automation — scheduled Nmap/Masscan against discovered assets
Web application monitoring — httpx probing with screenshot capture for visual changes
DNS monitoring — tracking record changes, new subdomains, certificate updates
Cloud asset monitoring — S3 bucket enumeration, cloud storage discovery
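
The core of any of these monitors is a diff between yesterday's snapshot and today's. A minimal sketch, assuming each snapshot is just an iterable of hostnames (for example the lines of a subfinder output file); the function name and return shape are illustrative.

```python
def diff_snapshots(previous, current):
    """Compare two sets of hostnames and report what appeared and what vanished."""
    prev = {host.strip().lower() for host in previous if host.strip()}
    curr = {host.strip().lower() for host in current if host.strip()}
    return {"added": sorted(curr - prev), "removed": sorted(prev - curr)}

# Hypothetical snapshots from two consecutive daily runs.
yesterday = ["www.example.com", "mail.example.com", "old.example.com"]
today = ["www.example.com", "mail.example.com", "staging.example.com"]
print(diff_snapshots(yesterday, today))
```

Alerting only on the "added" list keeps the noise down; the "removed" list is worth a weekly look for decommissioned assets that may have left dangling DNS records.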

Social Media Monitoring Frameworks (the tool names and flags below are illustrative placeholders; substitute the tooling you actually run):

# Automated social media monitoring
socialscan-monitor --target "company_name" \
  --platforms twitter,linkedin,instagram,tiktok \
  --keywords "data breach,security incident,insider threat" \
  --alert-webhook https://alerts.company.com/webhook

# Continuous username monitoring
sherlock-monitor --usernames user_list.txt \
  --new-platforms-only \
  --notification slack://webhook_url

# Brand mention tracking
mention-tracker --brand "AcmeCorp" \
  --sentiment-analysis \
  --geographic-clustering \
  --alert-threshold negative

Dark Web & Breach Monitoring:

Paste site monitoring — automated scanning of Pastebin, Ghostbin, Hastebin
Dark web forum tracking — monitoring threat actor forums for organization mentions
Credential monitoring — automated breach database queries for employee emails
Ransomware tracking — monitoring leak sites for organization data
Marketplace surveillance — tracking sale of organizational data or access

Threat Intelligence Integration

MISP Integration for OSINT:

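# OSINT → MISP integration example (PyMISP)
from pymisp import PyMISP, MISPEvent

def create_osint_event(misp_url, misp_key, domain, findings):
    # Connect with the instance URL and an API key; keep TLS verification on
    misp = PyMISP(misp_url, misp_key, ssl=True)

    # Build the event locally before sending it
    event = MISPEvent()
    event.info = f"OSINT findings for {domain}"
    event.distribution = 0      # Your organisation only
    event.threat_level_id = 2   # Medium
    event.analysis = 0          # Initial

    # Add discovered subdomains as attributes
    for subdomain in findings['subdomains']:
        event.add_attribute(
            'hostname',
            subdomain,
            comment="Discovered via automated OSINT pipeline"
        )

    # Push the event to the MISP instance and return the created event
    return misp.add_event(event, pythonify=True)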

Automated Correlation & Analysis:

IOC enrichment — automatic lookup of discovered indicators in threat intelligence feeds
Attribution scoring — machine learning models for threat actor correlation
Campaign tracking — linking infrastructure across multiple investigations
Predictive analysis — identifying likely future targets based on infrastructure patterns

Continuous OSINT Operations

Operational Frameworks:

Collection automation — scheduled data gathering from all configured sources
Processing pipelines — normalize and deduplicate collected intelligence
Analysis automation — ML-driven pattern recognition and anomaly detection
Alerting systems — configurable notifications for high-priority findings
Response integration — automatic ticket creation and team notifications

Metrics & KPIs for OSINT Programs:

Coverage metrics — percentage of digital footprint under monitoring
Detection time — time from exposure to discovery and alerting
False positive rates — accuracy of automated detection systems
Attribution confidence — reliability of threat actor identification
Response time — speed of investigation team response to alerts
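
Two of these metrics reduce to simple arithmetic once findings are logged consistently. The sketch below computes median hours from exposure to detection and the share of alerts that turned out to be false positives. The record fields and result keys are hypothetical; adapt them to whatever your ticketing system exports.

```python
from datetime import datetime
from statistics import median

def detection_metrics(findings):
    """Summarize findings given as dicts with 'exposed_at', 'detected_at' and 'true_positive'."""
    delays = [
        (f["detected_at"] - f["exposed_at"]).total_seconds() / 3600
        for f in findings
        if f["true_positive"]
    ]
    false_alerts = sum(1 for f in findings if not f["true_positive"])
    return {
        "median_hours_to_detect": median(delays) if delays else None,
        "false_alert_share": false_alerts / len(findings) if findings else None,
    }
```

The median is used instead of the mean so that one exposure that sat unnoticed for months does not hide an otherwise fast pipeline.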

21. Operational Security

OSINT is only passive if you do it right. Sloppy operators leak as much as they collect. Whether you are a defender running recon on your own company, an investigator looking into hostile actors, or a researcher probing sensitive communities, the target should never learn you were looking.

Attribution risks

IP address — the target’s analytics and logs capture it
User-Agent — fingerprints browser, OS, sometimes tool
Account identity — logging into LinkedIn to view a profile attaches your real name
Cookies / localStorage — cross-session tracking
Referer headers — leaks where you clicked from
DNS lookups — your ISP sees every domain you resolve
Browser fingerprint — canvas, fonts, screen size, timezone
TLS JA3/JA4 — tooling-specific TLS fingerprints
Timing patterns — your working hours reveal your timezone

Layered defenses

Dedicated investigation VM — never mix with personal or work browsing. Keep it disposable (snapshots, revert after every engagement).
Separate OS profile or container — at minimum, a segregated browser profile
VPN or residential proxy — Mullvad, IVPN, Proton VPN, or a commercial residential proxy for sensitive investigations. Know the provider’s logging policy.
Tor — for the most sensitive operations and dark-web access. Never log into personal accounts over Tor.
Burner accounts — sock puppets with their own email, phone (VoIP or burner SIM), aged over time, with plausible background activity
Hardened browser — Firefox with privacy.resistFingerprinting enabled, uBlock Origin, Cookie AutoDelete, NoScript; or Tor Browser; or Brave with strict settings
Screenshot and archive tools with opsec-safe settings — Hunchly is purpose-built for investigators and captures every page automatically, with hash verification
Separate phone / hardware — for investigations where device fingerprinting matters
No personal accounts, ever — a single Google login while “just checking something” burns the entire persona

Sock puppet hygiene

Create accounts well in advance; aged accounts draw less suspicion
Use non-obvious names; avoid giveaway patterns (sequential usernames, shared avatars)
Build plausible activity: followers, posts, reactions over weeks or months
Different sock for different investigations — compartmentalize
Record credentials and backstory in a secure, central store
Never cross-contaminate between sock, work, and personal identities
Accept that sock puppets burn — plan for rotation

Hunchly and investigation capture

Hunchly is one of the few tools in the space purpose-built for investigative OSINT capture. It records every page an investigator visits, preserving exact HTML, screenshots, hashes, and a searchable case database. This solves two perennial problems: (1) reproducibility — you can demonstrate exactly what was on the page when you looked, and (2) note-keeping — the tool captures in real time instead of after the fact. For any investigation that may be scrutinized (legal, regulatory, publication), capture-by-default tooling is essential.
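
The hash-verification idea is simple enough to reproduce for ad hoc captures. The sketch below is not how Hunchly works internally; it is a minimal illustration, with hypothetical function and field names, of recording a SHA-256 digest and a UTC timestamp when content is saved and re-checking the digest later.

```python
import hashlib
from datetime import datetime, timezone

def capture_record(url, content, captured_at=None):
    """Build a manifest entry tying a URL to the SHA-256 of the bytes retrieved."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        "url": url,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "captured_at_utc": captured_at.isoformat(),
    }

def verify_record(record, content):
    """Re-hash stored content and confirm it still matches the manifest entry."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]
```

Storing the manifest separately from the captured files makes it harder to alter both without leaving a mismatch.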

Safe data handling

Treat collected PII as sensitive from the moment it arrives
Encrypt investigation data at rest
Scrub workstations between engagements if commingling is a risk
Understand your deliverable’s exposure — who will see this report, and does it contain information that could re-identify protected sources?
Observe retention limits — delete when no longer needed
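
Retention limits are easier to observe when something lists what has aged out. The sketch below assumes case files live under one directory and that file modification time is an acceptable proxy for collection date; both are assumptions, and `expired_files` is a hypothetical helper. It only reports candidates and deletes nothing.

```python
import time
from pathlib import Path
from typing import List, Optional


def expired_files(root: Path, max_age_days: int,
                  now: Optional[float] = None) -> List[Path]:
    """List regular files under root whose mtime is older than the limit."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Hypothetical case directory; review the list before deleting anything.
    for path in expired_files(Path("./case-data"), max_age_days=90):
        print(path)
```

Keeping the sweep read-only leaves the deletion decision with the investigator, which matters when a legal hold may override the normal retention period.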

22. Legal & Ethical Considerations

OSINT is broadly legal, but the details vary by jurisdiction, and it is ethical only when practiced with judgment.

Legal surface area

Computer Fraud and Abuse Act (US) and similar unauthorized-access laws, such as the UK Computer Misuse Act — passive consumption of public data is generally low-risk; active probing without authorization is not.
GDPR (EU) — applies to processing personal data of people in the EU. Investigators must have a lawful basis; “legitimate interests” often applies, but the balancing assessment behind it must be documented.
Scraping precedent (US) — in hiQ v. LinkedIn the Ninth Circuit held that scraping publicly accessible pages likely does not violate the CFAA, but hiQ later lost on breach-of-contract grounds, so terms-of-service violations can still create civil exposure.
Platform ToS — scraping LinkedIn, Facebook, Instagram commonly violates ToS even if legal. Accounts can be banned; repeat offenders can face lawsuits.
Anti-stalking and harassment laws — aggregating public data about an individual can become unlawful harassment depending on intent and jurisdiction.
Breach data handling — possessing breach data is legal in many jurisdictions but not all; further use (extorting victims, publishing PII) is not.
Export controls — some OSINT tooling is regulated under dual-use export regimes.

Ethical guardrails

Purpose test — can you articulate why you need this intelligence and who benefits?
Proportionality test — is the depth of collection proportional to the stakes?
Harm test — could publishing this information enable stalking, doxing, or physical harm?
Consent test — would the subject reasonably expect this information to be collected and used this way?
Transparency test — could you defend your methodology openly if challenged?

Investigators routinely face situations where the legal answer and the ethical answer diverge. A finding that is legal to discover may be unethical to publish. A technique that is clearly ethical may be restricted by platform ToS. Practitioners who survive long-term in the field develop judgment, not just skills.

Defender’s perspective

Defenders using these techniques against their own organization are on firm legal ground — you have implicit authorization over your own assets. The real risks are:

Accidentally probing a third party — vendors, customers, partners, lookalike domains
Storing personal data of employees — even collected from public sources, it falls under privacy law
Tipping off attackers — noisy recon against your own infrastructure can alert adversaries that you are looking

23. Quick Reference

The five-minute external exposure check

Run this on your own domain periodically:

## Subdomains
subfinder -d example.com -all -silent | tee subs.txt
cat subs.txt | dnsx -silent | tee live.txt
cat live.txt | httpx -silent -title -tech-detect -status-code

## Certificates
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sort -u

## Shodan
shodan search "ssl:example.com" --fields ip_str,port,product,version

## Historical URLs
echo "example.com" | waybackurls | sort -u

## Leaked secrets on GitHub
## Manual: https://github.com/search?q=%22example.com%22+password&type=code
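
The crt.sh step above pipes raw JSON through jq. As a hedged sketch of the same post-processing in Python, the function below parses a saved crt.sh response, splits `name_value` fields that hold several names separated by newlines, strips wildcard prefixes, and keeps only names under the target domain. It works on text you have already downloaded and makes no network calls; `names_from_crtsh` is a hypothetical name, and the sample data is invented for illustration.

```python
import json


def names_from_crtsh(raw_json: str, domain: str) -> list:
    """Extract unique hostnames under `domain` from crt.sh JSON output."""
    domain = domain.lower()
    suffix = "." + domain
    found = set()
    for entry in json.loads(raw_json):
        # One certificate entry can list several names, one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # treat a wildcard as its parent name
            if name == domain or name.endswith(suffix):
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    sample = json.dumps([
        {"name_value": "www.example.com\n*.example.com"},
        {"name_value": "API.example.com"},
        {"name_value": "example.org"},
    ])
    print(names_from_crtsh(sample, "example.com"))
```

Filtering on the domain suffix drops unrelated names that share a certificate with the target, which otherwise inflate the subdomain list.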

Seed-to-report pivot map

NAME ──┬──▶ search engines ──▶ Wikipedia, personal sites
       ├──▶ LinkedIn ──▶ employer, history
       ├──▶ socials ──▶ aliases ──▶ Sherlock ──▶ more platforms
       └──▶ images ──▶ reverse search ──▶ more accounts

EMAIL ─┬──▶ HIBP ──▶ breach list ──▶ services used
       ├──▶ Hunter.io ──▶ company patterns ──▶ more employees
       ├──▶ Gravatar ──▶ profile image
       ├──▶ Google ──▶ forum posts, paste hits
       └──▶ historical WHOIS ──▶ owned domains

DOMAIN ┬──▶ crt.sh ──▶ subdomains
       ├──▶ subfinder/amass ──▶ more subdomains
       ├──▶ whoxy ──▶ reverse WHOIS ──▶ related domains
       ├──▶ Shodan hostname: ──▶ services
       ├──▶ DNS ──▶ MX/TXT ──▶ vendors
       └──▶ wayback ──▶ historical endpoints

IP ────┬──▶ Shodan/Censys ──▶ services, vulns
       ├──▶ RDAP ──▶ owner, netblock
       ├──▶ reverse DNS ──▶ hostnames
       └──▶ bgp.he.net ──▶ ASN ──▶ more IPs

IMAGE ─┬──▶ Google Lens/Yandex/TinEye ──▶ source
       ├──▶ exiftool ──▶ GPS, camera, timestamp
       ├──▶ AI analysis ──▶ location hypothesis
       └──▶ visual clues ──▶ landmark/sign/architecture
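
The pivot map above is effectively a directed graph, and treating it as one makes it possible to record how each lead was reached. The sketch below encodes only the EMAIL branch as an adjacency dictionary and walks it breadth-first; the structure and the `pivot_paths` helper are illustrative assumptions, not part of any tool named in this guide.

```python
from collections import deque

# A subset of the pivot map above; edges are illustrative, not exhaustive.
PIVOTS = {
    "email": ["HIBP", "Hunter.io", "Gravatar", "historical WHOIS"],
    "HIBP": ["breach list"],
    "breach list": ["services used"],
    "Hunter.io": ["company patterns"],
    "company patterns": ["more employees"],
    "Gravatar": ["profile image"],
    "historical WHOIS": ["owned domains"],
}


def pivot_paths(seed: str, graph: dict) -> dict:
    """Breadth-first walk returning the pivot chain that reaches each node."""
    paths = {seed: [seed]}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in paths:  # keep the first (shortest) chain found
                paths[nxt] = paths[node] + [nxt]
                queue.append(nxt)
    return paths


if __name__ == "__main__":
    for target, chain in pivot_paths("email", PIVOTS).items():
        print(" -> ".join(chain))
```

Storing the chain alongside each result gives the report a provenance trail: every finding can be traced back to the seed identifier it came from.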

Common Google dorks

site:target.com filetype:pdf
site:target.com ext:doc OR ext:docx OR ext:xls OR ext:xlsx
site:target.com inurl:admin
site:target.com intitle:"index of"
site:target.com "password" OR "confidential"
site:github.com "target.com"
site:linkedin.com/in "Target Corp"
site:pastebin.com "target.com"
site:s3.amazonaws.com target
"@target.com"
intext:"@target.com" site:pastebin.com
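
When the same dorks are run against many domains, templating them avoids typos. This is a minimal sketch that fills a few of the patterns above for a given domain; the template list and `build_dorks` are hypothetical, and the output is only query strings to paste into a search engine, not an automated search.

```python
DORK_TEMPLATES = [
    "site:{domain} filetype:pdf",
    'site:{domain} intitle:"index of"',
    'site:github.com "{domain}"',
    'site:pastebin.com "{domain}"',
    '"@{domain}"',
]


def build_dorks(domain: str, templates=DORK_TEMPLATES) -> list:
    """Fill each dork template with a normalized domain name."""
    domain = domain.strip().lower()
    return [t.format(domain=domain) for t in templates]


if __name__ == "__main__":
    for query in build_dorks("example.com"):
        print(query)
```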

Common Shodan queries

hostname:target.com
ssl:"target.com"
org:"Target Corp"
port:3389 country:US org:"Target Corp"
http.title:"Login" hostname:target.com
product:"nginx" version:"1.18.0" hostname:target.com
vuln:CVE-2023-1234
has_screenshot:true port:5900
category:ics country:US

Checklist: before declaring recon complete

All known domains and subdomains enumerated from at least three passive sources
Certificate transparency logs checked for last 90 days
Historical WHOIS reviewed for original/hidden contact data
Wayback Machine checked for historical endpoints and scrubbed content
Shodan and Censys both queried for hostname and org
Cloud bucket namespaces checked (S3, Spaces, Azure, GCS)
GitHub/GitLab/Bitbucket searched for leaked secrets and configs
Employee emails and usernames harvested
Key employees’ breach exposure checked
Metadata extracted from published documents
DNS records analyzed for third-party vendors (SPF/MX/CNAME)
Dangling DNS records screened for takeover potential
All findings documented with source URL, timestamp, and confidence level
Raw artifacts archived separately from analysis notes
Opsec review: no personal accounts touched, no direct target interaction beyond what’s documented
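
The checklist item on documenting findings with source URL, timestamp, and confidence level can be enforced by the record format itself. The sketch below is one possible shape, assuming a three-level confidence scale and treating two distinct source URLs as corroboration; the `Finding` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

CONFIDENCE_LEVELS = ("low", "medium", "high")  # assumed scale


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Finding:
    claim: str
    source_urls: List[str]
    confidence: str
    observed_at: str = field(default_factory=_utc_now)

    def __post_init__(self):
        # Reject records that would fail the checklist.
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"confidence must be one of {CONFIDENCE_LEVELS}")
        if not self.source_urls:
            raise ValueError("every finding needs at least one source URL")

    @property
    def corroborated(self) -> bool:
        """True when at least two distinct sources back the claim."""
        return len(set(self.source_urls)) >= 2


if __name__ == "__main__":
    f = Finding(
        claim="mail for example.com is handled by a third-party provider",
        source_urls=["https://example.com/a", "https://example.com/b"],
        confidence="medium",
    )
    print(f.observed_at, f.corroborated)
```

Failing at construction time means an unsourced claim never reaches the report draft, which is cheaper than catching it in review.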

Closing notes

OSINT rewards patience and punishes shortcuts. The tools listed here will all be different in two years — platforms will lock down, APIs will change, services will die, and new ones will appear. What persists is the methodology: ask a clear question, collect broadly, process rigorously, analyze honestly, cite meticulously, and protect the investigation from blowback. Every identifier is a pivot. Every fact needs a source. Every finding needs a second source.

The defender’s version of this guide is the same document read sideways: every technique an attacker can use to map your external footprint is a technique you should be running against yourself, on a schedule, with alerts. The asymmetry between attackers and defenders collapses when defenders start doing their own OSINT first.

Frequently asked questions

What is OSINT?

Open Source Intelligence (OSINT) is the practice of collecting and analyzing publicly available information from sources like websites, social media, DNS records, breach data, and imagery to answer an investigative question. It relies on data anyone can legally access, not on hacking or private records.

Is OSINT legal?

Gathering information from publicly available sources is generally legal, but how you collect and use it matters. Accessing data behind authentication you are not authorized for, violating terms of service, or stalking and harassment can cross legal lines, so scope and intent are key.

What tools are used for OSINT?

Common tools include Maltego for link analysis, Shodan and Censys for internet-connected devices, theHarvester and Amass for domain and email recon, Sherlock for username enumeration, and search engines with advanced operators, increasingly paired with AI-assisted analysis.

What are the stages of the OSINT lifecycle?

A typical cycle runs from planning and defining requirements, through collection of raw data, processing and normalization, analysis and pivoting to find connections, and dissemination of the finished intelligence, to feedback that refines the next cycle, all while maintaining operational security.
