Comprehensive Secrets Management & Leakage Guide#
A practitioner’s reference for secrets sprawl, credential leakage, detection, remediation, and hardening. Compiled from 30 research sources covering GitGuardian State of Secrets Sprawl 2025/2026, OWASP Secrets Management Cheat Sheet, TruffleHog, Gitleaks, real-world breaches (Trivy/European Commission, Shai-Hulud, LiteLLM), AI-era leakage patterns, and vault/NHI governance guidance.
Table of Contents#
- Fundamentals & Impact
- Threat Landscape & Statistics
- Leak Locations & Attack Surface
- Secret Types & Regex Signatures
- JavaScript Bundle Extraction
- Mobile App Secret Extraction
- Cloud Metadata Exfiltration
- Environment Variable & File Leakage
- JWT Leaks & Validation Failures
- Git History Mining
- Secret Scanners Compared
- AI-Era Leakage Patterns
- Real-World Breaches
- Rotation & Incident Response Playbook
- Vaults & Secret Managers
- Developer Hygiene & Prevention
- Non-Human Identity Governance
- Quick Reference
1. Fundamentals & Impact#
A secret is any credential a machine or human uses to authenticate itself to another system: API keys, database passwords, private encryption keys, OAuth client secrets, tokens, SSH keys, TLS certificates, IAM credentials, webhook URLs, and service account JSON. Secrets are the connective tissue of modern distributed architectures, and they are simultaneously the shortest path from reconnaissance to full account takeover.
Impact spectrum of a leaked secret:
| Stage | Example |
|---|---|
| Recon | Attacker grabs a leaked key, enumerates scope via CLI (aws sts get-caller-identity, gh auth status) |
| Lateral expansion | Pivot from one API key to connected services (S3, RDS, Slack, private repos) |
| Data exfiltration | Download customer PII, source code, models, or training data |
| Persistence | Create new access keys, add SSH keys, invite IAM users |
| Supply chain | Push malicious packages to npm/PyPI, poison Docker images, tamper with CI/CD |
| Monetisation | Crypto mining on stolen cloud credentials, ransom of S3 buckets, resale on dark web |
Why it matters:
- Across the past 10 years of Verizon DBIR data, stolen credentials appear in 31% of all breaches.
- IBM: breaches involving stolen or compromised credentials take an average of 292 days to identify and remediate — nearly a full year of attacker dwell time.
- GitGuardian found 70% of secrets leaked in 2022 are still valid in 2025, and 64% of 2022 leaks remain exploitable in 2026. Detection is not remediation.
- 35% of all private repositories scanned contained at least one plaintext secret; 32.2% of internal repos contain a hardcoded secret compared to 5.6% of public repos — internal repos are the highest-value target once an attacker gets a foothold.
A single leaked AWS IAM key, GitHub PAT, or Slack bot token has repeatedly produced full cloud account takeover, supply chain compromise, and nine-figure breach costs. Secrets are not an edge case — they are the primary initial-access vector of the 2020s.
2. Threat Landscape & Statistics#
The sprawl curve#
| Year | New hardcoded secrets on public GitHub | YoY change |
|---|---|---|
| 2021 | ~6M | baseline |
| 2023 | ~19M | — |
| 2024 | 23.77M | +25% |
| 2025 | ~29M (GitGuardian State of Secrets Sprawl 2026) | +34% — largest single-year jump ever recorded |
Since 2021, leaked secrets have grown 152%, while GitHub’s public developer base expanded only 98%. Secrets sprawl is outrunning developer growth.
Where the leaks actually live#
- 5.6% of public repos contain a secret.
- 32.2% of internal repos contain a secret (6x rate of public).
- 18% of scanned public Docker images contain secrets; 15% of those are valid.
- 28% of all incidents originate outside source code — Slack, Jira, Confluence, Teams.
- Collaboration-tool-only leaks are more severe: 56.7% rated critical vs 43.7% for code-only.
- 7,000+ valid AWS keys remain exposed on Docker Hub.
- 100,000 valid secrets in GitGuardian’s analysis of 15M public Docker images, including Fortune 500 AWS keys and GitHub tokens.
- Self-hosted GitLab & Docker registries expose secrets at 3–4x the rate of public GitHub.
- 15% of commit authors on public GitHub leaked a secret at least once.
Top leaked secret categories (2024–2025)#
- AWS IAM access keys
- Slack webhooks and bot tokens
- Azure AD API keys / service principal secrets
- GitHub PATs and fine-grained tokens
- MongoDB / MySQL / PostgreSQL connection strings (no standard prefix → hard to detect)
- Stripe keys, SendGrid, Twilio
- Generic / custom API keys (fastest-growing, hardest to scan)
- OpenAI, Anthropic, and other LLM provider keys
AI amplification#
- 29 million new hardcoded secrets in 2025 (+34% YoY).
- 1,275,105 leaked secrets tied specifically to AI services (+81% YoY).
- Eight of the ten fastest-growing leak categories are AI-related.
- Brave Search API keys: +1,255% YoY.
- Firecrawl: +796% YoY. Supabase: +992% YoY.
- Public repos using GitHub Copilot had a 6.4% secret leakage rate, vs ~4.6% baseline.
- Wiz audited the Forbes AI 50: 65% had leaked verified secrets on GitHub, frequently in deleted forks, gists, and personal developer repos.
- MCP (Model Context Protocol) config files exposed 24,008 unique secrets in 2025 (2,117 validated) in their first year.
3. Leak Locations & Attack Surface#
Source code hosts#
| Platform | Notes |
|---|---|
| GitHub public repos | Largest visible attack surface; push protection helps only for prefixed keys |
| GitHub internal/private repos | 6x more secret-dense than public; often misconfigured to public |
| GitHub Gists | Frequently ignored by scanning; personal devs paste snippets with embedded keys |
| GitHub forks | Force-deleted forks remain in the commit graph (see “oops commits”) |
| GitLab (SaaS & self-hosted) | Self-hosted instances frequently exposed to internet with default creds |
| Bitbucket | Less scanning tooling, frequently forgotten legacy repos |
| Azure DevOps | Pipeline variables, var groups, wiki pages |
Beyond source code#
| Surface | Leak vector |
|---|---|
| Docker Hub / GHCR | Secrets baked into layers via ENV, ARG, or a forgotten COPY . . |
| npm / PyPI / crates.io packages | .env, .npmrc, tarball artifacts |
| CI/CD logs | Jenkins, GitHub Actions, CircleCI echoing secrets or masking failures |
| Slack / Teams / Discord | Pasted credentials during incident response and onboarding |
| Jira / Confluence / Notion | “Temporary” creds in runbooks that never get removed |
| Shodan / Censys | Exposed .env, .git, config endpoints on the open internet |
| Pastebin / GitHub code search | Intentional and accidental dumps |
| Wayback Machine / Google cache | Historical copies of pages that briefly exposed keys |
| Mobile app bundles (APK/IPA) | Hardcoded API keys in decompiled smali / binary strings |
| JavaScript bundles | Keys in SPA main.js, chunk-*.js, source maps |
| Backup files | .sql, .bak, .tar.gz, config.php.swp, .DS_Store |
| Browser dev tools | Tokens visible in localStorage, sessionStorage, cookies |
| Artifacts & build outputs | WAR, JAR, PyInstaller .exe, Electron app.asar |
| Kubernetes manifests | stringData: in Secret resources stored in git |
| Terraform state files | terraform.tfstate committed with plaintext provider credentials |
Developer endpoints as aggregation layer#
The Shai-Hulud 2 supply chain incident gave rare telemetry across 6,943 compromised systems:
- 294,842 secret occurrences observed
- 33,185 unique secrets
- The average secret appeared in 8 different locations per machine (`.env`, shell history, `~/.aws/credentials`, IDE configs, `.git-credentials`, cached tokens, build artifacts)
- 59% of compromised machines were CI/CD runners, not personal laptops
Once secrets sprawl into build infrastructure, the blast radius becomes organisational, not individual.
4. Secret Types & Regex Signatures#
Detection tools work by combining pattern matching (regex + prefix), entropy analysis (Shannon entropy over the candidate substring), and live verification (attempting to authenticate).
Common prefixes and formats#
| Provider | Pattern | Example |
|---|---|---|
| AWS Access Key ID | AKIA[0-9A-Z]{16} | AKIAIOSFODNN7EXAMPLE |
| AWS Secret Access Key | [A-Za-z0-9/+=]{40} (entropy-based) | — |
| AWS Session Token | ASIA[0-9A-Z]{16} | ASIAxxx… |
| GitHub PAT (classic) | ghp_[A-Za-z0-9]{36} | — |
| GitHub fine-grained | github_pat_[A-Za-z0-9_]{82} | — |
| GitHub OAuth | gho_[A-Za-z0-9]{36} | — |
| GitHub app install | ghs_[A-Za-z0-9]{36} | — |
| Slack bot token | xoxb-[0-9]+-[0-9]+-[A-Za-z0-9]+ | — |
| Slack user token | xoxp-… | — |
| Slack webhook | https://hooks.slack.com/services/T[A-Z0-9]+/B[A-Z0-9]+/[A-Za-z0-9]+ | — |
| Stripe live secret | sk_live_[A-Za-z0-9]{24,} | — |
| Stripe restricted | rk_live_[A-Za-z0-9]{24,} | — |
| Google API key | AIza[0-9A-Za-z_-]{35} | — |
| Google OAuth client secret | GOCSPX-[A-Za-z0-9_-]{28} | — |
| OpenAI | sk-[A-Za-z0-9]{48} / sk-proj-… | — |
| Anthropic | sk-ant-api03-[A-Za-z0-9_-]{95} | — |
| HuggingFace | hf_[A-Za-z0-9]{34} | — |
| Twilio | SK[0-9a-fA-F]{32} / AC account SID | — |
| SendGrid | SG\.[A-Za-z0-9_-]{22}\.[A-Za-z0-9_-]{43} | — |
| NPM token | npm_[A-Za-z0-9]{36} | — |
| PyPI token | pypi-AgEIcHlwaS5vcmc[A-Za-z0-9_-]+ | — |
| JWT | eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+ | — |
| Private keys | `-----BEGIN (RSA\|OPENSSH\|EC\|DSA\|PGP) PRIVATE KEY-----` | — |
Generic secret detection (the hard part)#
Generic secrets (api_key = "...", password = "...", arbitrary database URLs) have no standard prefix and are the fastest-growing leak category. Detection strategies:
- Keyword proximity + entropy: look for `secret`, `key`, `token`, `password`, `passwd`, `pwd`, `auth`, `credentials`, `api_key` within N characters of a high-entropy string.
- Shannon entropy threshold: typically >= 4.5 for base64-like strings and >= 3.0 for hex strings (hex maxes out at 4 bits/char).
- Connection-string parsers: `mysql://user:pass@host`, `postgres://…`, `mongodb+srv://…`, `redis://:pass@…`.
- ML-assisted classifiers: GitGuardian and TruffleHog both layer ML models on top of regex for generic secret classification.
- Live verification: the definitive signal — TruffleHog’s “verified” mode attempts authentication against the relevant API before raising an alert.
GitHub Push Protection struggles with generic secrets — MySQL and MongoDB credentials were measurably not impacted by Push Protection rollout because they lack a standardized prefix.
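The keyword-proximity and entropy strategies above can be sketched in a few lines of Python. The keyword list, candidate regex, 40-character window, and 4.5-bit threshold are illustrative defaults, not any particular scanner's implementation:

```python
import math
import re

def shannon_entropy(s: str) -> float:
    """Bits per character of the string's empirical character distribution."""
    if not s:
        return 0.0
    n = len(s)
    counts = {c: s.count(c) for c in set(s)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# Secret-ish keywords and a base64-ish candidate pattern (illustrative).
KEYWORDS = re.compile(r"(secret|key|token|password|passwd|pwd|auth|credential)", re.I)
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")

def find_candidates(text: str, window: int = 40, threshold: float = 4.5):
    """Flag high-entropy tokens that sit near a secret-ish keyword."""
    hits = []
    for m in CANDIDATE.finditer(text):
        context = text[max(0, m.start() - window):m.start()]
        if KEYWORDS.search(context) and shannon_entropy(m.group()) >= threshold:
            hits.append(m.group())
    return hits
```

Real scanners add per-charset thresholds, allowlists for known placeholders, and a verification step on top of this core loop.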
5. JavaScript Bundle Extraction#
Single-page applications frequently ship secrets inside compiled JS bundles under the mistaken belief that minification is obfuscation. A bundled SPA is client-side code — anything in it is public.
What leaks in JS bundles#
- Firebase configs (`apiKey`, `authDomain`, `projectId`) — some fields are intended to be public, but `databaseURL` with open rules leads to full read/write
- Stripe publishable and accidental secret keys
- Algolia admin keys (vs the intended search-only key)
- Mapbox, Google Maps server keys
- Segment write keys with elevated scopes
- Hardcoded JWT signing secrets (symmetric HS256)
- AWS Cognito unauth pool IDs leading to IAM assumption
- Backend base URLs, staging endpoints, internal API paths
Extraction workflow#
- Crawl the target with a headless browser or `katana`/`hakrawler` and collect every `.js`, `.mjs`, `.map`, and `chunk-*.js`.
- Fetch source maps (`.map` files) where available — they reconstruct original source trees, including comments that often reveal keys.
- Run secret scanners over collected JS:
  - `trufflehog filesystem ./js`
  - `gitleaks dir ./js`
  - `SecretFinder`, `LinkFinder` (Python, classic toolkit)
  - `jsluice` (newer, AST-based extractor for URLs, params, and secrets)
- Beautify bundles with `js-beautify` to reveal string literals hidden by minification.
- Grep for high-value markers: `firebaseConfig`, `accessKeyId`, `process.env`, `Bearer`, `Authorization`.
- Diff historical versions from the Wayback Machine — developers often remove keys after disclosure without rotation.
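The marker-grep step can be automated over a directory of collected bundles with a short Python sweep; the marker list mirrors the grep targets above and is illustrative rather than exhaustive:

```python
import re
from pathlib import Path

# High-value markers worth flagging in downloaded bundles (illustrative).
MARKERS = re.compile(
    r"(firebaseConfig|accessKeyId|secretAccessKey|process\.env"
    r"|Authorization|AIza[0-9A-Za-z_\-]{35})"
)

def scan_js_dir(root: str):
    """Return (filename, marker) pairs for every marker hit under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.js")):
        text = path.read_text(errors="ignore")
        for match in MARKERS.finditer(text):
            hits.append((path.name, match.group(1)))
    return hits
```

Feed anything this flags into a verifying scanner before raising an alarm, since markers alone produce false positives.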
Why this keeps happening#
Webpack and Vite inline `process.env.*` values at build time via define plugins. A developer who sets `VITE_API_SECRET` in `.env` and references it as `import.meta.env.VITE_API_SECRET` ships the value in `main.js`. The `NEXT_PUBLIC_` prefix makes this exposure explicit; the `VITE_` prefix is just as public, but developers frequently miss that.
6. Mobile App Secret Extraction#
Mobile apps are binary blobs distributed to every user — treat them as adversarial-read by default.
Android (APK)#
Extraction pipeline:
- Pull the APK: `adb shell pm path <pkg>` then `adb pull`.
- Unpack with `apktool d app.apk` — yields `smali/`, `res/`, `AndroidManifest.xml`, `assets/`.
- Decompile DEX to Java with `jadx -d out app.apk` for readable source.
- Scan with secret scanners:
  - `trufflehog filesystem out/`
  - Custom grep for `api_key`, `password`, `BuildConfig`, `R.string`.
- Check `res/values/strings.xml` — developers routinely put API keys here.
- Check `assets/` for bundled `.env`, `.json`, `.properties`, `config.js`.
- Extract strings from native `.so` libraries: `strings lib/arm64-v8a/*.so | grep -iE 'key|token|secret'`.
Common APK secret locations:
- `BuildConfig.java` — Gradle build-time constants
- `strings.xml` resources
- `.properties` files under `assets/`
- Hardcoded in Java/Kotlin via `String KEY = "..."`
- Certificate pinning bypass targets that leak API keys
iOS (IPA)#
Extraction pipeline:
- Pull the IPA from a jailbroken device or from a decrypted source (`frida-ios-dump`, `bagbak`).
- Unzip the IPA — it contains the `Payload/App.app/` bundle.
- Inspect the main executable with `strings` or `otool`.
- Class-dump Objective-C metadata: `class-dump` or `classdumpios`.
- For Swift, use `Hopper` or `Ghidra`; Swift symbols are mangled but string constants remain.
- Check `Info.plist` and embedded `.plist` files for API keys.
- Check `Assets.car` and bundled resources.
Runtime extraction: Frida scripts hook NSString allocation or SecItem keychain calls to dump secrets at runtime, bypassing any at-rest obfuscation.
Mitigations that actually help#
- Never embed a production secret in a mobile binary. Full stop.
- Use short-lived tokens minted by your backend after user authentication.
- For Firebase/GCP, use App Check / DeviceCheck to bind requests to genuine app installs.
- For Android, use Play Integrity API; iOS uses App Attest.
- Where client-side crypto is required, derive keys from user credentials + PBKDF2/Argon2.
- Server-side authorisation is the only real defence — client-side obfuscation is a speed bump, not a gate.
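The "short-lived tokens minted by your backend" pattern can be sketched with nothing but the standard library. The function names, claim layout, and 10-minute TTL below are illustrative; in production you would typically reach for a maintained JWT or PASETO library instead:

```python
import base64
import hashlib
import hmac
import json
import time

SERVER_SECRET = b"rotate-me-regularly"  # lives only on the backend, never in the app

def mint_token(user_id: str, ttl_seconds: int = 600) -> str:
    """Issue a short-lived, HMAC-signed token after the user authenticates."""
    payload = json.dumps({"sub": user_id, "exp": int(time.time()) + ttl_seconds})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    sig = hmac.new(SERVER_SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_token(token: str):
    """Return the user id if signature and expiry check out, else None."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SERVER_SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # constant-time comparison rejects tampered tokens
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < time.time():
        return None
    return claims["sub"]
```

The mobile app only ever holds a token that dies in minutes; the signing secret never ships in the binary.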
7. Cloud Metadata Exfiltration#
When an attacker gains SSRF, RCE, or code execution inside a cloud workload, the Instance Metadata Service (IMDS) becomes the single highest-value internal target: it hands out short-lived IAM credentials that the workload itself is entitled to use.
AWS#
| Endpoint | Returns |
|---|---|
http://169.254.169.254/latest/meta-data/ | Index of metadata categories |
http://169.254.169.254/latest/meta-data/iam/security-credentials/ | IAM role names attached to the instance |
http://169.254.169.254/latest/meta-data/iam/security-credentials/<role> | AccessKeyId, SecretAccessKey, Token (STS session credentials) |
http://169.254.169.254/latest/user-data/ | EC2 user-data script (often contains bootstrap secrets) |
http://169.254.169.254/latest/dynamic/instance-identity/document | Account ID, region, instance ID |
IMDSv2 (token-based) is the mitigation: attacker must first PUT to /latest/api/token with X-aws-ec2-metadata-token-ttl-seconds, receive a token, then include it as X-aws-ec2-metadata-token on subsequent requests. Many SSRF primitives cannot issue PUT or arbitrary headers, and are thus blocked. Enforce IMDSv2 cluster-wide via the HttpTokens=required launch template setting.
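The PUT-then-GET dance can be illustrated with `urllib`. The helper below only constructs the request pair (the link-local address resolves only on an EC2 instance); the function name and 21600-second TTL are illustrative:

```python
import urllib.request

IMDS = "http://169.254.169.254"

def imdsv2_requests(role: str):
    """Build the two-step IMDSv2 request pair (only resolvable on EC2)."""
    # Step 1: PUT a session-token request with a TTL header. SSRF primitives
    # that cannot send PUTs or custom headers fail here, which is the point.
    token_req = urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )

    # Step 2: present the returned token on the credentials GET.
    def creds_req(token: str) -> urllib.request.Request:
        return urllib.request.Request(
            f"{IMDS}/latest/meta-data/iam/security-credentials/{role}",
            headers={"X-aws-ec2-metadata-token": token},
        )

    return token_req, creds_req

# On an instance you would then run:
#   token = urllib.request.urlopen(token_req, timeout=2).read().decode()
#   creds = urllib.request.urlopen(creds_req(token), timeout=2).read()
```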
Azure#
| Endpoint | Returns |
|---|---|
http://169.254.169.254/metadata/instance?api-version=2021-02-01 (requires Metadata: true header) | Instance metadata |
http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/ | Managed Identity access token |
http://169.254.169.254/metadata/identity/oauth2/token?...&resource=https://vault.azure.net | Key Vault access token |
The required Metadata: true header and resource parameter are Azure’s lightweight mitigation. Once an attacker obtains a Managed Identity token for Key Vault, every secret and key the MI can reach is accessible.
GCP#
| Endpoint | Returns |
|---|---|
http://metadata.google.internal/computeMetadata/v1/ (requires Metadata-Flavor: Google) | Metadata index |
http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token | OAuth2 access token for the attached service account |
http://metadata.google.internal/computeMetadata/v1/project/attributes/ | Project-level SSH keys, metadata |
Kubernetes#
| Location | Secret |
|---|---|
/var/run/secrets/kubernetes.io/serviceaccount/token | Pod service-account JWT |
/var/run/secrets/kubernetes.io/serviceaccount/ca.crt | Cluster CA |
https://kubernetes.default.svc/api/v1/namespaces/<ns>/secrets (with SA token) | All secrets the SA can read |
Kubelet :10250/pods (if unauthenticated) | Pod metadata including env vars |
Exfiltration once credentials are obtained#
aws sts get-caller-identity # confirm scope
aws iam list-attached-role-policies # enumerate permissions
aws s3 ls # list buckets
aws secretsmanager list-secrets # enumerate centrally-stored secrets
aws ssm describe-parameters # SSM Parameter Store often holds legacy creds
A single SSRF → IMDS chain routinely produces full AWS account takeover. The Capital One 2019 breach (100M+ records) is the canonical case: a WAF SSRF → IMDSv1 → IAM credentials → S3 exfiltration.
8. Environment Variable & File Leakage#
Secrets migrated out of source code typically landed in .env files and process environment variables. Both introduce new leak channels.
.env files#
- Committed to git by developers who forgot `.gitignore`.
- Shipped inside Docker images via `COPY . .`.
- Exposed by web servers serving the project directory root (`https://victim.com/.env` returns 200 with the full file).
- Present in backup tarballs accidentally made public on S3 or FTP.
- Loaded by Next.js, Laravel, Rails, and Django during `dotenv` initialisation, then echoed into error pages and debug toolbars.
Recon one-liners (defensive — to test your own assets):
- `curl https://target/.env`
- `curl https://target/.env.backup`
- `curl https://target/.env.local`
- `curl https://target/config/.env`
- `curl https://target/.git/config`
Environment variable leakage#
| Channel | Mechanism |
|---|---|
| Error pages | Django DEBUG=True, Flask debug toolbar, Rails WEB_CONSOLE, Laravel Ignition — all dump full env on exception |
| phpinfo() | Leftover diagnostic file dumps $_ENV, $_SERVER, including DB creds |
/proc/self/environ | On LFI, reading this file under a worker process returns that worker’s env vars |
/proc/<pid>/environ | Same, for other processes with appropriate permissions |
| Docker image history | docker history <image> leaks ENV directives from Dockerfile layers |
| Kubernetes API | kubectl describe pod shows env vars; configMapKeyRef/secretKeyRef references stay indirect, but inline value: entries leak in plaintext |
| Process listing | ps auxe reveals env; long-running daemons started with --secret=... as argv leak via /proc/<pid>/cmdline |
| APM / logging stacks | Datadog, Sentry, New Relic, Elastic APM frequently capture full env on error |
| Core dumps | /var/crash/*.core contains full process memory including secrets |
Defensive patterns#
- Never set secrets as `ENV` in Dockerfiles; use runtime `--env-file` or orchestrator secret mounts.
- Tmpfs-mount `.env` files in containers; never bake them into the image.
- Disable `DEBUG` in production and verify explicit error pages.
- Scrub env vars before sending events to APM/logging; Sentry, Datadog, and Honeycomb all support allowlist/denylist filters.
- Use memory-safe primitives: in Java/.NET, prefer `byte[]`/`char[]` over `String` and zero the buffer after use. Strings are immutable and cannot be reliably wiped before garbage collection.
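The "scrub env vars before APM" advice can be as simple as a name-based redaction filter. The pattern list below is an illustrative starting point, and `scrub_env` is a hypothetical helper name; APM SDKs such as Sentry let you apply a function like this in their before-send hook:

```python
import re

# Name-based heuristic: redact any variable whose name suggests a credential.
SENSITIVE_NAME = re.compile(
    r"(SECRET|TOKEN|PASSWORD|PASSWD|PWD|API_?KEY|CREDENTIAL|PRIVATE)", re.I
)

def scrub_env(env: dict) -> dict:
    """Return a copy of env with secret-looking values redacted."""
    return {
        name: "[REDACTED]" if SENSITIVE_NAME.search(name) else value
        for name, value in env.items()
    }
```

Pair a denylist like this with an explicit allowlist of variables your telemetry actually needs; everything else should never leave the host.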
9. JWT Leaks & Validation Failures#
JWTs are bearer tokens — anyone possessing one can use it. They are also among the most commonly leaked secrets because they appear liberally in logs, browser storage, and URLs.
Common JWT leak vectors#
- Stored in `localStorage` (accessible from any XSS).
- Passed in URL query strings (`?token=eyJ...`) — logged by every proxy, analytics tool, and browser history.
- Copied into Slack/Jira during debugging.
- Baked into mobile apps as "API key" equivalents.
- Logged by web frameworks on request errors.
- Cached by CDNs when cache keys don't include the `Authorization` header.
Validation failures that turn a leak into a breach#
| Failure | Impact |
|---|---|
alg: none | Library accepts unsigned tokens |
| HS256 vs RS256 confusion | Attacker signs tokens with the RSA public key as HMAC secret |
| Weak HS256 secret | Brute force hashcat -m 16500 cracks short secrets in seconds |
Missing iss/aud validation | Tokens from sibling tenants accepted |
Expired exp not enforced | Tokens live forever |
kid injection | Attacker points to a file under their control or injects SQL |
| JKU/JWK header trust | Attacker hosts their own JWK set and forges tokens |
Defensive JWT handling#
- Store tokens in `HttpOnly; Secure; SameSite=Strict` cookies, never in localStorage.
- Use short-lived access tokens (5–15 min) and refresh token rotation with reuse detection.
- Always validate `alg` against an allowlist — never trust the header's `alg` field.
- Validate `iss`, `aud`, `exp`, `nbf`, `sub` on every request.
- Rotate signing keys regularly; publish via JWKS with `kid` pinning.
- Never log full tokens — truncate or hash.
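The allowlist-and-claims discipline above can be sketched with only the standard library. This is a minimal HS256-only demonstration with illustrative helper names, not a substitute for a maintained JWT library in production:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def mint_hs256(claims: dict, secret: bytes) -> str:
    """Mint an HS256 JWT (here only to drive the round-trip demo)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

def verify_hs256(token: str, secret: bytes, issuer: str, audience: str) -> dict:
    """Enforce alg allowlist, signature, iss/aud, and expiry on every request."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # allowlist: never trust the header's claim
        raise ValueError("disallowed alg")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(b64url_decode(sig_b64), expected):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("iss") != issuer or claims.get("aud") != audience:
        raise ValueError("wrong issuer/audience")
    if claims.get("exp", 0) < time.time():
        raise ValueError("expired")
    return claims
```

Note the order: reject the algorithm first, verify the signature second, and only then trust the claims enough to read them.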
10. Git History Mining#
Git’s immutable log is an attacker’s treasure map. Deleting a secret in a later commit does not remove it from history; force-pushing does not remove it from GitHub’s event archive.
The “oops commits” problem#
GitHub retains every public commit, even those developers attempt to erase through force pushes, as zero-commit PushEvent entries in its event archive. Sharon Brizinov (Truffle Security) scanned all force-pushed/deleted commits since 2020 via the GH Archive BigQuery dataset and found thousands of active secrets, including:
- A GitHub Personal Access Token with admin permissions over the Istio repositories (immediate supply chain compromise potential)
- Valid MongoDB credentials, AWS keys, GitHub PATs
- Bug bounties totalling ~$25,000
Truffle Security released the open-source Force Push Scanner (https://github.com/trufflesecurity/force-push-scanner) that queries GH Archive via BigQuery and runs TruffleHog on orphaned commits. The practical conclusion: once a secret has been pushed to a public repo, it must be considered permanently compromised. Rotate, don’t hide.
Historical scanning workflow#
# Full history, all branches, all tags
git clone --mirror https://github.com/org/repo
trufflehog git file://repo.git --results=verified
gitleaks git -v repo.git
# Scan a specific commit range
gitleaks git --log-opts="--all commitA..commitB" path/
# Scan every branch including deleted refs (if still reachable)
git fetch origin '+refs/pull/*:refs/remotes/origin/pr/*'
# Dangling blob scan
git fsck --full --unreachable --no-reflogs
Removing secrets from history (cautiously)#
Historical rewrites are destructive and break every clone of the repo. They do not retroactively invalidate the leaked secret — rotation is always step one.
- `git filter-repo` — modern replacement for `git filter-branch`; faster and safer.
- BFG Repo-Cleaner — single-purpose tool that removes large files and secret strings.
- After rewriting: force-push, notify all collaborators to re-clone, and assume the old history is archived somewhere (GH Archive, clones, forks, mirrors). Rotate the secret first, always.
11. Secret Scanners Compared#
TruffleHog#
- Strengths: 800+ classified detectors, active verification (hits the vendor API to confirm the secret is live), “Analyze” mode that enumerates what the credential can access (IAM permissions, resources, owner).
- Scope: Git, GitHub, GitLab, filesystem, S3, GCS, Docker images, Jira, Slack, Confluence, Postman, Jenkins logs, Circle CI logs, and more.
- Modes: `--results=verified` (only show live secrets), `--only-verified`, `--no-verification` for air-gapped scans.
- Enterprise: continuous monitoring of Git, Jira, Slack, Confluence, Teams, SharePoint.
- License: AGPL-3.0 (open source), enterprise commercial product for continuous scanning.
- Typical invocation:
trufflehog git https://github.com/org/repo --results=verified
trufflehog github --org=myorg --include-forks --include-members
trufflehog docker --image=myimage:latest
trufflehog filesystem ./
Gitleaks#
- Strengths: fast, Go-native, TOML-configurable rules, extensive default ruleset, first-class pre-commit and GitHub Action integration, playground for regex development.
- Scope: `git`, `dir`, `stdin`. Does not verify secrets against APIs.
- Config layers: flag → env var → repo `.gitleaks.toml` → default.
- Reports: JSON, CSV, JUnit, SARIF (integrates with GitHub code scanning), custom templates.
- Entropy, allowlists, baselining: supports `.gitleaksignore` and baseline files to suppress known issues.
- License: MIT.
- Typical invocation:
gitleaks git -v --log-opts="--all"
gitleaks dir ./src
gitleaks git --report-format sarif --report-path leaks.sarif
detect-secrets (Yelp)#
- Strengths: baseline-first workflow designed for gradual adoption on legacy repos; every developer inherits a single baseline and only new secrets block CI.
- Weak on: active verification (not its design goal).
- Plugin model: extensible detector types (base64, hex, AWS, Slack, Azure, private keys, etc.).
ggshield (GitGuardian)#
- Strengths: commercial backing, 350+ detectors, near-zero false positives due to ML classification layered on regex, dashboard-driven remediation workflows, first-class remediation ticketing.
- Modes: pre-commit, pre-receive, CI, IDE integration, secrets scanner for Jira/Slack/Confluence.
Semgrep (Secrets)#
- Strengths: AST-aware rules can detect insecure secret handling (e.g. secrets passed as URL params, secrets logged), in addition to leaked values.
- Integrates: PR comments, GitHub/GitLab checks.
GitHub native#
- Secret scanning (advanced security): scans public and private repos for known prefix patterns.
- Push protection: blocks pushes containing recognised patterns. Limited against generic secrets — MySQL and MongoDB creds were measurably not impacted.
- Partner program: vendors register prefixes (e.g. `ghp_`) so GitHub detects and auto-revokes them on leak.
Side-by-side#
| Feature | TruffleHog | Gitleaks | detect-secrets | ggshield | Semgrep |
|---|---|---|---|---|---|
| Open source | AGPL | MIT | Apache | CLI yes, backend no | Yes |
| Active verification | Yes | No | No | Yes | No |
| Number of detectors | 800+ | ~150 | ~25 plugins | 350+ | Custom rules |
| Docker/image scanning | Yes | No (dir mode only) | No | Yes | No |
| SaaS source scanning (Jira/Slack) | Enterprise | No | No | Yes | No |
| AST rule support | No | No | No | No | Yes |
| GitHub Action | Yes | Official | Community | Yes | Yes |
| SARIF output | Yes | Yes | Partial | Yes | Yes |
| Baseline workflow | Partial | Yes | Yes (primary) | Yes | Yes |
Choosing#
- Greenfield or aggressive rotation policy: TruffleHog with verification.
- Legacy monorepo, cannot rotate everything: detect-secrets baseline.
- Fast CI, SARIF to GitHub code scanning: Gitleaks.
- Enterprise with Slack/Jira surface: ggshield or TruffleHog Enterprise.
- Want to catch insecure handling, not just values: add Semgrep.
Most mature programs run two in parallel: a fast prefix scanner at pre-commit (Gitleaks) and a verifying scanner on merge (TruffleHog).
12. AI-Era Leakage Patterns#
AI coding tools have fundamentally changed the leak surface in 2024–2025.
AI coding assistants#
- GitHub Copilot usage grew 27% from 2023 to 2024.
- Public repos using Copilot leaked secrets at 6.4% vs the 4.6% baseline — a 40% higher exposure rate.
- Causes: Copilot autocompletes plausible-looking API keys from its training data, suggests placeholder keys that get committed unchanged, generates `.env` files with example values, and rewrites config files without awareness of `.gitignore`.
- Anti-pattern: developers who ask Copilot "how do I call the OpenAI API" get their own OpenAI key from a previous file pasted into the suggestion, which they then commit to a new repo.
AI service secret categories (2024 to 2025 growth)#
| Category | YoY increase |
|---|---|
| AI service leaks overall | +81% |
| Brave Search API | +1,255% |
| Supabase (AI backend) | +992% |
| Firecrawl (LLM scraping) | +796% |
| OpenAI / Anthropic / Cohere | hundreds of percent each |
| HuggingFace access tokens | steady high growth |
MCP (Model Context Protocol) specifically#
MCP became the connective tissue between LLMs and tools in 2025. Its convention is a local JSON config file (mcp.json, claude_desktop_config.json) containing server command, arguments, and environment variables. Secrets end up in these configs by design: API keys as env.OPENAI_API_KEY, database credentials as CLI args, GitHub tokens as env.
GitGuardian found 24,008 unique secrets in MCP-related config files on public GitHub in MCP’s first year, with 2,117 verified as valid. Expect this to grow exponentially as agentic AI adoption accelerates. Treat MCP configs as first-class secret-bearing files, ignore them via .gitignore, and vault any credentials they reference.
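A minimal sweep for secret-bearing MCP configs might look like the following. The `mcpServers`/`env` layout matches the common config convention described above, while the placeholder heuristic and the `audit_mcp_configs` name are assumptions of this sketch:

```python
import json
from pathlib import Path

# Filenames used by the common MCP config conventions.
MCP_NAMES = {"mcp.json", "claude_desktop_config.json"}

def audit_mcp_configs(root: str):
    """Flag env entries in MCP configs that hold literal values.

    Entries written as "${VAR}" defer resolution to the environment and are
    treated as safe; anything else is assumed to be an inlined credential.
    """
    findings = []
    for path in sorted(Path(root).rglob("*.json")):
        if path.name not in MCP_NAMES:
            continue
        config = json.loads(path.read_text())
        for server, spec in config.get("mcpServers", {}).items():
            for var, value in spec.get("env", {}).items():
                if value and not value.startswith("${"):
                    findings.append((path.name, server, var))
    return findings
```

Running a check like this in pre-commit, alongside a `.gitignore` entry for the config files themselves, covers both the "never commit it" and "catch it if committed" cases.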
Vibe-coding / ElevenLabs pattern#
Wiz, studying the Forbes AI 50, documented a specific pattern: AI startups shipping products with ElevenLabs API keys in plaintext in public repos, often left in by a developer “vibe-coding” through a prototype and pushing the working version. 65% of AI companies studied had verified secret leaks.
AI-assisted remediation is also AI-generated#
Hardcoded secrets in AI-generated code are now a recognised detection category. Anthropic, GitHub, and third parties ship linting/pre-commit hooks that specifically check AI completions for plausible-looking credentials before they get written to disk. Scanners like ggshield now have "AI code" detection modes that trigger on typical Copilot/Cursor output signatures.
13. Real-World Breaches#
European Commission via Trivy (April 2026)#
- Vector: Supply chain compromise of Trivy (open-source vuln scanner) by threat actor “TeamPCP”.
- Initial access: European Commission downloaded a compromised Trivy version on 19 March 2026 through normal update channels.
- Escalation: Malicious code inside Trivy executed within the Commission’s CI/CD pipelines and harvested an AWS secret with management rights over affiliated cloud accounts.
- Tradecraft: Attackers deployed TruffleHog inside the victim environment to enumerate more credentials, then called AWS STS to validate and mint session tokens. Created new persistent access keys attached to an existing IAM user.
- Impact: 340 GB exfiltrated, affecting 42 internal clients and 29 other EU entities. Data dumped by extortion group ShinyHunters on 28 March.
- MITRE ATT&CK: T1195.002 (Supply Chain Compromise), T1586.003 (Cloud Account Compromise), T1078.004 (Valid Cloud Accounts), T1005 (Data from Local System).
- Lessons: pin GitHub Actions/binaries to SHA, not mutable tags; restrict CI/CD IAM to least privilege; monitor CloudTrail for STS anomalies and TruffleHog signatures; maintain rapid AWS credential rotation capability.
AWS S3 Ransomware#
Attackers used valid AWS credentials (harvested via secret leaks) to encrypt S3 buckets with customer-supplied encryption keys and demanded ransom for decryption. This weaponises legitimate AWS features — SSE-C — once credentials leak. The defence is IAM condition keys denying s3:PutObject with x-amz-server-side-encryption-customer-algorithm.
Artifactory token exposures#
GitGuardian case study: 60% of leaked Artifactory tokens were in build configs affecting production environments in pharma and energy sectors. A single leaked Artifactory admin token permits arbitrary package publication into trusted registries — a supply chain compromise primitive.
Shai-Hulud 2 (2025)#
Worm-style npm supply chain attack compromising developer machines. Forensic telemetry across 6,943 compromised systems yielded 33,185 unique secrets. 59% of compromised machines were CI/CD runners. Demonstrated that “developer endpoints as credential aggregation layer” is an organisational risk, not an individual-hygiene issue.
LiteLLM supply chain attack (2025)#
Compromised LiteLLM packages harvested SSH keys, cloud credentials, and API tokens specifically from machines where AI development tools were concentrated. Followed exactly the Shai-Hulud playbook. Reinforces that the AI developer stack is now a primary target.
GitHub “oops commits” (2025)#
Sharon Brizinov’s scan of all public GitHub force-pushed/deleted commits since 2020 recovered thousands of active secrets including a GitHub PAT with admin on the Istio repositories. Resulted in ~$25,000 in bounties and the release of the Force Push Scanner tool. Proves: force-pushed does not equal deleted.
Capital One (2019, for context)#
SSRF in a WAF → IMDSv1 credential theft → 100M+ records exfiltrated from S3. Still the canonical example of a secret-leak attack chain and the reason IMDSv2 exists.
Toyota (2022, for context)#
Five years of customer data were exposed after T-Connect source code containing an access key was published to a public GitHub repository by a contractor. 296,000 customer email addresses and IDs were exposed.
Uber (2022)#
18-year-old social engineered an Uber employee, found hardcoded PowerShell admin credentials to Uber’s PAM solution, and pivoted to full enterprise compromise. A single hardcoded credential in a shell script.
14. Rotation & Incident Response Playbook#
When (not if) a secret leaks:
Minute 0–15: Contain#
- Assume compromise — treat the secret as in attacker hands the moment it left your machine.
- Revoke, don’t rotate first. Delete the key from the provider console or API immediately. Rotation implies the old key remains briefly valid.
- For AWS: `aws iam delete-access-key --access-key-id AKIA....`
- For GitHub: revoke the PAT and force-expire any SSH keys.
- For OAuth: invalidate client secrets and revoke issued tokens.
- For database creds: `ALTER USER ... WITH PASSWORD ...` and kill active sessions.
Minute 15–60: Investigate#
- Audit logs — pull CloudTrail, GitHub audit log, provider access logs for the full lifetime of the exposed key. Build a list of every action taken under it.
- Check for new IAM users, access keys, SSH keys, webhooks added since exposure. These are persistence.
- Scan CloudTrail for `sts:GetSessionToken`, `iam:CreateAccessKey`, `iam:AttachUserPolicy`.
- For GitHub: check recent pushes, newly created repos, changed webhook URLs, new org members.
- Scan the exposed medium (public repo, Slack channel, Docker image) for other secrets — leaks cluster.
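Pulling the exposed key's full usage history can be sketched with boto3; the key ID, helper names, and printed fields here are illustrative, not a prescribed tool:

```python
import json

def fetch_key_events(key_id, client):
    """Pull every CloudTrail event recorded for a given access key ID."""
    events = []
    for page in client.get_paginator("lookup_events").paginate(
            LookupAttributes=[{"AttributeKey": "AccessKeyId",
                               "AttributeValue": key_id}]):
        events.extend(page["Events"])
    return events

def summarise_events(events):
    """Reduce raw lookup_events records to (time, event name, source IP) rows."""
    rows = []
    for e in events:
        detail = json.loads(e["CloudTrailEvent"])  # each record carries a JSON payload
        rows.append((detail.get("eventTime"), detail.get("eventName"),
                     detail.get("sourceIPAddress")))
    return rows

# Usage (run under the *investigating* role, never the leaked key):
#   import boto3
#   events = fetch_key_events("AKIAEXAMPLEKEYID", boto3.client("cloudtrail"))
#   for row in summarise_events(events):
#       print(row)
```

Sorting the summary by source IP quickly surfaces use from unexpected networks, which is the fastest signal that the key was actively abused.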
Hour 1–4: Eradicate#
- Rotate any credentials the compromised key could reach (blast radius).
- Delete persistence artefacts (attacker-created users, keys, lambdas, ECS tasks).
- Review network egress logs for data exfiltration.
- Snapshot affected systems for forensics before further changes.
Day 1–3: Recover & report#
- Replace the secret in all legitimate consumers via your secrets manager.
- Post-mortem: how did the secret get committed? Where were the detection layers supposed to catch this? Which of them failed and why?
- Notify regulators, customers, and partners as required (GDPR 72-hour rule, SEC 4-day rule for public companies).
- File a CVE if the vulnerability was in your software.
Week 1+: Harden#
- Add the leak pattern to your secret scanner as a custom rule.
- Add the failure mode to CI gates.
- Migrate the secret class from static to dynamic or short-lived.
- Run the Force Push Scanner / historical scans across all your public repos.
Rotation patterns#
| Pattern | Description |
|---|---|
| Gradual rotation | Introduce new secret, phase out old over a deprecation window |
| Dual-key (write-new, read-old) | Support two valid secrets simultaneously during migration |
| Dynamic secrets | Vault mints a new secret per application startup; expires on shutdown |
| Scheduled rotation | Secrets manager rotates on cron (AWS Secrets Manager supports native Lambda-based rotation) |
| Event-driven rotation | Rotate on every deploy, suspicion, or leak |
Dynamic secrets are the strategic target state: an application requests its DB password at startup, the vault generates a temporary credential scoped to that session, and revokes it when the session ends. Any exfiltrated credential becomes worthless within minutes.
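A minimal client-side sketch of that startup flow, assuming the `hvac` Python library, a reachable Vault, and an illustrative role path:

```python
def extract_creds(lease):
    """Pull the temporary username, password, and lease TTL from a Vault read response."""
    return (lease["data"]["username"],
            lease["data"]["password"],
            lease["lease_duration"])

# Usage (assumes VAULT_ADDR / VAULT_TOKEN are set and the database engine
# is mounted with a role named "myapp-role"):
#   import hvac
#   lease = hvac.Client().read("database/creds/myapp-role")
#   user, password, ttl = extract_creds(lease)
#   # open the DB connection with (user, password); it dies with the lease
```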
15. Vaults & Secret Managers#
HashiCorp Vault#
- Architecture: central server (HA via Consul/Raft), token-based auth, pluggable auth methods (Kubernetes, AWS IAM, OIDC, LDAP, AppRole), pluggable secret engines (KV, database, PKI, SSH, AWS, Azure, GCP, Transit).
- Dynamic secrets: Vault generates short-lived DB users, AWS STS tokens, SSH certificates on demand.
- Transit engine: encryption-as-a-service — the application sends plaintext, Vault returns ciphertext, plaintext is never stored.
- Leasing & revocation: every secret has a TTL; expired leases are proactively revoked.
- Auditing: every access is logged to append-only audit devices.
- Unsealing: Vault starts sealed; Shamir’s Secret Sharing splits the master key into N shards, a threshold M of which must be combined to unseal (or use auto-unseal with AWS KMS / GCP KMS / HSM).
Typical Vault flow:
# App authenticates via Kubernetes SA token
vault write auth/kubernetes/login role=myapp jwt=$SA_TOKEN
# Request a short-lived DB credential
vault read database/creds/myapp-role
# returns username + password valid for 1 hour
AWS Secrets Manager#
- Native rotation: Lambda-based rotators for RDS, Redshift, DocumentDB; custom Lambdas for anything else.
- IAM-native auth: access controlled by IAM policies, no separate auth plane.
- Integration: ECS task secrets, Lambda env injection, Parameter Store for non-sensitive config.
- Cost: $0.40/secret/month + $0.05/10k API calls.
- Rotation pattern (4 phases): `createSecret` → `setSecret` → `testSecret` → `finishSecret`. The rotation Lambda moves a new version through `AWSPENDING` to `AWSCURRENT` while the old version becomes `AWSPREVIOUS`.
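The four-phase contract can be sketched as a dispatcher; the handler bodies here are stubs, and a real rotator would call Secrets Manager (moving version stages) and the target service inside each handler:

```python
ROTATION_STEPS = ("createSecret", "setSecret", "testSecret", "finishSecret")

def handle_rotation(event, handlers):
    """Dispatch a Secrets Manager rotation event to the matching step handler.

    `event` carries Step, SecretId, and ClientRequestToken; `handlers` maps
    each step name to a callable(secret_id, token). Secrets Manager invokes
    the same Lambda four times per rotation, once per step.
    """
    step = event["Step"]
    if step not in ROTATION_STEPS:
        raise ValueError(f"unknown rotation step: {step}")
    return handlers[step](event["SecretId"], event["ClientRequestToken"])
```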
Azure Key Vault#
- Three object types: Keys (HSM-backed crypto), Secrets (generic blobs), Certificates (full lifecycle).
- Authentication: Azure AD / Managed Identities — a workload with a Managed Identity requests tokens from IMDS and authenticates to Key Vault without ever seeing a secret.
- Access control: legacy access policies OR Azure RBAC (recommended) with roles like `Key Vault Secrets User`.
- Soft delete & purge protection: deleted secrets recoverable for 7–90 days; purge protection blocks permanent deletion.
- HSM tier: keys backed by FIPS 140-2 Level 2 or Level 3 hardware.
Google Secret Manager#
- IAM-native, simple REST API, versioning, automatic replication across regions.
- Direct integration with Cloud Run, GKE, Cloud Functions for env-injected secrets.
Comparison#
| Feature | Vault | AWS SM | Azure KV | GCP SM |
|---|---|---|---|---|
| Dynamic secrets | Yes (many engines) | RDS only | No | No |
| Cloud-agnostic | Yes | No | No | No |
| HSM-backed | Enterprise | No (separate KMS) | Yes (Premium) | Via Cloud HSM |
| PKI / cert mgmt | Yes | ACM separate | Yes | CAS separate |
| Encryption-as-a-service | Yes (Transit) | No (KMS separate) | No | No |
| Native Kubernetes auth | Yes | IRSA | Workload Identity | Workload Identity |
| Self-hosted option | Yes | No | No | No |
| Cost model | Self-host free / Ent. licence | $0.40/secret/mo | Per operation | Per operation |
Rules of thumb:
- Single-cloud workload: use the native cloud secret manager. Simpler auth, fewer moving parts, cheaper.
- Multi-cloud or hybrid: Vault wins. One control plane, one audit log, portable.
- Need dynamic DB credentials across cloud providers: Vault.
- Need HSM-backed key operations for compliance: Azure Key Vault Premium or AWS CloudHSM.
- Tiny shop, one AWS account: AWS Secrets Manager is hard to beat.
Anti-patterns seen in the wild#
- 5.1% of repositories using secrets managers still leaked secrets (GitGuardian 2024). Having a vault doesn’t help if devs also commit the key to git.
- Storing the vault’s own bootstrap credentials in the vault (circular dependency). Use break-glass out-of-band storage.
- Granting `vault:*` or `secretsmanager:*` to CI roles. Scope per-path, per-secret.
- Not auditing which non-human identities still hold long-lived tokens to the vault itself.
16. Developer Hygiene & Prevention#
Pre-commit#
Single highest-ROI control. Install once, runs on every git commit:
# .pre-commit-config.yaml
repos:
- repo: https://github.com/gitleaks/gitleaks
rev: v8.24.2
hooks:
- id: gitleaks
- repo: https://github.com/Yelp/detect-secrets
rev: v1.5.0
hooks:
- id: detect-secrets
args: ['--baseline', '.secrets.baseline']
Enforce centrally — pre-commit is bypassable with --no-verify, so also run the same check server-side in CI as a gate.
IDE integration#
- ggshield has VS Code and JetBrains plugins that scan on save.
- TruffleHog has a VS Code extension.
- GitHub Copilot and Cursor now both refuse to autocomplete strings that match known secret patterns (implemented 2025).
CI gates#
Every PR runs:
- Gitleaks (fast pattern scan)
- TruffleHog with `--results=verified` (live verification on changed files)
- SBOM + dependency scan (because leaks often enter through a compromised dep)
- Container image scan if Docker build
Block merges on verified-secret findings. Allow annotation-based suppression only with explicit security team approval and a ticket reference.
Push protection#
Enable GitHub’s Push Protection at the org level. It blocks known prefix patterns (GitHub PATs, AWS keys, Stripe keys, etc.). Remember its known blind spots — generic secrets, MySQL/MongoDB URLs.
.gitignore hygiene#
Ship a standard template that excludes:
.env
.env.*
!.env.example
*.pem
*.key
*.p12
*.pfx
*.keystore
*.jks
.aws/credentials
.aws/config
id_rsa
id_ed25519
*.tfstate
*.tfstate.backup
.terraform/
secrets.yml
credentials.json
service-account.json
.npmrc
.pypirc
.mcp.json
claude_desktop_config.json
Training & culture#
- Run tabletop exercises: “a key leaked 30 minutes ago, walk me through the first hour.”
- Publish a one-click revocation runbook for every credential type the org uses.
- Celebrate people who self-report leaks; never punish. The alternative is concealment.
- Track MTTR on secret incidents as a headline metric.
Honeytokens#
Deploy intentionally-leaked fake credentials throughout the codebase and infrastructure. Any use of them is, by definition, unauthorised. GitGuardian, Canarytokens, and AWS IAM canaries provide turn-key honeytokens. During the Shai-Hulud investigation, honeytokens on developer workstations gave the cleanest telemetry about what attackers actually did post-compromise.
Shift-left without developer fatigue#
- Don’t ship a scanner that produces 1,000 findings on day one. Adopt a baseline, fix new findings only, and work the backlog on a schedule.
- Fast feedback: pre-commit must finish in < 2 seconds for the common case or developers will disable it.
- Clear remediation path: every finding must link to “how do I fix this” with an approved vault pattern for the team’s stack.
- Measurable quiet-hours: no new findings for N days = the control is working.
17. Non-Human Identity Governance#
The industry’s 2025–2026 conclusion from the sprawl data: secret scanning is necessary but insufficient. What security teams actually need is Non-Human Identity (NHI) governance.
The three questions#
- What non-human identities exist in my environment? (service accounts, IAM roles, API keys, OAuth apps, SSH keys, SA tokens, machine users)
- Who owns each one? (not “which team inherited it five reorgs ago”)
- What can each one access? (effective permissions, including via role chains)
Most orgs cannot answer any of the three at scale. NHIs now outnumber human users by 10–50x in typical cloud environments, and they are overwhelmingly long-lived, over-privileged, and un-owned.
Building an NHI programme#
- Discover: enumerate every identity source — cloud IAM, GitHub/GitLab PATs, OAuth apps, service principals, SSH keys, SA tokens, vendor API keys.
- Attribute: assign an owning team (human) to each. Anything un-attributable after 30 days = candidate for retirement.
- Scope: measure effective permissions and data access per identity.
- Lifecycle: enforce creation → rotation → expiration → revocation flows. No identity exists without an expiration date and a rotation policy.
- Govern: short-lived credentials by default. Long-lived only with explicit risk exception.
- Monitor: alert on anomalous use (new IP, unusual API call mix, geographic impossibility).
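The attribution rule above (un-owned after 30 days = candidate for retirement) can be sketched as a simple inventory filter; the type and field names are illustrative:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class NHI:
    name: str
    owner: Optional[str]   # owning human team, if attributed
    discovered: date       # when the identity was first inventoried

def retirement_candidates(identities, today, grace_days=30):
    """Identities still un-owned after the grace window become retirement candidates."""
    cutoff = today - timedelta(days=grace_days)
    return [i for i in identities if i.owner is None and i.discovered <= cutoff]
```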
Moving from static to identity-driven#
| Old | New |
|---|---|
| Hardcoded AWS access keys | IAM Roles + STS + IRSA (EKS) / IMDSv2 (EC2) |
| Long-lived GitHub PATs | GitHub Actions OIDC → cloud role assumption |
| Static DB passwords | Vault dynamic database credentials |
| Service account key JSON (GCP) | Workload Identity Federation |
| Azure service principal secrets | Managed Identities + IMDS |
| Embedded Slack tokens | Slack OAuth app with workspace installs |
GitHub Actions OIDC is the canonical 2024–2026 pattern: your workflow requests a short-lived OIDC token from GitHub, exchanges it with AWS STS / Azure AD / GCP STS for a short-lived cloud credential, and uses that. No long-lived secrets stored in GitHub Actions variables. Every major CI system now supports the same pattern.
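A minimal GitHub Actions sketch of that exchange, assuming an AWS IAM role already configured to trust GitHub’s OIDC provider (role ARN and region are illustrative):

```yaml
permissions:
  id-token: write   # lets the job request an OIDC token from GitHub
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gha-deploy  # illustrative
          aws-region: eu-west-1
      # Subsequent steps use a short-lived STS session; nothing long-lived is stored.
      - run: aws sts get-caller-identity
```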
18. Quick Reference#
Secret discovery one-liners#
# Scan current repo history
trufflehog git file://. --results=verified
gitleaks git -v
# Scan remote repo
trufflehog git https://github.com/org/repo --results=verified
# Scan org
trufflehog github --org=myorg --include-forks --include-members
# Scan Docker image
trufflehog docker --image=myimage:latest
# Scan filesystem (downloaded JS bundle / APK)
trufflehog filesystem ./extracted_bundle
# Scan S3 bucket
trufflehog s3 --bucket=mybucket
# Pre-commit check
gitleaks protect --staged
# Scan stdin
cat suspicious_file | gitleaks stdin
AWS key triage#
aws sts get-caller-identity # whose key is this?
aws iam list-attached-user-policies --user-name X # effective perms
aws iam list-access-keys --user-name X # other keys?
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIA... \
--max-results 50 # usage history
GitHub token triage#
curl -H "Authorization: token $GH_TOKEN" https://api.github.com/user
curl -H "Authorization: token $GH_TOKEN" https://api.github.com/user/orgs
# Token scopes appear in the X-OAuth-Scopes response header:
curl -sI -H "Authorization: token $GH_TOKEN" https://api.github.com/user | grep -i x-oauth-scopes
Quick impact assessment on a leaked key#
- Who owns it? (org, team, individual)
- What is its scope? (effective permissions)
- When was it created?
- When was it last used? From where?
- What data can it reach?
- Was it ever used from an unexpected IP / time / region?
- Are there persistence artefacts created under it?
- Has it been rotated?
- Who has been notified?
- What’s the post-mortem action to prevent recurrence?
File patterns that frequently leak secrets#
.env .env.local .env.production .env.backup
config.json config.yml settings.py local.settings.json
application.properties application.yml
docker-compose.yml docker-compose.override.yml
kubeconfig .kube/config
.aws/credentials .aws/config
credentials.json service-account.json sa.json
id_rsa id_ed25519 *.pem *.key *.pfx *.p12 *.jks
.npmrc .pypirc .gem/credentials
wp-config.php parameters.yml
terraform.tfstate *.tfvars
.git-credentials .netrc
.mcp.json claude_desktop_config.json
*.sql *.dump *.bak
Red flags in code review#
- `String API_KEY = "..."` with a long alphanumeric literal
- `os.environ.get("KEY", "default-fallback-with-real-value")`
- `requests.get(url, headers={"Authorization": "Bearer ..."})` with a hardcoded bearer
- `curl -H "X-API-Key: ..."` in shell scripts
- `git log --all -p | grep -i "password\|secret\|api_key"` returning hits
- Deleted lines removing a key without a rotation commit
- `.env.example` files that contain real values (“for testing”)
- Comments like `// TODO: remove before commit`
- Long base64 or hex strings next to keywords like `key`, `token`, `secret`
Mental model#
Every secret has a half-life. A key in your vault, scoped tight, rotated weekly, and monitored = long half-life. A key in git, on Docker Hub, in a public gist, or in a mobile binary = zero half-life, already compromised, must be rotated now.
Detection is not remediation. 64% of secrets leaked in 2022 are still valid in 2026. The rotation you don’t do is the breach you will have.
If a scanner finds it, an attacker already found it.
Compiled from 30 research articles under ~/Documents/obsidian/chs/raw/Secrets/ covering GitGuardian State of Secrets Sprawl 2025 & 2026, OWASP Secrets Management Cheat Sheet, TruffleHog & Gitleaks documentation, Wiz Forbes AI 50 research, Truffle Security’s “oops commits” analysis, the CERT-EU Trivy/European Commission breach advisory, Shai-Hulud 2 and LiteLLM supply chain post-mortems, AI-era hardcoded-secrets research, and practitioner commentary on secret scanning tooling and vaults. Defensive security reference material.