Comprehensive Secrets Management & Leakage Guide#
A practitioner’s reference for secrets sprawl, credential leakage, detection, remediation, and hardening. Compiled from 30 research sources covering GitGuardian State of Secrets Sprawl 2025/2026, OWASP Secrets Management Cheat Sheet, TruffleHog, Gitleaks, real-world breaches (Trivy/European Commission, Shai-Hulud, LiteLLM), AI-era leakage patterns, and vault/NHI governance guidance.
Table of Contents#
- Fundamentals & Impact
- Threat Landscape & Statistics
- Leak Locations & Attack Surface
- Secret Types & Regex Signatures
- JavaScript Bundle Extraction
- Mobile App Secret Extraction
- Cloud Metadata Exfiltration
- Environment Variable & File Leakage
- JWT Leaks & Validation Failures
- Git History Mining
- Secret Scanners Compared
- AI-Era Leakage Patterns
- Real-World Breaches
- Rotation & Incident Response Playbook
- Vaults & Secret Managers
- Developer Hygiene & Prevention
- Non-Human Identity Governance
- Quick Reference
1. Fundamentals & Impact#
A secret is any credential a machine or human uses to authenticate itself to another system: API keys, database passwords, private encryption keys, OAuth client secrets, tokens, SSH keys, TLS certificates, IAM credentials, webhook URLs, and service account JSON. Secrets are the connective tissue of modern distributed architectures, and they are simultaneously the shortest path from reconnaissance to full account takeover.
Impact spectrum of a leaked secret:
| Stage | Example |
|---|---|
| Recon | Attacker grabs a leaked key, enumerates scope via CLI (aws sts get-caller-identity, gh auth status) |
| Lateral expansion | Pivot from one API key to connected services (S3, RDS, Slack, private repos) |
| Data exfiltration | Download customer PII, source code, models, or training data |
| Persistence | Create new access keys, add SSH keys, invite IAM users |
| Supply chain | Push malicious packages to npm/PyPI, poison Docker images, tamper with CI/CD |
| Monetisation | Crypto mining on stolen cloud credentials, ransom of S3 buckets, resale on dark web |
Why it matters:
- Across the past 10 years of Verizon DBIR data, stolen credentials appear in 31% of all breaches.
- IBM: breaches involving stolen or compromised credentials take an average of 292 days to identify and remediate — nearly a full year of attacker dwell time.
- GitGuardian found 70% of secrets leaked in 2022 are still valid in 2025, and 64% of 2022 leaks remain exploitable in 2026. Detection is not remediation.
- 35% of all private repositories scanned contained at least one plaintext secret; 32.2% of internal repos contain a hardcoded secret compared to 5.6% of public repos — internal repos are the highest-value target once an attacker gets a foothold.
A single leaked AWS IAM key, GitHub PAT, or Slack bot token has repeatedly produced full cloud account takeover, supply chain compromise, and nine-figure breach costs. Secrets are not an edge case — they are the primary initial-access vector of the 2020s.
2. Threat Landscape & Statistics#
The sprawl curve#
| Year | New hardcoded secrets on public GitHub | YoY change |
|---|---|---|
| 2021 | ~6M | baseline |
| 2023 | ~19M | — |
| 2024 | 23.77M | +25% |
| 2025 | ~29M (GitGuardian State of Secrets Sprawl 2026) | +34% — largest single-year jump ever recorded |
Since 2021, leaked secrets have grown 152%, while GitHub’s public developer base expanded only 98%. Secrets sprawl is outrunning developer growth.
Where the leaks actually live#
- 5.6% of public repos contain a secret.
- 32.2% of internal repos contain a secret (6x rate of public).
- 18% of scanned public Docker images contain secrets; 15% of those are valid.
- 28% of all incidents originate outside source code — Slack, Jira, Confluence, Teams.
- Collaboration-tool-only leaks are more severe: 56.7% rated critical vs 43.7% for code-only.
- 7,000+ valid AWS keys remain exposed on Docker Hub.
- 100,000 valid secrets in GitGuardian’s analysis of 15M public Docker images, including Fortune 500 AWS keys and GitHub tokens.
- Self-hosted GitLab & Docker registries expose secrets at 3–4x the rate of public GitHub.
- 15% of commit authors on public GitHub leaked a secret at least once.
Top leaked secret categories (2024–2025)#
- AWS IAM access keys
- Slack webhooks and bot tokens
- Azure AD API keys / service principal secrets
- GitHub PATs and fine-grained tokens
- MongoDB / MySQL / PostgreSQL connection strings (no standard prefix → hard to detect)
- Stripe keys, SendGrid, Twilio
- Generic / custom API keys (fastest-growing, hardest to scan)
- OpenAI, Anthropic, and other LLM provider keys
AI amplification#
- 29 million new hardcoded secrets in 2025 (+34% YoY).
- 1,275,105 leaked secrets tied specifically to AI services (+81% YoY).
- Eight of the ten fastest-growing leak categories are AI-related.
- Brave Search API keys: +1,255% YoY.
- Firecrawl: +796% YoY. Supabase: +992% YoY.
- Public repos using GitHub Copilot had a 6.4% secret leakage rate, vs ~4.6% baseline.
- Wiz audited the Forbes AI 50: 65% had leaked verified secrets on GitHub, frequently in deleted forks, gists, and personal developer repos.
- MCP (Model Context Protocol) config files exposed 24,008 unique secrets in 2025 (2,117 validated) in their first year.
3. Leak Locations & Attack Surface#
Source code hosts#
| Platform | Notes |
|---|---|
| GitHub public repos | Largest visible attack surface; push protection helps only for prefixed keys |
| GitHub internal/private repos | 6x more secret-dense than public; often misconfigured to public |
| GitHub Gists | Frequently ignored by scanning; personal devs paste snippets with embedded keys |
| GitHub forks | Force-deleted forks remain in the commit graph (see “oops commits”) |
| GitLab (SaaS & self-hosted) | Self-hosted instances frequently exposed to internet with default creds |
| Bitbucket | Less scanning tooling, frequently forgotten legacy repos |
| Azure DevOps | Pipeline variables, var groups, wiki pages |
Beyond source code#
| Surface | Leak vector |
|---|---|
| Docker Hub / GHCR | Secrets baked into layers via ENV, ARG, or a forgotten COPY . . |
| npm / PyPI / crates.io packages | .env, .npmrc, tarball artifacts |
| CI/CD logs | Jenkins, GitHub Actions, CircleCI echoing secrets or masking failures |
| Slack / Teams / Discord | Pasted credentials during incident response and onboarding |
| Jira / Confluence / Notion | “Temporary” creds in runbooks that never get removed |
| Shodan / Censys | Exposed .env, .git, config endpoints on the open internet |
| Pastebin / GitHub code search | Intentional and accidental dumps |
| Wayback Machine / Google cache | Historical copies of pages that briefly exposed keys |
| Mobile app bundles (APK/IPA) | Hardcoded API keys in decompiled smali / binary strings |
| JavaScript bundles | Keys in SPA main.js, chunk-*.js, source maps |
| Backup files | .sql, .bak, .tar.gz, config.php.swp, .DS_Store |
| Browser dev tools | Tokens visible in localStorage, sessionStorage, cookies |
| Artifacts & build outputs | WAR, JAR, PyInstaller .exe, Electron app.asar |
| Kubernetes manifests | stringData: in Secret resources stored in git |
| Terraform state files | terraform.tfstate committed with plaintext provider credentials |
Developer endpoints as aggregation layer#
The Shai-Hulud 2 supply chain incident gave rare telemetry across 6,943 compromised systems:
- 294,842 secret occurrences observed
- 33,185 unique secrets
- The average secret appeared in 8 different locations per machine (`.env`, shell history, `~/.aws/credentials`, IDE configs, `.git-credentials`, cached tokens, build artifacts)
- 59% of compromised machines were CI/CD runners, not personal laptops
Once secrets sprawl into build infrastructure, the blast radius becomes organisational, not individual.
4. Secret Types & Regex Signatures#
Detection tools work by combining pattern matching (regex + prefix), entropy analysis (Shannon entropy over the candidate substring), and live verification (attempting to authenticate).
Common prefixes and formats#
| Provider | Pattern | Example |
|---|---|---|
| AWS Access Key ID | AKIA[0-9A-Z]{16} | AKIAIOSFODNN7EXAMPLE |
| AWS Secret Access Key | [A-Za-z0-9/+=]{40} (entropy-based) | — |
| AWS Session Token | ASIA[0-9A-Z]{16} | ASIAxxx… |
| GitHub PAT (classic) | ghp_[A-Za-z0-9]{36} | — |
| GitHub fine-grained | github_pat_[A-Za-z0-9_]{82} | — |
| GitHub OAuth | gho_[A-Za-z0-9]{36} | — |
| GitHub app install | ghs_[A-Za-z0-9]{36} | — |
| Slack bot token | xoxb-[0-9]+-[0-9]+-[A-Za-z0-9]+ | — |
| Slack user token | xoxp-… | — |
| Slack webhook | https://hooks.slack.com/services/T[A-Z0-9]+/B[A-Z0-9]+/[A-Za-z0-9]+ | — |
| Stripe live secret | sk_live_[A-Za-z0-9]{24,} | — |
| Stripe restricted | rk_live_[A-Za-z0-9]{24,} | — |
| Google API key | AIza[0-9A-Za-z_-]{35} | — |
| Google OAuth client secret | GOCSPX-[A-Za-z0-9_-]{28} | — |
| OpenAI | sk-[A-Za-z0-9]{48} / sk-proj-… | — |
| Anthropic | sk-ant-api03-[A-Za-z0-9_-]{95} | — |
| HuggingFace | hf_[A-Za-z0-9]{34} | — |
| Twilio | SK[0-9a-fA-F]{32} / AC account SID | — |
| SendGrid | SG\.[A-Za-z0-9_-]{22}\.[A-Za-z0-9_-]{43} | — |
| NPM token | npm_[A-Za-z0-9]{36} | — |
| PyPI token | pypi-AgEIcHlwaS5vcmc[A-Za-z0-9_-]+ | — |
| JWT | eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+ | — |
| Private keys | `-----BEGIN (RSA\|OPENSSH\|EC\|DSA\|PGP) PRIVATE KEY-----` | — |
Generic secret detection (the hard part)#
Generic secrets (api_key = "...", password = "...", arbitrary database URLs) have no standard prefix and are the fastest-growing leak category. Detection strategies:
- Keyword proximity + entropy: look for `secret`, `key`, `token`, `password`, `passwd`, `pwd`, `auth`, `credentials`, `api_key` within N characters of a high-entropy string.
- Shannon entropy threshold: typically >= 4.5 for base64-like strings and >= 3.0 for hex strings (hex maxes out at 4 bits/char).
- Connection-string parsers: `mysql://user:pass@host`, `postgres://…`, `mongodb+srv://…`, `redis://:pass@…`.
- ML-assisted classifiers: GitGuardian and TruffleHog both layer ML models on top of regex for generic secret classification.
- Live verification: the definitive signal — TruffleHog’s “verified” mode attempts authentication against the relevant API before raising an alert.
GitHub Push Protection struggles with generic secrets — MySQL and MongoDB credentials were measurably not impacted by Push Protection rollout because they lack a standardized prefix.
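The keyword-proximity and entropy strategies above can be sketched in a few lines of Python. The keyword list, candidate regex, 40-character window, and 4.5-bit threshold are illustrative defaults, not any particular scanner's implementation:

```python
import math
import re

def shannon_entropy(s: str) -> float:
    """Bits per character of the string's empirical character distribution."""
    if not s:
        return 0.0
    n = len(s)
    counts = {c: s.count(c) for c in set(s)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# Secret-ish keywords and a base64-ish candidate pattern (illustrative).
KEYWORDS = re.compile(r"(secret|key|token|password|passwd|pwd|auth|credential)", re.I)
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")

def find_candidates(text: str, window: int = 40, threshold: float = 4.5):
    """Flag high-entropy tokens that sit near a secret-ish keyword."""
    hits = []
    for m in CANDIDATE.finditer(text):
        context = text[max(0, m.start() - window):m.start()]
        if KEYWORDS.search(context) and shannon_entropy(m.group()) >= threshold:
            hits.append(m.group())
    return hits
```

Real scanners add per-charset thresholds, allowlists for known placeholders, and a verification step on top of this core loop.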
5. JavaScript Bundle Extraction#
Single-page applications frequently ship secrets inside compiled JS bundles under the mistaken belief that minification is obfuscation. A bundled SPA is client-side code — anything in it is public.
What leaks in JS bundles#
- Firebase configs (`apiKey`, `authDomain`, `projectId`) — some fields are intended to be public, but `databaseURL` with open rules leads to full read/write
- Stripe publishable and accidental secret keys
- Algolia admin keys (vs the intended search-only key)
- Mapbox, Google Maps server keys
- Segment write keys with elevated scopes
- Hardcoded JWT signing secrets (symmetric HS256)
- AWS Cognito unauth pool IDs leading to IAM assumption
- Backend base URLs, staging endpoints, internal API paths
Extraction workflow#
- Crawl the target with a headless browser or `katana`/`hakrawler` and collect every `.js`, `.mjs`, `.map`, and `chunk-*.js`.
- Fetch source maps (`.map` files) where available — they reconstruct original source trees, including comments that often reveal keys.
- Run secret scanners over collected JS:
  - `trufflehog filesystem ./js`
  - `gitleaks dir ./js`
  - `SecretFinder`, `LinkFinder` (Python, classic toolkit)
  - `jsluice` (newer, AST-based extractor for URLs, params, and secrets)
- Beautify bundles with `js-beautify` to reveal string literals hidden by minification.
- Grep for high-value markers: `firebaseConfig`, `accessKeyId`, `process.env`, `Bearer`, `Authorization`.
- Diff historical versions from the Wayback Machine — developers often remove keys after disclosure without rotation.
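The marker-grep step can be automated over a directory of collected bundles with a short Python sweep; the marker list mirrors the grep targets above and is illustrative rather than exhaustive:

```python
import re
from pathlib import Path

# High-value markers worth flagging in downloaded bundles (illustrative).
MARKERS = re.compile(
    r"(firebaseConfig|accessKeyId|secretAccessKey|process\.env"
    r"|Authorization|AIza[0-9A-Za-z_\-]{35})"
)

def scan_js_dir(root: str):
    """Return (filename, marker) pairs for every marker hit under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.js")):
        text = path.read_text(errors="ignore")
        for match in MARKERS.finditer(text):
            hits.append((path.name, match.group(1)))
    return hits
```

Feed anything this flags into a verifying scanner before raising an alarm, since markers alone produce false positives.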
Why this keeps happening#
Webpack and Vite inline `process.env.*` values at build time via define plugins. A developer who sets `VITE_API_SECRET` in `.env` and references it as `import.meta.env.VITE_API_SECRET` ships the value in `main.js`. The `NEXT_PUBLIC_` prefix makes this exposure explicit; the `VITE_` prefix is just as public, but developers frequently miss that.
6. Mobile App Secret Extraction#
Mobile apps are binary blobs distributed to every user — treat them as adversarial-read by default.
Android (APK)#
Extraction pipeline:
- Pull the APK: `adb shell pm path <pkg>` then `adb pull`.
- Unpack with `apktool d app.apk` — yields `smali/`, `res/`, `AndroidManifest.xml`, `assets/`.
- Decompile DEX to Java with `jadx -d out app.apk` for readable source.
- Scan with secret scanners:
  - `trufflehog filesystem out/`
  - Custom grep for `api_key`, `password`, `BuildConfig`, `R.string`.
- Check `res/values/strings.xml` — developers routinely put API keys here.
- Check `assets/` for bundled `.env`, `.json`, `.properties`, `config.js`.
- Extract strings from native `.so` libraries: `strings lib/arm64-v8a/*.so | grep -iE 'key|token|secret'`.
Common APK secret locations:
- `BuildConfig.java` — Gradle build-time constants
- `strings.xml` resources
- `.properties` files under `assets/`
- Hardcoded in Java/Kotlin via `String KEY = "..."`
- Certificate pinning bypass targets that leak API keys
iOS (IPA)#
Extraction pipeline:
- Pull the IPA from a jailbroken device or from a decrypted source (`frida-ios-dump`, `bagbak`).
- Unzip the IPA — it contains the `Payload/App.app/` bundle.
- Inspect the main executable with `strings` or `otool`.
- Class-dump Objective-C metadata: `class-dump` or `classdumpios`.
- For Swift, use `Hopper` or `Ghidra`; Swift symbols are mangled but string constants remain.
- Check `Info.plist` and embedded `.plist` files for API keys.
- Check `Assets.car` and bundled resources.
Runtime extraction: Frida scripts hook NSString allocation or SecItem keychain calls to dump secrets at runtime, bypassing any at-rest obfuscation.
Mitigations that actually help#
- Never embed a production secret in a mobile binary. Full stop.
- Use short-lived tokens minted by your backend after user authentication.
- For Firebase/GCP, use App Check / DeviceCheck to bind requests to genuine app installs.
- For Android, use Play Integrity API; iOS uses App Attest.
- Where client-side crypto is required, derive keys from user credentials + PBKDF2/Argon2.
- Server-side authorisation is the only real defence — client-side obfuscation is a speed bump, not a gate.
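The "short-lived tokens minted by your backend" pattern can be sketched with nothing but the standard library. The function names, claim layout, and 10-minute TTL below are illustrative; in production you would typically reach for a maintained JWT or PASETO library instead:

```python
import base64
import hashlib
import hmac
import json
import time

SERVER_SECRET = b"rotate-me-regularly"  # lives only on the backend, never in the app

def mint_token(user_id: str, ttl_seconds: int = 600) -> str:
    """Issue a short-lived, HMAC-signed token after the user authenticates."""
    payload = json.dumps({"sub": user_id, "exp": int(time.time()) + ttl_seconds})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    sig = hmac.new(SERVER_SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_token(token: str):
    """Return the user id if signature and expiry check out, else None."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SERVER_SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # constant-time comparison rejects tampered tokens
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < time.time():
        return None
    return claims["sub"]
```

The mobile app only ever holds a token that dies in minutes; the signing secret never ships in the binary.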
7. Cloud Metadata Exfiltration#
When an attacker gains SSRF, RCE, or code execution inside a cloud workload, the Instance Metadata Service (IMDS) becomes the single highest-value internal target: it hands out short-lived IAM credentials that the workload itself is entitled to use.
AWS#
| Endpoint | Returns |
|---|---|
http://169.254.169.254/latest/meta-data/ | Index of metadata categories |
http://169.254.169.254/latest/meta-data/iam/security-credentials/ | IAM role names attached to the instance |
http://169.254.169.254/latest/meta-data/iam/security-credentials/<role> | AccessKeyId, SecretAccessKey, Token (STS session credentials) |
http://169.254.169.254/latest/user-data/ | EC2 user-data script (often contains bootstrap secrets) |
http://169.254.169.254/latest/dynamic/instance-identity/document | Account ID, region, instance ID |
IMDSv2 (token-based) is the mitigation: attacker must first PUT to /latest/api/token with X-aws-ec2-metadata-token-ttl-seconds, receive a token, then include it as X-aws-ec2-metadata-token on subsequent requests. Many SSRF primitives cannot issue PUT or arbitrary headers, and are thus blocked. Enforce IMDSv2 cluster-wide via the HttpTokens=required launch template setting.
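The PUT-then-GET dance can be illustrated with `urllib`. The helper below only constructs the request pair (the link-local address resolves only on an EC2 instance); the function name and 21600-second TTL are illustrative:

```python
import urllib.request

IMDS = "http://169.254.169.254"

def imdsv2_requests(role: str):
    """Build the two-step IMDSv2 request pair (only resolvable on EC2)."""
    # Step 1: PUT a session-token request with a TTL header. SSRF primitives
    # that cannot send PUTs or custom headers fail here, which is the point.
    token_req = urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )

    # Step 2: present the returned token on the credentials GET.
    def creds_req(token: str) -> urllib.request.Request:
        return urllib.request.Request(
            f"{IMDS}/latest/meta-data/iam/security-credentials/{role}",
            headers={"X-aws-ec2-metadata-token": token},
        )

    return token_req, creds_req

# On an instance you would then run:
#   token = urllib.request.urlopen(token_req, timeout=2).read().decode()
#   creds = urllib.request.urlopen(creds_req(token), timeout=2).read()
```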
Azure#
| Endpoint | Returns |
|---|---|
http://169.254.169.254/metadata/instance?api-version=2021-02-01 (requires Metadata: true header) | Instance metadata |
http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/ | Managed Identity access token |
http://169.254.169.254/metadata/identity/oauth2/token?...&resource=https://vault.azure.net | Key Vault access token |
The required Metadata: true header and resource parameter are Azure’s lightweight mitigation. Once an attacker obtains a Managed Identity token for Key Vault, every secret and key the MI can reach is accessible.
GCP#
| Endpoint | Returns |
|---|---|
http://metadata.google.internal/computeMetadata/v1/ (requires Metadata-Flavor: Google) | Metadata index |
http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token | OAuth2 access token for the attached service account |
http://metadata.google.internal/computeMetadata/v1/project/attributes/ | Project-level SSH keys, metadata |
Kubernetes#
| Location | Secret |
|---|---|
/var/run/secrets/kubernetes.io/serviceaccount/token | Pod service-account JWT |
/var/run/secrets/kubernetes.io/serviceaccount/ca.crt | Cluster CA |
https://kubernetes.default.svc/api/v1/namespaces/<ns>/secrets (with SA token) | All secrets the SA can read |
Kubelet :10250/pods (if unauthenticated) | Pod metadata including env vars |
Exfiltration once credentials are obtained#
aws sts get-caller-identity # confirm scope
aws iam list-attached-role-policies # enumerate permissions
aws s3 ls # list buckets
aws secretsmanager list-secrets # enumerate centrally-stored secrets
aws ssm describe-parameters # SSM Parameter Store often holds legacy creds
A single SSRF → IMDS chain routinely produces full AWS account takeover. The Capital One 2019 breach (100M+ records) is the canonical case: a WAF SSRF → IMDSv1 → IAM credentials → S3 exfiltration.
8. Environment Variable & File Leakage#
Secrets migrated out of source code typically landed in .env files and process environment variables. Both introduce new leak channels.
.env files#
- Committed to git by developers who forgot `.gitignore`.
- Shipped inside Docker images via `COPY . .`.
- Exposed by web servers serving the project directory root (`https://victim.com/.env` returns 200 with the full file).
- Present in backup tarballs accidentally made public on S3 or FTP.
- Loaded by Next.js, Laravel, Rails, and Django during `dotenv` initialisation, then echoed into error pages and debug toolbars.
Recon one-liners (defensive — to test your own assets):
- `curl https://target/.env`
- `curl https://target/.env.backup`
- `curl https://target/.env.local`
- `curl https://target/config/.env`
- `curl https://target/.git/config`
Environment variable leakage#
| Channel | Mechanism |
|---|---|
| Error pages | Django DEBUG=True, Flask debug toolbar, Rails WEB_CONSOLE, Laravel Ignition — all dump full env on exception |
| phpinfo() | Leftover diagnostic file dumps $_ENV, $_SERVER, including DB creds |
/proc/self/environ | On LFI, reading this file under a worker process returns that worker’s env vars |
/proc/<pid>/environ | Same, for other processes with appropriate permissions |
| Docker image history | docker history <image> leaks ENV directives from Dockerfile layers |
| Kubernetes API | kubectl describe pod shows env vars; configMapKeyRef/secretKeyRef references stay indirect, but inline value: entries leak in plaintext |
| Process listing | ps auxe reveals env; long-running daemons started with --secret=... as argv leak via /proc/<pid>/cmdline |
| APM / logging stacks | Datadog, Sentry, New Relic, Elastic APM frequently capture full env on error |
| Core dumps | /var/crash/*.core contains full process memory including secrets |
Defensive patterns#
- Never set secrets as `ENV` in Dockerfiles; use runtime `--env-file` or orchestrator secret mounts.
- Tmpfs-mount `.env` files in containers; never bake them into the image.
- Disable `DEBUG` in production and verify explicit error pages.
- Scrub env vars before sending events to APM/logging; Sentry, Datadog, and Honeycomb all support allowlist/denylist filters.
- Use memory-safe primitives: in Java/.NET, prefer `byte[]`/`char[]` over `String` and zero the buffer after use. Strings are immutable and cannot be reliably wiped before garbage collection.
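The "scrub env vars before APM" advice can be as simple as a name-based redaction filter. The pattern list below is an illustrative starting point, and `scrub_env` is a hypothetical helper name; APM SDKs such as Sentry let you apply a function like this in their before-send hook:

```python
import re

# Name-based heuristic: redact any variable whose name suggests a credential.
SENSITIVE_NAME = re.compile(
    r"(SECRET|TOKEN|PASSWORD|PASSWD|PWD|API_?KEY|CREDENTIAL|PRIVATE)", re.I
)

def scrub_env(env: dict) -> dict:
    """Return a copy of env with secret-looking values redacted."""
    return {
        name: "[REDACTED]" if SENSITIVE_NAME.search(name) else value
        for name, value in env.items()
    }
```

Pair a denylist like this with an explicit allowlist of variables your telemetry actually needs; everything else should never leave the host.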
9. JWT Leaks & Validation Failures#
JWTs are bearer tokens — anyone possessing one can use it. They are also among the most commonly leaked secrets because they appear liberally in logs, browser storage, and URLs.
Common JWT leak vectors#
- Stored in `localStorage` (accessible from any XSS).
- Passed in URL query strings (`?token=eyJ...`) — logged by every proxy, analytics tool, and browser history.
- Copied into Slack/Jira during debugging.
- Baked into mobile apps as "API key" equivalents.
- Logged by web frameworks on request errors.
- Cached by CDNs when cache keys don't include the `Authorization` header.
Validation failures that turn a leak into a breach#
| Failure | Impact |
|---|---|
alg: none | Library accepts unsigned tokens |
| HS256 vs RS256 confusion | Attacker signs tokens with the RSA public key as HMAC secret |
| Weak HS256 secret | Brute force hashcat -m 16500 cracks short secrets in seconds |
Missing iss/aud validation | Tokens from sibling tenants accepted |
Expired exp not enforced | Tokens live forever |
kid injection | Attacker points to a file under their control or injects SQL |
| JKU/JWK header trust | Attacker hosts their own JWK set and forges tokens |
Defensive JWT handling#
- Store tokens in `HttpOnly; Secure; SameSite=Strict` cookies, never in localStorage.
- Use short-lived access tokens (5–15 min) and refresh token rotation with reuse detection.
- Always validate `alg` against an allowlist — never trust the header's `alg` field.
- Validate `iss`, `aud`, `exp`, `nbf`, `sub` on every request.
- Rotate signing keys regularly; publish via JWKS with `kid` pinning.
- Never log full tokens — truncate or hash.
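The allowlist-and-claims discipline above can be sketched with only the standard library. This is a minimal HS256-only demonstration with illustrative helper names, not a substitute for a maintained JWT library in production:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def mint_hs256(claims: dict, secret: bytes) -> str:
    """Mint an HS256 JWT (here only to drive the round-trip demo)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

def verify_hs256(token: str, secret: bytes, issuer: str, audience: str) -> dict:
    """Enforce alg allowlist, signature, iss/aud, and expiry on every request."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # allowlist: never trust the header's claim
        raise ValueError("disallowed alg")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(b64url_decode(sig_b64), expected):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("iss") != issuer or claims.get("aud") != audience:
        raise ValueError("wrong issuer/audience")
    if claims.get("exp", 0) < time.time():
        raise ValueError("expired")
    return claims
```

Note the order: reject the algorithm first, verify the signature second, and only then trust the claims enough to read them.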
10. Git History Mining#
Git’s immutable log is an attacker’s treasure map. Deleting a secret in a later commit does not remove it from history; force-pushing does not remove it from GitHub’s event archive.
The “oops commits” problem#
GitHub retains every public commit, even those developers attempt to erase through force pushes, as zero-commit PushEvent entries in its event archive. Sharon Brizinov (Truffle Security) scanned all force-pushed/deleted commits since 2020 via the GH Archive BigQuery dataset and found thousands of active secrets, including:
- A GitHub Personal Access Token with admin permissions over the Istio repositories (immediate supply chain compromise potential)
- Valid MongoDB credentials, AWS keys, GitHub PATs
- Bug bounties totalling ~$25,000
Truffle Security released the open-source Force Push Scanner (https://github.com/trufflesecurity/force-push-scanner) that queries GH Archive via BigQuery and runs TruffleHog on orphaned commits. The practical conclusion: once a secret has been pushed to a public repo, it must be considered permanently compromised. Rotate, don’t hide.
Historical scanning workflow#
# Full history, all branches, all tags
git clone --mirror https://github.com/org/repo
trufflehog git file://repo.git --results=verified
gitleaks git -v repo.git
# Scan a specific commit range
gitleaks git --log-opts="--all commitA..commitB" path/
# Scan every branch including deleted refs (if still reachable)
git fetch origin '+refs/pull/*:refs/remotes/origin/pr/*'
# Dangling blob scan
git fsck --full --unreachable --no-reflogs
Removing secrets from history (cautiously)#
Historical rewrites are destructive and break every clone of the repo. They do not retroactively invalidate the leaked secret — rotation is always step one.
- `git filter-repo` — modern replacement for `git filter-branch`; faster and safer.
- BFG Repo-Cleaner — single-purpose tool that removes large files and secret strings.
- After rewriting: force-push, notify all collaborators to re-clone, and assume the old history is archived somewhere (GH Archive, clones, forks, mirrors). Rotate the secret first, always.
11. Secret Scanners Compared#
TruffleHog#
- Strengths: 800+ classified detectors, active verification (hits the vendor API to confirm the secret is live), “Analyze” mode that enumerates what the credential can access (IAM permissions, resources, owner).
- Scope: Git, GitHub, GitLab, filesystem, S3, GCS, Docker images, Jira, Slack, Confluence, Postman, Jenkins logs, Circle CI logs, and more.
- Modes: `--results=verified` (only show live secrets), `--only-verified`, `--no-verification` for air-gapped scans.
- Enterprise: continuous monitoring of Git, Jira, Slack, Confluence, Teams, SharePoint.
- License: AGPL-3.0 (open source), enterprise commercial product for continuous scanning.
- Typical invocation:
trufflehog git https://github.com/org/repo --results=verified
trufflehog github --org=myorg --include-forks --include-members
trufflehog docker --image=myimage:latest
trufflehog filesystem ./
Gitleaks#
- Strengths: fast, Go-native, TOML-configurable rules, extensive default ruleset, first-class pre-commit and GitHub Action integration, playground for regex development.
- Scope: `git`, `dir`, `stdin`. Does not verify secrets against APIs.
- Config layers: flag → env var → repo `.gitleaks.toml` → default.
- Reports: JSON, CSV, JUnit, SARIF (integrates with GitHub code scanning), custom templates.
- Entropy, allowlists, baselining: supports `.gitleaksignore` and baseline files to suppress known issues.
- License: MIT.
- Typical invocation:
gitleaks git -v --log-opts="--all"
gitleaks dir ./src
gitleaks git --report-format sarif --report-path leaks.sarif
detect-secrets (Yelp)#
- Strengths: baseline-first workflow designed for gradual adoption on legacy repos; every developer inherits a single baseline and only new secrets block CI.
- Weak on: active verification (not its design goal).
- Plugin model: extensible detector types (base64, hex, AWS, Slack, Azure, private keys, etc.).
ggshield (GitGuardian)#
- Strengths: commercial backing, 350+ detectors, near-zero false positives due to ML classification layered on regex, dashboard-driven remediation workflows, first-class remediation ticketing.
- Modes: pre-commit, pre-receive, CI, IDE integration, secrets scanner for Jira/Slack/Confluence.
Semgrep (Secrets)#
- Strengths: AST-aware rules can detect insecure secret handling (e.g. secrets passed as URL params, secrets logged), in addition to leaked values.
- Integrates: PR comments, GitHub/GitLab checks.
GitHub native#
- Secret scanning (advanced security): scans public and private repos for known prefix patterns.
- Push protection: blocks pushes containing recognised patterns. Limited against generic secrets — MySQL and MongoDB creds were measurably not impacted.
- Partner program: vendors register prefixes (e.g. `ghp_`) so GitHub detects and auto-revokes them on leak.
Side-by-side#
| Feature | TruffleHog | Gitleaks | detect-secrets | ggshield | Semgrep |
|---|---|---|---|---|---|
| Open source | AGPL | MIT | Apache | CLI yes, backend no | Yes |
| Active verification | Yes | No | No | Yes | No |
| Number of detectors | 800+ | ~150 | ~25 plugins | 350+ | Custom rules |
| Docker/image scanning | Yes | No (dir mode only) | No | Yes | No |
| SaaS source scanning (Jira/Slack) | Enterprise | No | No | Yes | No |
| AST rule support | No | No | No | No | Yes |
| GitHub Action | Yes | Official | Community | Yes | Yes |
| SARIF output | Yes | Yes | Partial | Yes | Yes |
| Baseline workflow | Partial | Yes | Yes (primary) | Yes | Yes |
Choosing#
- Greenfield or aggressive rotation policy: TruffleHog with verification.
- Legacy monorepo, cannot rotate everything: detect-secrets baseline.
- Fast CI, SARIF to GitHub code scanning: Gitleaks.
- Enterprise with Slack/Jira surface: ggshield or TruffleHog Enterprise.
- Want to catch insecure handling, not just values: add Semgrep.
Most mature programs run two in parallel: a fast prefix scanner at pre-commit (Gitleaks) and a verifying scanner on merge (TruffleHog).
12. AI-Era Leakage Patterns#
AI coding tools have fundamentally changed the leak surface in 2024–2025.
AI coding assistants#
- GitHub Copilot usage grew 27% from 2023 to 2024.
- Public repos using Copilot leaked secrets at 6.4% vs the 4.6% baseline — a 40% higher exposure rate.
- Causes: Copilot autocompletes plausible-looking API keys from its training data, suggests placeholder keys that get committed unchanged, generates `.env` files with example values, and rewrites config files without awareness of `.gitignore`.
- Anti-pattern: developers who ask Copilot "how do I call the OpenAI API" get their own OpenAI key from a previous file pasted into the suggestion, which they then commit to a new repo.
AI service secret categories (2024 to 2025 growth)#
| Category | YoY increase |
|---|---|
| AI service leaks overall | +81% |
| Brave Search API | +1,255% |
| Supabase (AI backend) | +992% |
| Firecrawl (LLM scraping) | +796% |
| OpenAI / Anthropic / Cohere | hundreds of percent each |
| HuggingFace access tokens | steady high growth |
MCP (Model Context Protocol) specifically#
MCP became the connective tissue between LLMs and tools in 2025. Its convention is a local JSON config file (mcp.json, claude_desktop_config.json) containing server command, arguments, and environment variables. Secrets end up in these configs by design: API keys as env.OPENAI_API_KEY, database credentials as CLI args, GitHub tokens as env.
GitGuardian found 24,008 unique secrets in MCP-related config files on public GitHub in MCP’s first year, with 2,117 verified as valid. Expect this to grow exponentially as agentic AI adoption accelerates. Treat MCP configs as first-class secret-bearing files, ignore them via .gitignore, and vault any credentials they reference.
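A minimal sweep for secret-bearing MCP configs might look like the following. The `mcpServers`/`env` layout matches the common config convention described above, while the placeholder heuristic and the `audit_mcp_configs` name are assumptions of this sketch:

```python
import json
from pathlib import Path

# Filenames used by the common MCP config conventions.
MCP_NAMES = {"mcp.json", "claude_desktop_config.json"}

def audit_mcp_configs(root: str):
    """Flag env entries in MCP configs that hold literal values.

    Entries written as "${VAR}" defer resolution to the environment and are
    treated as safe; anything else is assumed to be an inlined credential.
    """
    findings = []
    for path in sorted(Path(root).rglob("*.json")):
        if path.name not in MCP_NAMES:
            continue
        config = json.loads(path.read_text())
        for server, spec in config.get("mcpServers", {}).items():
            for var, value in spec.get("env", {}).items():
                if value and not value.startswith("${"):
                    findings.append((path.name, server, var))
    return findings
```

Running a check like this in pre-commit, alongside a `.gitignore` entry for the config files themselves, covers both the "never commit it" and "catch it if committed" cases.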
Vibe-coding / ElevenLabs pattern#
Wiz, studying the Forbes AI 50, documented a specific pattern: AI startups shipping products with ElevenLabs API keys in plaintext in public repos, often left in by a developer “vibe-coding” through a prototype and pushing the working version. 65% of AI companies studied had verified secret leaks.
AI-assisted remediation is also AI-generated#
Hardcoded secrets in AI-generated code are now a recognised detection category. Anthropic, GitHub, and third parties ship linting/pre-commit hooks that specifically check AI completions for plausible-looking credentials before they get written to disk. Scanners like ggshield now have "AI code" detection modes that trigger on typical Copilot/Cursor output signatures.
13. Real-World Breaches#
European Commission via Trivy (April 2026)#
- Vector: Supply chain compromise of Trivy (open-source vuln scanner) by threat actor “TeamPCP”.
- Initial access: European Commission downloaded a compromised Trivy version on 19 March 2026 through normal update channels.
- Escalation: Malicious code inside Trivy executed within the Commission’s CI/CD pipelines and harvested an AWS secret with management rights over affiliated cloud accounts.
- Tradecraft: Attackers deployed TruffleHog inside the victim environment to enumerate more credentials, then called AWS STS to validate and mint session tokens. Created new persistent access keys attached to an existing IAM user.
- Impact: 340 GB exfiltrated, affecting 42 internal clients and 29 other EU entities. Data dumped by extortion group ShinyHunters on 28 March.
- MITRE ATT&CK: T1195.002 (Supply Chain Compromise), T1586.003 (Cloud Account Compromise), T1078.004 (Valid Cloud Accounts), T1005 (Data from Local System).
- Lessons: pin GitHub Actions/binaries to SHA, not mutable tags; restrict CI/CD IAM to least privilege; monitor CloudTrail for STS anomalies and TruffleHog signatures; maintain rapid AWS credential rotation capability.
AWS S3 Ransomware#
Attackers used valid AWS credentials (harvested via secret leaks) to encrypt S3 buckets with customer-supplied encryption keys and demanded ransom for decryption. This weaponises legitimate AWS features — SSE-C — once credentials leak. The defence is IAM condition keys denying s3:PutObject with x-amz-server-side-encryption-customer-algorithm.
Artifactory token exposures#
GitGuardian case study: 60% of leaked Artifactory tokens were in build configs affecting production environments in pharma and energy sectors. A single leaked Artifactory admin token permits arbitrary package publication into trusted registries — a supply chain compromise primitive.
Shai-Hulud 2 (2025)#
Worm-style npm supply chain attack compromising developer machines. Forensic telemetry across 6,943 compromised systems yielded 33,185 unique secrets. 59% of compromised machines were CI/CD runners. Demonstrated that “developer endpoints as credential aggregation layer” is an organisational risk, not an individual-hygiene issue.
LiteLLM supply chain attack (2025)#
Compromised LiteLLM packages harvested SSH keys, cloud credentials, and API tokens specifically from machines where AI development tools were concentrated. Followed exactly the Shai-Hulud playbook. Reinforces that the AI developer stack is now a primary target.
GitHub “oops commits” (2025)#
Sharon Brizinov’s scan of all public GitHub force-pushed/deleted commits since 2020 recovered thousands of active secrets including a GitHub PAT with admin on the Istio repositories. Resulted in ~$25,000 in bounties and the release of the Force Push Scanner tool. Proves: force-pushed does not equal deleted.
Capital One (2019, for context)#
SSRF in a WAF → IMDSv1 credential theft → 100M+ records exfiltrated from S3. Still the canonical example of a secret-leak attack chain and the reason IMDSv2 exists.
Toyota (2022, for context)#
Five years of customer data were exposed after T-Connect source code containing an access key was published to a public GitHub repository by a contractor. 296,000 customer email addresses and IDs were exposed.
Uber (2022)#
18-year-old social engineered an Uber employee, found hardcoded PowerShell admin credentials to Uber’s PAM solution, and pivoted to full enterprise compromise. A single hardcoded credential in a shell script.
14. Rotation & Incident Response Playbook#
When (not if) a secret leaks:
Minute 0–15: Contain#
- Assume compromise — treat the secret as in attacker hands the moment it left your machine.
- Revoke, don’t rotate first. Delete the key from the provider console or API immediately. Rotation implies the old key remains briefly valid.
- For AWS: `aws iam delete-access-key --access-key-id AKIA....`
- For GitHub: revoke the PAT and force-expire any SSH keys.
- For OAuth: invalidate client secrets and revoke issued tokens.
- For database creds: `ALTER USER ... WITH PASSWORD ...` and kill active sessions.
Minute 15–60: Investigate#
- Audit logs — pull CloudTrail, GitHub audit log, provider access logs for the full lifetime of the exposed key. Build a list of every action taken under it.
- Check for new IAM users, access keys, SSH keys, webhooks added since exposure. These are persistence.
- Scan CloudTrail for `sts:GetSessionToken`, `iam:CreateAccessKey`, `iam:AttachUserPolicy`.
- For GitHub: check recent pushes, newly created repos, changed webhook URLs, new org members.
- Scan the exposed medium (public repo, Slack channel, Docker image) for other secrets — leaks cluster.
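Pulling the exposed key's full usage history can be sketched with boto3; the key ID, helper names, and printed fields here are illustrative, not a prescribed tool:

```python
import json

def fetch_key_events(key_id, client):
    """Pull every CloudTrail event recorded for a given access key ID."""
    events = []
    for page in client.get_paginator("lookup_events").paginate(
            LookupAttributes=[{"AttributeKey": "AccessKeyId",
                               "AttributeValue": key_id}]):
        events.extend(page["Events"])
    return events

def summarise_events(events):
    """Reduce raw lookup_events records to (time, event name, source IP) rows."""
    rows = []
    for e in events:
        detail = json.loads(e["CloudTrailEvent"])  # each record carries a JSON payload
        rows.append((detail.get("eventTime"), detail.get("eventName"),
                     detail.get("sourceIPAddress")))
    return rows

# Usage (run under the *investigating* role, never the leaked key):
#   import boto3
#   events = fetch_key_events("AKIAEXAMPLEKEYID", boto3.client("cloudtrail"))
#   for row in summarise_events(events):
#       print(row)
```

Sorting the summary by source IP quickly surfaces use from unexpected networks, which is the fastest signal that the key was actively abused.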
Hour 1–4: Eradicate#
- Rotate any credentials the compromised key could reach (blast radius).
- Delete persistence artefacts (attacker-created users, keys, lambdas, ECS tasks).
- Review network egress logs for data exfiltration.
- Snapshot affected systems for forensics before further changes.
Day 1–3: Recover & report#
- Replace the secret in all legitimate consumers via your secrets manager.
- Post-mortem: how did the secret get committed? Where were the detection layers supposed to catch this? Which of them failed and why?
- Notify regulators, customers, and partners as required (GDPR 72-hour rule, SEC 4-day rule for public companies).
- File a CVE if the vulnerability was in your software.
Week 1+: Harden#
- Add the leak pattern to your secret scanner as a custom rule.
- Add the failure mode to CI gates.
- Migrate the secret class from static to dynamic or short-lived.
- Run the Force Push Scanner / historical scans across all your public repos.
Rotation patterns#
| Pattern | Description |
|---|---|
| Gradual rotation | Introduce new secret, phase out old over a deprecation window |
| Dual-key (write-new, read-old) | Support two valid secrets simultaneously during migration |
| Dynamic secrets | Vault mints a new secret per application startup; expires on shutdown |
| Scheduled rotation | Secrets manager rotates on cron (AWS Secrets Manager supports native Lambda-based rotation) |
| Event-driven rotation | Rotate on every deploy, suspicion, or leak |
Dynamic secrets are the strategic target state: an application requests its DB password at startup, the vault generates a temporary credential scoped to that session, and revokes it when the session ends. Any exfiltrated credential becomes worthless within minutes.
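A minimal client-side sketch of that startup flow, assuming the `hvac` Python library, a reachable Vault, and an illustrative role path:

```python
def extract_creds(lease):
    """Pull the temporary username, password, and lease TTL from a Vault read response."""
    return (lease["data"]["username"],
            lease["data"]["password"],
            lease["lease_duration"])

# Usage (assumes VAULT_ADDR / VAULT_TOKEN are set and the database engine
# is mounted with a role named "myapp-role"):
#   import hvac
#   lease = hvac.Client().read("database/creds/myapp-role")
#   user, password, ttl = extract_creds(lease)
#   # open the DB connection with (user, password); it dies with the lease
```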
15. Vaults & Secret Managers#
HashiCorp Vault#
- Architecture: central server (HA via Consul/Raft), token-based auth, pluggable auth methods (Kubernetes, AWS IAM, OIDC, LDAP, AppRole), pluggable secret engines (KV, database, PKI, SSH, AWS, Azure, GCP, Transit).
- Dynamic secrets: Vault generates short-lived DB users, AWS STS tokens, SSH certificates on demand.
- Transit engine: encryption-as-a-service — the application sends plaintext, Vault returns ciphertext, plaintext is never stored.
- Leasing & revocation: every secret has a TTL; expired leases are proactively revoked.
- Auditing: every access is logged to append-only audit devices.
- Unsealing: Vault starts sealed; Shamir’s Secret Sharing splits the master key into N shards, a threshold M of which must be combined to unseal (or use auto-unseal with AWS KMS / GCP KMS / HSM).
Typical Vault flow:
# App authenticates via Kubernetes SA token
vault write auth/kubernetes/login role=myapp jwt=$SA_TOKEN
# Request a short-lived DB credential
vault read database/creds/myapp-role
# returns username + password valid for 1 hour
AWS Secrets Manager#
- Native rotation: Lambda-based rotators for RDS, Redshift, DocumentDB; custom Lambdas for anything else.
- IAM-native auth: access controlled by IAM policies, no separate auth plane.
- Integration: ECS task secrets, Lambda env injection, Parameter Store for non-sensitive config.
- Cost: $0.40/secret/month + $0.05/10k API calls.
- Rotation pattern (4 phases): `createSecret` → `setSecret` → `testSecret` → `finishSecret`. The rotation Lambda moves a new version through `AWSPENDING` to `AWSCURRENT` while the old version becomes `AWSPREVIOUS`.
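The four-phase contract can be sketched as a dispatcher; the handler bodies here are stubs, and a real rotator would call Secrets Manager (moving version stages) and the target service inside each handler:

```python
ROTATION_STEPS = ("createSecret", "setSecret", "testSecret", "finishSecret")

def handle_rotation(event, handlers):
    """Dispatch a Secrets Manager rotation event to the matching step handler.

    `event` carries Step, SecretId, and ClientRequestToken; `handlers` maps
    each step name to a callable(secret_id, token). Secrets Manager invokes
    the same Lambda four times per rotation, once per step.
    """
    step = event["Step"]
    if step not in ROTATION_STEPS:
        raise ValueError(f"unknown rotation step: {step}")
    return handlers[step](event["SecretId"], event["ClientRequestToken"])
```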
Azure Key Vault#
- Three object types: Keys (HSM-backed crypto), Secrets (generic blobs), Certificates (full lifecycle).
- Authentication: Azure AD / Managed Identities — a workload with a Managed Identity requests tokens from IMDS and authenticates to Key Vault without ever seeing a secret.
- Access control: legacy access policies OR Azure RBAC (recommended) with roles like `Key Vault Secrets User`.
- Soft delete & purge protection: deleted secrets recoverable for 7–90 days; purge protection blocks permanent deletion.
- HSM tier: keys backed by FIPS 140-2 Level 2 or Level 3 hardware.
Google Secret Manager#
- IAM-native, simple REST API, versioning, automatic replication across regions.
- Direct integration with Cloud Run, GKE, Cloud Functions for env-injected secrets.
Comparison#
| Feature | Vault | AWS SM | Azure KV | GCP SM |
|---|---|---|---|---|
| Dynamic secrets | Yes (many engines) | RDS only | No | No |
| Cloud-agnostic | Yes | No | No | No |
| HSM-backed | Enterprise | No (separate KMS) | Yes (Premium) | Via Cloud HSM |
| PKI / cert mgmt | Yes | ACM separate | Yes | CAS separate |
| Encryption-as-a-service | Yes (Transit) | No (KMS separate) | No | No |
| Native Kubernetes auth | Yes | IRSA | Workload Identity | Workload Identity |
| Self-hosted option | Yes | No | No | No |
| Cost model | Self-host free / Ent. licence | $0.40/secret/mo | Per operation | Per operation |
Rules of thumb:
- Single-cloud workload: use the native cloud secret manager. Simpler auth, fewer moving parts, cheaper.
- Multi-cloud or hybrid: Vault wins. One control plane, one audit log, portable.
- Need dynamic DB credentials across cloud providers: Vault.
- Need HSM-backed key operations for compliance: Azure Key Vault Premium or AWS CloudHSM.
- Tiny shop, one AWS account: AWS Secrets Manager is hard to beat.
Anti-patterns seen in the wild#
- 5.1% of repositories using secrets managers still leaked secrets (GitGuardian 2024). Having a vault doesn’t help if devs also commit the key to git.
- Storing the vault’s own bootstrap credentials in the vault (circular dependency). Use break-glass out-of-band storage.
- Granting `vault:*` or `secretsmanager:*` to CI roles. Scope per-path, per-secret.
- Not auditing which non-human identities still hold long-lived tokens to the vault itself.
16. Developer Hygiene & Prevention#
Pre-commit#
Single highest-ROI control. Install once, runs on every git commit:
# .pre-commit-config.yaml
repos:
- repo: https://github.com/gitleaks/gitleaks
rev: v8.24.2
hooks:
- id: gitleaks
- repo: https://github.com/Yelp/detect-secrets
rev: v1.5.0
hooks:
- id: detect-secrets
args: ['--baseline', '.secrets.baseline']
Enforce centrally — pre-commit is bypassable with --no-verify, so also run the same check server-side in CI as a gate.
IDE integration#
- ggshield has VS Code and JetBrains plugins that scan on save.
- TruffleHog has a VS Code extension.
- GitHub Copilot and Cursor now both refuse to autocomplete strings that match known secret patterns (implemented 2025).
CI gates#
Every PR runs:
- Gitleaks (fast pattern scan)
- TruffleHog with `--results=verified` (live verification on changed files)
- SBOM + dependency scan (because leaks often enter through a compromised dep)
- Container image scan if Docker build
Block merges on verified-secret findings. Allow annotation-based suppression only with explicit security team approval and a ticket reference.
Push protection#
Enable GitHub’s Push Protection at the org level. It blocks known prefix patterns (GitHub PATs, AWS keys, Stripe keys, etc.). Remember its known blind spots — generic secrets, MySQL/MongoDB URLs.
.gitignore hygiene#
Ship a standard template that excludes:
.env
.env.*
!.env.example
*.pem
*.key
*.p12
*.pfx
*.keystore
*.jks
.aws/credentials
.aws/config
id_rsa
id_ed25519
*.tfstate
*.tfstate.backup
.terraform/
secrets.yml
credentials.json
service-account.json
.npmrc
.pypirc
.mcp.json
claude_desktop_config.json
Training & culture#
- Run tabletop exercises: “a key leaked 30 minutes ago, walk me through the first hour.”
- Publish a one-click revocation runbook for every credential type the org uses.
- Celebrate people who self-report leaks; never punish. The alternative is concealment.
- Track MTTR on secret incidents as a headline metric.
Honeytokens#
Deploy intentionally-leaked fake credentials throughout the codebase and infrastructure. Any use of them is, by definition, unauthorised. GitGuardian, Canarytokens, and AWS IAM canaries provide turn-key honeytokens. During the Shai-Hulud investigation, honeytokens on developer workstations gave the cleanest telemetry about what attackers actually did post-compromise.
Shift-left without developer fatigue#
- Don’t ship a scanner that produces 1,000 findings on day one. Adopt a baseline, fix new findings only, and work the backlog on a schedule.
- Fast feedback: pre-commit must finish in < 2 seconds for the common case or developers will disable it.
- Clear remediation path: every finding must link to “how do I fix this” with an approved vault pattern for the team’s stack.
- Measurable quiet-hours: no new findings for N days = the control is working.
17. Non-Human Identity Governance#
The industry’s 2025–2026 conclusion from the sprawl data: secret scanning is necessary but insufficient. What security teams actually need is Non-Human Identity (NHI) governance.
The three questions#
- What non-human identities exist in my environment? (service accounts, IAM roles, API keys, OAuth apps, SSH keys, SA tokens, machine users)
- Who owns each one? (not “which team inherited it five reorgs ago”)
- What can each one access? (effective permissions, including via role chains)
Most orgs cannot answer any of the three at scale. NHIs now outnumber human users by 10–50x in typical cloud environments, and they are overwhelmingly long-lived, over-privileged, and un-owned.
Building an NHI programme#
- Discover: enumerate every identity source — cloud IAM, GitHub/GitLab PATs, OAuth apps, service principals, SSH keys, SA tokens, vendor API keys.
- Attribute: assign an owning team (human) to each. Anything un-attributable after 30 days = candidate for retirement.
- Scope: measure effective permissions and data access per identity.
- Lifecycle: enforce creation → rotation → expiration → revocation flows. No identity exists without an expiration date and a rotation policy.
- Govern: short-lived credentials by default. Long-lived only with explicit risk exception.
- Monitor: alert on anomalous use (new IP, unusual API call mix, geographic impossibility).
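The attribution rule above (un-owned after 30 days = candidate for retirement) can be sketched as a simple inventory filter; the type and field names are illustrative:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class NHI:
    name: str
    owner: Optional[str]   # owning human team, if attributed
    discovered: date       # when the identity was first inventoried

def retirement_candidates(identities, today, grace_days=30):
    """Identities still un-owned after the grace window become retirement candidates."""
    cutoff = today - timedelta(days=grace_days)
    return [i for i in identities if i.owner is None and i.discovered <= cutoff]
```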
Moving from static to identity-driven#
| Old | New |
|---|---|
| Hardcoded AWS access keys | IAM Roles + STS + IRSA (EKS) / IMDSv2 (EC2) |
| Long-lived GitHub PATs | GitHub Actions OIDC → cloud role assumption |
| Static DB passwords | Vault dynamic database credentials |
| Service account key JSON (GCP) | Workload Identity Federation |
| Azure service principal secrets | Managed Identities + IMDS |
| Embedded Slack tokens | Slack OAuth app with workspace installs |
GitHub Actions OIDC is the canonical 2024–2026 pattern: your workflow requests a short-lived OIDC token from GitHub, exchanges it with AWS STS / Azure AD / GCP STS for a short-lived cloud credential, and uses that. No long-lived secrets stored in GitHub Actions variables. Every major CI system now supports the same pattern.
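A minimal GitHub Actions sketch of that exchange, assuming an AWS IAM role already configured to trust GitHub’s OIDC provider (role ARN and region are illustrative):

```yaml
permissions:
  id-token: write   # lets the job request an OIDC token from GitHub
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gha-deploy  # illustrative
          aws-region: eu-west-1
      # Subsequent steps use a short-lived STS session; nothing long-lived is stored.
      - run: aws sts get-caller-identity
```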
18. Quick Reference#
Secret discovery one-liners#
# Scan current repo history
trufflehog git file://. --results=verified
gitleaks git -v
# Scan remote repo
trufflehog git https://github.com/org/repo --results=verified
# Scan org
trufflehog github --org=myorg --include-forks --include-members
# Scan Docker image
trufflehog docker --image=myimage:latest
# Scan filesystem (downloaded JS bundle / APK)
trufflehog filesystem ./extracted_bundle
# Scan S3 bucket
trufflehog s3 --bucket=mybucket
# Pre-commit check
gitleaks protect --staged
# Scan stdin
cat suspicious_file | gitleaks stdin
AWS key triage#
aws sts get-caller-identity # whose key is this?
aws iam list-attached-user-policies --user-name X # effective perms
aws iam list-access-keys --user-name X # other keys?
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIA... \
--max-results 50 # usage history
GitHub token triage#
curl -H "Authorization: token $GH_TOKEN" https://api.github.com/user
curl -H "Authorization: token $GH_TOKEN" https://api.github.com/user/orgs
# Token scopes appear in the X-OAuth-Scopes response header:
curl -sI -H "Authorization: token $GH_TOKEN" https://api.github.com/user | grep -i x-oauth-scopes
Quick impact assessment on a leaked key#
- Who owns it? (org, team, individual)
- What is its scope? (effective permissions)
- When was it created?
- When was it last used? From where?
- What data can it reach?
- Was it ever used from an unexpected IP / time / region?
- Are there persistence artefacts created under it?
- Has it been rotated?
- Who has been notified?
- What’s the post-mortem action to prevent recurrence?
File patterns that frequently leak secrets#
.env .env.local .env.production .env.backup
config.json config.yml settings.py local.settings.json
application.properties application.yml
docker-compose.yml docker-compose.override.yml
kubeconfig .kube/config
.aws/credentials .aws/config
credentials.json service-account.json sa.json
id_rsa id_ed25519 *.pem *.key *.pfx *.p12 *.jks
.npmrc .pypirc .gem/credentials
wp-config.php parameters.yml
terraform.tfstate *.tfvars
.git-credentials .netrc
.mcp.json claude_desktop_config.json
*.sql *.dump *.bak
Red flags in code review#
- `String API_KEY = "..."` with a long alphanumeric literal
- `os.environ.get("KEY", "default-fallback-with-real-value")`
- `requests.get(url, headers={"Authorization": "Bearer ..."})` with a hardcoded bearer
- `curl -H "X-API-Key: ..."` in shell scripts
- `git log --all -p | grep -i "password\|secret\|api_key"` returning hits
- Deleted lines removing a key without a rotation commit
- `.env.example` files that contain real values (“for testing”)
- Comments like `// TODO: remove before commit`
- Long base64 or hex strings next to keywords like `key`, `token`, `secret`
Mental model#
Every secret has a half-life. A key in your vault, scoped tight, rotated weekly, and monitored = long half-life. A key in git, on Docker Hub, in a public gist, or in a mobile binary = zero half-life, already compromised, must be rotated now.
Detection is not remediation. 64% of secrets leaked in 2022 are still valid in 2026. The rotation you don’t do is the breach you will have.
If a scanner finds it, an attacker already found it.
Compiled from 30 research articles under ~/Documents/obsidian/chs/raw/Secrets/ covering GitGuardian State of Secrets Sprawl 2025 & 2026, OWASP Secrets Management Cheat Sheet, TruffleHog & Gitleaks documentation, Wiz Forbes AI 50 research, Truffle Security’s “oops commits” analysis, the CERT-EU Trivy/European Commission breach advisory, Shai-Hulud 2 and LiteLLM supply chain post-mortems, AI-era hardcoded-secrets research, and practitioner commentary on secret scanning tooling and vaults. Defensive security reference material.