Comprehensive Secrets Management & Leakage Guide

A practitioner’s reference for secrets sprawl, credential leakage, detection, remediation, and hardening. Compiled from 30 research sources covering GitGuardian State of Secrets Sprawl 2025/2026, OWASP Secrets Management Cheat Sheet, TruffleHog, Gitleaks, real-world breaches (Trivy/European Commission, Shai-Hulud, LiteLLM), AI-era leakage patterns, and vault/NHI governance guidance.


Table of Contents

  1. Fundamentals & Impact
  2. Threat Landscape & Statistics
  3. Leak Locations & Attack Surface
  4. Secret Types & Regex Signatures
  5. JavaScript Bundle Extraction
  6. Mobile App Secret Extraction
  7. Cloud Metadata Exfiltration
  8. Environment Variable & File Leakage
  9. JWT Leaks & Validation Failures
  10. Git History Mining
  11. Secret Scanners Compared
  12. AI-Era Leakage Patterns
  13. Real-World Breaches
  14. Rotation & Incident Response Playbook
  15. Vaults & Secret Managers
  16. Developer Hygiene & Prevention
  17. Non-Human Identity Governance
  18. Quick Reference

1. Fundamentals & Impact

A secret is any credential a machine or human uses to authenticate itself to another system: API keys, database passwords, private encryption keys, OAuth client secrets, tokens, SSH keys, TLS certificates, IAM credentials, webhook URLs, and service account JSON. Secrets are the connective tissue of modern distributed architectures, and they are simultaneously the shortest path from reconnaissance to full account takeover.

Impact spectrum of a leaked secret:

Stage | Example
Recon | Attacker grabs a leaked key, enumerates scope via CLI (aws sts get-caller-identity, gh auth status)
Lateral expansion | Pivot from one API key to connected services (S3, RDS, Slack, private repos)
Data exfiltration | Download customer PII, source code, models, or training data
Persistence | Create new access keys, add SSH keys, invite IAM users
Supply chain | Push malicious packages to npm/PyPI, poison Docker images, tamper with CI/CD
Monetisation | Crypto mining on stolen cloud credentials, ransom of S3 buckets, resale on dark web

Why it matters:

  • Over the past 10 years (Verizon DBIR), stolen credentials have appeared in 31% of all breaches.
  • IBM: breaches involving stolen or compromised credentials take an average of 292 days to identify and contain, almost ten months of attacker dwell time.
  • GitGuardian found 70% of secrets leaked in 2022 are still valid in 2025, and 64% of 2022 leaks remain exploitable in 2026. Detection is not remediation.
  • 35% of all private repositories scanned contained at least one plaintext secret; 32.2% of internal repos contain a hardcoded secret compared to 5.6% of public repos — internal repos are the highest-value target once an attacker gets a foothold.

A single leaked AWS IAM key, GitHub PAT, or Slack bot token has repeatedly produced full cloud account takeover, supply chain compromise, and nine-figure breach costs. Secrets are not an edge case — they are the primary initial-access vector of the 2020s.


2. Threat Landscape & Statistics

The sprawl curve

Year | New hardcoded secrets on public GitHub | YoY change
2021 | ~6M | baseline
2023 | ~19M |
2024 | 23.77M | +25%
2025 | ~29M (GitGuardian State of Secrets Sprawl 2026) | +34% (largest single-year jump ever recorded)

Since 2021, leaked secrets have grown 152%, while GitHub’s public developer base expanded only 98%. Secrets sprawl is outrunning developer growth.

Where the leaks actually live

  • 5.6% of public repos contain a secret.
  • 32.2% of internal repos contain a secret (roughly 6x the public rate).
  • 18% of scanned public Docker images contain secrets; 15% of those are valid.
  • 28% of all incidents originate outside source code — Slack, Jira, Confluence, Teams.
  • Collaboration-tool-only leaks are more severe: 56.7% rated critical vs 43.7% for code-only.
  • 7,000+ valid AWS keys remain exposed on Docker Hub.
  • 100,000 valid secrets in GitGuardian’s analysis of 15M public Docker images, including Fortune 500 AWS keys and GitHub tokens.
  • Self-hosted GitLab & Docker registries expose secrets at 3–4x the rate of public GitHub.
  • 15% of commit authors on public GitHub leaked a secret at least once.

Top leaked secret categories (2024–2025)

  1. AWS IAM access keys
  2. Slack webhooks and bot tokens
  3. Azure AD API keys / service principal secrets
  4. GitHub PATs and fine-grained tokens
  5. MongoDB / MySQL / PostgreSQL connection strings (no standard prefix → hard to detect)
  6. Stripe keys, SendGrid, Twilio
  7. Generic / custom API keys (fastest-growing, hardest to scan)
  8. OpenAI, Anthropic, and other LLM provider keys

AI amplification

  • 29 million new hardcoded secrets in 2025 (+34% YoY).
  • 1,275,105 leaked secrets tied specifically to AI services (+81% YoY).
  • Eight of the ten fastest-growing leak categories are AI-related.
  • Brave Search API keys: +1,255% YoY.
  • Firecrawl: +796% YoY. Supabase: +992% YoY.
  • Public repos using GitHub Copilot had a 6.4% secret leakage rate, vs ~4.6% baseline.
  • Wiz audited the Forbes AI 50: 65% had leaked verified secrets on GitHub, frequently in deleted forks, gists, and personal developer repos.
  • MCP (Model Context Protocol) config files exposed 24,008 unique secrets in 2025 (2,117 validated) in their first year.

3. Leak Locations & Attack Surface

Source code hosts

Platform | Notes
GitHub public repos | Largest visible attack surface; push protection helps only for prefixed keys
GitHub internal/private repos | 6x more secret-dense than public; sometimes accidentally switched to public
GitHub Gists | Frequently ignored by scanning; personal devs paste snippets with embedded keys
GitHub forks | Force-deleted forks remain in the commit graph (see “oops commits”)
GitLab (SaaS & self-hosted) | Self-hosted instances frequently exposed to internet with default creds
Bitbucket | Less scanning tooling, frequently forgotten legacy repos
Azure DevOps | Pipeline variables, variable groups, wiki pages

Beyond source code

Surface | Leak vector
Docker Hub / GHCR | Secrets baked into layers via ENV, ARG, or forgotten COPY .
npm / PyPI / crates.io packages | .env, .npmrc, tarball artifacts
CI/CD logs | Jenkins, GitHub Actions, CircleCI echoing secrets or masking failures
Slack / Teams / Discord | Pasted credentials during incident response and onboarding
Jira / Confluence / Notion | “Temporary” creds in runbooks that never get removed
Shodan / Censys | Exposed .env, .git, config endpoints on the open internet
Pastebin / GitHub code search | Intentional and accidental dumps
Wayback Machine / Google cache | Historical copies of pages that briefly exposed keys
Mobile app bundles (APK/IPA) | Hardcoded API keys in decompiled smali / binary strings
JavaScript bundles | Keys in SPA main.js, chunk-*.js, source maps
Backup files | .sql, .bak, .tar.gz, config.php.swp, .DS_Store
Browser dev tools | Tokens visible in localStorage, sessionStorage, cookies
Artifacts & build outputs | WAR, JAR, PyInstaller .exe, Electron app.asar
Kubernetes manifests | stringData: in Secret resources stored in git
Terraform state files | terraform.tfstate committed with plaintext provider credentials

Developer endpoints as aggregation layer

The Shai-Hulud 2 supply chain incident gave rare telemetry across 6,943 compromised systems:

  • 294,842 secret occurrences observed
  • 33,185 unique secrets
  • Average secret appeared in 8 different locations per machine (.env, shell history, ~/.aws/credentials, IDE configs, .git-credentials, cached tokens, build artifacts)
  • 59% of compromised machines were CI/CD runners, not personal laptops

Once secrets sprawl into build infrastructure, the blast radius becomes organisational, not individual.


4. Secret Types & Regex Signatures

Detection tools work by combining pattern matching (regex + prefix), entropy analysis (Shannon entropy over the candidate substring), and live verification (attempting to authenticate).

Common prefixes and formats

Provider | Pattern | Example
AWS Access Key ID | AKIA[0-9A-Z]{16} | AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key | [A-Za-z0-9/+=]{40} (entropy-based) |
AWS temporary access key ID (STS session credentials) | ASIA[0-9A-Z]{16} | ASIAxxx…
GitHub PAT (classic) | ghp_[A-Za-z0-9]{36} |
GitHub fine-grained | github_pat_[A-Za-z0-9_]{82} |
GitHub OAuth | gho_[A-Za-z0-9]{36} |
GitHub app install | ghs_[A-Za-z0-9]{36} |
Slack bot token | xoxb-[0-9]+-[0-9]+-[A-Za-z0-9]+ |
Slack user token | xoxp-… |
Slack webhook | https://hooks.slack.com/services/T[A-Z0-9]+/B[A-Z0-9]+/[A-Za-z0-9]+ |
Stripe live secret | sk_live_[A-Za-z0-9]{24,} |
Stripe restricted | rk_live_[A-Za-z0-9]{24,} |
Google API key | AIza[0-9A-Za-z_-]{35} |
Google OAuth client secret | GOCSPX-[A-Za-z0-9_-]{28} |
OpenAI | sk-[A-Za-z0-9]{48} / sk-proj-… |
Anthropic | sk-ant-api03-[A-Za-z0-9_-]{95} |
HuggingFace | hf_[A-Za-z0-9]{34} |
Twilio | SK[0-9a-fA-F]{32} / AC account SID |
SendGrid | SG\.[A-Za-z0-9_-]{22}\.[A-Za-z0-9_-]{43} |
NPM token | npm_[A-Za-z0-9]{36} |
PyPI token | pypi-AgEIcHlwaS5vcmc[A-Za-z0-9_-]+ |
JWT | eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+ |
Private keys | -----BEGIN (RSA|OPENSSH|…) PRIVATE KEY----- |
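
As a rough illustration of the pattern-matching stage only, the sketch below applies a handful of the prefixes from the table to a block of text. The dictionary labels, the function names scan_text and redact, and the added word boundaries are assumptions made for this example; real scanners layer entropy checks and live verification on top.

```python
import re

# A few signatures taken from the table above. Labels are illustrative only.
SIGNATURES = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat_classic": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "slack_bot_token": re.compile(r"\bxoxb-[0-9]+-[0-9]+-[A-Za-z0-9]+\b"),
    "stripe_live_secret": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b"),
}


def scan_text(text):
    """Return (label, line_number, match) for every signature hit in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SIGNATURES.items():
            for match in pattern.finditer(line):
                findings.append((label, lineno, match.group(0)))
    return findings


def redact(value, keep=4):
    """Keep only the prefix so a report never repeats the full secret."""
    return value[:keep] + "*" * (len(value) - keep)


if __name__ == "__main__":
    sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\nprint("hello")\n'
    for label, lineno, value in scan_text(sample):
        print(f"line {lineno}: {label} {redact(value)}")
```

Redacting before printing matters in practice: scanner output often lands in CI logs, which are themselves a leak surface.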

Generic secret detection (the hard part)

Generic secrets (api_key = "...", password = "...", arbitrary database URLs) have no standard prefix and are the fastest-growing leak category. Detection strategies:

  • Keyword proximity + entropy: look for secret, key, token, password, passwd, pwd, auth, credentials, api_key within N characters of a high-entropy string.
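  • Shannon entropy threshold: typical defaults are around 4.5 bits per character for base64-like strings and around 3.0 for hex strings (hex cannot exceed 4.0 bits per character, so its threshold has to sit lower).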
  • Connection-string parsers: mysql://user:pass@host, postgres://…, mongodb+srv://…, redis://:pass@….
  • ML-assisted classifiers: GitGuardian and TruffleHog both layer ML models on top of regex for generic secret classification.
  • Live verification: the definitive signal — TruffleHog’s “verified” mode attempts authentication against the relevant API before raising an alert.

GitHub Push Protection struggles with generic secrets — MySQL and MongoDB credentials were measurably not impacted by Push Protection rollout because they lack a standardized prefix.
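
The keyword-proximity and entropy strategies above can be sketched in a few lines. This is a minimal sketch, not how any particular scanner implements it: the function names, the 20-character minimum candidate length, and the default thresholds (4.5 for base64-like, 3.0 for hex) are assumptions chosen for illustration, and "proximity" is simplified to "on the same line".

```python
import math
import re
from collections import Counter

# Keyword list mirrors the bullet above; candidate length of 20+ is an assumption.
KEYWORDS = re.compile(r"(secret|key|token|password|passwd|pwd|auth|credentials)", re.I)
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_-]{20,}")
HEX_ONLY = re.compile(r"^[0-9a-fA-F]+$")


def shannon_entropy(s):
    """Bits per character of the string's empirical character distribution."""
    if not s:
        return 0.0
    total = len(s)
    counts = Counter(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_like_generic_secret(line, b64_threshold=4.5, hex_threshold=3.0):
    """True when a keyword and a high-entropy candidate share the same line."""
    if not KEYWORDS.search(line):
        return False
    for match in CANDIDATE.finditer(line):
        candidate = match.group(0)
        threshold = hex_threshold if HEX_ONLY.match(candidate) else b64_threshold
        if shannon_entropy(candidate) >= threshold:
            return True
    return False
```

Entropy alone is noisy (UUIDs, hashes, and minified identifiers all score high), which is why the keyword gate and, ideally, live verification sit in front of any alert.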


5. JavaScript Bundle Extraction

Single-page applications frequently ship secrets inside compiled JS bundles under the mistaken belief that minification is obfuscation. A bundled SPA is client-side code — anything in it is public.

What leaks in JS bundles

  • Firebase configs (apiKey, authDomain, projectId) — some fields are intended to be public, but databaseURL with open rules leads to full read/write
  • Stripe publishable and accidental secret keys
  • Algolia admin keys (vs the intended search-only key)
  • Mapbox, Google Maps server keys
  • Segment write keys with elevated scopes
  • Hardcoded JWT signing secrets (symmetric HS256)
  • AWS Cognito unauth pool IDs leading to IAM assumption
  • Backend base URLs, staging endpoints, internal API paths

Extraction workflow

  1. Crawl the target with a headless browser or katana / hakrawler and collect every .js, .mjs, .map, and chunk-*.js.
  2. Fetch source maps (.map files) where available — they reconstruct original source trees including comments that often reveal keys.
  3. Run secret scanners over collected JS:
    • trufflehog filesystem ./js
    • gitleaks dir ./js
    • SecretFinder, LinkFinder (Python, classic toolkit)
    • jsluice (newer, AST-based extractor for URLs, params, and secrets)
  4. Beautify bundles with js-beautify to reveal string literals hidden by minification.
  5. Grep for high-value markers: firebaseConfig, accessKeyId, process.env, Bearer , Authorization.
  6. Diff historical versions from Wayback Machine — developers often remove keys after disclosure without rotation.
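
Steps 2 and 5 can be combined in a short script: recover original files from a source map, then look for the high-value markers. The sketch below assumes a standard source map v3 document with sources and sourcesContent arrays; the function names are hypothetical, and it is intended for auditing bundles you own.

```python
import json

# Markers copied from step 5 of the workflow above.
MARKERS = ("firebaseConfig", "accessKeyId", "process.env", "Bearer ", "Authorization")


def sources_from_map(map_text):
    """Return {source_path: original_source} recovered from a source map."""
    data = json.loads(map_text)
    sources = data.get("sources", [])
    contents = data.get("sourcesContent") or []
    recovered = {}
    for path, content in zip(sources, contents):
        if content is not None:  # entries are null when the content was omitted
            recovered[path] = content
    return recovered


def marker_hits(recovered):
    """Return (path, line_number, marker) for each marker found in recovered sources."""
    hits = []
    for path, content in recovered.items():
        for lineno, line in enumerate(content.splitlines(), start=1):
            for marker in MARKERS:
                if marker in line:
                    hits.append((path, lineno, marker))
    return hits
```

Marker hits are only leads for manual review; the recovered files are better fed to a real scanner such as the ones listed in step 3.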

Why this keeps happening

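Webpack and Vite inline environment values at build time (Webpack through DefinePlugin, Vite through its define option and import.meta.env). A developer who sets VITE_API_SECRET in .env and references it as import.meta.env.VITE_API_SECRET ships the value in main.js, because Vite exposes every VITE_-prefixed variable to client code. The NEXT_PUBLIC_ prefix makes this exposure explicit in its name; the VITE_ prefix carries the same meaning but is easier to overlook.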


6. Mobile App Secret Extraction

Mobile apps are binary blobs distributed to every user — treat them as adversarial-read by default.

Android (APK)

Extraction pipeline:

  1. Pull the APK: adb shell pm path <pkg> then adb pull.
  2. Unpack with apktool d app.apk — yields smali/, res/, AndroidManifest.xml, assets/.
  3. Decompile DEX to Java with jadx -d out app.apk for readable source.
  4. Scan with secret scanners:
    • trufflehog filesystem out/
    • Custom grep for api_key, password, BuildConfig, R.string.
  5. Check res/values/strings.xml — developers routinely put API keys here.
  6. Check assets/ for bundled .env, .json, .properties, config.js.
  7. Extract strings from native .so libraries: strings lib/arm64-v8a/*.so | grep -iE 'key|token|secret'.

Common APK secret locations:

  • BuildConfig.java — Gradle build-time constants
  • strings.xml resources
  • .properties files under assets/
  • Hardcoded in Java/Kotlin via String KEY = "..."
  • Certificate pinning bypass targets that leak API keys

iOS (IPA)

Extraction pipeline:

  1. Pull IPA from a jailbroken device or from a decrypted source (frida-ios-dump, bagbak).
  2. Unzip IPA — contains Payload/App.app/ bundle.
  3. Inspect the main executable with strings or otool.
  4. Class-dump Objective-C metadata: class-dump or classdumpios.
  5. For Swift, use Hopper or Ghidra; Swift symbols are mangled but string constants remain.
  6. Check Info.plist and embedded .plist files for API keys.
  7. Check Assets.car and bundled resources.

Runtime extraction: Frida scripts hook NSString allocation or SecItem keychain calls to dump secrets at runtime, bypassing any at-rest obfuscation.

Mitigations that actually help

  • Never embed a production secret in a mobile binary. Full stop.
  • Use short-lived tokens minted by your backend after user authentication.
  • For Firebase/GCP, use App Check / DeviceCheck to bind requests to genuine app installs.
  • For Android, use Play Integrity API; iOS uses App Attest.
  • Where client-side crypto is required, derive keys from user credentials + PBKDF2/Argon2.
  • Server-side authorisation is the only real defence; client-side obfuscation is a speed bump, not a gate.

7. Cloud Metadata Exfiltration

When an attacker gains SSRF, RCE, or code execution inside a cloud workload, the Instance Metadata Service (IMDS) becomes the single highest-value internal target: it hands out short-lived IAM credentials that the workload itself is entitled to use.

AWS

Endpoint | Returns
http://169.254.169.254/latest/meta-data/ | Index of metadata categories
http://169.254.169.254/latest/meta-data/iam/security-credentials/ | IAM role names attached to the instance
http://169.254.169.254/latest/meta-data/iam/security-credentials/<role> | AccessKeyId, SecretAccessKey, Token (STS session credentials)
http://169.254.169.254/latest/user-data/ | EC2 user-data script (often contains bootstrap secrets)
http://169.254.169.254/latest/dynamic/instance-identity/document | Account ID, region, instance ID

IMDSv2 (token-based) is the mitigation: attacker must first PUT to /latest/api/token with X-aws-ec2-metadata-token-ttl-seconds, receive a token, then include it as X-aws-ec2-metadata-token on subsequent requests. Many SSRF primitives cannot issue PUT or arbitrary headers, and are thus blocked. Enforce IMDSv2 cluster-wide via the HttpTokens=required launch template setting.
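
To make the two-step flow concrete, here is a minimal sketch using only the Python standard library. The helper names are hypothetical, the 21600-second TTL is simply the maximum IMDSv2 allows, and fetch_role_names only works when run on an EC2 instance you control. The point is to show why a GET-only SSRF primitive fails: the first request must be a PUT and the second must carry a custom header.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds=21600):
    """Step 1: a PUT carrying the TTL header; a plain GET cannot obtain a token."""
    return urllib.request.Request(
        IMDS + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path, token):
    """Step 2: a GET that must present the session token as a header."""
    return urllib.request.Request(
        IMDS + "/latest/meta-data/" + path.lstrip("/"),
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_names(timeout=2):
    """Run both steps. Only meaningful on an EC2 instance."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    request = metadata_request("iam/security-credentials/", token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode().split()
```

Running fetch_role_names against your own instances is a quick way to confirm that token-less requests are rejected once HttpTokens=required is in place.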

Azure

Endpoint | Returns
http://169.254.169.254/metadata/instance?api-version=2021-02-01 (requires Metadata: true header) | Instance metadata
http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/ | Managed Identity access token
http://169.254.169.254/metadata/identity/oauth2/token?...&resource=https://vault.azure.net | Key Vault access token

The required Metadata: true header and resource parameter are Azure’s lightweight mitigation. Once an attacker obtains a Managed Identity token for Key Vault, every secret and key the MI can reach is accessible.

GCP

Endpoint | Returns
http://metadata.google.internal/computeMetadata/v1/ (requires Metadata-Flavor: Google) | Metadata index
http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token | OAuth2 access token for the attached service account
http://metadata.google.internal/computeMetadata/v1/project/attributes/ | Project-level SSH keys, metadata

Kubernetes

Location | Secret
/var/run/secrets/kubernetes.io/serviceaccount/token | Pod service-account JWT
/var/run/secrets/kubernetes.io/serviceaccount/ca.crt | Cluster CA
https://kubernetes.default.svc/api/v1/namespaces/<ns>/secrets (with SA token) | All secrets the SA can read
Kubelet :10250/pods (if unauthenticated) | Pod metadata including env vars

Exfiltration once credentials are obtained

aws sts get-caller-identity          # confirm scope
aws iam list-attached-role-policies --role-name <role>  # enumerate permissions (role name comes from the caller identity ARN)
aws s3 ls                            # list buckets
aws secretsmanager list-secrets      # enumerate centrally-stored secrets
aws ssm describe-parameters          # SSM Parameter Store often holds legacy creds

A single SSRF → IMDS chain routinely produces full AWS account takeover. The Capital One 2019 breach (100M+ records) is the canonical case: a WAF SSRF → IMDSv1 → IAM credentials → S3 exfiltration.


8. Environment Variable & File Leakage

Secrets migrated out of source code typically land in .env files and process environment variables. Both introduce new leak channels.

.env files

  • Committed to git by developers who forgot .gitignore.
  • Shipped inside Docker images via a blanket "COPY . ." instruction.
  • Exposed by web servers serving the project directory root (https://victim.com/.env returns 200 with full file).
  • Present in backup tarballs accidentally made public on S3 or FTP.
  • Loaded by Next.js, Laravel, Rails, and Django during dotenv initialisation, then echoed into error pages and debug toolbars.

Recon one-liners (defensive — to test your own assets):

  • curl https://target/.env
  • curl https://target/.env.backup
  • curl https://target/.env.local
  • curl https://target/config/.env
  • curl https://target/.git/config

Environment variable leakage

Channel | Mechanism
Error pages | Django DEBUG=True, Flask debug toolbar, Rails WEB_CONSOLE, Laravel Ignition; all dump the full env on exception
phpinfo() | Leftover diagnostic file dumps $_ENV, $_SERVER, including DB creds
/proc/self/environ | On LFI, reading this file under a worker process returns that worker’s env vars
/proc/<pid>/environ | Same, for other processes with appropriate permissions
Docker image history | docker history <image> leaks ENV directives from Dockerfile layers
Kubernetes API | kubectl describe pod shows only the reference for configMapKeyRef/secretKeyRef env vars, but inlined value: entries are printed in plaintext
Process listing | ps auxe reveals env; long-running daemons started with --secret=... as argv leak via /proc/<pid>/cmdline
APM / logging stacks | Datadog, Sentry, New Relic, Elastic APM frequently capture full env on error
Core dumps | /var/crash/*.core contains full process memory including secrets

Defensive patterns

  • Never set secrets as ENV in Dockerfiles; use runtime --env-file or orchestrator secret mounts.
  • Tmpfs-mount .env files in containers; never bake them into the image.
  • Disable DEBUG in production and verify that errors render a generic page rather than a stack trace or environment dump.
  • Scrub env vars before sending events to APM/logging. Sentry, Datadog, and Honeycomb all have allowlist/denylist filters.
  • Use wipeable buffers: in Java/.NET, prefer byte[]/char[] over String and zero the buffer after use. Strings are immutable, so their contents cannot be overwritten and linger in memory until the garbage collector reclaims them.
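
The scrubbing step can be approximated with a small filter applied before any environment snapshot leaves the process. This is a generic sketch, not the API of Sentry, Datadog, or any other vendor: the name patterns, the function name scrub_env, and the "[REDACTED]" placeholder are all assumptions for illustration.

```python
import re

# Variable names treated as sensitive. A denylist like this is an assumption;
# an allowlist of known-safe names is stricter when you can maintain one.
SENSITIVE_NAME = re.compile(
    r"(secret|token|password|passwd|pwd|api[_-]?key|credential|private[_-]?key)", re.I
)
# Matches scheme://user:password@ so the password part can be masked.
URL_WITH_PASSWORD = re.compile(r"(\w[\w+.-]*://[^:/@\s]*):([^@/\s]+)@")


def scrub_env(env):
    """Return a copy of env that is safe to attach to an error report."""
    clean = {}
    for name, value in env.items():
        if SENSITIVE_NAME.search(name):
            clean[name] = "[REDACTED]"
        else:
            # Innocent-looking names (DATABASE_URL) can still embed credentials.
            clean[name] = URL_WITH_PASSWORD.sub(r"\1:[REDACTED]@", value)
    return clean
```

A typical call would be scrub_env(dict(os.environ)) inside whatever hook your logging or APM client offers for modifying events before they are sent.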

9. JWT Leaks & Validation Failures

JWTs are bearer tokens — anyone possessing one can use it. They are also among the most commonly leaked secrets because they appear liberally in logs, browser storage, and URLs.

Common JWT leak vectors

  • Stored in localStorage (accessible from any XSS).
  • Passed in URL query strings (?token=eyJ...) — logged by every proxy, analytics tool, and browser history.
  • Copied into Slack/Jira during debugging.
  • Baked into mobile apps as “API key” equivalents.
  • Logged by web frameworks on request errors.
  • Cached by CDNs when cache keys don’t include the Authorization header.

Validation failures that turn a leak into a breach

Failure | Impact
alg: none | Library accepts unsigned tokens
HS256 vs RS256 confusion | Attacker signs tokens with the RSA public key as HMAC secret
Weak HS256 secret | Brute force with hashcat -m 16500 cracks short secrets in seconds
Missing iss/aud validation | Tokens from sibling tenants accepted
exp not enforced | Expired tokens live forever
kid injection | Attacker points to a file under their control or injects SQL
JKU/JWK header trust | Attacker hosts their own JWK set and forges tokens

Defensive JWT handling

  • Store tokens in HttpOnly; Secure; SameSite=Strict cookies, never in localStorage.
  • Short-lived access tokens (5–15 min), refresh token rotation with reuse detection.
  • Always validate alg against an allowlist — never trust the header’s alg field.
  • Validate iss, aud, exp, nbf, sub on every request.
  • Rotate signing keys regularly, publish via JWKS with kid pinning.
  • Never log full tokens — truncate or hash.

10. Git History Mining

Git’s immutable log is an attacker’s treasure map. Deleting a secret in a later commit does not remove it from history; force-pushing does not remove it from GitHub’s event archive.

The “oops commits” problem

GitHub retains every public commit, even those developers attempt to erase through force pushes. Each force push also surfaces in the public GH Archive event stream as a zero-commit PushEvent that still records the overwritten commit hash. Security researcher Sharon Brizinov, working with Truffle Security, scanned all force-pushed/deleted commits since 2020 via the GH Archive BigQuery dataset and found thousands of active secrets, including:

  • A GitHub Personal Access Token with admin permissions over the Istio repositories (immediate supply chain compromise potential)
  • Valid MongoDB credentials, AWS keys, GitHub PATs
  • Bug bounties totalling ~$25,000

Truffle Security released the open-source Force Push Scanner (https://github.com/trufflesecurity/force-push-scanner) that queries GH Archive via BigQuery and runs TruffleHog on orphaned commits. The practical conclusion: once a secret has been pushed to a public repo, it must be considered permanently compromised. Rotate, don’t hide.

Historical scanning workflow

# Full history, all branches, all tags
git clone --mirror https://github.com/org/repo
trufflehog git file://repo.git --results=verified
gitleaks git -v repo.git

# Scan a specific commit range
gitleaks git --log-opts="--all commitA..commitB" path/

# Fetch pull-request refs so commits that only exist in PRs are included in later scans
git fetch origin '+refs/pull/*:refs/remotes/origin/pr/*'

# Dangling blob scan
git fsck --full --unreachable --no-reflogs

Removing secrets from history (cautiously)

Historical rewrites are destructive and break every clone of the repo. They do not retroactively invalidate the leaked secret — rotation is always step one.

  • git filter-repo — modern replacement for git filter-branch; faster and safer.
  • BFG Repo-Cleaner — single-purpose tool that removes large files and secret strings.
  • After rewriting: force-push, notify all collaborators to re-clone, and assume the old history is archived somewhere (GH Archive, clones, forks, mirrors). Rotate the secret first, always.

11. Secret Scanners Compared

TruffleHog

  • Strengths: 800+ classified detectors, active verification (hits the vendor API to confirm the secret is live), “Analyze” mode that enumerates what the credential can access (IAM permissions, resources, owner).
  • Scope: Git, GitHub, GitLab, filesystem, S3, GCS, Docker images, Jira, Slack, Confluence, Postman, Jenkins logs, Circle CI logs, and more.
  • Modes: --results=verified (only show live secrets; supersedes the older --only-verified flag), --no-verification for air-gapped scans.
  • Enterprise: continuous monitoring of Git, Jira, Slack, Confluence, Teams, SharePoint.
  • License: AGPL-3.0 (open source), enterprise commercial product for continuous scanning.
  • Typical invocation:
    trufflehog git https://github.com/org/repo --results=verified
    trufflehog github --org=myorg --include-forks --include-members
    trufflehog docker --image=myimage:latest
    trufflehog filesystem ./
    

Gitleaks

  • Strengths: fast, Go-native, TOML-configurable rules, extensive default ruleset, first-class pre-commit and GitHub Action integration, playground for regex development.
  • Scope: git, dir, stdin. Does not verify secrets against APIs.
  • Config layers: flag → env var → repo .gitleaks.toml → default.
  • Reports: JSON, CSV, JUnit, SARIF (integrates with GitHub code scanning), custom templates.
  • Entropy, allowlists, baselining: supports .gitleaksignore and baseline files to suppress known issues.
  • License: MIT.
  • Typical invocation:
    gitleaks git -v --log-opts="--all"
    gitleaks dir ./src
    gitleaks git --report-format sarif --report-path leaks.sarif
    

detect-secrets (Yelp)

  • Strengths: baseline-first workflow designed for gradual adoption on legacy repos; every developer inherits a single baseline and only new secrets block CI.
  • Weak on: active verification (not its design goal).
  • Plugin model: extensible detector types (base64, hex, AWS, Slack, Azure, private keys, etc.).

ggshield (GitGuardian)

  • Strengths: commercial backing, 350+ detectors, near-zero false positives due to ML classification layered on regex, dashboard-driven remediation workflows, first-class remediation ticketing.
  • Modes: pre-commit, pre-receive, CI, IDE integration, secrets scanner for Jira/Slack/Confluence.

Semgrep (Secrets)

  • Strengths: AST-aware rules can detect insecure secret handling (e.g. secrets passed as URL params, secrets logged), in addition to leaked values.
  • Integrates: PR comments, GitHub/GitLab checks.

GitHub native

  • Secret scanning (advanced security): scans public and private repos for known prefix patterns.
  • Push protection: blocks pushes containing recognised patterns. Limited against generic secrets — MySQL and MongoDB creds were measurably not impacted.
  • Partner program: vendors register prefixes (e.g. ghp_) so GitHub detects and auto-revokes them on leak.

Side-by-side

Feature | TruffleHog | Gitleaks | detect-secrets | ggshield | Semgrep
Open source | AGPL | MIT | Apache | CLI yes, backend no | Yes
Active verification | Yes | No | No | Yes | No
Number of detectors | 800+ | ~150 | ~25 plugins | 350+ | Custom rules
Docker/image scanning | Yes | No (dir mode only) | No | Yes | No
SaaS source scanning (Jira/Slack) | Enterprise | No | No | Yes | No
AST rule support | No | No | No | No | Yes
GitHub Action | Yes | Official | Community | Yes | Yes
SARIF output | Yes | Yes | Partial | Yes | Yes
Baseline workflow | Partial | Yes | Yes (primary) | Yes | Yes

Choosing

  • Greenfield or aggressive rotation policy: TruffleHog with verification.
  • Legacy monorepo, cannot rotate everything: detect-secrets baseline.
  • Fast CI, SARIF to GitHub code scanning: Gitleaks.
  • Enterprise with Slack/Jira surface: ggshield or TruffleHog Enterprise.
  • Want to catch insecure handling, not just values: add Semgrep.

Most mature programs run two in parallel: a fast prefix scanner at pre-commit (Gitleaks) and a verifying scanner on merge (TruffleHog).


12. AI-Era Leakage Patterns

AI coding tools have fundamentally changed the leak surface in 2024–2025.

AI coding assistants

  • GitHub Copilot usage grew 27% from 2023 to 2024.
  • Public repos using Copilot leaked secrets at 6.4% vs the 4.6% baseline — a 40% higher exposure rate.
  • Causes: Copilot autocompletes plausible-looking API keys from its training data, suggests placeholder keys that get committed unchanged, generates .env files with example values, and rewrites config files without awareness of .gitignore.
  • Anti-pattern: developers who ask Copilot “how do I call the OpenAI API” can receive a suggestion that pastes in their own OpenAI key from a previously opened file, which they then commit to a new repo.

AI service secret categories (2024 to 2025 growth)

Category | YoY increase
AI service leaks overall | +81%
Brave Search API | +1,255%
Supabase (AI backend) | +992%
Firecrawl (LLM scraping) | +796%
OpenAI / Anthropic / Cohere | hundreds of percent each
HuggingFace access tokens | steady high growth

MCP (Model Context Protocol) specifically

MCP became the connective tissue between LLMs and tools in 2025. Its convention is a local JSON config file (mcp.json, claude_desktop_config.json) containing server command, arguments, and environment variables. Secrets end up in these configs by design: API keys as env.OPENAI_API_KEY, database credentials as CLI args, GitHub tokens as env.

GitGuardian found 24,008 unique secrets in MCP-related config files on public GitHub in MCP’s first year, with 2,117 verified as valid. Expect this to keep growing as agentic AI adoption accelerates. Treat MCP configs as first-class secret-bearing files: exclude them via .gitignore and move any credentials they reference into a vault.
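
A quick audit of such a config might look like the sketch below. It assumes the common layout with a top-level mcpServers object whose entries carry command, args, and env; the function name, the finding labels, and the idea of treating ${VAR}-style values as safe references are assumptions for illustration (whether a given MCP client expands such references varies by client).

```python
import json
import re

SENSITIVE_NAME = re.compile(r"(secret|token|password|passwd|pwd|key|credential)", re.I)
# Values like $FOO or ${FOO} are treated as references rather than literals.
ENV_REFERENCE = re.compile(r"^\$\{?[A-Za-z_][A-Za-z0-9_]*\}?$")
URL_WITH_PASSWORD = re.compile(r"\w+://[^:/@\s]*:[^@/\s]+@")


def audit_mcp_config(config_text):
    """Return (server, location, reason) for values that look like inline secrets."""
    config = json.loads(config_text)
    findings = []
    for name, server in config.get("mcpServers", {}).items():
        for key, value in (server.get("env") or {}).items():
            inline = isinstance(value, str) and value and not ENV_REFERENCE.match(value)
            if inline and SENSITIVE_NAME.search(key):
                findings.append((name, f"env.{key}", "inline value"))
        for index, arg in enumerate(server.get("args") or []):
            if isinstance(arg, str) and URL_WITH_PASSWORD.search(arg):
                findings.append((name, f"args[{index}]", "credential in URL"))
    return findings
```

The findings deliberately omit the secret values themselves, so the report can be pasted into a ticket without creating a second leak.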

Vibe-coding / ElevenLabs pattern

Wiz, studying the Forbes AI 50, documented a specific pattern: AI startups shipping products with ElevenLabs API keys in plaintext in public repos, often left in by a developer “vibe-coding” through a prototype and pushing the working version. 65% of AI companies studied had verified secret leaks.

AI-assisted remediation is also AI-generated

Hardcoded secrets in AI-generated code are now a recognised detection category. Anthropic, GitHub, and third parties ship linting/pre-commit hooks that specifically check AI completions for plausible-looking credentials before they get written to disk. Scanners like ggshield now have “AI code” detection modes that trigger on typical Copilot/Cursor output signatures.


13. Real-World Breaches

European Commission via Trivy (April 2026)

  • Vector: Supply chain compromise of Trivy (open-source vuln scanner) by threat actor “TeamPCP”.
  • Initial access: European Commission downloaded a compromised Trivy version on 19 March 2026 through normal update channels.
  • Escalation: Malicious code inside Trivy executed within the Commission’s CI/CD pipelines and harvested an AWS secret with management rights over affiliated cloud accounts.
  • Tradecraft: Attackers deployed TruffleHog inside the victim environment to enumerate more credentials, then called AWS STS to validate and mint session tokens. Created new persistent access keys attached to an existing IAM user.
  • Impact: 340 GB exfiltrated, affecting 42 internal clients and 29 other EU entities. Data dumped by extortion group ShinyHunters on 28 March.
  • MITRE ATT&CK: T1195.002 (Compromise Software Supply Chain), T1586.003 (Compromise Accounts: Cloud Accounts), T1078.004 (Valid Accounts: Cloud Accounts), T1005 (Data from Local System).
  • Lessons: pin GitHub Actions/binaries to SHA, not mutable tags; restrict CI/CD IAM to least privilege; monitor CloudTrail for STS anomalies and TruffleHog signatures; maintain rapid AWS credential rotation capability.

AWS S3 Ransomware

Attackers used valid AWS credentials (harvested via secret leaks) to encrypt S3 buckets with customer-supplied encryption keys and demanded ransom for decryption. This weaponises legitimate AWS features — SSE-C — once credentials leak. The defence is IAM condition keys denying s3:PutObject with x-amz-server-side-encryption-customer-algorithm.
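
One way to express that defence is a bucket policy that denies any upload carrying the SSE-C algorithm header. The sketch below builds such a policy as a Python dictionary; the function name, statement ID, and bucket name are hypothetical, and the statement should be checked against workloads that legitimately rely on SSE-C before it is applied.

```python
import json


def deny_sse_c_policy(bucket_name):
    """Build a bucket policy that rejects PutObject requests using SSE-C."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenySSECUploads",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # "Null": "false" means the header is present, so the request is denied.
                "Condition": {
                    "Null": {"s3:x-amz-server-side-encryption-customer-algorithm": "false"}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_sse_c_policy("example-bucket"), indent=2))
```

Because it is an explicit Deny, the statement applies even to principals whose stolen credentials otherwise grant full S3 access.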

Artifactory token exposures

GitGuardian case study: 60% of leaked Artifactory tokens were in build configs affecting production environments in pharma and energy sectors. A single leaked Artifactory admin token permits arbitrary package publication into trusted registries — a supply chain compromise primitive.

Shai-Hulud 2 (2025)

Worm-style npm supply chain attack compromising developer machines. Forensic telemetry across 6,943 compromised systems yielded 33,185 unique secrets. 59% of compromised machines were CI/CD runners. Demonstrated that “developer endpoints as credential aggregation layer” is an organisational risk, not an individual-hygiene issue.

LiteLLM supply chain attack (2025)

Compromised LiteLLM packages harvested SSH keys, cloud credentials, and API tokens specifically from machines where AI development tools were concentrated. Followed exactly the Shai-Hulud playbook. Reinforces that the AI developer stack is now a primary target.

GitHub “oops commits” (2025)

Sharon Brizinov’s scan of all public GitHub force-pushed/deleted commits since 2020 recovered thousands of active secrets including a GitHub PAT with admin on the Istio repositories. Resulted in ~$25,000 in bounties and the release of the Force Push Scanner tool. Proves: force-pushed does not equal deleted.

Capital One (2019, for context)

SSRF in a WAF → IMDSv1 credential theft → 100M+ records exfiltrated from S3. Still the canonical example of a secret-leak attack chain and the reason IMDSv2 exists.

Toyota (2022, for context)

Customer data sat exposed for roughly five years after T-Connect source code containing an access key was published to a public GitHub repository by a contractor. About 296,000 customer email addresses and IDs were exposed.

Uber (2022)

An 18-year-old attacker social-engineered an Uber employee, then found a PowerShell script containing hardcoded admin credentials for Uber’s PAM solution and pivoted to full enterprise compromise. The root cause was a single hardcoded credential in a script.


14. Rotation & Incident Response Playbook

When (not if) a secret leaks:

Minute 0–15: Contain

  1. Assume compromise — treat the secret as in attacker hands the moment it left your machine.
  2. Revoke, don’t rotate first. Delete the key from the provider console or API immediately. Rotation implies the old key remains briefly valid.
  3. For AWS: aws iam delete-access-key --access-key-id AKIA... (add --user-name when the key belongs to an IAM user other than the caller).
  4. For GitHub: revoke the PAT and force-expire any SSH keys.
  5. For OAuth: invalidate client secrets and revoke issued tokens.
  6. For database creds: ALTER USER ... WITH PASSWORD ... and kill active sessions.

Minute 15–60: Investigate

  1. Audit logs — pull CloudTrail, GitHub audit log, provider access logs for the full lifetime of the exposed key. Build a list of every action taken under it.
  2. Check for new IAM users, access keys, SSH keys, webhooks added since exposure. These are persistence.
  3. Scan CloudTrail for sts:GetSessionToken, iam:CreateAccessKey, iam:AttachUserPolicy.
  4. For GitHub: check recent pushes, newly created repos, changed webhook URLs, new org members.
  5. Scan the exposed medium (public repo, Slack channel, Docker image) for other secrets — leaks cluster.
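The persistence checks in steps 2 and 3 can be sketched as a small filter over exported audit events. The record shape (EventName, EventTime) is loosely modelled on CloudTrail lookup output. The event names beyond the three listed above (CreateUser, CreateLoginProfile, PutUserPolicy) are additions for illustration, not an exhaustive list.

```python
from datetime import datetime, timezone

# API calls that commonly indicate an attacker establishing persistence.
PERSISTENCE_EVENTS = {
    "GetSessionToken",
    "CreateAccessKey",
    "AttachUserPolicy",
    "CreateUser",          # assumption: added for illustration
    "CreateLoginProfile",  # assumption: added for illustration
    "PutUserPolicy",       # assumption: added for illustration
}


def flag_persistence(events, exposed_at):
    """Return events at or after the exposure time whose name suggests persistence."""
    return [
        e
        for e in events
        if e["EventName"] in PERSISTENCE_EVENTS and e["EventTime"] >= exposed_at
    ]


def summarise(events, exposed_at):
    """Count flagged events per API call, for a quick triage overview."""
    counts = {}
    for e in flag_persistence(events, exposed_at):
        counts[e["EventName"]] = counts.get(e["EventName"], 0) + 1
    return counts
```

Anything this returns is a lead to investigate, not proof of compromise; legitimate automation also creates keys and attaches policies.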

Hour 1–4: Eradicate

  1. Snapshot affected systems for forensics before making further changes.
  2. Rotate any credentials the compromised key could reach (blast radius).
  3. Delete persistence artefacts (attacker-created users, keys, lambdas, ECS tasks).
  4. Review network egress logs for data exfiltration.

Day 1–3: Recover & report

  1. Replace the secret in all legitimate consumers via your secrets manager.
  2. Post-mortem: how did the secret get committed? Where were the detection layers supposed to catch this? Which of them failed and why?
  3. Notify regulators, customers, and partners as required (GDPR 72-hour rule; for US public companies, the SEC rule requiring disclosure within four business days of determining the incident is material).
  4. Request a CVE ID if the root cause was a vulnerability in software you distribute.

Week 1+: Harden

  1. Add the leak pattern to your secret scanner as a custom rule.
  2. Add the failure mode to CI gates.
  3. Migrate the secret class from static to dynamic or short-lived.
  4. Run the Force Push Scanner / historical scans across all your public repos.
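As an illustration of step 1, a custom rule is usually just a tight regex for your own token format plus redacted reporting. The "acme_live_" prefix and 32-hex-character body below are a hypothetical internal format, not a real provider's.

```python
import re

# Hypothetical internal token format: "acme_live_" followed by 32 hex chars.
CUSTOM_RULE = re.compile(r"\bacme_live_[0-9a-f]{32}\b")


def scan_text(text):
    """Return (line_number, redacted_match) for every custom-rule hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CUSTOM_RULE.finditer(line):
            token = match.group(0)
            # Keep only the prefix plus four characters; never log a full secret.
            findings.append((lineno, token[:14] + "..."))
    return findings
```

The same pattern can then be ported into the rule syntax of whichever scanner the team runs. Anchoring on a fixed prefix keeps the false-positive rate low compared with generic high-entropy matching.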

Rotation patterns

| Pattern | Description |
| --- | --- |
| Gradual rotation | Introduce new secret, phase out old over a deprecation window |
| Dual-key (write-new, read-old) | Support two valid secrets simultaneously during migration |
| Dynamic secrets | Vault mints a new secret on demand (for example at application startup); it expires when its lease TTL elapses or it is revoked |
| Scheduled rotation | Secrets manager rotates on cron (AWS Secrets Manager supports native Lambda-based rotation) |
| Event-driven rotation | Rotate on every deploy, suspicion, or leak |
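The dual-key pattern can be sketched as a verifier that always accepts the current secret and accepts the previous one only until a deadline. The class and parameter names are illustrative, not taken from any particular library.

```python
import hmac
import time


class DualKeyVerifier:
    """Accept the current secret always, and the previous one until a deadline."""

    def __init__(self, current: bytes, previous: bytes = None,
                 previous_valid_until: float = 0.0):
        self.current = current
        self.previous = previous
        self.previous_valid_until = previous_valid_until

    def verify(self, presented: bytes, now: float = None) -> bool:
        now = time.time() if now is None else now
        # compare_digest avoids leaking the match position through timing.
        if hmac.compare_digest(presented, self.current):
            return True
        if self.previous is not None and now < self.previous_valid_until:
            return hmac.compare_digest(presented, self.previous)
        return False
```

During migration, consumers switch to the new secret at their own pace; once the deadline passes, the old secret stops working without a further deploy.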

Dynamic secrets are the strategic target state: an application requests its DB password at startup, the vault generates a temporary credential under a lease with a short TTL, and revokes it when the lease expires or is explicitly revoked. Any exfiltrated credential becomes worthless as soon as its lease ends, typically within minutes to hours.
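A toy, in-memory model of that lifecycle is sketched below. It is not how any real vault is implemented; it only shows that a credential is unique per request and stops validating once its TTL has elapsed. All names are hypothetical.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class Lease:
    username: str
    password: str
    expires_at: float


class ToyCredentialBroker:
    """In-memory stand-in for a vault's dynamic database credential engine."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.leases = {}

    def issue(self, now: float = None) -> Lease:
        now = time.time() if now is None else now
        lease = Lease(
            username="v-app-" + secrets.token_hex(4),  # unique per request
            password=secrets.token_urlsafe(24),
            expires_at=now + self.ttl,
        )
        self.leases[lease.username] = lease
        return lease

    def is_valid(self, username: str, password: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        lease = self.leases.get(username)
        if lease is None or now >= lease.expires_at:
            self.leases.pop(username, None)  # expired leases are dropped
            return False
        return secrets.compare_digest(password, lease.password)
```

A credential stolen from this broker is only useful inside its TTL window, which is the property the paragraph above describes.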


15. Vaults & Secret Managers

HashiCorp Vault

  • Architecture: central server (HA via Consul/Raft), token-based auth, pluggable auth methods (Kubernetes, AWS IAM, OIDC, LDAP, AppRole), pluggable secret engines (KV, database, PKI, SSH, AWS, Azure, GCP, Transit).
  • Dynamic secrets: Vault generates short-lived DB users, AWS STS tokens, SSH certificates on demand.
  • Transit engine: encryption-as-a-service — the application sends plaintext, Vault returns ciphertext, plaintext is never stored.
  • Leasing & revocation: every dynamic secret and token carries a lease with a TTL; Vault revokes it automatically when the lease expires, and operators can revoke it early.
  • Auditing: every access is logged to append-only audit devices.
  • Unsealing: Vault starts sealed; Shamir’s Secret Sharing splits the unseal key into N shares, any K of which (the threshold) must be combined to unseal (or use auto-unseal with AWS KMS / GCP KMS / HSM).
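To make the unsealing idea concrete, here is a toy Shamir split and recombine over a prime field. It works on a single integer and is only a sketch of the mathematics; Vault's own implementation differs in detail and this code should not be used to protect real keys.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split(secret: int, shares: int, threshold: int):
    """Split an integer into `shares` points; any `threshold` of them recover it."""
    if not (0 <= secret < PRIME) or not (1 <= threshold <= shares):
        raise ValueError("bad parameters")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, f(x)) for x in range(1, shares + 1)]


def combine(points):
    """Recover f(0) by Lagrange interpolation over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With split(key, 5, 3), any three of the five shares reconstruct the key, while fewer than three reveal nothing about it. That is why losing one or two key holders does not lock the vault.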

Typical Vault flow:

# App authenticates via Kubernetes SA token
vault write auth/kubernetes/login role=myapp jwt="$SA_TOKEN"

# Request a short-lived DB credential
vault read database/creds/myapp-role
# returns username + password valid for the role's configured TTL (1 hour here)

AWS Secrets Manager

  • Native rotation: Lambda-based rotators for RDS, Redshift, DocumentDB; custom Lambdas for anything else.
  • IAM-native auth: access controlled by IAM policies, no separate auth plane.
  • Integration: ECS task secrets, Lambda env injection, Parameter Store for non-sensitive config.
  • Cost: $0.40/secret/month + $0.05/10k API calls.
  • Rotation pattern (4 phases): createSecret → setSecret → testSecret → finishSecret. The Lambda moves a new version through AWSPENDING to AWSCURRENT while the old version becomes AWSPREVIOUS.
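The four rotation phases (createSecret, setSecret, testSecret, finishSecret) can be sketched as a small state machine over staging labels. This is an in-memory illustration with hypothetical class and function names; a real rotation Lambda would call the Secrets Manager and database APIs instead of touching dictionaries.

```python
class ToySecret:
    """Tracks secret versions and which version holds each staging label."""

    def __init__(self, initial_value: str):
        self.versions = {"v1": initial_value}
        self.labels = {"AWSCURRENT": "v1"}


def phase_create(secret, version_id, new_value):
    # Phase 1 (createSecret): store the candidate value under AWSPENDING.
    secret.versions[version_id] = new_value
    secret.labels["AWSPENDING"] = version_id


def phase_set(secret, database):
    # Phase 2 (setSecret): apply the pending value to the target system.
    database["password"] = secret.versions[secret.labels["AWSPENDING"]]


def phase_test(secret, database):
    # Phase 3 (testSecret): confirm the target accepts the pending value.
    if database["password"] != secret.versions[secret.labels["AWSPENDING"]]:
        raise RuntimeError("pending secret rejected by target")


def phase_finish(secret):
    # Phase 4 (finishSecret): promote pending; the old current becomes previous.
    secret.labels["AWSPREVIOUS"] = secret.labels["AWSCURRENT"]
    secret.labels["AWSCURRENT"] = secret.labels.pop("AWSPENDING")


def rotate(secret, database, version_id, new_value):
    """Run all four phases in order; an exception leaves AWSCURRENT untouched."""
    phase_create(secret, version_id, new_value)
    phase_set(secret, database)
    phase_test(secret, database)
    phase_finish(secret)
```

Because the label swap happens only in the final phase, a failure in any earlier phase leaves consumers reading the old, still-working AWSCURRENT version.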

Azure Key Vault

  • Three object types: Keys (HSM-backed crypto), Secrets (generic blobs), Certificates (full lifecycle).
  • Authentication: Microsoft Entra ID (formerly Azure AD) / Managed Identities. A workload with a Managed Identity requests tokens from IMDS and authenticates to Key Vault without ever seeing a secret.
  • Access control: legacy access policies OR Azure RBAC (recommended) with roles like Key Vault Secrets User.
  • Soft delete & purge protection: deleted secrets recoverable for 7–90 days, purge protection blocks permanent deletion.
  • HSM tier: keys backed by FIPS 140-2 Level 2 or Level 3 hardware.

Google Secret Manager

  • IAM-native, simple REST API, versioning, automatic replication across regions.
  • Direct integration with Cloud Run, GKE, Cloud Functions for env-injected secrets.

Comparison

| Feature | Vault | AWS SM | Azure KV | GCP SM |
| --- | --- | --- | --- | --- |
| Dynamic secrets | Yes (many engines) | RDS only | No | No |
| Cloud-agnostic | Yes | No | No | No |
| HSM-backed | Enterprise | No (separate KMS) | Yes (Premium) | Via Cloud HSM |
| PKI / cert mgmt | Yes | ACM separate | Yes | CAS separate |
| Encryption-as-a-service | Yes (Transit) | No (KMS separate) | No | No |
| Native Kubernetes auth | Yes | IRSA | Workload Identity | Workload Identity |
| Self-hosted option | Yes | No | No | No |
| Cost model | Self-host free / Ent. licence | $0.40/secret/mo | Per operation | Per operation |

Rules of thumb:

  • Single-cloud workload: use the native cloud secret manager. Simpler auth, fewer moving parts, cheaper.
  • Multi-cloud or hybrid: Vault wins. One control plane, one audit log, portable.
  • Need dynamic DB credentials across cloud providers: Vault.
  • Need HSM-backed key operations for compliance: Azure Key Vault Premium or AWS CloudHSM.
  • Tiny shop, one AWS account: AWS Secrets Manager is hard to beat.

Anti-patterns seen in the wild

  • 5.1% of repositories using secrets managers still leaked secrets (GitGuardian 2024). Having a vault doesn’t help if devs also commit the key to git.
  • Storing the vault’s own bootstrap credentials in the vault (circular dependency). Use break-glass out-of-band storage.
  • Granting wildcard access to CI roles, such as a Vault policy on path "*" or the IAM action secretsmanager:*. Scope per-path, per-secret.
  • Not auditing which non-human identities still hold long-lived tokens to the vault itself.
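The wildcard anti-pattern is easy to lint for. The sketch below walks an IAM-style policy document and reports Allow actions that end in a wildcard. It checks actions only, not resources or conditions, so it is a first-pass filter rather than a full policy analyser.

```python
def wildcard_actions(policy: dict):
    """Return Allow-statement actions that end in '*' (e.g. 'secretsmanager:*')."""
    flagged = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        flagged.extend(a for a in actions if a.endswith("*"))
    return flagged
```

Running this over every CI role's policies in a scheduled job gives a cheap regression check that scoping has not drifted back to wildcards.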

16. Developer Hygiene & Prevention

Pre-commit

Single highest-ROI control. Install once, runs on every git commit:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.24.2
    hooks:
      - id: gitleaks
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']

Enforce centrally — pre-commit is bypassable with --no-verify, so also run the same check server-side in CI as a gate.

IDE integration

  • ggshield has VS Code and JetBrains plugins that scan on save.
  • TruffleHog has a VS Code extension.
  • GitHub Copilot and Cursor now both refuse to autocomplete strings that match known secret patterns (implemented 2025).

CI gates

Every PR runs:

  1. Gitleaks (fast pattern scan)
  2. TruffleHog with --results=verified (live verification on changed files)
  3. SBOM + dependency scan (because leaks often enter through a compromised dep)
  4. Container image scan if Docker build

Block merges on verified-secret findings. Allow annotation-based suppression only with explicit security team approval and a ticket reference.
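A merge gate of this kind can be a few lines that read scanner output and decide the exit code. The sketch assumes the scanner emits one JSON object per line with a boolean Verified field and a DetectorName field, as TruffleHog's --json output has done; treat those field names as assumptions to confirm against the version you run.

```python
import json


def verified_findings(lines):
    """Parse JSON-lines scanner output and keep findings marked as verified."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise such as stray log lines
        if isinstance(record, dict) and record.get("Verified") is True:
            hits.append(record.get("DetectorName", "unknown"))
    return hits


def exit_code(lines) -> int:
    """Return 1 (block the merge) when any verified secret is present, else 0."""
    return 1 if verified_findings(lines) else 0
```

In a pipeline, a thin wrapper would pass sys.stdin to exit_code and hand the result to sys.exit, so the job fails only on live credentials and unverified matches are reported without blocking.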

Push protection

Enable GitHub’s Push Protection at the org level. It blocks known prefix patterns (GitHub PATs, AWS keys, Stripe keys, etc.). Remember its known blind spots — generic secrets, MySQL/MongoDB URLs.

.gitignore hygiene

Ship a standard template that excludes:

.env
.env.*
!.env.example
*.pem
*.key
*.p12
*.pfx
*.keystore
*.jks
.aws/credentials
.aws/config
id_rsa
id_ed25519
*.tfstate
*.tfstate.backup
.terraform/
secrets.yml
credentials.json
service-account.json
.npmrc
.pypirc
.mcp.json
claude_desktop_config.json

Training & culture

  • Run tabletop exercises: “a key leaked 30 minutes ago, walk me through the first hour.”
  • Publish a one-click revocation runbook for every credential type the org uses.
  • Celebrate people who self-report leaks; never punish. The alternative is concealment.
  • Track MTTR on secret incidents as a headline metric.

Honeytokens

Deploy intentionally-leaked fake credentials throughout the codebase and infrastructure. Any use of them is, by definition, unauthorised. GitGuardian, Canarytokens, and AWS IAM canaries provide turn-key honeytokens. During the Shai-Hulud investigation, honeytokens on developer workstations gave the cleanest telemetry about what attackers actually did post-compromise.

Shift-left without developer fatigue

  • Don’t ship a scanner that produces 1,000 findings on day one. Adopt a baseline, fix new findings only, and work the backlog on a schedule.
  • Fast feedback: pre-commit must finish in < 2 seconds for the common case or developers will disable it.
  • Clear remediation path: every finding must link to “how do I fix this” with an approved vault pattern for the team’s stack.
  • Measurable quiet-hours: no new findings for N days = the control is working.

17. Non-Human Identity Governance

The industry’s 2025–2026 conclusion from the sprawl data: secret scanning is necessary but insufficient. What security teams actually need is Non-Human Identity (NHI) governance.

The three questions

  1. What non-human identities exist in my environment? (service accounts, IAM roles, API keys, OAuth apps, SSH keys, SA tokens, machine users)
  2. Who owns each one? (not “which team inherited it five reorgs ago”)
  3. What can each one access? (effective permissions, including via role chains)

Most orgs cannot answer any of the three at scale. NHIs now outnumber human users by 10–50x in typical cloud environments, and they are overwhelmingly long-lived, over-privileged, and un-owned.

Building an NHI programme

  1. Discover: enumerate every identity source — cloud IAM, GitHub/GitLab PATs, OAuth apps, service principals, SSH keys, SA tokens, vendor API keys.
  2. Attribute: assign an owning team (human) to each. Anything un-attributable after 30 days = candidate for retirement.
  3. Scope: measure effective permissions and data access per identity.
  4. Lifecycle: enforce creation → rotation → expiration → revocation flows. No identity exists without an expiration date and a rotation policy.
  5. Govern: short-lived credentials by default. Long-lived only with explicit risk exception.
  6. Monitor: alert on anomalous use (new IP, unusual API call mix, geographic impossibility).
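The attribute and lifecycle steps reduce to a simple inventory review once the identities are enumerated. The sketch below uses a hypothetical record shape and an assumed 90-day rotation window; both would come from your own discovery tooling and policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Identity:
    name: str
    owner: Optional[str]     # owning team, None if unattributed
    created: date
    expires: Optional[date]  # None means the credential never expires


def review(identities, today: date, max_age_days: int = 90):
    """Map each identity name to the governance problems found for it."""
    problems = {}
    for ident in identities:
        issues = []
        if ident.owner is None:
            issues.append("no owner")
        if ident.expires is None:
            issues.append("no expiration")
        if today - ident.created > timedelta(days=max_age_days):
            issues.append("older than rotation window")
        if issues:
            problems[ident.name] = issues
    return problems
```

The output doubles as a work queue: unowned entries go to the attribution backlog, and non-expiring ones to the migration plan for short-lived credentials.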

Moving from static to identity-driven

| Old | New |
| --- | --- |
| Hardcoded AWS access keys | IAM Roles + STS + IRSA (EKS) / IMDSv2 (EC2) |
| Long-lived GitHub PATs | GitHub Actions OIDC → cloud role assumption |
| Static DB passwords | Vault dynamic database credentials |
| Service account key JSON (GCP) | Workload Identity Federation |
| Azure service principal secrets | Managed Identities + IMDS |
| Embedded Slack tokens | Slack OAuth app with workspace installs |

GitHub Actions OIDC is the canonical 2024–2026 pattern: your workflow requests a short-lived OIDC token from GitHub, exchanges it with AWS STS / Microsoft Entra ID (formerly Azure AD) / GCP STS for a short-lived cloud credential, and uses that. No long-lived cloud credentials need to be stored in GitHub Actions secrets or variables. Every major CI system now supports the same pattern.
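The security of this exchange rests on the cloud side pinning the token's issuer, audience, and subject. The sketch below mimics that trust check on decoded claims. It deliberately skips signature verification, which a real verifier must perform against the issuer's published keys, and the organisation and repository names in the usage are placeholders.

```python
import base64
import json
from fnmatch import fnmatchcase

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


def decode_claims(jwt: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (illustration only)."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def trust_allows(claims: dict, audience: str, subject_pattern: str) -> bool:
    """Mimic a cloud trust policy: pin issuer, audience and subject."""
    return (
        claims.get("iss") == GITHUB_ISSUER
        and claims.get("aud") == audience
        and fnmatchcase(claims.get("sub", ""), subject_pattern)
    )
```

A subject pattern such as "repo:my-org/my-repo:ref:refs/heads/main" restricts role assumption to one branch of one repository. A loose pattern like "repo:my-org/*" lets any repository in the organisation assume the role, which is a common misconfiguration.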


18. Quick Reference

Secret discovery one-liners

# Scan current repo history
trufflehog git file://. --results=verified
gitleaks git -v

# Scan remote repo
trufflehog git https://github.com/org/repo --results=verified

# Scan org
trufflehog github --org=myorg --include-forks --include-members

# Scan Docker image
trufflehog docker --image=myimage:latest

# Scan filesystem (downloaded JS bundle / APK)
trufflehog filesystem ./extracted_bundle

# Scan S3 bucket
trufflehog s3 --bucket=mybucket

# Pre-commit check (gitleaks >= 8.19; older releases used `gitleaks protect --staged`)
gitleaks git --pre-commit --staged

# Scan stdin
cat suspicious_file | gitleaks stdin

AWS key triage

aws sts get-caller-identity                        # whose key is this?
aws iam list-attached-user-policies --user-name X  # attached managed policies
aws iam list-user-policies --user-name X           # inline policies
aws iam list-groups-for-user --user-name X         # permissions inherited via groups
aws iam list-access-keys --user-name X             # other keys?
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIA... \
  --max-results 50                                 # usage history

GitHub token triage

curl -si -H "Authorization: token $GH_TOKEN" https://api.github.com/user
curl -s -H "Authorization: token $GH_TOKEN" https://api.github.com/user/orgs
# -i prints response headers; classic PAT scopes appear in X-OAuth-Scopes

Quick impact assessment on a leaked key

  1. Who owns it? (org, team, individual)
  2. What is its scope? (effective permissions)
  3. When was it created?
  4. When was it last used? From where?
  5. What data can it reach?
  6. Was it ever used from an unexpected IP / time / region?
  7. Are there persistence artefacts created under it?
  8. Has it been rotated?
  9. Who has been notified?
  10. What’s the post-mortem action to prevent recurrence?

File patterns that frequently leak secrets

.env  .env.local  .env.production  .env.backup
config.json  config.yml  settings.py  local.settings.json
application.properties  application.yml
docker-compose.yml  docker-compose.override.yml
kubeconfig  .kube/config
.aws/credentials  .aws/config
credentials.json  service-account.json  sa.json
id_rsa  id_ed25519  *.pem  *.key  *.pfx  *.p12  *.jks
.npmrc  .pypirc  .gem/credentials
wp-config.php  parameters.yml
terraform.tfstate  *.tfvars
.git-credentials  .netrc
.mcp.json  claude_desktop_config.json
*.sql  *.dump  *.bak
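The list above can be turned into a quick path filter for triaging a repository listing or an extracted archive. The sketch covers a subset of the patterns and matches on file name or path suffix only; it is not a replacement for gitignore semantics or for scanning file contents.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File-name patterns taken from the list above (subset, for brevity).
RISKY_NAMES = [
    ".env", ".env.*", "*.pem", "*.key", "*.pfx", "*.p12", "*.jks",
    "*.tfstate", "*.tfvars", "credentials.json", "service-account.json",
    ".npmrc", ".pypirc", ".git-credentials", ".netrc", ".mcp.json",
    "id_rsa", "id_ed25519", "*.sql", "*.dump", "*.bak",
]
# Entries that only make sense together with their parent directory.
RISKY_SUFFIXES = [".aws/credentials", ".kube/config", ".gem/credentials"]


def is_risky(path: str) -> bool:
    """True when the path's file name or trailing components match a risky pattern."""
    name = PurePosixPath(path).name
    if any(fnmatchcase(name, pattern) for pattern in RISKY_NAMES):
        return True
    return any(path.endswith(suffix) for suffix in RISKY_SUFFIXES)


def risky_paths(paths):
    """Filter an iterable of paths down to the ones worth inspecting first."""
    return [p for p in paths if is_risky(p)]
```

Note that ".env.*" also flags ".env.example"; that is intentional here, since example files sometimes contain real values.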

Red flags in code review

  • String API_KEY = "..." with a long alphanumeric literal
  • os.environ.get("KEY", "default-fallback-with-real-value")
  • requests.get(url, headers={"Authorization": "Bearer ..."}) with hardcoded bearer
  • curl -H "X-API-Key: ..." in shell scripts
  • git log --all -p | grep -i "password\|secret\|api_key" returning hits
  • Deleted lines removing a key without a rotation commit
  • .env.example files that contain real values (“for testing”)
  • Comments like // TODO: remove before commit
  • Long base64 or hex strings next to keywords like key, token, secret
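Several of these red flags combine into one heuristic: a quoted literal assigned to a secret-sounding name, with high character entropy. The sketch below shows the idea. The 16-character minimum and the 3.5 bits-per-character threshold are illustrative assumptions and will need tuning against real code.

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from the string's own symbol frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


# A name containing key/token/secret/password, then = or :, then a quoted
# literal of at least 16 characters.
ASSIGNMENT = re.compile(
    r"""(?i)\b\w*(?:key|token|secret|password)\w*\s*[:=]\s*["']([^"']{16,})["']"""
)


def suspicious_literals(source: str, min_entropy: float = 3.5):
    """Return literals assigned to secret-like names that look random."""
    return [
        m.group(1)
        for m in ASSIGNMENT.finditer(source)
        if shannon_entropy(m.group(1)) >= min_entropy
    ]
```

Entropy filtering drops placeholders such as repeated characters while keeping random-looking strings, but it also misses low-entropy real passwords, so reviewers should treat it as a prompt rather than a verdict.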

Mental model

Every secret has a half-life. A key in your vault, scoped tight, rotated weekly, and monitored = long half-life. A key in git, on Docker Hub, in a public gist, or in a mobile binary = zero half-life, already compromised, must be rotated now.

Detection is not remediation. 64% of secrets leaked in 2022 are still valid in 2026. The rotation you don’t do is the breach you will have.

If a scanner finds it, an attacker already found it.


Compiled from 30 research articles under ~/Documents/obsidian/chs/raw/Secrets/ covering GitGuardian State of Secrets Sprawl 2025 & 2026, OWASP Secrets Management Cheat Sheet, TruffleHog & Gitleaks documentation, Wiz Forbes AI 50 research, Truffle Security’s “oops commits” analysis, the CERT-EU Trivy/European Commission breach advisory, Shai-Hulud 2 and LiteLLM supply chain post-mortems, AI-era hardcoded-secrets research, and practitioner commentary on secret scanning tooling and vaults. Defensive security reference material.