Bug Bounty Reconnaissance Cheat Sheet



You should spend as much time as possible on the target discovery phase.

Main goal: identify all subdomains/IPs

Then, for each subdomain:

  • Detect open ports
  • Identify technologies and services
  • Discover files, endpoints, and parameters
  • Understand how the application works


Context

  • Read the program policy
  • What is the scope of the bug bounty program?
  • What does the company do?
  • In which country is the company based?
  • Have there been previous breaches or data leaks?
  • Review previously reported vulnerabilities, blog posts, and patch announcements
  • Social media → employees and company-related information
  • Create an account to gain access to all available features

OSINT/Sensitive Information

  • Source code on GitHub?
    → Use grep.app and search dorks: filenames, extensions, keywords, etc.
  • Code on paste or code-sharing platforms?
    pastebin.com, codebeautify.org, gist.github.com, paste.ee, controlc.com, justpaste.it, ideone.com
  • Dig through Wikipedia
    → Edit history, List of acquisitions, company evolution
  • Use multiple search engines
    → Google, Baidu, Bing, Yandex, etc.
  • Known leaks?
    → Grayhat Warfare (cloud storage), LeakIX, checkleaked.cc, leaked.domains, etc.
  • Dorks to find sensitive files
    → Documents, source code, backups, configuration files, etc.
  • Dorks targeting cloud resources
    → Public AWS assets, misconfigured cloud instances, etc.
  • Extract metadata from all discovered files
    → Can reveal operating systems (e.g. macOS tools), author names, internal paths, etc.
    → Tools: exiftool, online-metadata.com
  • Dorks for interesting keywords
    → In URLs, page titles, and content
    → Examples: "panel", "admin", "console", "log in", "index of", etc.

Resources:

# LEAKS
https://buckets.grayhatwarfare.com/files?keywords=<domain>
https://leakix.net/search?scope=leak&q=host:<domain>
https://checkleaked.cc/breaches
https://leaked.domains/auth/Universal_Search/

# DORKS
site:gist.github.com "DOMAIN"
site:pastebin.com "DOMAIN"
site:codebeautify.org "DOMAIN"
site:paste.ee "DOMAIN"
site:controlc.com "DOMAIN"
site:justpaste.it "DOMAIN"
site:ideone.com "DOMAIN"

# SEARCH ENGINES
https://www.google.com/ ("", site, intext, filetype, ext, (all)intitle, (all)inurl, cache, info, OR, AND, NOT)
https://www.bing.com/ ("", ext, site, filetype, inbody, intitle, contains, AND, OR, NOT)
https://yahoo.com/ ("", site, hostname, filetype, intitle, inurl, OR)
https://yandex.com/
https://www.shodan.io/
https://www.zoomeye.ai/

# URL/TITLE/CLOUD
site:[target] inurl:foo
site:[target] intitle:foo
site:s3.amazonaws.com "[target]"
site:storage.googleapis.com "[target]"
site:blob.core.windows.net "[target]"
site:digitaloceanspaces.com "[target]"
site:wasabisys.com "[target]"
site:backblazeb2.com "[target]"
site:cloud-object-storage.appdomain.cloud "[target]"
site:aliyuncs.com "[target]"
site:oraclecloud.com "[target]"

# EXPOSED FILES
site:[target] ext:[extension] OR ext:[extension2] etc.
- MS365: doc, docx, docm, dotx, dotm, xls, xlsx, xlsm, xlsb, xltx, xltm, ppt, pptx, pptm, ppsx, ppsm, potx, potm, accdb, mdb, pub, vsd, vsdx, pst, ost, one
- LIBREOFFICE: odt, ott, odm, fodt, ods, ots, fods, odp, otp, fodp, odg, otg, odb, odf, odc, oxt
- GENERIC DOCUMENTS: pdf, rtf, md, markdown, rst, asciidoc, adoc, tex, cls, sty, bib, bst, epub, mobi, lit, azw, djvu
- CODE: php, php3, php4, php5, phtml, py, pyw, js, mjs, cjs, ts, tsx, jsx, go, java, kt, kts, scala, groovy, rb, erb, rhtml, do, jsp, jspf, asp, aspx, ascx, ashx, cshtml, vbhtml, cfml, cfm, cfc, pl, pm, lua, swift, rs, dart, ex, exs
- BACKUP FILES: bak, backup, bkp, old, orig, save, sav, dump, snapshot, archive, arc, tar, tgz, tbz, gz, bz2, xz, zip, 7z, rar, zst, log, logs, trace, journal, tmp, temp, swp, lock, cache, pid, dev, test, staging, prod, disabled, off, example, sample, dist, copy, copy1, copy2, prev, previous, tilde, til, tildebackup, autosave, recovery, crash, core, ~
- COMPRESSED FILES: cab, lzh, arj, z, cpio, rpm, deb, dmg, iso, ova, vmdk, vdi, vhd, vhdx, qcow2, box, tar.gz, tar.bz2, tar.xz
- CONF FILES: conf, cfg, config, cnf, ini, env, properties, prop, prefs, settings, options, json, jsonc, json5, yaml, yml, xml, toml, hcl, tf, tfvars, cue, ron, edn, babelrc, eslintrc, prettierrc, npmrc, yarnrc, pnpmfile, browserslistrc, webpack, vite, rollup, parcel
- OTHER CONF FILES: template, tpl, j2, mustache, reg, inf, admx, adml, policy, gitlab-ci, github, circleci, drone, jenkinsfile, travis, azure-pipelines, pipeline, htaccess, htpasswd, vhost, confd, zone, named, dns, nginx, apache, httpd
- KEYS: pem, crt, key, csr, keystore, truststore, vault, sops
- DATABASES: sql, sqlite, sqlite3, db, db3, rdb, sdb, kdb, isam, myd, myi, frm, bson, leveldb, rocksdb, couch, cdb, tdb, fdb, lmdb, csv, tsv, psv, dsv, dat, data, txt, flat, tab, parquet, avro, orc, feather, arrow, hdf, hdf5, h5, netcdf, nc, mat, dta, por, shp, shx, dbf, gpkg, kml, kmz, msgpack, protobuf, proto, thrift, capnproto, graphml, gexf, gml, logdb
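The extension lists above can be stitched into a single ready-to-paste dork with a short loop; a minimal sketch (example.com and the extension subset are illustrative placeholders):

```shell
# Build an "exposed files" dork from a list of extensions.
# example.com and the extension subset below are placeholders.
exts="bak old sql env conf"
dork="site:example.com"
for e in $exts; do
  dork="$dork ext:$e OR"
done
echo "${dork% OR}"
# -> site:example.com ext:bak OR ext:old OR ext:sql OR ext:env OR ext:conf
```

Swap in any subset of the lists above; very long extension lists may need to be split across several queries.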

Subdomains

Passive Discovery

  • crt.sh to find subdomains via Certificate Transparency logs; certificates issued at the same time can also reveal related domains
  • Dork: site:target.com -www
  • Dork: intext:"Copyright DOMAIN..."
    → Also use old copyright strings via the Wayback Machine
  • Subdomains exposed in GitHub code
    → Example: https://github.com/gwen001/github-search/blob/master/github-subdomains.py
  • Subdomains found on paste and data-sharing platforms
    → Pastebin, etc. (see OSINT section)
  • CORS / CSP policies may leak subdomains
  • SSL certificates reveal subdomains in the CN and SAN fields
  • DNS techniques
    → Reverse DNS queries (PTR records), reverse IP lookups (mxtoolbox)
  • ASN lookups and IP ranges
  • Favicon hashing on Shodan to identify related assets
    favihash.com + Shodan dork: http.favicon.hash:<hash>
  • Passive tools
    subfinder (API keys required), amass, online services

Active Discovery

  • Create or select wordlists based on the company’s country or subdomain naming conventions
    → Reference: wordlists.assetnote.io
  • Generate custom wordlists from page content using cewl
  • Active tools
    gobuster, knockpy (with cewl wordlists or wordlists.assetnote.io)
  • Virtual Host discovery using ffuf
  • Use altdns to generate subdomain permutations
    → Based on cewl-generated wordlists or wordlists.assetnote.io
  • Systematically extract subdomains from discovered JavaScript and CSS files

Resources:

# WEBSITES
https://crt.sh/json?q=DOMAIN
https://subdomainfinder.c99.nl/
https://osint.sh/subdomain
https://dnsdumpster.com/
https://www.virustotal.com/gui/domain/DOMAIN/relations
https://securitytrails.com/DOMAIN
https://www.zoomeye.ai/
http://toolbar.netcraft.com/site_report?url=DOMAIN

# ASN
https://bgp.he.net/ (ASN searches)
https://asnlookup.com/ (ASN --> IP)

# DORKS
site:*.domain.com -sub1.domain.com -sub2.domain.com
site:*.*.domain.com -www.sub1.domain.com
site:*-*.domain.com
-site:domain.com intext:"© Copyright DOMAIN.COM 2022 [...]"
-site:domain.com intext:"© Copyright DOMAIN.COM 2021 [...]"
site:zone-h.org domain.com

# PASSIVE TOOLS
$ amass enum -passive -d DOMAIN
$ subfinder -all -o output -d DOMAIN
$ assetfinder --subs-only DOMAIN

# ACTIVE TOOLS
$ cewl http://example.com -d 2 -m 5 -w wordlist.txt
$ knockpy -w wordlist.txt DOMAIN
$ gobuster dns -q -d DOMAIN -w wordlist.txt
$ ffuf -w wordlist.txt -u "https://FUZZ.domain.com" -mc all
$ ffuf -w wordlist.txt -H 'Host: FUZZ.domain.com' -mc all -fs <size_if_failure> -u https://IP/
$ altdns -i known_subdomains.txt -w words.txt -o /tmp/permutations_results -r -s output.txt 

# SCREENSHOTS
$ eyewitness -f domains.txt
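Results from several of these tools are best merged and normalized before probing anything. A minimal sketch (the inputs are inlined here for illustration, and example.com stands in for the real scope):

```shell
# Merge subdomain lists from several tools, lowercase them, dedupe,
# and keep only in-scope hosts. Inputs are inlined for illustration;
# in practice, cat the output files of amass/subfinder/assetfinder.
printf 'API.example.com\nwww.example.com\nwww.example.com\ncdn.other.com\n' > /tmp/subs_raw.txt
tr 'A-Z' 'a-z' < /tmp/subs_raw.txt | sort -u | grep -E '\.example\.com$' > /tmp/subs_clean.txt
cat /tmp/subs_clean.txt
```

The grep keeps only hosts ending in the in-scope apex, which matters when tool output mixes in third-party domains.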

Open ports

  • Identify active vs inactive subdomains
  • Which TCP/UDP ports are open?
  • Test common web ports: 80, 443, 8000, 8080, etc.
$ for domain in $(cat domains.txt); do nmap -sS -sV -O --top-ports 1000 "$domain"; done
$ for domain in $(cat domains.txt); do nmap -sU -sV -O --top-ports 1000 "$domain"; done
$ ffuf -w ports.txt -u 'http://DOMAIN:FUZZ' -r

$ cat domains.txt | httpx -silent -t 1 -title -method -status-code -follow-redirects -x all -p 80,81,443,445,1080,4000,4443,8000,8080,8443,10000 -fc 404,405
$ cat domains.txt | httpx -silent -t 1 -probe -ip -cdn -fc 404
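When nmap is too noisy or unavailable, a quick TCP connect check can be sketched with bash's /dev/tcp pseudo-device. This is a rough fallback, not a replacement for a real scanner; the host and port below are placeholders:

```shell
# Quick TCP connect check via bash's /dev/tcp pseudo-device.
# Rough fallback only; host and port are placeholders.
check_port() {
  if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "open: $1:$2"
  else
    echo "closed: $1:$2"
  fi
}
check_port 127.0.0.1 1   # port 1/tcp is almost certainly closed
```

Loop it over a port list per host when you only need a handful of common web ports.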

For each active subdomain:port, we need to identify:

  • The technology
  • The mapping
  • The logic

Technologies and Services

  • WAF?
    wafw00f, whatwaf
  • CDN?
    whois → does the IP belong to a CDN?
  • Identify the technology stack used by the subdomain
  • Interesting HTTP headers
    Server, X-Powered-By, etc.
  • Use OWASP ZAP as a proxy for automated detection
  • Use technology fingerprinting tools
    wappalyzer, webanalyze, whatweb
  • Use SquareX to check whether results change based on:
    • Browser
    • Geographic location
  • Test different User-Agents
    → Mobile UA, different OS / browser combinations to observe page behavior
https://urlscan.io/
https://hackertarget.com/whatweb-scan/
https://www.wappalyzer.com/lookup/DOMAIN/

$ webanalyze -host domain -crawl 1
$ whatweb https://domain.com/

Endpoint Mapping

Notes:

  • Some endpoints may only be visible when authenticated
  • Some endpoints are only accessible using specific HTTP methods
    → Typically POST, PUT, DELETE, etc. (especially for APIs)

Passive Discovery

  • Passive tools
    gau, waymore, waybackurls, etc.
  • Look for URL patterns
    → Do the URLs returned by waybackurls follow identifiable patterns?
  • Dorks
    cache: + file types + specific pages
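To answer the "identifiable patterns" question, numeric path segments can be collapsed so repeated URL shapes surface. A sketch with inlined sample URLs (in practice, pipe waybackurls/gau output in):

```shell
# Collapse numeric segments so repeated URL patterns stand out.
# Sample URLs are inlined; pipe waybackurls/gau output in practice.
printf 'https://x.com/api/v1/users/17\nhttps://x.com/api/v1/users/42\nhttps://x.com/blog/post-7\n' \
  | sed -E 's/[0-9]+/{id}/g' | sort | uniq -c | sort -rn > /tmp/url_patterns.txt
cat /tmp/url_patterns.txt
```

High-count patterns like /api/v{id}/users/{id} are good candidates for IDOR and parameter testing later.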

Active Discovery

  • Custom wordlists
    → Generated with cewl (multiple languages)
    → Generic wordlists: http://wordlists.assetnote.io/
  • Look for metadata and discovery files
    /*.txt, /.well-known/*, sitemaps, API documentation endpoints (swagger.json, etc.)
  • Bruteforce endpoints
    → Using meg with different HTTP methods
  • Recursively extract all visible URLs from known pages
    pyWebCrawler.py
  • Download all .css files
  • Download all .js files
    subjs
  • Bruteforce JavaScript file paths
    FUZZ.js, FUZZ.compiled.js, FUZZ.min.js, FUZZ.js.map, etc.
  • Same approach for CSS files
    → This is not a waste of time
  • For discovered files, test alternative extensions
    .old, .src, ~, .dev, .backup, etc.
    → (-e option in ffuf)
  • Also test common copy patterns
    "Copy of [file]", "[file] - Copy.[extension]", etc.
    → Use the developers’ native language for "Copy of"
  • Parameter discovery
    arjun, x8
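The alternative-extension and copy-pattern ideas above can be expanded into a candidate list to feed ffuf. A sketch (the base filename and suffix list are illustrative placeholders):

```shell
# Generate backup/copy candidates for a discovered file.
# The filename and suffix list are illustrative placeholders.
f="config.php"
{
  for suf in bak old orig save backup dev; do echo "$f.$suf"; done
  echo "$f~"
  echo "Copy of $f"
  echo "${f%.*}_old.${f##*.}"
} > /tmp/mutations.txt
cat /tmp/mutations.txt
```

Run the same loop over every discovered file, then request the candidates with ffuf or meg.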
# WEBSITES
https://web.archive.org/
https://cachedview.com/
https://oldweb.today/
https://archive.is/
https://timetravel.mementoweb.org/
https://commoncrawl.org/

# DORKS
cache:domain.com
site:*.domain.com filetype:do OR filetype:txt OR filetype:csv OR filetype:xlsx [...]
site:*.domain.com "cheatsheet"
site:*.domain.com "@domain.com"
site:*.domain.com inurl:admin
site:*.domain.com intitle:"Index of"
site:*.domain.com intext:"username" filetype:log

# PASSIVE
$ gospider -s "https://DOMAIN/" -o output --other-source
$ waybackurls <domain> | sort -u
$ gau <domain> | sort -u
$ waymore <domain> | sort -u
$ cat domains.txt | gau

# ACTIVE
$ python3 linkfinder.py -i <url>
$ katana -u https://DOMAIN/
$ meg hosts.txt wordlist.txt -s 200,204,301,302,303,401,403,405,429,500,501,502,503,504 -X GET -c 5
$ ffuf -mc all -fc 404 -r -w <wordlist> -u 'https://domain.com/FUZZ' -recursion -recursion-depth 2
$ ffuf -u https://api.example.com/PATH -X METHOD -w /path/to/wordlist:PATH -w /path/to/http_methods:METHOD
$ ffuf -u "https://domain.com/FUZZ" -w <wordlist> -e $(cat web-mutations.txt)

$ x8 -u "https://domain.com/" -w <wordlist>
$ arjun -u https://domain.com/api/v1/endpoint -m POST

$ cat hosts.txt | gau | sort -u | subjs

Inspection

  • Keep only active endpoints
    httpx
  • Filter URLs using regex patterns
    → e.g. [a-zA-Z0-9+=_-]{6,} to identify tokens or secrets
  • Analyze all CSS files
    → Comments, hidden or interesting data
  • Analyze all JavaScript files
    jsa.py to extract useful information (endpoints, variables, etc.)
  • Are the JavaScript files minified?
    → Can they be read and analyzed?
    → Look for parameters, endpoints, API keys, secrets, etc.
  • Check older versions on the Wayback Machine
    → Old JS and CSS files, as well as previously exposed endpoints
  • Identify every endpoint used by the application
$ cat hosts.txt | gau | subjs | python3 jsa.py
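For the regex-filtering step, the token pattern above can be applied to query-string values only, which cuts noise considerably. A sketch with inlined sample URLs (the parameter-name list is illustrative):

```shell
# Apply the token regex from above to query values only, to cut noise.
# Sample URLs are inlined and the parameter names are illustrative;
# in practice, pipe your collected URL list in.
tokens=$(printf 'https://a.example.com/reset?token=3Jf9kQ2vXx81LmPq\nhttps://a.example.com/about\n' \
  | grep -oE '(token|key|secret|auth)=[a-zA-Z0-9+=_-]{6,}')
echo "$tokens"
```

Raising the minimum length (e.g. {16,}) further reduces false positives from ordinary words.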

After the discovery phase, the next step is to understand how the application works.


Logic

  • What does each endpoint / API function do?
  • What are the parameters for each endpoint?
  • What are the server responses for each endpoint?
  • Is access control enforced?
    → Through which parameter or mechanism?
  • What is the HTTP configuration of the endpoint?
  • How is this endpoint related to others?
  • Does the endpoint interact with other subdomains?
  • Is the endpoint linked to other features?
  • How is authentication handled?
    → Cookie? JWT? Session token? SSO?
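When authentication uses a JWT, decoding the payload (no signature check, just base64url) shows what the session actually carries. A sketch with a hardcoded illustrative token:

```shell
# Decode a JWT payload (base64url, no signature check) to inspect claims.
# The token below is a hardcoded illustrative example; padding ("==" here)
# depends on the payload length.
jwt='eyJhbGciOiJIUzI1NiJ9.eyJ1c2VySWQiOjQyfQ.fakesig'
payload=$(printf '%s' "$jwt" | cut -d. -f2)
decoded=$(printf '%s==' "$payload" | tr '_-' '/+' | base64 -d)
echo "$decoded"
```

Claims like userId, role, or scope hint at which parameters drive access control in the next phase.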

Create a mind map to make things clearer, or keep simple notes that answer these questions, e.g.:

PUT /api/account - JSON - Updates account data
-> params: userId, email, username, password
-> auth: JWT session cookie

POST /api/account - JSON - Deletes an account
-> params: userId, captcha, disable_account
-> auth: JWT session cookie

This provides a clear overview of each endpoint and its parameters, and helps generate ideas for the exploitation phase.


Do not exploit or inject anything before properly mapping the subdomain.

Without a clear understanding of the application logic, exploitation will be a waste of time and will have a very low success rate, as we’ll be attacking blindly.

All this recon information should be organized per subdomain, using tools like:

  • Notion
  • Obsidian
  • Sublime Text
  • Any structured note-taking system

It takes a lot of time.


Disclaimer

All content published on this website is for educational purposes only.

The techniques, tools, and methodologies described here are intended to be used only on systems you own or have explicit permission to test.

I do not encourage or take responsibility for any illegal use of the information provided.
