Bug Bounty Reconnaissance Cheat Sheet



You should spend as much time as possible on the target discovery phase.

Main goal: identify all subdomains/IPs

Then, for each subdomain:

  • Detect open ports
  • Identify technologies and services
  • Discover files, endpoints, and parameters
  • Understand how the application works


Context

  • Read the program policy
  • What is the scope of the bug bounty program?
  • What does the company do?
  • In which country is the company based?
  • Have there been previous breaches or data leaks?
  • Review previously reported vulnerabilities, blog posts, and patch announcements
  • Social media → employees and company-related information
  • Create an account to gain access to all available features

OSINT/Sensitive Information

  • Source code on GitHub?
    → Use grep.app and search dorks: filenames, extensions, keywords, etc.
  • Code on paste or code-sharing platforms?
    pastebin.com, codebeautify.org, gist.github.com, paste.ee, controlc.com, justpaste.it, ideone.com
  • Dig through Wikipedia
    → Edit history, List of acquisitions, company evolution
  • Use multiple search engines
    → Google, Baidu, Bing, Yandex, etc.
  • Known leaks?
    → Grayhat Warfare (cloud storage), LeakIX, checkleaked.cc, leaked.domains, etc.
  • Dorks to find sensitive files
    → Documents, source code, backups, configuration files, etc.
  • Dorks targeting cloud resources
    → Public AWS assets, misconfigured cloud instances, etc.
  • Extract metadata from all discovered files
    → Can reveal operating systems (e.g. macOS tools), author names, internal paths, etc.
    → Tools: exiftool, online-metadata.com
  • Dorks for interesting keywords
    → In URLs, page titles, and content
    → Examples: "panel", "admin", "console", "log in", "index of", etc.

Resources:

# LEAKS
https://buckets.grayhatwarfare.com/files?keywords=<domain>
https://leakix.net/search?scope=leak&q=host:<domain>
https://checkleaked.cc/breaches
https://leaked.domains/auth/Universal_Search/

# DORKS
site:gist.github.com "DOMAIN"
site:pastebin.com "DOMAIN"
site:codebeautify.org "DOMAIN"
site:paste.ee "DOMAIN"
site:controlc.com "DOMAIN"
site:justpaste.it "DOMAIN"
site:ideone.com "DOMAIN"

# SEARCH ENGINES
https://www.google.com/ ("", site, intext, filetype, ext, (all)intitle, (all)inurl, cache, info, OR, AND, NOT)
https://www.bing.com/ ("", ext, site, filetype, inbody, intitle, contains, AND, OR, NOT)
https://yahoo.com/ ("", site, hostname, filetype, intitle, inurl, OR)
https://yandex.com/
https://www.shodan.io/
https://www.zoomeye.ai/

# URL/TITLE/CLOUD
site:[target] inurl:foo
site:[target] intitle:foo
site:s3.amazonaws.com "[target]"
site:storage.googleapis.com "[target]"
site:blob.core.windows.net "[target]"
site:digitaloceanspaces.com "[target]"
site:wasabisys.com "[target]"
site:backblazeb2.com "[target]"
site:cloud-object-storage.appdomain.cloud "[target]"
site:aliyuncs.com "[target]"
site:oraclecloud.com "[target]"

# EXPOSED FILES
site:[target] ext:[extension] OR ext:[extension2] etc.
- MS365: doc, docx, docm, dotx, dotm, xls, xlsx, xlsm, xlsb, xltx, xltm, ppt, pptx, pptm, ppsx, ppsm, potx, potm, accdb, mdb, pub, vsd, vsdx, pst, ost, one
- LIBREOFFICE: odt, ott, odm, fodt, ods, ots, fods, odp, otp, fodp, odg, otg, odb, odf, odc, oxt
- GENERIC DOCUMENTS: pdf, rtf, md, markdown, rst, asciidoc, adoc, tex, cls, sty, bib, bst, epub, mobi, lit, azw, djvu
- CODE: php, php3, php4, php5, phtml, py, pyw, js, mjs, cjs, ts, tsx, jsx, go, java, kt, kts, scala, groovy, rb, erb, rhtml, do, jsp, jspf, asp, aspx, ascx, ashx, cshtml, vbhtml, cfml, cfm, cfc, pl, pm, lua, swift, rs, dart, ex, exs
- BACKUP FILES: bak, backup, bkp, old, orig, save, sav, dump, snapshot, archive, arc, tar, tgz, tbz, gz, bz2, xz, zip, 7z, rar, zst, log, logs, trace, journal, tmp, temp, swp, lock, cache, pid, dev, test, staging, prod, disabled, off, example, sample, dist, copy, copy1, copy2, prev, previous, tilde, til, tildebackup, autosave, recovery, crash, core, ~
- COMPRESSED FILES: cab, lzh, arj, z, cpio, rpm, deb, dmg, iso, ova, vmdk, vdi, vhd, vhdx, qcow2, box, tar.gz, tar.bz2, tar.xz
- CONF FILES: conf, cfg, config, cnf, ini, env, properties, prop, prefs, settings, options, json, jsonc, json5, yaml, yml, xml, toml, hcl, tf, tfvars, cue, ron, edn, babelrc, eslintrc, prettierrc, npmrc, yarnrc, pnpmfile, browserslistrc, webpack, vite, rollup, parcel
- OTHER CONF FILES: template, tpl, j2, mustache, reg, inf, admx, adml, policy, gitlab-ci, github, circleci, drone, jenkinsfile, travis, azure-pipelines, pipeline, htaccess, htpasswd, vhost, confd, zone, named, dns, nginx, apache, httpd
- KEYS: pem, crt, key, csr, keystore, truststore, vault, sops
- DATABASES: sql, sqlite, sqlite3, db, db3, rdb, sdb, kdb, isam, myd, myi, frm, bson, leveldb, rocksdb, couch, cdb, tdb, fdb, lmdb, csv, tsv, psv, dsv, dat, data, txt, flat, tab, parquet, avro, orc, feather, arrow, hdf, hdf5, h5, netcdf, nc, mat, dta, por, shp, shx, dbf, gpkg, kml, kmz, msgpack, protobuf, proto, thrift, capnproto, graphml, gexf, gml, logdb
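The extension lists above can be stitched into a single ready-to-paste dork with a short loop; a minimal sketch (example.com and the extension subset are illustrative placeholders):

```shell
# Build an "exposed files" dork from a list of extensions.
# example.com and the extension subset below are placeholders.
exts="bak old sql env conf"
dork="site:example.com"
for e in $exts; do
  dork="$dork ext:$e OR"
done
echo "${dork% OR}"
# -> site:example.com ext:bak OR ext:old OR ext:sql OR ext:env OR ext:conf
```

Swap in any subset of the lists above; very long extension lists may need to be split across several queries.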

Subdomains

Passive Discovery

  • crt.sh to find subdomains via Certificate Transparency logs; certificates issued at the same time can also reveal related domains
  • Dork: site:target.com -www
  • Dork: intext:"Copyright DOMAIN..."
    → Also use old copyright strings via the Wayback Machine
  • Subdomains exposed in GitHub code
    → Example: https://github.com/gwen001/github-search/blob/master/github-subdomains.py
  • Subdomains found on paste and data-sharing platforms
    → Pastebin, etc. (see OSINT section)
  • CORS / CSP policies may leak subdomains
  • SSL certificates reveal subdomains in the CN and SAN fields
  • DNS techniques
    → Reverse DNS queries (PTR records), reverse IP lookups (mxtoolbox)
  • ASN lookups and IP ranges
  • Favicon hashing on Shodan to identify related assets
    favihash.com + Shodan dork: http.favicon.hash:<hash>
  • Passive tools
    subfinder (API keys required), amass, online services

Active Discovery

  • Create or select wordlists based on the company’s country or subdomain naming conventions
    → Reference: wordlists.assetnote.io
  • Generate custom wordlists from page content using cewl
  • Active tools
    gobuster, knockpy (with cewl wordlists or wordlists.assetnote.io)
  • Virtual Host discovery using ffuf
  • Use altdns to generate subdomain permutations
    → Based on cewl-generated wordlists or wordlists.assetnote.io
  • Systematically extract subdomains from discovered JavaScript and CSS files

Resources:

# WEBSITES
https://crt.sh/json?q=DOMAIN
https://subdomainfinder.c99.nl/
https://osint.sh/subdomain
https://dnsdumpster.com/
https://www.virustotal.com/gui/domain/DOMAIN/relations
https://securitytrails.com/DOMAIN
https://www.zoomeye.ai/
http://toolbar.netcraft.com/site_report?url=DOMAIN

# ASN
https://bgp.he.net/ (ASN searches)
https://asnlookup.com/ (ASN --> IP)

# DORKS
site:*.domain.com -sub1.domain.com -sub2.domain.com
site:*.*.domain.com -www.sub1.domain.com
site:*-*.domain.com
-site:domain.com intext:"© Copyright DOMAIN.COM 2022 [...]"
-site:domain.com intext:"© Copyright DOMAIN.COM 2021 [...]"
site:zone-h.org domain.com

# PASSIVE TOOLS
$ amass enum -passive -d DOMAIN
$ subfinder -all -o output -d DOMAIN
$ assetfinder --subs-only DOMAIN

# ACTIVE TOOLS
$ cewl http://example.com -d 2 -m 5 -w wordlist.txt
$ knockpy -w wordlist.txt DOMAIN
$ gobuster dns -q -d DOMAIN -w wordlist.txt
$ ffuf -w wordlist.txt -u "https://FUZZ.domain.com" -mc all
$ ffuf -w wordlist.txt -H 'Host: FUZZ.domain.com' -mc all -fs <size_if_failure> -u https://IP/
$ altdns -i known_subdomains.txt -w words.txt -o /tmp/permutations_results -r -s output.txt 

# SCREENSHOTS
$ eyewitness -f domains.txt
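Results from several of these tools are best merged and normalized before probing anything. A minimal sketch (the inputs are inlined here for illustration, and example.com stands in for the real scope):

```shell
# Merge subdomain lists from several tools, lowercase them, dedupe,
# and keep only in-scope hosts. Inputs are inlined for illustration;
# in practice, cat the output files of amass/subfinder/assetfinder.
printf 'API.example.com\nwww.example.com\nwww.example.com\ncdn.other.com\n' > /tmp/subs_raw.txt
tr 'A-Z' 'a-z' < /tmp/subs_raw.txt | sort -u | grep -E '\.example\.com$' > /tmp/subs_clean.txt
cat /tmp/subs_clean.txt
```

The grep keeps only hosts ending in the in-scope apex, which matters when tool output mixes in third-party domains.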

Open ports

  • Identify active vs inactive subdomains
  • Which TCP/UDP ports are open?
  • Test common web ports: 80, 443, 8000, 8080, etc.
$ for domain in $(cat domains.txt); do nmap -sS -sV -O --top-ports 1000 "$domain"; done
$ for domain in $(cat domains.txt); do nmap -sU -sV -O --top-ports 1000 "$domain"; done
$ ffuf -w ports.txt -u 'http://DOMAIN:FUZZ' -r

$ cat domains.txt | httpx -silent -t 1 -title -method -status-code -follow-redirects -x all -p 80,81,443,445,1080,4000,4443,8000,8080,8443,10000 -fc 404,405
$ cat domains.txt | httpx -silent -t 1 -probe -ip -cdn -fc 404
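When nmap is too noisy or unavailable, a quick TCP connect check can be sketched with bash's /dev/tcp pseudo-device. This is a rough fallback, not a replacement for a real scanner; the host and port below are placeholders:

```shell
# Quick TCP connect check via bash's /dev/tcp pseudo-device.
# Rough fallback only; host and port are placeholders.
check_port() {
  if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "open: $1:$2"
  else
    echo "closed: $1:$2"
  fi
}
check_port 127.0.0.1 1   # port 1/tcp is almost certainly closed
```

Loop it over a port list per host when you only need a handful of common web ports.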

For each active subdomain:port, we need to identify:

  • The technology
  • The mapping
  • The logic

Technologies and Services

  • WAF?
    wafw00f, whatwaf
  • CDN?
    whois → does the IP belong to a CDN?
  • Identify the technology stack used by the subdomain
  • Interesting HTTP headers
    Server, X-Powered-By, etc.
  • Use OWASP ZAP as a proxy for automated detection
  • Use technology fingerprinting tools
    wappalyzer, webanalyze, whatweb
  • Use SquareX to check whether results change based on:
    • Browser
    • Geographic location
  • Test different User-Agents
    → Mobile UA, different OS / browser combinations to observe page behavior
https://urlscan.io/
https://hackertarget.com/whatweb-scan/
https://www.wappalyzer.com/lookup/DOMAIN/

$ webanalyze -host domain -crawl 1
$ whatweb https://domain.com/

Endpoint Mapping

Notes:

  • Some endpoints may only be visible when authenticated
  • Some endpoints are only accessible using specific HTTP methods
    → Typically POST, PUT, DELETE, etc. (especially for APIs)

Passive Discovery

  • Passive tools
    gau, waymore, waybackurls, etc.
  • Look for URL patterns
    → Do the URLs returned by waybackurls follow identifiable patterns?
  • Dorks
    cache: + file types + specific pages
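To answer the "identifiable patterns" question, numeric path segments can be collapsed so repeated URL shapes surface. A sketch with inlined sample URLs (in practice, pipe waybackurls/gau output in):

```shell
# Collapse numeric segments so repeated URL patterns stand out.
# Sample URLs are inlined; pipe waybackurls/gau output in practice.
printf 'https://x.com/api/v1/users/17\nhttps://x.com/api/v1/users/42\nhttps://x.com/blog/post-7\n' \
  | sed -E 's/[0-9]+/{id}/g' | sort | uniq -c | sort -rn > /tmp/url_patterns.txt
cat /tmp/url_patterns.txt
```

High-count patterns like /api/v{id}/users/{id} are good candidates for IDOR and parameter testing later.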

Active Discovery

  • Custom wordlists
    → Generated with cewl (multiple languages)
    → Generic wordlists: http://wordlists.assetnote.io/
  • Look for metadata and discovery files
    /*.txt, /.well-known/*, sitemaps, API documentation endpoints (swagger.json, etc.)
  • Bruteforce endpoints
    → Using meg with different HTTP methods
  • Recursively extract all visible URLs from known pages
    pyWebCrawler.py
  • Download all .css files
  • Download all .js files
    subjs
  • Bruteforce JavaScript file paths
    FUZZ.js, FUZZ.compiled.js, FUZZ.min.js, FUZZ.js.map, etc.
  • Same approach for CSS files
    → This is not a waste of time
  • For discovered files, test alternative extensions
    .old, .src, ~, .dev, .backup, etc.
    → (-e option in ffuf)
  • Also test common copy patterns
    "Copy of [file]", "[file] - Copy.[extension]", etc.
    → Use the developers’ native language for "Copy of"
  • Parameter discovery
    arjun, x8
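The alternative-extension and copy-pattern ideas above can be expanded into a candidate list to feed ffuf. A sketch (the base filename and suffix list are illustrative placeholders):

```shell
# Generate backup/copy candidates for a discovered file.
# The filename and suffix list are illustrative placeholders.
f="config.php"
{
  for suf in bak old orig save backup dev; do echo "$f.$suf"; done
  echo "$f~"
  echo "Copy of $f"
  echo "${f%.*}_old.${f##*.}"
} > /tmp/mutations.txt
cat /tmp/mutations.txt
```

Run the same loop over every discovered file, then request the candidates with ffuf or meg.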
# WEBSITES
https://web.archive.org/
https://cachedview.com/
https://oldweb.today/
https://archive.is/
https://timetravel.mementoweb.org/
https://commoncrawl.org/

# DORKS
cache:domain.com
site:*.domain.com filetype:do OR filetype:txt OR filetype:csv OR filetype:xlsx [...]
site:*.domain.com "cheatsheet"
site:*.domain.com "@domain.com"
site:*.domain.com inurl:admin
site:*.domain.com intitle:"Index of"
site:*.domain.com intext:"username" filetype:log

# PASSIVE
$ gospider -s "https://DOMAIN/" -o output --other-source
$ waybackurls <domain> | sort -u
$ gau <domain> | sort -u
$ waymore <domain> | sort -u
$ cat domains.txt | gau

# ACTIVE
$ python3 linkfinder.py -i <url>
$ katana -u https://DOMAIN/
$ meg hosts.txt wordlist.txt -s 200,204,301,302,303,401,403,405,429,500,501,502,503,504 -X GET -c 5
$ ffuf -mc all -fc 404 -r -w <wordlist> -u 'https://domain.com/FUZZ' -recursion -recursion-depth 2
$ ffuf -u https://api.example.com/PATH -X METHOD -w /path/to/wordlist:PATH -w /path/to/http_methods:METHOD
$ ffuf -u "https://domain.com/FUZZ" -w <wordlist> -e $(cat web-mutations.txt)

$ x8 -u "https://domain.com/" -w <wordlist>
$ arjun -u https://domain.com/api/v1/endpoint -m POST

$ cat hosts.txt | gau | sort -u | subjs

Inspection

  • Keep only active endpoints
    httpx
  • Filter URLs using regex patterns
    → e.g. [a-zA-Z0-9+=_-]{6,} to identify tokens or secrets
  • Analyze all CSS files
    → Comments, hidden or interesting data
  • Analyze all JavaScript files
    jsa.py to extract useful information (endpoints, variables, etc.)
  • Are the JavaScript files minified?
    → Can they be read and analyzed?
    → Look for parameters, endpoints, API keys, secrets, etc.
  • Check older versions on the Wayback Machine
    → Old JS and CSS files, as well as previously exposed endpoints
  • Identify every endpoint used by the application
$ cat hosts.txt | gau | subjs | python3 jsa.py
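For the regex-filtering step, the token pattern above can be applied to query-string values only, which cuts noise considerably. A sketch with inlined sample URLs (the parameter-name list is illustrative):

```shell
# Apply the token regex from above to query values only, to cut noise.
# Sample URLs are inlined and the parameter names are illustrative;
# in practice, pipe your collected URL list in.
tokens=$(printf 'https://a.example.com/reset?token=3Jf9kQ2vXx81LmPq\nhttps://a.example.com/about\n' \
  | grep -oE '(token|key|secret|auth)=[a-zA-Z0-9+=_-]{6,}')
echo "$tokens"
```

Raising the minimum length (e.g. {16,}) further reduces false positives from ordinary words.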

After the discovery phase, the next step is to understand how the application works.


Logic

  • What does each endpoint / API function do?
  • What are the parameters for each endpoint?
  • What are the server responses for each endpoint?
  • Is access control enforced?
    → Through which parameter or mechanism?
  • What is the HTTP configuration of the endpoint?
  • How is this endpoint related to others?
  • Does the endpoint interact with other subdomains?
  • Is the endpoint linked to other features?
  • How is authentication handled?
    → Cookie? JWT? Session token? SSO?
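When authentication uses a JWT, decoding the payload (no signature check, just base64url) shows what the session actually carries. A sketch with a hardcoded illustrative token:

```shell
# Decode a JWT payload (base64url, no signature check) to inspect claims.
# The token below is a hardcoded illustrative example; padding ("==" here)
# depends on the payload length.
jwt='eyJhbGciOiJIUzI1NiJ9.eyJ1c2VySWQiOjQyfQ.fakesig'
payload=$(printf '%s' "$jwt" | cut -d. -f2)
decoded=$(printf '%s==' "$payload" | tr '_-' '/+' | base64 -d)
echo "$decoded"
```

Claims like userId, role, or scope hint at which parameters drive access control in the next phase.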

Create a mind map to make things clearer, or keep simple notes that answer these questions, e.g.:

PUT /api/account - JSON - Updates account data
-> params: userId, email, username, password
-> auth: JWT session cookie

POST /api/account - JSON - Deletes an account
-> params: userId, captcha, disable_account
-> auth: JWT session cookie

This provides a clear overview of each endpoint and its parameters, and helps generate ideas for the exploitation phase.


Do not exploit or inject anything before properly mapping the subdomain.

Without a clear understanding of the application logic, exploitation will be a waste of time and will have a very low success rate, as we’ll be attacking blindly.

All this recon information should be organized per subdomain, using tools like:

  • Notion
  • Obsidian
  • Sublime Text
  • Any structured note-taking system

It takes a lot of time.


Disclaimer

All content published on this website is for educational purposes only.

The techniques, tools, and methodologies described here are intended to be used only on systems you own or have explicit permission to test.

I do not encourage or take responsibility for any illegal use of the information provided.
