Bug Bounty Reconnaissance Cheat Sheet
You should spend as much time as possible on the target discovery phase.
Main goal: identify all subdomains/IPs
Then, for each subdomain:
- Detect open ports
- Identify technologies and services
- Discover files, endpoints, and parameters
- Understand how the application works
Context
- Read the program policy
- What is the scope of the bug bounty program?
- What does the company do?
- In which country is the company based?
- Have there been previous breaches or data leaks?
- Review previously reported vulnerabilities, blog posts, and patch announcements
- Social media → employees and company-related information
- Create an account to gain access to all available features
OSINT/Sensitive Information
- Source code on GitHub?
→ Use grep.app and search dorks: filenames, extensions, keywords, etc.
- Code on paste or code-sharing platforms?
→ pastebin.com, codebeautify.org, gist.github.com, paste.ee, controlc.com, justpaste.it, ideone.com
- Dig through Wikipedia
→ Edit history, List of acquisitions, company evolution
- Use multiple search engines
→ Google, Baidu, Bing, Yandex, etc.
- Known leaks?
→ Grayhat Warfare (cloud storage), LeakIX, checkleaked.cc, leaked.domains, etc.
- Dorks to find sensitive files
→ Documents, source code, backups, configuration files, etc.
- Dorks targeting cloud resources
→ Public AWS assets, misconfigured cloud instances, etc.
- Extract metadata from all discovered files
→ Can reveal operating systems (e.g. macOS tools), author names, internal paths, etc.
→ Tools: exiftool, online-metadata.com
- Dorks for interesting keywords
→ In URLs, page titles, and content
→ Examples: "panel", "admin", "console", "log in", "index of", etc.
Resources:
# LEAKS
https://buckets.grayhatwarfare.com/files?keywords=<domain>
https://leakix.net/search?scope=leak&q=host:<domain>
https://checkleaked.cc/breaches
https://leaked.domains/auth/Universal_Search/
# DORKS
site:gist.github.com "DOMAIN"
site:pastebin.com "DOMAIN"
site:codebeautify.org "DOMAIN"
site:paste.ee "DOMAIN"
site:controlc.com "DOMAIN"
site:justpaste.it "DOMAIN"
site:ideone.com "DOMAIN"
# SEARCH ENGINES
https://www.google.com/ ("", site, intext, filetype, ext, (all)intitle, (all)inurl, cache, info, OR, AND, NOT)
https://www.bing.com/ ("", ext, site, filetype, inbody, intitle, contains, AND, OR, NOT)
https://yahoo.com/ ("", site, hostname, filetype, intitle, inurl, OR)
https://yandex.com/
https://www.shodan.io/
https://www.zoomeye.ai/
# URL/TITLE/CLOUD
site:[target] inurl:foo
site:[target] intitle:foo
site:s3.amazonaws.com "[target]"
site:storage.googleapis.com "[target]"
site:blob.core.windows.net "[target]"
site:digitaloceanspaces.com "[target]"
site:wasabisys.com "[target]"
site:backblazeb2.com "[target]"
site:cloud-object-storage.appdomain.cloud "[target]"
site:aliyuncs.com "[target]"
site:oraclecloud.com "[target]"
# EXPOSED FILES
site:[target] ext:[extension] OR ext:[extension2] etc.
- MS365: doc, docx, docm, dotx, dotm, xls, xlsx, xlsm, xlsb, xltx, xltm, ppt, pptx, pptm, ppsx, ppsm, potx, potm, accdb, mdb, pub, vsd, vsdx, pst, ost, one
- LIBREOFFICE: odt, ott, odm, fodt, ods, ots, fods, odp, otp, fodp, odg, otg, odb, odf, odc, oxt
- GENERIC DOCUMENTS: pdf, rtf, md, markdown, rst, asciidoc, adoc, tex, cls, sty, bib, bst, epub, mobi, lit, azw, djvu
- CODE: php, php3, php4, php5, phtml, py, pyw, js, mjs, cjs, ts, tsx, jsx, go, java, kt, kts, scala, groovy, rb, erb, rhtml, do, jsp, jspf, asp, aspx, ascx, ashx, cshtml, vbhtml, cfml, cfm, cfc, pl, pm, lua, swift, rs, dart, ex, exs
- BACKUP FILES: bak, backup, bkp, old, orig, save, sav, dump, snapshot, archive, arc, tar, tgz, tbz, gz, bz2, xz, zip, 7z, rar, zst, log, logs, trace, journal, tmp, temp, swp, lock, cache, pid, dev, test, staging, prod, disabled, off, example, sample, dist, copy, copy1, copy2, prev, previous, tilde, til, tildebackup, autosave, recovery, crash, core, ~
- COMPRESSED FILES: cab, lzh, arj, z, cpio, rpm, deb, dmg, iso, ova, vmdk, vdi, vhd, vhdx, qcow2, box, tar.gz, tar.bz2, tar.xz
- CONF FILES: conf, cfg, config, cnf, ini, env, properties, prop, prefs, settings, options, json, jsonc, json5, yaml, yml, xml, toml, hcl, tf, tfvars, cue, ron, edn, babelrc, eslintrc, prettierrc, npmrc, yarnrc, pnpmfile, browserslistrc, webpack, vite, rollup, parcel
- OTHER CONF FILES: template, tpl, j2, mustache, reg, inf, admx, adml, policy, gitlab-ci, github, circleci, drone, jenkinsfile, travis, azure-pipelines, pipeline, htaccess, htpasswd, vhost, confd, zone, named, dns, nginx, apache, httpd
- KEYS: pem, crt, key, csr, keystore, truststore, vault, sops
- DATABASES: sql, sqlite, sqlite3, db, db3, rdb, sdb, kdb, isam, myd, myi, frm, bson, leveldb, rocksdb, couch, cdb, tdb, fdb, lmdb, csv, tsv, psv, dsv, dat, data, txt, flat, tab, parquet, avro, orc, feather, arrow, hdf, hdf5, h5, netcdf, nc, mat, dta, por, shp, shx, dbf, gpkg, kml, kmz, msgpack, protobuf, proto, thrift, capnproto, graphml, gexf, gml, logdb
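The paste-site dorks above follow one pattern, so they can be generated for any target with a short loop (a minimal sketch; the site list mirrors the one above and `example.com` is a placeholder):

```shell
#!/bin/sh
# Emit one "site:" dork per paste platform for the given domain
# (defaults to example.com if no argument is passed).
domain="${1:-example.com}"
for site in gist.github.com pastebin.com codebeautify.org paste.ee \
            controlc.com justpaste.it ideone.com; do
  printf 'site:%s "%s"\n' "$site" "$domain"
done > paste-dorks.txt
cat paste-dorks.txt
```

The resulting file can be fed to manual searches or to any dork-automation workflow you already use.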
Subdomains
Passive Discovery
- crt.sh to identify domains registered at the same time
- Dork: site:target.com -www
- Dork: intext:"Copyright DOMAIN..."
→ Also use old copyright strings via the Wayback Machine
- Subdomains exposed in GitHub code
→ Example: https://github.com/gwen001/github-search/blob/master/github-subdomains.py
- Subdomains found on paste and data-sharing platforms
→ Pastebin, etc. (see OSINT section)
- CORS / CSP policies may leak subdomains
- SSL certificates reveal subdomains in the CN and SAN fields
- DNS techniques
→ Reverse DNS queries (PTR records), reverse IP lookups (mxtoolbox)
- ASN lookups and IP ranges
- Favicon hashing on Shodan to identify related assets
→ favihash.com + Shodan dork: http.favicon.hash:
- Passive tools
→ subfinder (API keys required), amass, online services
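crt.sh can return its results as JSON; flattening that into a unique subdomain list needs nothing beyond grep and sed. A saved sample response is used here so the sketch is self-contained — in practice, feed it the output of `curl -s 'https://crt.sh/?q=%25.example.com&output=json'`:

```shell
# Sample of crt.sh JSON: name_value may hold several newline-separated
# hostnames, some with a leading wildcard.
cat > crtsh.json <<'EOF'
[{"name_value":"www.example.com\n*.example.com"},{"name_value":"api.example.com"}]
EOF
# Extract every name_value, split the embedded "\n", strip "*.", dedupe.
grep -oE '"name_value":"[^"]*"' crtsh.json \
  | sed -e 's/"name_value":"//' -e 's/"$//' -e 's/\\n/ /g' \
  | tr ' ' '\n' | sed 's/^\*\.//' | sort -u > crtsh-subs.txt
cat crtsh-subs.txt
```

For anything more complex than this, `jq -r '.[].name_value'` does the extraction more robustly.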
Active Discovery
- Create or select wordlists based on the company's country or subdomain naming conventions
→ Reference: wordlists.assetnote.io
- Generate custom wordlists from page content using cewl
- Active tools
→ gobuster, knockpy (with cewl wordlists or wordlists.assetnote.io)
- Virtual host discovery using ffuf
- Use altdns to generate subdomain permutations
→ Based on cewl-generated wordlists or wordlists.assetnote.io
- Systematically extract subdomains from discovered JavaScript and CSS files
Resources:
# WEBSITES
https://crt.sh/?q=%25.DOMAIN&output=json
https://subdomainfinder.c99.nl/
https://osint.sh/subdomain
https://dnsdumpster.com/
https://www.virustotal.com/gui/domain/DOMAIN/relations
https://securitytrails.com/DOMAIN
https://www.zoomeye.ai/
http://toolbar.netcraft.com/site_report?url=DOMAIN
# ASN
https://bgp.he.net/ (ASN searches)
https://asnlookup.com/ (ASN --> IP)
# DORKS
site:*.domain.com -sub1.domain.com -sub2.domain.com
site:*.*.domain.com -www.sub1.domain.com
site:*-*.domain.com
-site:domain.com intext:"© Copyright DOMAIN.COM 2022 [...]"
-site:domain.com intext:"© Copyright DOMAIN.COM 2021 [...]"
site:zone-h.org domain.com
# PASSIVE TOOLS
$ amass enum -passive -d DOMAIN
$ subfinder -all -o output -d DOMAIN
$ assetfinder --subs-only DOMAIN
# ACTIVE TOOLS
$ cewl http://example.com -d 2 -m 5 -w wordlist.txt
$ knockpy -w wordlist.txt DOMAIN
$ gobuster dns -q -d DOMAIN -w wordlist.txt
$ ffuf -w wordlist.txt -u "https://FUZZ.domain.com" -mc all
$ ffuf -w wordlist.txt -H 'Host: FUZZ.domain.com' -mc all -fs <size_if_failure> -u https://IP/
$ altdns -i known_subdomains.txt -w words.txt -o /tmp/permutations_results -r -s output.txt
# SCREENSHOTS
$ eyewitness --web -f domains.txt
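What altdns does can be approximated in a few lines of shell, which also makes the kinds of permutations worth generating explicit (a minimal sketch; the sample subdomains and words are placeholders):

```shell
# Known subdomains and permutation words (placeholders for illustration).
printf 'api.example.com\napp.example.com\n' > known.txt
printf 'dev\nstaging\n' > words.txt
while read -r sub; do
  label="${sub%%.*}"    # first label, e.g. "api"
  parent="${sub#*.}"    # remainder, e.g. "example.com"
  while read -r w; do
    printf '%s.%s\n' "$w" "$sub"                  # dev.api.example.com
    printf '%s-%s.%s\n' "$label" "$w" "$parent"   # api-dev.example.com
    printf '%s-%s.%s\n' "$w" "$label" "$parent"   # dev-api.example.com
  done < words.txt
done < known.txt | sort -u > permutations.txt
```

Resolve the candidates afterwards (e.g. with a mass resolver such as `puredns` or `dnsx`) — most permutations will not exist.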
Open ports
- Identify active vs inactive subdomains
- Which TCP/UDP ports are open?
- Test common web ports: 80, 443, 8000, 8080, etc.
$ for domain in $(cat domains.txt); do sudo nmap -sS -sV -O --top-ports 1000 "$domain"; done
$ for domain in $(cat domains.txt); do sudo nmap -sU -sV -O --top-ports 1000 "$domain"; done
$ ffuf -w ports.txt -u 'http://DOMAIN:FUZZ' -r
$ cat domains.txt | httpx -silent -t 1 -title -method -status-code -follow-redirects -x all -p 80,81,443,445,1080,4000,4443,8000,8080,8443,10000 -fc 404,405
$ cat domains.txt | httpx -silent -t 1 -probe -ip -cdn -fc 404
For each active subdomain:port, we need to identify:
- The technology
- The mapping
- The logic
Technologies and Services
- WAF?
→ wafw00f, whatwaf
- CDN?
→ whois → does the IP belong to a CDN?
- Identify the technology stack used by the subdomain
- Interesting HTTP headers
→ Server, X-Powered-By, etc.
- Use OWASP ZAP as a proxy for automated detection
- Use technology fingerprinting tools
→ wappalyzer, webanalyze, whatweb
- Use SquareX to check whether results change based on:
- Browser
- Geographic location
- Test different User-Agents
→ Mobile UA, different OS / browser combinations to observe page behavior
https://urlscan.io/
https://hackertarget.com/whatweb-scan/
https://www.wappalyzer.com/lookup/DOMAIN/
$ webanalyze -host domain -crawl 1
$ whatweb https://domain.com/
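Header-based fingerprinting needs nothing but curl and grep. A saved response is used here so the snippet is self-contained — in practice, replace the heredoc with `curl -sI https://target/`:

```shell
# Saved response headers standing in for `curl -sI https://target/`.
cat > headers.txt <<'EOF'
HTTP/1.1 200 OK
Server: nginx/1.18.0
X-Powered-By: PHP/8.1.2
Set-Cookie: PHPSESSID=abc123; path=/
Content-Type: text/html; charset=UTF-8
EOF
# Keep only the headers that tend to leak the stack; cookie names
# (PHPSESSID, JSESSIONID, ASP.NET_SessionId...) are also worth a look.
grep -iE '^(server|x-powered-by|x-aspnet-version|x-generator|via):' \
  headers.txt > fingerprint.txt
cat fingerprint.txt
```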
Endpoint Mapping
Notes:
- Some endpoints may only be visible when authenticated
- Some endpoints are only accessible using specific HTTP methods
→ Typically POST, PUT, DELETE, etc. (especially for APIs)
Passive Discovery
- Passive tools
→ gau, waymore, waybackurls, etc.
- Look for URL patterns
→ Do the URLs returned by waybackurls follow identifiable patterns?
- Dorks
→ cache: + file types + specific pages
Active Discovery
- Custom wordlists
→ Generated with cewl (multiple languages)
→ Generic wordlists: http://wordlists.assetnote.io/
- Look for metadata and discovery files
→ /*.txt, /.well-known/*, sitemaps, API documentation endpoints (swagger.json, etc.)
- Bruteforce endpoints
→ Using meg with different HTTP methods
- Recursively extract all visible URLs from known pages
→ pyWebCrawler.py
- Download all .css files
- Download all .js files
→ subjs
- Bruteforce JavaScript file paths
→ FUZZ.js, FUZZ.compiled.js, FUZZ.min.js, FUZZ.js.map, etc.
- Same approach for CSS files
→ This is not a waste of time
- For discovered files, test alternative extensions
→ .old, .src, ~, .dev, .backup, etc.
→ (-e option in ffuf)
- Also test common copy patterns
→ "Copy of [file]", "file.[extension]"
→ Use the developers' native language for "Copy of"
- Parameter discovery
→ arjun, x8
# WEBSITES
https://web.archive.org/
https://cachedview.com/
https://oldweb.today/
https://archive.is/
https://timetravel.mementoweb.org/
https://commoncrawl.org/
# DORKS
cache:domain.com
site:*.domain.com filetype:do OR filetype:txt OR filetype:csv OR filetype:xlsx [...]
site:*.domain.com "cheatsheet"
site:*.domain.com "@domain.com"
site:*.domain.com inurl:admin
site:*.domain.com intitle:"Index of"
site:*.domain.com intext:"username" filetype:log
# PASSIVE
$ gospider -s "https://DOMAIN/" -o output --other-source
$ waybackurls <domain> | sort -u
$ gau <domain> | sort -u
$ waymore <domain> | sort -u
$ cat domains.txt | gau
# ACTIVE
$ python3 linkfinder.py -i <url>
$ katana -u https://DOMAIN/
$ meg -s 200,204,301,302,303,401,403,405,429,500,501,502,503,504 -X GET -c 5 wordlist.txt hosts.txt
$ ffuf -mc all -fc 404 -r -w <wordlist> -u 'https://domain.com/FUZZ' -recursion -recursion-depth 2
$ ffuf -u https://api.example.com/PATH -X METHOD -w /path/to/wordlist:PATH -w /path/to/http_methods:METHOD
$ ffuf -u "https://domain.com/FUZZ" -w <wordlist> -e $(cat web-mutations.txt)
$ x8 -u "https://domain.com/" -w <wordlist>
$ arjun -u https://domain.com/api/v1/endpoint -m POST
$ cat hosts.txt | gau | sort -u | subjs
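The "alternative extensions" idea above can also be pre-computed into a standalone wordlist instead of relying on ffuf's `-e` flag (a minimal sketch; the base names are placeholders):

```shell
# Base names worth probing (placeholders) and backup-style suffixes.
printf 'config\nbackup\nindex\n' > base.txt
for ext in .old .bak .backup .src .dev '~' .save; do
  while read -r w; do
    printf '%s%s\n' "$w" "$ext"   # e.g. config.old, index~
  done < base.txt
done > web-mutations-list.txt
```

Feed `web-mutations-list.txt` straight to `ffuf -w`; pre-computing keeps the full candidate list visible and reusable across targets.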
Inspection
- Keep only active endpoints
→ httpx
- Filter URLs using regex patterns
→ e.g. [a-zA-Z0-9+=_-]{6,} to identify tokens or secrets
- Analyze all CSS files
→ Comments, hidden or interesting data
- Analyze all JavaScript files
→ jsa.py to extract useful information (endpoints, variables, etc.)
- Are the JavaScript files minified?
→ Can they be read and analyzed?
→ Look for parameters, endpoints, API keys, secrets, etc.
- Check older versions on the Wayback Machine
→ Old JS and CSS files, as well as previously exposed endpoints
- Identify every endpoint used by the application
cat hosts.txt | gau | subjs | python3 jsa.py
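The token regex mentioned above can be applied directly to downloaded JS files; a tiny sample file is created here so the sketch is self-contained:

```shell
# Sample JS file standing in for real downloaded assets.
cat > app.js <<'EOF'
var cfg = { apiKey: "AKIAIOSFODNN7EXAMPLE", debug: false };
fetch("/api/v1/users");
EOF
# Candidate tokens/secrets: long runs of base64-ish characters.
# A {20,} floor cuts noise; the cheat sheet's {6,} is broader but noisier.
grep -hoE '[A-Za-z0-9+=_-]{20,}' app.js | sort -u > candidates.txt
cat candidates.txt
```

Triage the candidates manually, or hand the same files to a dedicated secret scanner such as trufflehog or gitleaks.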
After the discovery phase, the next step is to understand how the application works.
Logic
- What does each endpoint / API function do?
- What are the parameters for each endpoint?
- What are the server responses for each endpoint?
- Is access control enforced?
→ Through which parameter or mechanism?
- What is the HTTP configuration of the endpoint?
- How is this endpoint related to others?
- Does the endpoint interact with other subdomains?
- Is the endpoint linked to other features?
- How is authentication handled?
→ Cookie? JWT? Session token? SSO?
Create a mind map to make things clearer, or simple notes answering questions like:
PUT /api/account - JSON - Updates account data
-> params: userId, email, username, password
-> auth: JWT session cookie
POST /api/account - JSON - Deletes an account
-> params: userId, captcha, disable_account
-> auth: JWT session cookie
This provides a clear overview of each endpoint and its parameters, and helps generate ideas for the exploitation phase.
Do not exploit or inject anything before properly mapping the subdomain.
Without a clear understanding of the application logic, exploitation will be a waste of time and will have a very low success rate, as we’ll be attacking blindly.
All this recon information should be organized per subdomain, using tools like:
- Notion
- Obsidian
- Sublime Text
- Any structured note-taking system
It takes a lot of time.
Disclaimer
All content published on this website is for educational purposes only.
The techniques, tools, and methodologies described here are intended to be used only on systems you own or have explicit permission to test.
I do not encourage or take responsibility for any illegal use of the information provided.