📝 docs: document wget spidering
parent 7645831a09
commit b62c1424fa

@@ -3,5 +3,5 @@
 /ptth_server.toml
 /ptth_relay.toml
 /ptth_build_L6KLMVS6/
+/scraper-secret.txt
 /target
 

@@ -61,6 +61,28 @@ e.g. `0..3` means "0, 1, 2, 3". So 100-199 means 199 is the last byte retrieved.
 By polling with HEAD and byte range requests, a scraper client can approximate
 `tail -f` behavior of a server-side file.
 
+`wget --continue --execute robots=off --no-parent --recursive --header "$(<scraper-secret.txt)" $API/v1/server/aliens_wildland/files/crates/`
+
+Use wget's recursive spidering to download all the files in a folder.
+The human-friendly HTML interface is exposed through the scraper
+API, so this will also download the HTML directory listings.
+
+- `--continue` uses the server's Content-Length header to skip over
+files that are already fully downloaded to local disk. Partial
+downloads are resumed where they left off, which is fine
+for long-running log files that may append new data but never
+modify old data.
+- `--execute robots=off` disables wget's handling of robots.txt.
+We know we're a robot; the server doesn't care, so it's fine.
+- `--no-parent` stops the `../` links in each directory listing from
+sending wget up into parent folders and beyond the one you asked for.
+- `--recursive` makes wget follow the links in each directory listing,
+downloading the individual files and descending into subdirectories.
+- `--header "$(<scraper-secret.txt)"` tells Bash to load the
+secret API key from disk and pass it to wget as a single argument.
+The secret will leak into the process list, but at least it won't
+leak into your bash_history file.
+
 ## Problem statement
 
 PTTH has 2 auth routes:
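
The `tail -f` approximation in the context lines (polling with HEAD and byte range requests) can be sketched in Bash. This is a minimal sketch under several assumptions. It assumes `curl` 7.55 or newer, which can read request headers from a file with `--header @file` and so keeps the secret out of the process list. It also assumes a server that reports `Content-Length` on HEAD and honors `Range`, and a `scraper-secret.txt` in the current directory. The function names and the `POLL_SECONDS` variable are hypothetical; they are not part of PTTH.

```bash
#!/usr/bin/env bash
# Sketch: approximate `tail -f` on a file served through the scraper API by
# polling with HEAD and byte range requests.
# Assumptions (not from PTTH itself): curl >= 7.55 for `--header @file`,
# scraper-secret.txt in the current directory, and hypothetical names
# range_for_new_bytes / remote_length / poll_tail / POLL_SECONDS.

# Print the Range header value for bytes appended since `offset`, or nothing
# if the file has not grown. Byte ranges are inclusive, so the last byte
# index is length - 1 (e.g. bytes 100-199 for offset 100, length 200).
range_for_new_bytes() {
    local offset="$1" length="$2"
    if (( length > offset )); then
        printf 'bytes=%d-%d\n' "$offset" "$(( length - 1 ))"
    fi
}

# Print the Content-Length reported by a HEAD request (empty on failure).
remote_length() {
    curl --silent --head --header @scraper-secret.txt "$1" \
        | tr -d '\r' \
        | awk 'tolower($1) == "content-length:" { print $2 }' \
        | tail -n 1
}

# Poll forever, printing only the new bytes each time the file grows.
# An optional second argument sets the starting offset (default 0).
poll_tail() {
    local url="$1" offset="${2:-0}" length range
    while true; do
        length="$(remote_length "$url")"
        if [[ -n "$length" ]]; then
            # The file shrank (e.g. it was truncated): start over from byte 0.
            if (( length < offset )); then
                offset=0
            fi
            range="$(range_for_new_bytes "$offset" "$length")"
            if [[ -n "$range" ]]; then
                curl --silent --header @scraper-secret.txt \
                    --header "Range: $range" "$url"
                offset="$length"
            fi
        fi
        sleep "${POLL_SECONDS:-5}"
    done
}
```

To use it, source the file and call `poll_tail "$URL"`, where `URL` is the full scraper-API URL of a single file. One example is a file under the `$API/v1/server/aliens_wildland/files/` prefix from the wget command. Like `--continue`, this tracks only a byte offset, so it assumes old bytes are never rewritten.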
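
The `--header` bullet depends on the double quotes around `$(<scraper-secret.txt)`. A typical header line such as `Name: value` contains a space, so an unquoted expansion is word-split into two arguments. wget would then receive only the header name and treat the value as another URL to fetch. The short Bash demonstration below writes a made-up header to a temporary file and counts the arguments each form produces; `count_args` and `X-Example-Key` are placeholders, not PTTH names.

```bash
#!/usr/bin/env bash
# Demonstrates how Bash passes the output of $(<file) to a command,
# with and without double quotes.

# Prints how many arguments it received.
count_args() {
    echo "$#"
}

# Write a hypothetical header line to a temporary file. The header name and
# value are placeholders, not PTTH's real ones.
secret_file="$(mktemp)"
printf 'X-Example-Key: not-a-real-secret\n' > "$secret_file"

# Unquoted: the expansion is word-split at the space into two arguments.
unquoted_count="$(count_args $(<"$secret_file"))"

# Quoted: the whole line stays one argument, as --header expects.
quoted_count="$(count_args "$(<"$secret_file")")"

echo "unquoted: $unquoted_count, quoted: $quoted_count"  # prints "unquoted: 2, quoted: 1"
rm -f "$secret_file"
```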