2020-12-11 21:04:59 +00:00

# Auth route for scrapers

(Find this issue with `git grep YNQAQKJS`)

## Problem statement

PTTH has 2 auth routes:

- A fixed API key for servers
- Whatever the end user puts in front of the HTML client

"Whatever" is hard for scrapers to deal with. This barrier to scraping
is blocking these issues:

- EOTPXGR3 Remote `tail -f`
- UPAQ3ZPT Audit logging of the relay itself
- YMFMSV2R Add Prometheus metrics

## Proposal

Add a 3rd auth route meeting these criteria:

- Enabled by a feature flag, disabled by default
- Bootstrapped by the user-friendly HTML frontend
- Suitable for headless automated scrapers

It will probably involve an API key like the servers use. Public-key
crypto is stronger, but involves more work. I think we should plan to
start with something weak, and also plan to deprecate it once something
stronger is ready.

## Proposed impl plan

- (X) Add feature flags to ptth_relay.toml for dev mode and scrapers
- (X) Make sure Docker release CAN build
- (X) Add hash of 1 scraper key to ptth_relay.toml, with 1 week expiration
- (X) Accept scraper key for some testing endpoint
- (X) (POC) Test with curl
- (X) Clean up scraper endpoint
- ( ) Add end-to-end tests for scraper endpoint
- ( ) Manually create SQLite DB for scraper keys, add 1 hash
- ( ) Impl DB reads
- ( ) Remove scraper key from config file
- ( ) Make sure `cargo test` passes and Docker CAN build
- ( ) (MVP) Test with curl
- ( ) Impl and test DB init / migration
- ( ) Impl DB writes (Add / revoke keys) as CLI commands
- ( ) Implement API (Behind X-Email auth) for that, test with curl
- ( ) Set up mitmproxy or something to add X-Email header in dev env
- ( ) Implement web UI (Behind X-Email)

POC is the proof-of-concept - At this point we will know that in theory the
feature can work.
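
The POC steps above (one key hash with a 1 week expiration, checked on a
testing endpoint) might look roughly like this. It is only a sketch in Python,
not the relay's actual code: SHA-256 is an assumed hash, and `hash_key`,
`key_is_valid`, and the example key are all made-up names.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone


def hash_key(key: str) -> str:
    """Return the hex SHA-256 digest of a scraper key (hash choice is an assumption)."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def key_is_valid(presented_key: str, stored_hash: str,
                 not_after: datetime, now: datetime) -> bool:
    """Accept the key only if it has not expired and its hash matches."""
    if now >= not_after:
        return False
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(hash_key(presented_key), stored_hash)


# Hypothetical values standing in for what the config file would hold.
created = datetime(2020, 12, 12, tzinfo=timezone.utc)
stored_hash = hash_key("example-scraper-key")
not_after = created + timedelta(weeks=1)
```

Only the hash needs to be stored, so reading the config file does not reveal
a usable key.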

MVP is the first deployable version - I could put it in prod, manually fudge
the SQLite DB to add a 1-month key, and let people start building scrapers.
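
As a rough idea of what "manually fudge the SQLite DB" could mean, here is a
sketch using Python's built-in `sqlite3`. The table name, column names, and
helper functions are placeholders I made up, not a settled schema, and "1
month" is approximated as 30 days.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema: one row per key hash, with a validity window.
SCHEMA = """
CREATE TABLE scraper_keys (
    hash       TEXT PRIMARY KEY,
    not_before TEXT NOT NULL,
    not_after  TEXT NOT NULL
)
"""


def add_key(conn, key_hash, not_before, lifetime):
    """Insert one key hash that is valid for `lifetime` starting at `not_before`."""
    not_after = not_before + lifetime
    conn.execute(
        "INSERT INTO scraper_keys (hash, not_before, not_after) VALUES (?, ?, ?)",
        (key_hash, not_before.isoformat(), not_after.isoformat()),
    )


def key_is_active(conn, key_hash, now):
    """Return True if the hash exists and `now` falls inside its validity window."""
    row = conn.execute(
        "SELECT not_before, not_after FROM scraper_keys WHERE hash = ?",
        (key_hash,),
    ).fetchone()
    if row is None:
        return False
    not_before = datetime.fromisoformat(row[0])
    not_after = datetime.fromisoformat(row[1])
    return not_before <= now < not_after


conn = sqlite3.connect(":memory:")  # a real deployment would use a file on disk
conn.execute(SCHEMA)
start = datetime(2020, 12, 12, tzinfo=timezone.utc)
add_key(conn, "placeholder-hash", start, timedelta(days=30))
```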

Details:

Dev mode will allow anonymous users to generate scraper keys. In prod mode
(the default), clients will need to have the X-Email header set or use a
scraper key to do anything.
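
The dev / prod rule above boils down to two small checks. This is a sketch
under my reading of the rule, with hypothetical function names. In particular
I'm assuming that generating a key in prod needs X-Email (since the key API
sits behind X-Email auth), not just an existing scraper key.

```python
from typing import Optional


def may_generate_scraper_key(dev_mode: bool, x_email: Optional[str]) -> bool:
    """Dev mode lets anonymous users generate keys; prod mode requires X-Email."""
    if dev_mode:
        return True
    return bool(x_email)


def may_use_relay_in_prod(x_email: Optional[str], scraper_key_valid: bool) -> bool:
    """In prod mode, a client needs either X-Email or a valid scraper key."""
    return bool(x_email) or scraper_key_valid
```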

Design the DB so that the servers can share it one day.

Design the API so that new types of auth / keys can be added one day, and
the old ones deprecated.

## Open questions

**Who generates the API key? The scraper client, or the PTTH relay server?**

The precedent from big cloud vendors seems to be that the server generates
tokens. This is probably to avoid a situation where clients with vulnerable
crypto code or just bad code generate low-entropy keys. By putting that
responsibility on the server, the server can enforce high-entropy keys.
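
If the server generates the key, high entropy is cheap to enforce. A sketch
of that idea in Python, using the standard `secrets` module. The function
name, the 32-byte size, and SHA-256 are my assumptions, not decisions.

```python
import hashlib
import secrets
from typing import Tuple


def generate_scraper_key() -> Tuple[str, str]:
    """Return (key, key_hash). The key goes to the user once; only the hash is kept."""
    # 32 random bytes = 256 bits of entropy, encoded as URL-safe text.
    key = secrets.token_urlsafe(32)
    key_hash = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return key, key_hash


key, key_hash = generate_scraper_key()
```

Because the client never picks the key, a buggy client can't end up with a
guessable one.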

**Should the key rotate? If so, how?**

The key should _at least_ expire. If it expires every 30 or 90 days, then a
human is slightly inconvenienced by having to service their scraper regularly.

When adding other features, we must consider the use cases:

1. A really dumb Bash script that shells out to curl
2. A Python script
3. A sophisticated desktop app in C#, Rust, or C++
4. Eventually replacing the fixed API keys used in ptth_server

For the Bash script, rotation will probably be difficult, and I'm okay if
our support for that is merely "It'll work for 30 days at a time, then you
need to rotate keys manually."

For the Python script, rotation could be automated, but cryptography is
still probably difficult. I think some AWS services require actual crypto
keys, and not just high-entropy password keys.
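
For the Python use case, the scraper side could be as small as this sketch.
The header name, relay URL, and endpoint path are placeholders (nothing here
is a decided API), and the 7-day rotation margin is arbitrary.

```python
import urllib.request
from datetime import datetime, timedelta, timezone

API_KEY_HEADER = "X-ApiKey"  # hypothetical header name


def build_request(base_url: str, path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that carries the scraper key in a header."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={API_KEY_HEADER: api_key})


def should_rotate(not_after: datetime, now: datetime,
                  margin: timedelta = timedelta(days=7)) -> bool:
    """Return True once the key is within `margin` of its expiration."""
    return now >= not_after - margin


req = build_request("https://relay.example.com/", "/scraper/test", "example-key")
# urllib.request.urlopen(req) would send it; left out so the sketch runs offline.
```

A script like this could call `should_rotate` on each run and nag a human (or
later, request a new key) before the old one stops working.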

For the sophisticated desktop app, cryptography is on the table, but this
is also the least likely use case to ever happen.