📝 docs (YNQAQKJS) add plan for 3rd auth route
parent c4b12eb806
commit 4014290f98
@@ -0,0 +1,98 @@
# Auth route for scrapers

(Find this issue with `git grep YNQAQKJS`)

## Problem statement

PTTH has 2 auth routes:

- A fixed API key for servers
- Whatever the end user puts in front of the HTML client

"Whatever" is hard for scrapers to deal with. This barrier to scraping
is blocking these issues:

- EOTPXGR3 Remote `tail -f`
- UPAQ3ZPT Audit logging of the relay itself
- YMFMSV2R Add Prometheus metrics

## Proposal

Add a 3rd auth route meeting these criteria:

- Enabled by a feature flag, disabled by default
- Bootstrapped by the user-friendly HTML frontend
- Suitable for headless automated scrapers
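The feature-flag criterion could look something like this in the relay's config handling. This is a minimal sketch with assumed names (`ScraperAuthConfig`, a `[scraper_auth]` table) that are not the real ptth_relay.toml schema:

```rust
// Hypothetical shape for the new flags in ptth_relay.toml
// (field and table names are assumptions, not the final schema):
//
//   [scraper_auth]
//   enabled = false   # feature flag, off by default
//   dev_mode = false  # allow anonymous key bootstrap, dev only

#[derive(Debug, PartialEq)]
pub struct ScraperAuthConfig {
    pub enabled: bool,
    pub dev_mode: bool,
}

impl Default for ScraperAuthConfig {
    // Both flags default to off, so existing deployments are unaffected
    // until an operator opts in.
    fn default() -> Self {
        Self { enabled: false, dev_mode: false }
    }
}

fn main() {
    let cfg = ScraperAuthConfig::default();
    assert!(!cfg.enabled && !cfg.dev_mode);
    println!("defaults: {:?}", cfg);
}
```

Defaulting both flags to off is what makes "disabled by default" hold even for configs written before the feature existed.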

It will probably involve an API key like the servers use. Public-key
crypto is stronger, but involves more work. I think we should start with
something weak, and plan to deprecate it once something stronger is ready.
## Proposed impl plan

- Add feature flags to ptth_relay.toml for dev mode and scrapers
- Make sure Docker release CAN build
- Add failing test to block releases
- Make sure `cargo test` fails and Docker release can NOT build
- Add hard-coded hash of 1 API key, with 1-week expiration
- (POC) Test with curl
- Manually create SQLite DB for API keys, add 1 hash
- Impl DB reads
- Remove hard-coded API key
- Make sure `cargo test` passes and Docker CAN build
- (MVP) Test with curl
- Impl and test DB init / migration
- Impl DB writes (add / revoke keys) as CLI commands
- Implement API (behind X-Email auth) for that, test with curl
- Set up mitmproxy or something to add X-Email header in dev env
- Implement web UI (behind X-Email)
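The "hard-coded hash with 1-week expiration" step could be sketched as below. This assumes the key digest is computed offline with a real hash function (Rust's std has none; the hex digest here is a placeholder), and uses a constant-time comparison so the check does not leak how many leading bytes matched. All names are hypothetical:

```rust
use std::time::{Duration, SystemTime};

// Compare two byte strings in constant time (for equal lengths);
// a plain == can short-circuit and leak the matching prefix length.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}

// The hard-coded POC credential: a digest of the one API key
// (computed offline) plus the moment it stops being valid.
struct HardCodedKey {
    digest_hex: &'static str,
    expires: SystemTime,
}

fn check(presented_digest_hex: &str, key: &HardCodedKey, now: SystemTime) -> bool {
    now < key.expires && ct_eq(presented_digest_hex.as_bytes(), key.digest_hex.as_bytes())
}

fn main() {
    let key = HardCodedKey {
        digest_hex: "00112233445566778899aabbccddeeff", // placeholder digest
        expires: SystemTime::now() + Duration::from_secs(7 * 24 * 3600), // 1 week
    };
    assert!(check("00112233445566778899aabbccddeeff", &key, SystemTime::now()));
    assert!(!check("ffeeddccbbaa99887766554433221100", &key, SystemTime::now()));
}
```

Baking the expiry into the hard-coded credential means the POC cannot quietly outlive its intended test window even if the removal step slips.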

POC is the proof of concept: at this point we will know that the feature
can work in theory.

MVP is the first deployable version: I could put it in prod, manually fudge
the SQLite DB to add a 1-month key, and let people start building scrapers.

Details:

Dev mode will allow anonymous users to generate scraper keys. In prod mode
(the default), clients will need to have the X-Email header set or use a
scraper key to do anything.
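That gating rule reduces to two small predicates; a sketch with assumed names, not the relay's actual request types:

```rust
// Hypothetical per-request auth context: whether the trusted proxy set
// X-Email, and whether a valid scraper key was presented.
struct ReqAuth<'a> {
    x_email: Option<&'a str>,
    valid_scraper_key: bool,
}

// Dev mode: anonymous users may bootstrap scraper keys.
// Prod (the default): only clients identified by X-Email may.
fn may_generate_scraper_key(dev_mode: bool, req: &ReqAuth) -> bool {
    dev_mode || req.x_email.is_some()
}

// In prod, either X-Email or a scraper key is required to do anything.
fn may_access(req: &ReqAuth) -> bool {
    req.x_email.is_some() || req.valid_scraper_key
}

fn main() {
    let anon = ReqAuth { x_email: None, valid_scraper_key: false };
    let scraper = ReqAuth { x_email: None, valid_scraper_key: true };
    let human = ReqAuth { x_email: Some("user@example.com"), valid_scraper_key: false };
    assert!(may_generate_scraper_key(true, &anon));   // dev mode
    assert!(!may_generate_scraper_key(false, &anon)); // prod
    assert!(may_access(&scraper) && may_access(&human));
    assert!(!may_access(&anon));
}
```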

Design the DB so that the servers can share it one day.

Design the API so that new types of auth / keys can be added one day, and
the old ones deprecated.
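One way to leave room for that is to tag every stored credential with a kind, so new schemes slot in and old ones can be retired without a schema rewrite. A sketch, with names assumed:

```rust
// Hypothetical credential kinds. ScraperV1 is the weak high-entropy
// key proposed here; stronger variants (e.g. public-key auth) would be
// added as new kinds later.
#[derive(Debug, Clone, Copy, PartialEq)]
enum KeyKind {
    ScraperV1,
    // ScraperV2, // future: public-key auth
}

// Nothing is deprecated yet; when a V2 lands, V1 moves to the true arm.
fn is_deprecated(kind: KeyKind) -> bool {
    match kind {
        KeyKind::ScraperV1 => false,
    }
}

fn main() {
    assert!(!is_deprecated(KeyKind::ScraperV1));
    println!("{:?} is active", KeyKind::ScraperV1);
}
```

The exhaustive `match` is the point: adding a new kind forces every auth decision in the codebase to say whether it accepts the new kind.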

## Open questions

**Who generates the API key? The scraper client, or the PTTH relay server?**

The precedent from big cloud vendors seems to be that the server generates
tokens. This is probably to avoid a situation where clients with vulnerable
crypto code or just bad code generate low-entropy keys. By putting that
responsibility on the server, the server can enforce high-entropy keys.
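Server-side generation could be as simple as reading 32 bytes from the OS CSPRNG and hex-encoding them. This sketch is Unix-only and dependency-free for illustration; a real build would probably use a crypto RNG crate rather than opening /dev/urandom by hand:

```rust
use std::fs::File;
use std::io::Read;

// Pull 32 bytes (256 bits of entropy) from the OS CSPRNG and
// hex-encode them. Because the server does this, key entropy no
// longer depends on the quality of any client's code.
fn generate_key() -> std::io::Result<String> {
    let mut buf = [0u8; 32];
    File::open("/dev/urandom")?.read_exact(&mut buf)?;
    Ok(buf.iter().map(|b| format!("{:02x}", b)).collect())
}

fn main() -> std::io::Result<()> {
    let key = generate_key()?;
    assert_eq!(key.len(), 64); // 32 bytes -> 64 hex chars
    println!("{}", key);
    Ok(())
}
```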

**Should the key rotate? If so, how?**

The key should _at least_ expire. If it expires every 30 or 90 days, a
human is slightly inconvenienced to service their scraper regularly.

When adding other features, we must consider the use cases:

1. A really dumb Bash script that shells out to curl
2. A Python script
3. A sophisticated desktop app in C#, Rust, or C++
4. Eventually replacing the fixed API keys used in ptth_server

For the Bash script, rotation will probably be difficult, and I'm okay if
our support for that is merely "It'll work for 30 days at a time, then you
need to rotate keys manually."

For the Python script, rotation could be automated, but cryptography is
still probably difficult. I think some AWS services require actual crypto
keys, and not just high-entropy password keys.

For the sophisticated desktop app, cryptography is on the table, but it is
also the least likely use case to ever happen.

todo.md
@@ -1,15 +1,20 @@
- Estimate bandwidth per server?
Interesting issues will get a unique ID with
`dd if=/dev/urandom bs=5 count=1 | base32`

- Report server version in HTML
- [YNQAQKJS](issues/2020-12Dec/auth-route-YNQAQKJS.md) Open new auth route for spiders / scrapers
- Track / Estimate bandwidth per server?
- EOTPXGR3 Remote `tail -f` (_Complicated_) (Maybe use chunked encoding or something?)
- "Preview as" feature for Markdown (It's not threaded through the relay yet)
- Remote `tail -f` (_Complicated_) (Maybe use chunked encoding or something?)
- Make a debug client to replicate the issue Firefox is having with turtling
- Add Prometheus metrics
- YMFMSV2R Add Prometheus metrics
- Not working great behind reverse proxies

- Impl multi-range / multi-part byte serving
- Deny unused HTTP methods for endpoints
- ETag cache based on mtime
- Server-side hash?
- Log / audit log?
- UPAQ3ZPT Log / audit log?

- Prevent directory traversal attacks in file_server.rs
- Error handling

@@ -75,7 +80,7 @@ what happens.
I might have to build a client that imitates this behavior, since it's hard
to control.

## Server won't work on Windows
## Server can't protect its API key on Windows

This is because I use Unix-specific file permissions to protect the server
config.
This is because I use a dumb hack with Unix permissions to protect the config
file on Linux.