From 4014290f98cf3d0755aaf587f3ec881b676e9eb5 Mon Sep 17 00:00:00 2001
From: _ <>
Date: Fri, 11 Dec 2020 21:04:59 +0000
Subject: [PATCH] :pencil: docs (YNQAQKJS) add plan for 3rd auth route

---
 issues/2020-12Dec/auth-route-YNQAQKJS.md | 98 ++++++++++++++++++++++++
 todo.md                                  | 19 +++--
 2 files changed, 110 insertions(+), 7 deletions(-)
 create mode 100644 issues/2020-12Dec/auth-route-YNQAQKJS.md

diff --git a/issues/2020-12Dec/auth-route-YNQAQKJS.md b/issues/2020-12Dec/auth-route-YNQAQKJS.md
new file mode 100644
index 0000000..2561d30
--- /dev/null
+++ b/issues/2020-12Dec/auth-route-YNQAQKJS.md
@@ -0,0 +1,98 @@
+# Auth route for scrapers
+
+(Find this issue with `git grep YNQAQKJS`)
+
+## Problem statement
+
+PTTH has 2 auth routes:
+
+- A fixed API key for servers
+- Whatever the end user puts in front of the HTML client
+
+"Whatever" is hard for scrapers to deal with. This barrier to scraping
+is blocking these issues:
+
+- EOTPXGR3 Remote `tail -f`
+- UPAQ3ZPT Audit logging of the relay itself
+- YMFMSV2R Add Prometheus metrics
+
+## Proposal
+
+Add a 3rd auth route meeting these criteria:
+
+- Enabled by a feature flag, disabled by default
+- Bootstrapped by the user-friendly HTML frontend
+- Suitable for headless automated scrapers
+
+It will probably involve an API key like the one the servers use.
+Public-key crypto is stronger, but involves more work. I think we
+should plan to start with something weak, and also plan to deprecate
+it once something stronger is ready.
+
+## Proposed impl plan
+
+- Add feature flags to ptth_relay.toml for dev mode and scrapers
+- Make sure the Docker release CAN build
+- Add a failing test to block releases
+- Make sure `cargo test` fails and the Docker release can NOT build
+- Add a hard-coded hash of 1 API key, with a 1-week expiration (sketched below)
+- (POC) Test with curl
+- Manually create the SQLite DB for API keys, add 1 hash (sketched below)
+- Impl DB reads
+- Remove the hard-coded API key
+- Make sure `cargo test` passes and the Docker release CAN build
+- (MVP) Test with curl
+- Impl and test DB init / migration
+- Impl DB writes (add / revoke keys) as CLI commands
+- Implement an API (behind X-Email auth) for that, test with curl
+- Set up mitmproxy or something to add the X-Email header in the dev env
+- Implement the web UI (behind X-Email)
+
+POC is the proof of concept - At this point we will know that, in
+theory, the feature can work.
+
+MVP is the first deployable version - I could put it in prod, manually
+fudge the SQLite DB to add a 1-month key, and let people start building
+scrapers.
+
+Details:
+
+Dev mode will allow anonymous users to generate scraper keys. In prod
+mode (the default), clients will need to have the X-Email header set or
+use a scraper key to do anything.
+
+Design the DB so that the servers can share it one day.
+
+Design the API so that new types of auth / keys can be added one day,
+and the old ones deprecated.
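+
+For the POC, I'm imagining the hard-coded check looking something like
+this minimal sketch (assuming the `sha2` crate; the hash, the date, and
+all the names here are placeholders, not real code):
+
+```rust
+use sha2::{Digest, Sha512};
+
+// Placeholder - the real step would paste in the SHA-512 of the one
+// valid key.
+const KEY_HASH: [u8; 64] = [0u8; 64];
+// Placeholder expiration, about 1 week out.
+const EXPIRES: &str = "2020-12-18";
+
+fn key_is_valid(presented_key: &str, today: &str) -> bool {
+    let hash = Sha512::digest(presented_key.as_bytes());
+    // ISO 8601 dates compare correctly as plain strings. A real
+    // version should also use a constant-time hash comparison.
+    hash.as_slice() == &KEY_HASH[..] && today < EXPIRES
+}
+```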
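+
+And for the MVP's DB, a sketch of the kind of table I have in mind
+(assuming the `rusqlite` crate; the table and column names are invented
+here, not a real schema):
+
+```rust
+use rusqlite::Connection;
+
+// Open (or create) the key DB and make sure the table exists.
+fn init_db(path: &str) -> rusqlite::Result<Connection> {
+    let conn = Connection::open(path)?;
+    conn.execute_batch(
+        "CREATE TABLE IF NOT EXISTS scraper_keys (
+            key_hash TEXT PRIMARY KEY, -- only the hash, never the key
+            name     TEXT NOT NULL,    -- human-friendly label
+            expires  TEXT NOT NULL     -- ISO 8601, checked per request
+        );",
+    )?;
+    Ok(conn)
+}
+```
+
+Storing only hashes means a leaked DB doesn't leak usable keys, and a
+text `expires` column keeps the date handling trivial.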
+
+## Open questions
+
+**Who generates the API key? The scraper client, or the PTTH relay server?**
+
+The precedent from big cloud vendors seems to be that the server
+generates tokens. This is probably to avoid a situation where clients
+with vulnerable crypto code, or just bad code, generate low-entropy
+keys. Putting that responsibility on the server lets the server
+enforce high-entropy keys.
+
+**Should the key rotate? If so, how?**
+
+The key should _at least_ expire. If it expires every 30 or 90 days,
+then a human is only slightly inconvenienced to service their scraper
+regularly.
+
+When adding other features, we must consider these use cases:
+
+1. A really dumb Bash script that shells out to curl
+2. A Python script
+3. A sophisticated desktop app in C#, Rust, or C++
+4. Eventually replacing the fixed API keys used in ptth_server
+
+For the Bash script, rotation will probably be difficult, and I'm okay
+if our support for that is merely "It'll work for 30 days at a time,
+then you need to rotate keys manually."
+
+For the Python script, rotation could be automated, but cryptography is
+still probably difficult. I think some AWS services require actual
+crypto keys, and not just high-entropy password keys.
+
+For the sophisticated desktop app, cryptography is on the table, but it
+is also the least likely use case to ever happen.
diff --git a/todo.md b/todo.md
index 02488b8..da42c81 100644
--- a/todo.md
+++ b/todo.md
@@ -1,15 +1,20 @@
-- Estimate bandwidth per server?
+Interesting issues will get a unique ID with
+`dd if=/dev/urandom bs=5 count=1 | base32`
+
+- Report server version in HTML
+- [YNQAQKJS](issues/2020-12Dec/auth-route-YNQAQKJS.md) Open new auth route for spiders / scrapers
+- Track / Estimate bandwidth per server?
+- EOTPXGR3 Remote `tail -f` (_Complicated_) (Maybe use chunked encoding or something?)
 - "Preview as" feature for Markdown (It's not threaded through the relay yet)
-- Remote `tail -f` (_Complicated_) (Maybe use chunked encoding or something?)
 - Make a debug client to replicate the issue Firefox is having with turtling
-- Add Prometheus metrics
+- YMFMSV2R Add Prometheus metrics
 - Not working great behind reverse proxies
 - Impl multi-range / multi-part byte serving
 - Deny unused HTTP methods for endpoints
 - ETag cache based on mtime
 - Server-side hash?
-- Log / audit log?
+- UPAQ3ZPT Log / audit log?
 - Prevent directory traversal attacks in file_server.rs
 - Error handling
@@ -75,7 +80,7 @@ what happens.
 I might have to build a client that imitates this behavior, since it's
 hard to control.
 
-## Server won't work on Windows
+## Server can't protect its API key on Windows
 
-This is because I use Unix-specific file permissions to protect the server
-config.
+This is because I use a dumb hack with Unix permissions to protect the config
+file on Linux.