# Auth route for scrapers

(Find this issue with `git grep YNQAQKJS`)

## Problem statement

PTTH has 2 auth routes:

- A fixed API key for servers
- Whatever the end user puts in front of the HTML client

"Whatever" is hard for scrapers to deal with. This barrier to scraping is blocking these issues:

- EOTPXGR3 Remote `tail -f`
- UPAQ3ZPT Audit logging of the relay itself
- YMFMSV2R Add Prometheus metrics

## Proposal

Add a 3rd auth route meeting these criteria:

- Enabled by a feature flag, disabled by default
- Bootstrapped by the user-friendly HTML frontend
- Suitable for headless automated scrapers

It will probably involve an API key like the one the servers use. Public-key crypto is stronger, but involves more work. I think we should plan to start with something weak, and also plan to deprecate it once something stronger is ready.

## Proposed impl plan

- Add feature flags to ptth_relay.toml for dev mode and scrapers (a rough config sketch is at the end of this issue)
- Make sure the Docker release CAN build
- Add a failing test to block releases
- Make sure `cargo test` fails and the Docker release can NOT build
- Add a hard-coded hash of 1 API key, with a 1-week expiration
- (POC) Test with curl
- Manually create the SQLite DB for API keys, add 1 hash
- Impl DB reads
- Remove the hard-coded API key
- Make sure `cargo test` passes and the Docker release CAN build
- (MVP) Test with curl
- Impl and test DB init / migration
- Impl DB writes (add / revoke keys) as CLI commands
- Implement an API (behind X-Email auth) for that, test with curl
- Set up mitmproxy or something to add the X-Email header in the dev env
- Implement a web UI (behind X-Email)

POC is the proof of concept - at this point we will know that, in theory, the feature can work.

MVP is the first deployable version - I could put it in prod, manually fudge the SQLite DB to add a 1-month key, and let people start building scrapers.

Details:

Dev mode will allow anonymous users to generate scraper keys. In prod mode (the default), clients will need to have the X-Email header set or use a scraper key to do anything.

Design the DB so that the servers can share it one day.

Design the API so that new types of auth / keys can be added one day, and the old ones deprecated.

## Open questions

**Who generates the API key? The scraper client, or the PTTH relay server?**

The precedent from big cloud vendors seems to be that the server generates tokens. This is probably to avoid a situation where clients with vulnerable crypto code, or just bad code, generate low-entropy keys. By putting that responsibility on the server, the server can enforce high-entropy keys. (A key-generation sketch is at the end of this issue.)

**Should the key rotate? If so, how?**

The key should _at least_ expire. If it expires every 30 or 90 days, then a human is only slightly inconvenienced by having to service their scraper regularly.

When adding other features, we must consider the use cases:

1. A really dumb Bash script that shells out to curl
2. A Python script
3. A sophisticated desktop app in C#, Rust, or C++
4. Eventually replacing the fixed API keys used in ptth_server

For the Bash script, rotation will probably be difficult, and I'm okay if our support for that is merely "It'll work for 30 days at a time, then you need to rotate keys manually."

For the Python script, rotation could be automated, but cryptography is still probably difficult. I think some AWS services require actual crypto keys, and not just high-entropy password keys.

For the sophisticated desktop app, cryptography is on the table, but this is also the least likely use case to ever happen.
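
## Rough sketches (non-normative)

The feature flags haven't been designed yet. As a rough sketch, assuming two independent flags (the names below are placeholders, not a decided schema), the additions to `ptth_relay.toml` might look like:

```toml
# Hypothetical flags - names are placeholders, not a decided schema.

# Master switch for the new scraper auth route.
# Off by default so existing deployments are unaffected.
scraper_auth = false

# Dev mode: allow anonymous users to generate scraper keys.
# Must never be enabled in prod.
dev_mode = false
```

Keeping these as two separate flags would let the scraper route be enabled in prod without also opening up anonymous key generation.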
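
If the relay generates the keys (per the first open question), a minimal sketch of generation plus storage-side hashing could look like the following. The crate choices (`rand`, `blake3`), the field names, and the 30-day lifetime are assumptions for illustration, not decisions:

```rust
// Sketch only: crate choices (rand, blake3), field names, and the 30-day
// lifetime are assumptions for illustration.
use rand::RngCore;

/// What the relay would persist in SQLite - never the plaintext key.
struct StoredKey {
    key_hash: String,   // hex-encoded hash of the plaintext key
    expires_unix: i64,  // expiration as a Unix timestamp
}

/// Generate a high-entropy key server-side. The plaintext is shown to the
/// user exactly once; only the hash is stored.
fn generate_scraper_key(now_unix: i64) -> (String, StoredKey) {
    let mut bytes = [0u8; 32];
    rand::thread_rng().fill_bytes(&mut bytes);

    // 64 hex chars of plaintext for the user to paste into their scraper
    let plaintext: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();

    let stored = StoredKey {
        key_hash: blake3::hash(plaintext.as_bytes()).to_hex().to_string(),
        expires_unix: now_unix + 30 * 24 * 60 * 60, // assumed 30-day lifetime
    };

    (plaintext, stored)
}
```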
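
On the request path, checking a presented key is then a hash-and-lookup, with expired rows treated as missing. Again a sketch - the table and column names are invented, and `rusqlite` is just one plausible SQLite binding:

```rust
// Sketch only: table/column names are invented; assumes the rusqlite crate.
use rusqlite::{params, Connection, OptionalExtension};

/// True if `presented_key` hashes to a known, non-expired row.
fn check_scraper_key(
    db: &Connection,
    presented_key: &str,
    now_unix: i64,
) -> rusqlite::Result<bool> {
    let key_hash = blake3::hash(presented_key.as_bytes()).to_hex().to_string();

    let expires_unix: Option<i64> = db
        .query_row(
            "SELECT expires_unix FROM scraper_keys WHERE key_hash = ?1",
            params![key_hash],
            |row| row.get(0),
        )
        .optional()?;

    Ok(matches!(expires_unix, Some(t) if now_unix < t))
}
```

Because the lookup is by hash, revoking a key is just deleting its row, and the same table could later hold the server keys too, in line with "Design the DB so that the servers can share it one day."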
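
For the two curl milestones and the "really dumb Bash script" use case, the client side should stay a one-liner. The header name and path below are placeholders, since the API shape is not yet designed:

```bash
# Placeholder header name and path - the real API shape is still undecided.
curl -H "X-ApiKey: $SCRAPER_KEY" https://ptth-relay.example.com/scraper/some-endpoint
```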