📝 docs: plan remaining tasks on scraper API

main
_ 2020-12-13 05:04:04 +00:00
parent 4c52d88be0
commit 78bffc74c3
2 changed files with 46 additions and 35 deletions

@@ -5,47 +5,18 @@ An HTTP server that can run behind a firewall by connecting out to a relay.
``` ```
Outside the tunnel Outside the tunnel
+--------+ +------------+ +-------------+ +--------+ +------------+ +-------------+
| Client | ------> | PTTH relay | <----- | PTTH server | | Client | >>> | PTTH relay | <<< | PTTH server |
+--------+ +------------+ +-------------+ +--------+ +------------+ +-------------+
Inside the tunnel Inside the tunnel
+--------+ -------------- +-------------+ +--------+ -------------- +-------------+
| Client | ----------------------------> | Server | | Client | >>> >>> >>> | Server |
+--------+ -------------- +-------------+ +--------+ -------------- +-------------+
``` ```
The server can run behind a firewall, because it is actually a special HTTP
client.
## Glossary
(sorted alphabetically)
- **Backend API** - The HTTP API that ptth_server uses to establish the tunnel.
Noted in the code with the cookie "7ZSFUKGV".
- **Client** - Any client that connects to ptth_relay in order to reach a
destination server. Admins must terminate TLS between
ptth_relay and all clients.
- **Frontend** - The human-friendly, browser-friendly HTTP+HTML interface
that ptth_relay serves directly or relays from ptth_server.
This interface has no auth by default. Admins must provide their own auth
in front of ptth_relay. OAuth2 is recommended.
- **ptth_file_server** - A standalone file server. It uses the same code
as ptth_server, so production environments don't need it.
- **ptth_relay** or **Relay server** - The ptth_relay app. This must run on a server
that can accept incoming HTTP connections.
- **ptth_server** or **Destination server** - The ptth_server app. This should run behind
a firewall. It will connect out to the relay and accept incoming connections
through the PTTH tunnel.
- **Scraper API** - An optional HTTP API for scraper clients to access ptth_relay and
the destination servers using machine-friendly auth.
- **Tripcode** - The base64 hash of a server's private API key. When adding
a new server, the tripcode must be copied to ptth_relay.toml on the relay
server.
- **Tunnel** - The reverse HTTP tunnel between ptth_relay and ptth_server.
ptth_server connects out to ptth_relay, then ptth_relay forwards incoming
connections to ptth_server through the tunnel.
## Configuration
ptth_server:
@@ -109,6 +80,35 @@ proxy_request_buffering off;
proxy_buffering off;
``` ```
## Glossary
(sorted alphabetically)
- **Backend API** - The HTTP API that ptth_server uses to establish the tunnel.
Noted in the code with the cookie "7ZSFUKGV".
- **Client** - Any client that connects to ptth_relay in order to reach a
destination server. Admins must terminate TLS between
ptth_relay and all clients.
- **Frontend** - The human-friendly, browser-friendly HTTP+HTML interface
that ptth_relay serves directly or relays from ptth_server.
This interface has no auth by default. Admins must provide their own auth
in front of ptth_relay. OAuth2 is recommended.
- **ptth_file_server** - A standalone file server. It uses the same code
as ptth_server, so production environments don't need it.
- **ptth_relay** or **Relay server** - The ptth_relay app. This must run on a server
that can accept incoming HTTP connections.
- **ptth_server** or **Destination server** - The ptth_server app. This should run behind
a firewall. It will connect out to the relay and accept incoming connections
through the PTTH tunnel.
- **Scraper API** - An optional HTTP API for scraper clients to access ptth_relay and
the destination servers using machine-friendly auth.
- **Tripcode** - The base64 hash of a server's private API key. When adding
a new server, the tripcode must be copied to ptth_relay.toml on the relay
server.
- **Tunnel** - The reverse HTTP tunnel between ptth_relay and ptth_server.
ptth_server connects out to ptth_relay, then ptth_relay forwards incoming
connections to ptth_server through the tunnel.
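
The **Tripcode** entry above describes a base64-encoded hash of a server's private API key. As a rough illustration only: the glossary does not say which hash function is used, so this sketch substitutes SHA-256 as a stand-in, and the function name `tripcode` is hypothetical.

```python
import base64
import hashlib


def tripcode(api_key: bytes) -> str:
    """Return a base64-encoded hash of a private API key.

    ASSUMPTION: SHA-256 is used here purely as a placeholder hash.
    The hash that ptth_server really uses is not stated in the docs.
    """
    digest = hashlib.sha256(api_key).digest()
    # Standard base64 alphabet, returned as text so it can be pasted
    # into a config file such as ptth_relay.toml.
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Hypothetical key, for demonstration only.
    print(tripcode(b"example-private-api-key"))
```

The point of the design is that only the hash is copied to the relay, so the private key itself never has to leave the destination server.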
## Comparison with normal HTTP
Normal HTTP:

@@ -38,6 +38,7 @@ stronger is ready.
- (X) (POC) Test with curl
- (X) Clean up scraper endpoint
- (X) Add (almost) end-to-end tests for scraper endpoint
- ( ) Add real scraper endpoints
- ( ) Manually create SQLite DB for scraper keys, add 1 hash
- ( ) Impl DB reads
- ( ) Remove scraper key from config file
@@ -66,6 +67,16 @@ Design the DB so that the servers can share it one day.
Design the API so that new types of auth / keys can be added one day, and
the old ones deprecated.
Endpoints needed:
- Query server list
- Query directory in server
- GET file with byte range (identical to frontend file API)
These will all be JSON for now since Python, Rust, C++, C#, etc. can handle it.
For compatibility with wget spidering, I _might_ do XML or HTML that's
machine-readable. We'll see.
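
To make the planned endpoints a little more concrete, here is a minimal client-side sketch. Everything specific in it is an assumption: the JSON shape (`{"servers": [{"name": ...}]}`) and the helper names `parse_server_list` and `range_header` are hypothetical, since the real response format has not been designed yet. Only the `Range: bytes=start-end` header syntax comes from standard HTTP.

```python
import json


def parse_server_list(body: str) -> list:
    """Extract server names from a JSON server-list response.

    ASSUMPTION: the response looks like {"servers": [{"name": "..."}]}.
    This shape is made up for illustration.
    """
    doc = json.loads(body)
    return [server["name"] for server in doc["servers"]]


def range_header(start: int, end: int) -> dict:
    """Build a standard HTTP Range header for bytes start..end, inclusive."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    # Hypothetical response body from a "query server list" endpoint.
    sample = '{"servers": [{"name": "alpha"}, {"name": "beta"}]}'
    print(parse_server_list(sample))
    # Headers for fetching the first KiB of a file.
    print(range_header(0, 1023))
```

Keeping the responses as plain JSON objects, rather than bare arrays, would leave room to add fields later without breaking existing scraper clients.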
## Open questions
**Who generates the API key? The scraper client, or the PTTH relay server?**