Use case: Run a ZIPNet DC‑net node (anytrust server + optional aggregator) in a TEE
What ZIPNet is
ZIPNet is a DC‑net–style anonymous broadcast system introduced in the paper "ZIPNet: Low-bandwidth anonymous broadcast from (dis)Trusted Execution Environments". It supports many anytrust servers while offloading inbound client aggregation to untrusted relays/aggregators. Server work is symmetric crypto (PRF via AES‑CTR + XOR) and simple signatures; no MPC on the hot path. In the paper's eval, ZIPNet cuts server runtime by 4.2×–7.6× vs prior art (CPU Blinder) and makes cover traffic cheap: ≈84 bytes of extra server bandwidth per non‑talking client per round (with 160 B messages and 1,024 talkers). Privacy does not depend on TEEs; TEEs are used for DoS resistance and integrity ("falsifiable trust").
Software / reproducibility.
- Reference implementation (paper): Rust; client in Intel SGX (Teaclave SGX SDK); aggregator & servers in Rust using aes-ctr, hkdf, x25519-dalek, ed25519-dalek; servers use AES‑NI.
- Community WIP Go implementation (interfaces + load generator): Ruteri/go-zipnet on GitHub (includes aggregator hierarchy knobs and an example config).
- Baselines used by the paper: CPU Blinder (repo: cryptobiu/MPCAnonymousBloging) and OrgAn (repo: zhtluo/organ).
Why a TEE is needed for this workload
- DoS & integrity enforcement on clients: client TEEs enforce per‑round participation limits & message format; violations are detectable. (Privacy holds even if the client TEE fails.)
- Optional hardening for servers: running anytrust servers in a TEE adds attested integrity and key protection without changing the anytrust privacy model.
Parameters & notation
Let:
- M = registered clients, Mᵣ = participants this round (talkers + cover),
- N = talkers per round, |m| = message size (bytes),
- B = N · |m| = broadcast payload per round (bytes),
- S = size of the schedule vector (bytes), i.e., #sched_slots × footprint size in bytes (e.g., the Go repo's SchedulingSlots=4096 and FootprintBits=64 ⇒ S = 4096 × 8 B = 32 KiB),
- N_S = number of anytrust servers.
Message sizes used in the paper’s eval: ~400 B (Bitcoin), ~108 B (Ethereum), ~2 KB (Zcash), ~2.38 KB (Monero), and 160 B for microblog examples.
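To make the notation concrete, here is a small sizing helper (a sketch in Go; the constants are the paper's 160 B / 1,024‑talker example and the Go repo's SchedulingSlots=4096 / FootprintBits=64 values, not normative parameters):

```go
package main

import "fmt"

// Sizing helpers for the notation above (all values in bytes).
func broadcastBytes(nTalkers, msgLen int) int { return nTalkers * msgLen } // B = N * |m|

func scheduleBytes(slots, footprintBits int) int { return slots * footprintBits / 8 } // S

// Per-round PRF/XOR volume on an anytrust server scales with Mr * (S + B).
func serverStreamBytes(mr, s, b int) int { return mr * (s + b) }

func main() {
	const (
		nTalkers      = 1024 // N
		msgLen        = 160  // |m|
		totalClients  = 8000 // Mr (talkers + cover)
		schedSlots    = 4096 // SchedulingSlots (Go repo example)
		footprintBits = 64   // FootprintBits (Go repo example)
	)
	b := broadcastBytes(nTalkers, msgLen)
	s := scheduleBytes(schedSlots, footprintBits)
	fmt.Printf("B = %d B (%d KiB), S = %d B (%d KiB)\n", b, b/1024, s, s/1024)
	fmt.Printf("server PRF/XOR volume ≈ %d MB/round at Mr = %d\n",
		serverStreamBytes(totalClients, s, b)/1_000_000, totalClients)
}
```

At these example values this prints B = 160 KiB, S = 32 KiB, and roughly 1.57 GB of per‑round PRF/XOR volume for 8,000 participants, which is the quantity the server compute model below measures.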
Role A — Anytrust server
Hot path per round (Algorithm 3).
For each userPK that participated this round (the aggregator sends userPKs, a schedule aggregate, and a message aggregate), the server:
- derives (pad1, pad2) = KDF(shared_secret[userPK], round, publishedSchedule),
- XORs pad1 into the schedule aggregate and pad2 into the message aggregate, then
- signs the output. Work scales with Mᵣ and (S+B).
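A minimal Go sketch of this per‑participant loop, assuming HKDF‑SHA256 as the KDF, AES‑CTR for pad expansion, and Ed25519 for the final signature (the primitives listed under "Software / reproducibility"); function and field names are illustrative, not the reference implementation's API:

```go
package server

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/binary"
	"io"

	"golang.org/x/crypto/hkdf"
)

// derivePads expands (pad1, pad2) = KDF(sharedSecret, round, publishedSchedule)
// into schedLen + msgLen bytes of AES-CTR keystream.
func derivePads(sharedSecret []byte, round uint64, publishedSchedule []byte,
	schedLen, msgLen int) (pad1, pad2 []byte, err error) {

	info := make([]byte, 8+len(publishedSchedule))
	binary.BigEndian.PutUint64(info, round)
	copy(info[8:], publishedSchedule)

	// HKDF-SHA256 -> 32-byte AES key + 16-byte IV, then CTR keystream over zeros.
	keyIV := make([]byte, 48)
	if _, err = io.ReadFull(hkdf.New(sha256.New, sharedSecret, nil, info), keyIV); err != nil {
		return nil, nil, err
	}
	block, err := aes.NewCipher(keyIV[:32])
	if err != nil {
		return nil, nil, err
	}
	pads := make([]byte, schedLen+msgLen)
	cipher.NewCTR(block, keyIV[32:]).XORKeyStream(pads, pads)
	return pads[:schedLen], pads[schedLen:], nil
}

// UnblindAndSign XORs every participant's pads out of the aggregates received
// from the aggregator, then signs the result. secrets holds the shared secret
// for each participating userPK.
func UnblindAndSign(priv ed25519.PrivateKey, secrets map[string][]byte, round uint64,
	publishedSchedule, schedAgg, msgAgg []byte) ([]byte, error) {

	for _, secret := range secrets { // one KDF + (S+B) bytes of PRF and XOR per participant
		pad1, pad2, err := derivePads(secret, round, publishedSchedule, len(schedAgg), len(msgAgg))
		if err != nil {
			return nil, err
		}
		xorInto(schedAgg, pad1)
		xorInto(msgAgg, pad2)
	}
	out := make([]byte, 0, len(schedAgg)+len(msgAgg))
	out = append(append(out, schedAgg...), msgAgg...)
	return ed25519.Sign(priv, out), nil // one signature per round
}

func xorInto(dst, src []byte) {
	for i := range dst {
		dst[i] ^= src[i]
	}
}
```

The loop body is exactly what the compute model below counts: one KDF plus (S + B) bytes of keystream and XOR per participant, and one signature per round.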
Compute model (what to measure):
- PRF expansion bytes/round ≈ Mᵣ · (S + B) bytes (AES‑CTR output).
- XOR bytes/round ≈ Mᵣ · (S + B) (pads) + ~B (final combine).
- KDF ops/round: Mᵣ HKDFs (or your PRF’s equivalent).
- Sig verify/sign: 1 verify (aggregator) + 1 sign per round (e.g., Ed25519).
Measure AES‑CTR GB/s (in‑TEE), XOR GB/s, HKDF/s, and Ed25519 verify/s. (The paper's reference server used AES‑NI to accelerate the PRNG/CTR expansion.)
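A sketch of the two central throughput microbenchmarks (Go testing.B style; the buffer size is S + B from the 160 B / 1,024‑talker example). Run the same loops inside and outside the TEE to expose enclave overhead:

```go
package bench

import (
	"crypto/aes"
	"crypto/cipher"
	"testing"
)

const bufLen = 32*1024 + 160*1024 // S + B for the 160 B / 1,024-talker example

// AES-CTR keystream expansion over an (S+B)-sized buffer (pad-derivation cost).
func BenchmarkAESCTR(b *testing.B) {
	block, _ := aes.NewCipher(make([]byte, 32))
	stream := cipher.NewCTR(block, make([]byte, 16))
	buf := make([]byte, bufLen)
	b.SetBytes(bufLen)
	for i := 0; i < b.N; i++ {
		stream.XORKeyStream(buf, buf)
	}
}

// Plain XOR of one pad into one aggregate (the combine cost).
func BenchmarkXOR(b *testing.B) {
	dst := make([]byte, bufLen)
	src := make([]byte, bufLen)
	b.SetBytes(bufLen)
	for i := 0; i < b.N; i++ {
		for j := range dst {
			dst[j] ^= src[j]
		}
	}
}
```

With SetBytes set, go test -bench . reports throughput (MB/s) for each loop directly.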
Memory:
- State: O(M) shared‑secret/ratchet entries (e.g., 32‑byte keys + small metadata) plus small sealed state if you run inside a TEE.
- Buffers: O(B) for the broadcast vector(s) and O(S) for the schedule vector. (With N=1,024 and |m|=160 B, B≈160 KiB.)
Networking (empirically validated shape):
- Ingress per server per round: ≈ B + S + IDs + signature—the paper reports ~535,607 B at N=1,024 talkers & 0 cover, plus ~84 B per additional non‑talker (cover) at 160 B messages. So with 1,024 talkers and total clients 8,000 (i.e., 6,976 cover), a server ingests ≈ 1,123,952 B/round.
- Egress: one signed final broadcast (≈B + small metadata); inter‑server traffic is small (“a single broadcast message”).
Latency (from the paper's WAN runs): see "End‑to‑end round time" below; server time rises roughly linearly with cover (Mᵣ grows while B stays fixed) and roughly quadratically with talkers (both Mᵣ and B = N·|m| grow with N).
What to publish (server microbench & round metrics):
- AES‑CTR GB/s and XOR GB/s inside the TEE on (S+B)‑sized buffers.
- Per‑round CPU% and P99 vs N, |m|, Mᵣ, N_S.
- Bandwidth per round vs talkers (slope ≈|m|) and vs cover (≈84 B/non‑talker at 160 B, 1,024 talkers—re‑check for your S/encoding).
Role B — Aggregator (untrusted; optional TEE for ops assurance)
Hot path per round (Algorithm 2).
Validate signature, XOR client (or lower‑tier aggregate) payloads into a single schedule vector and a single B‑byte message vector, sign the running aggregate. It’s XOR + bookkeeping.
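A sketch of that aggregation step in Go, assuming Ed25519 client signatures over the submitted schedule‑plus‑message payload; the Submission struct and its fields are illustrative, not ZIPNet's wire format:

```go
package aggregator

import (
	"crypto/ed25519"
	"errors"
)

// Submission is one client's (or lower-tier aggregator's) blinded contribution.
type Submission struct {
	Pub   ed25519.PublicKey
	Sched []byte // S bytes (schedule vector share)
	Msg   []byte // B bytes (message vector share)
	Sig   []byte // signature over Sched || Msg
}

// Aggregate verifies each submission and XORs it into the running vectors;
// the participant list, both aggregates, and the aggregator's own signature
// are then forwarded to every anytrust server.
func Aggregate(subs []Submission, schedAgg, msgAgg []byte) ([]ed25519.PublicKey, error) {
	participants := make([]ed25519.PublicKey, 0, len(subs))
	for _, s := range subs {
		signed := append(append([]byte{}, s.Sched...), s.Msg...)
		if !ed25519.Verify(s.Pub, signed, s.Sig) {
			return nil, errors.New("bad submission signature")
		}
		for i := range schedAgg {
			schedAgg[i] ^= s.Sched[i]
		}
		for i := range msgAgg {
			msgAgg[i] ^= s.Msg[i]
		}
		participants = append(participants, s.Pub)
	}
	return participants, nil
}
```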
Scaling & what to measure:
- Runtime is linear in payload size B for fixed #clients; insensitive to the count of clients when B is fixed (it’s memory‑bandwidth bound). Publish XOR GiB/s vs B and P99 contribution under realistic WAN RTTs.
Networking:
- Ingress: many small client packets (or aggregates).
- Egress: per round, a single aggregate to each server; with one (root) aggregator this is ~N_S × (B + S + IDs + sig). (The Go implementation exposes hierarchy levels if you want a tree.)
Role C — Client TEE (mandatory TEE role)
Hot path per round (Algorithm 1 in‑enclave):
- Rate‑limit/format: compute footprint slot + attestation tag;
- Pad & inject: PRF to create slot pad; XOR message or zeros (cover);
- Ratchet: HKDF update.
Client runtime scales with N_S (one OTP of length B per server) and |m|; in the paper's evals it stayed sub‑second for the small message sizes considered. Measure AES‑CTR/HKDF GB/s, Ed25519 sign/s, and x25519 handshakes/s.
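A Go sketch of the pad‑and‑ratchet portion for one anytrust server (HKDF‑SHA256 ratchet plus AES‑CTR pad, matching the primitives above; the footprint/attestation‑tag logic and exact slot layout are omitted, and the helper name is illustrative):

```go
package client

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/sha256"
	"io"

	"golang.org/x/crypto/hkdf"
)

// PadAndRatchet fills outVec (the client's contribution for one server,
// initially all zeros) with this round's pad, XORs the message into its
// scheduled slot (an empty msg yields pure cover traffic), and returns the
// next ratchet key. One call per anytrust server, hence the N_S scaling.
func PadAndRatchet(ratchetKey []byte, slotOffset int, msg, outVec []byte) (next []byte, err error) {
	// One HKDF call yields the round's AES key + IV and the next ratchet key.
	buf := make([]byte, 32+16+32)
	if _, err = io.ReadFull(hkdf.New(sha256.New, ratchetKey, nil, []byte("round pad")), buf); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(buf[:32])
	if err != nil {
		return nil, err
	}
	cipher.NewCTR(block, buf[32:48]).XORKeyStream(outVec, outVec) // pad over zeros
	for i, c := range msg {
		outVec[slotOffset+i] ^= c // inject message (or nothing, for cover)
	}
	return buf[48:], nil // persist via sealed state; never reuse the old key
}
```

The client repeats this once per server and persists only the returned ratchet key, which fits the sealed‑state‑only model described next.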
State model: sealed state only; no enclave monotonic counters or trusted timers required (fits “falsifiable trust”).
Concrete sizing examples (to anchor hardware targets)
Example 1 (paper's table): N=1,024 talkers, |m|=160 B ⇒ B = 160 KiB.
Per server, bandwidth/round (ingress + small control) was:
- 0 cover (Mᵣ=1,024): ~535,607 B
- 8,000 total clients (6,976 cover): ~1,123,952 B (≈ 1.12 MB/round).
At 1 s rounds, budget ≈ 1.12 MB/s per server for this channel.
Example 2 (aggregator egress sizing): same scenario, N_S=10 servers ⇒ root aggregator egress ≈ 10 × 1.12 MB ≈ 11.2 MB/round (≈ 90 Mb/s at 1 s rounds), plus client ingress. (Adjust if S or ID encoding changes.)
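The arithmetic behind both examples, as a runnable check (Go; the 535,607 B base and 84 B per cover client are the paper's figures for this specific 160 B / 1,024‑talker configuration, so re‑measure them under your own encoding):

```go
package main

import "fmt"

func main() {
	const (
		baseIngress  = 535_607 // B/round per server: 1,024 talkers, 0 cover, 160 B messages
		perCover     = 84      // extra B/round per non-talking client (same configuration)
		totalClients = 8_000
		talkers      = 1_024
		nServers     = 10 // N_S in Example 2
	)
	cover := totalClients - talkers
	perServer := baseIngress + perCover*cover
	fmt.Printf("per-server ingress ≈ %.2f MB/round (≈ %.2f MB/s at 1 s rounds)\n",
		float64(perServer)/1e6, float64(perServer)/1e6)
	rootEgress := nServers * perServer
	fmt.Printf("root aggregator egress ≈ %.1f MB/round (≈ %.0f Mb/s at 1 s rounds)\n",
		float64(rootEgress)/1e6, float64(rootEgress)*8/1e6)
}
```

This reproduces the ≈1.12 MB/round per server and ≈11.2 MB/round (≈90 Mb/s) root‑aggregator figures above.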
Message sizes to provision for (paper): ~400 B (Bitcoin), ~108 B (Ethereum), ~2 KB (Zcash), ~2.38 KB (Monero). Use these to sweep B = N·|m|.
Relative compute vs Blinder: ZIPNet shows 4.2×–7.6× lower server runtime than CPU Blinder on the same hardware (WAN), with 5–10 servers in their setup. Hardware should favor symmetric‑crypto throughput & memory bandwidth, not big‑integer/MPC accelerators.
End‑to‑end latency target (what to report)
Report total round time (aggregator → servers → final broadcast) under WAN and LAN placements, sweeping Mᵣ, N, |m|, N_S. The paper’s Figure 5 shows end‑to‑end round time rising with users and only slightly exceeding the sum of aggregator+server microbenches (due to network fan‑out). Publish your target round duration alongside microbench results.
What we ask from “trust‑minimized TEE” hardware for this workload
Crypto + memory:
- High‑throughput AES‑CTR (or a fast PRF) and large XOR bandwidth inside the TEE; strong DRBG; HKDF; constant‑time primitives. (The reference uses AES‑NI; similar hardware offload on ARM/others is desirable.)
Enclave I/O:
- Low‑overhead, high‑pps secure I/O so the server/aggregator can process bursty (B+S) payloads entirely inside enclave hot paths (avoid frequent exits).
Attestation:
- Cheap remote attestation at client scale; ability to verify peer attestation in‑enclave (client→aggregator/server). Keep sealed storage simple (no monotonic counters).
NIC targets:
- Aggregator: sustain ~N_S × (B+S+IDs) egress per round (plus client ingress).
- Servers: sustain ≈(B+S+IDs) ingress and ≈B broadcast egress per round, plus ~84 B per non‑talker (for the 160 B/1,024‑talker configuration; re‑measure under your encoding).
Benchmarks we care about
- End‑to‑end round time (WAN/LAN): aggregator→servers→final broadcast; sweep Mᵣ, N, |m|, N_S. Compare measured round time to the sum of your server/aggregator microbenches to surface I/O overhead.
- TEE microbenches (server & client):
- AES‑CTR GB/s on buffers of sizes B and S,
- XOR GB/s on the same sizes,
- HKDF keys/s, Ed25519 verify/s, x25519 handshakes/s.
Structure loops to match Algorithm 3 (server) and Algorithm 1 (client); a round‑shaped benchmark sketch appears at the end of this section.
- Bandwidth scaling (server): re‑plot bytes/round vs cover set size to validate the ~84 B per extra non‑talker slope at your message size and encoding; re‑plot vs talkers to show linear growth with slope ≈ |m| and your constant terms. (Replicate Table 3 with your stack.)
- Aggregator scaling: XOR‑only runtime vs B and egress fan‑out cost vs N_S; report P99 contribution to round time under WAN RTTs.
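As referenced under the TEE microbenches above, a round‑shaped server benchmark sketch (same Go testing harness and primitive assumptions as the AES‑CTR/XOR microbenchmarks under Role A; Mᵣ, S, and B are the 8,000‑client / 160 B / 1,024‑talker example, not fixed constants):

```go
package bench

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/sha256"
	"encoding/binary"
	"io"
	"testing"

	"golang.org/x/crypto/hkdf"
)

// BenchmarkServerRound mirrors Algorithm 3's loop shape: per participant, one
// KDF, (S+B) bytes of pad expansion, and an XOR into the aggregates, so each
// b.N iteration approximates one server round.
func BenchmarkServerRound(b *testing.B) {
	const (
		mr     = 8000               // participants this round (talkers + cover)
		bufLen = 32*1024 + 160*1024 // S + B
	)
	secret := make([]byte, 32)
	agg := make([]byte, bufLen)
	pad := make([]byte, bufLen)
	b.SetBytes(int64(mr) * bufLen)
	for i := 0; i < b.N; i++ {
		for u := 0; u < mr; u++ {
			info := make([]byte, 12)
			binary.BigEndian.PutUint64(info, uint64(i))
			binary.BigEndian.PutUint32(info[8:], uint32(u))
			keyIV := make([]byte, 48)
			io.ReadFull(hkdf.New(sha256.New, secret, nil, info), keyIV)
			block, _ := aes.NewCipher(keyIV[:32])
			cipher.NewCTR(block, keyIV[32:]).XORKeyStream(pad, pad)
			for j := range agg {
				agg[j] ^= pad[j]
			}
		}
	}
}
```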