Good insight — you're right: **disk hashes alone** are not enough; the sync client must produce the **entire object graph** (Disk → Partition → RootDir → Dirs → Files → Chunks), and the server must store and verify these meta objects. I'm updating the protocol so that it is complete, binary, and implementable by an AI — with clear message formats, binary layouts for the meta objects, and pseudocode for the client **(what it must generate/compute)** and for the server **(how it must verify & persist)**. I'm keeping it pragmatic: no compression extensions, no encryption — everything simple and extensible later.

---

# Quick Overview (What Happens)

1. The client scans the disk/partition, chunks files (4 MiB), and computes the chunk hashes (BLAKE3-256, 32 B).
2. The client builds binary meta objects: FileObj, DirObj, PartitionObj, DiskObj, SnapshotObj (each a body → `meta_hash = BLAKE3(body)`).
3. The client contacts the server over TCP (HELLO → AUTH), asks in batches which chunks/metas are missing, and sends only the missing ones.
4. Finally it sends the snapshot commit; the server validates it and writes the snapshot reference (the server owns the pointer).

---

# General Message Structure (Envelopes)

Every message: fixed 24-byte header + payload:

```
struct MsgHeader {
    u8  cmd;            // command code (see table)
    u8  flags;          // reserved
    u8  reserved[2];
    u8  session_id[16]; // all zeros before AUTH_OK
    u32 payload_len;    // LE
}
```

Response messages use the same envelope.

---

# Command Codes (u8)

* 0x01 HELLO
* 0x02 HELLO_OK
* 0x10 AUTH_USERPASS
* 0x11 AUTH_CODE
* 0x12 AUTH_OK
* 0x13 AUTH_FAIL
* 0x20 BATCH_CHECK_CHUNK
* 0x21 CHECK_CHUNK_RESP
* 0x22 SEND_CHUNK
* 0x23 CHUNK_OK
* 0x24 CHUNK_FAIL
* 0x30 BATCH_CHECK_META
* 0x31 CHECK_META_RESP
* 0x32 SEND_META
* 0x33 META_OK
* 0x34 META_FAIL
* 0x40 SEND_SNAPSHOT (snapshot commit)
* 0x41 SNAPSHOT_OK
* 0x42 SNAPSHOT_FAIL
* 0xFF CLOSE

---

# Key Design Decisions (Brief)

* **Hashes**: BLAKE3-256 (32 bytes). The client computes all hashes (chunks + meta bodies).
* **Chunks on the wire**: uncompressed (simple & reliable). Compression would be a later extension.
* **Meta object body**: compact binary structures (see below). `meta_hash = BLAKE3(body)`.
* **Batch checks**: the client asks in batches which chunks/metas are missing (the server returns only the missing hashes). This minimizes RTTs.
* **Server persistence**: `chunks/<ab>/<cd>/<hash>.chk`, `meta/<type>/<ab>/<cd>/<hash>.meta`. The server manages the snapshot pointers (e.g. `machines/<computer_id>/snapshots/<snapshot_id>.ref`).
* **Snapshot commit**: the server validates the object graph before completion; if anything is missing, it returns the list (SNAPSHOT_FAIL with a missing list).

---

# Binary Payload Formats

All multi-byte counters/lengths are little-endian (`LE`).

## A) BATCH_CHECK_CHUNK (Client → Server)

```
payload:
    u32 count
    for i in 0..count:
        u8[32] chunk_hash
```

## CHECK_CHUNK_RESP (Server → Client)

```
payload:
    u32 missing_count
    for i in 0..missing_count:
        u8[32] missing_chunk_hash
```

## SEND_CHUNK (Client → Server)

```
payload:
    u8[32] chunk_hash
    u32 size
    u8[size] data   // raw chunk bytes
```

The server computes BLAKE3(data) and compares it to chunk_hash; if they are equal → it stores the chunk.

## B) BATCH_CHECK_META

```
payload:
    u32 count
    for i in 0..count:
        u8 meta_type    // 1=file, 2=dir, 3=partition, 4=disk, 5=snapshot
        u8[32] meta_hash
```

## CHECK_META_RESP

```
payload:
    u32 missing_count
    for i in 0..missing_count:
        u8 meta_type
        u8[32] meta_hash
```

## SEND_META

```
payload:
    u8 meta_type        // 1..5
    u8[32] meta_hash
    u32 body_len
    u8[body_len] body_bytes  // the canonical body; the server will BLAKE3(body_bytes) and compare to meta_hash
```

## SEND_SNAPSHOT (Commit)

```
payload:
    u8[32] snapshot_hash
    u32 body_len
    u8[body_len] snapshot_body  // snapshot body, same encoding as meta (server validates body hash == snapshot_hash)
```

The server validates that snapshot_body references only existing meta objects (recursive/direct check). If OK → it creates the persistent snapshot pointer and replies SNAPSHOT_OK; otherwise it replies SNAPSHOT_FAIL with the missing list (same format as CHECK_META_RESP).
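These fixed-width, little-endian layouts map directly onto byte-level packing. As a minimal sketch, here is how the 24-byte envelope and a BATCH_CHECK_CHUNK payload could be built and parsed in Python (function names like `pack_msg` are illustrative, not part of the protocol):

```python
import struct

# MsgHeader: u8 cmd, u8 flags, u8 reserved[2], u8 session_id[16], u32 payload_len (LE)
HDR_FMT = "<BB2s16sI"
HDR_SIZE = struct.calcsize(HDR_FMT)  # 24 bytes

CMD_BATCH_CHECK_CHUNK = 0x20

def pack_msg(cmd: int, session_id: bytes, payload: bytes) -> bytes:
    """Build the 24-byte envelope followed by the payload."""
    return struct.pack(HDR_FMT, cmd, 0, b"\x00\x00", session_id, len(payload)) + payload

def pack_batch_check_chunk(hashes: list) -> bytes:
    """BATCH_CHECK_CHUNK payload: u32 count, then count * 32-byte chunk hashes."""
    assert all(len(h) == 32 for h in hashes)
    return struct.pack("<I", len(hashes)) + b"".join(hashes)

def unpack_header(buf: bytes):
    """Parse an envelope; returns (cmd, flags, session_id, payload_len)."""
    cmd, flags, _reserved, session_id, payload_len = struct.unpack(HDR_FMT, buf[:HDR_SIZE])
    return cmd, flags, session_id, payload_len
```

CHECK_CHUNK_RESP and the meta-check messages follow the same pattern with an extra `meta_type` byte per entry.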
---

# Meta Object Binary Formats (Bodies)

> The client produces `body_bytes` for each meta object; `meta_hash = BLAKE3(body_bytes)`.

### FileObj (meta_type = 1)

```
FileObjBody:
    u8  version (1)
    u32 fs_type_code    // e.g. 1=ext*, 2=ntfs, 3=fat32 (enum)
    u64 size
    u32 mode            // POSIX mode on Linux; 0 for filesystems without one
    u32 uid
    u32 gid
    u64 mtime_unixsec
    u32 chunk_count
    for i in 0..chunk_count:
        u8[32] chunk_hash
    // optional: xattrs/ACLs TLV (not in v1)
```

### DirObj (meta_type = 2)

```
DirObjBody:
    u8  version (1)
    u32 entry_count
    for each entry:
        u8  entry_type  // 0 = file, 1 = dir, 2 = symlink
        u16 name_len
        u8[name_len] name (UTF-8)
        u8[32] target_meta_hash
```

### PartitionObj (meta_type = 3)

```
PartitionObjBody:
    u8  version (1)
    u32 fs_type_code
    u8[32] root_dir_hash  // DirObj hash for the root of this partition
    u64 start_lba
    u64 end_lba
    u8[16] type_guid      // zeroed if unused
```

### DiskObj (meta_type = 4)

```
DiskObjBody:
    u8  version (1)
    u32 partition_count
    for i in 0..partition_count:
        u8[32] partition_hash
    u64 disk_size_bytes
    u16 serial_len
    u8[serial_len] serial_bytes
```

### SnapshotObj (meta_type = 5)

```
SnapshotObjBody:
    u8  version (1)
    u64 created_at_unixsec
    u32 disk_count
    for i in 0..disk_count:
        u8[32] disk_hash
    // optional: snapshot metadata (user, note) as a TLV extension later
```

---

# Flow (Pseudocode) — **Client Side (Sync Client)**

(Computes all hashes; sends only what is missing, in batches.)

```text
FUNCTION client_backup(tcp_conn, computer_id, disks):
    send_msg(HELLO{client_type=0, auth_type=0})
    await HELLO_OK
    send_msg(AUTH_USERPASS{username, password})
    resp = await
    if resp != AUTH_OK: abort
    session_id = resp.session_id

    // traverse per partition to limit memory
    snapshot_disk_hashes = []
    FOR disk IN disks:
        partition_hashes = []
        FOR part IN disk.partitions:
            root_dir_hash = process_dir(part.root_path, tcp_conn)
            part_body = build_partition_body(part.fs_type, root_dir_hash, part.start, part.end, part.guid)
            part_hash = blake3(part_body)
            batch_check_and_send_meta_if_missing(tcp_conn, meta_type=3, [(part_hash, part_body)])
            partition_hashes.append(part_hash)
        disk_body = build_disk_body(partition_hashes, disk.size, disk.serial)
        disk_hash = blake3(disk_body)
        batch_check_and_send_meta_if_missing(tcp_conn, meta_type=4, [(disk_hash, disk_body)])
        snapshot_disk_hashes.append(disk_hash)

    snapshot_body = build_snapshot_body(now(), snapshot_disk_hashes)
    snapshot_hash = blake3(snapshot_body)

    // final try: ask the server whether the snapshot can be committed (the server will verify)
    send_msg(SEND_SNAPSHOT(snapshot_hash, snapshot_body))
    resp = await
    if resp == SNAPSHOT_OK: success
    else if resp == SNAPSHOT_FAIL:
        // server returns the missing meta list
        // receive the missing metas; the client then sends the remaining missing metas/chunks (loop)
        handle_missing_and_retry()
```

Helper functions:

```text
FUNCTION process_dir(path, tcp_conn):
    entries_meta = []  // list of (name, entry_type, target_hash)
    FOR entry IN readdir(path):
        IF entry.is_file:
            file_hash = process_file(entry.path, tcp_conn)  // below
            entries_meta.append((entry.name, 0, file_hash))
        ELSE IF entry.is_dir:
            subdir_hash = process_dir(entry.path, tcp_conn)
            entries_meta.append((entry.name, 1, subdir_hash))
        ELSE IF symlink:
            symlink_body = build_symlink_body(target)
            symlink_hash = blake3(symlink_body)
            batch_check_and_send_meta_if_missing(tcp_conn, meta_type=1, [(symlink_hash, symlink_body)])
            entries_meta.append((entry.name, 2, symlink_hash))
    dir_body = build_dir_body(entries_meta)
    dir_hash = blake3(dir_body)
    batch_check_and_send_meta_if_missing(tcp_conn, meta_type=2, [(dir_hash, dir_body)])
    RETURN dir_hash
```

```text
FUNCTION process_file(path, tcp_conn):
    chunk_hashes = []
    FOR each chunk IN read_in_chunks(path, 4*1024*1024):
        chunk_hash = blake3(chunk)
        chunk_hashes.append(chunk_hash)

    // batch-check the chunks for this file
    missing = batch_check_chunks(tcp_conn, chunk_hashes)
    FOR each missing_hash IN missing:
        chunk_bytes = read_chunk_by_hash_from_disk(path, missing_hash)  // or buffer earlier
        send_msg(SEND_CHUNK{hash, size, data})
        await CHUNK_OK

    file_body = build_file_body(fs_type, size, mode, uid, gid, mtime, chunk_hashes)
    file_hash = blake3(file_body)
    batch_check_and_send_meta_if_missing(tcp_conn, meta_type=1, [(file_hash, file_body)])
    RETURN file_hash
```

`batch_check_and_send_meta_if_missing`:

* Send BATCH_CHECK_META for all items.
* The server returns the list of missing metas.
* For each missing one, send SEND_META(meta_type, meta_hash, body).
* Await META_OK.

Note: batching per directory/file group reduces RTTs.

---

# Flow (Pseudocode) — **Server Side (Sync Server)**

```text
ON connection:
    read HELLO -> verify allowed client type
    send HELLO_OK OR HELLO_FAIL

ON AUTH_USERPASS:
    validate credentials
    if ok: generate session_id (16 B), send AUTH_OK{session_id}
    else: send AUTH_FAIL

ON BATCH_CHECK_CHUNK:
    read list of hashes
    missing_list = []
    for hash in hashes:
        if not exists chunks/shard(hash): missing_list.append(hash)
    send CHECK_CHUNK_RESP{missing_list}

ON SEND_CHUNK:
    read chunk_hash, size, data
    computed = blake3(data)
    if computed != chunk_hash: send CHUNK_FAIL{reason} and drop
    else if chunk already exists: send CHUNK_OK
    else: write atomically to chunks/<ab>/<cd>/<hash>.chk and send CHUNK_OK

ON BATCH_CHECK_META:
    similar: check whether meta/<type>/<hash>.meta exists — return the missing list

ON SEND_META:
    verify blake3(body) == meta_hash
    if ok: write meta/<type>/<ab>/<cd>/<hash>.meta atomically; respond META_OK

ON SEND_SNAPSHOT:
    verify blake3(snapshot_body) == snapshot_hash
    // validate the object graph:
    missing = validate_graph(snapshot_body)  // DFS: disks -> partitions -> dirs -> files -> chunks
    if missing not empty:
        send SNAPSHOT_FAIL{missing (as meta list and/or chunk list)}
    else:
        store the snapshot file and create the pointer machines/<computer_id>/snapshots/<snapshot_id>.ref
        send SNAPSHOT_OK{snapshot_id}
```

`validate_graph`:

* Parse snapshot_body → disk_hashes.
* For each disk_hash check that the meta exists; load the disk meta → for each partition_hash check that the meta exists … recursively through dir entries → file metas → check chunk existence for each chunk_hash.
* Collect the missing set and return it.

---

# Behavior on `SNAPSHOT_FAIL`

* The server returns the missing meta/chunk hashes.
* The client sends exactly those (batched) and retries `SEND_SNAPSHOT`.
* Alternatively, the client uploads all required metas/chunks incrementally on the first pass (that is the order this pseudocode already follows — so nothing is missing at commit time).

---

# Storage / Paths (Server Internal)

* `chunks/<ab>/<cd>/<hash>.chk` (`ab` = first 2 hex chars; `cd` = next 2)
* `meta/files/<ab>/<cd>/<hash>.meta`
* `meta/dirs/<...>`
* `meta/parts/...`
* `meta/disks/...`
* `meta/snapshots/<hash>.meta`
* `machines/<computer_id>/snapshots/<snapshot_id>.ref` (pointer → snapshot_hash + timestamp)

Atomic writes: `tmp -> rename`.

---

# Key Implementation Notes for the AI/Server Implementation

* **Batching is mandatory**: implement `BATCH_CHECK_CHUNK` & `BATCH_CHECK_META` efficiently (bitset, HashSet lookups).
* **Limits**: cap `count` per batch (e.g. 1000) — the client must split its chunk lists accordingly.
* **Validation**: the server must validate the graph on `SEND_SNAPSHOT` (otherwise consistency is lost).
* **Atomic snapshot commit**: persist only once the graph is fully present.
* **Session ID**: must be carried in the header of all subsequent messages.
* **Performance**: parallelize chunk uploads (multiple TCP tasks) and let the server handle multiple parallel handshakes.
* **Security**: in production use TLS over TCP or a VPN; rate limiting / brute-force protection; provisioning codes with a TTL.
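To make the SEND_CHUNK handling and the `tmp -> rename` atomic write concrete, here is a minimal Python sketch. It is illustrative only: `store_chunk` and the `root` parameter are made-up names, and BLAKE2b-256 from the standard library stands in for BLAKE3 (which the protocol specifies; a real implementation would use the third-party `blake3` package):

```python
import hashlib
import os
import tempfile

def store_chunk(root: str, chunk_hash: bytes, data: bytes) -> bool:
    """Handle one SEND_CHUNK: verify the hash, then write atomically to
    chunks/<ab>/<cd>/<hash>.chk. Returns True for CHUNK_OK, False for CHUNK_FAIL.
    NOTE: BLAKE2b-256 is a stand-in for the protocol's BLAKE3."""
    if hashlib.blake2b(data, digest_size=32).digest() != chunk_hash:
        return False                            # hash mismatch -> CHUNK_FAIL
    hex_hash = chunk_hash.hex()
    shard_dir = os.path.join(root, "chunks", hex_hash[:2], hex_hash[2:4])
    final_path = os.path.join(shard_dir, hex_hash + ".chk")
    if os.path.exists(final_path):              # dedup: already stored -> CHUNK_OK
        return True
    os.makedirs(shard_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=shard_dir)  # tmp file on the same filesystem
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp_path, final_path)            # atomic rename: visible only when complete
    return True                                 # -> CHUNK_OK
```

Writing the temp file into the shard directory itself keeps the rename on one filesystem, which is what makes `os.replace` atomic; the same pattern applies to `.meta` files and snapshot `.ref` pointers.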