Arkendro/PROTOCOL.md
2025-09-09 22:23:01 +02:00


Good insight — you are right: disk hashes alone are not enough. The sync client must produce the entire object-graph structure (Disk → Partition → RootDir → Dirs → Files → Chunks), and the server must store and verify these meta objects. I am updating the protocol so that it is complete, binary, and implementable by an AI — with clear message formats, binary layouts for the meta objects, and pseudocode for the client (what it must generate and compute) and for the server (how it must verify and persist).

I am keeping it pragmatic: no compression extensions, no encryption — everything simple and extensible later.


Quick overview (what happens)

  1. The client scans the disk/partition, chunks files (4 MiB), and computes the chunk hashes (BLAKE3-256, 32 B).
  2. The client builds binary meta objects: FileObj, DirObj, PartitionObj, DiskObj, SnapshotObj (each body → meta_hash = BLAKE3(body)).
  3. The client talks to the server over TCP (HELLO → AUTH), asks in batches for missing chunks/metas, and sends only the missing ones.
  4. Finally it sends the snapshot commit; the server validates it and writes the snapshot reference (the server maintains the pointers).

General message structure (envelopes)

Every message: fixed 24-byte header + payload:

struct MsgHeader {
    u8  cmd;               // command code (see table)
    u8  flags;             // reserved
    u8  reserved[2];
    u8  session_id[16];    // all zeros before AUTH_OK
    u32 payload_len;       // LE
}

Response messages use the same envelope.
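As an illustration, the 24-byte envelope could be packed and parsed in Python with the `struct` module (a sketch, not part of the spec; the function names are illustrative, and the format string `<BB2s16sI` matches the layout above):

```python
import struct

# < = little-endian, no padding; B cmd, B flags, 2s reserved,
# 16s session_id, I payload_len
HEADER_FMT = "<BB2s16sI"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 24 bytes

def pack_header(cmd: int, session_id: bytes, payload_len: int,
                flags: int = 0) -> bytes:
    return struct.pack(HEADER_FMT, cmd, flags, b"\x00\x00",
                       session_id, payload_len)

def unpack_header(raw: bytes):
    cmd, flags, _reserved, session_id, payload_len = struct.unpack(
        HEADER_FMT, raw)
    return cmd, flags, session_id, payload_len
```

The `<` prefix is important: without it, Python would insert native alignment padding and the header would no longer be exactly 24 bytes.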


Command codes (u8)

  • 0x01 HELLO
  • 0x02 HELLO_OK
  • 0x10 AUTH_USERPASS
  • 0x11 AUTH_CODE
  • 0x12 AUTH_OK
  • 0x13 AUTH_FAIL
  • 0x20 BATCH_CHECK_CHUNK
  • 0x21 CHECK_CHUNK_RESP
  • 0x22 SEND_CHUNK
  • 0x23 CHUNK_OK
  • 0x24 CHUNK_FAIL
  • 0x30 BATCH_CHECK_META
  • 0x31 CHECK_META_RESP
  • 0x32 SEND_META
  • 0x33 META_OK
  • 0x34 META_FAIL
  • 0x40 SEND_SNAPSHOT (Snapshot-Commit)
  • 0x41 SNAPSHOT_OK
  • 0x42 SNAPSHOT_FAIL
  • 0xFF CLOSE

Key design decisions (brief)

  • Hashes: BLAKE3-256 (32 bytes). The client computes all hashes (chunks + meta bodies).
  • Chunks on the wire: uncompressed (simple & reliable). Compression would be a later extension.
  • Meta object bodies: compact binary structures (see below). meta_hash = BLAKE3(body).
  • Batch checks: the client asks in batches for missing chunks/metas (the server returns only the missing hashes). Minimizes RTTs.
  • Server persistence: chunks/<ab>/<cd>/<hash>.chk, meta/<type>/<ab>/<cd>/<hash>.meta. The server manages the snapshot pointers (e.g. machines/<client>/snapshots/<id>.ref).
  • Snapshot commit: the server validates the object graph before finalizing; if anything is missing, it sends the list back (SNAPSHOT_FAIL with missing list).

Binary payload formats

All multi-byte counters / lengths are little-endian (LE).

A) BATCH_CHECK_CHUNK (Client → Server)

payload:
u32 count
for i in 0..count:
  u8[32] chunk_hash

CHECK_CHUNK_RESP (Server → Client)

payload:
u32 missing_count
for i in 0..missing_count:
  u8[32] missing_chunk_hash

SEND_CHUNK (Client → Server)

payload:
u8[32] chunk_hash
u32   size
u8[size] data   // raw chunk bytes

The server computes BLAKE3(data) and compares it to chunk_hash; if they match, it stores the chunk.
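A minimal server-side sketch of this check in Python (BLAKE3 is not in the Python stdlib, so `hashlib.blake2b` with a 32-byte digest stands in here purely for illustration; a real implementation would use a BLAKE3 binding. Function names are illustrative):

```python
import hashlib
import struct

def chunk_hash(data: bytes) -> bytes:
    # Stand-in: the protocol mandates BLAKE3-256; blake2b(digest_size=32)
    # is used here only because BLAKE3 is not in Python's stdlib.
    return hashlib.blake2b(data, digest_size=32).digest()

def shard_path(h: bytes) -> str:
    # chunks/<ab>/<cd>/<hash>.chk  (ab = first 2 hex chars, cd = next 2)
    hx = h.hex()
    return f"chunks/{hx[:2]}/{hx[2:4]}/{hx}.chk"

def handle_send_chunk(payload: bytes):
    # payload: u8[32] chunk_hash, u32 size (LE), u8[size] data
    claimed = payload[:32]
    (size,) = struct.unpack_from("<I", payload, 32)
    data = payload[36:36 + size]
    if chunk_hash(data) != claimed:
        return ("CHUNK_FAIL", None)
    return ("CHUNK_OK", shard_path(claimed))
```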

B) BATCH_CHECK_META

payload:
u32 count
for i in 0..count:
  u8  meta_type   // 1=file,2=dir,3=partition,4=disk,5=snapshot
  u8[32] meta_hash

CHECK_META_RESP

payload:
u32 missing_count
for i in 0..missing_count:
  u8  meta_type
  u8[32] meta_hash

SEND_META

payload:
u8   meta_type      // 1..5
u8[32] meta_hash
u32  body_len
u8[body_len] body_bytes   // the canonical body; server will BLAKE3(body_bytes) and compare to meta_hash

SEND_SNAPSHOT (Commit)

payload:
u8[32] snapshot_hash
u32   body_len
u8[body_len] snapshot_body  // Snapshot body same encoding as meta (server validates body hash == snapshot_hash)

The server validates that snapshot_body references only existing meta objects (recursive / direct check). If OK → it creates a persistent snapshot pointer and replies SNAPSHOT_OK; otherwise it replies SNAPSHOT_FAIL with the missing list (same format as CHECK_META_RESP).


Meta object binary formats (bodies)

The client produces body_bytes for each meta object; meta_hash = BLAKE3(body_bytes).

FileObj (meta_type = 1)

FileObjBody:
u8  version (1)
u32 fs_type_code        // e.g. 1=ext*, 2=ntfs, 3=fat32 (enum)
u64 size
u32 mode               // POSIX mode for linux; 0 for FS without
u32 uid
u32 gid
u64 mtime_unixsec
u32 chunk_count
for i in 0..chunk_count:
  u8[32] chunk_hash
// optional: xattrs/ACLs TLV (not in v1)
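The FileObjBody layout above could be serialized like this (a Python sketch; field order and widths follow the spec, the function name is illustrative):

```python
import struct

def build_file_body(fs_type_code: int, size: int, mode: int, uid: int,
                    gid: int, mtime: int, chunk_hashes: list) -> bytes:
    # u8 version=1, u32 fs_type_code, u64 size, u32 mode, u32 uid,
    # u32 gid, u64 mtime, u32 chunk_count — all little-endian,
    # followed by 32 bytes per chunk hash.
    body = struct.pack("<BIQIIIQI", 1, fs_type_code, size, mode,
                       uid, gid, mtime, len(chunk_hashes))
    for h in chunk_hashes:
        assert len(h) == 32  # BLAKE3-256 digest
        body += h
    return body
```

The fixed part is 37 bytes (1 + 4 + 8 + 4 + 4 + 4 + 8 + 4); each chunk hash adds 32.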

DirObj (meta_type = 2)

DirObjBody:
u8 version (1)
u32 entry_count
for each entry:
  u8 entry_type      // 0 = file, 1 = dir, 2 = symlink
  u16 name_len
  u8[name_len] name (UTF-8)
  u8[32] target_meta_hash
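Encoding the DirObjBody follows the same pattern. One caveat: the spec does not fix an entry order, but since meta_hash = BLAKE3(body), identical directories must encode identically — sorting by name here is an assumption to make the body canonical, not something the spec mandates:

```python
import struct

def build_dir_body(entries) -> bytes:
    # entries: list of (name: str, entry_type: int, target_meta_hash: bytes)
    # entry_type: 0 = file, 1 = dir, 2 = symlink
    # NOTE: sorting by name is an assumption (spec leaves order open);
    # some canonical order is needed so equal dirs hash equally.
    out = struct.pack("<BI", 1, len(entries))  # version, entry_count
    for name, etype, target in sorted(entries, key=lambda e: e[0]):
        raw = name.encode("utf-8")
        out += struct.pack("<BH", etype, len(raw)) + raw + target
    return out
```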

PartitionObj (meta_type = 3)

PartitionObjBody:
u8 version (1)
u32 fs_type_code
u8[32] root_dir_hash   // DirObj hash for root of this partition
u64 start_lba
u64 end_lba
u8[16] type_guid       // zeroed if unused

DiskObj (meta_type = 4)

DiskObjBody:
u8 version (1)
u32 partition_count
for i in 0..partition_count:
  u8[32] partition_hash
u64 disk_size_bytes
u16 serial_len
u8[serial_len] serial_bytes

SnapshotObj (meta_type = 5)

SnapshotObjBody:
u8 version (1)
u64 created_at_unixsec
u32 disk_count
for i in 0..disk_count:
  u8[32] disk_hash
// optional: snapshot metadata (user, note) as TLV extension later

Flow (pseudocode) — client side (sync client)

(Computes all hashes; sends only what is missing, in batches)

FUNCTION client_backup(tcp_conn, computer_id, disks):
    send_msg(HELLO{client_type=0, auth_type=0})
    await HELLO_OK

    send_msg(AUTH_USERPASS{username,password})
    resp = await
    if resp != AUTH_OK: abort
    session_id = resp.session_id

    // traverse per-partition to limit memory
    snapshot_disk_hashes = []
    FOR disk IN disks:
        partition_hashes = []
        FOR part IN disk.partitions:
            root_dir_hash = process_dir(part.root_path, tcp_conn)
            part_body = build_partition_body(part.fs_type, root_dir_hash, part.start, part.end, part.guid)
            part_hash = blake3(part_body)
            batch_check_and_send_meta_if_missing(tcp_conn, meta_type=3, [(part_hash,part_body)])
            partition_hashes.append(part_hash)

        disk_body = build_disk_body(partition_hashes, disk.size, disk.serial)
        disk_hash = blake3(disk_body)
        batch_check_and_send_meta_if_missing(tcp_conn, meta_type=4, [(disk_hash,disk_body)])
        snapshot_disk_hashes.append(disk_hash)

    snapshot_body = build_snapshot_body(now(), snapshot_disk_hashes)
    snapshot_hash = blake3(snapshot_body)
    // final TRY: ask server if snapshot can be committed (server will verify)
    send_msg(SEND_SNAPSHOT(snapshot_hash, snapshot_body))
    resp = await
    if resp == SNAPSHOT_OK: success
    else if resp == SNAPSHOT_FAIL: // server returns missing meta list
        // receive missing metas; client should send the remaining missing meta/chunks (loop)
        handle_missing_and_retry()

Helper functions:

FUNCTION process_dir(path, tcp_conn):
    entries_meta = []   // list of (name, entry_type, target_hash)
    FOR entry IN readdir(path):
        IF entry.is_file:
            file_hash = process_file(entry.path, tcp_conn)   // below
            entries_meta.append((entry.name, 0, file_hash))
        ELSE IF entry.is_dir:
            subdir_hash = process_dir(entry.path, tcp_conn)
            entries_meta.append((entry.name, 1, subdir_hash))
        ELSE IF symlink:
            symlink_body = build_symlink_body(target)
            symlink_hash = blake3(symlink_body)
            batch_check_and_send_meta_if_missing(tcp_conn, meta_type=1, [(symlink_hash, symlink_body)])
            entries_meta.append((entry.name, 2, symlink_hash))

    dir_body = build_dir_body(entries_meta)
    dir_hash = blake3(dir_body)
    batch_check_and_send_meta_if_missing(tcp_conn, meta_type=2, [(dir_hash,dir_body)])
    RETURN dir_hash

FUNCTION process_file(path, tcp_conn):
    chunk_hashes = []
    FOR each chunk IN read_in_chunks(path, 4*1024*1024):
        chunk_hash = blake3(chunk)
        chunk_hashes.append(chunk_hash)
    // Batch-check chunks for this file
    missing = batch_check_chunks(tcp_conn, chunk_hashes)
    FOR each missing_hash IN missing:
        chunk_bytes = read_chunk_by_hash_from_disk(path, missing_hash) // or buffer earlier
        send_msg(SEND_CHUNK {hash,size,data})
        await CHUNK_OK

    file_body = build_file_body(fs_type, size, mode, uid, gid, mtime, chunk_hashes)
    file_hash = blake3(file_body)
    batch_check_and_send_meta_if_missing(tcp_conn, meta_type=1, [(file_hash,file_body)])
    RETURN file_hash

batch_check_and_send_meta_if_missing:

  • Send BATCH_CHECK_META for all items
  • Server returns list of missing metas
  • For each missing, send SEND_META(meta_type, meta_hash, body)
  • Await META_OK

Note: batching per directory / file group reduces RTTs.
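Sketched in Python, the payload construction and response parsing for this round trip might look as follows (helper names are illustrative; note that BATCH_CHECK_META and CHECK_META_RESP share the same count + (type, hash) wire layout):

```python
import struct

def encode_batch_check_meta(items) -> bytes:
    # items: list of (meta_type: int, meta_hash: bytes)
    # wire: u32 count, then per item u8 meta_type + u8[32] meta_hash
    payload = struct.pack("<I", len(items))
    for mtype, mhash in items:
        payload += struct.pack("<B", mtype) + mhash
    return payload

def decode_check_meta_resp(payload: bytes):
    # wire: u32 missing_count, then per item u8 meta_type + u8[32] meta_hash
    (missing_count,) = struct.unpack_from("<I", payload, 0)
    off, missing = 4, []
    for _ in range(missing_count):
        missing.append((payload[off], payload[off + 1:off + 33]))
        off += 33
    return missing
```

Because request and response use the same layout, decoding an encoded request round-trips to the original item list.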


Flow (pseudocode) — server side (sync server)

ON connection:
  read HELLO -> verify allowed client type
  send HELLO_OK OR HELLO_FAIL

ON AUTH_USERPASS:
  validate credentials
  if ok: generate session_id (16B), send AUTH_OK{session_id}
  else send AUTH_FAIL

ON BATCH_CHECK_CHUNK:
  read list of hashes
  missing_list = []
  for hash in hashes:
    if not exists chunks/shard(hash): missing_list.append(hash)
  send CHECK_CHUNK_RESP {missing_list}

ON SEND_CHUNK:
  read chunk_hash, size, data
  computed = blake3(data)
  if computed != chunk_hash: send CHUNK_FAIL{reason} and drop
  else if exists chunk already: send CHUNK_OK
  else: write atomic to chunks/<ab>/<cd>/<hash>.chk and send CHUNK_OK

ON BATCH_CHECK_META:
  analogous: check whether meta/<type>/<ab>/<cd>/<hash>.meta exists — return the missing list

ON SEND_META:
  verify blake3(body) == meta_hash; if ok write meta/<type>/<ab>/<cd>/<hash>.meta atomically; respond META_OK

ON SEND_SNAPSHOT:
  verify blake3(snapshot_body) == snapshot_hash
  // Validate the object graph:
  missing = validate_graph(snapshot_body) // DFS: disks -> partitions -> dirs -> files -> chunks
  if missing not empty:
    send SNAPSHOT_FAIL {missing (as meta list and/or chunk list)}
  else:
    store snapshot file and create pointer machines/<client_id>/snapshots/<id>.ref
    send SNAPSHOT_OK {snapshot_id}

validate_graph:

  • parse snapshot_body → disk_hashes
  • for each disk_hash check meta exists; load disk meta → for each partition_hash check meta exists … recursively for dir entries -> file metas -> check chunk existence for each chunk_hash. Collect missing set and return.
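A sketch of this DFS over an in-memory representation (parsed metas as plain dicts here for brevity; a real server would decode the stored .meta bodies and memoize already-visited hashes):

```python
def validate_graph(snapshot, metas, chunks):
    # snapshot: {"disks": [disk_hash, ...]}
    # metas: dict hash -> {"children": [meta_hash, ...], "chunks": [...]}
    # chunks: set of chunk hashes present in the chunk store
    missing = set()

    def visit(h):
        meta = metas.get(h)
        if meta is None:
            missing.add(h)   # referenced meta object not stored
            return
        for child in meta.get("children", []):
            visit(child)     # partitions -> dirs -> files
        for c in meta.get("chunks", []):
            if c not in chunks:
                missing.add(c)

    for disk_hash in snapshot["disks"]:
        visit(disk_hash)
    return missing
```

An empty result means the graph is complete and the snapshot pointer may be written; a non-empty result becomes the SNAPSHOT_FAIL missing list.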

Behavior on SNAPSHOT_FAIL

  • The server returns the missing meta/chunk hashes.
  • The client sends exactly those (batched) and retries SEND_SNAPSHOT.
  • Alternatively, the client uploads all required metas/chunks incrementally on the first attempt (which is the order this pseudocode follows — so nothing is missing at commit time).

Storage / paths (server internal)

  • chunks/<ab>/<cd>/<hash>.chk (ab = first 2 hex chars; cd = next 2)
  • meta/files/<ab>/<cd>/<hash>.meta
  • meta/dirs/<...>
  • meta/parts/...
  • meta/disks/...
  • meta/snapshots/<snapshot_hash>.meta
  • machines/<client_id>/snapshots/<snapshot_id>.ref (Pointer -> snapshot_hash + timestamp)

Atomic writes: tmp -> rename.
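A sketch of this tmp → rename pattern in Python (`os.replace` is the atomic rename and also overwrites an existing target; the function name is illustrative):

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    # Write to a temp file in the SAME directory, fsync, then rename:
    # rename within one filesystem is atomic, so readers never observe
    # a partially written chunk/meta file.
    parent = os.path.dirname(path)
    os.makedirs(parent, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=parent)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```

Putting the temp file in the target directory matters: a rename across filesystems would fall back to copy + delete and lose atomicity.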


Important implementation notes for the AI/server implementation

  • Batching is mandatory: implement BATCH_CHECK_CHUNK & BATCH_CHECK_META efficiently (bitset, HashSet lookups).
  • Limits: cap count per batch (e.g. 1000) — the client must split its chunk lists accordingly.
  • Validation: the server must validate the graph on SEND_SNAPSHOT (otherwise consistency is lost).
  • Atomic snapshot commit: persist only once the graph is completely present.
  • Session ID: must be used in the header of all subsequent messages.
  • Perf: parallelize chunk uploads (multiple TCP tasks) and allow the server multiple parallel handshakes.
  • Security: in production use TLS/TCP or a VPN; rate limiting / brute-force protection; provisioning codes with TTL.