Data Lifecycle
The complete journey of a dataset — from a file on your computer to a trained model in a researcher's hands.
Five phases. Here's every step a dataset takes from upload to model delivery in the current demo, plus the production upgrade path for confidential compute.
Phase 1: Publish
Actor: Dataset Owner (browser)
- Publisher selects a JSONL file
- Browser generates a random AES-256-GCM key
- Dataset is encrypted client-side → encrypted blob
- AES key is ECIES-sealed with the Orchestrator's public key → key envelope
- Encrypted blob + envelope uploaded to 0G Storage → `datasetRoot` (Merkle root) returned
- Policy manifest JSON uploaded to 0G Storage → `manifestHash` returned
- Publisher signs a transaction calling `registerDataset(datasetRoot, manifestHash, policyParams)` on the DataPolicy contract
State after phase: The dataset is registered on-chain, and the DataPolicy contract enforces the owner's full usage policy. Plaintext has never touched LICEN's servers.
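The client-side encryption and key sealing above can be sketched with Node's `crypto` module. This is a minimal sketch under stated assumptions, not LICEN's code: the browser build would use WebCrypto's AES-GCM instead, the blob layout (IV, ciphertext, auth tag) is assumed, and the envelope uses an ECIES-style X25519 + HKDF-SHA256 + AES-256-GCM construction that may differ from the curve and library LICEN actually uses. The names `encryptDataset`, `sealKey`, `openKey`, and `KeyEnvelope` are hypothetical.

```typescript
import {
  KeyObject,
  createCipheriv,
  createDecipheriv,
  createPublicKey,
  diffieHellman,
  generateKeyPairSync,
  hkdfSync,
  randomBytes,
} from "node:crypto";

const IV_LENGTH = 12; // 96-bit nonce, the standard size for AES-GCM
const TAG_LENGTH = 16; // 128-bit GCM authentication tag

// Assumed blob layout: IV || ciphertext || auth tag.
function encryptDataset(plaintext: Buffer, key: Buffer): Buffer {
  const iv = randomBytes(IV_LENGTH);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([iv, ciphertext, cipher.getAuthTag()]);
}

function decryptDataset(blob: Buffer, key: Buffer): Buffer {
  const iv = blob.subarray(0, IV_LENGTH);
  const ciphertext = blob.subarray(IV_LENGTH, blob.length - TAG_LENGTH);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(blob.subarray(blob.length - TAG_LENGTH));
  // final() throws if the tag does not verify, so tampering is detected.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// Hypothetical envelope format: the sender's ephemeral public key plus the wrapped AES key.
interface KeyEnvelope {
  ephemeralPublicKey: Buffer; // SPKI/DER-encoded X25519 public key
  sealedKey: Buffer; // AES key encrypted under the derived wrapping key
}

// Derive a 256-bit wrapping key from the ECDH shared secret (empty salt and info label are assumptions).
function deriveWrapKey(sharedSecret: Buffer): Buffer {
  return Buffer.from(hkdfSync("sha256", sharedSecret, Buffer.alloc(0), "licen-key-envelope", 32));
}

// ECIES-style seal: a fresh ephemeral key pair per envelope, ECDH with the Orchestrator's public key.
function sealKey(aesKey: Buffer, orchestratorPublicKey: KeyObject): KeyEnvelope {
  const ephemeral = generateKeyPairSync("x25519");
  const shared = diffieHellman({ privateKey: ephemeral.privateKey, publicKey: orchestratorPublicKey });
  return {
    ephemeralPublicKey: ephemeral.publicKey.export({ type: "spki", format: "der" }),
    sealedKey: encryptDataset(aesKey, deriveWrapKey(shared)),
  };
}

// Only the holder of the Orchestrator's private key can recompute the wrapping key.
function openKey(envelope: KeyEnvelope, orchestratorPrivateKey: KeyObject): Buffer {
  const ephemeralPublicKey = createPublicKey({ key: envelope.ephemeralPublicKey, format: "der", type: "spki" });
  const shared = diffieHellman({ privateKey: orchestratorPrivateKey, publicKey: ephemeralPublicKey });
  return decryptDataset(envelope.sealedKey, deriveWrapKey(shared));
}
```

In this flow, `randomBytes(32)` stands in for the browser-generated AES-256 key, and the encrypted blob plus the `KeyEnvelope` are what would be uploaded to 0G Storage.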
Phase 2: Discovery
Actor: Envio Indexer + Marketplace UI
- Envio detects the `DatasetRegistered(datasetRoot, owner, manifestHash)` event
- Indexes the event into GraphQL
- Marketplace UI hydrates dataset cards from Envio GraphQL
State after phase: Dataset is visible to researchers in the marketplace, with live on-chain policy data.
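A marketplace page could load dataset cards from Envio's GraphQL endpoint roughly as follows. This is a sketch under assumptions: the entity name `DataPolicy_DatasetRegistered` and its fields depend on the project's Envio schema, and the transport is injected as a hypothetical `PostGraphql` function so the example does not assume a particular HTTP client.

```typescript
// Hypothetical entity and field names; the real ones come from the Envio schema.
const DATASETS_QUERY = `
  query MarketplaceDatasets {
    DataPolicy_DatasetRegistered(limit: 50) {
      datasetRoot
      owner
      manifestHash
    }
  }
`;

interface DatasetRegisteredRow {
  datasetRoot: string;
  owner: string;
  manifestHash: string;
}

interface DatasetCard extends DatasetRegisteredRow {
  ownerLabel: string; // shortened owner address for display
}

// Any function that POSTs a GraphQL body and resolves to the parsed JSON response.
type PostGraphql = (body: { query: string }) => Promise<{
  data: { DataPolicy_DatasetRegistered: DatasetRegisteredRow[] };
}>;

async function loadDatasetCards(post: PostGraphql): Promise<DatasetCard[]> {
  const response = await post({ query: DATASETS_QUERY });
  return response.data.DataPolicy_DatasetRegistered.map((row) => ({
    ...row,
    ownerLabel: `${row.owner.slice(0, 6)}...${row.owner.slice(-4)}`,
  }));
}
```

In a browser, `post` would typically wrap `fetch` against the indexer's GraphQL URL.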
Phase 3: Access Request
Actor: AI Researcher
- Researcher selects dataset, configures epoch count
- UI calculates `escrow = royaltyPerEpoch × requestedEpochs`
- Researcher signs `requestAccess(datasetRoot, purposeId, epochs)`, which locks the escrow
- DataPolicy contract validates the request against the owner's usage policy (epoch caps, requester caps, session TTL, expiry, allowed purposes)
- Contract emits `AccessGranted(jobId, datasetRoot, requester, epochs)`
- Envio indexes the event
State after phase: Escrow is locked. Job is Granted. Researcher cannot access data yet — the Orchestrator hasn't acted.
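The escrow formula and the policy checks listed above can be modeled off-chain, for example to pre-validate a request in the UI before the researcher signs. This is a hypothetical TypeScript model, not the DataPolicy contract: the field names, the reading of "requester caps" as a per-requester job limit, wei-denominated `bigint` amounts, the exact-escrow rule, and the order of checks are all assumptions.

```typescript
// Hypothetical shape of the owner's usage policy; amounts in wei as bigint.
interface UsagePolicy {
  royaltyPerEpoch: bigint;
  maxEpochsPerJob: number; // epoch cap
  maxJobsPerRequester: number; // requester cap (assumed per-requester job limit)
  sessionTtlSeconds: number; // session TTL
  expiresAt: number; // policy expiry, unix seconds
  allowedPurposes: ReadonlySet<number>;
}

interface AccessRequest {
  purposeId: number;
  epochs: number;
  escrow: bigint; // value sent with requestAccess
  requesterJobCount: number; // jobs this requester already holds on the dataset
  now: number; // unix seconds
}

type AccessDecision = { ok: true; sessionExpiresAt: number } | { ok: false; reason: string };

// escrow = royaltyPerEpoch × requestedEpochs
function requiredEscrow(policy: UsagePolicy, requestedEpochs: number): bigint {
  return policy.royaltyPerEpoch * BigInt(requestedEpochs);
}

function validateAccess(policy: UsagePolicy, req: AccessRequest): AccessDecision {
  if (req.now >= policy.expiresAt) return { ok: false, reason: "policy expired" };
  if (!policy.allowedPurposes.has(req.purposeId)) return { ok: false, reason: "purpose not allowed" };
  if (!Number.isInteger(req.epochs) || req.epochs < 1) return { ok: false, reason: "invalid epoch count" };
  if (req.epochs > policy.maxEpochsPerJob) return { ok: false, reason: "epoch cap exceeded" };
  if (req.requesterJobCount >= policy.maxJobsPerRequester) return { ok: false, reason: "requester cap reached" };
  // This sketch requires the exact escrow amount.
  if (req.escrow !== requiredEscrow(policy, req.epochs)) return { ok: false, reason: "escrow mismatch" };
  return { ok: true, sessionExpiresAt: req.now + policy.sessionTtlSeconds };
}
```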
Phase 4: Orchestration
Actor: Orchestrator (background Node.js process)
The current hackathon build simulates the 0G Compute lifecycle. It proves access control, storage roots, escrow, job tracking, and settlement without claiming that today's public 0G fine-tuning API supports encrypted input key release.
- Orchestrator polls Envio every 5 seconds for `Granted` jobs
- For each new job:
  - Verifies on-chain state is `Granted` (contract is ground truth, not Envio)
  - Fetches sealed key envelope from Web App API (`/api/orchestrator/key-envelope?datasetRoot=...`)
  - Coordinates the simulated compute job lifecycle
  - Tracks progress and actual epochs
  - Produces a task/result reference for settlement
  - Saves `taskId` + `providerAddress` to the `compute_jobs` DB table
  - Calls `startJob(jobId)` on the DataPolicy contract → state moves to `Running`
State after phase: Job is Running. The UI and contract now follow the same lifecycle a real 0G-compatible compute job would follow.
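The orchestration steps above can be sketched as one polling pass plus a 5-second timer. Every dependency here (`fetchGrantedJobsFromIndexer`, `readOnChainState`, `startSimulatedCompute`, `saveComputeJob`, and so on) is a hypothetical stand-in for Envio, the DataPolicy contract, the Web App key-envelope API, and the `compute_jobs` table; the control flow follows the steps listed on this page, and the `Failed` state name is assumed from `markJobFailed()`.

```typescript
// On-chain job states; "Failed" is assumed from markJobFailed().
type JobState = "Granted" | "Running" | "Completed" | "Failed";

interface GrantedJob {
  jobId: string;
  datasetRoot: string;
}

// Hypothetical adapters for Envio, the DataPolicy contract, the Web App API and the DB.
interface OrchestratorDeps {
  fetchGrantedJobsFromIndexer(): Promise<GrantedJob[]>;
  readOnChainState(jobId: string): Promise<JobState>;
  fetchKeyEnvelope(datasetRoot: string): Promise<Uint8Array>;
  startSimulatedCompute(job: GrantedJob, keyEnvelope: Uint8Array): Promise<{ taskId: string; providerAddress: string }>;
  saveComputeJob(row: { jobId: string; taskId: string; providerAddress: string }): Promise<void>;
  startJob(jobId: string): Promise<void>;
  markJobFailed(jobId: string): Promise<void>;
}

async function pollOnce(deps: OrchestratorDeps, seen: Set<string>): Promise<void> {
  for (const job of await deps.fetchGrantedJobsFromIndexer()) {
    // Check and mark with no await in between, so overlapping polls never take the same job.
    if (seen.has(job.jobId)) continue;
    seen.add(job.jobId);
    let state: JobState;
    try {
      state = await deps.readOnChainState(job.jobId);
    } catch (err) {
      seen.delete(job.jobId); // RPC error: retry this job on the next poll
      console.error(`could not read state of job ${job.jobId}`, err);
      continue;
    }
    // The contract is ground truth: skip jobs the indexer still lists but that already moved on.
    if (state !== "Granted") continue;
    try {
      const envelope = await deps.fetchKeyEnvelope(job.datasetRoot);
      const { taskId, providerAddress } = await deps.startSimulatedCompute(job, envelope);
      await deps.saveComputeJob({ jobId: job.jobId, taskId, providerAddress });
      await deps.startJob(job.jobId); // Granted -> Running
    } catch {
      await deps.markJobFailed(job.jobId); // the contract refunds the researcher
    }
  }
}

// Poll every 5 seconds; returns a function that stops the loop.
function runOrchestrator(deps: OrchestratorDeps, intervalMs = 5000): () => void {
  const seen = new Set<string>();
  const timer = setInterval(() => {
    pollOnce(deps, seen).catch((err) => console.error("orchestrator poll failed", err));
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Re-reading the on-chain state is what makes the in-memory `seen` set safe to lose: after a restart, jobs that already moved to `Running` are skipped because the contract no longer reports them as `Granted`.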
Production Upgrade: 0G-Compatible Confidential Node
The planned production path replaces the simulated compute step with an attested provider node:
- Provider boots the training container inside a TEE/CVM.
- Provider produces a remote attestation quote and an ephemeral public key generated inside the TEE.
- LICEN verifies the quote against approved hardware, image hash, and training code hash.
- Dataset AES key is released encrypted to the TEE public key only after successful verification.
- Provider downloads encrypted dataset from 0G Storage, decrypts it inside the TEE, trains, wipes plaintext, encrypts the model artifact, and uploads the result to 0G Storage.
- Provider signs a result manifest used by the contract and audit UI.
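The key-release gate in this upgrade path reduces to: verify the quote, then seal the dataset key to the TEE's public key and nothing else. The sketch below is hypothetical throughout. Real quote verification is vendor-specific (validating the hardware vendor's signature chain and parsing measurements), so it is stubbed as a precomputed `signatureValid` flag, and the field names, allowlist shape, and `sealToTee` callback are placeholders.

```typescript
// Hypothetical, already-parsed attestation quote.
interface AttestationQuote {
  hardwareId: string; // TEE platform identifier
  imageHash: string; // measurement of the training container image
  trainingCodeHash: string; // hash of the training code
  teePublicKey: Uint8Array; // ephemeral key generated inside the TEE
  signatureValid: boolean; // stub for vendor signature-chain verification
}

// Allowlists for approved hardware, image hashes and training code hashes.
interface ApprovedMeasurements {
  hardware: ReadonlySet<string>;
  imageHashes: ReadonlySet<string>;
  trainingCodeHashes: ReadonlySet<string>;
}

type QuoteCheck = { ok: true } | { ok: false; reason: string };

function verifyQuote(quote: AttestationQuote, approved: ApprovedMeasurements): QuoteCheck {
  if (!quote.signatureValid) return { ok: false, reason: "untrusted quote signature" };
  if (!approved.hardware.has(quote.hardwareId)) return { ok: false, reason: "hardware not approved" };
  if (!approved.imageHashes.has(quote.imageHash)) return { ok: false, reason: "image hash not approved" };
  if (!approved.trainingCodeHashes.has(quote.trainingCodeHash)) {
    return { ok: false, reason: "training code not approved" };
  }
  return { ok: true };
}

// The dataset key is released only after verification, and only sealed to the TEE key.
function releaseDatasetKey(
  quote: AttestationQuote,
  approved: ApprovedMeasurements,
  sealToTee: (teePublicKey: Uint8Array) => Uint8Array,
): Uint8Array {
  const check = verifyQuote(quote, approved);
  if (!check.ok) throw new Error(`attestation rejected: ${check.reason}`);
  return sealToTee(quote.teePublicKey);
}
```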
Phase 5: Training & Settlement
Actor: Simulated compute lifecycle today; 0G-compatible confidential provider in production
- Training lifecycle progresses through running, delivered, and finished states
- LoRA adapter/result manifest is represented by a `resultHash`
- Orchestrator's `jobTracker` polls task status every few seconds in demo mode
When status = `Delivered`:
- Orchestrator records result delivery
When status = `Finished`:
- Orchestrator calls `confirmTrainingComplete(jobId, actualEpochs, resultHash, attestationRef)`
- DataPolicy contract:
  - Releases `actualEpochs × royaltyPerEpoch` to the publisher
  - Refunds unused escrow, `(requestedEpochs − actualEpochs) × royaltyPerEpoch`, to the researcher
  - Sets job state to `Completed`
- Researcher sees job as `Completed` in their session view
- Researcher downloads LoRA adapter using `resultHash`
State after phase: Publisher paid. Researcher has their model. Settlement is on-chain and auditable.
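Settlement splits the locked escrow exactly: the publisher receives `actualEpochs × royaltyPerEpoch`, the researcher gets back `(requestedEpochs − actualEpochs) × royaltyPerEpoch`, and the two always sum to the original `royaltyPerEpoch × requestedEpochs`. A small TypeScript model of that split, assuming wei-denominated `bigint` amounts and that `actualEpochs` never exceeds `requestedEpochs` (the function name `settle` is hypothetical):

```typescript
interface Settlement {
  publisherPayout: bigint;
  researcherRefund: bigint;
}

// Mirrors the split described for confirmTrainingComplete; amounts in wei.
function settle(royaltyPerEpoch: bigint, requestedEpochs: number, actualEpochs: number): Settlement {
  if (!Number.isInteger(requestedEpochs) || requestedEpochs < 0) {
    throw new RangeError("requestedEpochs must be a non-negative integer");
  }
  if (!Number.isInteger(actualEpochs) || actualEpochs < 0 || actualEpochs > requestedEpochs) {
    throw new RangeError("actualEpochs must be an integer between 0 and requestedEpochs");
  }
  return {
    publisherPayout: royaltyPerEpoch * BigInt(actualEpochs),
    researcherRefund: royaltyPerEpoch * BigInt(requestedEpochs - actualEpochs),
  };
}
```

For example, with a `royaltyPerEpoch` of 1,000 wei, 10 requested epochs and 7 actual epochs, the publisher receives 7,000 wei and the researcher is refunded 3,000 wei of the 10,000 wei escrow.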
Failure Handling
If any step in Phase 4 or 5 fails:
| Failure point | What happens |
|---|---|
| Key release/decryption error | Orchestrator calls `markJobFailed()` → researcher auto-refunded |
| 0G Storage download error | Same: `markJobFailed()` + refund |
| Compute dispatch error | Same: `markJobFailed()` + refund |
| Compute task `Failed` | `jobTracker` calls `markJobFailed()` + refund |
| Orchestrator crash/restart | `jobTracker` resumes all running jobs from DB on startup |
The researcher is never left stuck with locked escrow: every failure path either resumes the job or ends in `markJobFailed()`, which refunds the researcher.
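The Phase 5 status handling and the failure rows above fit into a single tracker pass. In this sketch all dependency names are hypothetical stand-ins for the compute client, the `compute_jobs` table, and the DataPolicy contract. Running jobs are reloaded from the DB on every pass, which is what lets a restarted orchestrator resume where it left off, and each terminal status ends in either `confirmTrainingComplete` or `markJobFailed`.

```typescript
type TaskStatus = "Running" | "Delivered" | "Finished" | "Failed";

interface TrackedJob {
  jobId: string;
  taskId: string;
}

interface TaskResult {
  actualEpochs: number;
  resultHash: string;
  attestationRef: string;
}

// Hypothetical adapters for the compute_jobs table, the compute client and the DataPolicy contract.
interface TrackerDeps {
  loadRunningJobs(): Promise<TrackedJob[]>; // rows still open in compute_jobs
  closeJob(jobId: string): Promise<void>; // marks the row as done
  getTaskStatus(taskId: string): Promise<TaskStatus>;
  getTaskResult(taskId: string): Promise<TaskResult>;
  recordDelivery(jobId: string): Promise<void>; // assumed idempotent across polls
  confirmTrainingComplete(jobId: string, actualEpochs: number, resultHash: string, attestationRef: string): Promise<void>;
  markJobFailed(jobId: string): Promise<void>;
}

// One tracker pass. Jobs are reloaded from the DB each time, so after a crash the
// first pass on startup resumes every job that was still running.
async function trackRunningJobs(deps: TrackerDeps): Promise<void> {
  for (const job of await deps.loadRunningJobs()) {
    try {
      const status = await deps.getTaskStatus(job.taskId);
      if (status === "Delivered") {
        await deps.recordDelivery(job.jobId);
      } else if (status === "Finished") {
        const result = await deps.getTaskResult(job.taskId);
        await deps.confirmTrainingComplete(job.jobId, result.actualEpochs, result.resultHash, result.attestationRef);
        await deps.closeJob(job.jobId);
      } else if (status === "Failed") {
        await deps.markJobFailed(job.jobId); // the contract refunds the researcher
        await deps.closeJob(job.jobId);
      }
      // "Running": nothing to do until the next pass.
    } catch (err) {
      // Errors leave the row open, so the job is retried on the next pass.
      console.error(`tracking job ${job.jobId} failed`, err);
    }
  }
}
```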