# Device Timeline

Goldy tracks GPU completion with a monotonic timeline counter: a `u64` value (`TimelineValue`) that increments with each submission. This replaces fence-per-submission models with a single, always-increasing counter on the device.
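The counter model can be sketched in plain Rust. The `Timeline` type below is a hypothetical illustration of the idea, not Goldy's internal implementation: one side hands out increasing values on submit, the other side records the highest value the GPU has finished.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical sketch of a device timeline: one monotonic counter
/// shared by every submission on the device.
struct Timeline {
    next: AtomicU64,      // value handed to the next submission
    completed: AtomicU64, // highest value the GPU has finished
}

impl Timeline {
    fn new() -> Self {
        Timeline { next: AtomicU64::new(1), completed: AtomicU64::new(0) }
    }

    /// On submit: reserve the next point on the timeline.
    fn submit(&self) -> u64 {
        self.next.fetch_add(1, Ordering::Relaxed)
    }

    /// When the GPU signals completion of `value`.
    fn signal(&self, value: u64) {
        self.completed.fetch_max(value, Ordering::Release);
    }

    /// Non-blocking progress query: a single atomic read.
    fn progress(&self) -> u64 {
        self.completed.load(Ordering::Acquire)
    }
}

fn main() {
    let tl = Timeline::new();
    let a = tl.submit();
    let b = tl.submit();
    tl.signal(b); // finishing b implies a finished too
    assert!(tl.progress() >= a && tl.progress() >= b);
}
```

Because the counter only moves forward, completing a later value implicitly completes every earlier one.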

## TimelineValue

Every non-blocking submission returns a `TimelineValue`:

```rust
let tv: TimelineValue = graph.submit(&device)?;
```

This value represents a point on the device's timeline. When the GPU finishes executing that submission, the timeline advances past `tv`.

Both `TaskGraph::submit` and `ComputeEncoder::submit` return timeline values. Surface presentation via `Frame::present` also returns one.

## Querying GPU progress

`device.gpu_progress()` returns the latest completed timeline value without blocking:

```rust
let current = device.gpu_progress();
if current >= tv {
    // submission has finished; safe to read back results
}
```

This is a lightweight query (single atomic read on most backends) suitable for polling in a loop or checking once per frame.

## Waiting for completion

`device.wait_until(value)` blocks the current thread until the GPU timeline reaches at least `value`:

```rust
let tv = graph.submit(&device)?;

// CPU work while GPU executes...
prepare_next_frame();

// Block until this specific submission completes
device.wait_until(tv)?;
```

For bounded waits, use `wait_until_timeout`:

```rust
let completed = device.wait_until_timeout(tv, 1000)?; // 1 second timeout
if !completed {
    // GPU hasn't finished yet; handle timeout
}
```
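One plausible way a bounded wait like this can be built on standard primitives is a condition variable guarding the progress counter. The sketch below is an assumption about the general technique, not Goldy's actual implementation; a simulated "GPU" thread signals progress while the caller waits with a deadline.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical sketch: block until `progress` reaches `value`,
/// or until `timeout_ms` elapses. Returns true if the value was reached.
fn wait_until_timeout(
    progress: &Mutex<u64>,
    cv: &Condvar,
    value: u64,
    timeout_ms: u64,
) -> bool {
    let guard = progress.lock().unwrap();
    let (_guard, result) = cv
        .wait_timeout_while(guard, Duration::from_millis(timeout_ms), |p| *p < value)
        .unwrap();
    !result.timed_out()
}

fn main() {
    let state = Arc::new((Mutex::new(0u64), Condvar::new()));
    let gpu = Arc::clone(&state);
    // Simulated "GPU" thread: signals completion of timeline value 1.
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        *gpu.0.lock().unwrap() = 1;
        gpu.1.notify_all();
    });
    assert!(wait_until_timeout(&state.0, &state.1, 1, 1000));
}
```

The `wait_timeout_while` loop handles spurious wakeups, so the caller only observes either "value reached" or "timed out".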

## Blocking dispatch

For simple cases where you don't need CPU/GPU overlap, `dispatch` combines submit and wait:

```rust
graph.dispatch(&device)?; // submits and blocks until complete
```

This is equivalent to:

```rust
let tv = graph.submit(&device)?;
device.wait_until(tv)?;
```

## How this differs from fence-based synchronization

Traditional GPU APIs use one fence object per submission. You create a fence, attach it to a submit call, then query or wait on that specific fence. Managing multiple in-flight submissions means tracking multiple fence objects.

Goldy's timeline is a single monotonic counter shared across all submissions on a device:

| | Fence-based | Timeline-based |
|---|---|---|
| Tracking | One fence per submission | One counter for the device |
| Query | Poll each fence individually | `gpu_progress() >= value` |
| Wait | Wait on a specific fence | `wait_until(value)` |
| Ordering | Fences are independent | Values are monotonically ordered |
| Multi-frame | Track N fence objects | Compare N `u64` values |

Because timeline values are ordered, you can reason about completion transitively: if `gpu_progress() >= tv_b` and `tv_b > tv_a`, then `tv_a` has also completed.
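That transitive property is just integer comparison. A minimal sketch, with `has_completed` as a hypothetical helper name standing in for the `gpu_progress() >= value` check:

```rust
/// A submission at `tv` is done once the device's completed
/// counter has reached or passed it.
fn has_completed(gpu_progress: u64, tv: u64) -> bool {
    gpu_progress >= tv
}

fn main() {
    let (tv_a, tv_b) = (3u64, 5u64); // tv_b was submitted after tv_a
    let progress = 5u64;             // the GPU has reached tv_b

    assert!(has_completed(progress, tv_b));
    // tv_b > tv_a, so tv_a is transitively complete as well:
    assert!(tv_b > tv_a && has_completed(progress, tv_a));
}
```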

## Practical use cases

### CPU readback after compute

```rust
let tv = graph.submit(&device)?;
device.wait_until(tv)?;

let result: Vec<f32> = buffer.read_data(0)?;
```

### Multi-frame pipelining

Overlap CPU frame N+1 preparation with GPU frame N execution:

```rust
let mut pending: Option<TimelineValue> = None;

loop {
    // Wait for the previous frame to finish before reusing its resources
    if let Some(tv) = pending {
        device.wait_until(tv)?;
    }

    // Prepare frame N+1 on the CPU
    update_uniforms(&uniform_buffer)?;

    // Submit frame N+1; GPU starts working, CPU continues
    let tv = graph.submit(&device)?;
    pending = Some(tv);

    // CPU work for the next iteration...
}
```
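The single `pending` value above generalizes to N frames in flight: keep one timeline value per slot instead of N fence objects. The mock below simulates this loop so the invariant can be checked; `run_frames`, the counters, and the slot scheme are illustrative assumptions, not Goldy API.

```rust
/// Hypothetical sketch: run `total` frames with `n` frames in flight,
/// tracking one mock timeline value per slot. Returns the last
/// submitted value and the last value waited on.
fn run_frames(total: usize, n: usize) -> (u64, u64) {
    let mut pending = vec![0u64; n]; // 0 = "never submitted"
    let mut next_tv = 1u64;          // mock device timeline counter
    let mut gpu_progress = 0u64;     // mock completed timeline value

    for frame in 0..total {
        let slot = frame % n;
        // Reusing this slot's resources requires its last submission
        // to have completed; this stands in for `wait_until`.
        if pending[slot] > gpu_progress {
            gpu_progress = pending[slot];
        }
        pending[slot] = next_tv; // submit: hand out the next value
        next_tv += 1;
    }
    (next_tv - 1, gpu_progress)
}

fn main() {
    let (submitted, waited) = run_frames(6, 2);
    // At most two submissions are ever outstanding at once.
    assert!(submitted - waited <= 2);
}
```

The wait per slot is just a `u64` comparison against the slot's stored value, which is the "Compare N `u64` values" row from the table above.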

### Polling without blocking

Check completion in a non-blocking render loop:

```rust
let tv = graph.submit(&device)?;

loop {
    if device.gpu_progress() >= tv {
        break; // done
    }
    // do other work, yield, etc.
}
```

## Resource lifetime

Dropping a `Buffer` or `Texture` may be deferred internally: the GPU memory stays alive until every submission that references it has completed. Submit (or present the frame) before dropping resources that the recorded commands reference, so those commands outlive their drop.
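Deferred destruction of this kind is commonly built as a retirement queue keyed on the timeline. The sketch below illustrates the technique under that assumption; the `Retired` type and its methods are hypothetical, not Goldy internals.

```rust
use std::collections::VecDeque;

/// Hypothetical sketch: resources waiting to be freed, each paired with
/// the last timeline value of a submission that referenced it. Because
/// timeline values are monotonic, the queue stays sorted by value.
struct Retired<T> {
    queue: VecDeque<(u64, T)>,
}

impl<T> Retired<T> {
    fn new() -> Self {
        Retired { queue: VecDeque::new() }
    }

    /// Record that `resource` may be freed once the timeline reaches `tv`.
    fn defer(&mut self, tv: u64, resource: T) {
        self.queue.push_back((tv, resource));
    }

    /// Drop every resource whose last referencing submission has
    /// completed; returns how many were freed.
    fn collect(&mut self, gpu_progress: u64) -> usize {
        let mut freed = 0;
        while matches!(self.queue.front(), Some((tv, _)) if *tv <= gpu_progress) {
            self.queue.pop_front();
            freed += 1;
        }
        freed
    }
}

fn main() {
    let mut retired = Retired::new();
    retired.defer(1, "buffer_a");
    retired.defer(3, "buffer_b");
    assert_eq!(retired.collect(2), 1); // only buffer_a's submission finished
    assert_eq!(retired.collect(3), 1); // now buffer_b retires too
}
```

Calling `collect` with the current `gpu_progress()` once per frame is enough: ordering means the queue only ever needs to be drained from the front.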