Your First Compute Shader
This tutorial renders an animated plasma effect by dispatching a compute shader directly to the swapchain texture — no graphics pipeline, no vertex buffers, no render passes.
The Shader
The compute shader uses goldy_exp virtual entry points. It reads uniforms via BufRO<Uniforms> and writes pixels to the swapchain texture via DirectSpatial<float4>:
```slang
import goldy_exp;

struct Uniforms {
    uint width;
    uint height;
    float time;
    float _padding;
};

[goldy_compute]
[numthreads(8, 8, 1)]
void cs_main(BufRO<Uniforms> uniforms_buf, DirectSpatial<float4> output, ThreadId tid) {
    Uniforms u = uniforms_buf[0];
    if (tid.x >= u.width || tid.y >= u.height)
        return;

    float2 uv = float2(float(tid.x) / float(u.width),
                       float(tid.y) / float(u.height));
    float2 p = uv * 2.0 - 1.0;
    p.x *= float(u.width) / float(u.height);

    float t = u.time;
    float v = 0.0;
    v += sin(p.x * 6.0 + t);
    v += sin(p.y * 6.0 + t * 1.3);
    v += sin((p.x + p.y) * 4.0 + t * 0.7);
    v += sin(length(p) * 8.0 - t * 2.0);
    v *= 0.25;

    float3 col = float3(0.5 + 0.5 * sin(v * 3.14159 + 0.0),
                        0.5 + 0.5 * sin(v * 3.14159 + 2.094),
                        0.5 + 0.5 * sin(v * 3.14159 + 4.188));
    output[tid.xy] = float4(col, 1.0);
}
```
Key points:
- BufRO<Uniforms> is a read-only structured buffer. Index it with [0] to load the single element.
- DirectSpatial<float4> is an RWTexture2D<float4>; write to it with output[tid.xy].
- ThreadId maps to SV_DispatchThreadID. Each thread handles one pixel.
- The [goldy_compute] attribute tells the Goldy compiler to wire up bindless slots automatically.
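To sanity-check the shader's math without a GPU, the per-pixel computation can be ported to plain Rust. The sketch below is a hypothetical CPU reference (plasma_pixel is not part of Goldy); it mirrors cs_main step for step, except that it uses std::f32::consts::PI where the shader uses the literal 3.14159.

```rust
use std::f32::consts::PI;

/// Hypothetical CPU port of the plasma math in `cs_main`.
/// Returns RGBA with every channel in [0, 1].
fn plasma_pixel(x: u32, y: u32, width: u32, height: u32, t: f32) -> [f32; 4] {
    // Normalized coordinates in [0, 1), remapped to [-1, 1).
    let uv_x = x as f32 / width as f32;
    let uv_y = y as f32 / height as f32;
    let mut px = uv_x * 2.0 - 1.0;
    let py = uv_y * 2.0 - 1.0;
    // Aspect-ratio correction so the pattern is not stretched horizontally.
    px *= width as f32 / height as f32;

    // Sum of four animated sine waves, averaged into roughly [-1, 1].
    let len = (px * px + py * py).sqrt();
    let mut v = 0.0f32;
    v += (px * 6.0 + t).sin();
    v += (py * 6.0 + t * 1.3).sin();
    v += ((px + py) * 4.0 + t * 0.7).sin();
    v += (len * 8.0 - t * 2.0).sin();
    v *= 0.25;

    // Sine palette: the three channels are phase-shifted by about 2*pi/3.
    // (The shader uses the literal 3.14159 instead of PI.)
    let channel = |phase: f32| 0.5 + 0.5 * (v * PI + phase).sin();
    [channel(0.0), channel(2.094), channel(4.188), 1.0]
}

fn main() {
    // Center pixel of an 800x600 surface at time 0.
    let rgba = plasma_pixel(400, 300, 800, 600, 0.0);
    println!("{:?}", rgba);
}
```

At the center pixel with time 0, every sine term is zero, so red is exactly 0.5 while green and blue land near 0.93 and 0.07.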
Rust Side
Uniform Buffer
Define the uniform struct on the Rust side with matching layout:
```rust
#[repr(C)]
#[derive(Clone, Copy, bytemuck::Pod, bytemuck::Zeroable)]
struct Uniforms {
    width: u32,
    height: u32,
    time: f32,
    _padding: f32,
}

impl goldy::StructuredBufferElement for Uniforms {}
```
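The Rust and Slang definitions must agree byte for byte. The std-only sketch below (no bytemuck or goldy, so it compiles on its own) checks the Rust struct's size and field offsets. The expectation of four tightly packed 4-byte fields is an assumption about how the Slang struct is laid out, field_offsets is a hypothetical helper, and the _padding field presumably exists to round the struct up to 16 bytes.

```rust
use std::mem::{align_of, size_of};

// Same field order and types as the Slang `Uniforms` struct.
// Assumption: the GPU side packs these four 4-byte scalars tightly,
// so the Rust struct must be 16 bytes with no hidden padding.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
struct Uniforms {
    width: u32,
    height: u32,
    time: f32,
    _padding: f32,
}

/// Hypothetical helper: byte offset of each field, computed from raw
/// addresses so it works on Rust versions without `offset_of!`.
fn field_offsets(u: &Uniforms) -> [usize; 4] {
    let base = u as *const Uniforms as usize;
    [
        &u.width as *const u32 as usize - base,
        &u.height as *const u32 as usize - base,
        &u.time as *const f32 as usize - base,
        &u._padding as *const f32 as usize - base,
    ]
}

fn main() {
    let u = Uniforms::default();
    println!(
        "size = {}, align = {}, offsets = {:?}",
        size_of::<Uniforms>(),
        align_of::<Uniforms>(),
        field_offsets(&u)
    );
}
```

With repr(C), fields stay in declaration order, so this prints a size of 16 and offsets 0, 4, 8 and 12.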
Create the buffer with DataAccess::Scattered so it gets a bindless descriptor:
```rust
let uniform_buffer = Buffer::with_data(
    &device,
    &[Uniforms { width, height, time: 0.0, _padding: 0.0 }],
    DataAccess::Scattered,
)?;
```
Pass a typed &[Uniforms] slice, not raw bytes. Buffer::with_data::<T> uses size_of::<T>() as the structured-buffer stride, which backends rely on for correct addressing.
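A toy model of why the element type matters: if the stride is size_of::<T>(), then the same data passed as a byte slice reports a stride of 1, and element i is looked up at byte i instead of byte i * 16. structured_stride and element_offset are hypothetical helpers that only model the behavior described above; they are not Goldy APIs.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
struct Uniforms {
    width: u32,
    height: u32,
    time: f32,
    _padding: f32,
}

/// Hypothetical: the stride a structured buffer would record for `data`,
/// modeled as `size_of::<T>()` like `Buffer::with_data::<T>`.
fn structured_stride<T>(_data: &[T]) -> usize {
    size_of::<T>()
}

/// Hypothetical: the byte offset a shader reads for element `index`.
fn element_offset(index: usize, stride: usize) -> usize {
    index * stride
}

fn main() {
    let typed = [Uniforms { width: 800, height: 600, time: 0.0, _padding: 0.0 }; 2];
    // The same 32 bytes, but typed as u8 (like the output of a byte-cast helper).
    let raw = [0u8; 32];

    let typed_stride = structured_stride(&typed);
    let raw_stride = structured_stride(&raw);
    // Where element 1 would be read under each stride.
    println!("typed: stride {typed_stride}, element 1 at byte {}", element_offset(1, typed_stride));
    println!("raw:   stride {raw_stride}, element 1 at byte {}", element_offset(1, raw_stride));
}
```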
Compute Pipeline
Compile the Slang source and create a ComputePipeline:
```rust
let shader = ShaderModule::from_slang(&device, COMPUTE_SHADER)?;
let compute_pipeline = ComputePipeline::new(&device, &shader)?;
```
Rendering a Frame
Each frame follows this pattern: update uniforms, acquire the swapchain texture, build a TaskGraph, submit, present.
```rust
fn render_frame(state: &mut RenderState) -> Result<()> {
    let (width, height) = state.surface.size();
    let elapsed = state.start_time.elapsed().as_secs_f32();

    // Update uniforms with the current size and time.
    state.uniform_buffer.write(
        0,
        bytemuck::bytes_of(&Uniforms {
            width,
            height,
            time: elapsed,
            _padding: 0.0,
        }),
    )?;

    // Acquire this frame's swapchain texture.
    let frame = state.surface.begin()?;
    let texture = frame.texture();

    // One 8x8 workgroup per tile, rounded up to cover partial tiles.
    let wg_x = width.div_ceil(8);
    let wg_y = height.div_ceil(8);

    // Bindless descriptor indices for the shader's two resource slots.
    let uniform_handle = state
        .uniform_buffer
        .bindless_srv_handle()
        .expect("Uniform buffer has no bindless SRV handle");
    let texture_handle = texture
        .bindless_handle()
        .expect("Surface texture has no bindless handle");

    // Single-node graph: declare the read, pass the indices, dispatch.
    let mut graph = TaskGraph::new();
    graph
        .node("compute", &state.compute_pipeline)
        .bind_buffer(&state.uniform_buffer, NodeAccess::Read)
        .bind_resources_raw(&[uniform_handle.index(), texture_handle.index()])
        .dispatch(wg_x, wg_y, 1);

    frame.submit_compute(&graph)?;
    frame.present()?;
    Ok(())
}
```
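The workgroup counts come from rounding the surface size up to whole 8x8 tiles, so the last row and column of workgroups can overhang the image; that overhang is exactly what the shader's bounds check discards. The std-only sketch below computes both, using hypothetical helpers dispatch_size and idle_threads.

```rust
/// Workgroup size from `[numthreads(8, 8, 1)]` in the shader.
const WG: u32 = 8;

/// Workgroup counts for a width x height surface, matching the
/// `div_ceil(8)` calls in `render_frame`.
fn dispatch_size(width: u32, height: u32) -> (u32, u32) {
    (width.div_ceil(WG), height.div_ceil(WG))
}

/// Number of launched threads that fail the shader's bounds check
/// (`tid.x >= width || tid.y >= height`) and return early.
fn idle_threads(width: u32, height: u32) -> u64 {
    let (wx, wy) = dispatch_size(width, height);
    let launched = (wx as u64 * WG as u64) * (wy as u64 * WG as u64);
    launched - width as u64 * height as u64
}

fn main() {
    for (w, h) in [(1920, 1080), (1001, 601)] {
        let (wx, wy) = dispatch_size(w, h);
        println!("{w}x{h}: dispatch({wx}, {wy}, 1), {} idle threads", idle_threads(w, h));
    }
}
```

A 1920x1080 surface divides evenly into 240 x 135 workgroups with no idle threads; a 1001x601 window needs 126 x 76 workgroups, and 11,263 of the launched threads return early.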
Step by Step
Update uniforms — Buffer::write uploads new time/size values each frame.
Acquire the frame — surface.begin() returns a Frame. frame.texture() gives you the swapchain Texture for this frame.
Get bindless handles — bindless_srv_handle() returns the read-only descriptor index for the uniform buffer. bindless_handle() returns the storage-image descriptor index for the swapchain texture. These indices are passed to the shader as the BufRO<Uniforms> and DirectSpatial<float4> slots respectively.
Build the TaskGraph — graph.node() creates a compute node bound to a pipeline. bind_buffer() declares the dependency (the uniform buffer is read). bind_resources_raw() passes the bindless descriptor indices as push-constant slots. dispatch() sets the workgroup count.
Submit and present — frame.submit_compute(&graph) records and submits the compute work to the GPU. frame.present() presents the swapchain image. The compute shader already wrote the pixels — there is no blit or copy step.
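The bindless indirection described in the "Get bindless handles" and "Build the TaskGraph" steps can be pictured with a toy model: every resource sits at a stable index in a global descriptor heap, and the shader receives only those indices, in parameter order. DescriptorHeap, Resource and resolve_params are hypothetical names; this illustrates the idea and is not how Goldy implements it.

```rust
/// Hypothetical stand-ins for GPU resources in a bindless heap.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Resource {
    UniformBuffer(&'static str),
    StorageImage(&'static str),
}

/// Toy bindless heap: each registered resource gets a stable index.
struct DescriptorHeap {
    slots: Vec<Resource>,
}

impl DescriptorHeap {
    fn register(&mut self, r: Resource) -> u32 {
        self.slots.push(r);
        (self.slots.len() - 1) as u32
    }

    fn get(&self, index: u32) -> &Resource {
        &self.slots[index as usize]
    }
}

/// What the shader "sees" in this model: slot N of the index array holds
/// the heap index for its Nth resource parameter.
fn resolve_params<'a>(heap: &'a DescriptorHeap, slot_indices: &[u32]) -> Vec<&'a Resource> {
    slot_indices.iter().map(|&i| heap.get(i)).collect()
}

fn main() {
    let mut heap = DescriptorHeap { slots: Vec::new() };
    let _other = heap.register(Resource::StorageImage("some other texture"));
    let uniform_index = heap.register(Resource::UniformBuffer("uniforms"));
    let texture_index = heap.register(Resource::StorageImage("swapchain"));

    // Same order as bind_resources_raw(&[uniform_handle.index(), texture_handle.index()]):
    // slot 0 -> uniforms_buf, slot 1 -> output.
    let params = resolve_params(&heap, &[uniform_index, texture_index]);
    println!("{:?}", params);
}
```

In this model, swapping the two indices would hand the shader the swapchain image where it expects the uniform buffer, which is why the order of the index array has to follow the shader's parameter order.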
Run It
```sh
cargo run --example compute_to_surface
```
You should see an animated plasma pattern filling the window, rendered entirely by the compute shader.
Next Steps
- Task Graph — multi-node graphs, transient resources, indirect dispatch
- Examples — particles, game of life, and more compute examples