Pooling and Sub-Allocation

GPU resource allocation is expensive. Creating many small buffers or textures each frame produces allocation overhead, descriptor churn, and VRAM fragmentation. Goldy provides three pooling types to amortize these costs.

BufferPool

BufferPool sub-allocates typed regions from a single large BufferKind::Scattered backing buffer. Each region gets its own bindless descriptor, so shaders see independent zero-based buffers.

Creating a Pool

use goldy::BufferPool;

let mut pool = BufferPool::new(&device, 1024 * 1024)?; // 1 MiB pool

The backing buffer uses BufferKind::Scattered and a default sub-allocation alignment of 256 bytes (satisfies minStorageBufferOffsetAlignment on all known Vulkan/DX12 hardware).

For custom alignment:

let mut pool = BufferPool::with_alignment(&device, total_size, 512)?;

Allocating Regions

Typed allocation — stride is inferred from T:

let tiles: BufferView = pool.alloc::<[u32; 2]>(1024)?;    // 1024 elements
let segments: BufferView = pool.alloc::<[f32; 6]>(4096)?;  // 4096 elements

Allocate and fill in one call:

let data = vec![[1.0f32, 0.0, 0.0]; 100];
let view: BufferView = pool.alloc_with_data(&data)?;

Raw byte allocation with explicit stride:

let view = pool.alloc_bytes(4096, Some(16))?;

Each allocation is aligned to satisfy both the pool alignment (256) and offset % element_stride == 0 (required by DX12 StructuredBuffer views).
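The two constraints above can be met together by rounding the bump cursor up to a multiple of the least common multiple of the pool alignment and the element stride. The following is a minimal sketch of that arithmetic only. The helper names `gcd` and `aligned_offset` are hypothetical and not part of goldy, and whether BufferPool uses exactly this rule internally is an assumption.

```rust
/// Greatest common divisor (Euclid's algorithm).
fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Smallest offset >= `cursor` that is a multiple of both `pool_align`
/// and `stride`. Both arguments must be non-zero.
/// Hypothetical helper for illustration; not a goldy function.
fn aligned_offset(cursor: usize, pool_align: usize, stride: usize) -> usize {
    // lcm(pool_align, stride): every multiple satisfies both constraints.
    let step = pool_align / gcd(pool_align, stride) * stride;
    // Round the cursor up to the next multiple of `step`.
    (cursor + step - 1) / step * step
}
```

Under this rule, a 256-byte pool alignment and a 24-byte stride (the size of [f32; 6]) give valid offsets at multiples of 768, so a cursor at 8192 would advance to 8448. Strides that divide 256, such as 8 or 16, add no padding beyond the pool alignment.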

Using Allocated Views

Every BufferView from a pool has its own bindless descriptor. Bind it like any buffer:

let tile_handle = tiles.handle(ResourceAccess::Write).unwrap();
pass.bind_resources_typed(&[tile_handle]);

// Or as a vertex/index buffer
pass.set_vertex_buffer(0, &tiles);

Write data into a view:

view.write_data(&new_data)?;

Sizing a Pool

Use BufferPool::padded_size to compute the exact byte capacity needed for a known set of allocations, including alignment padding:

let size = BufferPool::padded_size(&[
    (1024, std::mem::size_of::<[u32; 2]>()),  // tiles
    (4096, std::mem::size_of::<[f32; 6]>()),  // segments
    (512,  std::mem::size_of::<u32>()),        // indices
]);
let mut pool = BufferPool::new(&device, size)?;
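To make the padding concrete, here is a rough sketch of how such a size could be computed: walk the allocations in order, align the cursor for each one, then add its byte length. The function `padded_size_sketch` is a hypothetical stand-in, not goldy's implementation. It assumes a 256-byte pool alignment passed in explicitly and the rule that each offset must also be a multiple of the element stride.

```rust
/// Greatest common divisor (Euclid's algorithm).
fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Bytes needed to place `(count, stride)` allocations back to back,
/// aligning each start offset to both `pool_align` and its own stride.
/// Strides and `pool_align` must be non-zero.
/// Hypothetical model; the real `BufferPool::padded_size` may differ.
fn padded_size_sketch(allocs: &[(usize, usize)], pool_align: usize) -> usize {
    let mut cursor = 0usize;
    for &(count, stride) in allocs {
        // Offsets must be multiples of lcm(pool_align, stride).
        let step = pool_align / gcd(pool_align, stride) * stride;
        cursor = (cursor + step - 1) / step * step; // round up
        cursor += count * stride;                   // reserve the region
    }
    cursor
}
```

For the three allocations above (strides 8, 24, and 4 bytes), the raw data is 108544 bytes and this model yields 108800: the second region cannot start at 8192, which is not a multiple of 24, so it moves to 8448.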

Resetting

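reset() moves the bump pointer back to zero without invalidating existing views. Those views keep their descriptors, but allocations made after the reset start again at offset zero and therefore reuse the same memory. Call reset() for frame-to-frame reuse only once the previous views are no longer in flight.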

pool.reset();
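A toy CPU-side model can show why a reset is only safe after earlier views have retired. The type `BumpModel` below is hypothetical and tracks byte ranges only. It is not goldy's BufferPool, and its accounting (for example, counting alignment padding as used) is an assumption.

```rust
use std::ops::Range;

/// Toy bump allocator over a fixed byte capacity.
/// Hypothetical model for illustration; holds no GPU resources.
struct BumpModel {
    capacity: usize,
    cursor: usize,
}

impl BumpModel {
    fn new(capacity: usize) -> Self {
        Self { capacity, cursor: 0 }
    }

    /// Reserve `size` bytes at the next multiple of `align` (non-zero).
    /// Returns the byte range, or `None` if the pool is exhausted.
    fn alloc(&mut self, size: usize, align: usize) -> Option<Range<usize>> {
        let start = (self.cursor + align - 1) / align * align;
        let end = start.checked_add(size)?;
        if end > self.capacity {
            return None;
        }
        self.cursor = end;
        Some(start..end)
    }

    /// Rewind the cursor. Ranges handed out earlier are not tracked,
    /// so later allocations overlap them.
    fn reset(&mut self) {
        self.cursor = 0;
    }

    fn used(&self) -> usize {
        self.cursor
    }

    fn remaining(&self) -> usize {
        self.capacity - self.cursor
    }
}
```

In this model, the first allocation after reset() starts at offset 0, the same place as the first allocation before it. Any reader still using the old range would see the new data.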

Pool Queries

pool.used();             // bytes currently allocated
pool.capacity();         // total pool size
pool.remaining();        // bytes available
pool.backing_buffer();   // reference to the underlying Buffer

TexturePool

TexturePool caches released textures for reuse, avoiding repeated GPU allocation and deallocation. This is particularly valuable on DX12 where texture allocation involves descriptor heap management.

Creating a Pool

use goldy::{TexturePool, TexturePoolConfig};

let mut pool = TexturePool::new(TexturePoolConfig {
    max_per_key: 4, // keep up to 4 textures per (width, height, format, access, flags) key
});

// Or use defaults (max_per_key = 8)
let mut pool = TexturePool::default();

Acquire and Release

use goldy::{TextureKind, TextureFormat, TextureFlags};

// Acquire: returns a pooled texture if available, otherwise creates a new one
let texture = pool.acquire(
    &device,
    1920, 1080,
    TextureFormat::Rgba16Float,
    TextureKind::Direct,
    TextureFlags::COPY_SRC | TextureFlags::COPY_DST,
)?;

// ... use the texture for this frame's work ...

// Release: return to pool after GPU work completes
pool.release(texture);

Borrowed textures (texture.borrow()) are silently dropped on release and not pooled.

Pool Key

Textures are keyed by (width, height, format, access, flags). Acquiring a texture only matches exact keys — a 128×128 texture will not be returned for a 256×256 request.

Eviction

When a key already holds max_per_key entries, additional releases are dropped (destroyed) immediately.
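The exact-key matching and the per-key cap can be modelled with a map from key to a free list. This is a simplified sketch with hypothetical names (`Key`, `KeyedPool`). The key here has fewer fields than goldy's, and a generic `T` stands in for a texture, so it illustrates the policy rather than the actual TexturePool.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced key; the real key is
/// (width, height, format, access, flags).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Key {
    width: u32,
    height: u32,
    format: u8,
}

/// Toy keyed free list with a per-key cap. `T` stands in for a texture.
struct KeyedPool<T> {
    max_per_key: usize,
    free: HashMap<Key, Vec<T>>,
}

impl<T> KeyedPool<T> {
    fn new(max_per_key: usize) -> Self {
        Self { max_per_key, free: HashMap::new() }
    }

    /// Reuse an entry stored under exactly this key, else create one.
    fn acquire(&mut self, key: Key, create: impl FnOnce() -> T) -> T {
        self.free
            .get_mut(&key)
            .and_then(|list| list.pop())
            .unwrap_or_else(create)
    }

    /// Keep the item if the key has room; otherwise drop it.
    /// Returns whether the item was kept.
    fn release(&mut self, key: Key, item: T) -> bool {
        let list = self.free.entry(key).or_default();
        if list.len() < self.max_per_key {
            list.push(item);
            true
        } else {
            false // `item` is dropped here, modelling immediate destruction
        }
    }

    /// Number of items currently held across all keys.
    fn pooled(&self) -> usize {
        self.free.values().map(Vec::len).sum()
    }
}
```

With `max_per_key` set to 1, a second release under the same key is dropped, and a request for a 256×256 key creates a new item even while a 128×128 entry is pooled.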

Stats and Cleanup

let stats = pool.stats();
println!("{} textures pooled, ~{} bytes", stats.entries, stats.estimated_bytes);

pool.clear(); // drop all pooled textures, free GPU memory

When to Use Pooling

| Scenario | Recommendation |
| --- | --- |
| Many small storage buffers with similar lifetime | BufferPool: one allocation, many views |
| Per-frame uniform/storage data that changes every frame | BufferPool reset inside a FrameOrchestrator retirement callback: epoch-safe, no manual ring |
| Transient render targets or compute textures | TexturePool: acquire/release cycle avoids allocation churn |
| Long-lived buffers (mesh data, static textures) | Individual Buffer / Texture: pooling adds no benefit |
| Uniform buffer updated once at startup | Individual Buffer: no per-frame reuse needed |

Sub-Allocation Patterns

Static Geometry Pool

Pack all static mesh data into one BufferPool at load time:

let size = BufferPool::padded_size(&[
    (vertex_count, std::mem::size_of::<Vertex>()),
    (index_count, std::mem::size_of::<u32>()),
]);
let mut pool = BufferPool::new(&device, size)?;

let vertices = pool.alloc_with_data(&vertex_data)?;
let indices = pool.alloc_with_data(&index_data)?;

Per-Frame Dynamic Data

For data that changes every frame, keep a BufferPool per frame slot and reset it in the FrameOrchestrator retirement callback — the orchestrator guarantees the GPU has finished before the callback fires:

struct FrameData {
    pool: BufferPool,
}

let mut orch: FrameOrchestrator<FrameData> = FrameOrchestrator::new(&device, 3);

// Each frame:
let handle = orch.begin_frame(|_dev, retired| {
    retired.data.pool.reset(); // safe: GPU epoch has passed
    Ok(())
})?;

let mut frame_data = /* get or create pool for this slot */;
let uniforms = frame_data.pool.alloc_with_data(&[camera_data])?;
let instances = frame_data.pool.alloc_with_data(&instance_transforms)?;

See Pipelined Frames for the full FrameOrchestrator API.
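The slot rotation behind this pattern can be sketched without any GPU types. The model below is hypothetical: `FrameRing` is not the FrameOrchestrator API, and it simply assumes the GPU has finished with a slot by the time the ring wraps around to it, which the real orchestrator guarantees through its retirement callback.

```rust
/// Toy model of N frame slots, each with its own bump cursor.
/// Hypothetical type; it does no GPU synchronization.
struct FrameRing {
    cursors: Vec<usize>,
    frame: usize,
}

impl FrameRing {
    /// `slots` must be non-zero.
    fn new(slots: usize) -> Self {
        Self { cursors: vec![0; slots], frame: 0 }
    }

    /// Start a frame: pick the next slot round-robin and rewind it.
    /// Assumes the slot's previous frame has already retired.
    fn begin_frame(&mut self) -> usize {
        let slot = self.frame % self.cursors.len();
        self.cursors[slot] = 0;
        self.frame += 1;
        slot
    }

    /// Bump-allocate `size` bytes from `slot`; returns the offset.
    /// Alignment and capacity checks are omitted for brevity.
    fn alloc(&mut self, slot: usize, size: usize) -> usize {
        let offset = self.cursors[slot];
        self.cursors[slot] += size;
        offset
    }
}
```

With three slots, frames 0, 1, and 2 each write into their own slot. Frame 3 returns to slot 0 and allocates from offset 0 again, which is only correct because frame 0 is assumed to have finished by then.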

Transient Compute Textures

Pool intermediate textures in a multi-pass compute pipeline:

let mut tex_pool = TexturePool::default();

// Each frame:
let temp = tex_pool.acquire(&device, w, h, fmt, TextureKind::Direct, flags)?;
// ... compute pass writes to temp ...
// ... next pass reads from temp ...
tex_pool.release(temp); // return for reuse next frame