📢 Chapter 4, Part 4: Constant Memory — The Royal Broadcast
💡 Story: The King announces something to ALL soldiers via a royal broadcast. Every soldier hears the same message at the same time — instantly! That's constant memory: read-only data that ALL threads read simultaneously with special hardware caching.
```c
#include <cuda_runtime.h>
#include <stdio.h>

// Declare constant memory GLOBALLY (outside any function).
// It is read-only from kernels and limited to 64 KB total.
__constant__ float kernel_weights[9]; // e.g., a 3x3 convolution filter

__global__ void applyFilter(float* image, float* output, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    // ALL threads read kernel_weights[0..8]: a broadcast!
    // When every thread in a warp reads the SAME constant address,
    // the constant cache serves the whole warp in a single transaction.
    float sum = 0.0f;
    for (int k = 0; k < 9; k++) {
        if (i + k < n)
            sum += image[i + k] * kernel_weights[k]; // fast broadcast read
    }
    if (i < n) output[i] = sum;
}

int main() {
    float weights[] = {0.1f, 0.1f, 0.1f, 0.1f, 0.2f,
                       0.1f, 0.1f, 0.1f, 0.1f};
    // Upload to constant memory from the CPU.
    // Pass the symbol itself, not a string or its address.
    cudaMemcpyToSymbol(kernel_weights, weights, 9 * sizeof(float));
    // ... rest of program ...
    return 0;
}
```
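For reference, the elided host code typically allocates device buffers, copies the input over, launches the kernel, and copies the result back. A minimal sketch of that pattern (the size `N`, block size, and the host arrays `h_image`/`h_output` are assumptions, and error checking is omitted for brevity):

```c
// Hypothetical host-side completion; N, block size, and h_image/h_output
// are placeholders, not part of the original program.
int N = 1 << 20;
float *d_image, *d_output;
cudaMalloc(&d_image, N * sizeof(float));
cudaMalloc(&d_output, N * sizeof(float));
cudaMemcpy(d_image, h_image, N * sizeof(float), cudaMemcpyHostToDevice);

int threads = 256;
int blocks = (N + threads - 1) / threads; // round up to cover all N elements
applyFilter<<<blocks, threads>>>(d_image, d_output, N);

cudaMemcpy(h_output, d_output, N * sizeof(float), cudaMemcpyDeviceToHost);
cudaFree(d_image);
cudaFree(d_output);
```

Note that the constant-memory upload stays a one-time `cudaMemcpyToSymbol`; the kernel never receives the weights as an argument.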
When to use constant memory:
✅ The data is read-only for the entire kernel launch
✅ The data is small (constant memory is limited to 64 KB)
✅ All (or most) threads in a warp read the SAME address at the same time: the broadcast case
🎯 Real-world use: In deep learning kernels, filter weights for a convolutional layer are often stored in constant memory so they can be broadcast to all threads processing different parts of the input feature map.
📋 Instructions
Simulate constant memory by using compile-time constants. Print a convolution filter application:
```
=== Constant Memory Simulation ===
Filter weights: 0.1 0.2 0.4 0.2 0.1
Input data: 10.0 20.0 30.0 40.0 50.0
All 5 threads reading same weights (broadcast!)
Thread 0: sum = 10.0*0.1 + 20.0*0.2 + 30.0*0.4 + 40.0*0.2 + 50.0*0.1 = 30.0
```
This program is complete! Run it to see the simulation. The key lesson: constant memory is perfect for filter weights because ALL threads read the SAME weights — this is the 'broadcast' scenario where constant memory shines.