CUDA Programming Threads, Blocks & Grids
💡
Exercise 13

Global Thread Index 15 XP Medium

Ctrl+Enter Run Ctrl+S Save

🔢 Chapter 3, Part 3: The Index — Your Thread's True Identity

💡 Story: If 10,000 soldiers are sent to paint a wall of 10,000 bricks, each soldier needs to know EXACTLY which brick is theirs. The global thread index is each soldier's brick assignment slip!

The global thread index formula for 1D, 2D, and 3D:

// ============ 1D ============ __global__ void kernel1D(float* arr, int n) { int i = threadIdx.x + blockIdx.x * blockDim.x; if (i < n) arr[i] = i * 2.0f; } // ============ 2D ============ __global__ void kernel2D(float* mat, int width, int height) { int col = threadIdx.x + blockIdx.x * blockDim.x; // x direction int row = threadIdx.y + blockIdx.y * blockDim.y; // y direction int idx = row * width + col; // flatten 2D to 1D index if (col < width && row < height) mat[idx] = row * 10.0f + col; // row-major order } // ============ 3D ============ __global__ void kernel3D(float* vol, int W, int H, int D) { int x = threadIdx.x + blockIdx.x * blockDim.x; int y = threadIdx.y + blockIdx.y * blockDim.y; int z = threadIdx.z + blockIdx.z * blockDim.z; if (x < W && y < H && z < D) { int idx = z * W * H + y * W + x; // 3D → 1D vol[idx] = x + y + z; } }

📐 2D Index Visualization (4x3 matrix, 2x2 blocks):

// Matrix laid out in memory (row-major): // [0,0][0,1][0,2][0,3] // [1,0][1,1][1,2][1,3] // [2,0][2,1][2,2][2,3] // // Thread (col=2, row=1) → idx = 1*4 + 2 = 6 // Memory address: mat[6] // // Always use: idx = row * width + col

🔑 Grid-stride loops — when you have MORE data than threads:

// If n=1,000,000 but you only launch 1024 threads, // use a grid-stride loop — each thread processes MULTIPLE elements: __global__ void processLarge(float* arr, int n) { int stride = blockDim.x * gridDim.x; // total threads launched int i = threadIdx.x + blockIdx.x * blockDim.x; while (i < n) { // Each thread handles elements spaced by stride arr[i] *= 2.0f; i += stride; // Jump to next chunk } }
📋 Instructions
Simulate 2D thread indexing. For a 4-wide, 3-tall matrix with 2×2 thread blocks, compute and print the 2D coordinates and 1D memory index for each thread: ``` === 2D Thread Index Simulation === Width=4, Height=3, BlockSize=2x2 col=0, row=0 -> idx=0 col=1, row=0 -> idx=1 col=2, row=0 -> idx=2 col=3, row=0 -> idx=3 col=0, row=1 -> idx=4 col=1, row=1 -> idx=5 col=2, row=1 -> idx=6 col=3, row=1 -> idx=7 col=0, row=2 -> idx=8 col=1, row=2 -> idx=9 col=2, row=2 -> idx=10 col=3, row=2 -> idx=11 ```
The 2D to 1D formula is: idx = row * width + col. This is called row-major order — the same order C stores multi-dimensional arrays in memory.
⚠️ Try solving it yourself first — you'll learn more!
#include <stdio.h>

int main() {
    int width = 4, height = 3;
    printf("=== 2D Thread Index Simulation ===\n");
    printf("Width=%d, Height=%d, BlockSize=2x2\n", width, height);
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            int idx = row * width + col;
            printf("col=%d, row=%d -> idx=%d\n", col, row, idx);
        }
    }
    return 0;
}
main.py
Hi! I'm Rex 👋
Output
Ready. Press ▶ Run or Ctrl+Enter.