🔢 Chapter 3, Part 3: The Index — Your Thread's True Identity
💡 Story: If 10,000 soldiers are sent to paint a wall of 10,000 bricks, each soldier needs to know EXACTLY which brick is theirs. The global thread index is each soldier's brick assignment slip!
The global thread index formula for 1D, 2D, and 3D:
// ============ 1D ============
__global__ void kernel1D(float* arr, int n) {
int i = threadIdx.x + blockIdx.x * blockDim.x;
if (i < n) arr[i] = i * 2.0f;
}
// ============ 2D ============
__global__ void kernel2D(float* mat, int width, int height) {
int col = threadIdx.x + blockIdx.x * blockDim.x; // x direction
int row = threadIdx.y + blockIdx.y * blockDim.y; // y direction
int idx = row * width + col; // flatten 2D to 1D index
if (col < width && row < height)
mat[idx] = row * 10.0f + col; // row-major order
}
// ============ 3D ============
__global__ void kernel3D(float* vol, int W, int H, int D) {
int x = threadIdx.x + blockIdx.x * blockDim.x;
int y = threadIdx.y + blockIdx.y * blockDim.y;
int z = threadIdx.z + blockIdx.z * blockDim.z;
if (x < W && y < H && z < D) {
int idx = z * W * H + y * W + x; // 3D → 1D
vol[idx] = x + y + z;
}
}
📐 2D Index Visualization (4x3 matrix, 2x2 blocks):
// Matrix laid out in memory (row-major):
// [0,0][0,1][0,2][0,3]
// [1,0][1,1][1,2][1,3]
// [2,0][2,1][2,2][2,3]
//
// Thread (col=2, row=1) → idx = 1*4 + 2 = 6
// Memory address: mat[6]
//
// Always use: idx = row * width + col
🔑 Grid-stride loops — when you have MORE data than threads:
// If n=1,000,000 but you only launch 1024 threads,
// use a grid-stride loop — each thread processes MULTIPLE elements:
__global__ void processLarge(float* arr, int n) {
int stride = blockDim.x * gridDim.x; // total threads launched
int i = threadIdx.x + blockIdx.x * blockDim.x;
while (i < n) { // Each thread handles elements spaced by stride
arr[i] *= 2.0f;
i += stride; // Jump to next chunk
}
}
📋 Instructions
Simulate 2D thread indexing. For a 4-wide, 3-tall matrix with 2×2 thread blocks, compute and print the 2D coordinates and 1D memory index for each thread:
```
=== 2D Thread Index Simulation ===
Width=4, Height=3, BlockSize=2x2
col=0, row=0 -> idx=0
col=1, row=0 -> idx=1
col=2, row=0 -> idx=2
col=3, row=0 -> idx=3
col=0, row=1 -> idx=4
col=1, row=1 -> idx=5
col=2, row=1 -> idx=6
col=3, row=1 -> idx=7
col=0, row=2 -> idx=8
col=1, row=2 -> idx=9
col=2, row=2 -> idx=10
col=3, row=2 -> idx=11
```
The 2D to 1D formula is: idx = row * width + col. This is called row-major order — the same order C stores multi-dimensional arrays in memory.
⚠️ Try solving it yourself first — you'll learn more!
#include <stdio.h>
int main() {
int width = 4, height = 3;
printf("=== 2D Thread Index Simulation ===\n");
printf("Width=%d, Height=%d, BlockSize=2x2\n", width, height);
for (int row = 0; row < height; row++) {
for (int col = 0; col < width; col++) {
int idx = row * width + col;
printf("col=%d, row=%d -> idx=%d\n", col, row, idx);
}
}
return 0;
}