CUDA Programming Threads, Blocks & Grids
Exercise 12

The Grid — Your Army 15 XP Easy


🌐 Chapter 3, Part 2: The Grid — Command the Whole Army

💡 A Grid is the entire deployment. When you launch myKernel<<<16, 256>>>(), you deploy a grid of 16 blocks with 256 threads each, for 16 × 256 = 4,096 threads in total. The grid is the top-level organization of your parallel computation.
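To make the launch arithmetic concrete, here is a minimal sketch in plain Python (a CPU emulation, not CUDA) that walks every block and thread of a <<<16, 256>>> launch and computes the global index a 1D kernel would typically derive. The function name `global_indices` is hypothetical and exists only for this illustration.

```python
def global_indices(num_blocks, threads_per_block):
    """Yield the global index of every thread in an emulated 1D launch.

    Mirrors the usual CUDA expression
    blockIdx.x * blockDim.x + threadIdx.x, evaluated on the CPU.
    """
    for block_idx in range(num_blocks):              # stands in for blockIdx.x
        for thread_idx in range(threads_per_block):  # stands in for threadIdx.x
            yield block_idx * threads_per_block + thread_idx


indices = list(global_indices(16, 256))
print(len(indices))             # total threads in the grid: 16 * 256
print(indices[0], indices[-1])  # first and last global index
```

Each global index appears exactly once, which is why this expression is a convenient way to assign one array element per thread.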

Grids can be 1D, 2D, or 3D! You use a special type called dim3:

#include <cuda_runtime.h>

// 1D grid (most common for arrays)
dim3 grid1D(num_blocks);          // same as <<<num_blocks, threads>>>

// 2D grid (great for image processing)
dim3 grid2D(blocks_x, blocks_y);  // Arranges blocks in a 2D pattern
dim3 block2D(16, 16);             // 16x16 = 256 threads/block
// Launch: kernel<<<grid2D, block2D>>>()

// Inside a 2D kernel:
__global__ void process2D(float* img, int width, int height) {
    int col = threadIdx.x + blockIdx.x * blockDim.x;  // x coordinate
    int row = threadIdx.y + blockIdx.y * blockDim.y;  // y coordinate
    if (col < width && row < height) {
        int idx = row * width + col;  // 2D → 1D index conversion
        img[idx] *= 1.5f;             // Brighten!
    }
}
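The row/column mapping and the bounds check can be tried out without a GPU. The following is a rough CPU emulation in Python of the 2D kernel above, under the assumption that `grid` and `block` are plain (x, y) tuples standing in for dim3 values. The name `process2d` and the 20x10 test image are illustrative choices, not part of the exercise.

```python
def process2d(img, width, height, grid, block):
    """CPU emulation of the 2D kernel: brighten each pixel by 1.5x.

    grid and block are (x, y) tuples standing in for dim3 values.
    Returns how many emulated threads were launched.
    """
    launched = 0
    for block_y in range(grid[1]):
        for block_x in range(grid[0]):
            for thread_y in range(block[1]):
                for thread_x in range(block[0]):
                    launched += 1
                    col = thread_x + block_x * block[0]  # x coordinate
                    row = thread_y + block_y * block[1]  # y coordinate
                    if col < width and row < height:     # bounds check
                        idx = row * width + col          # 2D -> 1D index
                        img[idx] *= 1.5
    return launched


width, height = 20, 10  # deliberately not a multiple of 16
image = [1.0] * (width * height)
launched = process2d(image, width, height, grid=(2, 1), block=(16, 16))
print(launched)                      # threads launched: 2 * 1 * 16 * 16
print(all(p == 1.5 for p in image))  # each pixel was brightened exactly once
```

More threads are launched than there are pixels; the bounds check is what keeps the surplus threads from writing outside the image.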

dim3 — The dimension struct:

  • dim3 grid(4) — Same as dim3(4, 1, 1) — 1D: 4 blocks
  • dim3 grid(4, 4) — 2D: 4×4 = 16 blocks total
  • dim3 grid(4, 4, 4) — 3D: 4×4×4 = 64 blocks total
  • dim3 block(32, 32) — 2D block with 1024 threads (32×32)
  • 🧮 gridDim.x, gridDim.y, gridDim.z — Access in kernel
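The defaulting behaviour in the list above can be mimicked with a small Python stand-in. This is only a sketch: `Dim3` here is a hypothetical dataclass written for illustration, not a CUDA or Python library type, and it assumes (as in CUDA's dim3) that any unspecified dimension is 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim3:
    """Hypothetical Python stand-in for CUDA's dim3 (not a real CUDA API)."""
    x: int = 1
    y: int = 1
    z: int = 1

    def count(self):
        # Total number of blocks (for a grid) or threads (for a block).
        return self.x * self.y * self.z


print(Dim3(4) == Dim3(4, 1, 1))  # unspecified dimensions default to 1
print(Dim3(4, 4).count())        # 2D grid: 16 blocks
print(Dim3(4, 4, 4).count())     # 3D grid: 64 blocks
print(Dim3(32, 32).count())      # 2D block: 1024 threads
```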
// 2D image processing: classic CUDA pattern
// For an 800x600 image:
dim3 blockSize(16, 16);  // 256 threads/block

// Wrong: integer division truncates 600/16 = 37.5 down to 37,
// so the last 8 rows of the image would get no threads:
// dim3 gridSize(800/16, 600/16);

// Correct: round up with ceiling division
dim3 gridSize((800+15)/16, (600+15)/16);  // = (50, 38)

// Total blocks  = 50 × 38 = 1900
// Total threads = 1900 × 256 = 486,400
// Image pixels  = 800 × 600 = 480,000
// The 6,400 surplus threads fail the bounds check and do nothing.
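The difference between the plain-division and round-up grid sizes for the 800x600 image can be checked numerically. Below is a small Python sketch, assuming square 16x16 blocks; the helper `covered_pixels` is a hypothetical name introduced here to count how many image pixels fall under the grid.

```python
def covered_pixels(width, height, grid_x, grid_y, block=16):
    """Count image pixels that fall inside the area spanned by the grid."""
    covered_cols = min(width, grid_x * block)
    covered_rows = min(height, grid_y * block)
    return covered_cols * covered_rows


width, height, block = 800, 600, 16
pixels = width * height

# Truncating division: 600 // 16 == 37, so the bottom rows get no threads.
truncated = covered_pixels(width, height, width // block, height // block)

# Ceiling division: (600 + 15) // 16 == 38, so every row is covered.
grid_x = (width + block - 1) // block
grid_y = (height + block - 1) // block
rounded_up = covered_pixels(width, height, grid_x, grid_y)

print(pixels - truncated)                        # pixels missed when truncating
print(pixels - rounded_up)                       # pixels missed when rounding up
print(grid_x * grid_y * block * block - pixels)  # idle threads masked by the bounds check
```

Rounding up trades a small number of idle threads for full coverage, which is usually the right trade since an idle thread costs far less than an unprocessed pixel.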
📋 Instructions
Print grid configurations for different problem sizes. For each image size, compute the required grid dimensions using 16×16 thread blocks:
```
=== 2D Grid Configurations (16x16 blocks) ===
Image 800x600: grid 50x38, total blocks=1900
Image 1920x1080: grid 120x68, total blocks=8160
Image 256x256: grid 16x16, total blocks=256
Image 100x100: grid 7x7, total blocks=49
```
The code is already nearly complete, so just run it. With integer division, the formula (width + blockSize - 1) / blockSize computes the ceiling of width / blockSize, which is the number of blocks needed to cover that dimension.
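For reference, one way the exercise could look in Python is sketched below. The exercise's actual starter code is not shown here, so this is an assumption-laden illustration: the name `printGridConfig` and its three arguments come from the test cases, while the `ceil_div` helper and the returned string are additions made for this sketch. Note that Python needs `//` for integer division, since `/` would produce a float.

```python
def ceil_div(n, d):
    """Ceiling of n / d for positive integers, using integer division."""
    return (n + d - 1) // d


def printGridConfig(width, height, block_size):
    """Print (and return) the grid needed to cover a width x height image."""
    grid_x = ceil_div(width, block_size)
    grid_y = ceil_div(height, block_size)
    line = (f"Image {width}x{height}: grid {grid_x}x{grid_y}, "
            f"total blocks={grid_x * grid_y}")
    print(line)
    return line  # returned only to make the sketch easy to check


print("=== 2D Grid Configurations (16x16 blocks) ===")
for w, h in [(800, 600), (1920, 1080), (256, 256), (100, 100)]:
    printGridConfig(w, h, 16)
```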
🧪 Test Cases
Input
printGridConfig(800, 600, 16)
Expected
Image 800x600: grid 50x38, total blocks=1900
800x600 image
Input
printGridConfig(1920, 1080, 16)
Expected
Image 1920x1080: grid 120x68, total blocks=8160
1080p image