🌐 Chapter 3, Part 2: The Grid — Command the Whole Army
💡 A Grid is the entire deployment. When you launch myKernel<<<16, 256>>>(), you deploy a grid of 16 blocks, each running 256 threads. The grid is the top-level organization of your parallel computation.
Grids can be 1D, 2D, or 3D! You use a special type called dim3:
```
#include <cuda_runtime.h>

// 1D grid (most common for arrays)
dim3 grid1D(num_blocks);          // same as <<<num_blocks, threads>>>

// 2D grid (great for image processing)
dim3 grid2D(blocks_x, blocks_y);  // Arranges blocks in a 2D pattern
dim3 block2D(16, 16);             // 16x16 = 256 threads/block
// Launch: kernel<<<grid2D, block2D>>>()

// Inside a 2D kernel:
__global__ void process2D(float* img, int width, int height) {
    int col = threadIdx.x + blockIdx.x * blockDim.x;  // x coordinate
    int row = threadIdx.y + blockIdx.y * blockDim.y;  // y coordinate
    if (col < width && row < height) {
        int idx = row * width + col;  // 2D → 1D index conversion
        img[idx] *= 1.5f;             // Brighten!
    }
}
```
dim3 — The dimension struct:
```
// 2D image processing: the classic CUDA pattern
// For an 800x600 image:
dim3 blockSize(16, 16);                  // 256 threads/block
dim3 gridSizeWrong(800/16, 600/16);      // WRONG: 600/16 = 37.5, but integer
                                         // division truncates to 37 → bottom rows missed
// Correct way: round up with ceiling division
dim3 gridSize((800+15)/16, (600+15)/16); // = (50, 38)
// Total blocks  = 50 × 38   = 1900
// Total threads = 1900 × 256 = 486,400
// Compare to 800×600 = 480,000 pixels → close enough, with bounds check!
```
📋 Instructions
Print grid configurations for different problem sizes. For each image size, compute the required grid dimensions using 16×16 thread blocks:
```
=== 2D Grid Configurations (16x16 blocks) ===
Image 800x600: grid 50x38, total blocks=1900
Image 1920x1080: grid 120x68, total blocks=8160
Image 256x256: grid 16x16, total blocks=256
Image 100x100: grid 7x7, total blocks=49
```
The starter code is already nearly complete! Just run it: the formula (width + blockSize - 1) / blockSize performs the ceiling division that rounds grid dimensions up, so partial blocks at the edges are never lost.