2D Grids & Blocks 20 XP Medium

Ctrl+Enter Run Ctrl+S Save

🖼️ Chapter 3, Part 4: 2D Grids — Perfect for Images & Matrices

💡 Story: You're now the AI engineer at Netflix. Your job: add a cinematic filter to every frame of a 4K movie. Each frame is 3840×2160 pixels. With 2D grids, each thread handles one pixel — all 8,294,400 pixels processed simultaneously!

#include <cuda_runtime.h>
#include <stdio.h>

// Apply grayscale to an RGB image
__global__ void grayscale(unsigned char* rgb, unsigned char* gray, int width, int height) {
    // Thread maps to one pixel
    int col = threadIdx.x + blockIdx.x * blockDim.x;  // x pixel
    int row = threadIdx.y + blockIdx.y * blockDim.y;  // y pixel
    
    if (col < width && row < height) {
        int pixelIdx = row * width + col;
        int rgbIdx = pixelIdx * 3;  // RGB has 3 channels
        
        unsigned char r = rgb[rgbIdx + 0];
        unsigned char g = rgb[rgbIdx + 1];
        unsigned char b = rgb[rgbIdx + 2];
        
        // Luminosity formula (ITU-R BT.601)
        gray[pixelIdx] = (unsigned char)(0.299f*r + 0.587f*g + 0.114f*b);
    }
}

int main() {
    int width = 1920, height = 1080;
    
    // 2D block: 16x16 = 256 threads per block
    dim3 blockSize(16, 16);
    dim3 gridSize((width + 15) / 16, (height + 15) / 16);
    // gridSize = (120, 68) → 120×68 = 8160 blocks
    // Total threads = 8160 × 256 = 2,088,960 (covers all 2,073,600 pixels)
    
    // ... (memory allocation, copy, etc.) ...
    grayscale<<<gridSize, blockSize>>>(d_rgb, d_gray, width, height);
    cudaDeviceSynchronize();
    return 0;
}

Why 16×16 blocks (256 threads)?

🔲 256 = 8 warps of 32 threads each — warp-aligned (efficient!)
📊 16×16 tiles fit nicely in shared memory for tiling optimizations
⚙️ Good occupancy — Each SM can run multiple 256-thread blocks
🏆 Industry standard — 16×16 is the go-to for 2D CUDA problems

🔑 Common 2D CUDA applications:

🖼️ Image filtering, convolution (CNNs!), color space conversion
🧮 Matrix operations (multiply, transpose, addition)
🌊 2D simulations (fluid dynamics, heat diffusion)
🎮 Game physics, collision detection

📋 Instructions

Calculate and print the 2D grid configurations for common image processing workloads: ``` === 2D CUDA Grid Calculator === Block size: 16x16 (256 threads/block) 720p (1280x720): grid=80x45, blocks=3600, threads=921600 1080p (1920x1080): grid=120x68, blocks=8160, threads=2088960 4K (3840x2160): grid=240x135,blocks=32400, threads=8294400 ```

This code is almost complete! The printImageConfig function already implements the grid calculation. Just run it to see the configurations. The key formula is: gridX = ceil(width/blockSize), gridY = ceil(height/blockSize).

← Previous Next Exercise →

main.py

Hi! I'm Rex 👋

Output

Ready. Press ▶ Run or Ctrl+Enter.

›