🆔 Chapter 2, Part 4: Every Thread Knows Who It Is
💡 Story: Your 1,024 GPU soldiers are deployed. How does each soldier know which part of the battlefield to cover? Simple — each soldier has a unique ID badge! In CUDA, that badge is the global thread index.
CUDA provides built-in variables available inside every kernel:
```c
// Available inside any __global__ kernel:
threadIdx.x  // thread's position within its BLOCK (0-based)
blockIdx.x   // which BLOCK this thread belongs to (0-based)
blockDim.x   // number of threads per block
gridDim.x    // number of blocks in the grid

// Computing the GLOBAL thread ID (the most important formula in CUDA!):
int globalId = threadIdx.x + blockIdx.x * blockDim.x;
//             ^ position    ^ which      ^ threads
//               in block      block?       per block
```
```c
// Launch: <<<3, 4>>> = 3 blocks, 4 threads each = 12 threads total
//
// Block 0: Thread 0, 1, 2, 3 → Global IDs: 0, 1, 2, 3
// Block 1: Thread 0, 1, 2, 3 → Global IDs: 4, 5, 6, 7
// Block 2: Thread 0, 1, 2, 3 → Global IDs: 8, 9, 10, 11
//
// Formula: globalId = threadIdx.x + blockIdx.x * blockDim.x
//
// Block 1, Thread 2: globalId = 2 + 1 * 4 = 6 ✓
// Block 2, Thread 3: globalId = 3 + 2 * 4 = 11 ✓
```
```c
#include <stdio.h>

__global__ void showThreadInfo() {
    int globalId = threadIdx.x + blockIdx.x * blockDim.x;
    printf("Block %d, Thread %d -> Global ID: %d\n",
           blockIdx.x,   // which block
           threadIdx.x,  // thread within block
           globalId);    // global unique ID
}

int main() {
    showThreadInfo<<<3, 4>>>();  // 3 blocks, 4 threads each
    cudaDeviceSynchronize();     // wait for the GPU to finish printing
    return 0;
}
```

Note: blocks execute in no guaranteed order, so the printed lines may appear out of order from run to run.
🔑 Why this formula matters so much: When you have an array of `n` elements and you launch `n` threads, each thread uses its `globalId` as an array index: thread 0 processes element 0, thread 1 processes element 1, and so on. This is the foundation of ALL CUDA programs!
```c
// Pattern used in EVERY CUDA program:
__global__ void processArray(float* arr, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;  // MEMORIZE this!
    if (i < n) {  // safety check: the last block may have extra threads
        arr[i] = arr[i] * 2.0f;  // thread i processes element i
    }
}
```
📋 Instructions
Given a kernel launch configuration, compute the global thread IDs.
Write a C program that simulates the CUDA thread ID computation (no actual GPU needed). For a launch of **2 blocks, 4 threads per block**, print what each thread's global ID would be:
```
=== Thread ID Simulation ===
Block 0, Thread 0: Global ID = 0
Block 0, Thread 1: Global ID = 1
Block 0, Thread 2: Global ID = 2
Block 0, Thread 3: Global ID = 3
Block 1, Thread 0: Global ID = 4
Block 1, Thread 1: Global ID = 5
Block 1, Thread 2: Global ID = 6
Block 1, Thread 3: Global ID = 7
```
Use nested loops to simulate it!
The formula is `int globalId = threadId + blockId * threadsPerBlock;`, which is exactly what CUDA's `threadIdx.x + blockIdx.x * blockDim.x` computes on the GPU!
⚠️ Try solving it yourself first — you'll learn more!
```c
#include <stdio.h>

int main() {
    int numBlocks = 2;
    int threadsPerBlock = 4;

    printf("=== Thread ID Simulation ===\n");
    for (int blockId = 0; blockId < numBlocks; blockId++) {
        for (int threadId = 0; threadId < threadsPerBlock; threadId++) {
            int globalId = threadId + blockId * threadsPerBlock;
            printf("Block %d, Thread %d: Global ID = %d\n", blockId, threadId, globalId);
        }
    }
    return 0;
}
```