CUDA Programming The GPU Universe
Exercise 4: The CUDA Toolkit
⚙️ Chapter 1, Part 4: Your CUDA Toolbox

💡 Think of the CUDA Toolkit as your army headquarters. It gives you everything you need: the weapons (compilers), the maps (documentation), the training manuals (libraries), and the spy satellite (profiler).

The CUDA Toolkit contains:

  • 🔨 nvcc — The NVIDIA CUDA Compiler (like gcc but for .cu files)
  • 📚 cuBLAS — GPU-accelerated linear algebra (Basic Linear Algebra Subprograms)
  • 🔢 cuDNN — GPU-accelerated deep neural network primitives (used by TensorFlow, PyTorch; note: shipped as a separate download, not bundled inside the Toolkit itself)
  • 🔍 cuFFT — GPU Fast Fourier Transform library
  • 🎯 Nsight — Visual profiler to see where your code is slow
  • 🧪 cuda-gdb — Debugger for GPU code
  • 📖 CUDA Runtime API — Functions like cudaMalloc, cudaMemcpy, cudaFree
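To make those Runtime API names concrete, here is a minimal host-side sketch of the allocate → copy → free cycle (no kernel launch yet, error checking omitted for brevity; compiling and running it requires nvcc and an NVIDIA GPU):

```cuda
// Minimal sketch of the cudaMalloc / cudaMemcpy / cudaFree cycle
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    const int N = 4;
    float host[N]   = {1.0f, 2.0f, 3.0f, 4.0f};
    float result[N] = {0};

    float *device = NULL;
    cudaMalloc(&device, N * sizeof(float));                               // allocate VRAM
    cudaMemcpy(device, host, N * sizeof(float), cudaMemcpyHostToDevice);  // CPU -> GPU
    cudaMemcpy(result, device, N * sizeof(float), cudaMemcpyDeviceToHost); // GPU -> CPU
    cudaFree(device);                                                     // release VRAM

    printf("Round trip: %.1f %.1f %.1f %.1f\n",
           result[0], result[1], result[2], result[3]);
    return 0;
}
```

Nothing computes on the GPU here; the point is just the round trip your data makes before and after every kernel launch.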

File types in CUDA:

  • 📄 .cu — CUDA source file (GPU + CPU code together)
  • 📄 .cuh — CUDA header file
  • 📄 .c / .cpp — Regular C/C++ (host only)
  • ⚙️ nvcc compiles .cu files, separating GPU vs CPU code automatically
```cuda
// File: hello_cuda.cu
// Compile with: nvcc hello_cuda.cu -o hello_cuda
// Run with:     ./hello_cuda
#include <stdio.h>
#include <cuda_runtime.h>  // CUDA Runtime API header

// __global__ means: "this function runs on the GPU"
__global__ void helloKernel() {
    printf("Hello from GPU thread %d!\n", threadIdx.x);
}

int main() {
    // <<<blocks, threads_per_block>>>
    helloKernel<<<1, 5>>>();   // 1 block, 5 threads
    cudaDeviceSynchronize();   // Wait for GPU to finish
    return 0;
}
```

How nvcc compiles your code:

  • 1️⃣ nvcc reads your .cu file
  • 2️⃣ Separates __global__ and __device__ code (GPU) from regular C code (CPU)
  • 3️⃣ Compiles GPU code to PTX (Parallel Thread Execution) — a portable GPU assembly language
  • 4️⃣ PTX is then compiled to SASS, the actual GPU machine code, and packaged in a cubin (CUDA binary)
  • 5️⃣ Compiles CPU code with the regular C++ compiler (g++ or cl.exe)
  • 6️⃣ Links everything into a single executable
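You can stop the pipeline at the intermediate stages and inspect them yourself. A sketch, assuming nvcc is on your PATH and the hello_cuda.cu file from above exists (sm_75 is an example architecture, not a requirement):

```shell
# Stop after step 3: emit human-readable PTX assembly
nvcc -ptx hello_cuda.cu -o hello_cuda.ptx

# Stop after step 4: emit a cubin for one specific GPU architecture
nvcc -cubin -arch=sm_75 hello_cuda.cu -o hello_cuda.cubin

# Full pipeline, steps 1-6, in one command
nvcc hello_cuda.cu -o hello_cuda
```

The .ptx file is plain text, so opening it in an editor is a good way to see what your kernel looks like one level below C.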

🎮 Checking your GPU: Run nvidia-smi in your terminal to see your GPU model, driver version, and memory usage. This is the 'health check' for your GPU.

```cuda
// Query GPU properties at runtime
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // Device 0 = first GPU
    printf("GPU Name: %s\n", prop.name);
    printf("Total VRAM: %zu MB\n", prop.totalGlobalMem / (1024 * 1024));
    // Note: cudaDeviceProp has no "CUDA cores" field; compute capability
    // (major.minor) is what tells you the SM generation.
    printf("Compute Capability: %d.%d\n", prop.major, prop.minor);
    printf("Number of SMs: %d\n", prop.multiProcessorCount);
    printf("Max Threads/Block: %d\n", prop.maxThreadsPerBlock);
    return 0;
}
```
📋 Instructions
Print information about the CUDA toolkit components:

```
=== CUDA Toolkit Components ===
Compiler: nvcc
Linear Algebra: cuBLAS
Deep Learning: cuDNN
FFT Library: cuFFT
Profiler: Nsight
Debugger: cuda-gdb
File Extension: .cu
```
Each line uses printf("Component: Value\n"). Add all 7 lines after the header.