CUDA Programming The GPU Universe
Exercise 4: The CUDA Toolkit
⚙️ Chapter 1, Part 4: Your CUDA Toolbox

💡 Think of the CUDA Toolkit as your army headquarters. It gives you everything you need: the weapons (compilers), the maps (documentation), the training manuals (libraries), and the spy satellite (profiler).

The CUDA Toolkit contains:

  • 🔨 nvcc — The NVIDIA CUDA Compiler (like gcc but for .cu files)
  • 📚 cuBLAS — GPU-accelerated linear algebra (Basic Linear Algebra Subprograms)
  • 🔢 cuDNN — GPU-accelerated deep neural network primitives (used by TensorFlow, PyTorch; note: shipped as a separate download, not bundled inside the Toolkit itself)
  • 🔍 cuFFT — GPU Fast Fourier Transform library
  • 🎯 Nsight — Visual profiler to see where your code is slow
  • 🧪 cuda-gdb — Debugger for GPU code
  • 📖 CUDA Runtime API — Functions like cudaMalloc, cudaMemcpy, cudaFree
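To make those Runtime API names concrete, here is a minimal host-side sketch of the allocate → copy → free cycle (no kernel launch yet, error checking omitted for brevity; compiling and running it requires nvcc and an NVIDIA GPU):

```cuda
// Minimal sketch of the cudaMalloc / cudaMemcpy / cudaFree cycle
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    const int N = 4;
    float host[N]   = {1.0f, 2.0f, 3.0f, 4.0f};
    float result[N] = {0};

    float *device = NULL;
    cudaMalloc(&device, N * sizeof(float));                               // allocate VRAM
    cudaMemcpy(device, host, N * sizeof(float), cudaMemcpyHostToDevice);  // CPU -> GPU
    cudaMemcpy(result, device, N * sizeof(float), cudaMemcpyDeviceToHost); // GPU -> CPU
    cudaFree(device);                                                     // release VRAM

    printf("Round trip: %.1f %.1f %.1f %.1f\n",
           result[0], result[1], result[2], result[3]);
    return 0;
}
```

Nothing computes on the GPU here; the point is just the round trip your data makes before and after every kernel launch.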

File types in CUDA:

  • 📄 .cu — CUDA source file (GPU + CPU code together)
  • 📄 .cuh — CUDA header file
  • 📄 .c / .cpp — Regular C/C++ (host only)
  • ⚙️ nvcc compiles .cu files, separating GPU vs CPU code automatically
```cuda
// File: hello_cuda.cu
// Compile with: nvcc hello_cuda.cu -o hello_cuda
// Run with:     ./hello_cuda
#include <stdio.h>
#include <cuda_runtime.h>  // CUDA Runtime API header

// __global__ means: "this function runs on the GPU"
__global__ void helloKernel() {
    printf("Hello from GPU thread %d!\n", threadIdx.x);
}

int main() {
    // <<<blocks, threads_per_block>>>
    helloKernel<<<1, 5>>>();   // 1 block, 5 threads
    cudaDeviceSynchronize();   // Wait for GPU to finish
    return 0;
}
```

How nvcc compiles your code:

  • 1️⃣ nvcc reads your .cu file
  • 2️⃣ Separates __global__ and __device__ code (GPU) from regular C code (CPU)
  • 3️⃣ Compiles GPU code to PTX (Parallel Thread Execution) — a portable GPU assembly language
  • 4️⃣ PTX is then compiled to SASS, the actual GPU machine code, and packaged in a cubin (CUDA binary)
  • 5️⃣ Compiles CPU code with the regular C++ compiler (g++ or cl.exe)
  • 6️⃣ Links everything into a single executable
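You can stop the pipeline at the intermediate stages and inspect them yourself. A sketch, assuming nvcc is on your PATH and the hello_cuda.cu file from above exists (sm_75 is an example architecture, not a requirement):

```shell
# Stop after step 3: emit human-readable PTX assembly
nvcc -ptx hello_cuda.cu -o hello_cuda.ptx

# Stop after step 4: emit a cubin for one specific GPU architecture
nvcc -cubin -arch=sm_75 hello_cuda.cu -o hello_cuda.cubin

# Full pipeline, steps 1-6, in one command
nvcc hello_cuda.cu -o hello_cuda
```

The .ptx file is plain text, so opening it in an editor is a good way to see what your kernel looks like one level below C.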

🎮 Checking your GPU: Run nvidia-smi in your terminal to see your GPU model, driver version, and memory usage. This is the 'health check' for your GPU.

```cuda
// Query GPU properties at runtime
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // Device 0 = first GPU
    printf("GPU Name: %s\n", prop.name);
    printf("Total VRAM: %zu MB\n", prop.totalGlobalMem / (1024 * 1024));
    // Note: cudaDeviceProp has no "CUDA cores" field; compute capability
    // (major.minor) is what tells you the SM generation.
    printf("Compute Capability: %d.%d\n", prop.major, prop.minor);
    printf("Number of SMs: %d\n", prop.multiProcessorCount);
    printf("Max Threads/Block: %d\n", prop.maxThreadsPerBlock);
    return 0;
}
```
📋 Instructions
Print information about the CUDA toolkit components:

```
=== CUDA Toolkit Components ===
Compiler: nvcc
Linear Algebra: cuBLAS
Deep Learning: cuDNN
FFT Library: cuFFT
Profiler: Nsight
Debugger: cuda-gdb
File Extension: .cu
```
Each line uses printf("Component: Value\n"). Add all 7 lines after the header.