Manual GPU Reservations

The reserve command allows you to manually reserve GPUs for a specific duration without immediately running a command. This is useful for interactive development, planning work sessions, or blocking GPUs for maintenance.

Basic Usage

canhazgpu reserve [--gpus <count> | --gpu-ids <ids>] [--duration <time>]

Defaults:

  • --gpus: 1 GPU
  • --duration: 8 hours

Options:

  • --gpus, -g: Number of GPUs to reserve
  • --gpu-ids: Specific GPU IDs to reserve (comma-separated, e.g., 1,3,5)
  • --duration, -d: How long to reserve the GPUs

GPU Selection

  • Use --gpus to let canhazgpu select GPUs using the LRU algorithm
  • Use --gpu-ids when you need specific GPUs (e.g., for hardware requirements)
  • You can combine the two only when --gpus equals the number of IDs in --gpu-ids, or when --gpus is left at its default of 1

Duration Formats

canhazgpu supports flexible duration formats:

Format  Description            Example
------  ---------------------  ---------------
30m     30 minutes             --duration 30m
2h      2 hours                --duration 2h
1d      1 day                  --duration 1d
0.5h    30 minutes (decimal)   --duration 0.5h
90m     90 minutes             --duration 90m
3.5d    3.5 days               --duration 3.5d
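Each of these strings resolves to a concrete length of time. A minimal sketch of such a conversion in Python (illustrative only, not canhazgpu's actual parser):

```python
# Hypothetical duration parser mirroring the formats in the table above;
# canhazgpu's real implementation may differ.
def parse_duration(s: str) -> float:
    """Convert strings like '30m', '2h', '1d', '0.5h' to seconds."""
    units = {"m": 60, "h": 3600, "d": 86400}
    unit = s[-1]
    if unit not in units:
        raise ValueError("Invalid duration format. Use formats like: 30m, 2h, 1d, 0.5h")
    return float(s[:-1]) * units[unit]

print(parse_duration("30m"))   # 1800.0
print(parse_duration("0.5h"))  # 1800.0
print(parse_duration("3.5d"))  # 302400.0
```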

Common Examples

Quick Development Sessions

# Reserve 1 GPU for 2 hours
canhazgpu reserve --duration 2h

# Reserve 1 GPU for 30 minutes of testing
canhazgpu reserve --duration 30m

Multi-GPU Development

# Reserve 2 GPUs for 4 hours
canhazgpu reserve --gpus 2 --duration 4h

# Reserve 4 GPUs for distributed development
canhazgpu reserve --gpus 4 --duration 6h

# Reserve specific GPU IDs
canhazgpu reserve --gpu-ids 0,2 --duration 4h

Extended Work Sessions

# Full day development (8 hours, default)
canhazgpu reserve

# Multi-day project work
canhazgpu reserve --gpus 2 --duration 2d

# Week-long research sprint
canhazgpu reserve --gpus 1 --duration 7d

Use Cases

Interactive Development

Perfect for Jupyter notebooks, IPython sessions, or iterative model development:

# Reserve GPU for notebook session
canhazgpu reserve --duration 4h
# Note the GPU IDs from the output, e.g., "Reserved 1 GPU(s): [2]"

# Manually set CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=2

# Start Jupyter with the reserved GPU
jupyter notebook

# Your notebooks now have exclusive GPU access

Batch Job Preparation

Reserve GPUs while you prepare and test your batch jobs:

# Reserve GPUs for job prep
canhazgpu reserve --gpus 2 --duration 2h
# Note the GPU IDs from the output, e.g., "Reserved 2 GPU(s): [1, 3]"

# Manually set CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=1,3

# Test your scripts with the reserved GPUs
python test_distributed.py

# Run the actual job (using same GPUs)
python distributed_training.py

# Release when done
canhazgpu release

Maintenance Windows

Block GPUs during system maintenance or updates:

# Block GPU during driver updates
canhazgpu reserve --gpus 8 --duration 1h

# Perform maintenance
sudo apt update && sudo apt upgrade nvidia-driver-*

# Release after maintenance
canhazgpu release

Meeting and Presentation Prep

Ensure GPUs are available for demos and presentations:

# Reserve before important demo
canhazgpu reserve --gpus 1 --duration 3h

# Run demo applications
python demo_inference.py
jupyter notebook presentation.ipynb

# Release after presentation
canhazgpu release

How Manual Reservations Work

Allocation Process

  1. Validation: Checks actual GPU usage with nvidia-smi
  2. Conflict Detection: Excludes GPUs in unreserved use
  3. LRU Selection: Chooses least recently used GPUs
  4. Time-based Expiry: Sets expiration time based on duration
  5. Persistent Storage: Saves reservation in Redis
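The LRU step amounts to picking the GPUs whose last release is oldest. A simplified Python illustration (the real tool tracks last-release times in Redis and validates usage via nvidia-smi; select_lru here is hypothetical):

```python
# Simplified illustration of LRU GPU selection; not canhazgpu's actual code.
def select_lru(available: dict[int, float], count: int) -> list[int]:
    """available maps GPU id -> last-release timestamp.
    Returns the `count` least recently used GPU ids."""
    if count > len(available):
        raise RuntimeError(f"Not enough GPUs available. "
                           f"Requested: {count}, Available: {len(available)}")
    return sorted(available, key=available.get)[:count]

# GPU 2 was released longest ago, so it is picked first
print(select_lru({0: 1700.0, 1: 1650.0, 2: 1600.0, 3: 1710.0}, 2))  # [2, 1]
```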

Environment Setup

Unlike run commands, manual reservations don't automatically set environment variables. You need to check which GPUs were allocated:

# Reserve GPUs
$ canhazgpu reserve --gpus 2 --duration 4h
Reserved 2 GPU(s): [1, 3] for 4h 0m 0s

# Check current allocations
$ canhazgpu status
GPU STATUS    USER     DURATION    TYPE    MODEL            DETAILS                    VALIDATION
--- --------- -------- ----------- ------- ---------------- -------------------------- ---------------------
1   in use    alice    30s         manual                   expires in 3h 59m 30s     
3   in use    alice    30s         manual                   expires in 3h 59m 30s     

# Manually set CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=1,3
python your_script.py

Expiration and Cleanup

Manual reservations automatically expire after the specified duration:

$ canhazgpu status
GPU STATUS    USER     DURATION    TYPE    MODEL            DETAILS                    VALIDATION
--- --------- -------- ----------- ------- ---------------- -------------------------- ---------------------
1   in use    alice    3h 58m 45s  manual                   expires in 1m 15s         

# After expiration
$ canhazgpu status
GPU STATUS    USER     DURATION    TYPE    MODEL            DETAILS                    VALIDATION
--- --------- -------- ----------- ------- ---------------- -------------------------- ---------------------
1   available          free for 5s                                                    
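The "expires in …" countdown in DETAILS is simply the reservation's end time minus the current time. A hedged sketch of that formatting (format_remaining is a hypothetical helper, not the tool's code):

```python
# Hypothetical formatter for the "expires in 3h 59m 30s" style countdown;
# the real tool derives this from the expiry timestamp stored in Redis.
def format_remaining(seconds: int) -> str:
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"expires in {h}h {m}m {s}s"

print(format_remaining(4 * 3600 - 30))  # expires in 3h 59m 30s
```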

Releasing Reservations

Manual Release

Release all your manual reservations immediately:

$ canhazgpu release
Released 2 GPU(s): [1, 3]

Checking Your Reservations

Use status to see your current reservations:

$ canhazgpu status
GPU STATUS    USER     DURATION    TYPE    MODEL            DETAILS                    VALIDATION
--- --------- -------- ----------- ------- ---------------- -------------------------- ---------------------
0   available          free for 1h 15m 30s                                           
1   in use    alice    45m 12s     manual                   expires in 3h 14m 48s     # Your reservation
2   in use    bob      1h 30m 0s   run     pytorch-model    heartbeat 5s ago          
3   in use    alice    45m 12s     manual                   expires in 3h 14m 48s     # Your reservation

Error Handling

Insufficient GPUs

$ canhazgpu reserve --gpus 4 --duration 2h
Error: Not enough GPUs available. Requested: 4, Available: 2 (2 GPUs in use without reservation - run 'canhazgpu status' for details)

Check the status and try again with fewer GPUs or wait for others to finish.

Invalid Duration Format

$ canhazgpu reserve --duration 2hours
Error: Invalid duration format. Use formats like: 30m, 2h, 1d, 0.5h

Use the supported duration formats listed above.

Allocation Lock Contention

$ canhazgpu reserve --gpus 2
Error: Failed to acquire allocation lock after 5 attempts

Multiple users are trying to allocate GPUs simultaneously. Wait a few seconds and try again.
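Because lock contention is transient, a caller can simply retry with backoff. A sketch of such a wrapper (reserve_with_retry and its runner parameter are illustrative helpers, not part of canhazgpu):

```python
import subprocess
import time

def reserve_with_retry(gpus=1, duration="2h", attempts=3, backoff=2.0,
                       runner=None):
    """Retry `canhazgpu reserve` when the allocation lock is contended.

    Hypothetical helper, not part of canhazgpu; `runner` lets the retry
    logic be exercised without the real CLI installed."""
    run = runner or (lambda cmd: subprocess.run(cmd, capture_output=True,
                                                text=True))
    cmd = ["canhazgpu", "reserve", "--gpus", str(gpus), "--duration", duration]
    for attempt in range(attempts):
        result = run(cmd)
        if result.returncode == 0:
            return result.stdout
        if "allocation lock" not in result.stderr:
            # Some other error (e.g. not enough GPUs): retrying won't help
            raise RuntimeError(result.stderr.strip())
        time.sleep(backoff * (attempt + 1))  # back off a little more each time
    raise RuntimeError("Allocation lock still contended after retries")
```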

Best Practices

Duration Planning

  • Start conservative: Reserve for shorter periods initially
  • Extend by re-reserving: There is no in-place extension; release your current reservation, then reserve again
  • Avoid over-reserving: Don't hold GPUs longer than you'll actually use them

Resource Efficiency

# Good: Reserve what you need
canhazgpu reserve --gpus 1 --duration 2h

# Wasteful: Over-reserving
canhazgpu reserve --gpus 8 --duration 24h  # Only if you really need this

Team Coordination

  • Communicate: Let teammates know about long reservations
  • Release early: Use canhazgpu release when done early
  • Check conflicts: Use canhazgpu status before making large reservations

Development Workflow

# Efficient development cycle
canhazgpu reserve --duration 1h           # Start small
# ... work for 45 minutes ...
canhazgpu release                         # Release before extending
canhazgpu reserve --duration 30m          # Reserve again to extend
# ... finish work ...
canhazgpu release                         # Clean up immediately

Integration Examples

Shell Scripts

#!/bin/bash
set -e

echo "Reserving GPUs for data processing..."
canhazgpu reserve --gpus 2 --duration 3h

echo "Starting data processing pipeline..."
python preprocess.py
python feature_extraction.py
python model_training.py

echo "Releasing GPUs..."
canhazgpu release

echo "Processing complete!"

Python Integration

import os
import re
import subprocess

def reserve_gpus(count=1, duration="2h"):
    """Reserve GPUs and return the allocated GPU IDs"""
    result = subprocess.run([
        "canhazgpu", "reserve",
        "--gpus", str(count),
        "--duration", duration
    ], capture_output=True, text=True)

    if result.returncode != 0:
        raise RuntimeError(f"GPU reservation failed: {result.stderr}")

    # Parse GPU IDs from output like:
    # "Reserved 2 GPU(s): [1, 3] for 2h 0m 0s"
    match = re.search(r'Reserved \d+ GPU\(s\): \[([^\]]+)\]', result.stdout)
    if match:
        gpu_ids = [int(x.strip()) for x in match.group(1).split(',')]
        os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(map(str, gpu_ids))
        return gpu_ids
    return []

def release_gpus():
    """Release all manual reservations"""
    subprocess.run(["canhazgpu", "release"], check=True)

# Usage
try:
    gpu_ids = reserve_gpus(2, "1h")  
    print(f"Using GPUs: {gpu_ids}")

    # Your GPU work here
    import torch
    print(f"PyTorch sees {torch.cuda.device_count()} GPUs")

finally:
    release_gpus()

Manual reservations provide fine-grained control over GPU allocation, making them perfect for interactive development and planned work sessions.