# Manual GPU Reservations
The `reserve` command allows you to manually reserve GPUs for a specific duration without immediately running a command. This is useful for interactive development, planning work sessions, or blocking GPUs for maintenance.
## Basic Usage

Defaults:

- `--gpus`: 1 GPU
- `--duration`: 8 hours

Options:

- `--gpus, -g`: Number of GPUs to reserve
- `--gpu-ids`: Specific GPU IDs to reserve (comma-separated, e.g., `1,3,5`)
- `--duration, -d`: How long to reserve the GPUs
## GPU Selection

- Use `--gpus` to let canhazgpu select GPUs using the LRU algorithm
- Use `--gpu-ids` when you need specific GPUs (e.g., for hardware requirements)
- You can use both options together if `--gpus` matches the GPU ID count or is 1 (the default)
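The compatibility rule in the last bullet can be expressed as a small check. This is an illustrative sketch with a hypothetical helper name, not canhazgpu's actual validation code:

```python
# Sketch of the --gpus / --gpu-ids compatibility rule described above
# (hypothetical helper, not canhazgpu's actual validation code).
def flags_compatible(gpus, gpu_ids):
    if not gpu_ids:
        # Only --gpus given: canhazgpu picks the GPUs itself
        return True
    # --gpu-ids given: --gpus must match the ID count, or stay at its default of 1
    return gpus == len(gpu_ids) or gpus == 1

print(flags_compatible(2, [0, 2]))     # True: counts match
print(flags_compatible(1, [1, 3, 5]))  # True: --gpus left at its default
print(flags_compatible(3, [0, 2]))     # False: mismatch
```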
## Duration Formats

canhazgpu supports flexible duration formats:

| Format | Description | Example |
|--------|-------------|---------|
| `30m` | 30 minutes | `--duration 30m` |
| `2h` | 2 hours | `--duration 2h` |
| `1d` | 1 day | `--duration 1d` |
| `0.5h` | 30 minutes (decimal) | `--duration 0.5h` |
| `90m` | 90 minutes | `--duration 90m` |
| `3.5d` | 3.5 days | `--duration 3.5d` |
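Each format is just a number (integer or decimal) followed by a unit. A rough Python equivalent of the parsing, for illustration only (not canhazgpu's implementation):

```python
import re
from datetime import timedelta

# Illustrative parser for the duration formats in the table above;
# not canhazgpu's actual implementation.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}

def parse_duration(s):
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([mhd])", s)
    if not match:
        raise ValueError(f"Invalid duration format: {s}")
    value, unit = float(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: value})

print(parse_duration("0.5h"))  # 0:30:00
print(parse_duration("90m"))   # 1:30:00
print(parse_duration("3.5d"))  # 3 days, 12:00:00
```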
## Common Examples

### Quick Development Sessions

```bash
# Reserve 1 GPU for 2 hours
canhazgpu reserve --duration 2h

# Reserve 1 GPU for 30 minutes of testing
canhazgpu reserve --duration 30m
```

### Multi-GPU Development

```bash
# Reserve 2 GPUs for 4 hours
canhazgpu reserve --gpus 2 --duration 4h

# Reserve 4 GPUs for distributed development
canhazgpu reserve --gpus 4 --duration 6h

# Reserve specific GPU IDs
canhazgpu reserve --gpu-ids 0,2 --duration 4h
```

### Extended Work Sessions

```bash
# Full day development (8 hours, default)
canhazgpu reserve

# Multi-day project work
canhazgpu reserve --gpus 2 --duration 2d

# Week-long research sprint
canhazgpu reserve --gpus 1 --duration 7d
```
## Use Cases

### Interactive Development

Perfect for Jupyter notebooks, IPython sessions, or iterative model development:

```bash
# Reserve GPU for notebook session
canhazgpu reserve --duration 4h

# Note the GPU IDs from the output, e.g., "Reserved 1 GPU(s): [2]"
# Manually set CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=2

# Start Jupyter with the reserved GPU
jupyter notebook

# Your notebooks now have exclusive GPU access
```
### Batch Job Preparation

Reserve GPUs while you prepare and test your batch jobs:

```bash
# Reserve GPUs for job prep
canhazgpu reserve --gpus 2 --duration 2h

# Note the GPU IDs from the output, e.g., "Reserved 2 GPU(s): [1, 3]"
# Manually set CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=1,3

# Test your scripts with the reserved GPUs
python test_distributed.py

# Run the actual job (using the same GPUs)
python distributed_training.py

# Release when done
canhazgpu release
```
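The manual `export` step can also be scripted by parsing the `reserve` output. A sketch assuming the output format shown in these docs:

```shell
# Capture the reserve output. Simulated here with the documented format;
# in practice: out=$(canhazgpu reserve --gpus 2 --duration 2h)
out="Reserved 2 GPU(s): [1, 3] for 2h 0m 0s"

# Extract the bracketed ID list and strip spaces
ids=$(printf '%s' "$out" | sed -E 's/.*\[([^]]*)\].*/\1/' | tr -d ' ')
export CUDA_VISIBLE_DEVICES="$ids"
echo "$CUDA_VISIBLE_DEVICES"   # 1,3
```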
### Maintenance Windows

Block GPUs during system maintenance or updates:

```bash
# Block GPUs during driver updates
canhazgpu reserve --gpus 8 --duration 1h

# Perform maintenance
sudo apt update && sudo apt upgrade nvidia-driver-*

# Release after maintenance
canhazgpu release
```
### Meeting and Presentation Prep

Ensure GPUs are available for demos and presentations:

```bash
# Reserve before an important demo
canhazgpu reserve --gpus 1 --duration 3h

# Run demo applications
python demo_inference.py
jupyter notebook presentation.ipynb

# Release after the presentation
canhazgpu release
```
## How Manual Reservations Work

### Allocation Process

1. **Validation**: Checks actual GPU usage with nvidia-smi
2. **Conflict Detection**: Excludes GPUs in unreserved use
3. **LRU Selection**: Chooses the least recently used GPUs
4. **Time-based Expiry**: Sets an expiration time based on the duration
5. **Persistent Storage**: Saves the reservation in Redis
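Conceptually, the LRU step picks the GPUs that were released longest ago, skipping any that are in use. A simplified sketch with invented data (canhazgpu tracks this state in Redis, not in a Python dict):

```python
# Simplified sketch of LRU GPU selection; invented data, not canhazgpu internals.
def lru_select(last_released, in_use, count):
    """Pick `count` GPUs released longest ago, skipping GPUs already in use."""
    candidates = [g for g in last_released if g not in in_use]
    candidates.sort(key=lambda g: last_released[g])  # oldest release first
    return candidates[:count]

# GPU id -> unix timestamp of last release
last_released = {0: 1000, 1: 5000, 2: 1500, 3: 3000}
print(lru_select(last_released, in_use={2}, count=2))  # [0, 3]
```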
### Environment Setup

Unlike `run` commands, manual reservations don't automatically set environment variables. You need to check which GPUs were allocated:

```bash
# Reserve GPUs
❯ canhazgpu reserve --gpus 2 --duration 4h
Reserved 2 GPU(s): [1, 3] for 4h 0m 0s

# Check current allocations
❯ canhazgpu status
GPU  STATUS     USER      DURATION     TYPE     MODEL             DETAILS                     VALIDATION
---  ---------  --------  -----------  -------  ----------------  --------------------------  ---------------------
1    in use     alice     30s          manual                     expires in 3h 59m 30s
3    in use     alice     30s          manual                     expires in 3h 59m 30s

# Manually set CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=1,3
python your_script.py
```
### Expiration and Cleanup

Manual reservations automatically expire after the specified duration:

```bash
❯ canhazgpu status
GPU  STATUS     USER      DURATION     TYPE     MODEL             DETAILS                     VALIDATION
---  ---------  --------  -----------  -------  ----------------  --------------------------  ---------------------
1    in use     alice     3h 58m 45s   manual                     expires in 1m 15s

# After expiration
❯ canhazgpu status
GPU  STATUS     USER      DURATION     TYPE     MODEL             DETAILS                     VALIDATION
---  ---------  --------  -----------  -------  ----------------  --------------------------  ---------------------
1    available                                                    free for 5s
```
## Releasing Reservations

### Manual Release

Release all your manual reservations immediately with `canhazgpu release`.

### Checking Your Reservations

Use `status` to see your current reservations:

```bash
❯ canhazgpu status
GPU  STATUS     USER      DURATION     TYPE     MODEL             DETAILS                     VALIDATION
---  ---------  --------  -----------  -------  ----------------  --------------------------  ---------------------
0    available                                                    free for 1h 15m 30s
1    in use     alice     45m 12s      manual                     expires in 3h 14m 48s       # Your reservation
2    in use     bob       1h 30m 0s    run      pytorch-model     heartbeat 5s ago
3    in use     alice     45m 12s      manual                     expires in 3h 14m 48s       # Your reservation
```
## Error Handling

### Insufficient GPUs

```bash
❯ canhazgpu reserve --gpus 4 --duration 2h
Error: Not enough GPUs available. Requested: 4, Available: 2 (2 GPUs in use without reservation - run 'canhazgpu status' for details)
```

Check the status and try again with fewer GPUs, or wait for others to finish.

### Invalid Duration Format

```bash
❯ canhazgpu reserve --duration 2hours
Error: Invalid duration format. Use formats like: 30m, 2h, 1d, 0.5h
```

Use the supported duration formats listed above.

### Allocation Lock Contention

Multiple users are trying to allocate GPUs simultaneously. Wait a few seconds and try again.
## Best Practices

### Duration Planning

- **Start conservative**: Reserve for shorter periods initially
- **Extend if needed**: Run `reserve` again to extend (requires releasing first)
- **Plan for interruptions**: Don't reserve longer than you'll actually use
### Resource Efficiency

```bash
# Good: Reserve what you need
canhazgpu reserve --gpus 1 --duration 2h

# Wasteful: over-reserving
canhazgpu reserve --gpus 8 --duration 24h  # Only if you really need this
```
### Team Coordination

- **Communicate**: Let teammates know about long reservations
- **Release early**: Use `canhazgpu release` when done early
- **Check conflicts**: Use `canhazgpu status` before making large reservations
### Development Workflow

```bash
# Efficient development cycle
canhazgpu reserve --duration 1h   # Start small

# ... work for 45 minutes ...
canhazgpu reserve --duration 30m  # Extend if needed (after releasing)

# ... finish work ...
canhazgpu release                 # Clean up immediately
```
## Integration Examples

### Shell Scripts

```bash
#!/bin/bash
set -e

echo "Reserving GPUs for data processing..."
canhazgpu reserve --gpus 2 --duration 3h
# Note: reserve does not set CUDA_VISIBLE_DEVICES; parse its output
# if your scripts need to target the reserved GPUs specifically

echo "Starting data processing pipeline..."
python preprocess.py
python feature_extraction.py
python model_training.py

echo "Releasing GPUs..."
canhazgpu release
echo "Processing complete!"
```
### Python Integration

```python
import os
import re
import subprocess

def reserve_gpus(count=1, duration="2h"):
    """Reserve GPUs and return the allocated GPU IDs."""
    result = subprocess.run(
        ["canhazgpu", "reserve", "--gpus", str(count), "--duration", duration],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"GPU reservation failed: {result.stderr}")

    # Parse GPU IDs from output like "Reserved 2 GPU(s): [1, 3] for 2h 0m 0s"
    match = re.search(r'Reserved \d+ GPU\(s\): \[([^\]]+)\]', result.stdout)
    if match:
        gpu_ids = [int(x.strip()) for x in match.group(1).split(',')]
        os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(map(str, gpu_ids))
        return gpu_ids
    return []

def release_gpus():
    """Release all manual reservations."""
    subprocess.run(["canhazgpu", "release"], check=True)

# Usage
try:
    gpu_ids = reserve_gpus(2, "1h")
    print(f"Using GPUs: {gpu_ids}")

    # Your GPU work here
    import torch
    print(f"PyTorch sees {torch.cuda.device_count()} GPUs")
finally:
    release_gpus()
```
Manual reservations provide fine-grained control over GPU allocation, making them perfect for interactive development and planned work sessions.