Deploy Aspose.LLM for .NET on NVIDIA GPUs: select CUDA, offload all layers, configure multi-GPU split, and size VRAM for model + KV cache. ... with multiple GPUs where you only want one for inference: preset ... attention and KV dtype.

CPU-only deployment | Bring your own GGUF ...
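The "size VRAM for model + KV cache" step can be approximated with simple arithmetic. The sketch below is not part of the Aspose.LLM API: the function names, the example model shape (32 layers, 8 KV heads, head dimension 128), the 4 GiB weight size, and the 8192-token context are all assumptions chosen for illustration. It uses the standard transformer estimate that the KV cache holds one key and one value vector per layer, per KV head, per token, and it ignores runtime overhead such as compute buffers unless you pass it in explicitly.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    The factor 2 covers keys and values. bytes_per_elem=2 corresponds to a
    16-bit KV dtype; quantized KV dtypes need fewer bytes per element.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def estimate_vram_bytes(model_bytes, n_layers, n_kv_heads, head_dim,
                        context_len, bytes_per_elem=2, overhead_bytes=0):
    """Approximate VRAM needed when all layers are offloaded to the GPU.

    model_bytes is the size of the weights (roughly the GGUF file size).
    overhead_bytes is a caller-supplied allowance for buffers and scratch space.
    """
    kv = kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem)
    return model_bytes + kv + overhead_bytes


if __name__ == "__main__":
    # Hypothetical model: 4 GiB of weights, 32 layers, 8 KV heads, head_dim 128.
    total = estimate_vram_bytes(4 * GIB, 32, 8, 128, context_len=8192)
    print(f"KV cache: {kv_cache_bytes(32, 8, 128, 8192) / GIB:.2f} GiB")
    print(f"Total:    {total / GIB:.2f} GiB")
```

With these assumed numbers the KV cache comes to 1 GiB and the total to 5 GiB before overhead. Doubling the context length doubles the KV cache term, and halving the bytes per element (a smaller KV dtype) halves it, which is why context size and KV dtype are the main levers when a model almost fits.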