NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Llama 3.3 Nemotron Super 49B V1.5 is a text-in, text-out model from NVIDIA built on Meta's Llama 3.3 architecture. It accepts up to 131,072 tokens of context and generates up to 16,384 tokens per response. The model supports tool calling and a reasoning mode, which makes it suitable for agentic workflows and multi-step tasks. This listing marks structured output as supported, but that support is not otherwise confirmed in available specifications. At $0.40 per million tokens for both input and output, pricing sits in the budget-to-mid range for models of this class. The main obstacle to comparison is that no independent benchmark coverage exists yet, so quality relative to competitors is unverified. Teams that are comfortable running their own evals and willing to accept that uncertainty may find the price reasonable for a reasoning-capable, long-context option; teams that need established benchmark scores before committing should wait or choose a model with a documented performance record.

Quality Score: 91/100 (price + capability + benchmarks)
Input Price: $0.40 per 1M tokens
Output Price: $0.40 per 1M tokens
Context Window: 131,072 tokens

Model ID: nvidia/llama-3.3-nemotron-super-49b-v1.5
Vendor: nvidia
Tokenizer: Llama3
Input Modalities: text
Output Modalities: text
Max Output: 16,384 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: no (text only)
Audio: no
Moderated: no
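
The pricing and limits listed above translate into a simple per-request calculation. The sketch below is illustrative only: the function names `estimate_cost` and `fits_limits` are hypothetical, the constants are copied from this listing, and the check assumes that prompt and completion tokens share the 131,072-token context window, which the listing does not state explicitly.

```python
# Figures copied from the listing above; adjust if the listing changes.
INPUT_PRICE_PER_M = 0.40    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.40   # USD per 1M output tokens
CONTEXT_WINDOW = 131_072    # tokens
MAX_OUTPUT = 16_384         # tokens per response


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_limits(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the listed limits.

    Assumption: the prompt and the completion both count toward
    the context window.
    """
    within_output_cap = output_tokens <= MAX_OUTPUT
    within_context = input_tokens + output_tokens <= CONTEXT_WINDOW
    return within_output_cap and within_context


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 4,000-token response.
    print(f"fits: {fits_limits(100_000, 4_000)}")
    print(f"cost: ${estimate_cost(100_000, 4_000):.4f}")
```

Because input and output are priced identically here, the cost depends only on the total token count; the two prices are kept as separate constants so the sketch still works if they diverge.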

Similar models

NVIDIA: Nemotron 3 Super (free)

NVIDIA: Nemotron Nano 12B 2 VL (free)

NVIDIA: Nemotron 3 Ultra (free)

NVIDIA: Nemotron 3 Nano 30B A3B (free)

NVIDIA: Nemotron Nano 9B V2 (free)

NVIDIA: Nemotron 3 Ultra