NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Llama 3.3 Nemotron Super 49B V1.5 is a text-in, text-out model from NVIDIA built on Meta's Llama 3.3 architecture. It accepts up to 131,072 tokens of context and generates up to 16,384 tokens per response. The model supports tool calling and a reasoning mode, which makes it suitable for agentic workflows and multi-step tasks. The specification listing also marks structured output as supported. At $0.40 per million tokens for both input and output, pricing sits in the budget-to-mid range for models of this class. The main caveat is that no independent benchmark coverage exists yet, so quality relative to competitors is unverified. Teams that are comfortable running their own evals and willing to accept that uncertainty may find the price reasonable for a reasoning-capable, long-context option; those who need established benchmark scores before committing should wait or choose a model with a documented performance record.
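As a rough illustration of what the listed price and limits mean per request, the sketch below estimates cost and checks a request against the token limits. The constants are copied from this listing; the function names are hypothetical, and the rule that prompt and completion tokens share the context window is an assumption, not something the listing states.

```python
# Figures copied from the listing above.
CONTEXT_WINDOW = 131_072     # tokens
MAX_OUTPUT = 16_384          # tokens per response
INPUT_PRICE_PER_M = 0.40     # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.40    # USD per 1M output tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed prices (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_limits(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the listed limits.

    Assumption: prompt and completion tokens share the context window.
    """
    if output_tokens > MAX_OUTPUT:
        return False
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    # A long-context request: 100,000 prompt tokens, 4,000 completion tokens.
    cost = estimate_cost(100_000, 4_000)
    print(f"${cost:.4f}")                 # $0.0416
    print(fits_limits(100_000, 4_000))    # True
    print(fits_limits(120_000, 16_384))   # False: exceeds the context window
```

Because input and output are priced identically, the cost depends only on the total token count, so a request that fills the whole context window costs about $0.05.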

Quality Score: 91/100 (price + capability + benchmarks)
Input Price: $0.40 per 1M tokens
Output Price: $0.40 per 1M tokens
Context Window: 131,072 tokens
Model ID: nvidia/llama-3.3-nemotron-super-49b-v1.5
Vendor: nvidia
Tokenizer: Llama3
Input Modalities: text
Output Modalities: text
Max Output: 16,384 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: text only
Audio: no
Moderated: no