NVIDIA: Llama 3.3 Nemotron Super 49B V1.5
Llama 3.3 Nemotron Super 49B V1.5 is a text-in, text-out model from NVIDIA built on Meta's Llama 3.3 architecture. It accepts up to 131,072 tokens of context and generates up to 16,384 tokens per response. The model supports tool calling and a reasoning mode, which makes it suitable for agentic workflows and multi-step tasks. Structured output is marked as supported in the specifications below. At $0.40 per million tokens for both input and output, pricing sits in the budget-to-mid range for models of this class. The main obstacle to comparison is that no independent benchmark coverage exists yet, so quality relative to competitors is unverified. Teams comfortable running their own evals and willing to accept that uncertainty may find the price reasonable for a reasoning-capable, long-context option; those who need established benchmark scores before committing should wait or choose a model with a documented performance record.
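To make the pricing concrete, here is a minimal sketch of the cost arithmetic at the flat $0.40 per million tokens quoted above. The function name `estimate_cost_usd` is hypothetical and used only for illustration; actual billing may round or meter tokens differently.

```python
# Flat rate from the description: $0.40 per million tokens, input and output alike.
PRICE_PER_MILLION_USD = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the flat listed rate."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_tokens = input_tokens + output_tokens
    return total_tokens * PRICE_PER_MILLION_USD / 1_000_000
```

Because input and output share one rate, only the total token count matters; a model with split pricing would need separate rates per direction.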
- Model ID: nvidia/llama-3.3-nemotron-super-49b-v1.5
- Vendor: nvidia
- Tokenizer: Llama3
- Input Modalities: text
- Output Modalities: text
- Max Output: 16,384 tokens
- Tool Calling: ✓ supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: text only
- Audio: no
- Moderated: no
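The limits and model ID listed above can be checked before a request is sent. The sketch below is illustrative only: the payload field names (`model`, `messages`, `max_tokens`, `tools`) follow the common OpenAI-style chat convention, which this page does not confirm for any particular endpoint, and it assumes that prompt and completion tokens share the 131,072-token context window. The helper `build_request` is hypothetical.

```python
from typing import Any, Optional

MODEL_ID = "nvidia/llama-3.3-nemotron-super-49b-v1.5"
MAX_CONTEXT_TOKENS = 131_072  # context limit stated in the description
MAX_OUTPUT_TOKENS = 16_384    # "Max Output" from the spec list


def build_request(
    messages: list[dict[str, str]],
    prompt_tokens: int,
    max_tokens: int,
    tools: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Validate token budgets, then return a request payload as a plain dict."""
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    # Assumption: prompt and completion count against the same context window.
    if prompt_tokens + max_tokens > MAX_CONTEXT_TOKENS:
        raise ValueError("prompt_tokens + max_tokens exceeds the context window")
    payload: dict[str, Any] = {
        "model": MODEL_ID,
        "messages": messages,
        "max_tokens": max_tokens,
    }
    if tools:
        # Assumed field name (OpenAI-style convention); verify against your provider.
        payload["tools"] = tools
    return payload
```

The function only builds and validates the payload; sending it, and counting `prompt_tokens` with the Llama3 tokenizer, are left to whatever client and provider are in use.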