Baidu: ERNIE 4.5 VL 424B A47B
ERNIE 4.5 VL 424B A47B is a multimodal model from Baidu that accepts text and image inputs and returns text, with up to 16,000 output tokens per request. It supports a reasoning mode and offers a 131,072-token context window, which suits long-document tasks. Tool calling is not supported, and structured output is listed as not supported, so workflows that depend on either feature should look elsewhere. At $0.42 per million input tokens and $1.25 per million output tokens, pricing is competitive for a large multimodal model, but no independent benchmark coverage currently exists to validate Baidu's performance claims. Buyers who need a vision-capable model with a long context and are comfortable working from Baidu's own documentation may find it worth testing, particularly for cost-sensitive workloads. Anyone who requires third-party performance data before committing should wait until independent evaluations are available.
- Model ID: baidu/ernie-4.5-vl-424b-a47b
- Vendor: baidu
- Tokenizer: Other
- Input Modalities: image, text
- Output Modalities: text
- Max Output: 16,000 tokens
- Tool Calling: not supported
- Structured Output: not supported
- Reasoning Mode: ✓ supported
- Vision: ✓ accepts images
- Audio: no
- Moderated: no
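The listed prices and limits can be turned into a quick per-request estimate. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes that input and output tokens share the 131,072-token context window, which this listing does not state. It also does not model how images are converted to tokens, so image-heavy requests need a token count from the provider.

```python
# Figures copied from the listing above; all names here are illustrative.
INPUT_USD_PER_MILLION = 0.42
OUTPUT_USD_PER_MILLION = 1.25
CONTEXT_WINDOW_TOKENS = 131_072
MAX_OUTPUT_TOKENS = 16_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


def fits_limits(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the listed limits.

    Assumption: input and output tokens count against the same
    context window. The listing does not confirm this.
    """
    within_output_cap = output_tokens <= MAX_OUTPUT_TOKENS
    within_context = input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS
    return within_output_cap and within_context


if __name__ == "__main__":
    # A long-document request: 100,000 input tokens, 4,000 output tokens.
    print(f"{estimate_cost_usd(100_000, 4_000):.4f}")  # prints 0.0470
    print(fits_limits(100_000, 4_000))                 # prints True
```

At these rates a 100,000-token prompt with a 4,000-token reply costs about five cents, with most of that coming from the input side.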