Baidu: ERNIE 4.5 VL 424B A47B

ERNIE 4.5 VL 424B A47B is a multimodal model from Baidu that accepts text and image inputs and returns up to 16,000 output tokens per request. It supports a reasoning mode and offers a 131,072-token context window, large enough for long-document tasks. The listing marks both tool calling and structured output as not supported, so workflows that depend on either feature should look elsewhere. At $0.42 per million input tokens and $1.25 per million output tokens, pricing is competitive for a large multimodal model, but no independent benchmark coverage yet exists to validate its performance claims. Buyers who need a vision-capable model with a long context and are comfortable working from Baidu's own documentation may find it worth testing, particularly for cost-sensitive workloads. Anyone who requires third-party performance data before committing should wait until independent evaluations are available.
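For cost-sensitive workloads, the listed prices and limits can be turned into a quick per-request estimate. The sketch below is illustrative only: the function names `estimate_cost` and `fits_limits` are hypothetical, the constants are copied from this listing, and it assumes that input and output tokens share the 131,072-token context window. How image inputs are converted to tokens is not stated here, so the caller must supply token counts.

```python
# Figures copied from the listing; everything else is an assumption.
INPUT_PRICE_PER_M = 0.42    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.25   # USD per 1M output tokens
CONTEXT_WINDOW = 131_072    # tokens
MAX_OUTPUT = 16_000         # tokens per request


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_limits(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the listed output cap and context window.

    Assumption: input and output tokens count against the same window.
    """
    if output_tokens > MAX_OUTPUT:
        return False
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    # Example: a long document (100,000 tokens in) with a 4,000-token answer.
    cost = estimate_cost(100_000, 4_000)
    print(f"fits: {fits_limits(100_000, 4_000)}, cost: ${cost:.4f}")
```

Under these assumptions, a 100,000-token prompt with a 4,000-token reply costs roughly five cents, with the input side accounting for most of it.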

Quality Score: 80/100 (price + capability + benchmarks)
Input Price: $0.42 per 1M tokens
Output Price: $1.25 per 1M tokens
Context Window: 131,072 tokens
Model ID: baidu/ernie-4.5-vl-424b-a47b
Vendor: baidu
Tokenizer: Other
Input Modalities: image, text
Output Modalities: text
Max Output: 16,000 tokens
Tool Calling: not supported
Structured Output: not supported
Reasoning Mode: ✓ supported
Vision: ✓ accepts images
Audio: no
Moderated: no