Llama 4 Maverick Instruct

meta-llama/llama-4-maverick-17b-128e-instruct-fp8

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400 billion total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction. Maverick features early fusion for native multimodality and a 1 million-token context window. It was trained on a curated mix of public, licensed, and Meta-platform data, covering approximately 22 trillion tokens, with a knowledge cutoff in August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications requiring advanced multimodal understanding and high model throughput.

Price

Input	$0.17 / million tokens
Output	$0.85 / million tokens
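
As a rough illustration of how the per-token rates above translate into a per-request cost, the sketch below multiplies token counts by the listed prices. The function name `estimate_cost_usd` is hypothetical, and the calculation assumes billing is strictly linear in tokens with no minimums, rounding, or discounts, which this page does not confirm.

```python
# Listed rates, in USD per one million tokens.
INPUT_USD_PER_MILLION = 0.17
OUTPUT_USD_PER_MILLION = 0.85


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request (hypothetical helper).

    Assumes purely linear per-token billing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # A 2,000-token prompt with a 500-token reply.
    print(f"${estimate_cost_usd(2_000, 500):.6f}")
```

Actual token counts for a completed request are reported in `response.usage` by the OpenAI SDK (`prompt_tokens` and `completion_tokens`), so the same helper can be applied after the fact.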

Use the following code example to integrate our API:

from openai import OpenAI

# Point the OpenAI SDK at the OpenAI-compatible endpoint.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.highwayapi.ai/openai"
)

response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick-17b-128e-instruct-fp8",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    # Cap the reply length; requesting the full 1048576-token window
    # would leave no room for the prompt.
    max_tokens=1024,
    temperature=0.7
)

# Print the assistant's reply.
print(response.choices[0].message.content)

Information

Provider

Llama

Quantization

fp8

Supported features

Context length

1048576

Maximum output

1048576
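
Because the context length and the maximum output are listed with the same value, a request's `max_tokens` usually needs to be reduced by the size of the prompt. The sketch below shows one way to clamp it. It assumes the prompt and the completion share a single context window, which is common for chat models but is not stated on this page; `max_tokens_budget` is a hypothetical helper name.

```python
CONTEXT_LENGTH = 1_048_576  # listed context length, in tokens
MAX_OUTPUT = 1_048_576      # listed maximum output, in tokens


def max_tokens_budget(prompt_tokens: int, requested: int) -> int:
    """Clamp a requested max_tokens to what fits (hypothetical helper).

    Assumes prompt and completion tokens share one context window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    # Tokens left in the window after the prompt, never below zero.
    remaining = max(CONTEXT_LENGTH - prompt_tokens, 0)
    return min(requested, MAX_OUTPUT, remaining)


if __name__ == "__main__":
    print(max_tokens_budget(prompt_tokens=1_000, requested=4_096))
```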

Function calling

Supported
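
A minimal sketch of how function calling might be used through the OpenAI SDK's `tools` parameter. It assumes this endpoint accepts the standard OpenAI tool schema, which this page does not spell out. The `get_weather` tool, its canned result, and the helpers `build_request` and `dispatch_tool_call` are hypothetical and exist only for illustration.

```python
import json

MODEL = "meta-llama/llama-4-maverick-17b-128e-instruct-fp8"


def get_weather(city: str) -> str:
    """Hypothetical local tool; returns a canned JSON result."""
    return json.dumps({"city": city, "forecast": "sunny"})


# Tool description in the OpenAI chat-completions "tools" format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def build_request(user_message: str) -> dict:
    """Build keyword arguments for client.chat.completions.create."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "tools": TOOLS,
        "max_tokens": 512,
    }


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the local function the model asked for and return its result."""
    handlers = {"get_weather": get_weather}
    if name not in handlers:
        raise KeyError(f"unknown tool: {name}")
    return handlers[name](**json.loads(arguments_json))


# With a configured client (not executed here):
# response = client.chat.completions.create(**build_request("Weather in Paris?"))
# for call in response.choices[0].message.tool_calls or []:
#     result = dispatch_tool_call(call.function.name, call.function.arguments)
```

The tool result would normally be sent back to the model in a follow-up message with `"role": "tool"` and the matching `tool_call_id`, so the model can compose its final answer.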

Input Capabilities

text

Output Capabilities

text