Real-Time Inference

dw realtime sends a single inference request and streams the response. It is useful for quick tests, prompt prototyping, and interactive use.

Basic Usage

dw realtime Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 "Explain batch inference in one paragraph"

The response streams token-by-token to stdout.

System Message

dw realtime Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 "Summarize this text" \
  --system "You are a concise technical writer."

Reading from Stdin

When no prompt is given, dw realtime reads from stdin:

echo "What is 2+2?" | dw realtime Qwen/Qwen3-VL-30B-A3B-Instruct-FP8
cat document.txt | dw realtime Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 --system "Summarize this"
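
Because the prompt comes from stdin, it can be assembled from several pieces before it is sent. The following is a minimal sketch of that idea: prompt_with_file is a hypothetical helper name (not part of dw), and it relies only on the stdin behavior described above.

```shell
# Hypothetical helper, not part of dw: prepend an instruction to a file's
# contents and send the combined text as the prompt via stdin.
prompt_with_file() {
  model=$1        # e.g. Qwen/Qwen3-VL-30B-A3B-Instruct-FP8
  instruction=$2  # text placed before the file contents
  file=$3         # path of the file to include in the prompt

  # No prompt argument is passed, so dw realtime reads the prompt from stdin.
  { printf '%s\n\n' "$instruction"; cat "$file"; } | dw realtime "$model"
}

# Usage:
#   prompt_with_file Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 "List the key points:" document.txt
```

Grouping the two commands in braces sends their combined output through a single pipe, so the model receives the instruction and the file as one prompt.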

Options

Flag                      Description
--system <MSG>            Set the system message
--max-tokens <N>          Maximum tokens to generate
--temperature <T>         Sampling temperature (0.0-2.0)
--no-stream               Wait for full response instead of streaming
-o, --output-file <FILE>  Write response to a file
--usage                   Print token usage summary after completion
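
As an illustration, several of these options might be set in one request. The sketch below assumes the flags can be combined freely in a single invocation, which the table does not state explicitly; draft_to_file is a hypothetical helper name, and the values 200 and 0.2 are arbitrary examples.

```shell
# Hypothetical helper, not part of dw: one request with several options set.
# Assumption: the documented flags can be combined in a single invocation.
draft_to_file() {
  model=$1   # model name
  prompt=$2  # prompt text
  out=$3     # file that receives the response

  dw realtime "$model" "$prompt" \
    --system "You are a concise technical writer." \
    --max-tokens 200 \
    --temperature 0.2 \
    --output-file "$out" \
    --usage
}

# Usage:
#   draft_to_file Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 "Explain batch inference" draft.txt
```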

Non-Streaming Mode

dw realtime Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 "Hello" --no-stream

With --no-stream, dw realtime waits for the complete response before printing it. This is useful when you need the full text at once (e.g., for piping to jq).
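
One way to use the complete response is to capture it in a shell variable and act on it only if the request succeeded. This is a sketch under an assumption: it presumes dw exits with a non-zero status when a request fails, which this page does not document. save_answer is a hypothetical helper name.

```shell
# Hypothetical helper, not part of dw: run one non-streaming request and
# save the response only if the command succeeded.
# Assumption: dw exits with a non-zero status when the request fails.
save_answer() {
  model=$1   # model name
  prompt=$2  # prompt text
  dest=$3    # file to write the response to

  if answer=$(dw realtime "$model" "$prompt" --no-stream); then
    printf '%s\n' "$answer" > "$dest"
  else
    echo "request failed; $dest left untouched" >&2
    return 1
  fi
}

# Usage:
#   save_answer Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 "Hello" answer.txt
```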

Token Usage

dw realtime Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 "Hello" --usage

With --usage, dw realtime prints input and output token counts to stderr after the response completes.
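
Since the usage summary goes to stderr and the response goes to stdout, ordinary shell redirection can keep the two apart. A minimal sketch follows; ask_logging_usage is a hypothetical helper name, not part of dw.

```shell
# Hypothetical helper, not part of dw: keep the response on stdout and
# append the token usage summary (written to stderr) to a log file.
ask_logging_usage() {
  model=$1      # model name
  prompt=$2     # prompt text
  usage_log=$3  # file that accumulates usage summaries

  # 2>> redirects only stderr, so the response itself still reaches stdout.
  dw realtime "$model" "$prompt" --usage 2>> "$usage_log"
}

# Usage:
#   ask_logging_usage Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 "Hello" usage.log > response.txt
```

Anything else the command writes to stderr, such as error messages, ends up in the same log file.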

Output to File

dw realtime Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 "Write a poem" -o poem.txt

Writes the response to poem.txt.