Context efficiency: where tokens go
index parses 15 languages into skeletons: imports, type definitions, and function signatures with their line ranges. It adds 59 tok/turn but saves 224 tok/turn on reads, a net 165 tok/turn saved. In my usage, reads were ~65% of all tokens, so this optimization is big.
code_execution is a sandboxed Python interpreter with memory and time limits. All tools are exposed as async functions, so the model can asyncio.gather() a batch of reads, grep the results, and return only what matters. Intermediate data never reaches your context.
The model picks weak, medium, or strong for each subagent: Haiku-tier for grep-heavy research, Opus-tier for architecture. Subagents can be read-only or have full tool access.
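A minimal sketch of the tiering idea. The model names and the spawn_subagent helper are illustrative assumptions, not maki's actual configuration or API:

```python
# Hypothetical tier-to-model mapping; names are placeholders, not maki's config.
TIERS = {
    'weak': 'haiku-tier',     # grep-heavy research
    'medium': 'sonnet-tier',
    'strong': 'opus-tier',    # architecture decisions
}

def spawn_subagent(tier: str, read_only: bool = False) -> dict:
    # A read-only subagent gets only inspection tools; otherwise full access.
    return {
        'model': TIERS[tier],
        'tools': ['read', 'grep', 'index'] if read_only else 'all',
    }

spawn_subagent('weak', read_only=True)
```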
The system prompt, tool descriptions, and examples are short. When context grows too long, maki compacts history automatically: it strips images and thinking blocks and summarizes older turns.
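A hedged sketch of what that compaction might look like. The turn format (type/text dicts) and the keep_recent cutoff are assumptions for illustration, not maki's internal representation:

```python
def compact(history: list[dict], keep_recent: int = 8) -> list[dict]:
    """Drop images and thinking blocks, condense everything but the tail."""
    cleaned = [t for t in history if t.get('type') not in ('image', 'thinking')]
    old, recent = cleaned[:-keep_recent], cleaned[-keep_recent:]
    if not old:
        return recent
    # Real summarization would call the model; a count stands in here.
    summary = {'type': 'summary', 'text': f'{len(old)} earlier turns condensed'}
    return [summary, *recent]
```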
User experience: what you get
Native binary. No JavaScript runtime, no React. Even the splash-screen animation uses SIMD. Syntax highlighting runs on a background thread pool, so it never blocks your input. Fits well on small laptop screens.
Philosophy: don't hide anything. Token count, cost, and model are always in the status bar. Each subagent gets its own chat window you can flip through with Ctrl-N/P. Ctrl-F for fuzzy search. /btw runs a side query without touching the current session. ! runs shell commands, !! runs them silently.
Bash commands are parsed with tree-sitter, so maki knows what's actually being run. `git diff && rm -rf /` correctly flags both git and rm; most agents only see git. It handles subshells, command substitution, and pipes. Per-tool allow/deny rules, or --yolo to skip it all. SSRF protection on webfetch.
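maki does a real tree-sitter parse; the simplified sketch below only splits on shell connectives, but it shows why every command in a compound line must be inspected rather than just the first one:

```python
import re

# Simplified stand-in for a real shell parse: split on &&, ||, ; and |
# and take the first word of each segment as the command name.
CONNECTIVES = re.compile(r'&&|\|\||;|\|')

def command_names(script: str) -> list[str]:
    names = []
    for segment in CONNECTIVES.split(script):
        words = segment.strip().split()
        if words:
            names.append(words[0])
    return names

command_names('git diff && rm -rf /')  # ['git', 'rm']
```

A splitter like this misses subshells and command substitution, which is exactly why a grammar-level parse is the safer design.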
Long-term memory persists across sessions: tell maki to remember something, and sometimes it picks things up on its own. Double-Escape to rewind. Plan mode restricts the agent to read-only. MCP servers over stdio or HTTP. Skills. 26 themes. Paste images. --print for headless use (output is Claude Code-compatible).
index: read less, know more
Instead of reading full files, index parses with tree-sitter and returns a compact skeleton. The model sees the structure, then reads only the lines it needs.
```rust
use std::fs;
use clap::Parser;
use color_eyre::Result;

#[derive(Parser)]
struct Args {
    paths: Vec<PathBuf>,
    #[arg(short, long)]
    lines: bool,
}

fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn count_lines(text: &str) -> usize {
    text.lines().count()
}

fn main() -> Result<()> {
    let args = Args::parse();
    for path in &args.paths {
        let text = fs::read_to_string(path)?;
        let n = if args.lines {
            count_lines(&text)
        } else {
            count_words(&text)
        };
        println!("{}: {n}", path.display());
    }
    Ok(())
}
```
```
imports: [1-3]  clap::Parser, color_eyre::Result, std::fs
types:
  #[derive(Parser)] struct Args [5-9]
    paths: Vec<PathBuf>
    lines: bool
fns:
  count_words(text: &str) -> usize [11-13]
  count_lines(text: &str) -> usize [15-17]
  main() -> Result<()> [19-29]
```
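As an analogy for what the tree-sitter pass produces, here is a rough skeleton extractor for Python alone, using only the stdlib ast module. maki's 15-language version is tree-sitter-based and richer; this just shows the shape of the idea, keep signatures and line ranges, drop bodies:

```python
import ast

def skeleton(source: str) -> list[str]:
    """Return top-level imports, classes, and function signatures with line ranges."""
    out = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            out.append(f'import [{node.lineno}-{node.end_lineno}]')
        elif isinstance(node, ast.FunctionDef):
            args = ', '.join(a.arg for a in node.args.args)
            out.append(f'fn {node.name}({args}) [{node.lineno}-{node.end_lineno}]')
        elif isinstance(node, ast.ClassDef):
            out.append(f'class {node.name} [{node.lineno}-{node.end_lineno}]')
    return out

print(skeleton("import os\n\ndef f(a, b):\n    return a + b\n"))
```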
code_execution: think inside the sandbox
Tools are exposed as async Python functions. The model writes a script, runs it sandboxed, and only the print() output enters your context.
```python
cfg, meta = await asyncio.gather(
    read(path='Cargo.toml'),
    bash(command='cargo metadata'),
)
deps = json.loads(meta)
used = set(re.findall(r'use (\w+)::', cfg))
declared = {p['name'] for p in deps['packages']}
stale = declared - used
for s in sorted(stale):
    print(s)
```
```
base64
hmac
once_cell
```