Can AI handle 2003-era toolchains?
Benchmark of LLMs on real open-source projects against dependency hell, legacy toolchains, and complex build systems. Compare top models by success rate, cost or speed.

LLMs can vibe-code and win coding contests, but can they handle real-world software issues like dependency hell, legacy toolchains or weird compile errors?

We gave 19 state-of-the-art LLMs the unmodified source code of open-source projects such as curl (HTTP client) and jq (command-line JSON processor), and tested them on 15 real-world tasks.

The goal is simple: build a working binary from source. Getting there is hard. The toughest challenges include cross-compiling to Windows or ARM64 and resurrecting decades-old code on modern systems. Agents sometimes need as many as 88 commands and 29 minutes to produce a working binary.

CompileBench Success Rate Ranking
| # | Model | pass@1 | pass@2 |
|---|---|---|---|
| 1 | gpt-5-high | 83% | 93% |
| 2 | gpt-5-mini-high | 83% | 87% |
| 3 | grok-4 | 70% | 87% |
| 4 | claude-sonnet-4 | 80% | 80% |
| 5 | claude-sonnet-4-thinking-16k | 80% | 80% |
| 6 | claude-opus-4.1-thinking-16k | 57% | 80% |
| 7 | grok-code-fast-1 | 67% | 73% |
| 8 | deepseek-v3.1 | 57% | 73% |
| 9 | gpt-4.1 | 60% | 67% |
| 10 | kimi-k2-0905 | 57% | 67% |
| 11 | gpt-5-minimal | 50% | 67% |
| 12 | qwen3-max | 43% | 67% |
| 13 | gemini-2.5-pro | 57% | 60% |
| 14 | gemini-2.5-flash | 50% | 60% |
| 15 | gpt-4.1-mini | 47% | 60% |
| 16 | gpt-oss-120b-high | 47% | 60% |
| 17 | gemini-2.5-flash-thinking | 47% | 53% |
| 18 | glm-4.5 | 40% | 53% |
| 19 | gpt-5-mini-minimal | 37% | 47% |

pass@1: success within a single attempt. pass@2: success within 2 attempts.
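The page does not spell out the formula behind these two columns. With two attempts per task, the standard unbiased pass@k estimator reduces to a simple reading: pass@1 is the fraction of all attempts that succeeded, and pass@2 is the fraction of tasks with at least one successful attempt. The sketch below assumes that reading and applies it to claude-opus-4.1-thinking-16k's attempts as listed under "All attempts"; it reproduces that model's 57% / 80%.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task with n attempts, c of them successful."""
    if n - c < k:
        return 1.0  # every k-subset of attempts contains at least one success
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: dict[str, list[bool]], k: int) -> float:
    """Average per-task pass@k; results maps task name -> per-attempt success flags."""
    scores = [pass_at_k(len(flags), sum(flags), k) for flags in results.values()]
    return sum(scores) / len(scores)


# claude-opus-4.1-thinking-16k, two attempts per task, copied from the attempt list
opus = {
    "coreutils": [False, True],
    "coreutils-old-version": [False, True],
    "coreutils-old-version-alpine": [True, True],
    "coreutils-static": [False, False],
    "coreutils-static-alpine": [False, True],
    "cowsay": [True, True],
    "curl": [True, True],
    "curl-ssl": [True, True],
    "curl-ssl-arm64-static": [False, False],
    "curl-ssl-arm64-static2": [False, False],
    "jq": [False, True],
    "jq-static": [False, True],
    "jq-static-musl": [False, True],
    "jq-windows": [False, True],
    "jq-windows2": [True, True],
}

print(f"pass@1 = {benchmark_pass_at_k(opus, 1):.0%}")  # prints "pass@1 = 57%"
print(f"pass@2 = {benchmark_pass_at_k(opus, 2):.0%}")  # prints "pass@2 = 80%"
```

The same check works for other rows; for example, claude-sonnet-4 succeeds on both attempts of 12 tasks and fails both attempts of the other 3, giving 80% / 80%.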
Tasks

Each task follows a shared structure: we hand an AI the source of an open-source project, a clear build objective, and an interactive Linux terminal. The agent must discover the build system (e.g., Autotools/Make/CMake or custom scripts), decide whether to patch the sources, resolve missing headers and libraries, choose compiler/linker flags (dynamic vs static, glibc vs musl), and verify that the produced binary works.
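Discovering the build system usually starts with a look at the top-level files. The sketch below is a hypothetical helper, not part of CompileBench: it checks for common marker files and suggests a first step. One practical distinction it encodes is that release tarballs usually ship a generated `configure` script, while a bare `configure.ac` (or the older `configure.in`) needs `autoreconf` first. The marker list, ordering, and hint wording are illustrative assumptions.

```python
from pathlib import Path

# (marker file, suggested first step). Ordering and wording are illustrative
# assumptions, not CompileBench's logic.
MARKERS = [
    ("configure", "Autotools: run ./configure && make"),
    ("configure.ac", "Autotools: regenerate with autoreconf -fi, then ./configure"),
    ("configure.in", "Autotools (legacy input): regenerate with autoreconf, then ./configure"),
    ("CMakeLists.txt", "CMake: cmake -S . -B build && cmake --build build"),
    ("Makefile", "Make: run make directly"),
]


def detect_build_systems(src: Path) -> list[str]:
    """List every build-system hint whose marker file exists at the top of src."""
    return [hint for marker, hint in MARKERS if (src / marker).is_file()]


if __name__ == "__main__":
    # Print hints for the current directory, e.g. an unpacked source tarball.
    for hint in detect_build_systems(Path(".")):
        print(hint)
```

Returning every match rather than a single answer is deliberate: many projects ship several markers (coreutils release tarballs, for instance, carry both `configure` and `configure.ac`), and choosing between them is part of what the agent has to decide.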

Difficulty ranges from quick, modern builds to reviving 2003-era code and producing fully static binaries. Tasks cover projects such as curl, cowsay, jq, and GNU coreutils (including static and legacy variants); see the per-task pages for details.

| Task | Description | pass@1 | pass@2 |
|---|---|---|---|
| curl-ssl | Build curl 8.16.0 with SSL support (TLS v1.3), brotli, zlib and zstd; autotools setup and library detection. | 92% | 100% |
| coreutils | Build coreutils 9.7; large project with extensive feature detection. | 87% | 100% |
| cowsay | Build cowsay 3.8.4; small legacy build with quirky packaging. | 92% | 95% |
| jq | Build jq 1.8.1; autotools and dependency detection can be tricky. | 92% | 95% |
| coreutils-static-alpine | Produce fully static coreutils 9.7 with a working sha1sum; many binaries, strict static linking. | 82% | 95% |
| curl | Build curl 8.16.0; autotools setup and library detection. | 79% | 95% |
| coreutils-static | Produce fully static coreutils 9.7; many binaries, strict static linking. | 76% | 95% |
| jq-windows2 | Compile the jq package for amd64 Windows and install it to /home/peter/result/jq.exe. Make sure it runs correctly via the installed wine; static linking flags, dependency closure, and toolchain differences. | 66% | 84% |
| jq-static | Produce a fully static jq 1.8.1; careful with linker flags and deps. | 61% | 79% |
| coreutils-old-version | Build coreutils 5.0; legacy autotools and modern compiler hurdles. | 58% | 74% |
| jq-static-musl | Produce a musl-linked static jq 1.8.1; toolchain and portability challenges. | 39% | 53% |
| jq-windows | Compile the jq package for amd64 Windows and install it to /home/peter/result/jq.exe. Build it statically; static linking flags, dependency closure, and toolchain differences. | 37% | 47% |
| coreutils-old-version-alpine | Build coreutils 5.0 and surface a working sha1sum; legacy autotools and modern compiler hurdles. | 8% | 11% |
| curl-ssl-arm64-static2 | Build curl 8.16.0 with SSL support (TLS v1.3), brotli, zlib and zstd. The binary should be statically compiled for arm64. Do a trial run via qemu-aarch64-static, downloading https://google.com; autotools setup and library detection. | 5% | 11% |
| curl-ssl-arm64-static | Build curl 8.16.0 with SSL support (TLS v1.3), brotli, zlib and zstd. The binary should be statically compiled for arm64; autotools setup and library detection. | 3% | 5% |
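Several of the hardest tasks fail on checks such as "is not statically linked" or "is not aarch64 architecture". How CompileBench performs those checks is not described here; the sketch below is one way to run similar checks yourself by reading only the ELF header. It assumes a 64-bit ELF file: `e_machine` 0xB7 (183) identifies AArch64, and the absence of a `PT_INTERP` program header (no dynamic loader requested) is used as a common heuristic for a statically linked executable. The helper names are hypothetical.

```python
import struct

# e_machine values from the ELF specification (only the two relevant here)
MACHINES = {0x3E: "x86-64", 0xB7: "aarch64"}
PT_INTERP = 3  # program header type that names the dynamic loader


def inspect_elf(data: bytes) -> dict:
    """Report architecture and a static-linking heuristic for a 64-bit ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2:  # EI_CLASS: 2 = ELFCLASS64
        raise ValueError("this sketch only handles 64-bit ELF")
    order = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    (machine,) = struct.unpack_from(order + "H", data, 18)  # e_machine
    (phoff,) = struct.unpack_from(order + "Q", data, 32)  # e_phoff
    phentsize, phnum = struct.unpack_from(order + "HH", data, 54)  # e_phentsize, e_phnum
    # p_type is the first 4-byte field of each program header entry.
    has_interp = any(
        struct.unpack_from(order + "I", data, phoff + i * phentsize)[0] == PT_INTERP
        for i in range(phnum)
    )
    return {"arch": MACHINES.get(machine, hex(machine)), "static": not has_interp}


def inspect_file(path: str) -> dict:
    """Read a file from disk and inspect its ELF header."""
    with open(path, "rb") as f:
        return inspect_elf(f.read())
```

For example, `inspect_file("build/curl")` (a hypothetical path) should report `{"arch": "aarch64", "static": True}` for the binary the arm64 tasks ask for. This does not replace a real run: the curl-ssl-arm64-static2 task also requires an actual HTTPS download under qemu-aarch64-static, and the Windows tasks produce PE files, which this ELF-only sketch rejects.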
CompileBench Cost Ranking
This section ranks models by cost. Each table fixes a task count N and sums each model's N cheapest task costs, which rewards both breadth and cheap wins.
Cheapest 7 tasks
| # | Model | Sum of cheapest 7 | pass@1 | pass@2 |
|---|---|---|---|---|
| 1 | gpt-5-mini-minimal | $0.02 | 37% | 47% |
| 2 | grok-code-fast-1 | $0.03 | 67% | 73% |
| 3 | gpt-oss-120b-high | $0.04 | 47% | 60% |
| 4 | gpt-4.1-mini | $0.04 | 47% | 60% |
| 5 | gemini-2.5-flash | $0.06 | 50% | 60% |
| 6 | gemini-2.5-flash-thinking | $0.07 | 47% | 53% |
| 7 | gpt-5-mini-high | $0.08 | 83% | 87% |
| 8 | gpt-5-minimal | $0.13 | 50% | 67% |
| 9 | qwen3-max | $0.13 | 43% | 67% |
| 10 | deepseek-v3.1 | $0.21 | 57% | 73% |
| 11 | gpt-4.1 | $0.23 | 60% | 67% |
| 12 | gemini-2.5-pro | $0.27 | 57% | 60% |
| 13 | glm-4.5 | $0.28 | 40% | 53% |
| 14 | gpt-5-high | $0.49 | 83% | 93% |
| 15 | kimi-k2-0905 | $0.54 | 57% | 67% |
| 16 | grok-4 | $0.84 | 70% | 87% |
| 17 | claude-sonnet-4 | $0.94 | 80% | 80% |
| 18 | claude-sonnet-4-thinking-16k | $0.99 | 80% | 80% |
| 19 | claude-opus-4.1-thinking-16k | $3.65 | 57% | 80% |
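The page does not define a "task cost" precisely. One reading that fits the numbers is that only tasks the model solved count: N = 7 matches the fewest solved tasks of any model in the ranking (gpt-5-mini-minimal's 47% pass@2 is 7 of 15 tasks). Under that assumption, the ranking can be sketched as follows; models with fewer than N solved tasks are left out, and the example data is hypothetical.

```python
from typing import Optional


def sum_cheapest(task_costs: dict[str, float], n: int) -> Optional[float]:
    """Sum a model's n cheapest per-task costs, or None if it has fewer than n."""
    if len(task_costs) < n:
        return None
    return sum(sorted(task_costs.values())[:n])


def rank_by_cheapest(models: dict[str, dict[str, float]], n: int) -> list[tuple[str, float]]:
    """Rank models by their cheapest-n sum, skipping models with too few tasks."""
    scored = []
    for model, costs in models.items():
        total = sum_cheapest(costs, n)
        if total is not None:
            scored.append((model, total))
    return sorted(scored, key=lambda item: item[1])


# Hypothetical per-task costs (USD) for solved tasks only; not real benchmark data.
example = {
    "model-a": {"cowsay": 0.01, "jq": 0.02, "curl": 0.05},
    "model-b": {"cowsay": 0.002, "jq": 0.004},
    "model-c": {"cowsay": 0.03, "jq": 0.01, "curl": 0.01, "coreutils": 0.20},
}

for model, total in rank_by_cheapest(example, 3):
    print(f"{model}: ${total:.2f}")
```

Here model-b is the cheapest per task but drops out because it solved only two tasks, which is how a fixed N rewards breadth as well as cheap wins.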
CompileBench Speed Ranking
This section ranks models by speed. Each table fixes a task count N and sums each model's N fastest task times, which rewards both breadth and quick wins.
Fastest 7 tasks
| # | Model | Sum of fastest 7 | pass@1 | pass@2 |
|---|---|---|---|---|
| 1 | gpt-4.1 | 4m55s | 60% | 67% |
| 2 | qwen3-max | 5m18s | 43% | 67% |
| 3 | gpt-4.1-mini | 5m35s | 47% | 60% |
| 4 | gpt-oss-120b-high | 5m44s | 47% | 60% |
| 5 | deepseek-v3.1 | 6m26s | 57% | 73% |
| 6 | gpt-5-minimal | 6m37s | 50% | 67% |
| 7 | gpt-5-mini-minimal | 6m38s | 37% | 47% |
| 8 | grok-code-fast-1 | 7m32s | 67% | 73% |
| 9 | gemini-2.5-flash-thinking | 7m38s | 47% | 53% |
| 10 | gemini-2.5-flash | 7m44s | 50% | 60% |
| 11 | kimi-k2-0905 | 8m44s | 57% | 67% |
| 12 | gemini-2.5-pro | 9m14s | 57% | 60% |
| 13 | claude-sonnet-4 | 9m25s | 80% | 80% |
| 14 | glm-4.5 | 10m43s | 40% | 53% |
| 15 | claude-sonnet-4-thinking-16k | 12m22s | 80% | 80% |
| 16 | gpt-5-mini-high | 12m41s | 83% | 87% |
| 17 | claude-opus-4.1-thinking-16k | 13m14s | 57% | 80% |
| 18 | gpt-5-high | 15m38s | 83% | 93% |
| 19 | grok-4 | 17m36s | 70% | 87% |
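Summing times works like summing costs, except that the durations first have to be parsed from the page's `4m55s` / `1h04m36s` notation. The sketch below assumes, as in the cost ranking, that each entry is the wall-clock time of one task the model solved; the helper names and example times are hypothetical.

```python
import re

_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert strings such as '4m55s' or '1h04m36s' to whole seconds."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything that is not exactly a run of <number><unit> pairs.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(num) * _SECONDS[unit] for num, unit in parts)


def format_duration(seconds: int) -> str:
    """Render seconds in the page's style, e.g. '9m29s' or '1h04m36s'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}h{minutes:02d}m{secs:02d}s"
    return f"{minutes}m{secs}s"


def sum_fastest(task_times: list[str], n: int) -> str:
    """Sum the n shortest task durations for one model."""
    if len(task_times) < n:
        raise ValueError("fewer than n tasks")
    return format_duration(sum(sorted(map(parse_duration, task_times))[:n]))


# Hypothetical per-task times for one model; not real benchmark data.
print(sum_fastest(["2m10s", "45s", "1h02m03s", "3m5s"], 3))  # prints "6m0s"
```

The same helpers are handy for the cost table that follows: the per-model LLM inference times there add up exactly to the reported 18h42m48s.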
Benchmark costs
Across all tasks, the benchmark spent $214.62, sent 13,195 LLM requests, and ran for 33h41m38s in total, of which 18h42m48s was model inference time and 14h15m57s was spent in the terminal executing 12,903 commands. "Total" means the sum over every attempt across all tasks. Per-task averages and details are on the task pages.
| # | Model | Total cost | LLM inference time | Command execution time | Total time | Tokens used |
|---|---|---|---|---|---|---|
| 1 | gpt-5-mini-minimal | $0.18 | 15m58s | 20m37s | 37m29s | 231k |
| 2 | grok-code-fast-1 | $0.62 | 55m9s | 1h27m37s | 2h52m14s | 540k |
| 3 | gpt-oss-120b-high | $0.93 | 21m10s | 24m2s | 46m13s | 406k |
| 4 | gpt-5-mini-high | $1.56 | 2h11m49s | 39m4s | 2h51m37s | 891k |
| 5 | gpt-5-minimal | $1.90 | 32m46s | 31m9s | 1h04m36s | 466k |
| 6 | gemini-2.5-flash-thinking | $1.99 | 35m22s | 58m54s | 1h34m58s | 671k |
| 7 | gpt-4.1-mini | $2.16 | 22m9s | 1h23m26s | 1h46m29s | 575k |
| 8 | glm-4.5 | $2.20 | 34m15s | 21m10s | 56m7s | 328k |
| 9 | gemini-2.5-flash | $3.06 | 21m3s | 1h07m42s | 1h29m36s | 688k |
| 10 | gpt-5-high | $5.67 | 2h05m46s | 25m19s | 2h31m46s | 751k |
| 11 | gemini-2.5-pro | $7.36 | 47m9s | 1h00m11s | 1h47m59s | 558k |
| 12 | deepseek-v3.1 | $9.22 | 39m27s | 30m17s | 1h10m31s | 582k |
| 13 | gpt-4.1 | $10.78 | 24m3s | 21m48s | 46m32s | 536k |
| 14 | claude-sonnet-4-thinking-16k | $13.39 | 1h18m32s | 57m1s | 2h16m06s | 807k |
| 15 | claude-sonnet-4 | $14.05 | 1h00m15s | 28m21s | 1h29m07s | 828k |
| 16 | kimi-k2-0905 | $18.37 | 1h17m52s | 1h06m00s | 2h24m14s | 590k |
| 17 | grok-4 | $32.13 | 2h37m22s | 1h04m59s | 3h43m18s | 811k |
| 18 | qwen3-max | $38.66 | 36m13s | 44m23s | 1h21m45s | 723k |
| 19 | claude-opus-4.1-thinking-16k | $50.39 | 1h46m28s | 23m58s | 2h11m02s | 669k |
| | Total | $214.62 | 18h42m48s | 14h15m57s | 33h41m38s | 11.7M |
All attempts
A complete list of every run across models and tasks. Full attempt reports with logs, commands, and outputs are available for every run.
Model Task Status Error
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
coreutils Failure exceeded max tool calls (50)
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
coreutils Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
coreutils-old-version Failure exceeded max tool calls (70)
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
coreutils-old-version Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
coreutils-old-version-alpine Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
coreutils-old-version-alpine Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
coreutils-static Failure exceeded max tool calls (50)
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
coreutils-static Failure task failed: kill missing at /home/peter/result/kill or not executable
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
coreutils-static-alpine Failure exceeded max tool calls (50)
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
coreutils-static-alpine Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
cowsay Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
cowsay Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
curl Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
curl Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
curl-ssl Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
curl-ssl Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
curl-ssl-arm64-static Failure task failed: curl-arm64 is not statically linked
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
curl-ssl-arm64-static Failure task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
curl-ssl-arm64-static2 Failure task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
curl-ssl-arm64-static2 Failure task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
jq Failure exceeded max tool calls (50)
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
jq Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
jq-static Failure exceeded max tool calls (50)
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
jq-static Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
jq-static-musl Failure task failed: jq is not statically linked
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
jq-static-musl Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
jq-windows Failure task failed: jq help does not contain expected string
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
jq-windows Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
jq-windows2 Success -
claude-opus-4.1-thinking-16k logo claude-opus-4.1-thinking-16k
jq-windows2 Success -
claude-sonnet-4 logo claude-sonnet-4
coreutils Success -
claude-sonnet-4 logo claude-sonnet-4
coreutils Success -
claude-sonnet-4 logo claude-sonnet-4
coreutils-old-version Success -
claude-sonnet-4 logo claude-sonnet-4
coreutils-old-version Success -
claude-sonnet-4 logo claude-sonnet-4
coreutils-old-version-alpine Failure exceeded max tool calls (100)
claude-sonnet-4 logo claude-sonnet-4
coreutils-old-version-alpine Failure task failed: df missing at /home/peter/result/df or not executable
claude-sonnet-4 logo claude-sonnet-4
coreutils-static Success -
claude-sonnet-4 logo claude-sonnet-4
coreutils-static Success -
claude-sonnet-4 logo claude-sonnet-4
coreutils-static-alpine Success -
claude-sonnet-4 logo claude-sonnet-4
coreutils-static-alpine Success -
claude-sonnet-4 logo claude-sonnet-4
cowsay Success -
claude-sonnet-4 logo claude-sonnet-4
cowsay Success -
claude-sonnet-4 logo claude-sonnet-4
curl Success -
claude-sonnet-4 logo claude-sonnet-4
curl Success -
claude-sonnet-4 logo claude-sonnet-4
curl-ssl Success -
claude-sonnet-4 logo claude-sonnet-4
curl-ssl Success -
claude-sonnet-4 logo claude-sonnet-4
curl-ssl-arm64-static Failure task failed: curl-arm64 is not statically linked
claude-sonnet-4 logo claude-sonnet-4
curl-ssl-arm64-static Failure task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
claude-sonnet-4 logo claude-sonnet-4
curl-ssl-arm64-static2 Failure task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
claude-sonnet-4 logo claude-sonnet-4
curl-ssl-arm64-static2 Failure task failed: curl-arm64 is not statically linked
claude-sonnet-4 logo claude-sonnet-4
jq Success -
claude-sonnet-4 logo claude-sonnet-4
jq Success -
claude-sonnet-4 logo claude-sonnet-4
jq-static Success -
claude-sonnet-4 logo claude-sonnet-4
jq-static Success -
claude-sonnet-4 logo claude-sonnet-4
jq-static-musl Success -
claude-sonnet-4 logo claude-sonnet-4
jq-static-musl Success -
claude-sonnet-4 logo claude-sonnet-4
jq-windows Success -
claude-sonnet-4 logo claude-sonnet-4
jq-windows Success -
claude-sonnet-4 logo claude-sonnet-4
jq-windows2 Success -
claude-sonnet-4 logo claude-sonnet-4
jq-windows2 Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
coreutils Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
coreutils Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
coreutils-old-version Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
coreutils-old-version Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
coreutils-old-version-alpine Failure task failed: sha1sum binary does not exist
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
coreutils-old-version-alpine Failure task failed: df missing at /home/peter/result/df or not executable
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
coreutils-static Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
coreutils-static Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
coreutils-static-alpine Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
coreutils-static-alpine Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
cowsay Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
cowsay Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
curl Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
curl Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
curl-ssl Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
curl-ssl Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
curl-ssl-arm64-static Failure task failed: curl-arm64 is not statically linked
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
curl-ssl-arm64-static Failure task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
curl-ssl-arm64-static2 Failure task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
curl-ssl-arm64-static2 Failure task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
jq Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
jq Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
jq-static Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
jq-static Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
jq-static-musl Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
jq-static-musl Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
jq-windows Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
jq-windows Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
jq-windows2 Success -
claude-sonnet-4-thinking-16k logo claude-sonnet-4-thinking-16k
jq-windows2 Success -
deepseek-v3.1 logo deepseek-v3.1
coreutils Success -
deepseek-v3.1 logo deepseek-v3.1
coreutils Success -
deepseek-v3.1 logo deepseek-v3.1
coreutils-old-version Failure unexpected end of JSON input
deepseek-v3.1 logo deepseek-v3.1
coreutils-old-version Success -
deepseek-v3.1 logo deepseek-v3.1
coreutils-old-version-alpine Failure task failed: sha1sum binary does not exist
deepseek-v3.1 logo deepseek-v3.1
coreutils-old-version-alpine Failure task failed: sha1sum binary does not exist
deepseek-v3.1 logo deepseek-v3.1
coreutils-static Success -
deepseek-v3.1 logo deepseek-v3.1
coreutils-static Failure task failed: uptime missing at /home/peter/result/uptime or not executable
deepseek-v3.1 logo deepseek-v3.1
coreutils-static-alpine Success -
deepseek-v3.1 logo deepseek-v3.1
coreutils-static-alpine Success -
deepseek-v3.1 logo deepseek-v3.1
cowsay Success -
deepseek-v3.1 logo deepseek-v3.1
cowsay Success -
deepseek-v3.1 logo deepseek-v3.1
curl Success -
deepseek-v3.1 logo deepseek-v3.1
curl Success -
deepseek-v3.1 logo deepseek-v3.1
curl-ssl Success -
deepseek-v3.1 logo deepseek-v3.1
curl-ssl Success -
deepseek-v3.1 logo deepseek-v3.1
curl-ssl-arm64-static Failure task failed: curl binary does not exist
deepseek-v3.1 logo deepseek-v3.1
curl-ssl-arm64-static Failure task failed: curl binary does not exist
deepseek-v3.1 logo deepseek-v3.1
curl-ssl-arm64-static2 Failure task failed: curl-arm64 is not statically linked
deepseek-v3.1 logo deepseek-v3.1
curl-ssl-arm64-static2 Failure task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: * Protocol "https" not...
deepseek-v3.1 logo deepseek-v3.1
jq Success -
deepseek-v3.1 logo deepseek-v3.1
jq Success -
deepseek-v3.1 logo deepseek-v3.1
jq-static Success -
deepseek-v3.1 logo deepseek-v3.1
jq-static Failure task failed: jq binary does not exist
deepseek-v3.1 logo deepseek-v3.1
jq-static-musl Success -
deepseek-v3.1 logo deepseek-v3.1
jq-static-musl Failure task failed: jq binary does not exist
deepseek-v3.1 logo deepseek-v3.1
jq-windows Failure task failed: jq help does not contain expected string
deepseek-v3.1 logo deepseek-v3.1
jq-windows Success -
deepseek-v3.1 logo deepseek-v3.1
jq-windows2 Failure task failed: jq help does not contain expected string
deepseek-v3.1 logo deepseek-v3.1
jq-windows2 Failure task failed: jq help does not contain expected string
gemini-2.5-flash logo gemini-2.5-flash
coreutils Success -
gemini-2.5-flash logo gemini-2.5-flash
coreutils Success -
gemini-2.5-flash logo gemini-2.5-flash
coreutils-old-version Failure task failed: chroot missing at /home/peter/result/chroot or not executable
gemini-2.5-flash logo gemini-2.5-flash
coreutils-old-version Failure task failed: chroot missing at /home/peter/result/chroot or not executable
gemini-2.5-flash logo gemini-2.5-flash
coreutils-old-version-alpine Failure task failed: sha1sum binary does not exist
gemini-2.5-flash logo gemini-2.5-flash
coreutils-old-version-alpine Failure task failed: sha1sum binary does not exist
gemini-2.5-flash logo gemini-2.5-flash
coreutils-static Success -
gemini-2.5-flash logo gemini-2.5-flash
coreutils-static Success -
gemini-2.5-flash logo gemini-2.5-flash
coreutils-static-alpine Failure task failed: sha1sum binary does not exist
gemini-2.5-flash logo gemini-2.5-flash
coreutils-static-alpine Success -
gemini-2.5-flash logo gemini-2.5-flash
cowsay Success -
gemini-2.5-flash logo gemini-2.5-flash
cowsay Success -
gemini-2.5-flash logo gemini-2.5-flash
curl Success -
gemini-2.5-flash logo gemini-2.5-flash
curl Failure task failed: curl binary does not exist
gemini-2.5-flash logo gemini-2.5-flash
curl-ssl Success -
gemini-2.5-flash logo gemini-2.5-flash
curl-ssl Success -
gemini-2.5-flash logo gemini-2.5-flash
curl-ssl-arm64-static Failure task failed: curl-arm64 is not aarch64 architecture
gemini-2.5-flash logo gemini-2.5-flash
curl-ssl-arm64-static Failure task failed: curl binary does not exist
gemini-2.5-flash logo gemini-2.5-flash
curl-ssl-arm64-static2 Failure task failed: curl-arm64 is not aarch64 architecture
gemini-2.5-flash logo gemini-2.5-flash
curl-ssl-arm64-static2 Failure task failed: curl binary does not exist
gemini-2.5-flash logo gemini-2.5-flash
jq Success -
gemini-2.5-flash logo gemini-2.5-flash
jq Success -
gemini-2.5-flash logo gemini-2.5-flash
jq-static Success -
gemini-2.5-flash logo gemini-2.5-flash
jq-static Failure task failed: jq is not statically linked
gemini-2.5-flash logo gemini-2.5-flash
jq-static-musl Failure task failed: jq is not statically linked
gemini-2.5-flash logo gemini-2.5-flash
jq-static-musl Failure task failed: jq is not statically linked
gemini-2.5-flash logo gemini-2.5-flash
jq-windows Failure task failed: jq help does not contain expected string
gemini-2.5-flash logo gemini-2.5-flash
jq-windows Failure task failed: jq help does not contain expected string
gemini-2.5-flash logo gemini-2.5-flash
jq-windows2 Success -
gemini-2.5-flash logo gemini-2.5-flash
jq-windows2 Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
coreutils Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
coreutils Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
coreutils-old-version Failure task failed: seq missing at /home/peter/result/seq or not executable
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
coreutils-old-version Failure failed to unmarshal shell-harness response: unexpected end of JSON input
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
coreutils-old-version-alpine Failure exceeded max tool calls (100)
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
coreutils-old-version-alpine Failure task failed: sha1sum binary does not exist
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
coreutils-static Failure task failed: sha1sum binary does not exist
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
coreutils-static Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
coreutils-static-alpine Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
coreutils-static-alpine Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
cowsay Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
cowsay Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
curl Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
curl Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
curl-ssl Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
curl-ssl Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
curl-ssl-arm64-static Failure task failed: curl binary does not exist
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
curl-ssl-arm64-static Failure task failed: curl binary does not exist
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
curl-ssl-arm64-static2 Failure task failed: curl binary does not exist
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
curl-ssl-arm64-static2 Failure task failed: curl-arm64 is not aarch64 architecture
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
jq Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
jq Success -
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
jq-static Failure task failed: jq is not statically linked
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
jq-static Failure exceeded max tool calls (50)
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
jq-static-musl Failure exceeded max tool calls (50)
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
jq-static-musl Failure task failed: jq binary does not exist
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
jq-windows Failure task failed: jq.exe binary does not exist
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
jq-windows Failure task failed: jq help does not contain expected string
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
jq-windows2 Failure task failed: jq.exe binary does not exist
gemini-2.5-flash-thinking logo gemini-2.5-flash-thinking
jq-windows2 Success -
gemini-2.5-pro logo gemini-2.5-pro
coreutils Success -
gemini-2.5-pro logo gemini-2.5-pro
coreutils Success -
gemini-2.5-pro logo gemini-2.5-pro
coreutils-old-version Failure task failed: sha1sum binary does not exist
gemini-2.5-pro logo gemini-2.5-pro
coreutils-old-version Failure task failed: chroot missing at /home/peter/result/chroot or not executable
gemini-2.5-pro logo gemini-2.5-pro
coreutils-old-version-alpine Failure task failed: sha1sum binary does not exist
gemini-2.5-pro logo gemini-2.5-pro
coreutils-old-version-alpine Failure task failed: sha1sum binary does not exist
gemini-2.5-pro logo gemini-2.5-pro
coreutils-static Success -
gemini-2.5-pro logo gemini-2.5-pro
coreutils-static Success -
gemini-2.5-pro logo gemini-2.5-pro
coreutils-static-alpine Success -
gemini-2.5-pro logo gemini-2.5-pro
coreutils-static-alpine Success -
gemini-2.5-pro logo gemini-2.5-pro
cowsay Success -
gemini-2.5-pro logo gemini-2.5-pro
cowsay Success -
gemini-2.5-pro logo gemini-2.5-pro
curl Success -
gemini-2.5-pro logo gemini-2.5-pro
curl Success -
gemini-2.5-pro logo gemini-2.5-pro
curl-ssl Success -
gemini-2.5-pro logo gemini-2.5-pro
curl-ssl Success -
gemini-2.5-pro logo gemini-2.5-pro
curl-ssl-arm64-static Failure task failed: curl-arm64 is not statically linked
gemini-2.5-pro logo gemini-2.5-pro
curl-ssl-arm64-static Failure task failed: curl-arm64 is not statically linked
gemini-2.5-pro logo gemini-2.5-pro
curl-ssl-arm64-static2 Failure task failed: curl binary does not exist
gemini-2.5-pro logo gemini-2.5-pro
curl-ssl-arm64-static2 Failure task failed: curl binary does not exist
gemini-2.5-pro logo gemini-2.5-pro
jq Success -
gemini-2.5-pro logo gemini-2.5-pro
jq Success -
gemini-2.5-pro logo gemini-2.5-pro
jq-static Success -
gemini-2.5-pro logo gemini-2.5-pro
jq-static Success -
gemini-2.5-pro logo gemini-2.5-pro
jq-static-musl Failure task failed: jq is not statically linked
gemini-2.5-pro logo gemini-2.5-pro
jq-static-musl Failure task failed: jq binary does not exist
gemini-2.5-pro logo gemini-2.5-pro
jq-windows Failure task failed: jq.exe binary does not exist
gemini-2.5-pro logo gemini-2.5-pro
jq-windows Failure task failed: jq help does not contain expected string
gemini-2.5-pro logo gemini-2.5-pro
jq-windows2 Failure task failed: jq.exe binary does not exist
gemini-2.5-pro logo gemini-2.5-pro
jq-windows2 Success -
glm-4.5 logo glm-4.5
coreutils Success -
glm-4.5 logo glm-4.5
coreutils Success -
glm-4.5 logo glm-4.5
coreutils-old-version Failure task failed: sha1sum binary does not exist
glm-4.5 logo glm-4.5
coreutils-old-version Success -
glm-4.5 logo glm-4.5
coreutils-old-version-alpine Failure task failed: sha1sum binary does not exist
glm-4.5 logo glm-4.5
coreutils-old-version-alpine Failure task failed: sha1sum binary does not exist
glm-4.5 logo glm-4.5
coreutils-static Failure task failed: sha1sum binary does not exist
glm-4.5 logo glm-4.5
coreutils-static Success -
glm-4.5 logo glm-4.5
coreutils-static-alpine Success -
glm-4.5 logo glm-4.5
coreutils-static-alpine Success -
glm-4.5 logo glm-4.5
cowsay Success -
glm-4.5 logo glm-4.5
cowsay Success -
glm-4.5 logo glm-4.5
curl Success -
glm-4.5 logo glm-4.5
curl Failure task failed: curl did not download the expected local file content, but instead: curl: (1) Protocol "file" not supported
glm-4.5 logo glm-4.5
curl-ssl Failure task failed: curl binary does not exist
glm-4.5 logo glm-4.5
curl-ssl Success -
glm-4.5 logo glm-4.5
curl-ssl-arm64-static Failure task failed: curl binary does not exist
glm-4.5 logo glm-4.5
curl-ssl-arm64-static Failure task failed: curl binary does not exist
glm-4.5 logo glm-4.5
curl-ssl-arm64-static2 Failure task failed: curl binary does not exist
glm-4.5 logo glm-4.5
curl-ssl-arm64-static2 Failure task failed: curl binary does not exist
glm-4.5 logo glm-4.5
jq Failure task failed: jq binary does not exist
glm-4.5 logo glm-4.5
jq Failure task failed: jq binary does not exist
glm-4.5 logo glm-4.5
jq-static Success -
glm-4.5 logo glm-4.5
jq-static Success -
glm-4.5 logo glm-4.5
jq-static-musl Failure task failed: jq is not statically linked
glm-4.5 logo glm-4.5
jq-static-musl Failure task failed: jq is not statically linked
glm-4.5 logo glm-4.5
jq-windows Failure task failed: jq help does not contain expected string
glm-4.5 logo glm-4.5
jq-windows Failure task failed: jq.exe binary does not exist
glm-4.5 logo glm-4.5
jq-windows2 Failure task failed: jq.exe binary does not exist
glm-4.5 logo glm-4.5
jq-windows2 Failure task failed: jq help does not contain expected string
gpt-4.1 | coreutils | Success | -
gpt-4.1 | coreutils | Success | -
gpt-4.1 | coreutils-old-version | Failure | task failed: install missing at /home/peter/result/install or not executable
gpt-4.1 | coreutils-old-version | Success | -
gpt-4.1 | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
gpt-4.1 | coreutils-old-version-alpine | Failure | task failed: df missing at /home/peter/result/df or not executable
gpt-4.1 | coreutils-static | Success | -
gpt-4.1 | coreutils-static | Success | -
gpt-4.1 | coreutils-static-alpine | Success | -
gpt-4.1 | coreutils-static-alpine | Success | -
gpt-4.1 | cowsay | Success | -
gpt-4.1 | cowsay | Success | -
gpt-4.1 | curl | Success | -
gpt-4.1 | curl | Success | -
gpt-4.1 | curl-ssl | Success | -
gpt-4.1 | curl-ssl | Success | -
gpt-4.1 | curl-ssl-arm64-static | Failure | task failed: curl binary does not exist
gpt-4.1 | curl-ssl-arm64-static | Failure | task failed: curl binary does not exist
gpt-4.1 | curl-ssl-arm64-static2 | Failure | task failed: curl binary does not exist
gpt-4.1 | curl-ssl-arm64-static2 | Failure | task failed: curl binary does not exist
gpt-4.1 | jq | Success | -
gpt-4.1 | jq | Success | -
gpt-4.1 | jq-static | Success | -
gpt-4.1 | jq-static | Success | -
gpt-4.1 | jq-static-musl | Failure | task failed: jq is not statically linked
gpt-4.1 | jq-static-musl | Failure | task failed: jq is not statically linked
gpt-4.1 | jq-windows | Failure | task failed: jq help does not contain expected string
gpt-4.1 | jq-windows | Failure | task failed: jq help does not contain expected string
gpt-4.1 | jq-windows2 | Failure | task failed: jq help does not contain expected string
gpt-4.1 | jq-windows2 | Success | -
gpt-4.1-mini | coreutils | Success | -
gpt-4.1-mini | coreutils | Failure | task failed: sha1sum binary does not exist
gpt-4.1-mini | coreutils-old-version | Success | -
gpt-4.1-mini | coreutils-old-version | Success | -
gpt-4.1-mini | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
gpt-4.1-mini | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
gpt-4.1-mini | coreutils-static | Success | -
gpt-4.1-mini | coreutils-static | Failure | task failed: install missing at /home/peter/result/install or not executable
gpt-4.1-mini | coreutils-static-alpine | Success | -
gpt-4.1-mini | coreutils-static-alpine | Success | -
gpt-4.1-mini | cowsay | Success | -
gpt-4.1-mini | cowsay | Success | -
gpt-4.1-mini | curl | Failure | exceeded max tool calls (50)
gpt-4.1-mini | curl | Success | -
gpt-4.1-mini | curl-ssl | Success | -
gpt-4.1-mini | curl-ssl | Success | -
gpt-4.1-mini | curl-ssl-arm64-static | Failure | task failed: curl binary does not exist
gpt-4.1-mini | curl-ssl-arm64-static | Failure | task failed: curl binary does not exist
gpt-4.1-mini | curl-ssl-arm64-static2 | Failure | task failed: curl binary does not exist
gpt-4.1-mini | curl-ssl-arm64-static2 | Failure | task failed: curl binary does not exist
gpt-4.1-mini | jq | Success | -
gpt-4.1-mini | jq | Success | -
gpt-4.1-mini | jq-static | Failure | task failed: jq is not statically linked
gpt-4.1-mini | jq-static | Failure | task failed: jq is not statically linked
gpt-4.1-mini | jq-static-musl | Failure | task failed: jq is not statically linked
gpt-4.1-mini | jq-static-musl | Failure | task failed: jq is not statically linked
gpt-4.1-mini | jq-windows | Failure | task failed: jq help does not contain expected string
gpt-4.1-mini | jq-windows | Failure | task failed: jq help does not contain expected string
gpt-4.1-mini | jq-windows2 | Failure | task failed: jq help does not contain expected string
gpt-4.1-mini | jq-windows2 | Success | -
gpt-5-high | coreutils | Success | -
gpt-5-high | coreutils | Success | -
gpt-5-high | coreutils-old-version | Success | -
gpt-5-high | coreutils-old-version | Success | -
gpt-5-high | coreutils-old-version-alpine | Failure | exceeded max tool calls (100)
gpt-5-high | coreutils-old-version-alpine | Success | -
gpt-5-high | coreutils-static | Success | -
gpt-5-high | coreutils-static | Success | -
gpt-5-high | coreutils-static-alpine | Success | -
gpt-5-high | coreutils-static-alpine | Success | -
gpt-5-high | cowsay | Success | -
gpt-5-high | cowsay | Success | -
gpt-5-high | curl | Success | -
gpt-5-high | curl | Failure | task failed: curl did not download the expected local file content, but instead: curl: (1) Protocol "file" not supported
gpt-5-high | curl-ssl | Success | -
gpt-5-high | curl-ssl | Success | -
gpt-5-high | curl-ssl-arm64-static | Failure | task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
gpt-5-high | curl-ssl-arm64-static | Failure | task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
gpt-5-high | curl-ssl-arm64-static2 | Success | -
gpt-5-high | curl-ssl-arm64-static2 | Failure | task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
gpt-5-high | jq | Success | -
gpt-5-high | jq | Success | -
gpt-5-high | jq-static | Success | -
gpt-5-high | jq-static | Success | -
gpt-5-high | jq-static-musl | Success | -
gpt-5-high | jq-static-musl | Success | -
gpt-5-high | jq-windows | Success | -
gpt-5-high | jq-windows | Success | -
gpt-5-high | jq-windows2 | Success | -
gpt-5-high | jq-windows2 | Success | -
gpt-5-mini-high | coreutils | Success | -
gpt-5-mini-high | coreutils | Success | -
gpt-5-mini-high | coreutils-old-version | Success | -
gpt-5-mini-high | coreutils-old-version | Success | -
gpt-5-mini-high | coreutils-old-version-alpine | Failure | task failed: df missing at /home/peter/result/df or not executable
gpt-5-mini-high | coreutils-old-version-alpine | Failure | task failed: df missing at /home/peter/result/df or not executable
gpt-5-mini-high | coreutils-static | Success | -
gpt-5-mini-high | coreutils-static | Success | -
gpt-5-mini-high | coreutils-static-alpine | Success | -
gpt-5-mini-high | coreutils-static-alpine | Success | -
gpt-5-mini-high | cowsay | Success | -
gpt-5-mini-high | cowsay | Success | -
gpt-5-mini-high | curl | Success | -
gpt-5-mini-high | curl | Success | -
gpt-5-mini-high | curl-ssl | Success | -
gpt-5-mini-high | curl-ssl | Success | -
gpt-5-mini-high | curl-ssl-arm64-static | Success | -
gpt-5-mini-high | curl-ssl-arm64-static | Failure | task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
gpt-5-mini-high | curl-ssl-arm64-static2 | Failure | task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
gpt-5-mini-high | curl-ssl-arm64-static2 | Failure | task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
gpt-5-mini-high | jq | Success | -
gpt-5-mini-high | jq | Success | -
gpt-5-mini-high | jq-static | Success | -
gpt-5-mini-high | jq-static | Success | -
gpt-5-mini-high | jq-static-musl | Success | -
gpt-5-mini-high | jq-static-musl | Success | -
gpt-5-mini-high | jq-windows | Success | -
gpt-5-mini-high | jq-windows | Success | -
gpt-5-mini-high | jq-windows2 | Success | -
gpt-5-mini-high | jq-windows2 | Success | -
gpt-5-mini-minimal | coreutils | Success | -
gpt-5-mini-minimal | coreutils | Success | -
gpt-5-mini-minimal | coreutils-old-version | Success | -
gpt-5-mini-minimal | coreutils-old-version | Failure | task failed: sha1sum binary does not exist
gpt-5-mini-minimal | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
gpt-5-mini-minimal | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
gpt-5-mini-minimal | coreutils-static | Success | -
gpt-5-mini-minimal | coreutils-static | Success | -
gpt-5-mini-minimal | coreutils-static-alpine | Success | -
gpt-5-mini-minimal | coreutils-static-alpine | Failure | task failed: tail missing at /home/peter/result/tail or not executable
gpt-5-mini-minimal | cowsay | Success | -
gpt-5-mini-minimal | cowsay | Success | -
gpt-5-mini-minimal | curl | Failure | task failed: curl binary does not exist
gpt-5-mini-minimal | curl | Failure | task failed: curl binary does not exist
gpt-5-mini-minimal | curl-ssl | Success | -
gpt-5-mini-minimal | curl-ssl | Failure | task failed: curl binary does not exist
gpt-5-mini-minimal | curl-ssl-arm64-static | Failure | task failed: curl binary does not exist
gpt-5-mini-minimal | curl-ssl-arm64-static | Failure | task failed: curl binary does not exist
gpt-5-mini-minimal | curl-ssl-arm64-static2 | Failure | task failed: curl binary does not exist
gpt-5-mini-minimal | curl-ssl-arm64-static2 | Failure | task failed: curl binary does not exist
gpt-5-mini-minimal | jq | Success | -
gpt-5-mini-minimal | jq | Success | -
gpt-5-mini-minimal | jq-static | Failure | task failed: jq is not statically linked
gpt-5-mini-minimal | jq-static | Failure | task failed: jq binary does not exist
gpt-5-mini-minimal | jq-static-musl | Failure | task failed: jq is not statically linked
gpt-5-mini-minimal | jq-static-musl | Failure | task failed: jq is not statically linked
gpt-5-mini-minimal | jq-windows | Failure | task failed: jq help does not contain expected string
gpt-5-mini-minimal | jq-windows | Failure | task failed: jq help does not contain expected string
gpt-5-mini-minimal | jq-windows2 | Failure | task failed: jq help does not contain expected string
gpt-5-mini-minimal | jq-windows2 | Failure | task failed: jq help does not contain expected string
gpt-5-minimal | coreutils | Success | -
gpt-5-minimal | coreutils | Failure | task failed: false missing at /home/peter/result/false or not executable
gpt-5-minimal | coreutils-old-version | Success | -
gpt-5-minimal | coreutils-old-version | Success | -
gpt-5-minimal | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
gpt-5-minimal | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
gpt-5-minimal | coreutils-static | Success | -
gpt-5-minimal | coreutils-static | Failure | task failed: kill missing at /home/peter/result/kill or not executable
gpt-5-minimal | coreutils-static-alpine | Failure | task failed: kill missing at /home/peter/result/kill or not executable
gpt-5-minimal | coreutils-static-alpine | Failure | task failed: groups missing at /home/peter/result/groups or not executable
gpt-5-minimal | cowsay | Success | -
gpt-5-minimal | cowsay | Success | -
gpt-5-minimal | curl | Success | -
gpt-5-minimal | curl | Success | -
gpt-5-minimal | curl-ssl | Success | -
gpt-5-minimal | curl-ssl | Success | -
gpt-5-minimal | curl-ssl-arm64-static | Failure | task failed: curl-arm64 is not statically linked
gpt-5-minimal | curl-ssl-arm64-static | Failure | task failed: curl-arm64 is not statically linked
gpt-5-minimal | curl-ssl-arm64-static2 | Failure | task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
gpt-5-minimal | curl-ssl-arm64-static2 | Failure | task failed: curl-arm64 is not statically linked
gpt-5-minimal | jq | Success | -
gpt-5-minimal | jq | Success | -
gpt-5-minimal | jq-static | Failure | task failed: jq is not statically linked
gpt-5-minimal | jq-static | Success | -
gpt-5-minimal | jq-static-musl | Success | -
gpt-5-minimal | jq-static-musl | Failure | task failed: jq is not statically linked
gpt-5-minimal | jq-windows | Failure | task failed: jq help does not contain expected string
gpt-5-minimal | jq-windows | Failure | task failed: jq help does not contain expected string
gpt-5-minimal | jq-windows2 | Failure | task failed: jq help does not contain expected string
gpt-5-minimal | jq-windows2 | Success | -
gpt-oss-120b-high | coreutils | Failure | task failed: sha1sum binary does not exist
gpt-oss-120b-high | coreutils | Success | -
gpt-oss-120b-high | coreutils-old-version | Failure | task failed: sha1sum binary does not exist
gpt-oss-120b-high | coreutils-old-version | Failure | task failed: sha1sum binary does not exist
gpt-oss-120b-high | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
gpt-oss-120b-high | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
gpt-oss-120b-high | coreutils-static | Success | -
gpt-oss-120b-high | coreutils-static | Success | -
gpt-oss-120b-high | coreutils-static-alpine | Success | -
gpt-oss-120b-high | coreutils-static-alpine | Success | -
gpt-oss-120b-high | cowsay | Success | -
gpt-oss-120b-high | cowsay | Success | -
gpt-oss-120b-high | curl | Success | -
gpt-oss-120b-high | curl | Failure | task failed: curl did not download the expected local file content, but instead: curl: (1) Protocol "file" not supported
gpt-oss-120b-high | curl-ssl | Success | -
gpt-oss-120b-high | curl-ssl | Success | -
gpt-oss-120b-high | curl-ssl-arm64-static | Failure | task failed: curl binary does not exist
gpt-oss-120b-high | curl-ssl-arm64-static | Failure | task failed: curl binary does not exist
gpt-oss-120b-high | curl-ssl-arm64-static2 | Failure | unknown tool:
gpt-oss-120b-high | curl-ssl-arm64-static2 | Failure | task failed: curl binary does not exist
gpt-oss-120b-high | jq | Success | -
gpt-oss-120b-high | jq | Success | -
gpt-oss-120b-high | jq-static | Success | -
gpt-oss-120b-high | jq-static | Failure | task failed: jq binary does not exist
gpt-oss-120b-high | jq-static-musl | Failure | task failed: jq is not statically linked
gpt-oss-120b-high | jq-static-musl | Failure | task failed: jq binary does not exist
gpt-oss-120b-high | jq-windows | Failure | task failed: jq help does not contain expected string
gpt-oss-120b-high | jq-windows | Failure | task failed: jq help does not contain expected string
gpt-oss-120b-high | jq-windows2 | Failure | unknown tool:
gpt-oss-120b-high | jq-windows2 | Success | -
grok-4 | coreutils | Success | -
grok-4 | coreutils | Success | -
grok-4 | coreutils-old-version | Success | -
grok-4 | coreutils-old-version | Success | -
grok-4 | coreutils-old-version-alpine | Failure | context deadline exceeded
grok-4 | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
grok-4 | coreutils-static | Failure | task failed: sha1sum is not statically linked
grok-4 | coreutils-static | Success | -
grok-4 | coreutils-static-alpine | Success | -
grok-4 | coreutils-static-alpine | Failure | task failed: install missing at /home/peter/result/install or not executable
grok-4 | cowsay | Success | -
grok-4 | cowsay | Success | -
grok-4 | curl | Success | -
grok-4 | curl | Success | -
grok-4 | curl-ssl | Success | -
grok-4 | curl-ssl | Success | -
grok-4 | curl-ssl-arm64-static | Failure | task failed: curl HTTPS request to google.com did not return content-type: text/html but instead: } [2 bytes data] * SSL...
grok-4 | curl-ssl-arm64-static | Failure | task failed: curl binary does not exist
grok-4 | curl-ssl-arm64-static2 | Success | -
grok-4 | curl-ssl-arm64-static2 | Failure | task failed: curl-arm64 is not statically linked
grok-4 | jq | Success | -
grok-4 | jq | Success | -
grok-4 | jq-static | Success | -
grok-4 | jq-static | Failure | task failed: jq binary does not exist
grok-4 | jq-static-musl | Failure | task failed: jq binary does not exist
grok-4 | jq-static-musl | Success | -
grok-4 | jq-windows | Success | -
grok-4 | jq-windows | Success | -
grok-4 | jq-windows2 | Success | -
grok-4 | jq-windows2 | Success | -
grok-code-fast-1 | coreutils | Success | -
grok-code-fast-1 | coreutils | Success | -
grok-code-fast-1 | coreutils-old-version | Success | -
grok-code-fast-1 | coreutils-old-version | Success | -
grok-code-fast-1 | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
grok-code-fast-1 | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
grok-code-fast-1 | coreutils-static | Success | -
grok-code-fast-1 | coreutils-static | Success | -
grok-code-fast-1 | coreutils-static-alpine | Success | -
grok-code-fast-1 | coreutils-static-alpine | Success | -
grok-code-fast-1 | cowsay | Success | -
grok-code-fast-1 | cowsay | Success | -
grok-code-fast-1 | curl | Success | -
grok-code-fast-1 | curl | Success | -
grok-code-fast-1 | curl-ssl | Success | -
grok-code-fast-1 | curl-ssl | Success | -
grok-code-fast-1 | curl-ssl-arm64-static | Failure | task failed: curl-arm64 is not statically linked
grok-code-fast-1 | curl-ssl-arm64-static | Failure | task failed: curl-arm64 is not statically linked
grok-code-fast-1 | curl-ssl-arm64-static2 | Failure | task failed: curl binary does not exist
grok-code-fast-1 | curl-ssl-arm64-static2 | Failure | failed to unmarshal shell-harness response: unexpected end of JSON input
grok-code-fast-1 | jq | Success | -
grok-code-fast-1 | jq | Success | -
grok-code-fast-1 | jq-static | Success | -
grok-code-fast-1 | jq-static | Failure | task failed: jq is not statically linked
grok-code-fast-1 | jq-static-musl | Success | -
grok-code-fast-1 | jq-static-musl | Failure | task failed: jq is not statically linked
grok-code-fast-1 | jq-windows | Failure | task failed: jq help does not contain expected string
grok-code-fast-1 | jq-windows | Failure | task failed: jq help does not contain expected string
grok-code-fast-1 | jq-windows2 | Success | -
grok-code-fast-1 | jq-windows2 | Success | -
kimi-k2-0905 | coreutils | Failure | task failed: sha1sum binary does not exist
kimi-k2-0905 | coreutils | Success | -
kimi-k2-0905 | coreutils-old-version | Failure | unknown tool: run_system_cmd
kimi-k2-0905 | coreutils-old-version | Failure | exceeded max tool calls (70)
kimi-k2-0905 | coreutils-old-version-alpine | Failure | invalid character '*' in string escape code
kimi-k2-0905 | coreutils-old-version-alpine | Failure | task failed: df missing at /home/peter/result/df or not executable
kimi-k2-0905 | coreutils-static | Success | -
kimi-k2-0905 | coreutils-static | Success | -
kimi-k2-0905 | coreutils-static-alpine | Success | -
kimi-k2-0905 | coreutils-static-alpine | Success | -
kimi-k2-0905 | cowsay | Failure | task failed: Cowsay binary does not exist
kimi-k2-0905 | cowsay | Success | -
kimi-k2-0905 | curl | Success | -
kimi-k2-0905 | curl | Success | -
kimi-k2-0905 | curl-ssl | Success | -
kimi-k2-0905 | curl-ssl | Success | -
kimi-k2-0905 | curl-ssl-arm64-static | Failure | task failed: curl-arm64 is not statically linked
kimi-k2-0905 | curl-ssl-arm64-static | Failure | task failed: curl-arm64 is not statically linked
kimi-k2-0905 | curl-ssl-arm64-static2 | Failure | invalid character '\\' after object key:value pair
kimi-k2-0905 | curl-ssl-arm64-static2 | Failure | task failed: curl-arm64 is not statically linked
kimi-k2-0905 | jq | Success | -
kimi-k2-0905 | jq | Success | -
kimi-k2-0905 | jq-static | Success | -
kimi-k2-0905 | jq-static | Success | -
kimi-k2-0905 | jq-static-musl | Failure | task failed: jq binary does not exist
kimi-k2-0905 | jq-static-musl | Failure | task failed: jq is not statically linked
kimi-k2-0905 | jq-windows | Failure | invalid character '.' in string escape code
kimi-k2-0905 | jq-windows | Success | -
kimi-k2-0905 | jq-windows2 | Success | -
kimi-k2-0905 | jq-windows2 | Success | -
qwen3-max | coreutils | Success | -
qwen3-max | coreutils | Success | -
qwen3-max | coreutils-old-version | Success | -
qwen3-max | coreutils-old-version | Failure | exceeded max cost dollars (max=$3.00, current=3.01)
qwen3-max | coreutils-old-version-alpine | Failure | POST "https://openrouter.ai/api/v1/chat/completions": 400 Bad Request {"message":"Provider returned error","code":400,"m...
qwen3-max | coreutils-old-version-alpine | Failure | task failed: sha1sum binary does not exist
qwen3-max | coreutils-static | Success | -
qwen3-max | coreutils-static | Failure | task failed: install missing at /home/peter/result/install or not executable
qwen3-max | coreutils-static-alpine | Success | -
qwen3-max | coreutils-static-alpine | Failure | LLM call failed: POST "https://openrouter.ai/api/v1/chat/completions": 429 Too Many Requests {"message":"Rate limit exce...
qwen3-max | cowsay | Failure | task failed: Cowsay does not contain expected string (eyes)
qwen3-max | cowsay | Failure | task failed: Cowsay does not contain expected string (eyes)
qwen3-max | curl | Success | -
qwen3-max | curl | Failure | LLM call failed: POST "https://openrouter.ai/api/v1/chat/completions": 429 Too Many Requests {"message":"Rate limit exce...
qwen3-max | curl-ssl | Success | -
qwen3-max | curl-ssl | Failure | LLM call failed: POST "https://openrouter.ai/api/v1/chat/completions": 429 Too Many Requests {"message":"Rate limit exce...
qwen3-max | curl-ssl-arm64-static | Failure | task failed: curl-arm64 is not aarch64 architecture
qwen3-max | curl-ssl-arm64-static | Failure | task failed: curl-arm64 is not statically linked
qwen3-max | curl-ssl-arm64-static2 | Failure | task failed: curl brotli compression test failed - content-encoding: br not found
qwen3-max | curl-ssl-arm64-static2 | Failure | task failed: curl-arm64 is not aarch64 architecture
qwen3-max | jq | Success | -
qwen3-max | jq | Success | -
qwen3-max | jq-static | Failure | task failed: jq is not statically linked
qwen3-max | jq-static | Failure | task failed: jq is not statically linked
qwen3-max | jq-static-musl | Success | -
qwen3-max | jq-static-musl | Success | -
qwen3-max | jq-windows | Success | -
qwen3-max | jq-windows | Failure | LLM call failed: POST "https://openrouter.ai/api/v1/chat/completions": 429 Too Many Requests {"message":"Rate limit exce...
qwen3-max | jq-windows2 | Success | -
qwen3-max | jq-windows2 | Failure | task failed: jq.exe binary does not exist
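The pass@1 and pass@2 columns of the ranking can be recomputed from per-attempt rows like the ones above. The sketch below is our reading of the definitions, not the benchmark's published scoring code: it assumes pass@1 is the mean success rate over every individual attempt, and pass@2 is the share of tasks solved in at least one of the two attempts. The outcomes are copied from the gpt-4.1 rows; the helper names `pass_at_1` and `pass_at_2` are illustrative.

```python
# gpt-4.1 outcomes copied from the table: task -> (first listed attempt, second listed attempt).
# True = Success, False = Failure.
GPT_4_1 = {
    "coreutils": (True, True),
    "coreutils-old-version": (False, True),
    "coreutils-old-version-alpine": (False, False),
    "coreutils-static": (True, True),
    "coreutils-static-alpine": (True, True),
    "cowsay": (True, True),
    "curl": (True, True),
    "curl-ssl": (True, True),
    "curl-ssl-arm64-static": (False, False),
    "curl-ssl-arm64-static2": (False, False),
    "jq": (True, True),
    "jq-static": (True, True),
    "jq-static-musl": (False, False),
    "jq-windows": (False, False),
    "jq-windows2": (False, True),
}


def pass_at_1(results: dict[str, tuple[bool, bool]]) -> float:
    """Mean success rate over every individual attempt (assumed definition)."""
    attempts = [ok for outcomes in results.values() for ok in outcomes]
    return sum(attempts) / len(attempts)


def pass_at_2(results: dict[str, tuple[bool, bool]]) -> float:
    """Share of tasks solved in at least one of the two attempts (assumed definition)."""
    return sum(any(outcomes) for outcomes in results.values()) / len(results)


print(f"pass@1 = {pass_at_1(GPT_4_1):.0%}, pass@2 = {pass_at_2(GPT_4_1):.0%}")
```

With 18 successes out of 30 attempts and 10 of 15 tasks solved at least once, this prints `pass@1 = 60%, pass@2 = 67%`, which matches the gpt-4.1 entry in the ranking. Because neither metric depends on which attempt came first, the order of the two rows per task does not matter.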