An evaluation suite for agentic models in real MCP tool environments (Notion / GitHub / Filesystem / Postgres / Playwright). MCPMark provides a reproducible, extensible benchmark for researchers and ...
Navigation failed because the Playwright MCP backend couldn’t launch (spawn npx ENOENT), so I’ll try the MCP browser installer once to see if it can bootstrap the missing runtime. Output Couldn’t ...
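The `spawn npx ENOENT` error above generally means the `npx` executable could not be found on the `PATH` of the process that tried to launch it. A minimal sketch of a pre-flight check, assuming the backend is started through `npx` as the error suggests; the helper name `check_launcher` is hypothetical and not part of MCPMark or the Playwright MCP server:

```python
import shutil


def check_launcher(command: str = "npx"):
    """Report whether `command` resolves on PATH before trying to spawn it.

    Hypothetical helper for illustration only; returns (ok, message).
    """
    path = shutil.which(command)
    if path is None:
        # A spawn of a missing executable typically fails with ENOENT.
        return False, f"{command} not found on PATH (spawning it would fail with ENOENT)"
    return True, f"{command} resolves to {path}"


if __name__ == "__main__":
    ok, message = check_launcher("npx")
    print(message)
```

If the check fails, installing Node.js (which ships `npx`) or fixing the `PATH` seen by the launching process is the usual remedy; an installer that itself relies on `npx` is unlikely to help.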