Testing If Bash Is All You Need: SQL vs Bash Agent Comparison
Authors: Ankur Goyal, Andrew Qu (Vercel & Braintrust)
Summary
When Vercel claimed agents could work effectively with just filesystem and bash, the natural question was: how does that compare to purpose-built tools like SQL? This empirical study provides the answer—and the most important finding isn't about which tool wins.
The Question
Following Vercel's exploration of filesystem-and-bash agents, a natural follow-up question emerged: if bash works so well for general data exploration, how does it compare to SQL—a language specifically designed for structured data queries?
To find out, Vercel partnered with Braintrust to run a rigorous empirical comparison. The experiment tested different agent tool configurations on a standardized set of data analysis tasks, measuring accuracy, cost, and reliability.
Experiment Setup
The Benchmark
The team designed a set of data analysis tasks that require querying structured data—the kind of work where SQL is traditionally the obvious choice. Tasks ranged from simple lookups ("What was the total revenue in Q3?") to complex aggregations ("Which product category had the highest month-over-month growth rate in the last 6 months?").
Agent Configurations
Three agent configurations were tested:
- Pure SQL Agent: Given access only to a SQL database and a tool to execute SQL queries
- Pure Bash Agent: Given access only to the data as CSV/JSON files and a bash terminal
- Hybrid SQL+Bash Agent: Given access to both SQL and bash, plus a self-verification step
Each agent received the same underlying language model and the same data. The only variable was the tool set and interaction pattern.
Results
Pure SQL Agent: 100% Accuracy
The SQL agent performed flawlessly on structured data tasks. This isn't surprising—SQL is purpose-built for exactly this type of work. The agent could:
- Write precise queries to extract exactly the needed data
- Use JOINs, GROUP BY, and window functions naturally
- Handle complex aggregations in a single query
- Leverage database indexing for efficient execution
SQL's declarative nature aligns perfectly with data analysis: you describe what you want, not how to get it. The model's extensive training on SQL means it writes accurate queries consistently.
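As a rough illustration of the "single query" point, the sketch below answers a month-over-month growth question with one declarative statement. The `sales` table, its columns, and the rows are hypothetical and were not part of the experiment; it assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sales table; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, category TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-01", "books", 100.0),
        ("2024-02", "books", 150.0),
        ("2024-01", "games", 200.0),
        ("2024-02", "games", 220.0),
    ],
)

# One query: aggregate revenue per category and month, then compare each
# month with the previous one for the same category using LAG.
QUERY = """
WITH monthly AS (
    SELECT category, month, SUM(revenue) AS revenue
    FROM sales
    GROUP BY category, month
)
SELECT category,
       month,
       revenue / LAG(revenue) OVER (PARTITION BY category ORDER BY month) - 1.0
           AS growth
FROM monthly
ORDER BY category, month
"""

growth_rows = conn.execute(QUERY).fetchall()
conn.close()

# The first month of each category has no predecessor, so its growth is NULL.
growth = {(cat, month): g for cat, month, g in growth_rows if g is not None}
best_category = max(growth, key=growth.get)[0]
```

The query states what is wanted (growth per category per month) and leaves the grouping and ordering mechanics to the database, which is the property the paragraph above describes.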
Pure Bash Agent: Good but Not Perfect
The bash agent achieved high accuracy but fell short of SQL for structured data tasks. While it could handle many queries correctly using tools like awk, sort, uniq, and jq, certain operations were cumbersome:
- Complex multi-table joins required multi-step pipelines prone to errors
- Aggregations with multiple grouping levels were harder to express
- Edge cases in CSV parsing (quoted fields, escaped commas) occasionally tripped up the shell-based approach
- Some calculations required writing small scripts, adding complexity
The bash agent excelled at tasks involving text search, pattern matching, and exploring data structure—areas where the filesystem approach shines. But for pure structured data analysis, SQL's purpose-built abstractions gave it a clear edge.
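The CSV edge case mentioned above is easy to reproduce. The following sketch, written in Python rather than shell, mimics a naive split-on-comma pipeline and contrasts it with a CSV-aware parser. The data and function names are hypothetical and only illustrate the failure mode; they are not taken from the experiment.

```python
import csv
import io

# Hypothetical CSV in which one quoted field contains a comma.
RAW = 'product,region,revenue\n"Widget, Large",EU,120\nGadget,US,80\n'


def naive_total(text: str) -> float:
    """Sum the third comma-separated field, the way a plain field splitter would."""
    total = 0.0
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split(",")
        try:
            total += float(fields[2])
        except ValueError:
            # The quoted comma shifted the columns, so this field is not a
            # number. The row is dropped silently, which is the dangerous part.
            pass
    return total


def csv_total(text: str) -> float:
    """Sum the revenue column using a parser that understands quoting."""
    reader = csv.DictReader(io.StringIO(text))
    return sum(float(row["revenue"]) for row in reader)
```

On this input the naive version reports 80.0 while the quoting-aware version reports 200.0, and nothing in the naive pipeline signals that a row was lost.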
Hybrid SQL+Bash with Self-Verification: 100% Accuracy
The most interesting result came from the hybrid configuration. Given access to both SQL and bash, plus an explicit self-verification step, this agent also achieved 100% accuracy.
The self-verification step was key: after generating an answer, the agent was instructed to verify its result using an alternative method. For example:
- Run a SQL query to get the answer
- Export the result
- Use bash to independently verify the calculation
- Compare results before reporting
Or conversely:
- Use bash to explore the data and compute an answer
- Write a SQL query to verify
- Resolve any discrepancies
This cross-checking caught errors that either tool alone might miss.
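A minimal sketch of such a cross-check follows, assuming a hypothetical two-column revenue file. The second path is written in Python here to keep the example self-contained; in the setup described above it would be a bash pipeline. The function names are illustrative and not from the experiment.

```python
import csv
import io
import math
import sqlite3

# Hypothetical data; the column names and values are illustrative only.
CSV_TEXT = "quarter,revenue\nQ1,100.5\nQ2,200.25\nQ3,50.0\nQ3,25.5\n"


def total_via_sql(text: str, quarter: str) -> float:
    """Load the rows into an in-memory SQLite table and aggregate with SQL."""
    rows = [
        (r["quarter"], float(r["revenue"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (quarter TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(revenue), 0.0) FROM sales WHERE quarter = ?",
        (quarter,),
    ).fetchone()
    conn.close()
    return float(total)


def total_via_scan(text: str, quarter: str) -> float:
    """Recompute the same figure with a row-by-row scan, no SQL involved."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return sum(float(revenue) for q, revenue in reader if q == quarter)


def cross_check(text: str, quarter: str):
    """Return the SQL answer and whether the independent scan agrees with it."""
    sql_total = total_via_sql(text, quarter)
    scan_total = total_via_scan(text, quarter)
    agreed = math.isclose(sql_total, scan_total, rel_tol=1e-9, abs_tol=1e-9)
    return sql_total, agreed
```

If `agreed` comes back `False`, the agent would resolve the discrepancy before reporting rather than pick one of the two numbers.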
The Key Insight: Verification Matters More Than Tool Choice
The headline finding isn't "SQL beats bash" or "bash is good enough." It's this: self-verification matters at least as much as which tool you use.
Both the pure SQL agent (100%) and the hybrid agent with verification (100%) achieved perfect accuracy. The hybrid agent got there even though it used bash for some operations less suited to that tool, because the verification step caught and corrected the resulting errors.
This has profound implications for agent design:
Verification as a First-Class Concern
Most agent frameworks focus on tool selection and orchestration. This experiment shows that the verification step—having the agent check its own work—is at least as important as the tools themselves. A mediocre tool with good verification can match a perfect tool without verification.
The Self-Verification Pattern
The self-verification pattern is straightforward to implement:
1. Agent performs the task using its preferred method
2. Agent independently verifies the result
   - Using a different tool or approach
   - Using a different query that should produce the same answer
   - Using sanity checks (totals should sum, counts should be positive, etc.)
3. If verification fails, agent corrects and re-verifies
4. Agent reports the verified result with confidence
This pattern applies beyond data analysis. Code agents can run tests after writing code. Documentation agents can verify accuracy against source code. Any agent can benefit from checking its own work before reporting.
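The four steps above can be sketched as a small retry loop. This is an illustration under stated assumptions, not the harness used in the experiment: `solve_with_verification`, `flaky_sum`, and `independent_check` are hypothetical names, and a real agent would call a model where `solve` is invoked.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def solve_with_verification(
    solve: Callable[[int], T],
    verify: Callable[[T], bool],
    max_attempts: int = 3,
) -> Optional[T]:
    """Run solve, check the answer independently, and retry on failure.

    Returns the first verified answer, or None if no attempt passes.
    """
    for attempt in range(max_attempts):
        answer = solve(attempt)   # step 1: perform the task
        if verify(answer):        # step 2: independent verification
            return answer         # step 4: report the verified result
        # step 3: verification failed, so loop around and try again
    return None


# Usage with a deliberately flaky solver: the first attempt drops a row.
ROWS = [10, 20, 30]


def flaky_sum(attempt: int) -> int:
    return sum(ROWS[:-1]) if attempt == 0 else sum(ROWS)


def independent_check(answer: int) -> bool:
    # Recompute the total a different way instead of trusting the solver.
    total = 0
    for value in ROWS:
        total += value
    return answer == total


result = solve_with_verification(flaky_sum, independent_check)
```

Returning `None` when nothing verifies keeps an unverified answer from being reported as if it had passed.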
Practical Implications
Choosing Tools for Your Agent
The results suggest a practical framework for tool selection:
| Data Type | Best Tool | Rationale |
|---|---|---|
| Structured data (tables, databases) | SQL | Purpose-built, declarative, precise |
| Unstructured text (logs, documents) | Bash | grep, awk, and pipes excel here |
| Mixed data | Hybrid | Use the right tool for each sub-task |
| Any critical task | + Verification | Always verify, regardless of tool |
Don't Over-Engineer Tool Selection
A tempting takeaway might be: always give agents every possible tool. But there's a trade-off. More tools mean more decision complexity for the model. The results point toward a focused set of tools plus verification rather than a broad set of tools without it.
If you're building an agent for structured data, give it SQL and add verification. If you're building one for document exploration, give it bash and add verification. Don't provide both unless your use case genuinely spans both domains.
Cost Considerations
The bash approach remains significantly cheaper for exploratory tasks:
- SQL requires maintaining a database server
- Bash works directly on files, eliminating infrastructure costs
- For read-heavy exploratory workloads, the filesystem approach uses fewer tokens
- But for complex analytical queries, SQL's single-query efficiency can actually use fewer tokens than multi-step bash pipelines
The optimal choice depends on your specific workload pattern.
Beyond Data Analysis
While this experiment focused on data analysis, the self-verification principle generalizes broadly:
Code Generation
An agent writing code can:
- Write the code
- Run the test suite
- If tests fail, analyze failures and fix
- Re-run tests to verify
This is already standard practice in coding agents—the experiment quantifies why it works.
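A toy version of that write-test-fix loop is sketched below. The candidate sources stand in for successive model outputs, and the `add` function and its test cases are hypothetical. Executing generated code with `exec` is only acceptable here because the inputs are fixed strings; a real agent would run untrusted code in a sandbox.

```python
from typing import List, Optional


def passes_tests(source: str) -> bool:
    """Execute candidate source and run a tiny test suite against it."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # illustration only; sandbox this in practice
        add = namespace["add"]
        return add(2, 3) == 5 and add(-1, 1) == 0
    except Exception:
        # Syntax errors, missing names, and runtime errors all count as failures.
        return False


def first_passing(candidates: List[str]) -> Optional[int]:
    """Return the index of the first candidate that passes, or None."""
    for index, source in enumerate(candidates):
        if passes_tests(source):
            return index
    return None


# The first attempt has a bug; the second is the corrected version.
CANDIDATES = [
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
]
```

Here the first candidate fails the suite and the second passes, so the loop stops at index 1.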
Content Generation
An agent writing documentation can:
- Generate the documentation
- Verify code examples actually compile and run
- Check that API references match the actual API
- Ensure cross-references are valid
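One of these checks, confirming that code examples at least compile, might look like the sketch below. It assumes the documentation is Markdown with fenced Python blocks and only checks syntax. Actually running the examples, as the list above suggests, would need a sandboxed step on top. The function names and the sample document are hypothetical.

```python
import re
from typing import List

# Built indirectly so this example can itself sit inside a fenced block.
FENCE = "`" * 3


def python_blocks(markdown: str) -> List[str]:
    """Extract the bodies of fenced blocks tagged as python."""
    pattern = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)
    return pattern.findall(markdown)


def broken_examples(markdown: str) -> List[int]:
    """Return the indexes of examples that fail to compile."""
    failures = []
    for index, source in enumerate(python_blocks(markdown)):
        try:
            compile(source, f"<example {index}>", "exec")
        except SyntaxError:
            failures.append(index)
    return failures


# Hypothetical document with one valid and one broken example.
DOC = "\n".join(
    [
        "Intro text.",
        FENCE + "python",
        "print('ok')",
        FENCE,
        FENCE + "python",
        "def broken(:",
        FENCE,
    ]
)
```

For the sample document, `broken_examples(DOC)` flags only the second block, which the agent would then fix and re-check.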
Decision Making
An agent making recommendations can:
- Analyze data and form a recommendation
- Check the recommendation against constraints
- Verify calculations used in the analysis
- Present the recommendation with verification evidence
Key Takeaways
- Match tools to data types: SQL for structured data, bash for unstructured exploration
- Always add self-verification: It's the single most impactful agent design decision
- Verification can compensate for tool limitations: A weaker tool + verification can match a stronger tool alone
- Keep the tool set focused: More tools add complexity; better to master fewer tools with verification
- Cross-tool verification is strongest: Verifying results with a different approach catches systematic errors that same-tool verification might miss
The experiment provides an empirical foundation for a principle many practitioners had intuited: in agent design, the ability to check your work matters as much as the ability to do the work.
