# Project: Self-Hosted, CLI-Based Coding Agent
This document outlines the specification for building a self-hosted, CLI-based coding agent that can autonomously create, test, and refine code scripts.
## 1. Architectural Design & Workflow (LangGraph Graph)
The core of the agent will be a LangGraph graph that defines the agent's state and logic flow; a minimal sketch of the wiring follows the node and edge lists below.
- **Nodes:**
  - [ ] `plan_task`: This node will receive the user's natural language input and create a formal, executable plan. This plan will be a sequence of steps for the agent to follow.
  - [ ] `generate_code`: This node will take the plan and generate the initial code script.
  - [ ] `execute_and_test`: This node will execute the generated code in a sandboxed environment and run tests against it. It will capture the output, errors, and test results.
  - [ ] `analyze_results`: This node will analyze the results from the `execute_and_test` node to determine if the code meets the requirements of the plan.
  - [ ] `refine_code`: If the `analyze_results` node determines that the code is not yet correct, this node will be responsible for debugging the code and generating a refined version.
- **Edges:**
  - [ ] The graph will start at the `plan_task` node.
  - [ ] From `plan_task`, the graph will proceed to `generate_code`.
  - [ ] From `generate_code`, the graph will proceed to `execute_and_test`.
  - [ ] From `execute_and_test`, the graph will proceed to `analyze_results`.
  - [ ] The `analyze_results` node will have a conditional edge:
    - If the code is correct, the graph will terminate and output the final code.
    - If the code is incorrect, the graph will loop back to the `refine_code` node, which will then pass the refined plan to the `generate_code` node to start the loop again.
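The wiring above maps directly onto LangGraph.js. Below is a minimal sketch, assuming `@langchain/langgraph`; the state fields (`task`, `plan`, `code`, `testOutput`, `isCorrect`) and the stub node bodies are illustrative placeholders, not part of this spec.

```typescript
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

// Shared state passed between nodes; field names are illustrative.
const AgentState = Annotation.Root({
  task: Annotation<string>(),
  plan: Annotation<string>(),
  code: Annotation<string>(),
  testOutput: Annotation<string>(),
  isCorrect: Annotation<boolean>(),
});
type State = typeof AgentState.State;

// Stub node bodies; each returns a partial state update.
const planTask = async (s: State) => ({ plan: `1. solve: ${s.task}` });
const generateCode = async (s: State) => ({ code: "// generated script" });
const executeAndTest = async (s: State) => ({ testOutput: "all tests passed" });
const analyzeResults = async (s: State) => ({ isCorrect: true });
const refineCode = async (s: State) => ({ plan: `${s.plan} (refined)` });

export const agentGraph = new StateGraph(AgentState)
  .addNode("plan_task", planTask)
  .addNode("generate_code", generateCode)
  .addNode("execute_and_test", executeAndTest)
  .addNode("analyze_results", analyzeResults)
  .addNode("refine_code", refineCode)
  .addEdge(START, "plan_task")
  .addEdge("plan_task", "generate_code")
  .addEdge("generate_code", "execute_and_test")
  .addEdge("execute_and_test", "analyze_results")
  // Conditional edge: terminate when correct, otherwise loop via refine_code.
  .addConditionalEdges("analyze_results", (s: State) =>
    s.isCorrect ? END : "refine_code"
  )
  .addEdge("refine_code", "generate_code")
  .compile();
```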
## 2. Tooling Strategy
The agent will be equipped with the following tools:
- [ ] `CodeExecutionTool`: A tool for safely executing the generated code in a sandboxed environment. This will likely involve using Docker or a similar containerization technology (a sketch follows this list).
- [ ] `FileManagementTool`: A set of tools for reading and writing files to the local filesystem. This will be necessary for the agent to create, modify, and save the code scripts it is working on.
- [ ] `TestRunnerTool`: A tool for running specific test cases against the generated code. This will be used to verify the correctness of the code.
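To make the sandboxing idea concrete, here is a minimal sketch of the `CodeExecutionTool`, assuming the Docker CLI is on `PATH` and the `node:alpine` image is pulled; the tool name, resource limits, and 30-second timeout are illustrative choices, not requirements of this spec.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const run = promisify(execFile);

// Executes a JavaScript snippet inside a throwaway node:alpine container.
export const codeExecutionTool = tool(
  async ({ code }) => {
    try {
      // --rm discards the container; --network none cuts off network access.
      const { stdout, stderr } = await run(
        "docker",
        ["run", "--rm", "--network", "none", "--memory", "256m",
         "node:alpine", "node", "-e", code],
        { timeout: 30_000 } // kill runaway scripts after 30 seconds
      );
      return stdout + stderr;
    } catch (err) {
      return `Execution failed: ${(err as Error).message}`;
    }
  },
  {
    name: "code_execution",
    description: "Runs a JavaScript snippet in a sandboxed Docker container.",
    schema: z.object({ code: z.string().describe("JavaScript source to run") }),
  }
);
```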
## 3. Technology Stack
- Language: TypeScript
- Framework: LangChain.js with LangGraph
- CLI Framework: Commander.js
- Local LLM Runner: Ollama
- Code Execution Sandbox: Docker
- Testing: Jest
## 4. Development Phases & Milestones
The project will be developed in the following phases:
- **Phase 1: Foundation & Tooling**
  - [x] Set up the local development environment.
  - [x] Download and set up the Phi4-mini model.
    - Running locally via Ollama on port 11434 (connection sketch after this list).
  - [x] Implement the `CodeExecutionTool`.
    - Test pending restart.
  - [x] Test the implementation of the `CodeExecutionTool`.
  - [ ] Implement the `FileManagementTool`.
  - [ ] Implement the `TestRunnerTool`.
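One way to verify the Ollama milestone from LangChain.js: a minimal sketch, assuming the `@langchain/ollama` package and that the model is tagged `phi4-mini` locally.

```typescript
import { ChatOllama } from "@langchain/ollama";

// Talks to the local Ollama server noted above (default port 11434).
const model = new ChatOllama({
  model: "phi4-mini", // assumed local tag; confirm with `ollama list`
  baseUrl: "http://localhost:11434",
  temperature: 0,
});

const reply = await model.invoke("Reply with the single word: ready");
console.log(reply.content);
```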
- **Phase 2: Implementing the LangGraph Workflow**
  - [ ] Implement the `plan_task` node.
  - [ ] Implement the `generate_code` node (see the sketch after this list).
  - [ ] Implement the `execute_and_test` node.
  - [ ] Implement the `analyze_results` node.
  - [ ] Implement the `refine_code` node.
  - [ ] Connect the nodes and implement the conditional looping logic.
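As a sketch of the node contract, here is one possible shape for the `generate_code` node, assuming the `ChatOllama` model from Phase 1; the prompt wording and the state slice are illustrative, not prescribed by this spec.

```typescript
import { ChatOllama } from "@langchain/ollama";

// Illustrative state slice, mirroring the Section 1 sketch.
interface AgentState {
  plan: string;
  code: string;
}

const model = new ChatOllama({
  model: "phi4-mini",
  baseUrl: "http://localhost:11434",
});

// generate_code: turn the current plan into a script and return a state update.
export async function generateCode(
  state: AgentState
): Promise<Partial<AgentState>> {
  const reply = await model.invoke(
    "Write a single self-contained script that implements this plan. " +
      `Return only code, no prose.\n\nPlan:\n${state.plan}`
  );
  return { code: String(reply.content) };
}
```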
- **Phase 3: CLI & Error Handling**
  - [ ] Create a command-line interface (CLI) for interacting with the agent.
  - [ ] Implement robust error handling throughout the system.
  - [ ] Implement persistence for the LangGraph state, so that the agent can be stopped and restarted without losing its progress (see the sketch after this list).
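A minimal sketch of the CLI and persistence milestones together, assuming Commander.js and the `@langchain/langgraph-checkpoint-sqlite` checkpointer; the flag names, database path, and the stand-in graph are illustrative.

```typescript
import { Command } from "commander";
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";
import { SqliteSaver } from "@langchain/langgraph-checkpoint-sqlite";

// Tiny stand-in graph; the real agent would reuse the Section 1 workflow.
const State = Annotation.Root({ task: Annotation<string>() });
const workflow = new StateGraph(State)
  .addNode("run", async (s: typeof State.State) => ({ task: s.task }))
  .addEdge(START, "run")
  .addEdge("run", END);

const program = new Command();
program
  .name("agent")
  .description("Self-hosted, CLI-based coding agent")
  .argument("<task>", "natural-language task for the agent")
  .option("--thread <id>", "session id used to resume saved state", "default")
  .action(async (task: string, opts: { thread: string }) => {
    // The SQLite checkpointer persists graph state across process restarts;
    // invoking with the same thread_id resumes where the last run stopped.
    const checkpointer = SqliteSaver.fromConnString("./agent-state.db");
    const app = workflow.compile({ checkpointer });
    const result = await app.invoke(
      { task },
      { configurable: { thread_id: opts.thread } }
    );
    console.log(result.task);
  });

program.parse();
```

With this shape, running `agent "write fizzbuzz" --thread session-1`, stopping the process, and rerunning with the same `--thread` value would resume from the saved checkpoint instead of starting over.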
## 5. Final Deliverables
- [ ] A detailed markdown document (SPEC.md) that can be used as a blueprint for development.
- [ ] The source code for the self-hosted, CLI-based coding agent.
- [ ] A `README.md` file with instructions on how to set up and run the agent.
## Additional Notes for Astra/Inanis
- A collection of AI Prompts from various vendors