# Project: Self-Hosted, CLI-Based Coding Agent
This document outlines the specification for building a self-hosted, CLI-based coding agent that can autonomously create, test, and refine code scripts.
## 1. Architectural Design & Workflow (LangGraph Graph)
The core of the agent will be a LangGraph graph that defines the agent's state and logic flow; a minimal sketch of the wiring follows the node and edge lists below.
- **Nodes:**
  - [ ] `plan_task`: This node will receive the user's natural language input and create a formal, executable plan. This plan will be a sequence of steps for the agent to follow.
  - [ ] `generate_code`: This node will take the plan and generate the initial code script.
  - [ ] `execute_and_test`: This node will execute the generated code in a sandboxed environment and run tests against it. It will capture the output, errors, and test results.
  - [ ] `analyze_results`: This node will analyze the results from the `execute_and_test` node to determine if the code meets the requirements of the plan.
  - [ ] `refine_code`: If the `analyze_results` node determines that the code is not yet correct, this node will be responsible for debugging the code and generating a refined version.
- **Edges:**
  - [ ] The graph will start at the `plan_task` node.
  - [ ] From `plan_task`, the graph will proceed to `generate_code`.
  - [ ] From `generate_code`, the graph will proceed to `execute_and_test`.
  - [ ] From `execute_and_test`, the graph will proceed to `analyze_results`.
  - [ ] The `analyze_results` node will have a conditional edge:
    - If the code is correct, the graph will terminate and output the final code.
    - If the code is incorrect, the graph will loop back to the `refine_code` node, which will then pass the refined plan to the `generate_code` node to start the loop again.
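The wiring above maps directly onto LangGraph.js. Below is a minimal sketch, assuming `@langchain/langgraph`; the state fields (`task`, `plan`, `code`, `testOutput`, `isCorrect`) and the stub node bodies are illustrative placeholders, not part of this spec.

```typescript
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

// Shared state passed between nodes; field names are illustrative.
const AgentState = Annotation.Root({
  task: Annotation<string>(),
  plan: Annotation<string>(),
  code: Annotation<string>(),
  testOutput: Annotation<string>(),
  isCorrect: Annotation<boolean>(),
});
type State = typeof AgentState.State;

// Stub node bodies; each returns a partial state update.
const planTask = async (s: State) => ({ plan: `1. solve: ${s.task}` });
const generateCode = async (s: State) => ({ code: "// generated script" });
const executeAndTest = async (s: State) => ({ testOutput: "all tests passed" });
const analyzeResults = async (s: State) => ({ isCorrect: true });
const refineCode = async (s: State) => ({ plan: `${s.plan} (refined)` });

export const agentGraph = new StateGraph(AgentState)
  .addNode("plan_task", planTask)
  .addNode("generate_code", generateCode)
  .addNode("execute_and_test", executeAndTest)
  .addNode("analyze_results", analyzeResults)
  .addNode("refine_code", refineCode)
  .addEdge(START, "plan_task")
  .addEdge("plan_task", "generate_code")
  .addEdge("generate_code", "execute_and_test")
  .addEdge("execute_and_test", "analyze_results")
  // Conditional edge: terminate when correct, otherwise loop via refine_code.
  .addConditionalEdges("analyze_results", (s: State) =>
    s.isCorrect ? END : "refine_code"
  )
  .addEdge("refine_code", "generate_code")
  .compile();
```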
## 2. Tooling Strategy
The agent will be equipped with the following tools:
- [ ] `CodeExecutionTool`: A tool for safely executing the generated code in a sandboxed environment. This will likely involve using Docker or a similar containerization technology (a sketch follows this list).
- [ ] `FileManagementTool`: A set of tools for reading and writing files to the local filesystem. This will be necessary for the agent to create, modify, and save the code scripts it is working on.
- [ ] `TestRunnerTool`: A tool for running specific test cases against the generated code. This will be used to verify the correctness of the code.
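To make the sandboxing idea concrete, here is a minimal sketch of the `CodeExecutionTool`, assuming the Docker CLI is on `PATH` and the `node:alpine` image is pulled; the tool name, resource limits, and 30-second timeout are illustrative choices, not requirements of this spec.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const run = promisify(execFile);

// Executes a JavaScript snippet inside a throwaway node:alpine container.
export const codeExecutionTool = tool(
  async ({ code }) => {
    try {
      // --rm discards the container; --network none cuts off network access.
      const { stdout, stderr } = await run(
        "docker",
        ["run", "--rm", "--network", "none", "--memory", "256m",
         "node:alpine", "node", "-e", code],
        { timeout: 30_000 } // kill runaway scripts after 30 seconds
      );
      return stdout + stderr;
    } catch (err) {
      return `Execution failed: ${(err as Error).message}`;
    }
  },
  {
    name: "code_execution",
    description: "Runs a JavaScript snippet in a sandboxed Docker container.",
    schema: z.object({ code: z.string().describe("JavaScript source to run") }),
  }
);
```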
## 3. Technology Stack
- Language: TypeScript
- Framework: LangChain.js with LangGraph
- CLI Framework: Commander.js
- Local LLM Runner: Ollama
- Code Execution Sandbox: Docker
- Testing: Jest
## 4. Development Phases & Milestones
The project will be developed in the following phases:
- **Phase 1: Foundation & Tooling**
  - [x] Set up the local development environment.
  - [x] Download and set up the Phi4-mini model.
    - Running locally via Ollama on port 11434 (connection sketch after this list).
  - [x] Implement the `CodeExecutionTool`.
    - Test pending restart.
  - [x] Test the implementation of the `CodeExecutionTool`.
  - [ ] Implement the `FileManagementTool`.
  - [ ] Implement the `TestRunnerTool`.
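One way to verify the Ollama milestone from LangChain.js: a minimal sketch, assuming the `@langchain/ollama` package and that the model is tagged `phi4-mini` locally.

```typescript
import { ChatOllama } from "@langchain/ollama";

// Talks to the local Ollama server noted above (default port 11434).
const model = new ChatOllama({
  model: "phi4-mini", // assumed local tag; confirm with `ollama list`
  baseUrl: "http://localhost:11434",
  temperature: 0,
});

const reply = await model.invoke("Reply with the single word: ready");
console.log(reply.content);
```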
- **Phase 2: Implementing the LangGraph Workflow**
  - [ ] Implement the `plan_task` node.
  - [ ] Implement the `generate_code` node (see the sketch after this list).
  - [ ] Implement the `execute_and_test` node.
  - [ ] Implement the `analyze_results` node.
  - [ ] Implement the `refine_code` node.
  - [ ] Connect the nodes and implement the conditional looping logic.
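As a sketch of the node contract, here is one possible shape for the `generate_code` node, assuming the `ChatOllama` model from Phase 1; the prompt wording and the state slice are illustrative, not prescribed by this spec.

```typescript
import { ChatOllama } from "@langchain/ollama";

// Illustrative state slice, mirroring the Section 1 sketch.
interface AgentState {
  plan: string;
  code: string;
}

const model = new ChatOllama({
  model: "phi4-mini",
  baseUrl: "http://localhost:11434",
});

// generate_code: turn the current plan into a script and return a state update.
export async function generateCode(
  state: AgentState
): Promise<Partial<AgentState>> {
  const reply = await model.invoke(
    "Write a single self-contained script that implements this plan. " +
      `Return only code, no prose.\n\nPlan:\n${state.plan}`
  );
  return { code: String(reply.content) };
}
```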
- **Phase 3: CLI & Error Handling**
  - [ ] Create a command-line interface (CLI) for interacting with the agent.
  - [ ] Implement robust error handling throughout the system.
  - [ ] Implement persistence for the LangGraph state, so that the agent can be stopped and restarted without losing its progress (see the sketch after this list).
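A minimal sketch of the CLI and persistence milestones together, assuming Commander.js and the `@langchain/langgraph-checkpoint-sqlite` checkpointer; the flag names, database path, and the stand-in graph are illustrative.

```typescript
import { Command } from "commander";
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";
import { SqliteSaver } from "@langchain/langgraph-checkpoint-sqlite";

// Tiny stand-in graph; the real agent would reuse the Section 1 workflow.
const State = Annotation.Root({ task: Annotation<string>() });
const workflow = new StateGraph(State)
  .addNode("run", async (s: typeof State.State) => ({ task: s.task }))
  .addEdge(START, "run")
  .addEdge("run", END);

const program = new Command();
program
  .name("agent")
  .description("Self-hosted, CLI-based coding agent")
  .argument("<task>", "natural-language task for the agent")
  .option("--thread <id>", "session id used to resume saved state", "default")
  .action(async (task: string, opts: { thread: string }) => {
    // The SQLite checkpointer persists graph state across process restarts;
    // invoking with the same thread_id resumes where the last run stopped.
    const checkpointer = SqliteSaver.fromConnString("./agent-state.db");
    const app = workflow.compile({ checkpointer });
    const result = await app.invoke(
      { task },
      { configurable: { thread_id: opts.thread } }
    );
    console.log(result.task);
  });

program.parse();
```

With this shape, running `agent "write fizzbuzz" --thread session-1`, stopping the process, and rerunning with the same `--thread` value would resume from the saved checkpoint instead of starting over.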
## 5. Final Deliverables
- [ ] A detailed markdown document (SPEC.md) that can be used as a blueprint for development.
- [ ] The source code for the self-hosted, CLI-based coding agent.
- [ ] A `README.md` file with instructions on how to set up and run the agent.
## Additional Notes for Astra/Inanis
- A collection of AI Prompts from various vendors