AI Agents · Solo-Built · Harness Engineering

AI Data Science Platform

8+ specialised AI agents for end-to-end data science

I built this solo at Fetch.ai. The idea: instead of writing data science code yourself, describe what you want in natural language and let a team of AI agents handle the pipeline. Each agent is specialised. A supervisor agent orchestrates them. Humans stay in the loop at every decision point.

It's live with data scientists at Bosch and Fetch.ai, and with non-technical users in HR teams. People who'd never written a line of Python are getting Kaggle-level results in minutes.

This project is fundamentally about harness engineering: designing the environment, constraints, and feedback loops that let AI agents do reliable work. The human's job isn't to write code. It's to specify intent, set architectural boundaries, and build the scaffolding that keeps agents on track.

I built this before the term existed. The problems were the same: agents hallucinate, lose context, violate structure. I had to engineer 10+ guardrails, prompt injection protection, agent logging, enforced determinism, and orchestration patterns from scratch. What OpenAI now calls harness engineering, I was figuring out on my own.
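One of these guardrails can be made concrete: LLM-generated code can be parsed and rejected before it ever executes. The sketch below is a hypothetical illustration, not the project's actual guardrail. `ALLOWED_IMPORTS`, `BLOCKED_CALLS` and `check_generated_code` are invented names, and the allowlist contents are assumptions.

```python
import ast

# Hypothetical allowlist: top-level packages generated data science code may import.
ALLOWED_IMPORTS = {"pandas", "numpy", "sklearn", "plotly"}
# Hypothetical blocklist: builtins that let generated code escape its intended scope.
BLOCKED_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def check_generated_code(source: str) -> list[str]:
    """Return guardrail violations found in LLM-generated code, without running it."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]

    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            names = []
        for name in names:
            # Compare the top-level package only, e.g. "sklearn" for "sklearn.preprocessing".
            if name.split(".")[0] not in ALLOWED_IMPORTS:
                violations.append(f"disallowed import: {name}")
        # Flag direct calls such as eval(...) or open(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BLOCKED_CALLS
        ):
            violations.append(f"blocked call: {node.func.id}")
    return violations
```

A static check like this is only one layer. It cannot see through indirection such as `getattr(__builtins__, "ev" + "al")`, so it would normally sit alongside a sandboxed runtime.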

Six specialised agents handle the core stages of the pipeline. A LangGraph supervisor decomposes high-level goals into multi-agent workflows, decides which agents to invoke and in what order, and manages state between steps.
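The supervisor loop (run the next agent, merge its output into shared state, and pause for a human decision) can be sketched in plain Python. This is not LangGraph's API: `run_supervisor`, the toy `load`/`clean` agents and the `approve` callback are hypothetical stand-ins. In LangGraph itself the equivalent is a `StateGraph` whose nodes are agents and whose edges encode routing.

```python
from typing import Callable

# Shared state passed between agents; a plain dict stands in for a LangGraph state object.
State = dict
Agent = Callable[[State], State]


def load(state: State) -> State:
    return {"rows": 100}  # toy agent: pretend 100 rows were ingested


def clean(state: State) -> State:
    return {"rows": state["rows"] - 5}  # toy agent: pretend 5 bad rows were dropped


AGENTS: dict[str, Agent] = {"load": load, "clean": clean}


def run_supervisor(plan: list[str], approve: Callable[[str, State], bool]) -> State:
    """Run agents in order, merging each agent's output into shared state.

    `approve` is a human-in-the-loop gate consulted before every step.
    """
    state: State = {"history": []}
    for name in plan:
        if not approve(name, state):
            state["stopped_at"] = name  # the human declined this step
            break
        state.update(AGENTS[name](state))
        state["history"].append(name)
    return state
```

Keeping every agent's output in one state object is what lets a later agent (or a human reviewer) see exactly what earlier steps produced.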

📂 Data Loader

Ingests CSV files, infers schemas, prepares datasets for downstream agents.
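Schema inference can be as simple as trying progressively looser types for each column. A minimal standard-library sketch, assuming empty cells mean missing values; `infer_schema` is a hypothetical helper, and the real loader may rely on pandas' own type inference instead.

```python
import csv
import io


def _infer_type(values: list[str]) -> str:
    """Return the narrowest type ("int", "float", "str") that fits every non-empty value."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "str"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue  # at least one value does not fit; try the next, looser type
    return "str"


def infer_schema(csv_text: str) -> dict[str, str]:
    """Map each column name to an inferred type, treating empty cells as missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return {col: _infer_type([row[col] or "" for row in rows]) for col in columns}
```

For example, `infer_schema("region,units,price\nnorth,3,9.5\nsouth,,12\n")` treats `units` as an integer column despite its missing cell.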

🧹 Data Cleaning

Generates Python/pandas code to fix missing values, handle outliers, normalize types.
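As an illustration of the kind of cleaning logic such an agent might emit: impute missing values with the median, then clip outliers to Tukey's 1.5 × IQR fences. This is a hypothetical sketch written with the standard library so it runs without pandas; generated pandas code would more likely use `Series.fillna` and `Series.clip`. Clipping is one way of handling outliers, not necessarily the project's choice.

```python
import statistics
from typing import Optional


def clean_numeric(values: list[Optional[float]]) -> list[float]:
    """Impute missing values with the median, then clip outliers to Tukey's 1.5 * IQR fences."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]

    # Quartiles of the observed values; "inclusive" treats the data as the whole population.
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [min(max(v, low), high) for v in filled]
```

With `[10, 12, 11, None, 13, 12, 500]`, the gap is filled with the median 12.0 and the 500 is pulled down to the upper fence of 15.0.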

📊 Visualization

Generates interactive Plotly charts from natural language descriptions.
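A chart request ultimately becomes a Plotly figure spec: a JSON object with `data` (a list of traces) and `layout`. The sketch below pairs that output format with a hypothetical chart-type heuristic based on column kinds. `choose_trace`, `plotly_figure_json` and the rules themselves are assumptions, not the agent's actual logic.

```python
import json
from typing import Optional


def choose_trace(x_kind: str, y_kind: Optional[str]) -> dict:
    """Pick a Plotly trace type from the kinds of the selected columns (hypothetical rules)."""
    if y_kind is None:
        return {"type": "histogram"}  # one column: its distribution, or counts per category
    if x_kind == "datetime":
        return {"type": "scatter", "mode": "lines"}  # time series
    if x_kind == "numeric" and y_kind == "numeric":
        return {"type": "scatter", "mode": "markers"}  # relationship between two numbers
    return {"type": "bar"}  # categorical x against a numeric y


def plotly_figure_json(x: list, y: Optional[list], x_kind: str,
                       y_kind: Optional[str], title: str) -> str:
    """Serialise a Plotly figure spec ({"data": [...], "layout": {...}}) for the browser."""
    trace = choose_trace(x_kind, y_kind)
    trace["x"] = x
    if y is not None:
        trace["y"] = y
    figure = {"data": [trace], "layout": {"title": {"text": title}}}
    return json.dumps(figure)
```

On the frontend, the parsed spec can be passed to Plotly.js as `Plotly.newPlot(element, spec.data, spec.layout)`.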

⚙️ Feature Engineering

Writes scikit-learn transformation code based on statistical properties of the data.
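Choosing a transformation "based on statistical properties of the data" can be sketched as a few rules over cardinality, variance and skewness. The function and thresholds below are hypothetical. Each returned label would map onto a scikit-learn transformer such as `OneHotEncoder`, `OrdinalEncoder`, `StandardScaler`, or a `FunctionTransformer` wrapping `numpy.log1p` followed by scaling.

```python
import statistics


def plan_transform(values: list, is_numeric: bool) -> str:
    """Suggest a transformation for one column from simple statistics (hypothetical thresholds)."""
    if not is_numeric:
        # Few categories: one-hot encode; many: ordinal-encode to avoid a column explosion.
        return "one_hot" if len(set(values)) <= 10 else "ordinal"
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return "drop"  # a constant column carries no signal
    # Population skewness: mean cubed deviation divided by stdev cubed.
    skew = sum((v - mean) ** 3 for v in values) / (len(values) * stdev ** 3)
    if skew > 1 and min(values) >= 0:
        return "log1p_then_scale"  # long right tail on non-negative data
    return "standard_scale"
```

Computing the statistics deterministically and letting the LLM turn the resulting plan into pipeline code keeps the numeric judgement out of the model's hands.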

🧠 ML Training

Orchestrates H2O AutoML runs and interprets leaderboard results.
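H2O AutoML produces a leaderboard of trained models with metrics such as `auc` or `logloss`. Before asking an LLM to interpret it, the leaderboard can be reduced to exact facts: the winner, the runner-up and the margin between them. This is a hypothetical sketch using plain dicts in place of an H2O frame; `summarise_leaderboard` is an invented name.

```python
# Metrics where a larger value is better; others (logloss, rmse, mse, ...) are minimised.
HIGHER_IS_BETTER = {"auc", "aucpr"}


def summarise_leaderboard(rows: list[dict], metric: str) -> dict:
    """Pick the best model by `metric` and report its margin over the runner-up."""
    ranked = sorted(rows, key=lambda row: row[metric], reverse=metric in HIGHER_IS_BETTER)
    best = ranked[0]
    summary = {"best_model": best["model_id"], metric: best[metric]}
    if len(ranked) > 1:
        # A tiny margin suggests a simpler or faster model may be the better practical choice.
        summary["runner_up"] = ranked[1]["model_id"]
        summary["margin"] = abs(best[metric] - ranked[1][metric])
    return summary
```

Feeding this summary into the prompt lets the model reason about whether a 0.01 AUC gap justifies a heavier model, instead of re-reading numbers from a raw table.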

🎯 Prediction

Runs inference on trained models and explains predictions.
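The page does not say how predictions are explained. For a linear model, one exact approach is to split the score into per-feature contributions relative to a baseline row; for tree models, H2O offers SHAP-style contributions via `predict_contributions`. The sketch below shows the linear case only and is a swapped-in illustration; `explain_linear_prediction` is a hypothetical helper.

```python
def explain_linear_prediction(
    weights: dict[str, float],
    bias: float,
    baseline: dict[str, float],
    row: dict[str, float],
) -> tuple[float, list[tuple[str, float]]]:
    """Score a row with a linear model and split the score into per-feature contributions.

    Each contribution is weight * (value - baseline), so the contributions sum to
    prediction - baseline_prediction. This exact additivity holds only for linear models.
    """
    prediction = bias + sum(w * row[f] for f, w in weights.items())
    contributions = [(f, w * (row[f] - baseline[f])) for f, w in weights.items()]
    contributions.sort(key=lambda item: abs(item[1]), reverse=True)  # largest effect first
    return prediction, contributions
```

The sorted contributions are what an LLM would turn into a sentence such as "ad spend raised this forecast more than the lower price did".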

The LLM isn't a chat wrapper. It's the execution engine behind every agent. Three modes:

Coding: writes executable code

cleaning: Python/pandas to fix missing values, handle outliers, normalize types
features: scikit-learn transformation pipelines for encoding, scaling, derived features
viz: Plotly chart specs from natural language, rendered in the browser
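Using the LLM as an execution engine implies a feedback loop: generate code, run it, and feed any error back into the next prompt. A minimal sketch, assuming a hypothetical `generate(prompt) -> source` callable in place of the real LLM client; `run_with_retries` and the `result` variable convention are also assumptions. Running `exec` on generated code is not safe by itself and would need static checks and a sandbox around it.

```python
from typing import Callable

# Hypothetical LLM interface: takes a prompt, returns Python source code.
Generate = Callable[[str], str]


def run_with_retries(task: str, generate: Generate, inputs: dict, max_attempts: int = 3) -> dict:
    """Ask the model for code, execute it, and feed any error back into the next prompt."""
    prompt = task
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt)
        namespace = dict(inputs)  # fresh namespace per attempt so failures leave no residue
        try:
            exec(code, namespace)
            # Convention assumed here: generated code stores its output in `result`.
            return {"result": namespace.get("result"), "attempts": attempt}
        except Exception as err:
            prompt = f"{task}\n\nYour previous code failed with {type(err).__name__}: {err}\nFix it."
    raise RuntimeError(f"no working code after {max_attempts} attempts")
```

Capping attempts matters: without a limit, a model that keeps producing the same bug would loop forever.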

Reasoning: analyses and decides

schema: detects data quality issues before any code runs
charts: picks chart types based on distribution, column types and cardinality
models: interprets the AutoML leaderboard and recommends the best model, with reasoning
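Detecting data quality issues before any generated code runs can start with a deterministic pre-check whose findings the LLM then reasons over. The sketch below is hypothetical: `data_quality_report`, the 30% missing-value threshold and the chosen checks are assumptions for illustration.

```python
def data_quality_report(rows: list[dict], missing_threshold: float = 0.3) -> list[str]:
    """Flag common data quality issues before any cleaning code is generated."""
    if not rows:
        return ["dataset is empty"]
    issues = []
    n = len(rows)
    for col in rows[0].keys():
        values = [row.get(col) for row in rows]
        missing = sum(v in (None, "") for v in values)
        if missing / n > missing_threshold:
            issues.append(f"{col}: {missing / n:.0%} missing")
        present = {v for v in values if v not in (None, "")}
        if len(present) == 1:
            issues.append(f"{col}: constant column")
    # Rows are compared as sorted (column, value) tuples so column order does not matter.
    duplicates = n - len({tuple(sorted(row.items())) for row in rows})
    if duplicates:
        issues.append(f"{duplicates} duplicate row(s)")
    return issues
```

Handing the model a list like this, rather than the raw data, grounds its reasoning in facts that were computed rather than guessed.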

Orchestration: coordinates agents

supervisor: LangGraph supervisor decomposes goals into multi-agent workflows
routing: decides which agents to invoke and in what order, and manages state
example: "Analyse this sales data" → load → clean → visualise → engineer → train

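The orchestration example can be read as dependency expansion: once the goals are known (for instance "visualise" and "train"), every prerequisite step is added and the result is ordered. A minimal sketch; `PIPELINE`, `DEPENDS_ON` and `plan` are hypothetical, and in the real system the LLM supervisor makes these routing decisions.

```python
# Hypothetical fixed pipeline order; it already respects every dependency below.
PIPELINE = ["load", "clean", "visualise", "engineer", "train", "predict"]
# Hypothetical dependency map: each step lists the steps that must run before it.
DEPENDS_ON = {
    "load": set(),
    "clean": {"load"},
    "visualise": {"clean"},
    "engineer": {"clean"},
    "train": {"engineer"},
    "predict": {"train"},
}


def plan(requested: set[str]) -> list[str]:
    """Expand requested steps with their prerequisites and return them in pipeline order."""
    needed: set[str] = set()
    stack = list(requested)
    while stack:
        step = stack.pop()
        if step not in needed:
            needed.add(step)
            stack.extend(DEPENDS_ON[step])
    # Filtering a fixed list, rather than iterating a set, keeps the plan deterministic.
    order = [step for step in PIPELINE if step in needed]
    for i, step in enumerate(order):
        if not DEPENDS_ON[step] <= set(order[:i]):
            raise ValueError(f"{step} is scheduled before one of its dependencies")
    return order
```

`plan({"visualise", "train"})` returns `["load", "clean", "visualise", "engineer", "train"]`, the same sequence as the example, while a request for "predict" alone skips visualisation.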
User (Natural Language)
        |
        v
  Next.js Frontend (React 18, Tailwind CSS)
        |
        v
  FastAPI Backend
        |
        v
  LangGraph Supervisor (LLM orchestration)
        |
   +------+------+------+------+------+
   |      |      |      |      |      |
   v      v      v      v      v      v
 Load   Clean   Viz   Feat   Train Predict
 Agent  Agent  Agent  Agent  Agent  Agent
   |      |      |      |      |      |
   +------+------+------+------+------+
        |
        v
  LLM (coding + reasoning)
        |
        v
  H2O AutoML / MLflow / PostgreSQL / Redis

Frontend

Next.js 14, React 18, TypeScript, Tailwind CSS, Plotly.js

Backend

Python 3.10, FastAPI, LangChain, LangGraph

LLM

OpenAI-compatible API

ML

H2O AutoML, scikit-learn, XGBoost, MLflow

Infrastructure

PostgreSQL, Redis, Celery, Docker (7 services)

A full data science workflow in six steps, from raw CSV to trained model and predictions.

1. Upload

Drop a CSV. The data loader agent ingests it and infers the schema.

2. Clean

The cleaning agent generates Python code that fixes missing values, removes outliers, normalizes types.

3. Visualize

Describe what you want to see. The viz agent generates interactive Plotly charts.

4. Engineer

Features are transformed into ML-ready representations using generated scikit-learn code.

5. Train

H2O AutoML trains and evaluates models. The LLM interprets the leaderboard and recommends the best one.

6. Predict

Run inference on new data. Get predictions and explanations.

GitHub Repository ↗

Full source code and documentation