I had a great conversation with a founder who is changing the existing landscape of ERP, a field I have love/hate/appreciation relatonship with.
This framework is an extention to the conversation, of what I think the experience would be when the new wave of ERPs are combined with AI/LLM and alike.
The Magic Experience
During or prior to the sales call, we ask the customer to send a bunch of files that they rely on to run their business.
We start to introduce AERP with a live demo, but the demo is tailored to the customer's process. Customer freaked out, couldn't believe we built their EPR this quickly, and want to buy this right away.
What actually happened is we uploaded their files to our magical system, which output system configuration files (json or yaml) with test data that mimic their business.
How can we make this magical system?
Technical Intro
The goal is to build a system to automatically generate an AERP-Specific Entity-Relationship (ER) diagram (for an ERP-like dataset) from multiple data files involves combining data parsing, relationship inference, and graph visualization.
The output is a JSON-based ER diagram / taxonomy with tables and their relationships. It will use a combination of rule-based heuristics (aka. instruction following / prompt engineering) and AI to infer likely primary key/foreign key links between tables, with an interface for users to review and refine these suggestions.
1. System Architecture & Data Flow
The system consists of an ingestion layer, a processing engine for relationship detection, a graph database for storage, and an output/visualization module. The components work together to transform raw uploaded data into a connected graph of tables and relationships, which can then be output as JSON and visualized.
Components
- File Ingestion: Users upload Excel workbooks (multi-tab), Google Sheets (CSV format), and standalone CSV files via a web interface. The system parses these files into structured table objects.
- Data Structuring: The ingestion layer extracts metadata (table names, columns, data types) and computes basic statistics (unique values, null counts, etc.).
- Processing & Relationship Inference: The system identifies:
- Primary keys using uniqueness heuristics.
- Foreign keys by matching column names, data overlap, and type consistency.
- Graph Database Storage: Tables and relationships are stored in Neo4j as a graph.
- User Review & Refinement: Users can validate or adjust inferred relationships via an interactive UI.
- Output Generation: The final ER diagram is exported as structured JSON and displayed graphically.
2. AI & Heuristics Methodology
Heuristics for Relationship Inference
- Primary Key Detection: Based on uniqueness, column naming conventions (e.g.,
id
), and position in the table. - Foreign Key Detection:
- Column Name Matching: Similar names suggest relationships (
customer_id
in Orders linking toid
in Customers). - Value Overlap Analysis: If a column’s values are mostly contained within another table’s primary key column, a relationship is inferred.
- Type Consistency: Ensuring that candidate foreign keys match the data type of referenced primary keys.
- Column Name Matching: Similar names suggest relationships (
- Graph Analysis: Relationships are stored in Neo4j for efficient querying and validation.
AI-Powered Validation
- Machine Learning Model: Trains on known schemas to predict relationships.
- NLP & Contextual AI: Uses language models to interpret table/column names and suggest missing links.
- Human-in-the-Loop Refinement: Users review AI-generated relationships and adjust as needed.
3. JSON Specification for ER Diagram Output (utimately to match AERP's taxonomy)
{
"tables": [
{
"name": "Customers",
"columns": [
{ "name": "id", "type": "INTEGER", "primaryKey": true },
{ "name": "name", "type": "TEXT", "primaryKey": false }
]
},
{
"name": "Orders",
"columns": [
{ "name": "order_id", "type": "INTEGER", "primaryKey": true },
{ "name": "customer_id", "type": "INTEGER", "primaryKey": false }
]
}
],
"relationships": [
{
"fromTable": "Orders",
"fromColumn": "customer_id",
"toTable": "Customers",
"toColumn": "id",
"type": "Many-to-One"
}
]
}
4. Implementation Roadmap
MVP (Minimum Viable Product)
- Core File Ingestion & Parsing:
- Use ExcelJS for Excel, csv-parse for CSV.
- Process Google Sheets manually via CSV export (later via API integration).
- Basic Relationship Detection Heuristics:
- Identify primary keys (uniqueness-based, naming conventions).
- Detect foreign keys via column name matching and data overlap.
- JSON Output Generation:
- Format detected relationships in structured JSON.
- Basic Visualization (Prototype):
- Use D3.js or vis.js for rendering an interactive ER diagram.
- User Review & Refinement:
- Allow users to manually edit relationships (e.g., via a basic confirmation step).