Natural Language to SQL Query – AI/ML Engineer Portfolio

Overview

This project is an AI-powered Natural Language to SQL Query Generator designed for a company where non-technical teams, such as marketing and growth, struggled to retrieve data from internal databases. These teams relied on technical analysts to write SQL queries, which caused delays and bottlenecks in decision-making. The tool lets any team member, regardless of technical background, query the database by asking questions in plain English; it then generates an executable SQL query automatically. This reduces the time and analyst dependency involved in data access and frees analysts for more strategic work. The project is still under development and is being designed to run locally and securely, without exposing internal company databases to external cloud-based AI services.

My Role & Contributions

This project was initiated based on a real-world problem shared by a friend working in a marketing team in India. I took complete ownership of the project, from ideation to implementation. My responsibilities included:

Understanding the business problem and user pain points.
Architecting the system to work entirely offline/local for data privacy.
Developing backend APIs to support schema extraction and SQL generation.
Designing the full NLP pipeline and AI agent workflow.
Integrating local language models with search and reasoning components.

Tech Stack

Python, FastAPI, LangGraph, LangChain, CodeLlama 7B, Cosine Similarity, BM25, Ollama, Streamlit, Docker, AWS EC2, LangChain SQLDatabase Agent

Implementation Details

Schema Extraction: An API connects to the company’s internal database and extracts the schema, storing the structure in a simple <TableName>.<ColumnName> format in a YAML file. This records what data is available without exposing sensitive details.
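As a rough illustration of the schema extraction step, the sketch below lists every column as a TableName.ColumnName string and serializes the result as a flat YAML list. It is a minimal sketch under stated assumptions, not the project's actual code: it uses an in-memory SQLite database as a stand-in for the internal database, and the function names `extract_schema` and `to_yaml_list` are hypothetical.

```python
import sqlite3


def extract_schema(conn: sqlite3.Connection) -> list[str]:
    """Return every column as a 'TableName.ColumnName' string (SQLite only)."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    entries = []
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for column in conn.execute(f'PRAGMA table_info("{table}")'):
            entries.append(f"{table}.{column[1]}")
    return entries


def to_yaml_list(entries: list[str]) -> str:
    """Serialize the entries as a flat YAML sequence, one item per line."""
    return "".join(f"- {entry}\n" for entry in entries)


# Hypothetical demo tables; the real schema would come from the company database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, signup_date TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")

schema_entries = extract_schema(conn)
schema_yaml = to_yaml_list(schema_entries)
print(schema_yaml)
```

Only table and column names are read here; no row data is touched. A different database engine would need its own catalog query in place of `sqlite_master` and `PRAGMA table_info`.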
Query Analysis with NLP: When a user types a query in natural language (e.g., "Show me the number of users who signed up last week"), the system tokenizes the input and uses an AI model to identify the key columns needed to build a SQL query.
Relevant Column Search: Using a hybrid search approach, the system matches user intent with relevant columns from the YAML schema file:
- BM25 handles keyword (lexical) matching on exact terms.
- Cosine similarity handles semantic matching for more context-aware results.
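The hybrid search described above can be sketched in plain Python. This is an illustrative sketch, not the project's implementation: the BM25 scorer is written from the standard formula, and the cosine similarity is computed over character-trigram counts as a dependency-free stand-in for embedding vectors (a real setup would presumably compare embeddings from a local model). The equal weighting `alpha=0.5` and all function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with the standard BM25 formula."""
    doc_tokens = [tokenize(doc) for doc in docs]
    avg_len = sum(len(tokens) for tokens in doc_tokens) / len(doc_tokens)
    doc_freq = Counter(term for tokens in doc_tokens for term in set(tokens))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def trigram_vector(text: str) -> Counter:
    """Character-trigram counts; a stand-in for an embedding vector."""
    joined = " ".join(tokenize(text))
    return Counter(joined[i:i + 3] for i in range(len(joined) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query: str, columns: list[str], alpha: float = 0.5, top_k: int = 3) -> list[str]:
    """Blend max-normalized BM25 with cosine similarity and return the top columns."""
    lexical = bm25_scores(query, columns)
    max_lexical = max(lexical) or 1.0  # avoid dividing by zero when nothing matches
    query_vector = trigram_vector(query)
    ranked = []
    for column, lex in zip(columns, lexical):
        semantic = cosine(query_vector, trigram_vector(column))
        ranked.append((alpha * lex / max_lexical + (1 - alpha) * semantic, column))
    ranked.sort(reverse=True)
    return [column for _, column in ranked[:top_k]]


columns = ["users.id", "users.email", "users.signup_date", "orders.id", "orders.total"]
top_columns = hybrid_search("count users by signup date", columns)
print(top_columns)
```

Normalizing BM25 by its maximum keeps both signals on a comparable 0 to 1 scale before they are blended; other fusion schemes, such as reciprocal rank fusion, would also fit here.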
SQL Query Generation: The AI model (CodeLlama 7B) uses the identified columns and natural language prompt to generate an initial SQL query.
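One way the generation step might look is sketched below: the retrieved columns and the user's question are combined into a prompt, which is then sent to a locally running Ollama server. This is a hedged sketch, not the project's code. The prompt wording, the function names, the `SQLite` dialect default, and the `codellama:7b` model tag are assumptions; the endpoint is Ollama's default local `/api/generate` route, and `generate_sql` only works when an Ollama server is running with that model pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(question: str, columns: list[str], dialect: str = "SQLite") -> str:
    """Combine the retrieved columns and the user's question into one prompt."""
    schema_lines = "\n".join(f"- {column}" for column in columns)
    return (
        f"You write {dialect} queries.\n"
        "Use only these columns (Table.Column):\n"
        f"{schema_lines}\n"
        f"Question: {question}\n"
        "Return a single SQL statement and nothing else."
    )


def generate_sql(question: str, columns: list[str], model: str = "codellama:7b") -> str:
    """Send the prompt to a local Ollama server and return the model's text.

    Requires a running Ollama instance; it is defined here but not called.
    """
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(question, columns), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"].strip()


prompt = build_prompt(
    "Show me the number of users who signed up last week",
    ["users.id", "users.signup_date"],
)
print(prompt)
```

Restricting the prompt to the columns returned by the search step keeps it short, which matters for a 7B model, and narrows the room for invented column names.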
Query Execution via LangGraph: An agentic workflow is created using LangGraph which:
- Tries to execute the generated SQL query using LangChain’s SQLDatabase agent.
- If the query fails (e.g., syntax or column mismatch), the error is analyzed, and a revised query is generated automatically.
Results and Feedback Loop: If execution succeeds, the results are returned to the user. If not, the process repeats until a working query is produced or the system returns a helpful error message.
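The execute, analyze, and retry cycle can be illustrated with a plain Python loop. Note the assumptions: this sketch does not use LangGraph or LangChain's SQLDatabase agent; it stands in for that workflow with a simple bounded loop over an in-memory SQLite database, and `fake_generate` is a hypothetical stub that plays the role of the model (first returning a query with a wrong column name, then a corrected one). The `max_attempts` cap is also an assumption.

```python
import sqlite3
from typing import Callable, Optional

# A generator takes (question, previous_sql, previous_error) and returns SQL.
Generator = Callable[[str, Optional[str], Optional[str]], str]


def run_with_retries(
    conn: sqlite3.Connection,
    question: str,
    generate: Generator,
    max_attempts: int = 3,
) -> dict:
    """Generate SQL, execute it, and feed any database error back into the next attempt."""
    sql: Optional[str] = None
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        sql = generate(question, sql, error)
        try:
            rows = conn.execute(sql).fetchall()
            return {"ok": True, "sql": sql, "rows": rows, "attempts": attempt}
        except sqlite3.Error as exc:
            error = str(exc)  # passed to the generator on the next iteration
    return {"ok": False, "sql": sql, "error": error, "attempts": max_attempts}


def fake_generate(question: str, previous_sql: Optional[str], error: Optional[str]) -> str:
    """Hypothetical stand-in for the model: wrong column first, fixed after an error."""
    if error is None:
        return "SELECT COUNT(*) FROM users WHERE signup >= '2024-01-08'"
    return "SELECT COUNT(*) FROM users WHERE signup_date >= '2024-01-08'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, signup_date TEXT)")
conn.executemany(
    "INSERT INTO users (signup_date) VALUES (?)",
    [("2024-01-02",), ("2024-01-09",), ("2024-01-10",)],
)

result = run_with_retries(conn, "How many users signed up since 2024-01-08?", fake_generate)
print(result)

# A generator that never recovers ends with a reported error instead of looping forever.
failed = run_with_retries(conn, "broken", lambda q, s, e: "SELECT nope FROM users")
print(failed["error"])
```

Capping the number of attempts guarantees the loop terminates, and returning the last database error gives the user something concrete when no working query is found. Since the SQL is model-generated, running it over a read-only connection would be a sensible precaution.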

Results & Impact

While the project is still under development, early testing shows promising results:

Traditional query generation (via analysts) could take 10–15 minutes. This system reduces it to under 3 minutes on average.
Inspired by Uber’s QueryGPT (which showed an 18% gain in daily ops productivity), this system aims to bring a similar, if not higher, efficiency boost to teams.
Now, non-technical users can explore and analyze data independently without technical bottlenecks.
By using Ollama to run models like CodeLlama locally, companies can keep their sensitive data in-house, solving a major concern with commercial AI tools.

View on GitHub

Live Demo? Not Today!