Bridging the gap between natural language and structured data sets
Reference number | |
Coordinator | Starcounter AB |
Funding from Vinnova | SEK 972 734 |
Project duration | September 2023 - May 2024 |
Status | Completed |
Venture | Emerging technology solutions |
Call | Emerging technology solutions stage 1 2023 |
Important results from the project
Starcounter's team has validated the hypothesis that a modified large language model can translate natural-language sentences into Starcounter's data model to perform numerical operations. In other words, the model can create correct database queries without the user needing special expertise in how such queries are written or in the data model used. The results are strong indications that this approach is feasible.
Expected long term effects
The result of the project was a proof-of-concept implementation that can produce database queries in Starcounter's data model from natural language. The quality of the generated queries led us to continue developing this technology. The project has given good indications that this innovation can change how enterprise software is used and how it is adapted to growing needs.
Approach and implementation
Starcounter started by creating a diverse dataset and conducted a series of experiments with both external API-based language models and our own locally trained models based on CodeLlama, Llama3 and OpenCodeInterpreter. We applied various modern techniques within the LLM field to achieve efficient and generalized learning of our paradigm. We then evaluated how well these models generate structured data representations in our domain-specific language, which map to our data model. The models achieved a high level of precision on our benchmarks and evaluations.