Bridging the gap between natural language and structured data sets

Coordinator Starcounter AB
Funding from Vinnova SEK 972 734
Project duration September 2023 - May 2024
Status Completed
Venture Emerging technology solutions
Call Emerging technology solutions stage 1 2023

Important results from the project

Starcounter's team validated the hypothesis that a modified large language model can translate natural-language sentences into Starcounter's data model in order to perform numerical operations, that is, to create correct database queries without requiring special expertise in how such queries are written or in the data model used. The results strongly indicate that this is feasible.

Expected long term effects

The project resulted in a proof-of-concept implementation that produces database queries in Starcounter's data model from natural language. The quality of the generated queries led us to decide to develop the technology further. The project gave good indications that this innovation can change how enterprise software is used and adapted to growing needs.

Approach and implementation

Starcounter began by creating a diverse dataset and then ran a series of experiments with both external API-based language models and our own locally trained models based on CodeLlama, Llama 3 and OpenCodeInterpreter. We applied various modern techniques from the LLM field to achieve efficient, generalized learning of our paradigm. The models were tested on generating structured data representations in our domain-specific language, which map to our data model. They achieved a high level of precision on our benchmarks and evaluations.
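
As an illustration of how generated queries can be scored against a benchmark, the following minimal Python sketch computes exact-match accuracy after light normalization. Everything in it is an assumption made for illustration: the project description does not state which metric was used, the example queries are written in generic SQL-like text rather than Starcounter's domain-specific language, and `toy_model` is a hypothetical stand-in for a real language model.

```python
from typing import Callable, Sequence, Tuple


def normalize(query: str) -> str:
    """Collapse whitespace and lowercase the query text.

    Simplification: lowercasing would also alter string literals,
    so a real evaluation would need a proper parser.
    """
    return " ".join(query.split()).lower()


def exact_match_accuracy(
    examples: Sequence[Tuple[str, str]],
    generate: Callable[[str], str],
) -> float:
    """Fraction of questions whose generated query equals the reference."""
    if not examples:
        return 0.0
    hits = sum(
        normalize(generate(question)) == normalize(reference)
        for question, reference in examples
    )
    return hits / len(examples)


# Hypothetical benchmark: (natural-language question, reference query).
BENCHMARK = [
    ("What is the total order value?", "SELECT SUM(amount) FROM orders"),
    ("How many customers are there?", "SELECT COUNT(*) FROM customers"),
]


def toy_model(question: str) -> str:
    """Hypothetical stand-in for a language model: returns canned answers."""
    canned = {
        "What is the total order value?": "select sum(amount)  from orders",
        "How many customers are there?": "SELECT COUNT(id) FROM customers",
    }
    return canned[question]


accuracy = exact_match_accuracy(BENCHMARK, toy_model)
print(f"exact-match accuracy: {accuracy:.2f}")
```

Exact match is a strict measure: it counts a query as wrong even when it would return the same result as the reference, so comparing the results of executing both queries is a common complement.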

The project description has been provided by the project members themselves and the text has not been looked at by our editors.

Last updated 21 June 2024

Reference number 2023-01407