About the Project
We approached theBlue.ai GmbH with the task of extracting relevant information from emails to help us automate our daily processes. The solution had to work with texts written in both German and English, extract the information we specified, and return it in a structured format for further processing.
Initially, theBlue.ai conducted a proof of concept to demonstrate the feasibility of automating the information extraction with Generative AI. Following our feedback and acceptance of the results, they proceeded to develop a production version of the solution to address our specific needs.
Challenges
The main challenge was dealing with unstructured text that lacked any predefined format, which made it difficult to detect the specified information accurately. Because we operate globally, the texts also came in various formats and languages and followed many different local standards from countries around the world. In addition, the provided texts did not always contain all of the specified information. Together, these challenges meant that traditional approaches were not good enough. Ensuring high accuracy and efficiency in the extraction process was crucial to meeting the project's goals.
Solution
To tackle these challenges, theBlue.ai used GPT-3.5 and GPT-4 models for information extraction. Careful prompt engineering and testing with different models significantly improved accuracy. They exposed the solution as an API built with FastAPI and packaged the code in a Docker container, which made deployment on our servers straightforward. Now we can simply send requests to the API and receive the needed information in the specified format.
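To illustrate the idea of a specified output format, here is a minimal sketch in Python. It is not theBlue.ai's code: the field names, the prompt wording, and the helper functions build_prompt and parse_reply are all hypothetical, and the call to the language model itself is left out. It only shows how a prompt can ask for fixed keys and how a JSON reply can be turned into a structured record in which missing information stays empty.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical field names; the real schema is not described in this text.
@dataclass
class ExtractedInfo:
    sender_company: Optional[str] = None
    order_number: Optional[str] = None
    delivery_date: Optional[str] = None
    language: Optional[str] = None


def build_prompt(email_text: str) -> str:
    """Build a prompt that asks the model for a JSON object with fixed keys."""
    keys = ", ".join(f.name for f in fields(ExtractedInfo))
    return (
        "Extract the following fields from the email below. "
        "The email may be written in German or English. "
        f"Reply with a single JSON object with the keys: {keys}. "
        "Use null for any field that is not present.\n\n"
        f"Email:\n{email_text}"
    )


def parse_reply(reply: str) -> ExtractedInfo:
    """Turn the model's JSON reply into a record.

    Unknown keys are ignored; keys missing from the reply default to None.
    """
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    known = {f.name for f in fields(ExtractedInfo)}
    return ExtractedInfo(**{k: v for k, v in data.items() if k in known})
```

Keeping every field optional reflects the situation described under Challenges, where an email does not always contain all of the specified information.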