LLM Aggregators: Samu Toljamo, Olli Glorioso, Viljami Hakkarainen and David Ramos

Our Three-Step Approach:

Step 1: Group Similar Innovations Using Embeddings
- Generate semantic embeddings from innovation descriptions and titles
- Use similarity thresholds to identify potential duplicate clusters
- Scale analysis across thousands of innovation records

Step 2: Validate Groups with LLM
- Azure OpenAI reviews each cluster for false positives
- Removes incorrectly grouped innovations with detailed reasoning
- Ensures high precision while maintaining recall

Step 3: Aggregate Results with LLM
- LLM combines information from multiple sources about the same innovation
- Creates unified innovation profiles preserving all source details
- Maintains full traceability while consolidating descriptions

Some visualizations are available via the project link. In the repo, the most interesting file is main.ipynb.

Video: https://drive.google.com/drive/folders/1ZdlPXga2n17u7B9Z9KeLhf-8IOcioF5p?usp=sharing
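
To make Step 1 more concrete, here is a minimal sketch of threshold-based grouping over embeddings. It is not taken from main.ipynb: the function names, the record ids, the toy 3-dimensional vectors, the 0.95 threshold and the use of single-linkage grouping via union-find are all assumptions made for illustration. In the real pipeline the vectors would come from an embedding model applied to innovation titles and descriptions.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_similarity(embeddings, threshold=0.9):
    """Return candidate duplicate clusters as sorted lists of record ids.

    Records are linked when their cosine similarity meets the threshold,
    and linked records are merged transitively (single linkage).
    Only clusters with at least two members are returned.
    """
    parent = {key: key for key in embeddings}

    def find(key):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    for a, b in combinations(embeddings, 2):
        if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for key in embeddings:
        clusters.setdefault(find(key), []).append(key)
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)


# Hypothetical toy embeddings standing in for real model output.
toy_embeddings = {
    "rec-1": [0.9, 0.1, 0.0],
    "rec-2": [0.88, 0.12, 0.01],
    "rec-3": [0.0, 0.2, 0.95],
    "rec-4": [0.01, 0.18, 0.97],
    "rec-5": [0.5, 0.5, 0.5],
}

candidate_clusters = group_by_similarity(toy_embeddings, threshold=0.95)
print(candidate_clusters)
```

With these toy vectors, rec-1/rec-2 and rec-3/rec-4 are grouped and rec-5 stays unclustered. The pairwise comparison is quadratic in the number of records, so a larger dataset would likely need a vectorized or approximate nearest-neighbor search instead. Because single linkage can chain loosely related records together, the resulting clusters are only candidates, which is the kind of false positive the LLM validation in Step 2 is meant to remove.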