Team members: Bhabishya Gurung (me), Sakhi Hashmat Khalil, Kiran Thapalia

Overview
This project addresses the challenge of identifying and aggregating duplicate innovations described by different organizations. It was developed as a hackathon submission for VTT, focusing on semantic AI and large language models (LLMs) to resolve ambiguity and unify innovation records.

Approach
Data Integration: Merged structured innovation relationship data from company websites and VTT domain pages.
Feature Extraction: For each innovation, extracted textual descriptions, full source documents, organization names, and source URLs.
Semantic Similarity: Used AI-based semantic similarity (likely leveraging embeddings and LLMs) to detect potential duplicates by comparing innovation descriptions.
Clustering & Aggregation: Grouped similar innovations into clusters and generated unified summaries for each cluster, ensuring source and contributor information is preserved[1].

Technologies Used
Python (Jupyter Notebook)
Semantic AI (embeddings, LLMs)
Data processing with pandas
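
The similarity and clustering steps in the Approach can be illustrated with a minimal sketch. This is not the project's actual notebook code: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the field names (`description`, `organization`, `source_url`), the `0.6` threshold, the union-find grouping, and the example URLs are all assumptions made for illustration. The LLM summary step is left out; each cluster simply keeps its member descriptions alongside the contributing organizations and sources.

```python
import math
import re
from collections import Counter, defaultdict


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_duplicates(records, threshold=0.6):
    """Group records whose descriptions are similar (threshold is hypothetical)."""
    vectors = [embed(r["description"]) for r in records]
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of records whose similarity reaches the threshold.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[find(i)].append(rec)

    # Aggregate each cluster while preserving contributors and sources.
    return [
        {
            "descriptions": [r["description"] for r in members],
            "organizations": sorted({r["organization"] for r in members}),
            "source_urls": sorted({r["source_url"] for r in members}),
        }
        for members in groups.values()
    ]


# Made-up example records; the URLs are placeholders.
records = [
    {"description": "Enzymatic recycling of textile waste fibres",
     "organization": "VTT", "source_url": "https://example.org/vtt/a"},
    {"description": "Recycling of textile waste fibres using enzymatic process",
     "organization": "Company A", "source_url": "https://example.com/a"},
    {"description": "Quantum computer cooling system",
     "organization": "Company B", "source_url": "https://example.com/b"},
]

clusters = cluster_duplicates(records)
for c in clusters:
    print(c["organizations"], len(c["descriptions"]))
```

With these example records, the two textile-recycling descriptions fall into one cluster and the unrelated record stays on its own. In practice, `embed` would be replaced by a real sentence-embedding model, and the threshold would need tuning against known duplicates.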