Knowledge graphs (KGs) encode structured information that enables complex reasoning across heterogeneous sources, but constructing them from unstructured data remains labor-intensive. Recent large language models (LLMs) demonstrate strong capabilities in information extraction and semantic understanding, yet converting LLM outputs into scalable, dynamically updated knowledge graphs while ensuring reliability remains an open challenge.
Core System Development: Design an automated pipeline integrating LLMs for entity/relation extraction with incremental KG updates, connected to interactive timeline and geographic visualizations.
Advanced Analytics: Apply graph theory metrics and graph neural networks to derive higher-level insights from the constructed knowledge graphs.
Comprehensive Evaluation: Benchmark multiple LLMs (GPT-4, LLaMA-3, Jais) across zero-shot, few-shot, and fine-tuned approaches, measuring precision/recall, temporal accuracy, hallucination rates, and cost-performance trade-offs.
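As a concrete sketch of the extraction-to-update step in the first objective, the snippet below parses a hypothetical LLM response (assumed to be a JSON list of head/relation/tail objects, as a prompt would instruct) into triples and emits idempotent Cypher MERGE statements, so re-ingesting overlapping documents updates the graph incrementally instead of duplicating nodes. The response format, the generic `Entity` label, and both helper names are illustrative assumptions, not parts of the proposed system.

```python
import json


def parse_llm_triples(llm_response: str) -> list[tuple[str, str, str]]:
    """Parse an LLM's JSON output into (head, relation, tail) triples.

    Assumes the model was prompted to answer with a JSON list such as
    [{"head": "Flood", "relation": "occurred in", "tail": "Jakarta"}].
    Malformed or non-JSON output yields no triples rather than crashing
    the pipeline, which also guards against free-text hallucinations.
    """
    try:
        items = json.loads(llm_response)
    except json.JSONDecodeError:
        return []
    triples = []
    for item in items:
        if isinstance(item, dict) and {"head", "relation", "tail"} <= item.keys():
            triples.append((item["head"], item["relation"], item["tail"]))
    return triples


def to_cypher(triples: list[tuple[str, str, str]]) -> list[str]:
    """Emit idempotent Cypher: MERGE creates nodes/edges only if absent,
    which is what makes repeated ingestion an incremental update."""
    stmts = []
    for head, rel, tail in triples:
        # Relationship types cannot be quoted strings, so sanitize the label.
        rel_type = "".join(c if c.isalnum() else "_" for c in rel.upper())
        stmts.append(
            f"MERGE (h:Entity {{name: {json.dumps(head)}}}) "
            f"MERGE (t:Entity {{name: {json.dumps(tail)}}}) "
            f"MERGE (h)-[:{rel_type}]->(t)"
        )
    return stmts


# Example with a mock LLM response (a real run would call the model API):
response = '[{"head": "Flood", "relation": "occurred in", "tail": "Jakarta"}]'
for stmt in to_cypher(parse_llm_triples(response)):
    print(stmt)
```

The same statements can be sent to Neo4j through its official driver; because MERGE is idempotent, the incremental-update mechanism needs no separate deduplication pass for exact matches, though entity resolution across surface forms would still be handled upstream.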
Phase 1 (Months 1-2): Literature review and baseline experiments with LLM-based information extraction on curated multimodal datasets.
Phase 2 (Months 3-4): Core system implementation including graph database integration (Neo4j/RDF), incremental update mechanisms, and visual components linking entities to temporal/geographic contexts.
Phase 3 (Month 5): Advanced graph analytics implementation, comprehensive LLM comparison study, and robustness testing with synthetic and real-world noisy data.
Phase 4 (Month 6): Evaluation across intrinsic metrics (extraction accuracy, temporal consistency) and extrinsic measures (downstream task performance), with human expert validation studies.
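For the intrinsic metrics in Phase 4, triple-level precision/recall and a hallucination rate can be computed by set comparison against gold annotations. The sketch below assumes exact string matching and operationalizes hallucination rate as the share of predicted triples absent from the gold set (one common simplification); in practice entity normalization would precede scoring, and the function name is illustrative.

```python
def score_extraction(predicted, gold):
    """Triple-level extraction metrics via set overlap (exact match).

    precision          = correct / predicted
    recall             = correct / gold
    hallucination rate = fraction of predicted triples not in the gold set
    """
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # true positives: triples found in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    hallucination_rate = 1.0 - precision if pred else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "hallucination_rate": hallucination_rate}


# Toy example: one correct triple, one hallucinated, one missed.
gold = [("Flood", "OCCURRED_IN", "Jakarta"), ("Flood", "DATE", "2020-01-01")]
pred = [("Flood", "OCCURRED_IN", "Jakarta"), ("Flood", "OCCURRED_IN", "Bandung")]
print(score_extraction(pred, gold))
# → precision 0.5, recall 0.5, f1 0.5, hallucination_rate 0.5
```

Running the same scorer over outputs from each model and prompting regime (zero-shot, few-shot, fine-tuned) yields directly comparable numbers for the benchmarking study, and per-model API cost can be logged alongside to obtain the cost-performance trade-offs.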
The resulting system will demonstrate practical applications in domains such as crisis monitoring, supply chain analysis, and academic literature mapping, while contributing methodological advances to both the LLM and knowledge graph communities. The modular design ensures reproducibility and extensibility for future research.