
AmbiGraph-Eval: A Benchmark for Resolving Ambiguity in Graph Query Generation

Semantic parsing converts natural language into formal query languages such as SQL or Cypher, allowing users to interact with databases more intuitively. Yet natural language is inherently ambiguous, often supporting multiple valid interpretations, whereas query languages demand precision. Although ambiguity in tabular queries has been explored, graph databases present a distinct challenge due to their interconnected structures. Natural language queries over graph nodes and relationships often yield multiple interpretations because of the structural richness and diversity of graph data. For example, a query like "best rated restaurant" may vary depending on whether results consider individual ratings or aggregate scores.
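The two readings of such a query can be made concrete with a toy example. The sketch below (restaurant names and ratings are invented for illustration, not taken from the benchmark) shows how "best rated restaurant" returns different answers depending on whether we rank by the single highest individual rating or by the aggregate average:

```python
# Toy ratings data: each restaurant has several individual ratings.
# All names and numbers are invented for illustration.
ratings = {
    "Alpha Bistro": [5, 2, 2],  # one perfect score, low average
    "Beta Diner": [4, 4, 4],    # consistently good
}

# Interpretation 1: rank by the single highest individual rating.
best_by_individual = max(ratings, key=lambda r: max(ratings[r]))

# Interpretation 2: rank by the aggregate (average) rating.
best_by_average = max(ratings, key=lambda r: sum(ratings[r]) / len(ratings[r]))

print(best_by_individual)  # Alpha Bistro
print(best_by_average)     # Beta Diner
```

The same natural language phrase maps to two different, equally valid query plans — exactly the kind of ambiguity the benchmark targets.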

Ambiguities in interactive systems pose serious risks, as failures in semantic parsing can cause queries to diverge from user intent. Such errors may result in unnecessary data retrieval and computation, wasting time and resources. In high-stakes contexts such as real-time decision-making, these issues can degrade performance, raise operational costs, and reduce effectiveness. LLM-based semantic parsing shows promise in addressing complex and ambiguous queries by leveraging linguistic knowledge and interactive clarification. However, LLMs face the challenge of self-preference bias: trained on human feedback, they may adopt annotator preferences, leading to systematic misalignment with actual user intent.

Researchers from Hong Kong Baptist University, the National University of Singapore, BIFOLD & TU Berlin, and Ant Group present a method to address ambiguity in graph query generation. They develop the concept of ambiguity in graph database queries, categorizing it into three types: Attribute, Relationship, and Attribute-Relationship ambiguities. The researchers introduce AmbiGraph-Eval, a benchmark containing 560 ambiguous queries and corresponding graph database samples for evaluating model performance. It tests nine LLMs, analyzing their ability to resolve ambiguities and identifying areas for improvement. The study reveals that reasoning capabilities provide only a limited advantage, highlighting the importance of understanding graph ambiguity and mastering query syntax.
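The three categories can be illustrated with hypothetical natural-language queries. The example phrasings and competing readings below are illustrative guesses at each category, not items drawn from the benchmark itself:

```python
# Illustrative (non-benchmark) examples of the three ambiguity categories.
ambiguity_examples = {
    "Attribute": {
        "query": "best rated restaurant",
        "readings": ["highest single rating", "highest average rating"],
    },
    "Relationship": {
        "query": "people connected to Alice",
        "readings": ["direct neighbors only", "anyone reachable via a path"],
    },
    "Attribute-Relationship": {
        "query": "most popular friend of Alice",
        "readings": [
            "friend with the largest follower count (attribute reading)",
            "friend with the most friendship edges (relationship reading)",
        ],
    },
}

for category, info in ambiguity_examples.items():
    print(f"{category}: '{info['query']}' -> {len(info['readings'])} readings")
```

In each case a single query phrase admits at least two structurally different graph queries, and the third category compounds both sources of ambiguity at once.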

The AmbiGraph-Eval benchmark is designed to evaluate LLMs' ability to generate syntactically correct and semantically appropriate graph queries, such as Cypher, from ambiguous natural language inputs. The dataset is created in two stages: data collection and human review. Ambiguous prompts are obtained through three methods: direct extraction from graph databases, synthesis from unambiguous data using LLMs, and full generation by prompting LLMs to create new cases. To evaluate performance, the researchers tested four closed-source LLMs (e.g., GPT-4, Claude-3.5-Sonnet) and four open-source LLMs (e.g., Qwen-2.5, LLaMA-3.1). Evaluations are conducted via API calls or on 4x NVIDIA A40 GPUs.
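A minimal sketch of what such an evaluation loop might look like is shown below. The function names, data structures, and execution-based comparison are assumptions for illustration, not the paper's actual harness: each model-generated query is run against the graph sample and its result set is compared with the reference results.

```python
# Hypothetical evaluation sketch. execute_query is a stand-in for running a
# Cypher query against a real graph database; here it is faked with a lookup
# table keyed by query text, purely for illustration.

def execute_query(query: str, graph: dict) -> set:
    """Pretend to execute a query; return its result set."""
    return set(graph.get(query, []))

def evaluate(cases: list, graph: dict) -> float:
    """Fraction of cases where the generated query reproduces the reference results."""
    correct = 0
    for case in cases:
        predicted = execute_query(case["generated_query"], graph)
        if predicted == set(case["reference_results"]):
            correct += 1
    return correct / len(cases)

# Tiny fabricated example.
graph = {"MATCH (r:Restaurant) RETURN r.name": ["Alpha", "Beta"]}
cases = [{"generated_query": "MATCH (r:Restaurant) RETURN r.name",
          "reference_results": ["Alpha", "Beta"]}]
score = evaluate(cases, graph)
print(score)
```

Comparing result sets rather than raw query strings is one way to credit semantically equivalent queries that differ in surface syntax, though whether the benchmark scores this way is not stated here.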

The evaluation of zero-shot performance on the AmbiGraph-Eval benchmark shows disparities among models in resolving graph data ambiguities. In attribute ambiguity tasks, O1-mini excels in same-entity (SE) scenarios, with GPT-4o and LLaMA-3.1 also performing well. However, GPT-4o outperforms the others in cross-entity (CE) tasks, showing superior reasoning across entities. For relationship ambiguity, LLaMA-3.1 leads, while GPT-4o shows limitations in SE tasks but excels in CE tasks. Attribute-relationship ambiguity emerges as the most challenging category, with LLaMA-3.1 performing best in SE tasks and GPT-4o dominating CE tasks. Overall, models struggle more with multi-dimensional ambiguities than with isolated attribute or relationship ambiguities.

In conclusion, the researchers introduced AmbiGraph-Eval, a benchmark for evaluating the ability of LLMs to resolve ambiguity in graph database queries. Evaluations of nine models reveal significant challenges in generating accurate Cypher statements, with strong reasoning skills offering only limited benefits. Core challenges include recognizing ambiguous intent, generating valid syntax, interpreting graph structures, and performing numerical aggregations. Ambiguity detection and syntax generation emerged as the main bottlenecks hindering performance. To address these issues, future research should improve models' ambiguity resolution and syntax handling using techniques like syntax-aware prompting and explicit ambiguity signaling.
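As one illustration of what "explicit ambiguity signaling" could look like in practice, a prompt can instruct the model to enumerate candidate readings before committing to a single query. The template wording below is an assumption for illustration, not the paper's prompting strategy:

```python
# Hypothetical prompt template with explicit ambiguity signaling.
# The wording and schema format are illustrative assumptions.
def build_prompt(schema: str, question: str) -> str:
    return (
        "You translate natural language questions into Cypher.\n"
        f"Graph schema:\n{schema}\n"
        "Step 1: List every plausible interpretation of the question.\n"
        "Step 2: Pick the most likely interpretation and justify it briefly.\n"
        "Step 3: Output one syntactically valid Cypher query for that reading.\n"
        f"Question: {question}\n"
    )

prompt = build_prompt(
    "(:Restaurant {name, rating})-[:LOCATED_IN]->(:City {name})",
    "best rated restaurant",
)
print(prompt)
```

Forcing the model to surface its candidate interpretations makes the ambiguity explicit in the output, which could help both resolution quality and downstream error analysis.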


Check out the Technical Paper. Feel free to visit our GitHub Page for Tutorials, Codes and Notebooks. Also, follow us on Twitter and don't forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.

The post AmbiGraph-Eval: A Benchmark for Resolving Ambiguity in Graph Query Generation appeared first on MarkTechPost.
