Knowledge graphs (KGs) are an increasingly popular way of representing data in a graph structure. A KG is a set of triples (s, p, o), where s (subject) and o (object) are two graph nodes and p is a predicate describing the type of connection between them. KGs are often supported by a schema (such as an ontology) that defines the key concepts and relationships in a field of study and the constraints that govern how they can interact. Although KGs are employed for many tasks, only a small number of KGs have become accepted standards for measuring model performance.
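As a minimal sketch of the triple representation described above, a KG can be held in Python as a set of (s, p, o) tuples. The entity and predicate names, as well as the helper functions `objects_of` and `nodes`, are invented here for illustration and do not come from any real KG or library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str  # subject node
    p: str  # predicate (type of connection)
    o: str  # object node


# Hypothetical toy KG; all names are illustrative assumptions.
kg = {
    Triple("alice", "worksAt", "acme"),
    Triple("bob", "worksAt", "acme"),
    Triple("alice", "knows", "bob"),
}


def objects_of(kg, subject, predicate):
    """Return every o such that (subject, predicate, o) is in the KG."""
    return {t.o for t in kg if t.s == subject and t.p == predicate}


def nodes(kg):
    """Collect all graph nodes, i.e. everything appearing as s or o."""
    return {t.s for t in kg} | {t.o for t in kg}
```

Storing triples in a set reflects the definition directly: the graph is nothing more than the collection of its edges, and the nodes are recovered from the subjects and objects.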
However, relying only on these mainstream KGs to judge whether newly proposed models generalize is problematic. For example, it has been shown that mainstream datasets for node classification share statistical properties, particularly homophily. Consequently, new models are evaluated on a set of datasets with similar statistics. As a result, their performance gains are not always consistent outside of the common benchmark datasets.
Similarly, it has been shown that many existing link prediction datasets suffer from data bias and contain numerous inference patterns that predictive models can learn, leading to over-optimistic evaluation performance. Consequently, more diverse datasets are needed. To test novel models in different data contexts, researchers need a mechanism for generating synthetic but realistic datasets of different sizes and properties. In some application areas, an even bigger problem than relying on a small number of KGs is the absence of publicly accessible KGs altogether.
Research in fields such as education, law enforcement or medicine is especially challenging in this respect, because data privacy issues can make it impossible to collect and share real-world knowledge. Very few domain-oriented KGs are therefore available in these fields. On the other hand, engineers, practitioners and researchers usually have specific ideas about the characteristics of the problem they are interested in. In this situation, it would be beneficial to create a synthetic KG that mimics the characteristics of a real KG. These problems have led to many attempts to build synthetic generators of schemas and KGs, although the two have often been treated separately.
Domain-neutral KGs can be generated by stochastic-based generators. Although these approaches are effective at generating huge graphs quickly, their data generation principle does not take any underlying schema into account. The resulting KGs may therefore not accurately mimic the characteristics of actual KGs in a chosen application area. Schema-driven generators, on the other hand, can produce KGs that mirror real-world data. However, to the researchers' knowledge, most efforts have focused on creating synthetic KGs from pre-existing schemas. The more difficult challenge of synthesizing both a schema and the KG it supports has been considered, but so far with limited success.
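The difference between the two families of generators can be illustrated with a toy sketch. It is not the algorithm of any existing generator: the schema, the entities and every function name below are hypothetical, and the "schema" is reduced to one domain class and one range class per predicate.

```python
import random

# Hypothetical toy schema: predicate -> (domain class, range class).
SCHEMA = {
    "treats": ("Doctor", "Patient"),
    "worksAt": ("Doctor", "Hospital"),
}

# Hypothetical entities with their class assignments.
ENTITIES = {
    "d1": "Doctor", "d2": "Doctor",
    "p1": "Patient", "p2": "Patient",
    "h1": "Hospital",
}


def stochastic_triples(n, rng):
    """Schema-blind generation: any entity may fill either slot."""
    names = sorted(ENTITIES)
    preds = sorted(SCHEMA)
    return [(rng.choice(names), rng.choice(preds), rng.choice(names))
            for _ in range(n)]


def schema_driven_triples(n, rng):
    """Schema-aware generation: subject and object are drawn only from
    the domain and range classes of the chosen predicate."""
    by_class = {}
    for name, cls in sorted(ENTITIES.items()):
        by_class.setdefault(cls, []).append(name)
    triples = []
    for _ in range(n):
        p = rng.choice(sorted(SCHEMA))
        domain_cls, range_cls = SCHEMA[p]
        triples.append((rng.choice(by_class[domain_cls]), p,
                        rng.choice(by_class[range_cls])))
    return triples


def respects_schema(triple):
    """Check a triple against the domain and range of its predicate."""
    s, p, o = triple
    domain_cls, range_cls = SCHEMA[p]
    return ENTITIES[s] == domain_cls and ENTITIES[o] == range_cls
```

Calling `schema_driven_triples(50, random.Random(0))` yields only triples for which `respects_schema` is true, whereas `stochastic_triples` can freely produce statements such as a hospital treating a doctor.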
Researchers from the Université de Lorraine and the Université Côte d’Azur aim to address this problem in their study. They introduce PyGraft, a Python-based tool for creating highly customizable, domain-neutral schemas and KGs. Their work makes the following contributions. First, to their knowledge, PyGraft is the only generator specifically designed to generate both schemas and KGs in a single pipeline, and it is highly adjustable through a wide range of user-specified parameters. Second, the generated resources are domain-neutral, making them suitable for benchmarking regardless of the application area. Third, the resulting schemas and KGs are constructed using an extended set of RDFS and OWL elements, and a DL reasoner is used to ensure their logical consistency. This enables fine-grained resource descriptions and strict adherence to common Semantic Web standards. Finally, the researchers release their code publicly with documentation and accompanying examples for ease of use.
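To give a rough feel for what a logical consistency check guards against, here is a deliberately simplified sketch. It is not a DL reasoner and not PyGraft's implementation: it only detects entities typed with two classes that a hypothetical schema declares disjoint, in the spirit of OWL's `owl:disjointWith`. The constant `DISJOINT` and the function `inconsistent_entities` are assumptions made for this example.

```python
# Hypothetical schema fragment: pairs of classes declared disjoint.
DISJOINT = {frozenset({"Person", "Organization"})}


def inconsistent_entities(type_assertions):
    """Return entities typed with two classes declared disjoint.

    type_assertions: iterable of (entity, class_name) pairs.
    """
    classes_of = {}
    for entity, cls in type_assertions:
        classes_of.setdefault(entity, set()).add(cls)
    bad = set()
    for entity, classes in classes_of.items():
        # An entity is inconsistent if it carries both classes of a pair.
        if any(pair <= classes for pair in DISJOINT):
            bad.add(entity)
    return bad
```

A real reasoner covers far more than this single rule, since it also draws inferences from class hierarchies, property characteristics and other schema axioms before deciding whether a KG is consistent.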
Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.
Anish Tiku is a Consulting Intern at MarkTechPost. He is currently pursuing a degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He likes connecting with people and collaborating on interesting projects.