Publication Overview
The original papers can be found here:
- DistSim: Scalable Distributed in-Memory Semantic Similarity Estimation for RDF Knowledge Graphs
- Literal2Feature: An Automatic Scalable RDF Graph Feature Extractor
- DistRDF2ML: Scalable Distributed In-Memory Machine Learning Pipelines for RDF Knowledge Graphs
- DistAD (in progress)
- SimE4KG (in progress)
Publication Specific Documentation
For recently published papers, this page provides a quick entry point to the corresponding modules. All of these modules are also accessible through the standard documentation.
SimE4KG: Explainable Distributed multi-modal Semantic Similarity Estimation for Knowledge Graphs
This release includes the most recent developments for the SimE4KG framework. SimE4KG stands for Explainable Distributed In-Memory multi-modal Semantic Similarity Estimation for Knowledge Graphs.
Overview
In this release we introduce multiple changes to the SANSA Stack to offer the SimE4KG functionalities. The content is structured as follows:
- Release
- Databricks Notebooks
- ReadMe of novel Modules
- Novel Classes
- Unit Tests
- Datasets
- Further Reading
Release
The changes are available within this release here.
SimE4KG Databricks Notebook
To showcase the usage of the SimE4KG modules in a hands-on session, we provide multiple Databricks notebooks. They cover the full pipeline as well as dedicated parts such as the SmartFeatureExtractor. The notebooks combine explanations, sample code, and the output of the code snippets. With them you can reproduce the functionality in your browser without installing the framework locally. The notebooks can be found here:
- SimE4KG Databricks Notebook for sample pipeline building including outputs
- SmartFeatureExtractor Databricks Notebook for multi-modal feature extraction with the novel SmartFeatureExtractor
Use Cases
We developed sample pipelines implementing use cases that are based on the downstream pipelines of SimE4KG, such as:
- SimE4KG Recommendation (in progress)
- SimE4KG Classification (in progress)
- SimE4KG Entity Alignment (in progress)
ReadMe
The novel modules of SimE4KG are documented within the SANSA ML ReadMe. For quick access to the high-level SimE4KG Transformer and the SmartFeatureExtractor, use these two links:
- SimE4KG/Dasim Transformer ReadMe, which covers the high-level similarity estimation transformer that calls the entire pipeline
- SmartFeatureExtractor ReadMe, which covers the newly developed generic multi-modal feature extractor transformer
Novel Classes
The novel classes developed within this release are mainly the DasimTransformer and the SmartFeatureExtractor, together with the corresponding unit tests and the evaluation scripts for testing module performance:
- DasimTransformer Class Unit Test
- Smart Feature Extractor Class Unit Test
- Evaluation classes such as data size scalability, feature availability evaluation, SmartFeatureExtractor evaluation and many more …
Datasets
As a starting point for experimenting with this framework, we recommend the Linked Movie Database RDF knowledge graph. This KG describes movies in millions of triples and contains multi-modal features: lists of URIs such as the actors of a movie, numeric features such as the runtime, and timestamp data such as the release date. For unit tests, we also suggest an extract of this data, which follows the same schema.
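To make the idea of multi-modal features more concrete, here is a minimal Python sketch. It is not the SANSA or SimE4KG API. The triples, the `ex:` URIs, and the helper names `collect_features` and `jaccard` are hypothetical and invented for illustration, and Jaccard similarity is used here only as one simple example of comparing URI lists.

```python
from collections import defaultdict
from datetime import date

# Hypothetical triples (subject, predicate, object). The URIs and values are
# invented for illustration and are not taken from the actual dataset.
TRIPLES = [
    ("ex:movie1", "ex:actor", "ex:actorA"),
    ("ex:movie1", "ex:actor", "ex:actorB"),
    ("ex:movie1", "ex:runtime", 120),
    ("ex:movie1", "ex:releaseDate", date(2001, 5, 17)),
    ("ex:movie2", "ex:actor", "ex:actorB"),
    ("ex:movie2", "ex:actor", "ex:actorC"),
    ("ex:movie2", "ex:runtime", 95),
    ("ex:movie2", "ex:releaseDate", date(1999, 11, 3)),
]


def collect_features(triples):
    """Group objects by subject and predicate into one feature map per entity."""
    features = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        features[subject][predicate].append(obj)
    return {s: dict(p) for s, p in features.items()}


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


features = collect_features(TRIPLES)

# Compare the URI-list feature (actors) of the two sample movies.
actor_similarity = jaccard(
    features["ex:movie1"]["ex:actor"],
    features["ex:movie2"]["ex:actor"],
)
print(features["ex:movie1"])
print(actor_similarity)
```

In this sample, each movie ends up with one list-of-URIs feature, one numeric feature, and one date feature. The two movies share one of three distinct actors, so the actor similarity is 1/3.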
Further Reading
If you are interested in further reading and background information on related modules, we recommend the following papers:
- Distributed semantic analytics using the SANSA stack
- Sparklify: A Scalable Software Component for Efficient Evaluation of SPARQL Queries over Distributed RDF Datasets
- DistSim - Scalable Distributed in-Memory Semantic Similarity Estimation for RDF Knowledge Graphs
- DistRDF2ML - Scalable Distributed In-Memory Machine Learning Pipelines for RDF Knowledge Graphs
Other
- In addition, we provide the full jar of this version below
DistRDF2ML
Release
The changes are available within this release here.
Docs
The documentation with sample code snippets is available within the SANSA ML ReadMe, which includes:
- Literal2Feature - AutoSparql Generation for Feature Extraction
- SparqlFrame Feature Extractor
- Smart Vector Assembler
- ML2Graph
- DistAD
Code to Modules:
This release mainly provides the following modules:
DistSim ICSC Paper Documentation
The documentation in docs is available here. The respective similarity estimation models are in this GitHub directory, and further required utilities are here.
Code to Modules:
- Feature Extractor
- Similarity Estimation Modules
- Minimal Examples
- Evaluation of Experiment Class
- Metagraph Creation
DistAD ICSC Paper Documentation
The documentation in docs is available here. The modules are in this GitHub directory.