Documentation
This module provides the Scaladoc documentation. The module tree is accessible on the right-hand side. We also provide ReadMe files in the SANSA-Stack GitHub project. The GitHub project is available here.
Layers
The SANSA project is structured in the following five layers developed in their respective sub-folders:
- RDF readme, package
- OWL readme, package
- Query readme, package
- Inference readme, package
- Machine Learning readme, package
Paper Specific Documentation
For recently published papers, we provide a quick entry point to the corresponding modules here. All of these modules are also accessible through the standard documentation.
SimE4KG: Explainable Distributed multi-modal Semantic Similarity Estimation for Knowledge Graphs
This release includes the most recent developments of the SimE4KG framework. SimE4KG is the Explainable Distributed In-Memory multi-modal Semantic Similarity Estimation framework for Knowledge Graphs.
Overview
In this release, we introduce multiple changes to the SANSA Stack to offer the SimE4KG functionality. The content is structured as follows:
- Release
- Databricks Notebooks
- ReadMe of novel Modules
- Novel Classes
- Unit Tests
- Datasets
- Further Reading
Release
The changes are available in this release here.
SimE4KG Databricks Notebook
To showcase the usage of the SimE4KG modules in a hands-on session, we provide multiple Databricks notebooks. They show the full pipeline as well as dedicated parts such as the SmartFeatureExtractor. The notebooks combine explanations, sample code, and the output of the code snippets. With the notebooks, you can reproduce the functionality in your browser without installing the framework locally. The notebooks can be found here:
- SimE4KG Databricks Notebook for sample pipeline building, including outputs
- SmartFeatureExtractor Databricks Notebook for multi-modal feature extraction with the novel SmartFeatureExtractor
ReadMe
The novel modules of SimE4KG are documented in the SANSA ML ReadMe. For quick access to the high-level SimE4KG Transformer and the SmartFeatureExtractor, you can use these two links:
- SimE4KG/Dasim Transformer ReadMe, which covers the high-level similarity estimation transformer that calls the entire pipeline
- SmartFeatureExtractor ReadMe, which covers the newly developed generic multi-modal feature extractor transformer
Novel Classes
The novel classes developed in this release are primarily the Dasim Transformer and the SmartFeatureExtractor, together with the corresponding unit tests and the evaluation scripts used to test module performance:
- DasimTransformer Class Unit Test
- Smart Feature Extractor Class Unit Test
- Evaluation Classes such as data size scalability, feature availability evaluation, SmartFeatureExtractor evaluation and many more …
Datasets
As a starting point for experimenting with the developments of this framework, we recommend the Linked Movie Data Base RDF Knowledge Graph. This KG describes movies in millions of triples and contains multi-modal features: lists of URIs such as the list of actors, numeric features such as the runtime, and timestamp data such as the release date. For unit testing, we also provide an extract of this data that follows the same schema.
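The three feature types named above (a list of URIs, a numeric value, a date) each need a different similarity measure before they can be combined. The following is a minimal, library-free Python sketch of that idea. It is not the SimE4KG implementation and does not use the SANSA API. All names (`movie_a`, `jaccard`, `weights`, `value_range`, `max_days`), the example URIs, and the example values are assumptions chosen purely for illustration.

```python
from datetime import date

# Hypothetical feature records; values are illustrative, not taken from the dataset.
movie_a = {
    "actors": {"ex:actor/1", "ex:actor/2", "ex:actor/3"},
    "runtime": 120.0,
    "release_date": date(2000, 1, 1),
}
movie_b = {
    "actors": {"ex:actor/2", "ex:actor/3", "ex:actor/4"},
    "runtime": 90.0,
    "release_date": date(2005, 1, 1),
}


def jaccard(a, b):
    """Set overlap for URI-list features: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def numeric_similarity(x, y, value_range):
    """1 minus the absolute difference scaled by an assumed value range."""
    return max(0.0, 1.0 - abs(x - y) / value_range)


def date_similarity(d1, d2, max_days):
    """1 minus the distance in days scaled by an assumed maximum distance."""
    return max(0.0, 1.0 - abs((d1 - d2).days) / max_days)


def movie_similarity(a, b, weights):
    """Return the weighted overall score and the per-feature scores."""
    per_feature = {
        "actors": jaccard(a["actors"], b["actors"]),
        "runtime": numeric_similarity(a["runtime"], b["runtime"], value_range=300.0),
        "release_date": date_similarity(a["release_date"], b["release_date"], max_days=36500),
    }
    total_weight = sum(weights.values())
    overall = sum(weights[name] * score for name, score in per_feature.items()) / total_weight
    return overall, per_feature


# Assumed feature weights; any non-negative weighting would work.
weights = {"actors": 0.5, "runtime": 0.25, "release_date": 0.25}
overall, per_feature = movie_similarity(movie_a, movie_b, weights)
print(per_feature)
print(round(overall, 3))
```

Returning the per-feature scores next to the aggregate shows which feature contributed how much to the final value, which is one simple way to make a similarity score explainable.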
Further Reading
If you are interested in further reading and background information on other related modules, we recommend the following papers:
- Distributed semantic analytics using the SANSA stack
- Sparklify: A Scalable Software Component for Efficient Evaluation of SPARQL Queries over Distributed RDF Datasets
- DistSim - Scalable Distributed in-Memory Semantic Similarity Estimation for RDF Knowledge Graphs
- DistRDF2ML - Scalable Distributed In-Memory Machine Learning Pipelines for RDF Knowledge Graphs
Other
- In addition, we provide the full jar of this version below
DistRDF2ML
Release
The changes are available in this release here.
Docs
The documentation, with sample code snippets, is available in the SANSA ML ReadMe, which includes:
- Literal2Feature - AutoSparql Generation for Feature Extraction
- SparqlFrame Feature Extractor
- Smart Vector Assembler
- ML2Graph
- DistAD
Code to Modules:
This release mainly provides the modules:
DistSim ICSC Paper Documentation
The documentation in docs is available here. The respective similarity estimation models are in this GitHub directory, and further required utilities are here.
Code to Modules:
- Feature Extractor
- Similarity Estimation Modules
- Minimal Examples
- Evaluation of Experiment Class
- Metagraph Creation
DistAD ICSC Paper Documentation
The documentation in docs is available here. The modules are in this GitHub directory.