kgdata documentation
kgdata is a Python library for processing Wikipedia and Wikidata dumps. It can:
Clean up the dumps so that the data is consistent (resolve redirects, remove dangling references).
Create embedded key-value databases to access entities from the dumps.
Extract Wikidata ontology.
Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.
Create Pyserini indices to search Wikidata’s entities.
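
To make the clean-up step more concrete, here is a minimal sketch of what resolving redirects and removing dangling references can mean. It uses plain Python dictionaries and does not use kgdata's API. The function names (resolve_redirect, clean_references) and the toy data layout are assumptions made for illustration only.

```python
from typing import Dict, List


def resolve_redirect(entity_id: str, redirects: Dict[str, str]) -> str:
    """Follow a redirect chain to its final target (hypothetical helper).

    The `seen` set guards against redirect cycles.
    """
    seen = set()
    while entity_id in redirects and entity_id not in seen:
        seen.add(entity_id)
        entity_id = redirects[entity_id]
    return entity_id


def clean_references(
    entities: Dict[str, List[str]], redirects: Dict[str, str]
) -> Dict[str, List[str]]:
    """Return a copy of `entities` with consistent references.

    Assumed layout: `entities` maps an entity ID to the IDs it references,
    and `redirects` maps an old ID to the ID it redirects to.
    """
    cleaned: Dict[str, List[str]] = {}
    for entity_id, targets in entities.items():
        kept: List[str] = []
        for target in targets:
            target = resolve_redirect(target, redirects)
            # Drop dangling references (targets that are not in the dump)
            # and duplicates created by resolving redirects.
            if target in entities and target not in kept:
                kept.append(target)
        cleaned[entity_id] = kept
    return cleaned


# Toy data: Q3 redirects to Q4, and Q9 does not exist in the dump.
entities = {"Q1": ["Q2", "Q3", "Q9"], "Q2": ["Q1"], "Q4": []}
redirects = {"Q3": "Q4"}

cleaned = clean_references(entities, redirects)
print(cleaned)  # {'Q1': ['Q2', 'Q4'], 'Q2': ['Q1'], 'Q4': []}
```

In this toy example the reference to Q3 is rewritten to Q4, and the reference to the missing Q9 is dropped. A real dump is far too large for in-memory dictionaries, which is presumably where the embedded key-value databases listed above come in.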