kgdata documentation

kgdata is a Python library for processing Wikipedia and Wikidata dumps. It can:

  • Clean up the dumps to ensure the data is consistent (resolve redirects, remove dangling references).

  • Create embedded key-value databases to access entities from the dumps.

  • Extract Wikidata ontology.

  • Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.

  • Create Pyserini indices to search Wikidata’s entities.
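
To make the clean-up step in the first item concrete, here is a minimal sketch of what resolving redirects and removing dangling references could look like on a toy in-memory dataset. It does not use kgdata's API: the function names `resolve_redirect` and `clean_references`, the dictionary layout, and the sample IDs are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def resolve_redirect(entity_id: str, redirects: Dict[str, str]) -> str:
    """Follow a redirect chain to its end, stopping if a cycle is detected."""
    seen = set()
    while entity_id in redirects and entity_id not in seen:
        seen.add(entity_id)
        entity_id = redirects[entity_id]
    return entity_id


def clean_references(
    entities: Dict[str, List[str]], redirects: Dict[str, str]
) -> Dict[str, List[str]]:
    """Rewrite each reference to its redirect target and drop dangling ones."""
    cleaned: Dict[str, List[str]] = {}
    for entity_id, refs in entities.items():
        resolved: List[str] = []
        for ref in refs:
            target = resolve_redirect(ref, redirects)
            # Keep the reference only if the target exists; skip duplicates.
            if target in entities and target not in resolved:
                resolved.append(target)
        cleaned[entity_id] = resolved
    return cleaned


# Hypothetical toy data: Q3 redirects to Q4, and Q9 does not exist.
entities = {"Q1": ["Q2", "Q3", "Q9"], "Q2": [], "Q4": ["Q1"]}
redirects = {"Q3": "Q4"}
cleaned = clean_references(entities, redirects)
print(cleaned)
```

In this sketch the reference from Q1 to Q3 is rewritten to Q4, and the reference to the missing Q9 is dropped. A real dump is far too large for in-memory dictionaries, which is where the embedded key-value databases mentioned above come in.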

Indices and tables