kgdata.wikipedia.datasets.easy_tables
Functions
easy_tables() – Tables that are easy to label automatically.
is_easy_table() – Determine if a table is easy or not.
Classes
EasyTests
- easy_tables() → Dataset[LinkedHTMLTable]
Tables that are easy to label automatically. Whether a table is easy is determined by kgdata.wikipedia.easy_table.is_easy_table().
- Return type:
Dataset[LinkedHTMLTable]
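As a rough illustration of what easy_tables() does conceptually (keep only the tables that a predicate accepts), here is a minimal sketch that does not use kgdata. ToyTable and filter_easy are hypothetical stand-ins for LinkedHTMLTable and the Dataset pipeline; whether the real Dataset evaluates lazily is an assumption, not something stated above.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


@dataclass
class ToyTable:
    """Hypothetical stand-in for LinkedHTMLTable: an id and its rows."""
    id: str
    rows: List[List[str]] = field(default_factory=list)


def filter_easy(
    tables: Iterable[ToyTable], is_easy: Callable[[ToyTable], bool]
) -> Iterator[ToyTable]:
    """Yield only the tables accepted by is_easy (hypothetical helper).

    Being a generator, it inspects each table only when the caller asks
    for the next result.
    """
    for tbl in tables:
        if is_easy(tbl):
            yield tbl


tables = [ToyTable("a", [["x"]] * 12), ToyTable("b", [["y"]] * 3)]
# Example predicate: keep tables with at least 10 rows.
print([t.id for t in filter_easy(tables, lambda t: len(t.rows) >= 10)])  # ['a']
```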
- is_easy_table(tbl: LinkedHTMLTable, tests: List[Callable[[LinkedHTMLTable], bool]]) → bool
Determine if a table is easy or not.
- Parameters:
tbl (LinkedHTMLTable) – Input table.
tests (List[Callable[[LinkedHTMLTable], bool]]) – List of tests. Each test is a function that takes a table and returns a boolean.
- Return type:
bool
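A minimal sketch of how a list of tests could be combined, assuming a table counts as easy only when it passes every test (the docstring does not say whether all tests or any test must pass). is_easy_table_sketch is a hypothetical name, not the kgdata function, and the toy "table" here is just a list of rows.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def is_easy_table_sketch(tbl: T, tests: List[Callable[[T], bool]]) -> bool:
    """Return True only if tbl passes every test (assumed semantics).

    all() stops at the first failing test, so later tests are not run.
    """
    return all(test(tbl) for test in tests)


# Toy "table": a list of rows; both checks are illustrative only.
rows = [["cell"]] * 12
checks = [lambda t: len(t) >= 10, lambda t: all(len(r) == 1 for r in t)]
print(is_easy_table_sketch(rows, checks))  # True
```

Under this all-must-pass reading, an empty tests list accepts every table, because all() of an empty iterable is True.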
- get_n_headers(tbl: LinkedHTMLTable) → int
- Parameters:
tbl (LinkedHTMLTable) – Input table.
- Return type:
int
- class EasyTests
Bases: object
- MIN_ROWS = 10
- MIN_FREQ_LINKS = 0.7
- MIN_LINK_SURFACE = 0.9
- MIN_EXISTING_LINKS = 0.8
- static min_rows(tbl: LinkedHTMLTable) → bool
Determine if a table has at least MIN_ROWS rows.
- Parameters:
tbl (LinkedHTMLTable) – Input table.
- Return type:
bool
- static min_link_coverage_all_columns(tbl: LinkedHTMLTable) → bool
- Parameters:
tbl (LinkedHTMLTable) – Input table.
- Return type:
bool
- static min_links_all_columns(tbl: LinkedHTMLTable) → bool
- Parameters:
tbl (LinkedHTMLTable) – Input table.
- Return type:
bool
- static single_links_all_columns(tbl: LinkedHTMLTable) → bool
- Parameters:
tbl (LinkedHTMLTable) – Input table.
- Return type:
bool
- static min_existing_links_all_columns(tbl: LinkedHTMLTable) → bool
- Parameters:
tbl (LinkedHTMLTable) – Input table.
- Return type:
bool
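The class constants suggest that each static test compares some per-table or per-column quantity against a threshold. The sketch below models one possible reading on a toy table in which every cell holds a list of link targets, with None standing for a link to a non-existing page. Everything here is an assumption inferred from the names alone: the Toy* classes, the ratios computed, the pairing of MIN_FREQ_LINKS with min_links_all_columns and of MIN_EXISTING_LINKS with min_existing_links_all_columns, and the rule that link-free columns are skipped. min_link_coverage_all_columns (MIN_LINK_SURFACE) is omitted because its notion of coverage is not described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyCell:
    """Hypothetical cell: its text plus the targets of the links it contains."""
    text: str
    # Each entry is a linked entity id, or None for a link to a missing page.
    links: List[Optional[str]] = field(default_factory=list)


@dataclass
class ToyTable:
    """Hypothetical stand-in for LinkedHTMLTable (rectangular, no spans)."""
    rows: List[List[ToyCell]]

    def columns(self) -> List[List[ToyCell]]:
        return [list(col) for col in zip(*self.rows)]


class ToyEasyTests:
    # Same values as EasyTests; how each one is used below is an assumption.
    MIN_ROWS = 10
    MIN_FREQ_LINKS = 0.7
    MIN_EXISTING_LINKS = 0.8

    @staticmethod
    def min_rows(tbl: ToyTable) -> bool:
        return len(tbl.rows) >= ToyEasyTests.MIN_ROWS

    @staticmethod
    def min_links_all_columns(tbl: ToyTable) -> bool:
        # Assumed: in every column, the share of cells with at least one
        # link must reach MIN_FREQ_LINKS.
        return all(
            sum(1 for c in col if c.links) / len(col) >= ToyEasyTests.MIN_FREQ_LINKS
            for col in tbl.columns()
        )

    @staticmethod
    def single_links_all_columns(tbl: ToyTable) -> bool:
        # Assumed: no cell in any column carries more than one link.
        return all(len(c.links) <= 1 for col in tbl.columns() for c in col)

    @staticmethod
    def min_existing_links_all_columns(tbl: ToyTable) -> bool:
        # Assumed: in every column that has links, the share of links that
        # point to an existing entity must reach MIN_EXISTING_LINKS.
        for col in tbl.columns():
            links = [target for c in col for target in c.links]
            if not links:
                continue  # columns without links are not judged by this test
            existing = sum(1 for target in links if target is not None)
            if existing / len(links) < ToyEasyTests.MIN_EXISTING_LINKS:
                return False
        return True


# A test list in the shape is_easy_table() expects: callables from table to bool.
TOY_TESTS: List[Callable[[ToyTable], bool]] = [
    ToyEasyTests.min_rows,
    ToyEasyTests.min_links_all_columns,
    ToyEasyTests.single_links_all_columns,
    ToyEasyTests.min_existing_links_all_columns,
]
```

In this sketch each test is a staticmethod that reads a class constant, so it can be passed around as a plain callable, which fits the tests parameter of is_easy_table().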