Module:data consistency check/doc
This is the documentation page for Module:data consistency check.
This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.
Output
Module:etymology languages/data
- Bashkardi language (bsg-bas) has a canonical name that is not unique; it is also used by the code bsg.
- Rudbari language (rdb-rud) has a canonical name that is not unique; it is also used by the code rdb.
- Chali language (tks-cal) has a canonical name that is not unique; it is also used by the code tgf.
- Middle Iranian family (ira-mid) has no child families or languages.
- Old Iranian family (ira-old) has no child families or languages.
- Blissymbols script (Blis) is not used by any language and has no characters listed for auto-detection.
- Cypro-Minoan script (Cpmn) is not used by any language.
- Hieratic script (Egyh) is not used by any language and has no characters listed for auto-detection.
- Elymaic script (Elym) is not used by any language.
- Hiragana script (Hira) is not used by any language.
- Nyiakeng Puachue Hmong script (Hmnp) is not used by any language.
- Kana script (Hrkt) is not used by any language.
- Image-rendered script (Imag) is not used by any language and has no characters listed for auto-detection.
- International Phonetic Alphabet script (Ipach) is not used by any language and has no characters listed for auto-detection.
- Kpelle script (Kpel) is not used by any language and has no characters listed for auto-detection.
- Loma script (Loma) is not used by any language and has no characters listed for auto-detection.
- Moon script (Moon) is not used by any language and has no characters listed for auto-detection.
- Morse code (Morse) is not used by any language and has no characters listed for auto-detection.
- Musical notation script (Music) is not used by any language.
- Nag Mundari script (Nagm) is not used by any language.
- Unspecified script (None) is not used by any language and has no characters listed for auto-detection.
- Rongorongo script (Roro) is not used by any language and has no characters listed for auto-detection.
- Rumi numerals script (Rumin) is not used by any language.
- flag semaphore (Semap) is not used by any language and has no characters listed for auto-detection.
- Visible Speech script (Visp) is not used by any language and has no characters listed for auto-detection.
- Vithkuqi script (Vith) is not used by any language.
- Woleai script (Wole) is not used by any language and has no characters listed for auto-detection.
- Yezidi script (Yezi) is not used by any language.
- mathematical notation script (Zmth) is not used by any language.
- symbol script (Zsym) is not used by any language.
- undetermined script (Zyyy) is not used by any language and has no characters listed for auto-detection.
- uncoded script (Zzzz) is not used by any language and has no characters listed for auto-detection.
- The data key sort_by_scraping for Japanese script (Jpan) is invalid.
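The family and script messages in this output correspond to two of the rules listed under "Checks performed": every family needs at least one language or subfamily, and every script needs at least one language that uses it plus a characters pattern. The module itself is written in Lua; the following is only a minimal Python sketch of how such messages could be derived, assuming the data tables are represented as plain dictionaries. The function name report_unused, the message wording, and the sample codes in the usage note are hypothetical.

```python
def report_unused(families, scripts, languages):
    """Collect messages for childless families and unused scripts.

    All three arguments are dicts keyed by code. This mirrors the
    documented rules only loosely; it is not the module's actual logic.
    """
    used_families = set()
    used_scripts = set()

    # A language marks its family (field 3 or "family") and its scripts
    # (field 4 or "scripts") as used.
    for data in languages.values():
        family = data.get(3) or data.get("family")
        if family:
            used_families.add(family)
        used_scripts.update(data.get(4) or data.get("scripts") or [])

    # A subfamily marks its parent family as used.
    for code, data in families.items():
        parent = data.get("family")
        if parent and parent != code:
            used_families.add(parent)

    messages = []
    for code in sorted(families):
        if code not in used_families:
            name = families[code]["canonicalName"]
            messages.append(f"{name} ({code}) has no child families or languages.")

    for code in sorted(scripts):
        issues = []
        if code not in used_scripts:
            issues.append("is not used by any language")
        if not scripts[code].get("characters"):
            issues.append("has no characters listed for auto-detection")
        if issues:
            name = scripts[code]["canonicalName"]
            messages.append(f"{name} ({code}) " + " and ".join(issues) + ".")
    return messages
```

For example, with a made-up family fam-b that no language or subfamily refers to, the function returns a message of the form "Beta (fam-b) has no child families or languages."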
Checks performed
For multiple data modules:
- Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
- Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
- Each name in the list of other names must appear only once.
- otherNames, if present, must be an array.
- Wikidata item IDs must be a positive integer or a string starting with Q and ending with decimal digits.
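The Wikidata item ID rule can be illustrated with a short Python sketch. The module itself is written in Lua, so this is an illustration of the rule rather than the module's code. The function name is hypothetical, and the sketch assumes the stricter reading that nothing but decimal digits may follow the Q.

```python
import re


def is_valid_wikidata_id(value):
    """Return True for a positive integer or a string such as "Q1234"."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return value > 0
    if isinstance(value, str):
        # Assumption: "Q" followed only by one or more decimal digits.
        return re.fullmatch(r"Q[0-9]+", value) is not None
    return False
```

Under these assumptions, 42 and "Q42" are accepted, while 0, "42" and "Q" are rejected.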
The following must be true of the data used by Module:languages:
- Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
- The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
- If field 2 is not nil, it must be a valid Wikidata item ID.
- If field 3 or family is given and not nil, it must be a valid family code.
- If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
- If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
- If family is given, it must be a valid family code.
- If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
- If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
- If sort_key is given, it may be either a string or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
- If entry_name or sort_key is given, the from array must be longer than or equal in length to the to array.
- If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
- If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
- If link_tr is present, it must be true.
- Have no data keys besides these:
1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
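A few of the per-language rules above can be sketched as follows. This is a hedged Python illustration, not the module's Lua code: the function name check_language, the message wording, and the representation of a Lua data table as a Python dict with integer and string keys are all assumptions. Only the allowed-key list is taken from this page.

```python
ALLOWED_KEYS = {
    1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases",
    "varieties", "type", "scripts", "ancestors", "wikimedia_codes",
    "wikipedia_article", "standardChars", "translit", "override_translit",
    "link_tr",
}


def check_language(code, data):
    """Return a list of problems found in one language's data table."""
    problems = []

    # No data keys besides the documented ones.
    for key in data:
        if key not in ALLOWED_KEYS:
            problems.append(f"{code}: invalid data key {key!r}")

    # The canonical name (field 1) must be present.
    if not data.get(1):
        problems.append(f"{code}: canonical name is missing")

    # override_translit requires translit.
    if data.get("override_translit") and not data.get("translit"):
        problems.append(f"{code}: override_translit is set without translit")

    # link_tr, if present, must be exactly true.
    if "link_tr" in data and data["link_tr"] is not True:
        problems.append(f"{code}: link_tr must be true")

    # In entry_name and sort_key tables, "from" must be at least as long as "to".
    for key in ("entry_name", "sort_key"):
        value = data.get(key)
        if isinstance(value, dict) and "from" in value and "to" in value:
            if len(value["from"]) < len(value["to"]):
                problems.append(f"{code}: {key}.from is shorter than {key}.to")

    return problems
```

A table that passes these particular checks yields an empty list; every violated rule adds one message. Checks that need other tables, such as validating family or script codes, are left out of the sketch.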
Checks not performed:
- If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
- If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
- If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).
These are not checked here, because module errors will quickly crop up in entries if these conditions are not met, assuming that Module:utilities attempts to generate a sortkey for a category pertaining to the language in question, or full_link attempts to use the transliteration module.
Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
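The "all the codes and canonical names, and no more" requirement amounts to a two-way comparison between the data submodules and the two lookup tables. The sketch below shows one way to express it in Python. It assumes, without confirmation from this page, that the first lookup table maps codes to names and the second maps names to codes; the function name and message wording are hypothetical.

```python
def check_lookup_tables(languages, code_to_name, name_to_code):
    """Compare the lookup tables against the language data.

    languages maps each code to its data table, whose field 1 is the
    canonical name. Returns a list of discrepancies.
    """
    problems = []
    expected = {code: data[1] for code, data in languages.items()}

    # Every code and canonical name in the data must appear in the lookups.
    for code, name in sorted(expected.items()):
        if code_to_name.get(code) != name:
            problems.append(f"code_to_name is missing or wrong for {code}")
        if name_to_code.get(name) != code:
            problems.append(f"name_to_code is missing or wrong for {name}")

    # The lookups must not contain anything beyond the data ("and no more").
    for code in sorted(set(code_to_name) - set(expected)):
        problems.append(f"code_to_name has extra code {code}")
    for name in sorted(set(name_to_code) - set(expected.values())):
        problems.append(f"name_to_code has extra name {name}")

    return problems
```

An empty result means the lookup tables and the data submodules agree in both directions.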
The following must be true of the data used by Module:etymology languages:
- canonicalName must be given.
- parent must be given and must be a valid language, family or etymology-only language code.
- If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
- Have no data keys besides these:
"canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".
Codes in Module:families data must:
- Have canonicalName, which must not be the same as the canonical name of another family.
- If family is given, it must be a valid family code.
- Have at least one language or subfamily belonging to it.
- Have no data keys besides these:
"canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
Codes in Module:scripts data must:
- Have canonicalName.
- Have at least one language that lists it as one of its scripts.
- Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
- Have no data keys besides these:
"canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".