Expect the column to be in a specified language.
expect_column_values_are_in_language
This expectation level is EXPERIMENTAL
Contributors:
Tags:
nlp
hackathon
Metrics:
column_values.nonnull.unexpected_count
column_values.are_in_language.unexpected_count
table.row_count
column_values.are_in_language.unexpected_values
Description
Expect each value in the column to be text written in the specified language.
Args:
- column (str): The column name
- language (str): One of the following 97 ISO 639-1 language codes: af, am, an, ar, as, az, be, bg, bn, br, bs, ca, cs, cy, da, de, dz, el, en, eo, es, et, eu, fa, fi, fo, fr, ga, gl, gu, he, hi, hr, ht, hu, hy, id, is, it, ja, jv, ka, kk, km, kn, ko, ku, ky, la, lb, lo, lt, lv, mg, mk, ml, mn, mr, ms, mt, nb, ne, nl, nn, no, oc, or, pa, pl, ps, pt, qu, ro, ru, rw, se, si, sk, sl, sq, sr, sv, sw, ta, te, th, tl, tr, ug, uk, ur, vi, vo, wa, xh, zh, zu
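The two arguments above can be checked before they are handed to the Expectation. The following is a minimal sketch, not part of the Expectation itself: the helper name `build_expectation_kwargs` and the column name `review_text` are hypothetical, and only the `column` and `language` keys come from the Args list.

```python
# The 97 ISO 639-1 codes listed in the Args section.
SUPPORTED_LANGUAGES = frozenset(
    """
    af am an ar as az be bg bn br bs ca cs cy da de dz el en eo es et eu fa
    fi fo fr ga gl gu he hi hr ht hu hy id is it ja jv ka kk km kn ko ku ky
    la lb lo lt lv mg mk ml mn mr ms mt nb ne nl nn no oc or pa pl ps pt qu
    ro ru rw se si sk sl sq sr sv sw ta te th tl tr ug uk ur vi vo wa xh zh
    zu
    """.split()
)


def build_expectation_kwargs(column: str, language: str) -> dict:
    """Hypothetical helper: validate `language`, then return the kwargs.

    Raises ValueError if `language` is not one of the supported codes.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language!r}")
    return {"column": column, "language": language}


# Example: kwargs for a (hypothetical) column of English review text.
kwargs = build_expectation_kwargs("review_text", "en")
```

Validating the code up front gives a clear error for a typo such as `"eng"` instead of a silent run in which every row is reported as unexpected.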
Notes:
- Language identification uses the langid package.
- langid uses a custom, permissive LICENSE, suitable for commercial purposes.
- Results may be inaccurate for strings shorter than 50 characters.
- No confidence threshold has been set, so the language with the highest confidence is always selected, even if that confidence is low.
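The behaviour described in these notes can be sketched as follows. This is an illustration under stated assumptions, not the Expectation's actual implementation: the function `unexpected_values` is hypothetical, nulls are assumed to be skipped, and the classifier is passed in so that the logic can be exercised without langid installed. When no classifier is supplied, the sketch falls back to `langid.classify`, which returns a `(language, score)` pair.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A classifier maps a string to (language_code, score), as langid.classify does.
Classifier = Callable[[str], Tuple[str, float]]


def default_classifier() -> Classifier:
    """Return langid's classifier, imported lazily (pip install langid)."""
    import langid  # third-party dependency

    return langid.classify


def unexpected_values(
    values: Iterable[Optional[str]],
    language: str,
    classify: Optional[Classifier] = None,
) -> List[str]:
    """Hypothetical sketch: collect values whose top-ranked language differs.

    No confidence threshold is applied: the highest-scoring language is
    accepted as the prediction even when its score is low.
    """
    classify = classify or default_classifier()
    unexpected = []
    for value in values:
        if value is None:
            continue  # assumption: null values are not evaluated
        predicted, _score = classify(value)
        if predicted != language:
            unexpected.append(value)
    return unexpected
```

Because the score is discarded, short or ambiguous strings are classified just as firmly as long ones, which is why results for strings under 50 characters deserve extra scrutiny.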
Want to make your own Expectation or an improvement to this one?
We've put together some great how-to guides (including videos) on creating your own Expectations in a flash!
You can see those resources here: Contributor Resources