javascript - Regular expression for Spanish and Arabic words
How can I write a regular expression that matches valid Spanish and Arabic words?
In English I know I can use a-zA-Z, in Hebrew א-ת, and in Russian А-Яа-яёЁ.
I'm using JavaScript.
The range a-zA-Z for English words is unacceptably simple and naïve. It leaves out all manner of letters with accents and other special marks used in loan words, etc. For instance, it won't match the word "naïve" from my first sentence. Use the \p{Latin} script instead.
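A minimal sketch of the difference follows. It assumes an engine with ES2018 Unicode property escapes (current browsers and Node.js), so it uses the native \p{Script=Latin} syntax with the u flag instead of XRegExp's \p{Latin}; the variable names are illustrative only.

```javascript
// Letters A-Z and a-z only: the ASCII range criticized above.
const asciiWord = /^[a-zA-Z]+$/;

// One or more code points whose Unicode Script property is Latin.
// Needs the u flag and an ES2018+ engine.
const latinWord = /^\p{Script=Latin}+$/u;

const word = "na\u00EFve"; // "naïve" with a precomposed ï (U+00EF)

console.log(asciiWord.test("naive")); // true
console.log(asciiWord.test(word));    // false: ï is outside a-zA-Z
console.log(latinWord.test(word));    // true: ï is a Latin-script letter
```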
The range א-ת for Hebrew words is wrong. It leaves out Hebrew presentation forms, cantillation marks, Yiddish digraphs, and more. Use the \p{Hebrew} script instead.
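The sketch below compares the hardcoded range with the Hebrew script on pointed (niqqud) text and on a Yiddish ligature. It assumes an ES2018+ engine and uses the native \p{Script=Hebrew} escape with the u flag rather than XRegExp; the sample strings are written as escapes so the code points are unambiguous.

```javascript
// The hardcoded range from the question: U+05D0 (א) to U+05EA (ת),
// i.e. the 22 letters plus the 5 final forms, and nothing else.
const naiveHebrew = /^[\u05D0-\u05EA]+$/;
// Every code point whose Script property is Hebrew (ES2018+, u flag).
const hebrewWord = /^\p{Script=Hebrew}+$/u;

const unpointed = "\u05E9\u05DC\u05D5\u05DD"; // שלום
// שָׁלוֹם: shin, qamats, shin dot, lamed, vav, holam, final mem
const pointed = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD";
const yiddishLigature = "\u05F0"; // HEBREW LIGATURE YIDDISH DOUBLE VAV

console.log(naiveHebrew.test(unpointed));       // true
console.log(naiveHebrew.test(pointed));         // false: the vowel points fall outside the range
console.log(hebrewWord.test(pointed));          // true
console.log(naiveHebrew.test(yiddishLigature)); // false
console.log(hebrewWord.test(yiddishLigature));  // true
```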
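The range А-Яа-яёЁ for Russian is again incomplete and wrong for Cyrillic text in general: it leaves out, for example, the Ukrainian letters і, ї, є, ґ and pre-1918 Russian letters such as ѣ. Use the \p{Cyrillic} script instead.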
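A short comparison, assuming an ES2018+ engine (native \p{Script=Cyrillic} with the u flag instead of XRegExp). The sample words, Ukrainian "їжак" and the pre-1918 spelling "вѣра", are chosen only to show letters the hardcoded range misses.

```javascript
// The hardcoded Russian range from the question.
const russianRange = /^[А-Яа-яёЁ]+$/;
// Every code point whose Script property is Cyrillic (ES2018+, u flag).
const cyrillicWord = /^\p{Script=Cyrillic}+$/u;

console.log(russianRange.test("привет")); // true
console.log(cyrillicWord.test("привет")); // true
console.log(russianRange.test("їжак"));   // false: ї (U+0457) is outside the range
console.log(cyrillicWord.test("їжак"));   // true
console.log(russianRange.test("вѣра"));   // false: ѣ (U+0463) is outside the range
console.log(cyrillicWord.test("вѣра"));   // true
```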
The Spanish alphabet uses the same 26 letters as English, plus ñ and Ñ. Again, don't hardcode these in a range: many Spanish words use accented vowels (á, é, í, ó, ú) and ü. Use the \p{Latin} script to match Spanish words. Note that such a regex won't distinguish Spanish from English.
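A sketch of a Spanish-friendly word test, assuming an ES2018+ engine; isLatinWord is a hypothetical helper name. It normalizes to NFC first because an accented vowel may arrive decomposed ("o" followed by U+0301 COMBINING ACUTE ACCENT), and combining marks have Script=Inherited rather than Script=Latin. As the paragraph says, English words pass too.

```javascript
// Hypothetical helper: true if `word` consists only of Latin-script code points.
// NFC folds a base letter plus combining accent (e.g. "o" + U+0301) into one
// precomposed letter ("ó"), since the combining mark alone is Script=Inherited.
function isLatinWord(word) {
  return /^\p{Script=Latin}+$/u.test(word.normalize("NFC"));
}

console.log(isLatinWord("año"));            // true
console.log(isLatinWord("cancio\u0301n"));  // true: decomposed ó, fixed by NFC
console.log(isLatinWord("pingüino"));       // true
console.log(isLatinWord("house"));          // true: English passes as well
console.log(isLatinWord("مرحبا"));          // false
```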
For Arabic, use the \p{Arabic} script.
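A sketch for Arabic, assuming an ES2018+ engine. One caveat worth knowing: the harakat (short-vowel marks such as fatha, U+064E) have Script=Inherited, so \p{Script=Arabic} alone rejects vocalized text. Matching on Script_Extensions=Arabic instead accepts them; this is a design choice layered on top of the advice above, not part of it.

```javascript
// Script=Arabic: the Arabic letters themselves. The harakat (U+064B..U+0652,
// e.g. fatha U+064E) have Script=Inherited, so vocalized words are rejected.
const arabicScript = /^\p{Script=Arabic}+$/u;

// Script_Extensions=Arabic also accepts marks whose Script_Extensions
// include Arabic, which covers the harakat.
const arabicWord = /^\p{Script_Extensions=Arabic}+$/u;

const plain = "\u0643\u062A\u0628";                       // كتب
const vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"; // كَتَبَ, a fatha after each letter

console.log(arabicScript.test(plain));     // true
console.log(arabicScript.test(vocalized)); // false
console.log(arabicWord.test(plain));       // true
console.log(arabicWord.test(vocalized));   // true
```

The trade-off is that Script_Extensions=Arabic may also admit a few characters shared with other scripts, such as the Arabic comma (U+060C), so split words on punctuation before testing them.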
JavaScript, regex, and Unicode
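You said you're using JavaScript. Unfortunately, older JavaScript engines have little built-in Unicode support in regular expressions. There, you need the XRegExp library and its Unicode add-ons, which allow you to use the Unicode scripts mentioned above in regular expressions. Engines that implement ES2018 or later support the same scripts natively through Unicode property escapes with the u flag, written as \p{Script=Latin}, \p{Script=Arabic}, and so on.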
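Whether you need XRegExp can be checked at runtime. The sketch below, with hypothetical helper names, tries to compile a native property escape (an ES2018 engine accepts it, an older one throws a SyntaxError) and builds a whole-word regex for a given script. The fallback branch only marks where XRegExp would come in; loading it is left out.

```javascript
// True if this engine supports Unicode property escapes (ES2018).
// Older engines throw a SyntaxError when compiling the pattern.
function supportsScriptEscapes() {
  try {
    new RegExp("\\p{Script=Latin}", "u");
    return true;
  } catch (err) {
    return false;
  }
}

// Hypothetical helper: a regex matching a whole word written in one
// Unicode script, e.g. scriptWordRegex("Arabic").
function scriptWordRegex(script) {
  if (!supportsScriptEscapes()) {
    // Older engines: use XRegExp with its Unicode add-ons instead,
    // e.g. XRegExp("^\\p{Arabic}+$").
    throw new Error("Unicode property escapes unsupported; fall back to XRegExp");
  }
  return new RegExp("^\\p{Script=" + script + "}+$", "u");
}

const spanishWord = scriptWordRegex("Latin");
const arabicWord = scriptWordRegex("Arabic");

console.log(spanishWord.test("canci\u00F3n"));                  // true ("canción")
console.log(arabicWord.test("\u0645\u0631\u062D\u0628\u0627")); // true ("مرحبا")
console.log(arabicWord.test("hola"));                           // false
```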
Scripts vs. blocks
Always favor Unicode scripts over Unicode blocks. Blocks match the code points of a particular script poorly: they leave out many important code points that fall outside their incomplete ranges, and they include many code points that have not been assigned a character. Scripts include all the relevant code points, and no more.
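The difference can be seen with the Hebrew block, which is the fixed range U+0590 to U+05FF. This sketch assumes an ES2018+ engine for the native \p{Script=Hebrew} escape. U+0590 is an unassigned code point inside the block, and U+FB2A (HEBREW LETTER SHIN WITH SHIN DOT) is a Hebrew presentation form that lives outside it, in the Alphabetic Presentation Forms block.

```javascript
// The Hebrew *block*: a fixed code point range.
const hebrewBlock = /^[\u0590-\u05FF]+$/;
// The Hebrew *script*: code points whose Script property is Hebrew.
const hebrewScript = /^\p{Script=Hebrew}+$/u;

const unassigned = "\u0590";       // reserved, no character assigned
const presentationForm = "\uFB2A"; // HEBREW LETTER SHIN WITH SHIN DOT

console.log(hebrewBlock.test(unassigned));        // true: inside the range
console.log(hebrewScript.test(unassigned));       // false: unassigned code points are Script=Unknown
console.log(hebrewBlock.test(presentationForm));  // false: outside the range
console.log(hebrewScript.test(presentationForm)); // true
```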