C# - Regex Latin characters filter and non-Latin characters filter


I am developing a program where I need to filter out words and sentences that contain non-Latin characters. The problem is that I can find words and sentences made up only of Latin characters, but I cannot find words and sentences that mix Latin and non-Latin characters. For example, "hello" is a Latin-letter word, and I can match it using this code:

Match match = Regex.Match(line.line, @"[^\u0000-\u007F]+", RegexOptions.IgnoreCase);
if (match.Success)
{
    line.line = match.Groups[1].Value;
}

But I cannot find, for example, a word or sentence that mixes in non-Latin letters: "hellø sømthing".

Also, could someone explain what RegexOptions.None and RegexOptions.IgnoreCase stand for?

There are four "Latin" blocks (from http://www.fileformat.info/info/unicode/block/index.htm):

Basic Latin: U+0000 - U+007F

Latin-1 Supplement: U+0080 - U+00FF

Latin Extended-A: U+0100 - U+017F

Latin Extended-B: U+0180 - U+024F

So a regex that "includes" all of them would be:

Regex.Match(line.line, @"[\u0000-\u024F]+", RegexOptions.None);

while a regex that catches everything outside those blocks would be:

Regex.Match(line.line, @"[^\u0000-\u024F]+", RegexOptions.None);
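As a rough sketch of how the two patterns above behave on the question's sample text, the following wraps them in a small helper class. The class and method names (`LatinBlockFilter`, `ContainsNonLatin`, and so on) are made up for illustration and are not part of the original code. Note that with the question's ASCII-only range (`\u0000-\u007F`) the letter ø (U+00F8) counts as "non-Latin", while with the wider range it does not. The sketch also reads `match.Value` rather than `match.Groups[1].Value`, because the pattern has no capturing group and group 1 would always be empty.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original post.
public static class LatinBlockFilter
{
    // One or more characters inside the four Latin blocks (U+0000 - U+024F).
    private static readonly Regex InsideLatinBlocks =
        new Regex(@"[\u0000-\u024F]+", RegexOptions.None);

    // One or more characters outside those blocks.
    private static readonly Regex OutsideLatinBlocks =
        new Regex(@"[^\u0000-\u024F]+", RegexOptions.None);

    // True when the text has at least one character outside the Latin blocks.
    public static bool ContainsNonLatin(string text) =>
        OutsideLatinBlocks.IsMatch(text);

    // First run of characters inside the Latin blocks, or "" if there is none.
    public static string FirstLatinRun(string text)
    {
        Match match = InsideLatinBlocks.Match(text);
        return match.Success ? match.Value : string.Empty;
    }

    // First run of characters outside the Latin blocks, or "" if there is none.
    public static string FirstNonLatinRun(string text)
    {
        Match match = OutsideLatinBlocks.Match(text);
        // No capturing group in the pattern, so use match.Value (group 0).
        return match.Success ? match.Value : string.Empty;
    }

    public static void Main()
    {
        // ø is U+00F8 (Latin-1 Supplement), so nothing falls outside the blocks.
        Console.WriteLine(ContainsNonLatin("hellø sømthing")); // False
        // Cyrillic letters lie outside U+0000 - U+024F.
        Console.WriteLine(ContainsNonLatin("hello мир"));      // True
    }
}
```

Whether a text "passes" then depends on which side of the filter is wanted: `ContainsNonLatin` rejects mixed text, while the `First...Run` methods extract the matching part.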

Note that I feel doing a regex "by block" is a little wrong, especially when you use the Latin blocks, because for example the Basic Latin block contains control characters (like new line, ...), letters (A-Z, a-z), numbers (0-9), punctuation (.,;:...), other characters ($@/&...) and so on.
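One possible way around that concern, sketched here as an assumption rather than as part of the original answer, is to keep the block range but subtract everything that is not a letter. .NET supports character class subtraction (`[base-[excluded]]`) and the Unicode category `\P{L}` ("not a letter"), so `[\u0000-\u024F-[\P{L}]]` matches only letters that live in the four blocks. The class and method names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original post.
public static class LatinLetterFilter
{
    // Characters in U+0000 - U+024F, minus everything that is not a letter.
    private const string LatinLetter = @"[\u0000-\u024F-[\P{L}]]";

    // True when the whole string consists of one or more letters from the blocks.
    public static bool IsLatinWord(string word) =>
        Regex.IsMatch(word, @"\A" + LatinLetter + @"+\z");

    // Extracts every run of Latin-block letters from the text.
    public static List<string> LatinWords(string text)
    {
        var words = new List<string>();
        foreach (Match match in Regex.Matches(text, LatinLetter + "+"))
        {
            words.Add(match.Value);
        }
        return words;
    }

    public static void Main()
    {
        Console.WriteLine(IsLatinWord("hellø"));   // True: every character is a letter in the blocks
        Console.WriteLine(IsLatinWord("hello1"));  // False: '1' is a digit, not a letter
        Console.WriteLine(string.Join(",", LatinWords("hellø sømthing 123")));
    }
}
```

This still treats a few non-Latin-script letters that happen to sit in those blocks (for example the micro sign) as matches, so it narrows the problem rather than removing it.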

As for the meaning of RegexOptions.None and RegexOptions.IgnoreCase:

  • Their names are quite clear

  • You could try googling them on MSDN

From https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions.aspx:

RegexOptions.None: Specifies that no options are set.

RegexOptions.IgnoreCase: Specifies case-insensitive matching.

The last one means that Regex.Match(line.line, @"ABC", RegexOptions.IgnoreCase) will match ABC, Abc, abc, ... The option also works on character ranges, so [A-Z] will match both A-Z and a-z. Note that it is useless in this case, because the blocks I suggested should already contain both the uppercase and the lowercase "variation" of every letter that has both forms.
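The difference between the two options can be checked with a small sketch like the one below. The class and method names are invented for illustration; only `Regex.IsMatch` and the `RegexOptions` values come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not from the original post.
public static class IgnoreCaseDemo
{
    // Does the input contain the literal pattern ABC under the given options?
    public static bool MatchesAbc(string input, RegexOptions options) =>
        Regex.IsMatch(input, @"ABC", options);

    // Does the whole input consist of characters matched by the range [A-Z]?
    public static bool IsAllInAToZ(string input, RegexOptions options) =>
        Regex.IsMatch(input, @"\A[A-Z]+\z", options);

    public static void Main()
    {
        Console.WriteLine(MatchesAbc("abc", RegexOptions.None));         // False
        Console.WriteLine(MatchesAbc("abc", RegexOptions.IgnoreCase));   // True
        Console.WriteLine(IsAllInAToZ("abc", RegexOptions.None));        // False
        Console.WriteLine(IsAllInAToZ("abc", RegexOptions.IgnoreCase));  // True
    }
}
```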

