Details
-
Improvement
-
Resolution: Obsolete
-
Low
-
None
-
None
-
None
-
Version: 3.x
PHP Version:
Webserver:
Database:
Description
Hi,
I'm using the UTF-8 charset and I'm a bit unhappy with the eZSearchEngine::normalizeText method.
The first line of the method is
$text =& strToLower( $text );
This breaks up multi-byte UTF-8 characters, so one of the following lines,
$unicodeValueArray =& $codec->convertString( $text );
won't give a reasonable result.
Example:
If you have the German word "für" (a preposition), it is converted to "f" after running through normalizeText, so the character "r" is stripped off as well.
"Österreich" (= "Austria" in English) -> "terreich"
"Schönfärberei" (= "garment dyeing" in English) -> "schfberei"
At the moment this isn't a big problem, because the user's search string also runs through the same method. So if someone searches for "Österreich", they will find content that contains "Österreich".
But you also offer a wildcard search, and I think this could be a problem there, because the search may return too much (or unexpected) content.
Kind regards,
Emil.
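One possible way to avoid the problem described above is to lower-case with a multi-byte aware function instead of strToLower. The following is only a minimal sketch, not the fix that was applied in eZ Publish: lowerUtf8() is a hypothetical helper, and it assumes that the mbstring extension is loaded, that the input is valid UTF-8, and that the source file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper for illustration; not part of eZ Publish.
// Assumes the mbstring extension is loaded and $text is valid UTF-8.
function lowerUtf8( $text )
{
    // mb_strtolower() operates on whole characters of the given encoding,
    // so multi-byte sequences such as "Ö" or "ä" are not split into bytes.
    return mb_strtolower( $text, 'UTF-8' );
}

// The three example words from the report.
foreach ( array( 'für', 'Österreich', 'Schönfärberei' ) as $word )
{
    echo lowerUtf8( $word ), "\n";
}
```

With these assumptions the words come out as "für", "österreich" and "schönfärberei", so the later conversion step would receive intact UTF-8 input.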
Attachments
Issue Links
- relates to EZP-12428: strtolower() in eZSearchLog::addPhrase gives error for special char for e.g. ÅØÆ (Closed)