Details

Type: Improvement
Resolution: Obsolete
Priority: Low
Fix Version/s: None
Affects Version/s: None
Component/s: Misc
Labels: None
Environment:

Version: 3.x
PHP Version:
Webserver:
Database:

Description

Hi,

I'm using the UTF-8 charset and I'm a bit unhappy with the eZSearchEngine::normalizeText method.

The first line of the method is
$text =& strToLower( $text );

strToLower works byte by byte, so it breaks up multi-byte UTF-8 characters. As a result, one of the following lines,
$unicodeValueArray =& $codec->convertString( $text );

won't give a reasonable result.
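
One plausible mechanism, sketched here as an assumption rather than a confirmed diagnosis: under a single-byte locale such as ISO-8859-1, the UTF-8 lead byte 0xC3 (which starts "ü", "ö", "ä", "Ö") is itself an uppercase letter and gets lowercased to 0xE3. In UTF-8, 0xE3 announces a three-byte sequence, so a lenient decoder may consume the next character along with the damaged one. The helper `latin1ByteLower` below is hypothetical; it only mimics that byte-wise behaviour so the effect is reproducible regardless of PHP version or locale. `mb_strtolower` (from the mbstring extension, assumed to be available) is shown as an encoding-aware alternative.

```php
<?php
// Hypothetical helper: mimics a byte-wise strtolower() under a Latin-1
// (ISO-8859-1) locale. It is not part of eZ publish or of PHP itself.
function latin1ByteLower(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        $isAsciiUpper = ($b >= 0x41 && $b <= 0x5A);
        // Latin-1 uppercase letters are 0xC0-0xDE, except 0xD7 (multiplication sign).
        $isLatin1Upper = ($b >= 0xC0 && $b <= 0xDE && $b !== 0xD7);
        $out .= chr(($isAsciiUpper || $isLatin1Upper) ? $b + 0x20 : $b);
    }
    return $out;
}

$word = "f\xC3\xBCr";                 // "für" as UTF-8 bytes

// Byte-wise lowercasing turns the lead byte 0xC3 into 0xE3.
$broken = latin1ByteLower($word);
$brokenHex = bin2hex($broken);        // "66e3bc72"
$brokenIsValid = mb_check_encoding($broken, 'UTF-8');

// Encoding-aware lowercasing keeps the multi-byte characters intact.
$safe = mb_strtolower("\xC3\x96sterreich", 'UTF-8');   // "Österreich"
$safeIsValid = mb_check_encoding($safe, 'UTF-8');
```

After the byte-wise pass the string is no longer valid UTF-8, because 0xE3 0xBC is followed by the ASCII byte for "r" instead of a second continuation byte. The `mb_strtolower` result stays valid and equals "österreich".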

Example:

The German word "für" (a preposition) is converted to "f" after running through normalizeText, so the character "r" is stripped off as well.

"Österreich" (= "Austria" in English) -> "terreich"
"Schönfärberei" (= "garment dyeing" in English) -> "schfberei"

At the moment this isn't a big problem, because a user's search string also runs through the method. So if someone searches for "Österreich", they will find content that contains "Österreich".

But you also offer a wildcard search, and I think this could be a problem there, because the search may find too much (or unexpected) content.
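
To make the wildcard concern concrete, here is a minimal sketch. It does not use the real eZ publish search code: `lossyNormalize`, `intactNormalize` and `prefixSearch` are hypothetical helpers, and the word list is made up for illustration. `lossyNormalize` simply imitates the reported outputs by dropping each two-byte character together with the byte that follows it.

```php
<?php
// Hypothetical stand-in for the reported behaviour: each two-byte UTF-8
// character is dropped together with the following byte ("für" -> "f").
function lossyNormalize(string $text): string
{
    return strtolower(preg_replace('/[\xC0-\xFF][\x80-\xBF].?/s', '', $text));
}

// Hypothetical encoding-aware normalizer (assumes the mbstring extension).
function intactNormalize(string $text): string
{
    return mb_strtolower($text, 'UTF-8');
}

// Returns the words whose normalized form starts with the normalized prefix,
// i.e. what a wildcard query like "prefix*" would match.
function prefixSearch(array $words, string $prefix, callable $normalize): array
{
    $needle = $normalize($prefix);
    $hits = [];
    foreach ($words as $word) {
        if (strncmp($normalize($word), $needle, strlen($needle)) === 0) {
            $hits[] = $word;
        }
    }
    return $hits;
}

// Made-up sample words: "für", "Fähre", "Feld", "Form" (UTF-8 byte escapes).
$words = ["f\xC3\xBCr", "F\xC3\xA4hre", 'Feld', 'Form'];
$prefix = "f\xC3\xBC";                // wildcard query "fü*"

$lossyHits = prefixSearch($words, $prefix, 'lossyNormalize');
$intactHits = prefixSearch($words, $prefix, 'intactNormalize');
```

With the lossy normalizer the query "fü*" collapses to "f*" and matches all four words. With the intact normalizer it matches only "für".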

Kind regards,
Emil.

Issue Links

relates to

EZP-12428 strtolower() in eZSearchLog::addPhrase gives error for special chars, e.g. ÅØÆ

Closed

People

Assignee: unknown

Reporter: emil.webber

Votes: 0

Watchers: 0

Dates

Created: 27/Jan/04 8:57 AM

Updated: 27/Jan/04 8:22 AM