Discussion:
Vietnamese UTF8 collation support
Doan Minh Phuong
2006-10-30 17:18:34 UTC
Hi, sorry, I don't know the formal way to add our collation to a new
version of MySQL. I have only seen a few lines in ctype-uca.c
(in the Persian collation part). The collation below has passed all my tests
for the precomposed charset in the Vietnamese language (case-insensitive).
Please set a high enough value for MY_MAX_COLL_RULE.

Phuong.


static const char vietnam[]=
"& A < \\u00E1 <<< \\u00C1 < \\u00E0 <<< \\u00C0 < \\u1EA3 <<<"
" \\u1EA2 < \\u00E3 <<< \\u00C3 < \\u1EA1 <<< \\u1EA0"
" < \\u0103 <<< \\u0102 < \\u1EAF <<< \\u1EAE < \\u1EB1 <<<"
" \\u1EB0 < \\u1EB3 <<< \\u1EB2 < \\u1EB5 <<< \\u1EB4 < \\u1EB7 <<< \\u1EB6"
" < \\u00E2 <<< \\u00C2 < \\u1EA5 <<< \\u1EA4 < \\u1EA7 <<<"
" \\u1EA6 < \\u1EA9 <<< \\u1EA8 < \\u1EAB <<< \\u1EAA < \\u1EAD <<< \\u1EAC"
"& D < \\u0111 <<< \\u0110"
"& E < \\u00E9 <<< \\u00C9 < \\u00E8 <<< \\u00C8 < \\u1EBB <<<"
" \\u1EBA < \\u1EBD <<< \\u1EBC < \\u1EB9 <<< \\u1EB8"
" < \\u00EA <<< \\u00CA < \\u1EBF <<< \\u1EBE < \\u1EC1 <<<"
" \\u1EC0 < \\u1EC3 <<< \\u1EC2 < \\u1EC5 <<< \\u1EC4 < \\u1EC7 <<< \\u1EC6"
"& I < \\u00ED <<< \\u00CD < \\u00EC <<< \\u00CC < \\u1EC9 <<<"
" \\u1EC8 < \\u0129 <<< \\u0128 < \\u1ECB <<< \\u1ECA"
"& O < \\u00F3 <<< \\u00D3 < \\u00F2 <<< \\u00D2 < \\u1ECF <<<"
" \\u1ECE < \\u00F5 <<< \\u00D5 < \\u1ECD <<< \\u1ECC"
" < \\u00F4 <<< \\u00D4 < \\u1ED1 <<< \\u1ED0 < \\u1ED3 <<<"
" \\u1ED2 < \\u1ED5 <<< \\u1ED4 < \\u1ED7 <<< \\u1ED6 < \\u1ED9 <<< \\u1ED8"
" < \\u01A1 <<< \\u01A0 < \\u1EDB <<< \\u1EDA < \\u1EDD <<<"
" \\u1EDC < \\u1EDF <<< \\u1EDE < \\u1EE1 <<< \\u1EE0 < \\u1EE3 <<< \\u1EE2"
"& U < \\u00FA <<< \\u00DA < \\u00F9 <<< \\u00D9 < \\u1EE7 <<<"
" \\u1EE6 < \\u0169 <<< \\u0168 < \\u1EE5 <<< \\u1EE4"
" < \\u01B0 <<< \\u01AF < \\u1EE9 <<< \\u1EE8 < \\u1EEB <<<"
" \\u1EEA < \\u1EED <<< \\u1EEC < \\u1EEF <<< \\u1EEE < \\u1EF1 <<< \\u1EF0"
"& Y < \\u00FD <<< \\u00DD < \\u1EF3 <<< \\u1EF2 < \\u1EF7 <<<"
" \\u1EF6 < \\u1EF9 <<< \\u1EF8 < \\u1EF5 <<< \\u1EF4";
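
To make the MY_MAX_COLL_RULE request concrete, here is a rough sketch that counts the shift operators ("<", "<<", "<<<") in a tailoring string. The helper name count_shift_rules is hypothetical and not part of ctype-uca.c, and it is only an assumption that one operator corresponds to one rule slot in the parser; the real accounting may differ.

```c
#include <stddef.h>

/*
 * Count the shift operators ("<", "<<", "<<<") in a tailoring string.
 * Each run of consecutive '<' characters counts as one operator.
 * Assumption: one operator needs roughly one rule slot; this helper
 * is illustrative only and does not exist in MySQL.
 */
static size_t count_shift_rules(const char *rules)
{
  size_t count= 0;
  while (*rules)
  {
    if (*rules == '<')
    {
      count++;
      while (*rules == '<')   /* skip the rest of this operator */
        rules++;
    }
    else
      rules++;
  }
  return count;
}
```

Calling count_shift_rules(vietnam) on the string above should give 134 by my count, one operator per tailored letter, so the limit would need to be at least that large under this assumption.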
--
MySQL Internals Mailing List
For list archives: http://lists.mysql.com/internals
To unsubscribe: http://lists.mysql.com/internals?unsub=gcdmd-***@m.gmane.org
Alexander Barkov
2006-11-07 13:20:38 UTC
Dear Doan,
Post by Doan Minh Phuong
Hi, sorry, I don't know the formal way to add our collation to a new
version of MySQL. I have only seen a few lines in ctype-uca.c
(in the Persian collation part). The collation below has passed all my tests
for the precomposed charset in the Vietnamese language (case-insensitive).
Please set a high enough value for MY_MAX_COLL_RULE.
Thank you very much for your contribution!

Looking at "Vietnamese Alphabetical System" pages:
http://vietunicode.sourceforge.net/charset/vietalphabet.html
http://vietunicode.sourceforge.net/charset/v3.htm
I noticed that they provide slightly different rules for comparison
and sorting.

Some examples:

1. Comparison: SELECT x FROM t1 WHERE x='a'

should return (according to this site) all these letters:

a 0061 - A
à 00E0 - A WITH GRAVE
ả 1EA3 - A WITH HOOK ABOVE
ã 00E3 - A WITH TILDE
á 00E1 - A WITH ACUTE
ạ 1EA1 - A WITH DOT BELOW

(as well as their uppercase counterparts),

while your version treats all of these letters as different,
not equal to each other.

2. Sorting: SELECT x FROM t1 ORDER BY x
This site recommends returning "00E0 A WITH GRAVE" before
"00E1 A WITH ACUTE". Your version seems to return these
letters in the reverse order.
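
The two examples above can be sketched with multi-level weights. This is only an illustration under stated assumptions: weight_t, weight_cmp and the numeric weights are hypothetical and do not reflect MySQL's internal layout. The idea is that a tone mark stored as a secondary difference leaves the letters equal at the primary level (example 1) while still giving a defined order when the secondary level is consulted (example 2), whereas a tone mark stored as a primary difference makes the letters unequal everywhere.

```c
#include <stddef.h>

/* Hypothetical three-level weight for one letter (not MySQL's layout). */
typedef struct
{
  unsigned primary;    /* base letter */
  unsigned secondary;  /* accent / tone mark */
  unsigned tertiary;   /* case */
} weight_t;

/*
 * Compare two weights, looking only at the first `levels` levels
 * (1 = primary only, 2 = primary + secondary, 3 = all three).
 * Returns a negative value, 0 or a positive value, like strcmp().
 */
static int weight_cmp(weight_t a, weight_t b, int levels)
{
  if (a.primary != b.primary)
    return a.primary < b.primary ? -1 : 1;
  if (levels >= 2 && a.secondary != b.secondary)
    return a.secondary < b.secondary ? -1 : 1;
  if (levels >= 3 && a.tertiary != b.tertiary)
    return a.tertiary < b.tertiary ? -1 : 1;
  return 0;
}

/* Tone as a secondary difference, grave ordered before acute
   (made-up weights, following the order the site recommends). */
static const weight_t w_a=       {10, 0, 0};
static const weight_t w_a_grave= {10, 1, 0};
static const weight_t w_a_acute= {10, 2, 0};

/* Tone as a primary difference, in the style of "& A < a-acute". */
static const weight_t p_a=       {10, 0, 0};
static const weight_t p_a_acute= {11, 0, 0};
```

With these made-up weights, weight_cmp(w_a, w_a_acute, 1) reports equality while weight_cmp(w_a_grave, w_a_acute, 2) orders grave first; weight_cmp(p_a, p_a_acute, 1) reports a difference at any level. Whether the server can apply different levels to WHERE and ORDER BY is not something this sketch establishes.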

Can you please tell us why you chose this approach, and
what you think about the "Vietnamese Alphabetical System" pages?

Do you know any links to official documents describing
Vietnamese alphabet rules?

Thanks!