blackjack,
roulette,
slots,
slots,
blackjack,
roulette,
casino,
blackjack,
blackjack,
blackjack,
slots
1. 三种设置
1.1 数据字符集:
2种汉字编码 ANSI(GBK) UNICODE(UTF8)
select hex('数据库’);
CAFDBEDDBFE2
select hex('数据库’);
E695B0E68DAEE5BA93
1.2 表字符集设置:
create table if not exists chartest.t_utf8
(chartable varchar(100),
charclient varchar(100),
chardata varchar(100),
data varchar(100),
msg varchar(500),
primary key (chartable,charclient,chardata)
)default charset=utf8;
1.3 客户端字符集:
set names latin1;
SET NAMES sets the three session system variables character_set_client, character_set_connection, and character_set_results to the given character set
2. 转换规则
2.1 直接转换
gbk->utf8 99%不可以,少数可以,如联通
mysql> select hex('联通’);
+————-+
| hex('联通’) |
+————-+
| C1AACDA8 |
+————-+
echo “obase=2;ibase=16;C1AACDA8″ |bc
11000001 10101010 11001101 10101000
符合utf8 2字节符号编码规则110… 10…,智能识别有可能出错。
可能引起误断的汉字见:
http://blog.csdn.net/yimengqiannian/article/details/7060565
utf8->gbk 可以,尾部奇数字节截断
gbk->latin1 可以,变为单字节字符
utf8->latin1 可以,变为单字节字符
2.2 通过unicode转换(相当于python中的decode,encode)
gbk->unicode->utf8 可以
utf8->unicode->gbk 可以
gbk->unicode->latin1 不可以
utf8->unicode->latin1 不可以
set names 相当于直接转换,insert into 相当于通过unicode转换。
3. 摸拟测试
一共有18种组合, 下面一一列举:
gbk->gbk->gbk 按顺序分别表示插入字符串原始字符集,客户端字符集,表字符集。
3.1 完全匹配
gbk->gbk->gbk
utf8->utf8->utf8
3.2 插入时进行正确的unicode转换
gbk->gbk->utf8
select hex(convert(CONVERT(UNHEX( 'CAFDBEDDBFE2′) USING gbk) using utf8));
E695B0E68DAEE5BA93
utf8->utf8->gbk
select hex(convert(CONVERT(UNHEX( 'E695B0E68DAEE5BA93′) USING utf8) using gbk));
CAFDBEDDBFE2
3.3 latin1单字节流保存
gbk->latin1->latin1
select hex(convert(CONVERT(UNHEX( 'CAFDBEDDBFE2′) USING latin1) using latin1));
CAFDBEDDBFE2
utf8->latin1->latin1
select hex(convert(CONVERT(UNHEX( 'E695B0E68DAEE5BA93′) USING latin1) using latin1));
E695B0E68DAEE5BA93
3.4 转为unicode后再转为latin1 无法表示,转为3F (latin1 中的?号)
gbk->gbk->latin1
select hex(convert(CONVERT(UNHEX( 'CAFDBEDDBFE2′) USING gbk) using latin1));
3F3F3F
utf8->utf8->latin1
select hex(convert(CONVERT(UNHEX( 'E695B0E68DAEE5BA93′) USING utf8) using latin1));
3F3F3F
utf8->gbk->latin1
select hex(convert(CONVERT(UNHEX( 'E695B0E68DAEE5BA93′) USING gbk) using latin1));
3F3F3F3F
3.5 不符合utf8规则,无法转换,保存为空
gbk->utf8->latin1
gbk->utf8->utf8
gbk->utf8->gbk
select hex(convert(CONVERT(UNHEX( 'CAFDBEDDBFE2′) USING utf8) using latin1));
Warning | 1300 | Invalid utf8 character string: 'CAFDBE’
3.6 utf8 转换为双字节汉字后,尾部奇数字节截断
utf8->gbk->gbk
select hex(convert(CONVERT(UNHEX( 'E695B0E68DAEE5BA93′) USING gbk) using gbk));
E695B0E68DAEE5BA
| Warning | 1300 | Invalid gbk character string: '93′ |
utf8->gbk->utf8
select hex(convert(CONVERT(UNHEX( 'E695B0E68DAEE5BA93′) USING gbk) using utf8));
E98F81E78988E5B581E690B4
| Warning | 1300 | Invalid gbk character string: '93′ |
3.7 latin1 7f后的字符无法全部转为gbk
gbk->latin1->gbk
select hex(convert(CONVERT(UNHEX( 'CAFDBEDDBFE2′) USING latin1) using gbk));
3F3F3F3F3F3F
utf8->latin1->gbk 注意 B0转为A1E3 93转为A1B0
select hex(convert(CONVERT(UNHEX( 'E695B0E68DAEE5BA93′) USING latin1) using gbk));
3F3FA1E33F3F3F3F3FA1B0
3.8 latin1字符转为utf8,单字节变为2字节或者3字节
gbk->latin1->utf8 12字节
select hex(convert(CONVERT(UNHEX( 'CAFDBEDDBFE2′) USING latin1) using utf8));
C38AC3BDC2BEC39DC2BFC3A2
utf8->latin1->utf8 20字节, latin1 的 95,93转为utf8的3字节,所以是3*6+2=20字节
select hex(convert(CONVERT(UNHEX( 'E695B0E68DAEE5BA93′) USING latin1) using utf8));
C3A6E280A2C2B0C3A6C28DC2AEC3A5C2BAE2809C
4. 脚本测试
4.1 测试脚本
#!/usr/bin/sh
sql=”
create table if not exists chartest.t_utf8
(chartable varchar(100),
charclient varchar(100),
chardata varchar(100),
data varchar(100),
msg varchar(500),
primary key (chartable,charclient,chardata)
)default charset=utf8;
create table if not exists chartest.t_gbk
(chartable varchar(100),
charclient varchar(100),
chardata varchar(100),
data varchar(100),
msg varchar(500),
primary key (chartable,charclient,chardata)
)default charset=gbk;
create table if not exists chartest.t_latin1
(chartable varchar(100),
charclient varchar(100),
chardata varchar(100),
data varchar(100),
msg varchar(500),
primary key (chartable,charclient,chardata)
)default charset=latin1;
truncate table chartest.t_utf8;
truncate table chartest.t_gbk;
truncate table chartest.t_latin1;
”
mysql -e”$sql”
utf8=$(cat utf8.txt)
gbk=$(cat gb2312.txt)
charset=(latin1 utf8 gbk)
for((i=0;i<${#charset[*]};i++))
do
charclient=${charset[$i]}
for((j=0;j<${#charset[*]};j++))
do
chartable=${charset[$j]}
echo “client=$charclient table=$chartable data=utf8″
sql=”use chartest;set names $charclient;insert into t_$chartable values('$chartable’,'$charclient’,'utf8′,’$utf8′,”);”
msg=$(mysql –show-warnings=true -e”$sql”)
sql=”use chartest;update t_$chartable set msg=”$msg” where chartable=’$chartable’ and charclient=’$charclient’ and chardata=’utf8′;”
mysql –show-warnings=true -e”$sql”
echo “client=$charclient table=$chartable data=gbk”
sql=”use chartest;set names $charclient;insert into t_$chartable values('$chartable’,'$charclient’,'gbk’,'$gbk’,”);”
#echo “$sql”
msg=$(mysql –show-warnings=true -e”$sql”)
sql=”use chartest;update t_$chartable set msg=”$msg” where chartable=’$chartable’ and charclient=’$charclient’ and chardata=’gbk’;”
mysql –show-warnings=true -e”$sql”
done
done
4.2 测试结果
securecrt gbk
set names latin1; set names gbk; set names utf8;
|