AskOverflow.Dev

Derek Downey
Asked: 2012-01-11 12:10:19 +0800 CST

How do I convert control characters in MySQL from latin1 to UTF-8?

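While converting a database to UTF-8, I noticed strange behavior with the control characters 0x80-0x9F. For example, 0x92 (right apostrophe) is not converted to UTF-8 and truncates the rest of the column's contents when I use the following method: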

CREATE TABLE `bar` (
 `content` text
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 1 row affected (0.06 sec)

SELECT content FROM bar;
+---------------------------------------------------------------------------------+
| content                                                                         |
+---------------------------------------------------------------------------------+
| €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ                                                 |
+---------------------------------------------------------------------------------+
1 row in set (0.06 sec)

ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected, 1 warning (0.06 sec)
Records: 1  Duplicates: 0  Warnings: 1

SHOW WARNINGS;
+---------+------+-------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                             |
+---------+------+-------------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\x80\x81\x82\x83\x84\x85...' for column 'content' at row 1 |
+---------+------+-------------------------------------------------------------------------------------+
1 row in set (0.06 sec)

SELECT * FROM bar;
+---------+
| content |
+---------+
|         |
+---------+
1 row in set (0.06 sec)
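
For reference, the test literal can be inspected outside MySQL. The sketch below is Python, not part of the original session, and it uses Python's built-in `cp1252` codec as a stand-in for the Windows code page. MySQL's own latin1 conversion table may treat the unassigned positions differently. It counts the bytes in the literal and lists the ones cp1252 does not assign a character to.

```python
# The hex literal from the INSERT statement above.
literal = bytes.fromhex(
    "8081828384858687898A8B8C8D8E8F"
    "909192939495969798999A9B9C9D9E9F"
)

assigned = {}    # byte value -> character under Python's cp1252 codec
unassigned = []  # byte values the codec refuses to decode

for value in literal:
    try:
        assigned[value] = bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        unassigned.append(value)

print(len(literal), "bytes in the literal")
print(len(assigned), "bytes have a cp1252 character")
print("unassigned in cp1252:", [hex(v) for v in unassigned])
```

Note that the literal skips 0x88, so it holds 31 bytes rather than the full 32 in the 0x80-0x9F range.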

Although 0x80-0x9F is not normally allowed in Latin1, MySQL appears to treat it differently:

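MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA latin1 treats the code points between 0x80 and 0x9f as "undefined," whereas cp1252, and therefore MySQL's latin1, assign characters for those positions. [source]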

But MySQL seems unable to convert that range of values from its latin1 character set to the UTF-8 character set.
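
The difference the quoted passage describes can be seen on the single byte 0x92. A small Python sketch, offered as an illustration outside MySQL, with Python's `cp1252` and `latin-1` codecs standing in for the two interpretations: read as cp1252 the byte is U+2019 and becomes three UTF-8 bytes, while read as strict ISO 8859-1 it is the C1 control U+0092 and becomes two.

```python
raw = b"\x92"

as_cp1252 = raw.decode("cp1252")   # Windows interpretation of the byte
as_iso = raw.decode("latin-1")     # ISO 8859-1: byte value equals code point

utf8_from_cp1252 = as_cp1252.encode("utf-8")
utf8_from_iso = as_iso.encode("utf-8")

print(f"cp1252:     U+{ord(as_cp1252):04X} -> {utf8_from_cp1252.hex()}")
print(f"ISO 8859-1: U+{ord(as_iso):04X} -> {utf8_from_iso.hex()}")
```

A correct latin1-to-UTF-8 conversion of Word-pasted text should therefore yield the three-byte form for the apostrophe.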

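These characters got into my database through copy/paste from Word documents (cp1252). I may have found a way to make the application force proper UTF-8 values for new entries, but I need to make sure the old ones get converted correctly.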

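Is there a way in MySQL to convert them to their UTF-8 equivalents without looping through every row of every text column and replacing them with ASCII-friendly versions?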
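
On the application side mentioned above (forcing new entries to UTF-8), the transcoding step could look like the following Python sketch. The function name `cp1252_to_utf8` is hypothetical, and the fallback for the five bytes that cp1252 leaves unassigned (passing them through as the C1 control code point of the same value) is an assumption made for illustration, not something the question specifies.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Transcode cp1252 bytes to UTF-8 (hypothetical helper).

    Assumption: bytes with no cp1252 character are kept as the
    control code point of the same numeric value.
    """
    chars = []
    for value in raw:
        try:
            chars.append(bytes([value]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(value))  # fallback for unassigned bytes
    return "".join(chars).encode("utf-8")


# Usage: a Word-style apostrophe pasted as the single byte 0x92.
converted = cp1252_to_utf8(b"it\x92s")
print(converted.hex())
```

This covers only new input. It does not address the stored rows the question asks about.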

mysql character-set
2 Answers

  1. Best Answer
    atxdba
    2012-01-11T12:35:23+08:00

    I'm not sure. I tried to reproduce your problem, but the ALTER worked fine for me.

    test > CREATE TABLE `bar` (  `content` text ) ENGINE=MyISAM DEFAULT CHARSET=latin1;  INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
    Query OK, 0 rows affected (0.02 sec)
    
    Query OK, 1 row affected (0.00 sec)
    
    test > ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
    Query OK, 1 row affected (0.04 sec)
    Records: 1  Duplicates: 0  Warnings: 0
    
    test > select * from bar;
    +---------------------------------+
    | content                         |
    +---------------------------------+
    | ����������������������������� |
    +---------------------------------+
    1 row in set (0.00 sec)
    
    test > set names utf8;
    Query OK, 0 rows affected (0.00 sec)
    
    test > select * from bar;
    +---------------------------------------------------------------------------------+
    | content                                                                         |
    +---------------------------------------------------------------------------------+
    | €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ |
    +---------------------------------------------------------------------------------+
    1 row in set (0.00 sec)
    

    Here are my relevant character set settings

    test > show variables like '%char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | utf8                       |
    | character_set_connection | utf8                       |
    | character_set_database   | latin1                     |
    | character_set_filesystem | binary                     |
    | character_set_results    | utf8                       |
    | character_set_server     | latin1                     |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    

    Edit

    My character set settings before running set names utf8

    test > show variables like '%char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | latin1                     |
    | character_set_connection | latin1                     |
    | character_set_database   | latin1                     |
    | character_set_filesystem | binary                     |
    | character_set_results    | latin1                     |
    | character_set_server     | latin1                     |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)
    

    Version

    test > select version();
    +-------------------------+
    | version()               |
    +-------------------------+
    | 5.1.41-3ubuntu12.10-log |
    +-------------------------+
    1 row in set (0.00 sec)
    
  2. RolandoMySQLDBA
    2012-01-11T13:09:57+08:00

    You may have to switch the connection character set to cp1250 before loading the data.

    I ran this first

    mysql> show character set like 'cp%';
    +---------+---------------------------+-------------------+--------+
    | Charset | Description               | Default collation | Maxlen |
    +---------+---------------------------+-------------------+--------+
    | cp850   | DOS West European         | cp850_general_ci  |      1 |
    | cp1250  | Windows Central European  | cp1250_general_ci |      1 |
    | cp866   | DOS Russian               | cp866_general_ci  |      1 |
    | cp852   | DOS Central European      | cp852_general_ci  |      1 |
    | cp1251  | Windows Cyrillic          | cp1251_general_ci |      1 |
    | cp1256  | Windows Arabic            | cp1256_general_ci |      1 |
    | cp1257  | Windows Baltic            | cp1257_general_ci |      1 |
    | cp932   | SJIS for Windows Japanese | cp932_japanese_ci |      2 |
    +---------+---------------------------+-------------------+--------+
    8 rows in set (0.00 sec)
    

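    cp1252 is not listed here under that name (as the quote in the question notes, MySQL's latin1 is cp1252). The closest match by name is cp1250.

    Try this sequence: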

    drop database if exists dtest;
    create database dtest;
    use dtest
    set names cp1250;
    CREATE TABLE `bar` ( 
     `content` text 
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
    INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F); 
    SELECT content FROM bar; 
    SHOW VARIABLES LIKE '%char%';
    set names utf8;
    SHOW VARIABLES LIKE '%char%';
    ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8; 
    SELECT content FROM bar; 
    

    See what happens.

    I got this on MySQL 5.5.19 on Linux

    mysql> drop database if exists dtest;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> create database dtest;
    Query OK, 1 row affected (0.00 sec)
    
    mysql> use dtest
    Database changed
    mysql> set names cp1250;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> CREATE TABLE `bar` (
        ->  `content` text
        -> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
    Query OK, 0 rows affected (0.01 sec)
    
    mysql> INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
    Query OK, 1 row affected (0.00 sec)
    
    mysql> SELECT content FROM bar;
    +---------------------------------+
    | content                         |
    +---------------------------------+
    | ??
    
    ??????                      |
    +---------------------------------+
    1 row in set (0.00 sec)
    
    mysql> SHOW VARIABLES LIKE '%char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | cp1250                     |
    | character_set_connection | cp1250                     |
    | character_set_database   | latin1                     |
    | character_set_filesystem | binary                     |
    | character_set_results    | cp1250                     |
    | character_set_server     | latin1                     |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)
    
    mysql> set names utf8;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> SHOW VARIABLES LIKE '%char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | utf8                       |
    | character_set_connection | utf8                       |
    | character_set_database   | latin1                     |
    | character_set_filesystem | binary                     |
    | character_set_results    | utf8                       |
    | character_set_server     | latin1                     |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)
    
    mysql> ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
    Query OK, 1 row affected (0.01 sec)
    Records: 1  Duplicates: 0  Warnings: 0
    
    mysql> SELECT content FROM bar;
    +---------------------------------------------------------------------------------+
    | content                                                                         |
    +---------------------ŽÂÂâââââ---------------------------------------------------+
    | â¬ÂâÆââ¦â â¡â°Å â¹Å         ¢Å¡âºÅÂ
                                          +---------------------------------------------------------------------------------+
    1 row in set (0.00 sec)
    
    mysql>
    

    I got this on MySQL 5.5.12 for Windows on my Windows 7 machine

    mysql> drop database if exists dtest;
    Query OK, 1 row affected (0.00 sec)
    
    mysql> create database dtest;
    Query OK, 1 row affected (0.02 sec)
    
    mysql> use dtest
    Database changed
    mysql> set names cp1250;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> CREATE TABLE `bar` (
        ->  `content` text
        -> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
    Query OK, 0 rows affected (0.06 sec)
    
    mysql> INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
    Query OK, 1 row affected (0.00 sec)
    
    mysql> SELECT content FROM bar;
    +---------------------------------+
    | content                         |
    +---------------------------------+
    | Ç?é?äàåçëèï??Ä??æÆôöòûù?ÖÜ¢??₧? |
    +---------------------------------+
    1 row in set (0.00 sec)
    
    mysql> SHOW VARIABLES LIKE '%char%';
    +--------------------------+---------------------------------+
    | Variable_name            | Value                           |
    +--------------------------+---------------------------------+
    | character_set_client     | cp1250                          |
    | character_set_connection | cp1250                          |
    | character_set_database   | latin1                          |
    | character_set_filesystem | binary                          |
    | character_set_results    | cp1250                          |
    | character_set_server     | latin1                          |
    | character_set_system     | utf8                            |
    | character_sets_dir       | C:\MySQL_5.5.12\share\charsets\ |
    +--------------------------+---------------------------------+
    8 rows in set (0.00 sec)
    
    mysql> set names utf8;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> SHOW VARIABLES LIKE '%char%';
    +--------------------------+---------------------------------+
    | Variable_name            | Value                           |
    +--------------------------+---------------------------------+
    | character_set_client     | utf8                            |
    | character_set_connection | utf8                            |
    | character_set_database   | latin1                          |
    | character_set_filesystem | binary                          |
    | character_set_results    | utf8                            |
    | character_set_server     | latin1                          |
    | character_set_system     | utf8                            |
    | character_sets_dir       | C:\MySQL_5.5.12\share\charsets\ |
    +--------------------------+---------------------------------+
    8 rows in set (0.00 sec)
    
    mysql> ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
    Query OK, 1 row affected (0.06 sec)
    Records: 1  Duplicates: 0  Warnings: 0
    
    mysql> SELECT content FROM bar;
    +---------------------------------------------------------------------------------+
    | content                                                                         |
    +---------------------------------------------------------------------------------+
    | €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ |
    +---------------------------------------------------------------------------------+
    1 row in set (0.00 sec)
    
    mysql>
    

    Give it a try!
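
The Windows `SELECT` output before the `ALTER` (the line starting with `Ç?é?`) is consistent with two lossy steps: the server converting the latin1 column to the cp1250 connection character set, and the console then drawing those bytes in an OEM code page. The Python sketch below reproduces that line under stated assumptions: Python's `cp1252` and `cp1250` codecs stand in for MySQL's tables, unmappable characters become `?`, and the console is assumed to use code page 437, which the transcript does not state.

```python
# The hex literal inserted in the transcripts above.
literal = bytes.fromhex(
    "8081828384858687898A8B8C8D8E8F"
    "909192939495969798999A9B9C9D9E9F"
)

# Step 1: the latin1 (cp1252) column contents as Unicode text.
# Bytes that cp1252 leaves unassigned become U+FFFD here.
stored_text = literal.decode("cp1252", errors="replace")

# Step 2: conversion to the cp1250 connection charset. Anything cp1250
# cannot represent is replaced with "?".
sent_to_client = stored_text.encode("cp1250", errors="replace")

# Step 3: a console using code page 437 (assumed) draws those bytes.
shown = sent_to_client.decode("cp437")
print(shown)
```

After `set names utf8` and the `ALTER`, the same session shows the expected characters, which suggests the stored data was intact and only the display path was lossy.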

