我有一个数据库,现在需要支持4个字节的字符(中文)。幸运的是我已经在生产MySQL 5.5。

所以我只想将所有utf8_bin归类到utf8mb4_bin。

我相信这样做不会造成性能损失/收益。除了一些存储开销外,此更改。

#1 楼

在我的指南《如何在MySQL数据库中支持完整的Unicode》中,您可以运行以下查询来更新数据库,表或列的字符集和排序规则:

对于每个数据库:

ALTER DATABASE
    database_name
    CHARACTER SET = utf8mb4
    COLLATE = utf8mb4_unicode_ci;


对于每个表:

ALTER TABLE
    table_name
    CONVERT TO CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;


对于每个列:

ALTER TABLE
    table_name
    CHANGE column_name column_name
    VARCHAR(191)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;


(请不要盲目复制-粘贴!确切的声明取决于列的类型,最大长度和其他属性。上一行只是VARCHAR列的示例。)

但是请注意,您无法完全自动化从utf8utf8mb4的转换。如上述指南的第4步所述,您需要检查列和索引键的最大长度,因为使用utf8mb4而不是utf8时,指定的数字具有不同的含义。

《 MySQL 5.5参考手册》的10.1.11节对此有更多信息。

#2 楼

该解决方案将生成然后运行转换数据库,表和列所需的查询。它会转换类型为varchartexttinytextmediumtextlongtextchar的所有列。
应始终备份数据库,以防出现问题。


复制以下内容查询到gen_queries.sql中,用要转换的数据库名称替换出现的4个YOUR_DATABASE_NAME
 USE information_schema;
SELECT CONCAT("ALTER DATABASE `",table_schema,"` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;") AS _sql
FROM `TABLES` WHERE table_schema LIKE "YOUR_DATABASE_NAME" AND TABLE_TYPE='BASE TABLE' GROUP BY table_schema UNION
SELECT CONCAT("ALTER TABLE `",table_schema,"`.`",table_name,"` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;") AS _sql  
FROM `TABLES` WHERE table_schema LIKE "YOUR_DATABASE_NAME" AND TABLE_TYPE='BASE TABLE' GROUP BY table_schema, table_name UNION
SELECT CONCAT("ALTER TABLE `",`COLUMNS`.table_schema,"`.`",`COLUMNS`.table_name, "` CHANGE `",column_name,"` `",column_name,"` ",data_type,"(",character_maximum_length,") CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci",IF(is_nullable="YES"," NULL"," NOT NULL"),";") AS _sql 
FROM `COLUMNS` INNER JOIN `TABLES` ON `TABLES`.table_name = `COLUMNS`.table_name WHERE `COLUMNS`.table_schema like "YOUR_DATABASE_NAME" and data_type in ('varchar','char') AND TABLE_TYPE='BASE TABLE' UNION
SELECT CONCAT("ALTER TABLE `",`COLUMNS`.table_schema,"`.`",`COLUMNS`.table_name, "` CHANGE `",column_name,"` `",column_name,"` ",data_type," CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci",IF(is_nullable="YES"," NULL"," NOT NULL"),";") AS _sql 
FROM `COLUMNS` INNER JOIN `TABLES` ON `TABLES`.table_name = `COLUMNS`.table_name WHERE `COLUMNS`.table_schema like "YOUR_DATABASE_NAME" and data_type in ('text','tinytext','mediumtext','longtext') AND TABLE_TYPE='BASE TABLE';
 



运行以下命令以生成新文件queries.sql,其中包含转换数据库所需的所有查询。:
mysql -u root -p -s < gen_queries.sql > queries.sql



运行以下命令以运行查询,执行转换:
mysql -u root -p < queries.sql




注意:

要在多个数据库上运行转换,请调整table_schema LIKE "YOUR_DATABASE_NAME"部分的查询,例如:

table_schema LIKE "wiki_%"替换将转换名称以wiki_开头的所有数据库
table_type != 'SYSTEM VIEW'替换将转换所有数据库


一个问题我在mysql键中使用了一些varchar(255)列时,产生了以下错误:ERROR 1071 (42000) at line x: Specified key was too long; max key length is 767 bytes
如果发生这种情况,请将列更改为较小的值,例如varchar(150),然后重新运行命令。整理到utf8mb4_unicode_ci而不是问题中要求的utf8mb4_bin。根据需要在查询中替换您喜欢的排序规则。


评论


我必须使用“ SET foreign_key_checks = 0;”,然后应用更改,然后使用“ SET foreign_key_checks = 1;”。

– dfrankow
19年8月15日在19:08

很有帮助。虽然有关“ 767超出”的错误在mariadb 5.5.x中未修复,但在更高版本(mariadb 10.2.x及更高版本)中进行测试时可以解决。

–icasimpan
3月4日23:34

请注意,这似乎会破坏生成的列(仅显示持久列,而不是虚拟列),请注意!

–syserr0r
10月21日11:52

#3 楼

我使用了以下shell脚本。它以数据库名称为参数,并将所有表转换为另一个字符集和排序规则(由脚本中定义的另一个参数或默认值提供)。

#!/bin/bash

# mycollate.sh <database> [<charset> <collation>]
# changes MySQL/MariaDB charset and collation for one database - all tables and
# all columns in all tables

DB=""
CHARSET=""
COLL=""

[ -n "$DB" ] || exit 1
[ -n "$CHARSET" ] || CHARSET="utf8mb4"
[ -n "$COLL" ] || COLL="utf8mb4_general_ci"

echo $DB
echo "ALTER DATABASE \`$DB\` CHARACTER SET $CHARSET COLLATE $COLL;" | mysql

echo "USE \`$DB\`; SHOW TABLES;" | mysql -s | (
    while read TABLE; do
        echo $DB.$TABLE
        echo "ALTER TABLE \`$TABLE\` CONVERT TO CHARACTER SET $CHARSET COLLATE $COLL;" | mysql $DB
    done
)


#4 楼

我将编写一个脚本(用Perl或其他语言编写)以使用information_schema(TABLES和COLUMNS)遍历所有表,并对每个CHAR / VARCHAR / TEXT字段执行MODIFY COLUMN。我将所有修改收集到每个表的单个ALTER中;这样会更有效。

我认为(但不确定)Raihan的建议只会更改表格的默认设置。

#5 楼

遇到这种情况;这是我用来转换数据库的方法:首先,您需要编辑my.cnf以使默认数据库连接(在应用程序和MYSQL之间)与utf8mb4_unicode_ci兼容。如果没有这些字符,例如表情符号和您的应用程序提交的类似字符,它们将无法以正确的字节/编码形式出现在表中(除非应用程序的DB CNN参数指定utf8mb4连接)。

此处给出的说明。



执行以下SQL(无需准备SQL来更改单个列,ALTER TABLE语句将执行此操作)。

在执行以下代码之前代码用您的实际数据库名称替换“ DbName”。

USE information_schema;

SELECT concat("ALTER DATABASE `",table_schema,
              "` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;") as _sql
  FROM `TABLES`
 WHERE table_schema like "DbName"
 GROUP BY table_schema;

SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name,
              "` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;") as _sql
  FROM `TABLES`
 WHERE table_schema like "DbName"
 GROUP BY table_schema, table_name;


收集上述SQL的输出并将其保存在点sql文件中并执行。


如果出现类似#1071 - Specified key was too long; max key length is 1000 bytes.的错误以及有问题的表名,则意味着该表某个列的索引键(应该转换为MB4字符串)将很大,因此Varchar列应为<= 250,以便其索引键最大为1000个字节。检查具有索引的列,如果其中之一是varchar> 250(最有可能是255),则




步骤1:检查该列中的数据以使确保该列中的最大字符串大小为<=250。

示例查询:

select `id`,`username`, `email`,
       length(`username`) as l1,
       char_length(`username`) as l2,
       length(`email`) as l3,
       char_length(`email`) as l4
  from jos_users
 order by l4 Desc;


步骤2:如果索引列的最大字符长度数据<= 250,然后将col长度更改为250。如果不可能,请删除该列上的索引
步骤3:然后再次对该表运行alter table查询,现在应该将表成功转换为utf8mb4了。



干杯!

评论


有一种方法可以对超过191个字符的长VARCHAR使用索引。您必须具有DBA / SUPER USER特权才能执行以下操作:设置数据库参数:innodb_large_prefix:ON; innodb_file_format:梭子鱼; innodb_file_format_max:梭子鱼;

–ChâuHồngLĩnh
17-10-4在4:34



#6 楼

我写了本指南:http://hanoian.com/content/index.php/24-automate-the-converting-a-mysql-database-character-set-to-utf8mb4

来自我工作时,我发现仅更改数据库和表是不够的。我必须进入每个表,并也更改每个text / mediumtext / varchar列。

幸运的是,我能够编写一个脚本来检测MySQL数据库的元数据,因此它可以遍历整个表和列并自动更改它们。

MySQL 5.6的长索引:

您必须具有DBA / SUPER USER权限才能执行以下操作:
设置数据库参数:

innodb_large_prefix : ON
innodb_file_format : Barracuda 
innodb_file_format_max : Barracuda


在此问题的答案中,上面有说明如何设置这些参数的说明:
https://stackoverflow.com/ questions / 35847015 / mysql-change-innodb-large-prefix

当然,在我的文章中,也有这样做的说明。

对于MySQL 5.7或更高版本,则innodb_large_prefix默认情况下为ON,innodb_file_format默认情况下也为梭子鱼。

评论


您的博客文章仍然存在-但是文章中指向您在Github中的ruby脚本的链接不起作用。

– Claudio Kuenzler
9月29日8:36

#7 楼

对于可能遇到此问题的人,最佳解决方案是根据此表将列首先修改为二进制类型:


BINARY
TEXT => BLOB
TINYTEXT => TINYBLOB
MEDIUMBLOB
LONGTEXT => LONGBLOB
VARCHAR => VARBINARY

然后将列修改回原来的类型和所需的字符集。

例如:

ALTER TABLE [TABLE_SCHEMA].[TABLE_NAME] MODIFY [COLUMN_NAME] LONGBLOB;
ALTER TABLE [TABLE_SCHEMA].[TABLE_NAME] MODIFY [COLUMN_NAME] VARCHAR(140) CHARACTER SET utf8mb4;


我尝试了几张latin1表,并保留了所有变音符号。 />
您可以为执行此操作的所有列提取此查询:

SELECT
CONCAT('ALTER TABLE ', TABLE_SCHEMA,'.', TABLE_NAME,' MODIFY ', COLUMN_NAME,' VARBINARY;'),
CONCAT('ALTER TABLE ', TABLE_SCHEMA,'.', TABLE_NAME,' MODIFY ', COLUMN_NAME,' ', COLUMN_TYPE,' CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;')
FROM information_schema.columns
WHERE TABLE_SCHEMA IN ('[TABLE_SCHEMA]')
AND COLUMN_TYPE LIKE 'varchar%'
AND (COLLATION_NAME IS NOT NULL AND COLLATION_NAME NOT LIKE 'utf%');


#8 楼

如果您像我一样不信任自动化,那么这就是我处理问题的方式。
首先停止挖掘!
首先通过更改数据库定义来更改新表的默认字符集(就像所有其他答案):
ALTER DATABASE database_name 
  CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

然后生成sql来更改所有现有表的新列的默认字符集:
SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name,"` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_swedish_ci;") as _sql
  FROM information_schema.TABLES
 WHERE table_schema like "database_name" and TABLE_TYPE="BASE TABLE"
 GROUP BY table_schema, table_name ;

现在我们可以处理“旧版”了
列出您正在使用的字符数据类型:
select distinct data_type  from information_schema.columns where table_schema = "database_name" and CHARACTER_SET_NAME is not null;

对我来说,该列表是“ varchar”和“ text”
列出正在使用的character_SETS_:
select distinct character_set_name from information_schema.columns where table_schema = "database_name";

这给了我“ utf8”,“ latin1”和“ utf8mb4”,这是我不信任自动化的原因,latin1列可能存在脏数据。
现在,您可以列出所有需要更新的列了:
select table_name, column_name, data_type, character_set_name, collation_name
  from information_schema.columns 
 where table_schema = "database_name" and CHARACTER_SET_NAME is not null AND CHARACTER_SET_NAME <> "utf8mb4"
 group by table_name, , data_type, character_set_name, collation_name;

如Mathias和MrJingles所述,可以使用“转换为字符集”来转换仅包含utf8或utf8mb4的表,但是这样做可能会给MySQL带来改变的风险,因此最好运行“ CHANGE” COLUMN”,因为这样可以使您完全控制发生的事情。
如果您有非utf8列,这些问题可能会启发您检查列数据:https://stackoverflow.com/q/401771/671282
https://stackoverflow.com/q/9304485/671282
由于您可能知道大多数列中希望包含的内容,因此在修改了允许的非ascii字符后,类似这样的内容可能会处理其中的大多数满足您的需求:
SELECT distinct section FROM table_name WHERE column_name NOT REGEXP '^([A-Za-z0-9åäöÅÄÖ&.,_ -])*$';

当上面的内容不合适时,我在下面使用了一些“模糊”处理的内容:
SELECT distinct
  CONVERT(CONVERT(column_name USING BINARY) USING latin1) AS latin1,
  CONVERT(CONVERT(column_name USING BINARY) USING utf8) AS utf8
FROM table_name
WHERE CONVERT(column_name USING BINARY) RLIKE CONCAT('[', UNHEX('C0'), '-', UNHEX('F4'), '][',UNHEX('80'),'-',UNHEX('FF'),']') limit 5;

此查询匹配可以启动一个utf8字符,从而使您可以检查这些记录,这可能会给您带来很多误报。
如果有任何字符无法转换,则utf8转换将无法返回null,因此在大字段中很有可能没有用。

#9 楼

我制作了一个脚本,它或多或少地自动执行此操作:

 <?php
/**
 * Requires php >= 5.5
 * 
 * Use this script to convert utf-8 data in utf-8 mysql tables stored via latin1 connection
 * This is a PHP port from: https://gist.github.com/njvack/6113127
 *
 * BACKUP YOUR DATABASE BEFORE YOU RUN THIS SCRIPT!
 *
 * Once the script ran over your databases, change your database connection charset to utf8:
 *
 * $dsn = 'mysql:host=localhost;port=3306;charset=utf8';
 * 
 * DON'T RUN THIS SCRIPT MORE THAN ONCE!
 *
 * @author hollodotme
 *
 * @author derclops since 2019-07-01
 *
 *         I have taken the liberty to adapt this script to also do the following:
 *
 *         - convert the database to utf8mb4
 *         - convert all tables to utf8mb4
 *         - actually then also convert the data to utf8mb4
 *
 */

header('Content-Type: text/plain; charset=utf-8');

$dsn      = 'mysql:host=localhost;port=3306;charset=utf8';
$user     = 'root';
$password = 'root';
$options  = [
    \PDO::ATTR_CURSOR                   => \PDO::CURSOR_FWDONLY,
    \PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => true,
    \PDO::MYSQL_ATTR_INIT_COMMAND       => "SET CHARACTER SET latin1",
];


$dbManager = new \PDO( $dsn, $user, $password, $options );

$databasesToConvert = [ 'database1',/** database3, ... */ ];
$typesToConvert     = [ 'char', 'varchar', 'tinytext', 'mediumtext', 'text', 'longtext' ];

foreach ( $databasesToConvert as $database )
{
    echo $database, ":\n";
    echo str_repeat( '=', strlen( $database ) + 1 ), "\n";

    $dbManager->exec( "USE `{$database}`" );

    echo "converting database to correct locale too ... \n";

    $dbManager->exec("ALTER DATABASE `{$database}` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci");


    $tablesStatement = $dbManager->query( "SHOW TABLES" );
    while ( ($table = $tablesStatement->fetchColumn()) )
    {
        echo "Table: {$table}:\n";
        echo str_repeat( '-', strlen( $table ) + 8 ), "\n";

        $columnsToConvert = [ ];

        $columsStatement = $dbManager->query( "DESCRIBE `{$table}`" );

        while ( ($tableInfo = $columsStatement->fetch( \PDO::FETCH_ASSOC )) )
        {
            $column = $tableInfo['Field'];
            echo ' * ' . $column . ': ' . $tableInfo['Type'];

            $type = preg_replace( "#\(\d+\)#", '', $tableInfo['Type'] );

            if ( in_array( $type, $typesToConvert ) )
            {
                echo " => must be converted\n";

                $columnsToConvert[] = $column;
            }
            else
            {
                echo " => not relevant\n";
            }
        }


        //convert table also!!!
        $convert = "ALTER TABLE `{$table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";

        echo "\n", $convert, "\n";
        $dbManager->exec( $convert );
        $databaseErrors = $dbManager->errorInfo();
        if( !empty($databaseErrors[1]) ){
            echo "\n !!!!!!!!!!!!!!!!! ERROR OCCURED ".print_r($databaseErrors, true)." \n";
            exit;
        }


        if ( !empty($columnsToConvert) )
        {
            $converts = array_map(
                function ( $column )
                {
                    //return "`{$column}` = IFNULL(CONVERT(CAST(CONVERT(`{$column}` USING latin1) AS binary) USING utf8mb4),`{$column}`)";
                    return "`{$column}` = CONVERT(BINARY(CONVERT(`{$column}` USING latin1)) USING utf8mb4)";
                },
                $columnsToConvert
            );

            $query = "UPDATE IGNORE `{$table}` SET " . join( ', ', $converts );

            //alternative
            // UPDATE feedback SET reply = CONVERT(BINARY(CONVERT(reply USING latin1)) USING utf8mb4) WHERE feedback_id = 15015;


            echo "\n", $query, "\n";


            $dbManager->exec( $query );

            $databaseErrors = $dbManager->errorInfo();
            if( !empty($databaseErrors[1]) ){
                echo "\n !!!!!!!!!!!!!!!!! ERROR OCCURED ".print_r($databaseErrors, true)." \n";
                exit;
            }
        }

        echo "\n--\n";
    }

    echo "\n";
}
 


#10 楼

运行以下SQL语句以获取用于更新字符集和排序规则的结果语句:

SET @database = "your_database_name";

SET @charset = "utf8mb4";

SET @collate = "utf8mb4_0900_ai_ci";

SELECT "SET foreign_key_checks = 0;"
UNION ALL
SELECT concat(
  "ALTER DATABASE `",
  `SCHEMA_NAME`,
  "` CHARACTER SET = ",
  @charset,
  " COLLATE = ",
  @collate,
  ";"
) AS `sql`
FROM `information_schema`.`SCHEMATA`
WHERE `SCHEMA_NAME` = @database
  AND (
    `DEFAULT_CHARACTER_SET_NAME` <> @charset
    OR `DEFAULT_COLLATION_NAME` <> @collate
  )
UNION ALL
SELECT concat(
  "ALTER TABLE `",
  `TABLE_SCHEMA`,
  "`.`",
  `TABLE_NAME`,
  "` CONVERT TO CHARACTER SET ",
  @charset,
  " COLLATE ",
  @collate,
  ";"
) AS `sql`
FROM `information_schema`.`TABLES`
WHERE `TABLE_SCHEMA` = @database
  AND `TABLE_TYPE` = "BASE TABLE"
  AND `TABLE_COLLATION` <> @collate
UNION ALL
SELECT concat(
  "ALTER TABLE `",
  c.`TABLE_SCHEMA`,
  "`.`",
  c.`TABLE_NAME`,
  "` CHANGE `",
  c.`COLUMN_NAME`,
  "` `",
  c.`COLUMN_NAME`,
  "` ",
  c.`COLUMN_TYPE`,
  " CHARACTER SET ",
  @charset,
  " COLLATE ",
  @collate,
  if(c.`IS_NULLABLE`="YES", " NULL", " NOT NULL"),
  ";"
) AS `sql`
FROM `information_schema`.`COLUMNS` c,
  `information_schema`.`TABLES` t,
  `information_schema`.`COLLATION_CHARACTER_SET_APPLICABILITY` a
WHERE c.`TABLE_SCHEMA` = t.`TABLE_SCHEMA`
  AND c.`TABLE_NAME` = t.`TABLE_NAME`
  AND t.`TABLE_COLLATION` = a.`COLLATION_NAME`
  AND c.`TABLE_SCHEMA` = @database
  AND c.`DATA_TYPE` IN (
    'varchar',
    'char',
    'text',
    'tinytext',
    'mediumtext',
    'longtext'
  )
  AND (
    c.`CHARACTER_SET_NAME` <> a.`CHARACTER_SET_NAME`
    OR c.`COLLATION_NAME` <> t.`TABLE_COLLATION`
  )
  AND t.`TABLE_TYPE` = "BASE TABLE"
UNION ALL
SELECT "SET foreign_key_checks = 1;";


注意:


仅更改与给定字符集或排序规则不一致的数据库,表或列。
不会将视图视为常规表。
通过关闭相关检查来避免外键冲突。
在运行上述SQL语句之前,请相应地更新变量值。


默认排序规则utf8mb4_0900_ai_ci可能对您的首选语言不友好。例如,它认为半角括号()与全角括号()相同,并且在某些情况下会引起麻烦。
运行SHOW COLLATION WHERE CHARSET = 'utf8mb4' AND COLLATION LIKE 'utf8mb4%0900%';并选择适合您需要的排序规则。