前两天在群里看到同事反馈一个空格问题,大致现象如下:
mysql> select @@version; +-----------+ | @@version | +-----------+ | 8.0.25 | +-----------+ 1 row in set (0.00 sec) mysql> create table t1( -> c1 int, -> c2 varchar(4) check(c2<>'') #单引号之间无空格 -> )engine=innodb; Query OK, 0 rows affected (0.21 sec) mysql> insert into t1 select 1,' '; #c2字段插入两个空格 ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated.
check定义c2<>’’,往c2字段插入空格,提示违反check约束。
为什么insert语句中的’ ‘(单引号之间有一个或多个空格)会被判断为’’(单引号之间无空格),导致插入失败?
https://dev.mysql.com/doc/refman/8.0/en/char.html
When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.
VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL.
CHAR(N):当插入的字符数小于N,它会在字符串的右边补充空格,直到总字符数达到N再进行存储;当查询返回数据时默认会将字符串尾部的空格去掉,除非SQL_MODE设置PAD_CHAR_TO_FULL_LENGTH(手册显示8.0.13 deprecated,8.0.25还能使用)。
VARCHAR(N):当插入的字符数小于N,它不会在字符串的右边补充空格,insert内容原封不动的进行存储;如果原本字符串右边有空格,在存储和查询返回时都会保留空格。
https://dev.mysql.com/doc/refman/8.0/en/char.html
Values in CHAR, VARCHAR, and TEXT columns are sorted and compared according to the character set collation assigned to the column.
MySQL collations have a pad attribute of PAD SPACE, other than Unicode collations based on UCA 9.0.0 and higher, which have a pad attribute of NO PAD.
对于CHAR、VARCHAR、TEXT字段,排序和比较运算依赖字段上的Collation,Collation的Pad属性控制字符串尾部空格处理方式。
可以通过INFORMATION_SCHEMA.COLLATIONS表,查看Collation所使用的Pad属性:
mysql> select collation_name,pad_attribute from information_schema.collations; +----------------------------+---------------+ | collation_name | pad_attribute | +----------------------------+---------------+ | armscii8_general_ci | PAD SPACE | ... | utf8mb4_0900_bin | NO PAD | +----------------------------+---------------+ 272 rows in set (0.01 sec)
https://dev.mysql.com/doc/refman/8.0/en/charset-binary-collations.html#charset-binary-collations-trailing-space-comparisons
For nonbinary strings (CHAR, VARCHAR, and TEXT values), the string collation pad attribute determines treatment in comparisons of trailing spaces at the end of strings:
• For PAD SPACE collations, trailing spaces are insignificant in comparisons; strings are compared without regard to trailing spaces.
• NO PAD collations treat trailing spaces as significant in comparisons, like any other character.
“Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant, regardless of collation.
PAD SPACE:在排序和比较运算中,忽略字符串尾部空格。
NO PAD:在排序和比较运算中,字符串尾部空格当成普通字符,不能忽略。
以下操作基于MySQL 8.0.25 社区版
mysql> show full fields in t1; +-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+ | Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment | +-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+ | c1 | int | NULL | YES | | NULL | | select,insert,update,references | | | c2 | varchar(4) | utf8mb4_unicode_ci | YES | | NULL | | select,insert,update,references | | +-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+ 2 rows in set (0.00 sec)
c2列的Collation是utf8mb4_unicode_ci。
mysql> select COLLATION_NAME,PAD_ATTRIBUTE from INFORMATION_SCHEMA.COLLATIONS where COLLATION_NAME in('utf8mb4_unicode_ci','utf8mb4_0900_ai_ci'); +--------------------+---------------+ | COLLATION_NAME | PAD_ATTRIBUTE | +--------------------+---------------+ | utf8mb4_0900_ai_ci | NO PAD | | utf8mb4_unicode_ci | PAD SPACE | +--------------------+---------------+ 1 row in set (0.00 sec)
utf8mb4_unicode_ci的Pad属性是PAD SPACE,由2.3可知c2列在排序和比较运算中,忽略字符串尾部空格。
因此check比较时,会将插入的’ '中的空格忽略,显然忽略空格后和check约束存在冲突,插入失败。
mysql> SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci; Query OK, 0 rows affected (0.00 sec) mysql> select ' ' = ''; +--------+ | ' '='' | +--------+ | 1 | +--------+ 1 row in set (0.00 sec)
这里的常规是指空格就是空格,不应该把空格忽略。只需将c2字段修改为NO PAD的Collation后,就能将空格正常插入:
mysql> insert into t1 select 1,' '; #c2字段插入两个空格 ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated. mysql> alter table t1 modify c2 varchar(4) collate utf8mb4_0900_ai_ci; #修改为NO PAD的Collation Query OK, 0 rows affected (0.15 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> insert into t1 select 1,' '; #c2字段插入两个空格 Query OK, 1 row affected (0.12 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> insert into t1 select 1,''; #''之间无空格 ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated. mysql> select c1,c2,hex(c2) from t1; +------+------+---------+ | c1 | c2 | hex(c2) | +------+------+---------+ | 1 | | 2020 | +------+------+---------+ 1 row in set (0.01 sec)
一样。CHAR、VARCHAR、TEXT在做排序和比较运算时,都是依据列的Collation的Pad属性处理字符串尾部的空格。此时拿来做比较运算的字符串是insert中的内容。
创建一张新表并插入数据
mysql> create table t3( -> c1 int, -> c2 char(4) collate utf8mb4_unicode_ci, -> c3 char(4) collate utf8mb4_0900_ai_ci, -> c4 varchar(4) collate utf8mb4_unicode_ci, -> c5 varchar(4) collate utf8mb4_0900_ai_ci -> )engine=innodb; Query OK, 0 rows affected (0.29 sec) mysql> insert into t3 select 1,'a','a','a','a'; Query OK, 1 row affected (0.09 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> insert into t3 select 2,'a ','a ','a ','a '; #各列包含1个空格 Query OK, 1 row affected (0.20 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> insert into t3 select 3,'a ','a ','a ','a '; #前两列3个空格,后两列2个空格 Query OK, 1 row affected (0.17 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> insert into t3 select 4,'a ','a ','a ','a '; #前两列2个空格,后两列3个空格 Query OK, 1 row affected (0.14 sec) Records: 1 Duplicates: 0 Warnings: 0
观察WHERE条件返回结果,CHAR类型的返回受PAD_CHAR_TO_FULL_LENGTH影响(参考2.1)
mysql> set sql_mode=''; Query OK, 0 rows affected (0.00 sec) mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c2='a'; +------+------+------+------+------+---------+---------+----------+----------+ | c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) | +------+------+------+------+------+---------+---------+----------+----------+ | 1 | a | a | a | a | 61 | 61 | 61 | 61 | | 2 | a | a | a | a | 61 | 61 | 6120 | 6120 | | 3 | a | a | a | a | 61 | 61 | 612020 | 612020 | | 4 | a | a | a | a | 61 | 61 | 61202020 | 61202020 | +------+------+------+------+------+---------+---------+----------+----------+ 4 rows in set (0.00 sec) c2 char->返回数据去掉字符串尾部的空格 c2 utf8mb4_unicode_ci->PAD SPACE->排序和比较运算,忽略字符串尾部空格 mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c3='a'; +------+------+------+------+------+---------+---------+----------+----------+ | c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) | +------+------+------+------+------+---------+---------+----------+----------+ | 1 | a | a | a | a | 61 | 61 | 61 | 61 | | 2 | a | a | a | a | 61 | 61 | 6120 | 6120 | | 3 | a | a | a | a | 61 | 61 | 612020 | 612020 | | 4 | a | a | a | a | 61 | 61 | 61202020 | 61202020 | +------+------+------+------+------+---------+---------+----------+----------+ 4 rows in set (0.01 sec) c3 char->返回数据去掉字符串尾部的空格 c3 utf8mb4_0900_ai_ci->NO PAD->排序和比较运算,字符串尾部空格当成普通字符 mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c4='a'; +------+------+------+------+------+---------+---------+----------+----------+ | c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) | +------+------+------+------+------+---------+---------+----------+----------+ | 1 | a | a | a | a | 61 | 61 | 61 | 61 | | 2 | a | a | a | a | 61 | 61 | 6120 | 6120 | | 3 | a | a | a | a | 61 | 61 | 612020 | 612020 | | 4 | a | a | a | a | 61 | 61 | 61202020 | 61202020 | +------+------+------+------+------+---------+---------+----------+----------+ 4 rows in set (0.00 sec) c4 varchar->返回数据保留插入时的空格 c4 utf8mb4_unicode_ci->PAD SPACE->排序和比较运算,忽略字符串尾部空格 mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c5='a'; +------+------+------+------+------+---------+---------+---------+---------+ | c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) | +------+------+------+------+------+---------+---------+---------+---------+ | 1 | a | a | a | a | 61 | 61 | 61 | 61 | +------+------+------+------+------+---------+---------+---------+---------+ 1 row in set (0.00 sec) c5 varchar->返回数据保留插入时的空格 c5 utf8mb4_0900_ai_ci->NO PAD->排序和比较运算,字符串尾部空格当成普通字符 mysql> set sql_mode='PAD_CHAR_TO_FULL_LENGTH'; Query OK, 0 rows affected, 1 warning (0.01 sec) mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c2='a'; +------+------+------+------+------+----------+----------+----------+----------+ | c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) | +------+------+------+------+------+----------+----------+----------+----------+ | 1 | a | a | a | a | 61202020 | 61202020 | 61 | 61 | | 2 | a | a | a | a | 61202020 | 61202020 | 6120 | 6120 | | 3 | a | a | a | a | 61202020 | 61202020 | 612020 | 612020 | | 4 | a | a | a | a | 61202020 | 61202020 | 61202020 | 61202020 | +------+------+------+------+------+----------+----------+----------+----------+ 4 rows in set (0.00 sec) c2 char->PAD_CHAR_TO_FULL_LENGTH->返回数据字符串右边补充空格 c2 utf8mb4_unicode_ci->PAD SPACE->排序和比较运算,忽略字符串尾部空格 mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c3='a'; Empty set (0.00 sec) c3 char->PAD_CHAR_TO_FULL_LENGTH->返回数据字符串右边补充空格 c3 utf8mb4_0900_ai_ci->NO PAD->排序和比较运算,字符串尾部空格当成普通字符 1~4行c3列返回值都包含空格,且c3列的Collation是NO PAD,字符串尾部空格不能忽略,where过滤找不到记录 mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c4='a'; +------+------+------+------+------+----------+----------+----------+----------+ | c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) | +------+------+------+------+------+----------+----------+----------+----------+ | 1 | a | a | a | a | 61202020 | 61202020 | 61 | 61 | | 2 | a | a | a | a | 61202020 | 61202020 | 6120 | 6120 | | 3 | a | a | a | a | 61202020 | 61202020 | 612020 | 612020 | | 4 | a | a | a | a | 61202020 | 61202020 | 61202020 | 61202020 | +------+------+------+------+------+----------+----------+----------+----------+ 4 rows in set (0.00 sec) c4 varchar->返回数据保留插入时的空格 c4 utf8mb4_unicode_ci->PAD SPACE->排序和比较运算,忽略字符串尾部空格 mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c5='a'; +------+------+------+------+------+----------+----------+---------+---------+ | c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) | +------+------+------+------+------+----------+----------+---------+---------+ | 1 | a | a | a | a | 61202020 | 61202020 | 61 | 61 | +------+------+------+------+------+----------+----------+---------+---------+ 1 row in set (0.00 sec) c5 varchar->返回数据保留插入时的空格 c5 utf8mb4_0900_ai_ci->NO PAD->排序和比较运算,字符串尾部空格当成普通字符
此时拿来做比较运算的字符串是Retrieved的内容,CHAR和VARCHAR返回数据时对字符串尾部的空格处理方式不同,并且PAD_CHAR_TO_FULL_LENGTH只影响CHAR类型。
https://dev.mysql.com/doc/refman/8.0/en/char.html
For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters results in a duplicate-key error. For example, if a table contains ‘a’, an attempt to store 'a ’ causes a duplicate-key error.
如果存在唯一索引(单列、字符类型),插入的数据仅在尾部空格个数不同,有可能会报duplicate-key错误:
mysql> select c1,c4,c5,hex(c4),hex(c5) from t3; +------+------+------+----------+----------+ | c1 | c4 | c5 | hex(c4) | hex(c5) | +------+------+------+----------+----------+ | 1 | a | a | 61 | 61 | | 2 | a | a | 6120 | 6120 | | 3 | a | a | 612020 | 612020 | | 4 | a | a | 61202020 | 61202020 | +------+------+------+----------+----------+ 4 rows in set (0.00 sec) mysql> alter table t3 add unique(c4); ERROR 1062 (23000): Duplicate entry 'a' for key 't3.c4' mysql> alter table t3 add unique(c5); Query OK, 0 rows affected (0.44 sec) Records: 0 Duplicates: 0 Warnings: 0
可以看到c4列创建唯一索引失败,c5列创建唯一索引成功。
c4 utf8mb4_unicode_ci->PAD SPACE->排序和比较运算,忽略字符串尾部空格,4行数据重复。
c5 utf8mb4_0900_ai_ci->NO PAD->排序和比较运算,字符串尾部空格当成普通字符,4行数据不同。
Stored
- | CHAR(N) | VARCHAR(N) |
---|---|---|
Stored | 字符不足N右边补空格 | 保留插入时的空格,不会在右边额外补充空格 |
Retrieved
SQL_MODE | CHAR(N) | VARCHAR(N) |
---|---|---|
Default Value | 去掉字符串尾部的空格 | 保留插入时的空格 |
PAD_CHAR_TO_FULL_LENGTH | 返回完整字符串,不足N右边补空格 | 保留插入时的空格 |
Comparison(不包括like)
Pad Attribute | CHAR(N)/VARCHAR(N) |
---|---|
PAD SPACE | 忽略字符串尾部空格 |
NO PAD | 字符串尾部空格当成普通字符,不能忽略 |
Enjoy GreatSQL :)