是否有一种优雅的方法可以在MySQL数据库中进行高性能的自然排序?

例如,如果我有以下数据集:


最终幻想
Final Fantasy 4
Final Fantasy 10
Final Fantasy 12
Final Fantasy 12:Promathia的链条
Final Fantasy Adventure
Final Fantasy Origins
Final Fantasy Tactics

除了将游戏名称拆分成各个组成部分以外,还有其他优雅的解决方案



标题:“最终幻想”

数字:“ 12”

字幕:“ Promathia的链子”

,以确保它们以正确的顺序出现? (4点后10点,而不是2点之前)。

这样做是痛苦的,因为不时有另一款游戏打破了解析游戏标题的机制(例如“战锤40,000 ”,“詹姆斯·邦德007”)

评论

Promathia链与11有关。

MySQL'Order By'的可能重复项-正确排序字母数字

相关:stackoverflow.com/questions/48600059/…

#1 楼

我认为这就是为什么很多事情都按发布日期排序的原因。
一种解决方案是在表中为“ SortKey”创建另一列。这可能是标题的经过净化处理的版本,与您创建的易于排序的图案或计数器相符。

评论


我只是为该stackoverflow.com/a/47522040/935122写了一个类

–基督徒
17年11月28日在17:34

这绝对是正确的方法,但仅靠它本身并不是一个答案!

–Doin
19-09-29在9:58

#2 楼

这是一个快速的解决方案:

SELECT alphanumeric, 
       integer
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric


评论


如果一切都是“最终幻想”,那很好,但它将“高飞”放在FF套件之前。

–fortboise
2011-12-7 19:22

此解决方案并非始终有效。有时会破裂。您应该宁愿使用这个:stackoverflow.com/a/12257917/384864

– Borut Tomazin
13年1月27日在17:38

将Kludge逐个堆积:SELECT字母数字,整数,来自sorting_test ORDER BY SOUNDEX(字母数字),LENGTH(字母数字),字母数字。如果这完全奏效,那是因为SOUNDEX方便地丢弃了这些数字,从而确保了例如apple1在z1之前。

–offby1
14-10-9在20:18



很好的解决方案,谢谢,尽管我不得不切换字母数字,长度(字母数字)以避免在“最终幻想”之前使用“高飞”

–已分配
14-10-29在16:26



@ offby1建议仅在文本是100%用英语书写时才有效,因为SOUNDEX()旨在仅对英语单词正确工作。

–雷蒙德·尼兰(Raymond Nijland)
19年8月16日在16:37



#3 楼

刚刚发现了这一点:

SELECT names FROM your_table ORDER BY games + 0 ASC


数字在最前面时是自然排序,也可能在中间起作用。

评论


我没有尝试过,但我对此表示严重怀疑。它与前面的数字一起使用的原因是因为游戏是在数字上下文中使用的,因此在进行比较之前会转换为数字。如果在中间,它将始终转换为0,并且排序将变为伪随机。

– manixrock
2011年5月3日13:59

这不是自然的排序。而是看一下这个可行的解决方案:stackoverflow.com/a/12257917/384864

– Borut Tomazin
13年1月27日在17:39

@fedir这对我也很好。我什至不完全确定为什么会这样。有机会解释markletp吗?

–BizNuge
2014年4月4日在13:12

刚刚对此进行了快速调查,我明白了。我什至没有意识到MySQL仅通过在字符串上使用数学运算符就可以进行这种类型的转换!很酷的事情是,在要“ cast”的字符串的开头没有整数的情况下,它仅返回zer0。谢谢你! --->选择地址,(ADDRESS * 1)as _cast from场所,其中邮政编码按“ NE1%”顺序按地址排序* 1 ASC,地址限制为100000;

–BizNuge
2014年4月4日在13:34

当数字位于中间时,例如“最终幻想100”或“最终幻想2”,这实际上不起作用。 《最终幻想100》将首次亮相。但是,当整数是第一个“ 100 Final Fantasy”时,它将起作用

– dwenaus
16-3-2在21:54

#4 楼

与@plalx发布的功能相同,但重写为MySQL:

DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
DELIMITER ;;
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000)) 
RETURNS int
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE position int;
    DECLARE tmp_position int;
    SET position = 5000;
    SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; 
    SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;

    IF (position = 5000) THEN RETURN 0; END IF;
    RETURN position;
END
;;

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;


用法:

SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")


评论


这是唯一有效的解决方案。我还测试了drupals代码,但有时会失败。谢啦!

– Borut Tomazin
13年1月27日在17:36

有人在10万以上的大桌子上使用它吗?

– Mark Steudel
17年1月31日在23:58

@MarkSteudel我们使用与该函数相似的函数(尽管不是精确函数)对多个表进行自然排序,其中最大表为500万行。但是,我们不会在查询中直接调用它,而是使用它来设置nat_name列的值。每当行更新时,我们都会使用触发器来运行该函数。这种方法使您可以自然排序,而没有实际的性能成本,但会增加额外的列。

–雅各布
19年1月18日在22:39

这可以在字母之前对数字进行排序,并且可以使用hook_views_query_alter在Drupal中实现,并且可以使用以下类似的方式if($ query-> orderby [0] [“ field”] ===“ node_field_data.title”){$ orderBySql = “ udf_NaturalSortFormat(node_field_data.title,10,'。')”; $ query-> orderby = []; $ query-> addOrderBy(NULL,$ orderBySql,$ query-> orderby [0] [“ direction”],'title_natural'); array_unshift($ query-> orderby,end($ query-> orderby)); }

–realgt
19年7月22日在16:43

#5 楼

之前我已经为MSSQL 2000编写了此函数:

/**
 * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
 *
 * @author Alexandre Potvin Latreille (plalx)
 * @param {nvarchar(4000)} string The formatted string.
 * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
 * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
 *
 * @return {nvarchar(4000)} A string for natural sorting.
 * Example of use: 
 * 
 *      SELECT Name FROM TableA ORDER BY Name
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1-1.       
 *  2.  A1-1.                   2.  A1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R11
 *  5.  R2                  5.  R2
 *
 *  
 *  As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
 *  We can use this function to fix this.
 *
 *      SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1.     
 *  2.  A1-1.                   2.  A1-1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R2
 *  5.  R2                  5.  R11
 */
CREATE FUNCTION dbo.udf_NaturalSortFormat(
    @string nvarchar(4000),
    @numberLength int = 10,
    @sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
    DECLARE @sortString varchar(4000),
        @numStartIndex int,
        @numEndIndex int,
        @padLength int,
        @totalPadLength int,
        @i int,
        @sameOrderCharsLen int;

    SELECT 
        @totalPadLength = 0,
        @string = RTRIM(LTRIM(@string)),
        @sortString = @string,
        @numStartIndex = PATINDEX('%[0-9]%', @string),
        @numEndIndex = 0,
        @i = 1,
        @sameOrderCharsLen = LEN(@sameOrderChars);

    -- Replace all char that has to have the same order by a space.
    WHILE (@i <= @sameOrderCharsLen)
    BEGIN
        SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
        SET @i = @i + 1;
    END

    -- Pad numbers with zeros.
    WHILE (@numStartIndex <> 0)
    BEGIN
        SET @numStartIndex = @numStartIndex + @numEndIndex;
        SET @numEndIndex = @numStartIndex;

        WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
        BEGIN
            SET @numEndIndex = @numEndIndex + 1;
        END

        SET @numEndIndex = @numEndIndex - 1;

        SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);

        IF @padLength < 0
        BEGIN
            SET @padLength = 0;
        END

        SET @sortString = STUFF(
            @sortString,
            @numStartIndex + @totalPadLength,
            0,
            REPLICATE('0', @padLength)
        );

        SET @totalPadLength = @totalPadLength + @padLength;
        SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
    END

    RETURN @sortString;
END

GO


评论


@MarkSteudel您必须试一试并自己进行测试。更糟糕的是,您总是可以缓存格式化的值。这可能是我要处理大型表的原因,因为您也可以为该字段建立索引。

– plalx
17年2月1日,0:48



#6 楼

MySQL不允许这种“自然排序”,因此,获得所需信息的最好方法似乎是按照上述方法拆分数据集(单独的id字段等),否则将失败然后,基于非标题元素,数据库中的索引元素(日期,数据库中插入的ID等)执行排序。

让数据库为您进行排序几乎总是在进行比将大型数据集读取到您选择的编程语言中并对其进行排序要快,因此,如果您对此处的db模式完全有任何控制权,那么可以考虑如上所述添加容易排序的字段,这样可以节省从长远来看,您会遇到很多麻烦和维护问题。

在MySQL错误和讨论论坛上,有时会出现添加“自然排序”的请求,许多解决方案都围绕着剥离特定的部分数据并将其转换为查询的ORDER BY部分,例如

SELECT * FROM table ORDER BY CAST(mid(name, 6, LENGTH(c) -5) AS unsigned) 


这种解决方案几乎可以实现可以在上面的“最终幻想”示例中使用,但不是特别灵活,恐怕无法干净地扩展到包括“战锤40,000”和“詹姆斯·邦德007”在内的数据集。

#7 楼

因此,虽然我知道您已经找到了满意的答案,但是我已经为这个问题苦苦挣扎了一段时间,并且我们先前确定它在SQL中不能做得很好,我们将不得不在JSON上使用javascript数组。

这就是我仅使用SQL即可解决的方法。希望这对其他人有帮助:

我有如下数据:

Scene 1
Scene 1A
Scene 1B
Scene 2A
Scene 3
...
Scene 101
Scene XXA1
Scene XXA2


我实际上并没有“投射”东西,尽管我想

我首先替换了数据中不变的部分(在本例中为“场景”),然后进行了LPAD整理。这似乎可以很好地使字母字符串以及编号字符串正确排序。

我的ORDER BY子句看起来像:

ORDER BY LPAD(REPLACE(`table`.`column`,'Scene ',''),10,'0')


显然,这对于原本不太统一的问题无济于事-但我想这可能会解决许多其他相关问题,因此将其解决。

评论


LPAD()提示非常有用。我可以对单词和数字进行排序,使用LPAD可以自然地对数字进行排序。并使用CONCAT我忽略非数字。我的查询如下所示(别名是要排序的列):IF(CONCAT(“”,alias * 1)= alias,LPAD(alias,5,“ 0”),alias)ASC; 👍

– Kai Noack
2月5日6:44



#8 楼


在表格中添加一个排序键(等级)。 ORDER BY rank
使用“发布日期”列。 ORDER BY release_date
从SQL提取数据时,让您的对象进行排序,例如,如果提取到Set中,使其成为TreeSet,并让您的数据模型实现Comparable并在此处制定自然排序算法(插入排序将如果您使用的是不带集合的语言,就足够了),因为您将在创建模型并将其插入集合时从SQL逐行读取行。


#9 楼

关于Richard Toth的最佳响应,请注意https://stackoverflow.com/a/12257917/4052357

请注意包含2字节(或更多)字符和数字的UTF8编码字符串,例如

12 南新宿


LENGTH()函数中使用MySQL的udf_NaturalSortFormat将返回字符串的字节长度并且是不正确的,而是使用CHAR_LENGTH()它将返回正确的字符长度。

在我在使用LENGTH()的情况下导致查询无法完成,导致MySQL的CPU使用率达到100%

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;


ps我会将其添加为原始文档的注释,但我的声誉(尚)不足

#10 楼

为“排序关键字”添加一个字段,该字段将所有数字字符串都零填充到固定长度,然后在该字段上进行排序。

如果您可能有很长的数字字符串,另一种方法是在每个数字字符串前添加数字位数(固定宽度,零填充)。例如,如果连续的位数不超过99位,则对于“ Super Blast 10 Ultra”,排序键将为“ Super Blast 0210 Ultra”。

#11 楼

要订购:
0
1
2
10
23
205
1000
a
aac
b
casdsadsa
css

使用此查询:

SELECT 
    column_name 
FROM 
    table_name 
ORDER BY
    column_name REGEXP '^\d*[^\da-z&\.\' \-\"\!\@\#$\%\^\*\(\)\;\:\,\?\/\~\`\|\_\-]' DESC, 
    column_name + 0, 
    column_name;


评论


不幸的是,如果您在其中添加值(例如a1,a2,a11等),此操作将失败。

–random_user_name
17年6月8日14:25



#12 楼

如果您不想重新发明轮子或者对很多无法使用的代码感到头疼,只需使用Drupal Natural Sort ...即可运行压缩的SQL(MySQL或Postgre),仅此而已。进行查询时,只需使用以下命令订购:

... ORDER BY natsort_canon(column_name, 'natural')


评论


为此,我一直在尝试各种解决方案(哈哈,看看我在这里做了什么?),但是这些解决方案都无法真正解决我拥有的所有数据。 drupal功能就像一个魅力。感谢您的发布。

–本·希区柯克
18年3月13日在8:15

这有效,但在末尾对数字进行排序(A-Z然后是0-9)

–realgt
19年7月22日在15:42

#13 楼

另一种选择是从mysql提取数据后在内存中进行排序。从性能的角度来看,这并不是最佳选择,但如果您不对庞大的列表进行排序,则应该没问题。

如果您查看Jeff的文章,您会发现很多算法可以解决您使用的任何语言。
对人类的排序:自然排序顺序

#14 楼

您还可以动态创建“排序列”:

SELECT name, (name = '-') boolDash, (name = '0') boolZero, (name+0 > 0) boolNum 
FROM table 
ORDER BY boolDash DESC, boolZero DESC, boolNum DESC, (name+0), name


这样,您可以创建要排序的组。

在我的查询中,我想在所有内容前面加上“-”,然后是数字,然后是文本。这可能会导致类似以下内容:

-
0    
1
2
3
4
5
10
13
19
99
102
Chair
Dog
Table
Windows


这样,您在添加数据时不必按正确的顺序维护排序列。您还可以根据需要更改排序顺序。

评论


我不知道这种表现会如何。我一直在使用它,没有任何麻烦。我的数据库不大。

– antoine
13-10-17在12:39

#15 楼

如果您使用的是PHP,则可以在php中进行自然排序。

$keys = array();
$values = array();
foreach ($results as $index => $row) {
   $key = $row['name'].'__'.$index; // Add the index to create an unique key.
   $keys[] = $key;
   $values[$key] = $row; 
}
natsort($keys);
$sortedValues = array(); 
foreach($keys as $index) {
  $sortedValues[] = $values[$index]; 
}


我希望MySQL在将来的版本中实现自然排序,但是功能要求( #1588)自2003年开始营业,所以我不会屏息。

评论


从理论上讲这是可能的,但是我需要先将所有数据库记录读取到我的Web服务器上。

–黑色
2011年3月9日17:33

或者考虑:usort($ mydata,function($ item1,$ item2){return strnatcmp($ item1 ['key'],$ item2 ['key']);}); (我有一个关联数组,并按键排序。)参考:stackoverflow.com/q/12426825/1066234

– Kai Noack
19年11月14日在11:09

#16 楼

@ plaix / Richard Toth / Luke Hoggett的最佳响应的简化非udf版本(仅适用于该字段中的第一个整数)是

SELECT name,
LEAST(
    IFNULL(NULLIF(LOCATE('0', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('1', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('2', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('3', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('4', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('5', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('6', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('7', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('8', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('9', name), 0), ~0)
) AS first_int
FROM table
ORDER BY IF(first_int = ~0, name, CONCAT(
    SUBSTR(name, 1, first_int - 1),
    LPAD(CAST(SUBSTR(name, first_int) AS UNSIGNED), LENGTH(~0), '0'),
    SUBSTR(name, first_int + LENGTH(CAST(SUBSTR(name, first_int) AS UNSIGNED)))
)) ASC


#17 楼

我尝试了几种解决方案,但实际上非常简单:

SELECT test_column FROM test_table ORDER BY LENGTH(test_column) DESC, test_column DESC

/* 
Result 
--------
value_1
value_2
value_3
value_4
value_5
value_6
value_7
value_8
value_9
value_10
value_11
value_12
value_13
value_14
value_15
...
*/


评论


对23-4244格式的数字排序非常有效。谢谢 :)

– Pyton
16年9月1日在9:48

仅适用于此测试数据,因为数字前面的字符串都相同。尝试在其中粘贴值z_99,它将被放在顶部,但z在v之后。

–塞缪尔·内夫(Samuel Neff)
17年8月19日在17:12

@SamuelNeff请参阅SQL:ORDER BY LENGTH(test_column)DESC,test_column DESC,所以是的,因为它将首先按列的长度排序。这可以很好地对表的前缀组进行排序,否则只能使用“ test_column DESC”进行排序

–塔里克
17年8月20日在13:48

#18 楼

我在这里(以及重复出现的问题)中看到的许多其他答案基本上只适用于格式非常特殊的数据,例如一个完全是数字的字符串,或者具有固定长度的字母前缀的字符串。在一般情况下这是行不通的。

的确,在MySQL中并没有实现100%常规nat-sort的任何方法,因为要做到这一点,您真正需要的是修改后的比较功能,当遇到数字时,可在字符串的字典排序排序和数字排序之间切换。这样的代码可以实现您希望用于识别和比较两个字符串中的数字部分的任何算法。但是,不幸的是,MySQL中的比较功能是其代码的内部功能,用户无法更改。

这留下了某种形式的hack,您尝试在其中为字符串创建排序键在其中重新格式化数字部分,以便标准的词典编目排序实际上按照您想要的方式对它们进行排序。

对于最大位数不超过1的纯整数,显而易见的解决方案是简单地将数字左移-用零填充它们,以便它们都是固定宽度。这是Drupal插件采用的方法,也是@plalx / @RichardToth的解决方案。 (@Christian有一个不同且复杂得多的解决方案,但它没有我看到的优点)。

@tye指出,您可以通过在前面加上一个固定的数字长度来改善这一点。每个数字,而不是简单地左填充它。尽管有本质上笨拙的hack的局限性,但是您还有很多可以改进的地方。但是,似乎没有任何预构建的解决方案!例如,关于:


加号和减号? +10 vs 10 vs -10
小数? 8.2、8.5、1.006,.75
前导零? 020、030、00000922
千位分隔符? “ 1,001 Dalmations”与“ 1001 Dalmations”
版本号? MariaDB v10.3.18与MariaDB v10.3.3
数字很​​长? 103,768,276,592,092,364,859,236,487,687,870,234,598.55


扩展@tye的方法,我创建了一个相当紧凑的NatSortKey()存储函数,该函数将任意字符串转换为nat-sort键,并处理所有在上述情况下,效率相当高,并且保留了总排序顺序(没有两个不同的字符串具有比较相等的排序键)。第二个参数可用于限制每个字符串中处理的数字数量(例如,限制为前10个数字),这可用于确保输出适合给定长度。

注意:使用此第二个参数的给定值生成的排序键字符串仅应与使用相同参数值的其他字符串进行排序,否则它们可能无法正确排序!

您可以使用它直接排序,例如

SELECT myString FROM myTable ORDER BY NatSortKey(myString,0);  ### 0 means process all numbers - resulting sort key might be quite long for certain inputs


但是为了对大型表进行有效排序,最好将排序键预先存储在另一列中(可能带有索引) :

INSERT INTO myTable (myString,myStringNSK) VALUES (@theStringValue,NatSortKey(@theStringValue,10)), ...
...
SELECT myString FROM myTable ORDER BY myStringNSK;


[理想情况下,您可以通过以下方式将键列创建为计算的存储列来自动进行此操作:

CREATE TABLE myTable (
...
myString varchar(100),
myStringNSK varchar(150) AS (NatSortKey(myString,10)) STORED,
...
KEY (myStringNSK),
...);


但是目前,MySQL和MariaDB都不允许在计算列中存储函数,因此很遗憾,您还不能这样做。]


我的功能会影响numbe的排序仅限于rs。如果要执行其他排序规范化操作,例如删除所有标点符号或修剪两端的空白或用单个空格替换多个空白序列,则可以扩展该函数,也可以在NatSortKey()之前或之后执行应用于您的数据。 (我建议为此目的使用REGEXP_REPLACE()。)

我认为“。”在某种程度上也是以英语为中心的。小数点和','代表千位分隔符,但是如果您想要反向显示,或者希望将其作为参数进行切换,应该足够容易地进行修改。

可以通过其他方式进一步改善;例如,它当前按绝对值对负数进行排序,因此-1位于-2之前,而不是相反。也没有办法为数字指定DESC排序顺序,同时保留文本的ASC字典顺序。这两个问题可以通过更多的工作来解决。如果有时间,我会更新代码。

还有很多其他细节需要注意-包括对您所使用的字符集和排序规则的一些关键依赖性-但是我将它们全部放入SQL代码的注释块中。在自己使用该函数之前,请仔细阅读!

所以,这是代码。如果您发现错误,或有我未提及的改进,请在评论中让我知道!


delimiter $$
CREATE DEFINER=CURRENT_USER FUNCTION NatSortKey (s varchar(100), n int) RETURNS varchar(350) DETERMINISTIC
BEGIN
/****
  Converts numbers in the input string s into a format such that sorting results in a nat-sort.
  Numbers of up to 359 digits (before the decimal point, if one is present) are supported.  Sort results are undefined if the input string contains numbers longer than this.
  For n>0, only the first n numbers in the input string will be converted for nat-sort (so strings that differ only after the first n numbers will not nat-sort amongst themselves).
  Total sort-ordering is preserved, i.e. if s1!=s2, then NatSortKey(s1,n)!=NatSortKey(s2,n), for any given n.
  Numbers may contain ',' as a thousands separator, and '.' as a decimal point.  To reverse these (as appropriate for some European locales), the code would require modification.
  Numbers preceded by '+' sort with numbers not preceded with either a '+' or '-' sign.
  Negative numbers (preceded with '-') sort before positive numbers, but are sorted in order of ascending absolute value (so -7 sorts BEFORE -1001).
  Numbers with leading zeros sort after the same number with no (or fewer) leading zeros.
  Decimal-part-only numbers (like .75) are recognised, provided the decimal point is not immediately preceded by either another '.', or by a letter-type character.
  Numbers with thousand separators sort after the same number without them.
  Thousand separators are only recognised in numbers with no leading zeros that don't immediately follow a ',', and when they format the number correctly.
  (When not recognised as a thousand separator, a ',' will instead be treated as separating two distinct numbers).
  Version-number-like sequences consisting of 3 or more numbers separated by '.' are treated as distinct entities, and each component number will be nat-sorted.
  The entire entity will sort after any number beginning with the first component (so e.g. 10.2.1 sorts after both 10 and 10.995, but before 11)
  Note that The first number component in an entity like this is also permitted to contain thousand separators.

  To achieve this, numbers within the input string are prefixed and suffixed according to the following format:
  - The number is prefixed by a 2-digit base-36 number representing its length, excluding leading zeros.  If there is a decimal point, this length only includes the integer part of the number.
  - A 3-character suffix is appended after the number (after the decimals if present).
    - The first character is a space, or a '+' sign if the number was preceded by '+'.  Any preceding '+' sign is also removed from the front of the number.
    - This is followed by a 2-digit base-36 number that encodes the number of leading zeros and whether the number was expressed in comma-separated form (e.g. 1,000,000.25 vs 1000000.25)
    - The value of this 2-digit number is: (number of leading zeros)*2 + (1 if comma-separated, 0 otherwise)
  - For version number sequences, each component number has the prefix in front of it, and the separating dots are removed.
    Then there is a single suffix that consists of a ' ' or '+' character, followed by a pair base-36 digits for each number component in the sequence.

  e.g. here is how some simple sample strings get converted:
  'Foo055' --> 'Foo0255 02'
  'Absolute zero is around -273 centigrade' --> 'Absolute zero is around -03273 00 centigrade'
  'The ,000,000 prize' --> 'The 1000000 01 prize'
  '+99.74 degrees' --> '0299.74+00 degrees'
  'I have 0 apples' --> 'I have 00 02 apples'
  '.5 is the same value as 0000.5000' --> '00.5 00 is the same value as 00.5000 08'
  'MariaDB v10.3.0018' --> 'MariaDB v02100130218 000004'

  The restriction to numbers of up to 359 digits comes from the fact that the first character of the base-36 prefix MUST be a decimal digit, and so the highest permitted prefix value is '9Z' or 359 decimal.
  The code could be modified to handle longer numbers by increasing the size of (both) the prefix and suffix.
  A higher base could also be used (by replacing CONV() with a custom function), provided that the collation you are using sorts the "digits" of the base in the correct order, starting with 0123456789.
  However, while the maximum number length may be increased this way, note that the technique this function uses is NOT applicable where strings may contain numbers of unlimited length.

  The function definition does not specify the charset or collation to be used for string-type parameters or variables:  The default database charset & collation at the time the function is defined will be used.
  This is to make the function code more portable.  However, there are some important restrictions:

  - Collation is important here only when comparing (or storing) the output value from this function, but it MUST order the characters " +0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" in that order for the natural sort to work.
    This is true for most collations, but not all of them, e.g. in Lithuanian 'Y' comes before 'J' (according to Wikipedia).
    To adapt the function to work with such collations, replace CONV() in the function code with a custom function that emits "digits" above 9 that are characters ordered according to the collation in use.

  - For efficiency, the function code uses LENGTH() rather than CHAR_LENGTH() to measure the length of strings that consist only of digits 0-9, '.', and ',' characters.
    This works for any single-byte charset, as well as any charset that maps standard ASCII characters to single bytes (such as utf8 or utf8mb4).
    If using a charset that maps these characters to multiple bytes (such as, e.g. utf16 or utf32), you MUST replace all instances of LENGTH() in the function definition with CHAR_LENGTH()

  Length of the output:

  Each number converted adds 5 characters (2 prefix + 3 suffix) to the length of the string. n is the maximum count of numbers to convert;
  This parameter is provided as a means to limit the maximum output length (to input length + 5*n).
  If you do not require the total-ordering property, you could edit the code to use suffixes of 1 character (space or plus) only; this would reduce the maximum output length for any given n.
  Since a string of length L has at most ((L+1) DIV 2) individual numbers in it (every 2nd character a digit), for n<=0 the maximum output length is (inputlength + 5*((inputlength+1) DIV 2))
  So for the current input length of 100, the maximum output length is 350.
  If changing the input length, the output length must be modified according to the above formula.  The DECLARE statements for x,y,r, and suf must also be modified, as the code comments indicate.
****/
  DECLARE x,y varchar(100);            # need to be same length as input s
  DECLARE r varchar(350) DEFAULT '';   # return value:  needs to be same length as return type
  DECLARE suf varchar(101);   # suffix for a number or version string. Must be (((inputlength+1) DIV 2)*2 + 1) chars to support version strings (e.g. '1.2.33.5'), though it's usually just 3 chars. (Max version string e.g. 1.2. ... .5 has ((length of input + 1) DIV 2) numeric components)
  DECLARE i,j,k int UNSIGNED;
  IF n<=0 THEN SET n := -1; END IF;   # n<=0 means "process all numbers"
  LOOP
    SET i := REGEXP_INSTR(s,'\d');   # find position of next digit
    IF i=0 OR n=0 THEN RETURN CONCAT(r,s); END IF;   # no more numbers to process -> we're done
    SET n := n-1, suf := ' ';
    IF i>1 THEN
      IF SUBSTRING(s,i-1,1)='.' AND (i=2 OR SUBSTRING(s,i-2,1) RLIKE '[^.\p{L}\p{N}\p{M}\x{608}\x{200C}\x{200D}\x{2100}-\x{214F}\x{24B6}-\x{24E9}\x{1F130}-\x{1F149}\x{1F150}-\x{1F169}\x{1F170}-\x{1F189}]') AND (SUBSTRING(s,i) NOT RLIKE '^\d++\.\d') THEN SET i:=i-1; END IF;   # Allow decimal number (but not version string) to begin with a '.', provided preceding char is neither another '.', nor a member of the unicode character classes: "Alphabetic", "Letter", "Block=Letterlike Symbols" "Number", "Mark", "Join_Control"
      IF i>1 AND SUBSTRING(s,i-1,1)='+' THEN SET suf := '+', j := i-1; ELSE SET j := i; END IF;   # move any preceding '+' into the suffix, so equal numbers with and without preceding "+" signs sort together
      SET r := CONCAT(r,SUBSTRING(s,1,j-1)); SET s = SUBSTRING(s,i);   # add everything before the number to r and strip it from the start of s; preceding '+' is dropped (not included in either r or s)
    END IF;
    SET x := REGEXP_SUBSTR(s,IF(SUBSTRING(s,1,1) IN ('0','.') OR (SUBSTRING(r,-1)=',' AND suf=' '),'^\d*+(?:\.\d++)*','^(?:[1-9]\d{0,2}(?:,\d{3}(?!\d))++|\d++)(?:\.\d++)*+'));   # capture the number + following decimals (including multiple consecutive '.<digits>' sequences)
    SET s := SUBSTRING(s,LENGTH(x)+1);   # NOTE: LENGTH() can be safely used instead of CHAR_LENGTH() here & below PROVIDED we're using a charset that represents digits, ',' and '.' characters using single bytes (e.g. latin1, utf8)
    SET i := INSTR(x,'.');
    IF i=0 THEN SET y := ''; ELSE SET y := SUBSTRING(x,i); SET x := SUBSTRING(x,1,i-1); END IF;   # move any following decimals into y
    SET i := LENGTH(x);
    SET x := REPLACE(x,',','');
    SET j := LENGTH(x);
    SET x := TRIM(LEADING '0' FROM x);   # strip leading zeros
    SET k := LENGTH(x);
    SET suf := CONCAT(suf,LPAD(CONV(LEAST((j-k)*2,1294) + IF(i=j,0,1),10,36),2,'0'));   # (j-k)*2 + IF(i=j,0,1) = (count of leading zeros)*2 + (1 if there are thousands-separators, 0 otherwise)  Note the first term is bounded to <= base-36 'ZY' as it must fit within 2 characters
    SET i := LOCATE('.',y,2);
    IF i=0 THEN
      SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x,y,suf);   # k = count of digits in number, bounded to be <= '9Z' base-36
    ELSE   # encode a version number (like 3.12.707, etc)
      SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x);   # k = count of digits in number, bounded to be <= '9Z' base-36
      WHILE LENGTH(y)>0 AND n!=0 DO
        IF i=0 THEN SET x := SUBSTRING(y,2); SET y := ''; ELSE SET x := SUBSTRING(y,2,i-2); SET y := SUBSTRING(y,i); SET i := LOCATE('.',y,2); END IF;
        SET j := LENGTH(x);
        SET x := TRIM(LEADING '0' FROM x);   # strip leading zeros
        SET k := LENGTH(x);
        SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x);   # k = count of digits in number, bounded to be <= '9Z' base-36
        SET suf := CONCAT(suf,LPAD(CONV(LEAST((j-k)*2,1294),10,36),2,'0'));   # (j-k)*2 = (count of leading zeros)*2, bounded to fit within 2 base-36 digits
        SET n := n-1;
      END WHILE;
      SET r := CONCAT(r,y,suf);
    END IF;
  END LOOP;
END
$$
delimiter ;


评论


我是MySQL的初学者,并尝试过此方法。收到此错误:“#1305-FUNCTION mydatabase.REGEXP_INSTR不存在”。任何想法?

– John T
19年11月7日在23:09

对于那里的任何其他新手。我没有安装MySQL 8.0。 REGEXP_INSTR(和其他REGEXP东西)需要它。

– John T
19年11月8日,0:40

刚刚修复了NatSortKey中的一个严重错误:存在不正确的正则表达式字符。如果您自己使用过此功能,请更新您的代码!

–Doin
5月19日6:27

#19 楼

也有natsort。它打算作为drupal插件的一部分,但可以独立运行。

#20 楼

如果标题仅具有版本号,这是一个简单的例子:
ORDER BY CAST(REGEXP_REPLACE(title, "[a-zA-Z]+", "") AS INT)';

如果使用模式(此模式在版本之前使用#),则可以使用简单的SQL:
create table titles(title);

insert into titles (title) values 
('Final Fantasy'),
('Final Fantasy #03'),
('Final Fantasy #11'),
('Final Fantasy #10'),
('Final Fantasy #2'),
('Bond 007 ##2'),
('Final Fantasy #01'),
('Bond 007'),
('Final Fantasy #11}');

select REGEXP_REPLACE(title, "#([0-9]+)", "\1") as title from titles
ORDER BY REGEXP_REPLACE(title, "#[0-9]+", ""),
CAST(REGEXP_REPLACE(title, ".*#([0-9]+).*", "\1") AS INT);     
+-------------------+
| title             |
+-------------------+
| Bond 007          |
| Bond 007 #2       |
| Final Fantasy     |
| Final Fantasy 01  |
| Final Fantasy 2   |
| Final Fantasy 03  |
| Final Fantasy 10  |
| Final Fantasy 11  |
| Final Fantasy 11} |
+-------------------+
8 rows in set, 2 warnings (0.001 sec)

如果需要,您可以使用其他模式。
例如,如果您有一部电影“ I'm#1”和“ I'm#1 part 2”,则可以包装该版本,例如“最终幻想{11}”

#21 楼

我知道这个主题很古老,但是我想我已经找到了一种方法来解决这个问题:

SELECT * FROM `table` ORDER BY 
CONCAT(
  GREATEST(
    LOCATE('1', name),
    LOCATE('2', name),
    LOCATE('3', name),
    LOCATE('4', name),
    LOCATE('5', name),
    LOCATE('6', name),
    LOCATE('7', name),
    LOCATE('8', name),
    LOCATE('9', name)
   ),
   name
) ASC


废话说,它对以下集合进行了不正确的排序(这是无用的笑声) :

最终幻想1
最终幻想2
最终幻想5
最终幻想7
最终幻想7:降临之子
最终幻想12
最终幻想112
FF1

评论


为什么不删除此答案?你会为此得到一个徽章

– m47730
17年1月25日在8:06