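Someone on Meta suggested a community challenge for July 2015: a bot that finds questions with bad titles. I wrote this query to help others find examples to work from while writing their code.

I put an emphasis on documentation while writing it, so that SQL beginners can still change it easily. Is there room for improvement?

Here is the query on SEDE: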

DECLARE @Question INT = (SELECT Id FROM PostTypes WHERE Name = 'Question');
DECLARE @Answer INT = (SELECT Id FROM PostTypes WHERE Name = 'Answer');

SELECT 
/* Uncomment below if you want to limit the results */
/* TOP 100 */
    Posts.Id AS [Post Link]
  /* only for sanity check in case the filter is removed from WHERE clause */
  , CASE 
      WHEN Posts.PostTypeId = @Question THEN 'Question'
      WHEN Posts.PostTypeId = @Answer THEN 'Answer'
      ELSE 'Other' END 
      AS [Post Type]
  , Posts.Score
  , Posts.CreationDate
  , Users.Id AS [User Link]
  , Users.Reputation
FROM Posts 
INNER JOIN Users
  ON Posts.OwnerUserId = Users.Id
WHERE
  /* Filter out non-question posts */
  Posts.PostTypeId = @Question

  /* Generic title */
  AND (
    Posts.Title LIKE '%please%' OR
    Posts.Title LIKE '%review%' OR
    Posts.Title LIKE '%improve%' OR
    Posts.Title LIKE '%better%' OR
    Posts.Title LIKE '%how to%'OR
    Posts.Title LIKE '%how can i%' OR
    Posts.Title LIKE '%how do i%'
  )

  /* Post not closed yet */
  AND Posts.ClosedDate IS NULL

  /* Question doesn't have positive score */
  /* AND Posts.Score <= 0 */

  /* User has low reputation */
  AND Users.Reputation <= 500

ORDER BY

  /* Lowest scores first - Change to DESC if you want Highest first */
  Posts.Score ASC

  /* Newest first - Change to ASC if you want oldest first */
  , Posts.CreationDate DESC


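Comments

You do realize that your own post will be flagged as having a bad title... right?

Not yet, @rolfl, since the rep requirement isn't met.

@rolfl what do you think? ;-)

This has to be the worst title I have seen on a well-received question. Well done, SQL expert.

Oh, I see why you picked a bad title. It's still bad, though. (Well, except for the little parenthetical on the right.)

Answer #1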

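Your query is well structured and consistent, but there is one issue you have not accounted for: not all bad questions have users. When a question is migrated, or its user is deleted, the question may have no link back to the Users table.

This calls for an outer join to Users.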
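
To make the difference concrete, here is a minimal sketch. The table variables `@Posts` and `@Users` are hypothetical miniature stand-ins for the SEDE tables, with made-up rows, used only for illustration:

```sql
/* Hypothetical miniature versions of the SEDE tables, for illustration only */
DECLARE @Posts TABLE (Id INT, OwnerUserId INT NULL, Title NVARCHAR(250));
DECLARE @Users TABLE (Id INT, Reputation INT);

INSERT INTO @Posts (Id, OwnerUserId, Title) VALUES
    (1, 10,   N'Please review my code'),   /* owner exists, low reputation */
    (2, NULL, N'How to improve this?');    /* owner deleted or post migrated */
INSERT INTO @Users (Id, Reputation) VALUES (10, 101);

/* INNER JOIN: post 2 disappears because it has no matching user row */
SELECT p.Id, u.Reputation
FROM @Posts AS p
INNER JOIN @Users AS u
  ON p.OwnerUserId = u.Id;

/* LEFT OUTER JOIN: post 2 is kept, with NULL in the user columns */
SELECT p.Id, u.Reputation
FROM @Posts AS p
LEFT OUTER JOIN @Users AS u
  ON p.OwnerUserId = u.Id
WHERE u.Id IS NULL          /* keep posts without a user */
   OR u.Reputation <= 500;  /* and posts from low-reputation users */
```

The explicit `IS NULL` check matters: a bare `u.Reputation <= 500` in the WHERE clause would discard the rows where the user columns are NULL, which would quietly turn the outer join back into an inner join.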

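Additionally, I have found that CTEs are great for aliasing column names. For example, both the Posts and Users tables have an Id column, and you need Posts.Id and Users.Id in different places. Both also have a CreationDate. When these qualified names become long and complicated, they bog down the readability of the query.

Regarding readability, the CASE statement is dead code. Don't keep useless code around; if it is "debug code", remove it before deploying. If you don't trust yourself to have the right query conditions, you won't trust yourself to have the right debug code either.

So, if you alias these values in a CTE, and use the CTE to keep the outer join tidy, you can reduce your query to: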

SELECT PostID as [Post Link],
       Score,
       PostDate,
       UserID as [User Link],
       Reputation
FROM BadTitles
LEFT OUTER JOIN Users on UserId = Id
ORDER BY Score ASC, PostDate DESC


OK, that is a bit of a cheat, because BadTitles is a CTE, but the full query, with your comments carried over, is:

DECLARE @Question INT = (SELECT Id FROM PostTypes WHERE Name = 'Question');
DECLARE @Answer INT = (SELECT Id FROM PostTypes WHERE Name = 'Answer');

with BadTitles as (
    SELECT Id as PostId,
           OwnerUserId as UserId,
           Score,
           CreationDate as PostDate
    FROM Posts
    WHERE 
      /* Filter out non-question posts */
          Posts.PostTypeId = @Question

      /* Post not closed yet */
      AND Posts.ClosedDate IS NULL

      /* Generic title */
      AND (
            Posts.Title LIKE '%please%'
        OR  Posts.Title LIKE '%review%'
        OR  Posts.Title LIKE '%improve%'
        OR  Posts.Title LIKE '%better%'
        OR  Posts.Title LIKE '%how to%'
        OR  Posts.Title LIKE '%how can i%'
        OR  Posts.Title LIKE '%how do i%'
        )
)
SELECT PostID as [Post Link],
       Score,
       PostDate,
       UserID as [User Link],
       Reputation
FROM BadTitles
LEFT OUTER JOIN Users on UserId = Id
WHERE (Reputation IS NULL OR Reputation <= 500)
ORDER BY

  /* Lowest scores first - Change to DESC if you want Highest first */
  Score ASC,

  /* Newest first - Change to ASC if you want oldest first */
  PostDate DESC


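Note that this also brings in a hundred or so additional questions from migrations and the like.

Comments

Hmmm... just realized this does not honor the reputation criterion... I misplaced it.
– rolfl, Jun 11, 2015 at 21:00

I can always count on you to find the obscure bugs in my SEDE queries, thanks a lot!
– ran, Jun 11, 2015 at 21:01

Answer #2

I noticed that you put the commas first in your SELECT list. Something I also do is put the conjunctions first in my WHERE conditions, which has a similar advantage.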


  AND (
    Posts.Title LIKE '%please%' OR
    Posts.Title LIKE '%review%' OR
    Posts.Title LIKE '%improve%' OR
    Posts.Title LIKE '%better%' OR
    Posts.Title LIKE '%how to%'OR
    Posts.Title LIKE '%how can i%' OR
    Posts.Title LIKE '%how do i%'
  )



I would write it like this:

  AND (
    Posts.Title LIKE '%please%' 
    OR Posts.Title LIKE '%review%'
    OR Posts.Title LIKE '%improve%'
    OR Posts.Title LIKE '%better%'
    OR Posts.Title LIKE '%how to%'
    OR Posts.Title LIKE '%how can i%'
    OR Posts.Title LIKE '%how do i%'
  )
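
One way to see the advantage is to comment out a condition. The sketch below uses a hypothetical table variable `@Titles` with made-up rows; it is an illustration, not part of the original query:

```sql
/* Hypothetical sample data, for illustration only */
DECLARE @Titles TABLE (Title NVARCHAR(250));
INSERT INTO @Titles (Title) VALUES (N'Please review'), (N'How to sort');

SELECT Title
FROM @Titles
WHERE Title LIKE '%please%'
   /* OR Title LIKE '%review%' */  /* temporarily disabled, query still parses */
   OR Title LIKE '%how to%';
```

With leading conjunctions, any condition except the first can be disabled or appended without touching its neighbours. With trailing conjunctions, disabling the last condition leaves a dangling OR on the line above it.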


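Comments

While I also like the or's, and's, commas, etc. at the start of the line... the OCD in me still has to cry out that your post isn't lined up :)
– WernerCD, Jun 11, 2015 at 23:47

Lol, I see what you mean.
– Malachi♦, Jun 12, 2015 at 1:01

Answer #3

Let me address the possible DRY issue in your query, in exchange for possibly worse performance.

I'm specifically talking about this bit: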

  /* Generic title */
  AND (
    Posts.Title LIKE '%please%' OR
    Posts.Title LIKE '%review%' OR
    Posts.Title LIKE '%improve%' OR
    Posts.Title LIKE '%better%' OR
    Posts.Title LIKE '%how to%'OR
    Posts.Title LIKE '%how can i%' OR
    Posts.Title LIKE '%how do i%'
  )
 


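At compatibility level 130, SQL Server offers the STRING_SPLIT function. You give it a string and a single-character separator, and it splits the string into a table. For example: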

 STRING_SPLIT('please|review|improve|better|how to|how can i|how do i','|') 
 


will give us a table with a value column holding one row for each word that was separated by |.
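
As a complete statement, the call can be tried on its own. This is a minimal sketch with a shortened phrase list, assuming a database at compatibility level 130 or higher:

```sql
/* Requires compatibility level 130 or higher */
SELECT value
FROM STRING_SPLIT('please|review|improve', '|');
```

This should return three rows, one per phrase. The order of the rows is not guaranteed, which does not matter for the matching done here.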

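We can then CROSS APPLY that table against the Posts table, which lets us reduce the string of LIKEs to:

  /* Generic title */
  AND CHARINDEX(
          Phrases.Value
        , Posts.Title COLLATE Latin1_General_CI_AI
      ) > 0

Notice that I replaced LIKE with CHARINDEX. That in itself doesn't matter much, because neither construct will benefit from an index.

I also added a COLLATE directive with a case-insensitive and accent-insensitive collation, so titles that use capital letters are not missed.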
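
The effect of the collation can be checked in isolation. In this small sketch the title literal and the case-sensitive collation `Latin1_General_CS_AS` are chosen only for contrast; neither appears in the query above:

```sql
SELECT
    /* case-sensitive: lowercase 'please' is not found in 'PLEASE ...' */
    CHARINDEX('please', 'PLEASE review my code' COLLATE Latin1_General_CS_AS) AS CaseSensitive
    /* case-insensitive: the phrase is found at position 1 */
  , CHARINDEX('please', 'PLEASE review my code' COLLATE Latin1_General_CI_AI) AS CaseInsensitive;
```

The first column should come back as 0 and the second as 1, because CHARINDEX compares using the collation of its input and the explicit COLLATE clause takes precedence over the literal's default.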

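Additionally, I suggest removing the CASE / WHEN / END statement for the post type. You can join with the PostTypes table and then use its Name column in the projection.

Applying all of this gives the following query as the final result: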

 DECLARE @Question INT = (SELECT Id FROM PostTypes WHERE Name = 'Question');
DECLARE @Answer INT = (SELECT Id FROM PostTypes WHERE Name = 'Answer');

DECLARE @Phrases NVARCHAR(200) ='please|review|improve|better|how to|how can i|how do i';

SELECT DISTINCT -- the distinct is a bit bogus here if you keep the Phrase column
/* Uncomment below if you want to limit the results */
  TOP 1000 
    Posts.Id AS [Post Link]
  , PostTypes.Name AS [Post Type]
  , Posts.Score
  , Posts.CreationDate
  , Users.Id AS [User Link]
  , Users.Reputation
  , Phrases.Value
FROM Posts 
INNER JOIN PostTypes 
  ON PostTypes.Id = Posts.PostTypeId
CROSS APPLY 
  STRING_SPLIT(@Phrases, '|') Phrases
LEFT OUTER JOIN Users
  ON Posts.OwnerUserId = Users.Id
WHERE
  /* Filter out non-question posts */
  Posts.PostTypeId = @Question

  /* Generic title */
  AND CHARINDEX(
          Phrases.Value
        , Posts.Title COLLATE Latin1_General_CI_AI
      ) > 0 

  /* Post not closed yet */
  AND Posts.ClosedDate IS NULL

  /* Question doesn't have positive score */
  /* AND Posts.Score <= 0 */

  /* User has low reputation - Un-comment to limit query further */
  AND (Users.Id is NULL OR Users.Reputation <= 500)

ORDER BY

  /* Lowest scores first - Change to DESC if you want Highest first */
  Posts.Score ASC

  /* Newest first - Change to ASC if you want oldest first */
  , Posts.CreationDate DESC
 


Notice how it shows an extra column, Value, to indicate which phrase matched the title and caused the post to be included.
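
The comment on DISTINCT in the query hints at a side effect: a title that matches several phrases comes back once per phrase. The sketch below shows this on a hypothetical table variable `@Titles` with made-up rows, next to an EXISTS variant. EXISTS is a swapped-in alternative, not something the query above uses:

```sql
/* Hypothetical sample data, for illustration only */
DECLARE @Titles TABLE (Id INT, Title NVARCHAR(250));
INSERT INTO @Titles (Id, Title) VALUES
    (1, N'Please review my code'),  /* matches both "please" and "review" */
    (2, N'Binary search in C');     /* matches nothing */

DECLARE @Phrases NVARCHAR(200) = N'please|review|improve';

/* CROSS APPLY: one row per (title, matching phrase) pair, so post 1 appears twice */
SELECT t.Id, Phrases.value
FROM @Titles AS t
CROSS APPLY STRING_SPLIT(@Phrases, '|') AS Phrases
WHERE CHARINDEX(Phrases.value, t.Title COLLATE Latin1_General_CI_AI) > 0;

/* EXISTS: one row per title, at the cost of not showing which phrase matched */
SELECT t.Id
FROM @Titles AS t
WHERE EXISTS (
    SELECT 1
    FROM STRING_SPLIT(@Phrases, '|') AS Phrases
    WHERE CHARINDEX(Phrases.value, t.Title COLLATE Latin1_General_CI_AI) > 0
);
```

Which form fits depends on whether the matching phrase is worth seeing in the output. Keeping the phrase column means accepting one row per match; dropping it makes EXISTS (or DISTINCT) a reasonable way to get one row per post.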