Average time to first answer

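Just trying out my T-SQL foo.

I wrote a script to get the average time (in seconds) to the first answer for questions tagged with a specified language.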

-- Could not get this so that the user entered
-- the language in a box at the bottom working.
-- so you have to edit the code here.
declare @Language nvarchar(20) = 'php' -- ##LanguageTag##

declare @AnswerTime table (Id int, CreationDate date, AnswerTime date, Elapse bigint, Slot int ) 

insert @AnswerTime
Select Q.Id, 
       max(Q.CreationDate) AS CreateTime, 
       min(A.CreationDate) AS AnswerTime, 
       DATEDIFF(second, max(Q.CreationDate), min(A.CreationDate)) AS Elapse,
       Year(max(Q.CreationDate))*12+Month(max(Q.CreationDate)) AS Slot
From   Posts Q, Posts A 
Where  Q.PostTypeId = 1
  and  Q.Id = A.ParentId
  and  CHARINDEX(@Language, Q.Tags) != 0
Group by Q.Id, Q.Tags
Order by Elapse

Select min(CreationDate), 
       count(*)         as [Count of Tickets], 
       -- min(Elapse)   as [Min Time to Answer],
       -- max(Elapse)   as [Max Time to Answer],
       avg(Elapse)    as [Average time to first Answer]
From   @AnswerTime
Group by Slot

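Here it is running.

I took the liberty of getting your parameter up and running ;)

You might be interested in this graph of quick-answer ratios.

@rolfl: I thought of that while I was experimenting. I was just trying to get something trivial working as an experiment.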

Answer #1

There are a few things that are off in this query. Going through them in some order:

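I am still a fan of "old-style" joins, but the CTE concept is a real winner in SQL Server, so instead of creating a table variable, just use a CTE.
I can tell that your code has been through a few iterations, and as a result there is some "stuff" you can clean up. Remove the things you do not select (for example, you compute min(A.CreationDate) as AnswerTime but never use it).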

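The SEDE tables take some getting used to. You have done a CharIndex on the Tags column. That is fine, but your example php also pulls in the php5 and cakephp tags. To fix this, you should realize that SEDE stores each tag name inside <...> angle brackets in the Tags field, so you want to search for <php>.
Using CharIndex is OK, but the more standard approach would be to use like, e.g.: ... and Tags like '%<php>%'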
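To make the difference concrete, here is a minimal sketch that runs both predicates against a handful of made-up rows. The `@Sample` table and its contents are hypothetical stand-ins for `Posts.Tags`; only the `<...>` storage format comes from the point above.

```sql
-- Hypothetical sample rows; on SEDE the real data lives in Posts.Tags.
declare @Sample table (Id int, Tags nvarchar(250));

insert @Sample (Id, Tags) values
    (1, N'<php><mysql>'),
    (2, N'<php5><oop>'),
    (3, N'<cakephp><php>'),
    (4, N'<cakephp>');

-- Substring match: every row contains the letters "php",
-- so php5 and cakephp questions are counted too (rows 1, 2, 3 and 4).
select Id, Tags
from   @Sample
where  CHARINDEX(N'php', Tags) != 0;

-- Delimited match: only rows carrying the exact <php> tag (rows 1 and 3).
select Id, Tags
from   @Sample
where  Tags like N'%<php>%';
```

Either predicate still has to scan the Tags text of every question, which is one reason a join on tag ids tends to be the more attractive option.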

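I have actually abandoned the Tags field entirely, and instead joined to the PostTags table.
There is no need for an aggregate on the question's CreationDate: max(Q.CreationDate) is redundant, because each question Id has only one CreationDate.
When writing SQL queries you should always use as many constants as you can (at least until you find a performance reason not to overdo it...), so even though A's ParentId is Q's Id, you should also add A.PostTypeId = 2.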

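I don't like the way you "cast" the data into months. I find that subtracting one less than the day-of-month from a given date takes the date back to the start of the month, and then truncating the time lets you represent every date exactly as its month: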

Convert(Date, DateAdd(day, 1 - DatePart(day, Q.CreationDate), Q.CreationDate)) as CreationMonth
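As a quick check of what that expression produces, the following sketch applies it to one arbitrary timestamp next to the original Slot arithmetic. The variable `@d` and its value are made up for illustration.

```sql
-- Arbitrary example timestamp (hypothetical value).
declare @d datetime = '2014-07-31T17:53:00';

select
    -- Day 31 minus 30 days lands on day 1; Convert(Date, ...) drops the time.
    Convert(Date, DateAdd(day, 1 - DatePart(day, @d), @d)) as CreationMonth, -- 2014-07-01
    -- The original bucketing: a month counter that is not itself a date.
    Year(@d) * 12 + Month(@d) as Slot;                                       -- 24175
```

The date-valued bucket can go straight onto a chart's X axis, whereas the Slot number has to be decoded back into a year and a month first.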

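Once I used the Tags table as well, the variable substitution became easier to get working. The silly "unexpected error" message was a PITA, but it is gone now.
Finally, the range of the Y axis was so large that the question counts were not visible. Changing the scale from seconds to hours makes the chart read much better.

I forked your query here, and this is the SQL (I tried to keep the capitalization and other style conventions consistent with your code):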

declare @Language nvarchar(25) = ##LanguageTag:string##;
declare @tagid int;

select @tagid = Id
from Tags
where TagName = @Language

print @Language + '->' + Convert(NVarchar(max), @tagid);

with AnswerTime as (
 Select Q.Id,
       Convert(Date, DateAdd(day, 1 - DatePart(day, Q.CreationDate), Q.CreationDate)) as CreationMonth,
       Convert(float, DATEDIFF(second, Q.CreationDate, min(A.CreationDate))) AS Elapsed
 From   PostTags T,
        Posts Q,
        Posts A 
 Where  T.TagId = @tagid
  and  T.PostId = Q.Id
  and  Q.PostTypeId = 1
  and  A.PostTypeId = 2
  and  Q.ClosedDate is null
  and  Q.Id = A.ParentId
 Group by Q.Id,
          Q.CreationDate,
          Convert(Date, DateAdd(day, 1 - DatePart(day, Q.CreationDate), Q.CreationDate))
)
Select CreationMonth, 
       count(*)         as [Count of Tickets], 
       avg(Elapsed)/3600.0 as [Average hours to first Answer]
From   AnswerTime
Group by CreationMonth

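Edit/Update

I ran the query on Stack Overflow, but it failed: there are so many questions that avg(Elapsed) hit an arithmetic overflow. I have updated the query to only consider the last 68 years of data (larger date differences, measured in seconds, overflow), and to convert Elapsed to a float (which does not overflow in avg()).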
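The overflow can be reproduced on a tiny scale. The sketch below uses a hypothetical `@Elapsed` table with two made-up values; the point is only that avg() over an int column accumulates an int sum, while DATEDIFF(second, ...) itself returns an int, which tops out at 2147483647 seconds (roughly 68 years).

```sql
-- Hypothetical values: two elapsed times of roughly 63 years each, in seconds.
declare @Elapsed table (Seconds int);
insert @Elapsed (Seconds) values (2000000000), (2000000000);

-- Uncommenting the next line raises an arithmetic overflow error,
-- because the intermediate int sum exceeds 2147483647:
-- select avg(Seconds) from @Elapsed;

-- Converting each value before aggregating avoids the int sum entirely.
select avg(Convert(float, Seconds)) / 3600.0 as [Average hours]
from   @Elapsed;
```

Casting to bigint instead of float would also work if exact integer seconds matter more than a fractional result.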

The not-closed condition you added is a good idea.
– 200_success, Jul 31 '14 at 17:53

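Answer #2

I think your code is fine overall! There is nothing particularly inefficient about it.
Nitpicks
I find your keyword formatting inconsistent: I see declare and Select and DATEDIFF, etc. While it is true that SQL is not case-sensitive, it is good practice to be consistent. Just pick the style you like and stick with it.
Old-style joins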

From   Posts Q, Posts A
...
and  Q.Id = A.ParentId

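This comma-join syntax has been discouraged ever since ANSI-92 introduced explicit JOIN syntax, and it should be avoided. There are many reasons to avoid it; the most practical is that the join condition sits next to the join instead of being buried among the filters.
Instead, write: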

From Posts Q 
Inner Join Posts A
On Q.Id = A.ParentId

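Seconds?
While seconds are a precise measure, I think they are a bit hard to grasp in human terms. I think it would be better to go with hours (or at least minutes).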

Answer #3

Logical correctness

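c is not c++; java is not javascript. Using CHARINDEX(…, Q.Tags) to check a delimited string of tags is a bad idea. Join against the PostTags table instead.
Keep in mind that this query only gives the time to first answer for questions that have answers. In a sense, questions that never received an answer have an infinite response time. You have implicitly filtered out those outliers, which may not be a good thing. In any case, the mean may be skewed by long answer times. Perhaps the median would be more interesting than the mean.
How do you want to treat migrated questions?
Since min(CreationDate) does not necessarily fall on the first day of the month, the X axis of the graph will have some jitter, especially in the early days of the site. You may want to truncate the date to the start of the month instead of taking the min(CreationDate) of each Slot.
You want answer times in seconds? This is Code Review, not Stack Overflow. To get a meaningful time to first answer and the question count on the same Y axis, you need to measure answer times in days!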
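To illustrate how far the mean and the median can drift apart, here is a minimal sketch on three made-up rows. The `@AnswerTime` table and its values are hypothetical, and `PERCENTILE_CONT` assumes SQL Server 2012 or later; it is a window function, so `select distinct` collapses the per-row output to one row per month.

```sql
-- Hypothetical data: two quick answers and one that took a month.
declare @AnswerTime table (CreationMonth date, ElapsedDays float);
insert @AnswerTime (CreationMonth, ElapsedDays) values
    ('2014-07-01', 0.1),
    ('2014-07-01', 0.2),
    ('2014-07-01', 30.0);

select distinct
       CreationMonth,
       -- The single slow answer drags the mean to about 10 days.
       avg(ElapsedDays) over (partition by CreationMonth) as [Mean days],
       -- The median stays at the typical value of 0.2 days.
       percentile_cont(0.5) within group (order by ElapsedDays)
           over (partition by CreationMonth) as [Median days]
from   @AnswerTime;
```

Unanswered questions still do not appear in either figure; reporting them would need a separate count, for example via a LEFT JOIN from questions to answers.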

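Expressiveness

Prefer common table expressions over table variables. Formulating the entire query as a single SELECT is more elegant, and it gives the query planner more freedom.
You should not put an ORDER BY in the intermediate query. It does not guarantee that the final result will be similarly ordered.
I would write this using two CTEs: one to collect all relevant question-answer pairs, and another to compute the time spans. Use consistent capitalization for function calls.
A tag is a tag. It does not matter that it happens to be a language tag, so I would not name the variable or label the input field "Language".

Suggested solution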
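The ORDER BY point can be sketched with a cut-down version of the original two-step shape. The rows below are made up, and the table variable only mirrors the two columns that matter here.

```sql
-- Hypothetical rows standing in for the intermediate result.
declare @AnswerTime table (Slot int, Elapse bigint);

-- No ORDER BY here: the order of the insert would not carry over
-- to whatever reads the table later.
insert @AnswerTime (Slot, Elapse)
values (24175, 120), (24174, 300), (24175, 60);

select Slot,
       count(*)    as [Count of Tickets],
       avg(Elapse) as [Average time to first Answer]
from   @AnswerTime
group by Slot
order by Slot;   -- the only ORDER BY that determines the output order
```

Sorting once, on the outermost query, is both sufficient and the only place where the ordering is actually guaranteed.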

WITH FirstAnswers AS (
    SELECT Q.Id
         , Q.CreationDate AS QuestionTime
         , min(A.CreationDate) AS AnswerTime
        FROM Posts AS Q
            INNER JOIN Posts AS A
               ON Q.Id = A.ParentId
        WHERE EXISTS (
                SELECT PostId
                    FROM PostTags
                        INNER JOIN Tags
                            ON TagId = Tags.Id
                    WHERE PostId = Q.Id AND TagName = ##Tag:string?php##
            )
        GROUP BY Q.Id, Q.CreationDate
), AnswerTime AS (
    SELECT Id
         , QuestionTime
         , AnswerTime
         , datediff(second, QuestionTime, AnswerTime) / 86400.0 AS ElapsedDays
         , dateadd(m, datediff(m, 0, QuestionTime), 0) AS Month
        FROM FirstAnswers
)
SELECT Month
     , count(QuestionTime) AS [Question count]
     , avg(ElapsedDays) AS [Average time to first answer (Days)]
    FROM AnswerTime
    GROUP BY Month
    ORDER BY Month;

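Stack Exchange Data Explorer link

Answer #4

The code looks good; there are only a few improvements to make. Avoid using functions/operators like charindex in the WHERE clause, because they are executed for every single value, and that will degrade performance over time as the data grows.
Avoid using a cross JOIN. Instead, specify the type of JOIN you want to perform; otherwise every row is first joined to every row, and only then is the WHERE clause applied. I used a LEFT JOIN to make sure you get the correct result.

Read up again on temp tables and table variables.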
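Note that switching the join type changes which rows come back, not only how they are computed. The sketch below uses a hypothetical three-row `@Posts` table (one answered question, one unanswered question) to show the difference between an INNER JOIN and a LEFT JOIN on the same condition.

```sql
-- Hypothetical miniature of the Posts table:
-- post 1 is a question with answer 2; post 3 is an unanswered question.
declare @Posts table (Id int, PostTypeId int, ParentId int);
insert @Posts (Id, PostTypeId, ParentId) values
    (1, 1, null),
    (2, 2, 1),
    (3, 1, null);

-- INNER JOIN: only question 1 appears, paired with answer 2.
select Q.Id as QuestionId, A.Id as AnswerId
from   @Posts Q
       inner join @Posts A on Q.Id = A.ParentId
where  Q.PostTypeId = 1;

-- LEFT JOIN: question 3 appears as well, with a NULL AnswerId.
select Q.Id as QuestionId, A.Id as AnswerId
from   @Posts Q
       left join @Posts A on Q.Id = A.ParentId
where  Q.PostTypeId = 1;
```

With the LEFT JOIN, unanswered questions end up with a NULL first-answer time, so any average computed afterwards has to decide how to treat those rows.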

SELECT  A.Id ,
        A.Tags ,
        MAX(A.CreationDate) AS CreateTime ,
        MIN(A.AnswerCreationDate) AS AnswerTime ,
        -- Elapsed time runs from the question to its first answer
        DATEDIFF(second, MAX(A.CreationDate), MIN(A.AnswerCreationDate)) AS Elapse ,
        YEAR(MAX(A.CreationDate)) * 12 + MONTH(MAX(A.CreationDate)) AS Slot
FROM    ( SELECT    Q.Id ,
                    Q.Tags ,
                    Q.CreationDate ,
                    A.CreationDate AS AnswerCreationDate ,
                    CHARINDEX(@Language, Q.Tags) languageIndex
          FROM      Posts Q
                    LEFT JOIN Posts A ON Q.Id = A.ParentId
          WHERE     Q.PostTypeId = 1
        ) A
WHERE   A.languageIndex != 0
-- Only the derived table's alias A is in scope here, not Q
GROUP BY A.Id ,
        A.Tags
ORDER BY Elapse

You just moved the CHARINDEX from the WHERE clause to the SELECT part. Does that really change the efficiency?
– Martin York, Jul 31 '14 at 17:23

Yes, definitely, because the select is like applying a wrapper over the data.
– Paraitosh, Jul 31 '14 at 17:25

Answer #5

One thing that jumped out at me is

WHERE Q.Id = A.ParentId

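You could JOIN the tables on that link instead.

Formatting-wise (and you probably already know this; most people find it a bit tedious), make sure all keywords are capitalized so the query is easier to read.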

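Answer #6

I just want to add to @rolfl's answer. I don't like CTEs at all. They are almost always slower than using a temp table or selecting from a subquery.

CTEs are also usually slower than table variables, at least in my experience with SQL Server, although there is little reason to use a table variable unless you have only a small number of rows.

I took the liberty of taking the code from rolfl's answer and changing the CTE into a subquery that is selected from directly, which sped it up considerably, from ~60ms down to ~30ms.

Here are my changes: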

declare @Language nvarchar(25) = ##LanguageTag:string##;
declare @tagid int;
declare @epoch datetime = DateAdd(yy, -68, current_timestamp); -- 68 years back; DATEDIFF(second, ...) overflows int beyond that

select @tagid = Id
from Tags
where TagName = @Language

print @Language + '->' + Convert(NVarchar(max), @tagid);

Select CreationMonth, 
count(*) as [Count of Tickets], 
avg(Elapsed)/3600.0 as [Average hours to first Answer]
From 
(
 Select Q.Id,
 Convert(Date, DateAdd(day, 1 - DatePart(day, Q.CreationDate), Q.CreationDate)) as CreationMonth,
 Convert(float, DATEDIFF(second, Q.CreationDate, min(A.CreationDate))) AS Elapsed
 From PostTags T,
 Posts Q,
 Posts A 
 Where T.TagId = @tagid
 and T.PostId = Q.Id
 and Q.PostTypeId = 1
 and A.PostTypeId = 2
 and Q.ClosedDate is null
 and Q.Id = A.ParentId
 and A.CreationDate > Q.CreationDate -- validation, and also merges.
 and Q.CreationDate >= @epoch
 and A.CreationDate < CURRENT_TIMESTAMP
 Group by Q.Id,
 Q.CreationDate,
 Convert(Date, DateAdd(day, 1 - DatePart(day, Q.CreationDate), Q.CreationDate))
) As AnswerTime
Group by CreationMonth
order by CreationMonth

Here it is working.
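For anyone who wants to repeat the timing comparison, a rough sketch of one way to do it is shown below. It assumes a SQL Server session where the messages output is visible (whether SEDE surfaces it is not something this thread confirms), and the inline `Numbers` data is a hypothetical stand-in for the real query.

```sql
-- Report parse/compile and execution times in the messages output.
set statistics time on;

-- CTE form.
with Numbers as (
    select 1 as N union all select 2 union all select 3
)
select sum(N) as Total from Numbers;

-- Derived-table (subquery) form of the same query.
select sum(N) as Total
from   (select 1 as N union all select 2 union all select 3) as Numbers;

set statistics time off;
```

A difference of a few tens of milliseconds on a single run can easily be caching or server load, so several runs of each form, alternating the order, give a fairer picture.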
