Savitzky-Golay平滑滤波器，用于间隔不相等的数据

我有一个频率为100Hz的信号，需要对此信号应用Savitzky-Golay平滑滤波器。但是，仔细检查后，我的信号并不是以完全恒定的速率测量的，测量之间的差值介于9.7到10.3 ms之间。

是否有办法在不等距的数据上使用Savitzky-Golay滤波器？
还有其他方法可以应用吗？

戈里（Gorry）于1991年发表的一篇论文几乎就是这个确切的主题数据。但是tldr，datageist的答案是正确的主要思想（局部最小二乘）。 Gorry观察到的是，系数仅取决于自变量，并且在因变量中是线性的（例如Savitzky-Golay）。然后，他给出了一种计算方法，但是，如果您没有编写优化的库，则可以使用任何旧的最小二乘拟合器。

#1 楼

一种方法是对数据进行重新采样，以使其等距分布，然后您可以执行所需的任何处理。使用线性滤波进行带宽有限的重采样将不是一个好的选择，因为数据的间隔不是均匀的，因此您可以使用某种局部多项式插值（例如三次样条）来估计底层信号的值“精确” 10毫秒间隔。

$ \ begingroup $
作为最后的选择，我想到了这个解决方案。我想知道最终这种方法是否能提供比仅以恒定速率测量信号更好的解决方案。
$ \ endgroup $
– VLC
2012年9月9日15:13

$ \ begingroup $
我认为，即使采样不均匀，您仍然可以使用sinc（）插值（或其他采样率高的低通滤波器）。这可能比样条或pchip产生更好的结果
$ \ endgroup $
–希尔马
2012年10月10日在16:42

$ \ begingroup $
@Hilmar：你是对的。您可以通过多种方法对数据进行重新采样。近似Sinc插值将是带限重采样的“理想”方法。
$ \ endgroup $
–Jason R
2012年3月10日16:56

#2 楼

由于Savitzky-Golay滤波器的推导方式（即作为局部最小二乘多项式拟合），因此自然地可以归纳为非均匀采样-它的计算量要大得多。

Savitzky-Golay滤波器一般而言，对于标准过滤器，其想法是将多项式拟合到局部样本集（使用最小二乘法），然后将中心样本替换为位于中心索引处的多项式值（即在0）。这意味着可以通过对样本索引的范德蒙德矩阵求逆来生成标准SG滤波器系数。例如，要生成跨越五个样本$ y_0 \ dots y_4 $（具有局部指数-2，-1,0,1,2）的局部抛物线拟合，设计方程式$ Ac = y $的系统如下：

$$
\ begin {bmatrix} -2 ^ 0＆-2 ^ 1＆-2 ^ 2 \\ -1 ^ 0＆-1 ^ 1＆-1 ^ 2 \ \ 0 ^ 0＆0 ^ 1＆0 ^ 2 \\ 1 ^ 0＆1 ^ 1＆1 ^ 2 \\ 2 ^ 0＆2 ^ 1＆2 ^ 2 \ end {bmatrix} \ begin {bmatrix} c_0 \ \ c_1 \\ c_2 \ end {bmatrix} = \ begin {bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \ end {bmatrix}。
$$

在上面的$ c_0 \点c_2 $是最小二乘多项式$ c_0 + c_1x + c_2x ^ 2 $的未知系数。由于多项式在$ x = 0 $处的值仅为$ c_0 $，因此计算设计矩阵的伪逆（即$ c =（A ^ TA）^ {-1} A ^ T y \ space $）将得出第一行中的SG滤波器系数。在这种情况下，它们将是

$$
\ begin {bmatrix} c_0 \\ c_1 \\ c_2 \ end {bmatrix} =
\ begin {bmatrix}
-3＆12＆17＆12＆-3 \\
-7＆-4＆0＆4＆7 \\
5＆-3＆-5＆-3＆5 \\
\ end {bmatrix}
\ begin {bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \ end {bmatrix}。
$$

请注意，由于$ c_0 + c_1x + c_2x ^ 2 $的导数为$ c_1 + 2c_2x $，因此矩阵的第二行（评估$ c_1 $）将为平滑的导数过滤器。相同的参数适用于连续的行-它们给出平滑的高阶导数。请注意，我将矩阵缩放了35，因此第一行将与Wikipedia上给出的平滑系数匹配（上）。导数滤波器的每个因其他比例因子而异。

非均匀采样

当样本均匀分布时，滤波器系数是平移不变的，因此结果只是FIR过滤。对于非均匀样本，系数将根据局部样本间距而有所不同，因此将需要构造设计矩阵并在每个样本处进行求逆。如果非均匀采样时间为$ x_n $，我们构造每个本地采样时间固定为$ 0 $的局部坐标$ t_n $，即

$$
\ begin {align}
t _ {-2}＆= x _ {-2}-x_0 \\
t _ {-1}＆= x _ {-1}-x_0 \\
t_0＆= x_0-x_0 \\
t_1＆= x_1-x_0 \\
t_2＆= x_2-x_0
\ end {align}

，则每个设计矩阵将格式如下：

$$
A =
\ begin {bmatrix}
t _ {-2} ^ 0＆t _ {-2} ^ 1 ＆t _ {-2} ^ 2 \\
t _ {-1} ^ 0＆t _ {-1} ^ 1＆t _ {-1} ^ 2 \\
t_0 ^ 0＆t_0 ^ 1 ＆t_0 ^ 2 \\
t_1 ^ 0＆t_1 ^ 1＆t_1 ^ 2 \\
t_2 ^ 0＆t_2 ^ 1＆t_2 ^ 2
\ end {bmatrix} =
\ begin {bmatrix}
1＆t _ {-2}＆t _ {-2} ^ 2 \\
1＆t _ {-1}＆t _ {-1} ^ 2 \\
1＆0＆0 \\
1＆t_1＆t_1 ^ 2 \\
1＆t_2＆t_2 ^ 2
\ end {bmatrix}。
$$

点缀本地样本值的$ A $伪逆的第一行将产生$ c_0 $，即该样本处的平滑值。

$ \ begingroup $
听起来像是从O（log（n））移到O（n ^ 2）。
$ \ endgroup $
– EngrStudent
19年2月12日在21:29

$ \ begingroup $
这是datageist向上描述的Scala实现。
$ \ endgroup $
–中芯
19年5月9日，2：12

$ \ begingroup $
@Mediumcore您没有在原始帖子中添加链接。另外，我删除了它，因为它没有提供问题的答案。请尝试编辑datageist的帖子以添加链接；审核后将进行审核。
$ \ endgroup $
– Peter K.♦
19年5月9日在13:41

#3 楼

“作为一种廉价的替代方案，如果$ f $在$ N $点窗口的整个宽度上的变化，则可以简单地假设数据点是等距的...
小于单点测量噪声的$ \ sqrt {N / 2} $倍，
则可以使用便宜的方法。“
$ \ qquad-$数字食谱771-772页

（派生任何人？）

（“假装等距”的意思是：
取每个点附近最近的$ \ pm N / 2 $点
$ t $想要SavGol（$ t $）的地方，
不将所有$ t_i \抢购到i $。
这很明显，但是让我呆了一会。）

#4 楼

就像techwinder在C ++中所做的那样，我使用了datageist的算法并在Python中实现了它。也许这对将来会有帮助。

import numpy as np


def non_uniform_savgol(x, y, window, polynom):
    """
    Applies a Savitzky-Golay filter to y with non-uniform spacing
    as defined in x

    This is based on https://dsp.stackexchange.com/questions/1676/savitzky-golay-smoothing-filter-for-not-equally-spaced-data
    The borders are interpolated like scipy.signal.savgol_filter would do

    Parameters
    ----------
    x : array_like
        List of floats representing the x values of the data
    y : array_like
        List of floats representing the y values. Must have same length
        as x
    window : int (odd)
        Window length of datapoints. Must be odd and smaller than x
    polynom : int
        The order of polynom used. Must be smaller than the window size

    Returns
    -------
    np.array of float
        The smoothed y values
    """
    if len(x) != len(y):
        raise ValueError('"x" and "y" must be of the same size')

    if len(x) < window:
        raise ValueError('The data size must be larger than the window size')

    if type(window) is not int:
        raise TypeError('"window" must be an integer')

    if window % 2 == 0:
        raise ValueError('The "window" must be an odd integer')

    if type(polynom) is not int:
        raise TypeError('"polynom" must be an integer')

    if polynom >= window:
        raise ValueError('"polynom" must be less than "window"')

    half_window = window // 2
    polynom += 1

    # Initialize variables
    A = np.empty((window, polynom))     # Matrix
    tA = np.empty((polynom, window))    # Transposed matrix
    t = np.empty(window)                # Local x variables
    y_smoothed = np.full(len(y), np.nan)

    # Start smoothing
    for i in range(half_window, len(x) - half_window, 1):
        # Center a window of x values on x[i]
        for j in range(0, window, 1):
            t[j] = x[i + j - half_window] - x[i]

        # Create the initial matrix A and its transposed form tA
        for j in range(0, window, 1):
            r = 1.0
            for k in range(0, polynom, 1):
                A[j, k] = r
                tA[k, j] = r
                r *= t[j]

        # Multiply the two matrices
        tAA = np.matmul(tA, A)

        # Invert the product of the matrices
        tAA = np.linalg.inv(tAA)

        # Calculate the pseudoinverse of the design matrix
        coeffs = np.matmul(tAA, tA)

        # Calculate c0 which is also the y value for y[i]
        y_smoothed[i] = 0
        for j in range(0, window, 1):
            y_smoothed[i] += coeffs[0, j] * y[i + j - half_window]

        # If at the end or beginning, store all coefficients for the polynom
        if i == half_window:
            first_coeffs = np.zeros(polynom)
            for j in range(0, window, 1):
                for k in range(polynom):
                    first_coeffs[k] += coeffs[k, j] * y[j]
        elif i == len(x) - half_window - 1:
            last_coeffs = np.zeros(polynom)
            for j in range(0, window, 1):
                for k in range(polynom):
                    last_coeffs[k] += coeffs[k, j] * y[len(y) - window + j]

    # Interpolate the result at the left border
    for i in range(0, half_window, 1):
        y_smoothed[i] = 0
        x_i = 1
        for j in range(0, polynom, 1):
            y_smoothed[i] += first_coeffs[j] * x_i
            x_i *= x[i] - x[half_window]

    # Interpolate the result at the right border
    for i in range(len(x) - half_window, len(x), 1):
        y_smoothed[i] = 0
        x_i = 1
        for j in range(0, polynom, 1):
            y_smoothed[i] += last_coeffs[j] * x_i
            x_i *= x[i] - x[-half_window - 1]

    return y_smoothed

#5 楼

我发现，在Matlab中有两种使用savitzky-golay算法的方法。曾经是一个滤波器，一次是一个平滑函数，但是基本上它们应该做同样的事情。

yy = sgolayfilt（y，k，f）：在这里，值y = y假设（x）在x中等距。不必等间距！

#6 楼

如果有帮助，我已对datageist描述的方法进行了C实现。
免费使用，后果自负。

编程黑洞网