我使用自底向上的动态编程算法编写了Python中背包问题的解决方案。给定具有值和权重的项目列表以及允许的最大权重,它可以正确地计算最佳值。

对代码风格,注释风格,可读性和最佳实践的任何批评将不胜感激。 。我不确定代码的Pythonic程度;填充矩阵似乎是实现动态编程的一种自然方法,但它不会“感觉” Pythonic(在这种情况下,这是一件坏事吗?)。

请注意,注释是有点冗长;在编写算法的过程中,由于要尽最大可能地理解它,所以我尝试对每个步骤都尽可能明确。如果它们过多(或完全不正确),请随时对其进行评论(哈!)。

import sys

def knapsack(items, maxweight):
    # Create an (N+1) by (W+1) 2-d list to contain the running values
    # which are to be filled by the dynamic programming routine.
    #
    # There are N+1 rows because we need to account for the possibility
    # of choosing from 0 up to and including N possible items.
    # There are W+1 columns because we need to account for possible
    # "running capacities" from 0 up to and including the maximum weight W.
    bestvalues = [[0] * (maxweight + 1)
                  for i in xrange(len(items) + 1)]

    # Enumerate through the items and fill in the best-value table
    for i, (value, weight) in enumerate(items):
        # Increment i, because the first row (0) is the case where no items
        # are chosen, and is already initialized as 0, so we're skipping it
        i += 1
        for capacity in xrange(maxweight + 1):
            # Handle the case where the weight of the current item is greater
            # than the "running capacity" - we can't add it to the knapsack
            if weight > capacity:
                bestvalues[i][capacity] = bestvalues[i - 1][capacity]
            else:
                # Otherwise, we must choose between two possible candidate values:
                # 1) the value of "running capacity" as it stands with the last item
                #    that was computed; if this is larger, then we skip the current item
                # 2) the value of the current item plus the value of a previously computed
                #    set of items, constrained by the amount of capacity that would be left
                #    in the knapsack (running capacity - item's weight)
                candidate1 = bestvalues[i - 1][capacity]
                candidate2 = bestvalues[i - 1][capacity - weight] + value

                # Just take the maximum of the two candidates; by doing this, we are
                # in effect "setting in stone" the best value so far for a particular
                # prefix of the items, and for a particular "prefix" of knapsack capacities
                bestvalues[i][capacity] = max(candidate1, candidate2)

    # Reconstruction
    # Iterate through the values table, and check
    # to see which of the two candidates were chosen. We can do this by simply
    # checking if the value is the same as the value of the previous row. If so, then
    # we say that the item was not included in the knapsack (this is how we arbitrarily
    # break ties) and simply move the pointer to the previous row. Otherwise, we add
    # the item to the reconstruction list and subtract the item's weight from the
    # remaining capacity of the knapsack. Once we reach row 0, we're done
    reconstruction = []
    i = len(items)
    j = maxweight
    while i > 0:
        if bestvalues[i][j] != bestvalues[i - 1][j]:
            reconstruction.append(items[i - 1])
            j -= items[i - 1][1]
        i -= 1

    # Reverse the reconstruction list, so that it is presented
    # in the order that it was given
    reconstruction.reverse()

    # Return the best value, and the reconstruction list
    return bestvalues[len(items)][maxweight], reconstruction


if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('usage: knapsack.py [file]')
        sys.exit(1)

    filename = sys.argv[1]
    with open(filename) as f:
        lines = f.readlines()

    maxweight = int(lines[0])
    items = [map(int, line.split()) for line in lines[1:]]

    bestvalue, reconstruction = knapsack(items, maxweight)

    print('Best possible value: {0}'.format(bestvalue))
    print('Items:')
    for value, weight in reconstruction:
        print('V: {0}, W: {1}'.format(value, weight))


期望的输入文件如下:

165
92 23
57 31
49 29
68 44
60 53
43 38
67 63
84 85
87 89
72 82


第一行包含允许的最大权重,后续行包含由值-权重对表示的项目。

评论

我知道一个老帖子,但是如果您还没有碰到它,那么已经枚举允许您指定起始索引(例如,对于i,enumerate(items,1)中的item:应该让我从1开始。

@SimonT:太棒了!我已经用Python编程了2年多了,但我从未见过枚举以这种方式使用。我一定会记住的。

#1 楼

1.回顾


函数knapsack缺少一个文档字符串,该字符串解释了该函数采用的参数(items中是什么东西?items必须是序列,还是可以迭代) ?以及返回的内容。
这种功能非常适合doctests。

注释中的内容类似于“用(W + 1)2-d创建(N + 1)清单”。但是N是什么,W是什么?大概N是len(items),W是maxweight,因此在注释和代码中使用相同的名称将是一个好主意:

N = len(items)
W = maxweight



bestvalues上方的注释无法解释该表中的值的含义。我会这样写:

# bestvalues[i][j] is the best sum of values for any
# subsequence of the first i items, whose weights sum
# to no more than j.


这很清楚为什么\ $ 0≤i≤N \ $,为什么\ $ 0≤j≤W \ $。
在如下循环中:

bestvalues = [[0] * (maxweight + 1)
              for i in xrange(len(items) + 1)]


如果未使用循环变量(此处i),则通常将其命名为_


这些行可以省略:

# Increment i, because the first row (0) is the case where no items
# are chosen, and is already initialized as 0, so we're skipping it
i += 1


然后在其余代码中,使用i + 1代替ii而不是i - 1


重建循环:

i = N
while i > 0:
    # code
    i -= 1


可以这样写:

for i in reversed(range(1, N + 1)):
    # code



代码将显示如下错误消息:

print('usage: knapsack.py [file]')


错误消息应转为标准错误(非标准输出)。最好不要对程序名称进行硬编码,因为重命名程序很容易,但是忘记更新代码。因此改写:

sys.stderr.write("usage: {0} [file]\n".format(sys.argv[0]))



读取问题描述并打印结果的代码块仅在__name__ == '__main__'时运行。这使得很难进行测试,例如从交互式解释器进行测试。通常最好将这种代码放在自己的函数中,例如:

def main(filename):
    with open(filename) as f:
        # etc.

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('usage: knapsack.py [file]')
        sys.exit(1)
    main(sys.argv[1])


现在您可以从解释器运行main('problem.txt')对其进行测试。


代码将整个文件作为行列表读取到内存中:

lines = f.readlines()


这在这里是无害的,因为文件很小,但是通常最好处理一次归档一行,如下所示:

with open(filename) as f:
    maxweight = int(next(f))
    items = [[int(word) for word in line.split()] for line in f]



2。递归方法

任何动态编程算法都可以通过两种方式实现:通过自下而上构建部分结果表(如后文代码),或通过递归计算结果的方法。自上而下,使用备忘录来避免多次计算任何部分结果。

自上而下的方法通常会导致代码更简单,更清晰,并且仅计算特定对象所需的部分结果。问题实例(而采用自下而上的方法通常很难避免计算所有部分结果,即使其中一些未使用也是如此)。 this:

from functools import lru_cache

def knapsack(items, maxweight):
    """Solve the knapsack problem by finding the most valuable subsequence
    of items that weighs no more than maxweight.

    items must be a sequence of pairs (value, weight), where value is a
    number and weight is a non-negative integer.

    maxweight is a non-negative integer.

    Return a pair whose first element is the sum of values in the most
    valuable subsequence, and whose second element is the subsequence.

    >>> items = [(4, 12), (2, 1), (6, 4), (1, 1), (2, 2)]
    >>> knapsack(items, 15)
    (11, [(2, 1), (6, 4), (1, 1), (2, 2)])

    """
    @lru_cache(maxsize=None)
    def bestvalue(i, j):
        # Return the value of the most valuable subsequence of the first
        # i elements in items whose weights sum to no more than j.
        if j < 0:
            return float('-inf')
        if i == 0:
            return 0
        value, weight = items[i - 1]
        return max(bestvalue(i - 1, j), bestvalue(i - 1, j - weight) + value)

    j = maxweight
    result = []
    for i in reversed(range(len(items))):
        if bestvalue(i + 1, j) != bestvalue(i, j):
            result.append(items[i])
            j -= items[i][1]
    result.reverse()
    return bestvalue(len(items), maxweight), result


要查看需要计算多少局部解决方案,请在返回结果之前打印@functools.lru_cache。解决文档字符串中的示例问题时,输出:

CacheInfo(hits=17, misses=42, maxsize=None, currsize=42)


缓存中的42个条目少于自下而上方法计算出的96个部分解决方案。

评论


\ $ \ begingroup \ $
感谢您的答复(尤其是第3点和第9点)!我不敢相信我忘记考虑记忆化的递归方法。希望今天晚些时候我会应用您的一些建议。再次感谢。
\ $ \ endgroup \ $
–voithos
13年1月16日在22:19

\ $ \ begingroup \ $
注意此提议解决方案的递归深度限制问题。
\ $ \ endgroup \ $
–弗兰克·史密斯
2014年8月8日15:13

\ $ \ begingroup \ $
@Frank:是的,此解决方案使用len(item)个级别的堆栈。
\ $ \ endgroup \ $
–加雷斯·里斯(Gareth Rees)
2014年3月13日在8:26

\ $ \ begingroup \ $
@GarethRees是否需要在knapsack()中定义bestvalue()?我倾向于避免在函数内部定义函数以使事情更清楚。
\ $ \ endgroup \ $
–绿色Diod
17年1月17日在13:43

\ $ \ begingroup \ $
@greendiod:我在背包内定义了bestvalue,因为(i)仅在此处使用过; (ii)它使用了外部作用域中的项(必须通过所有递归调用将其传递下来很痛苦); (iii)已被记住,因此它具有关联的缓存,当背包返回时我们不想保留它。
\ $ \ endgroup \ $
–加雷斯·里斯(Gareth Rees)
17年1月17日在18:06



#2 楼

对于涉及嵌套循环的代码的第一部分,如果应该选择
-使用curr而不是i
-分配prev = curr - 1
-使用enumerate(items, 1)

该代码将按字面意义记录自身。

for curr, (value, weight) in enumerate(items, 1):
    prev = curr - 1
    for capacity in range(maxweight + 1):
        if weight > capacity:
            bestvalues[curr][capacity] = bestvalues[prev][capacity]
        else:
            candidate1 = bestvalues[prev][capacity]
            candidate2 = bestvalues[prev][capacity - weight] + value

            bestvalues[curr][capacity] = max(candidate1, candidate2)


#3 楼

在代码行和执行计算的最快时间方面,可以使用动态编程以一种非常有效的方式解决以下问题。最重要的是,以下代码执行备忘录以缓存先前计算的结果。

try:
    from functools import lru_cache
except ImportError:
    # For Python2
    # pip install backports.functools_lru_cache
    from backports.functools_lru_cache import lru_cache
class knapsack:
    """
    Maximize sum of selected weight.
    Sum of selected size is less than capacity.
    Algorithm: Dynamic Programming
    ----
    >>>items = [(4, 12), (2, 1), (6, 4), (1, 1), (2, 2)]
    >>>weight, size   = zip(*items)
    >>>weight = list(weight)
    >>>size = list(size)
    >>>capacity = 15
    >>>knapsack(size, weight).solve(capacity)

    >>>(11, [1, 2, 3, 4])

    """
    def __init__(self, size, weight):
        self.size = size
        self.weight = weight
    @lru_cache()
    def solve(self, cap, i=0):
        if cap < 0: return -sum(self.weight), []
        if i == len(self.size): return 0, []
        res1 = self.solve(cap,  i + 1)
        res2 = self.solve(cap - self.size[i], i + 1)
        res2 = (res2[0] + self.weight[i], [i] + res2[1])
        return res1 if res1[0] >= res2[0] else res2


评论


\ $ \ begingroup \ $
欢迎使用代码审查!您提出了替代解决方案,但尚未检查代码。请编辑以显示问题代码的哪些方面促使您编写此版本,以及对原始版本进行了哪些改进。也许(重新)阅读“如何回答”是值得的。
\ $ \ endgroup \ $
– Toby Speight
18-10-5在7:54