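OEIS lookup tool in Python

I come from PPCG, so I made an esolang and decided to write it in Python. Eventually it turned from an esolang into an OEIS (Online Encyclopedia of Integer Sequences) lookup tool. I'm new to Python.

Essentially, the program takes an OEIS sequence number (e.g. 55 for sequence A000055) and returns the nth number in that sequence.

The code fetches the OEIS page, parses it with BeautifulSoup, and returns the result, or, if there is none, "OEIS does not have a number in the sequence at the given index".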

import sys, re
from urllib2 import*
from bs4 import BeautifulSoup

# Main logic
def execute(code, nth):
  # Decode input stuff
  num = int(code)
  try:
    f = urlopen("http://oeis.org/A%06d/list" % num)
    # global tree, data # Debugging
    # >:D I'm sorry but I have to use RegEx to parse HTML
    print {key: int(value) for key, value in
      re.findall( r'(\d+) (\d+)', re.sub(r'\D+', " ", re.sub(r'<[^>]+>', "",
        str(BeautifulSoup(f, "lxml").find(lambda tag:
          tag.name == "table" and # A table
          not tag.get('cellspacing') == "0" and # A table table
          len(tag.contents) > 1 # A two column table
        ))
      )) )
    }.get(nth, "OEIS does not have a number in the sequence at the given index")
  except HTTPError:
    print "Could not find sequence A%06d" % num
  except URLError:
    print "Could not connect to sources";
  except:
    print "Verify your numbers are correct"
    raise

if __name__ == '__main__':
  if len(sys.argv) > 1:
    execute(sys.argv[1], sys.argv[2])
  else:
    print """This is the OEIS lookup tool
You haven't entered the sequence""" % (LANGNAME)

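I used only the things I understand. My main concern is how I've laid out this program, especially the HTML parsing: I really doubt I did it in the best way.

Couldn't resist...

">:D I'm sorry but I have to use RegEx to parse HTML" - relevant.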

Answer #1

import *:

Please avoid importing everything from a module at all costs. Import each name you use individually.
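
As a minimal sketch of what explicit imports could look like: the snippet below assumes Python 3, where the names the question pulls in from urllib2 live in urllib.request and urllib.error. Under Python 2 the equivalent line would be from urllib2 import urlopen, HTTPError, URLError.

```python
# Assumes Python 3; in Python 2 all three names came from urllib2.
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Every name used later can now be traced back to the module it came from.
imported_names = [urlopen.__name__, HTTPError.__name__, URLError.__name__]
print(imported_names)
```

HTTPError is a subclass of URLError, which is why the question's except HTTPError clause has to come before except URLError.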

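Regex ಠ_ಠ

Regex is evil, the worst. Seriously. Stop. Right now. Kill python.exe immediately and change it.

A few moments later you actually use BeautifulSoup, a DOM parsing library, yet you chose to use regex.

You, sir, are evil.

I suggest you look further into BeautifulSoup, or take a look at Scrapy, a Python scraping library that uses generators to scrape large sets of elements (it even has cloud support!).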
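
To make the suggestion concrete, here is a rough sketch of reading a two-column table with BeautifulSoup alone, with no regex involved. The SAMPLE_HTML markup and the parse_index_table name are made up for illustration; the real OEIS page layout may differ, so the row selection would need adjusting. It uses the built-in "html.parser" backend rather than lxml so that only bs4 is required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real OEIS page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><td>0</td><td>1</td></tr>
  <tr><td>1</td><td>1</td></tr>
  <tr><td>2</td><td>2</td></tr>
</table>
"""


def parse_index_table(html):
    """Map each index in a two-column table to its value."""
    soup = BeautifulSoup(html, "html.parser")
    terms = {}
    for row in soup.find_all("tr"):
        # Collect the text of every cell in this row.
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        # Only rows with exactly two cells are treated as (index, value).
        if len(cells) == 2:
            terms[int(cells[0])] = int(cells[1])
    return terms


terms = parse_index_table(SAMPLE_HTML)
print(terms)
```

Walking the rows and cells keeps the index and the value apart from the start, so there is no need to strip tags and re-pair the numbers afterwards.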

Lisp

Your code reads like Lisp.

    ))
  )) )

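Seriously, you need to format it better.

Try following PEP8, Python's style guide. When I ran your code through an online analyser, it immediately raised 20 errors.

When writing good Python, PEP8 should be your go-to style document.

Looping

You shouldn't loop like this: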

print {key: int(value) for key, value in

If you want to reduce the indentation levels, you should store each step of the process in its own variable, or just use a standard loop.
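
A minimal sketch of that idea, written for Python 3: it keeps the regex steps from the question but gives each one its own variable and builds the dictionary with a plain loop. SAMPLE_TABLE and extract_terms are hypothetical names, and the sample markup only stands in for the real page.

```python
import re

# Hypothetical stand-in for the table markup; the real page may differ.
SAMPLE_TABLE = (
    "<table>\n"
    "<tr><td>0</td> <td>1</td></tr>\n"
    "<tr><td>1</td> <td>1</td></tr>\n"
    "<tr><td>2</td> <td>2</td></tr>\n"
    "</table>"
)


def extract_terms(table_html):
    """Return a dict mapping each index (as a string) to its integer value."""
    # Step 1: drop the tags, keeping only the text between them.
    text = re.sub(r"<[^>]+>", "", table_html)
    # Step 2: collapse every run of non-digits into a single space.
    digits = re.sub(r"\D+", " ", text)
    # Step 3: read the digits back as (index, value) pairs.
    pairs = re.findall(r"(\d+) (\d+)", digits)
    # Step 4: build the lookup table with a plain loop.
    terms = {}
    for index, value in pairs:
        terms[index] = int(value)
    return terms


terms = extract_terms(SAMPLE_TABLE)
print(terms.get("2", "OEIS does not have a number in the sequence at the given index"))
```

The keys stay strings, as in the question, so an index taken straight from sys.argv can be looked up without conversion.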

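String formatting

Instead of using % (myString), you should use string.format, which is an improvement for the following reasons:

Independence of argument order
More readable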

"My name is {name}, and I work for {work_name} as a {job_name}".format(name="Batman", work_name="Gotham City", job_name="Crimestopper")

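A few more small cases may help show the difference between the two styles. This is only an illustrative sketch; the strings are made up, apart from the A%06d pattern, which comes from the question's code.

```python
# Named fields can be passed in any order.
by_name = "{name} works for {work_name}".format(work_name="Gotham City", name="Batman")

# Positional fields can be reordered or reused explicitly.
by_position = "{1} before {0}, then {1} again".format("first", "second")

# A format spec covers what %06d did for the sequence id.
a_number = "A{:06d}".format(55)

# The % operator ties each value to its position in the tuple.
percent_style = "A%06d" % 55

print(by_name)
print(by_position)
print(a_number, percent_style)
```
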
What's the difference between string.format and % myString?
– Downgoat, Feb 16, 2016 at 1:48

The reason is that parsing the OEIS HTML would be a pain.
– Downgoat, Feb 16, 2016 at 2:22

This is the opposite of what I suggested in the comments.
– Quill, Feb 16, 2016 at 4:29

@Downgoat string.format has more formatting options and lets you do more than % formatting does. string.format also coerces types to strings, so you usually don't need to convert things to strings explicitly.
– SuperBiasedMan, Feb 16, 2016 at 9:33

@Downgoat: It is legal HTML, see w3.org/TR/html5/tabular-data.html#the-tr-element
– Charles, Feb 16, 2016 at 13:53

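Answer #2

You don't need HTML parsing at all. OEIS has a nice JSON output format.

https://oeis.org/search?fmt=json&q=id:A000045

So the core functionality of the program could be written as something like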

import sys
import urllib2
import json
f = urllib2.urlopen("https://oeis.org/search?fmt=json&q=id:%s" % sys.argv[1])
doc = json.loads(f.read())
comment = doc['results'][0]['comment']
print "\n".join(comment)
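
For readers on Python 3, where urllib2 no longer exists, the same request could be sketched with the standard library as below. The function names are hypothetical, and the shape of the returned JSON is not checked here; it is whatever the endpoint returns, which may differ from the structure the snippet above indexes into.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://oeis.org/search"


def search_url(sequence_number):
    """Build the JSON search URL for a sequence number such as 45."""
    query = urlencode({"fmt": "json", "q": "id:A%06d" % int(sequence_number)})
    return SEARCH_URL + "?" + query


def fetch_document(sequence_number):
    """Download and decode the JSON document (performs a network request)."""
    with urlopen(search_url(sequence_number)) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the URL is built here; nothing is downloaded until fetch_document runs.
print(search_url(45))
```

Building the query with urlencode means the colon in the id: prefix is percent-encoded automatically.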

This could be made even better, I think, with requests: from requests import get and doc = get("https://oeis.org/search", params={"fmt": "json", "q": "id:" + code}).json().
– Finn Årup Nielsen, Feb 16, 2016 at 13:37

Answer #3

Redundant code

    print """This is the OEIS lookup tool
You haven't entered the sequence""" % (LANGNAME)

This could easily become

    print "This is the OEIS lookup tool\nYou haven't entered the sequence"

You accidentally left some leftover code from Putt in there.

♫ Let's all hop on the code review bus... ♫

Answer #4

Error messages

except HTTPError:
  print "Could not find sequence A%06d" % num
except URLError:
  print "Could not connect to sources";
except:
  print "Verify your numbers are correct"
  raise

Error messages should go to STDERR rather than being printed to STDOUT. That way they can be filtered out of the actual output, which is useful for debugging and logging.
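
One way this might look in Python 3 is sketched below. report_error is a hypothetical helper, and its optional stream argument exists only so the behaviour can be checked without touching the real stderr.

```python
import io
import sys


def report_error(message, stream=None):
    """Write an error message to stderr, or to another stream if one is given."""
    if stream is None:
        stream = sys.stderr
    print(message, file=stream)


# Goes to stderr, so it does not pollute the program's normal output.
report_error("Could not find sequence A%06d" % 55)

# Passing a stream makes the helper easy to check in isolation.
buffer = io.StringIO()
report_error("Could not connect to sources", stream=buffer)
```

With the messages on stderr, something like python oeis.py 55 3 > result.txt captures only the looked-up number.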

Documentation and variable naming

Something like

# Decode input stuff

isn't particularly useful to someone who doesn't know what code is ("why is a piece of code a number?"), and probably still isn't useful to someone who does.

# Convert sequence number to integer

is less ambiguous.

While we're on the subject, many of the variable names used here are very generic. Throwaway names like key, value are fine, but code and num aren't particularly clear about what they refer to. Instead of num, why not sequence_number, seqnum, or a_number, since it's OEIS-related? The execute method is also not well documented, even though you describe how it works quite well at the top of the post. Consider putting your explanation, or something similar, into it as a docstring.
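
As a small sketch of both points, a helper with a descriptive parameter name and a docstring might look like this. lookup_url is a hypothetical function carved out for illustration; the URL pattern is the one used in the question's code.

```python
def lookup_url(sequence_number):
    """Build the URL of the OEIS term list for a sequence.

    sequence_number is the numeric part of the A-number, given as an int
    or a numeric string; 55 refers to sequence A000055.
    """
    # Convert sequence number to integer
    a_number = int(sequence_number)
    return "http://oeis.org/A%06d/list" % a_number


print(lookup_url("55"))
```

The docstring then shows up in help(lookup_url), so the explanation travels with the code instead of living only in the post.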

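General readability

This

print {key: int(value) for key, value in
  re.findall( r'(\d+) (\d+)', re.sub(r'\D+', " ", re.sub(r'<[^>]+>', "",
    str(BeautifulSoup(f, "lxml").find(lambda tag:
      tag.name == "table" and # A table
      not tag.get('cellspacing') == "0" and # A table table
      len(tag.contents) > 1 # A two column table
    ))
  )) )
}.get(nth, "OEIS does not have a number in the sequence at the given index")

does too much at once, so it's hard to tell where to start reading. Split it into parts: if you have a step that converts runs of non-digits to spaces, put it on its own line rather than chaining everything together.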

 not tag.get('cellspacing') == "0"

This checks for inequality. Why not use !=? Or, if cellspacing is a number, int(tag.get('cellspacing')) > 0?
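
The two alternatives are not quite interchangeable, which the sketch below tries to show. A plain dict stands in for the tag's attributes here (an assumption for illustration, since a tag's get behaves like dict.get), and both function names are hypothetical.

```python
def cellspacing_is_not_zero(attributes):
    """Direct inequality check; a missing attribute counts as 'not zero'."""
    return attributes.get("cellspacing") != "0"


def cellspacing_is_positive(attributes):
    """Numeric check; a missing attribute is handled explicitly."""
    value = attributes.get("cellspacing")
    # int(None) raises TypeError, so a missing attribute needs its own branch.
    return value is not None and int(value) > 0


print(cellspacing_is_not_zero({}), cellspacing_is_positive({}))
```

When the attribute is absent, the != version keeps the table while the numeric version rejects it, so the choice depends on which tables should match.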

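Answer #5

Here are a few points:

Consider docopt. It forces you to think about the script's documentation.
Also consider making the script usable as a module, with naming and documentation to match. I suppose it depends on your application.
Separate the printing from the function.
Consider whether to index from one or from zero. Currently you index from zero with get(nth, ...). I would rather index from one, so that the first number is called 1. Does OEIS call the first value the zeroth?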
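
The last two points can be sketched together: a function that only computes and counts from one, with the printing left to the caller. nth_term and the sample string are hypothetical; the string is just the first few Fibonacci numbers in the comma-separated form a term list might take.

```python
def nth_term(data, nth):
    """Return the nth term of a comma-separated term list, counting from one."""
    if nth < 1:
        # Without this check, nth == 0 would silently return the last term.
        raise ValueError("nth must be 1 or greater")
    terms = data.split(",")
    return int(terms[nth - 1])


if __name__ == "__main__":
    sample = "0,1,1,2,3,5,8,13,21"
    # Printing happens here, not inside nth_term.
    print(nth_term(sample, 8))
```

Because nth_term returns a value instead of printing it, the same function can be imported and reused from other code.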

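requests might be nicer than urllib2

Here is my attempt using the JSON interface suggested by @Vortico (the missing exception handling should also be considered):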

"""OEIS.

Usage:
  oeis <code> <nth>

"""

from docopt import docopt

from requests import get


def oeis(code, nth):
    """Return n'th number in OEIS sequence.

    Parameters
    ----------
    code : int
        OEIS identifier, e.g., 55.
    nth : int
        Ordinal number in the sequence.

    Returns
    -------
    number : int
        Number in the sequence

    Examples
    --------
    >>> oeis(45, 8)
    13

    """
    # http://codereview.stackexchange.com/a/120133/46261
    doc = get("https://oeis.org/search",
              params={"fmt": "json", 
                      "q": "id:A{:06d}".format(int(code))}).json()
    data = doc['results'][0]['data']
    number = int(data.split(',')[int(nth) - 1])
    return number


if __name__ == "__main__":
    arguments = docopt(__doc__)
    print(oeis(arguments['<code>'], arguments['<nth>']))
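
The exception handling that the attempt above leaves out could be sketched roughly as follows. This assumes the decoded document has the shape the code above indexes into (doc['results'][0]['data']); the live response may be structured differently. SequenceLookupError and term_from_document are hypothetical names.

```python
class SequenceLookupError(Exception):
    """Raised when a sequence or one of its terms cannot be found."""


def term_from_document(doc, nth):
    """Return the nth term (counting from one) from a decoded JSON document."""
    # Assumed shape: {"results": [{"data": "0,1,1,..."}]}; empty or missing
    # results are treated as "sequence not found".
    results = doc.get("results") or []
    if not results:
        raise SequenceLookupError("Could not find the sequence")
    terms = results[0]["data"].split(",")
    if not 1 <= nth <= len(terms):
        raise SequenceLookupError(
            "OEIS does not have a number in the sequence at the given index")
    return int(terms[nth - 1])


# Hand-written sample document, not a real response.
sample_doc = {"results": [{"data": "0,1,1,2,3,5,8,13"}]}
print(term_from_document(sample_doc, 8))
```

A caller can then catch SequenceLookupError in one place and decide how to report it, instead of letting an IndexError or TypeError escape.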
