Python review_generator

Given a Python file containing a script written in bad style, this script outputs a review that points out its problems.

RESERVED_KEYWORDS=['abs','dict','help','min','setattr','all','dir','hex','next','slice',
'any','divmod','id','object','sorted','ascii','enumerate','input','oct',
'staticmethod','bin','eval','int','open','str','bool','exec','isinstance',
'ord','sum','bytearray','filter','issubclass','pow','super','bytes','float',
'iter','print','tuple','callable','format','len','property','type','chr',
'frozenset','list','range','vars','classmethod','getattr','locals','repr','zip',
'compile','globals','map','reversed',
'__import__','complex','hasattr','max','round','delattr','hash','memoryview','set']

FILENAME = "code_with_bad_style.py"

BULTIN_REASSIGNED_ERROR = """You wrote:

    {} = "something"

That is not good because {} is a built-in in Python
and you should never re-assign new values to the
built-ins, in case you are wondering wheter a word is a builtin or
not go to https://docs.python.org/3/library/functions.html to read the
complete list"""

NAME_NOT_USED_ERROR="""You should use

    if __name__ == "__name__":
        main()
So that your file is going to usable as both
a stand-alone programme and an importable programme.
"""

NO_DOCS_ERROR = """You should consider using some docstrings.
Docstrings are multiline comments that explain what a function does,
they are of great help for the reader. They look like the following:

    def function(a, b):
        \"\"\"Do X and return a list.\"\"\"
"""

USE_LIST_COMPREHENSION_ERROR = """In python there is
a very powerful language feature called [list comprehension][https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions].
The following:

    result = []
    for i in lst:
        if foo(i):
            result.append(bar(i))

should be replaced with:

    result = [bar(i) for i in lst if foo(i)]
"""

USE_WITH_ERROR = """There is a very convenient way of handling files in python:
the with statement. It handles closing automatically so you do not
have to worry about it. It is very simple to use, look at the following example:

    with open("x.txt") as f:
        data = f.read()
        # do something with data
"""

PRINT_BREAKING_PYTHON_3_ERROR = """You should use parenthesis with your print
statement (in Python 3 it is a function) to keep compatibility with python 3"""

IMPORT_STAR_ERROR = """You should avoid using:

    from long_long_long_name import *

because people will not know where you are taking your functions from.
Instead use:

    import long_long_long_name as short
"""

SEPARATOR = """
----------

"""

def nextline(line,lines):
    return lines[lines.index(line) + 1]

def reassign_built_in_error(code):
    for built_in in RESERVED_KEYWORDS:
        if built_in + " =" in code or built_in + "=" in code:
            return BULTIN_REASSIGNED_ERROR.format(built_in,built_in)

def if_name_error(code):
    if "__name__" not in code:
        return NAME_NOT_USED_ERROR

def no_docs_error(code):
    for line in code.splitlines():
        if line.startswith("def") or line.startswith("class"):
            if '"""' not in nextline(line,code):
                return NO_DOCS_ERROR

def use_list_comprehension_error(code):
    if "append" in code:
        return USE_LIST_COMPREHENSION_ERROR

def with_not_used_error(code):
    if ".close()" in code and ".open()" in code:
        return USE_WITH_ERROR

def print_breaking_3_error(code):
    for line in code.splitlines():
        if line.startswith("print") and "(" not in line:
            return PRINT_BREAKING_PYTHON_3_ERROR

def import_star_error(code):
    for line in code.splitlines():
        if line.startswith("import") and "*" not in line:
            return IMPORT_STAR_ERROR

def main():
    ALL_ANALYSIS = [reassign_built_in_error,if_name_error,no_docs_error,
                    use_list_comprehension_error,with_not_used_error,
                    print_breaking_3_error,import_star_error]

    with open(FILENAME) as f:
        code = f.read()

    for analysis in ALL_ANALYSIS:
        result = analysis(code)
        if result:
            print(result)
            print(SEPARATOR)

if __name__ == "__main__":
    main()

It looks like we are heading for a more robotic experience: meta.stackoverflow.com/questions/280183/…

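Answer #1

The code doesn't even pass basic PEP8 tests.
That sounds a bit hypocritical ;-)

The target filename is hardcoded in the file,
which makes this script very inconvenient to use: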

FILENAME = "code_with_bad_style.py"

Take a look at argparse, and make the script take the filename as a command-line argument.
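
A minimal sketch of that suggestion, assuming a single positional argument. The helper name parse_args and the argument name filename are illustrative and not part of the original script:

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means "use sys.argv[1:]"."""
    parser = argparse.ArgumentParser(
        description="Generate a style review for a Python file.")
    # Hypothetical positional argument replacing the hardcoded FILENAME.
    parser.add_argument("filename", help="path of the Python file to review")
    return parser.parse_args(argv)


# An explicit list is passed here only so the example runs anywhere.
args = parse_args(["code_with_bad_style.py"])
print(args.filename)
```

In the real script, main() would call parse_args() with no arguments and open args.filename instead of FILENAME.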

Several functions split the code into lines, for example:

def no_docs_error(code):
    for line in code.splitlines():
        # ...

def print_breaking_3_error(code):
    for line in code.splitlines():
        # ...

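With larger source files, this can be quite wasteful.
It would be better to split the code into lines once, at the start,
and pass that list to the functions that need lines rather than the single string.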
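
One way this could look, as a sketch only: the two check functions below are simplified stand-ins with hypothetical names, not the functions from the question. The point is that splitlines() runs once and every check receives the same list:

```python
def print_statement_lines(lines):
    """Return the lines that start with print but contain no parenthesis."""
    return [line for line in lines
            if line.startswith("print") and "(" not in line]


def star_import_lines(lines):
    """Return the lines of the form 'from ... import *'."""
    return [line for line in lines
            if line.startswith("from") and line.rstrip().endswith("import *")]


code = 'from os import *\nprint "hello"\nprint("ok")\n'
lines = code.splitlines()  # split once, up front

# Every line-based check reuses the same list.
for check in (print_statement_lines, star_import_lines):
    print(check.__name__, check(lines))
```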

This check is completely wrong:

    if line.startswith("import") and "*" not in line:
        return IMPORT_STAR_ERROR

...as it will match statements like this:

import re

...and it will not match statements like this:

from long_long_long_name import *

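Removing the "not" is not enough,
because the rule still will not match what you want to prevent.

It would be better to use regular expressions in this rule and in many others.
For example, define this near the top of the file, with the other globals: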

RE_IMPORT_STAR = re.compile(r'^from .* import \*')

Then do the check like this:

if RE_IMPORT_STAR.match(line):
    return IMPORT_STAR_ERROR

Many of the other tests could use regular expressions too, for more accurate matching and often better performance as well.
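
Putting the two fragments together, a self-contained sketch might look like the following. It assumes the check receives a list of lines and, unlike the original, returns the offending line rather than the error message, purely to keep the example short:

```python
import re

# Same pattern as in the answer. Pattern.match already anchors at the start
# of the string, so the leading ^ is redundant but harmless.
RE_IMPORT_STAR = re.compile(r'^from .* import \*')


def import_star_error(lines):
    """Return the first star-import line, or None if there is none."""
    for line in lines:
        if RE_IMPORT_STAR.match(line):
            return line
    return None


print(import_star_error(["import re", "from long_long_long_name import *"]))
print(import_star_error(["import re"]))
```

Note that this pattern only catches imports that start in the first column; an indented star import would need a pattern that allows leading whitespace.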

The rules you have defined are sometimes too loose, for example:

for built_in in RESERVED_KEYWORDS:
    if built_in + " =" in code or built_in + "=" in code:
        return BULTIN_REASSIGNED_ERROR.format(built_in, built_in)

This will not match the following:

abs     = 'hello'

At the same time, the same rule flags this perfectly valid code:

parser.add_argument('-a', '--alphabet',
                    help='override the default alphabet')

There are more examples like this in the code.
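
Both failure modes can be reproduced by isolating the substring test for a single name. The function name naive_reassign_check is made up for this illustration:

```python
def naive_reassign_check(code, built_in):
    """The substring test from the question, isolated for one built-in name."""
    return built_in + " =" in code or built_in + "=" in code


# A real reassignment that is missed because of the extra spaces.
missed = "abs     = 'hello'"

# A keyword argument that is wrongly reported as a reassignment.
false_alarm = ("parser.add_argument('-a', '--alphabet',\n"
               "                    help='override the default alphabet')")

print(naive_reassign_check(missed, "abs"))        # False
print(naive_reassign_check(false_alarm, "help"))  # True
```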

Please test before writing: if "import re".startswith("import") and "*" in "import re": print("Error") does not print anything. Anyway, you are right that it will not match from long_long_long_name import *
– Caridorc
Dec 13, 2014 at 16:39

My bad. Corrected now. In your code it is if line.startswith("import") and "*" not in line. I had removed the "not" in my version and mistakenly copied that one instead of yours. I have corrected the post.
– janos
Dec 13, 2014 at 16:44

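Answer #2

My review:
Programs like this should accept a command-line argument that names the input file to process. Your program hardcodes the name as code_with_bad_style.py.
Many of your functions immediately split the code into lines and then process them. Since that work is repeated many times, you should split the lines once and pass those lines instead of the raw code.
Also, following UNIX convention, you should read from STDIN if no file is given on the command line.
On the other hand, once I gave it the file, it worked well for me.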
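
A small sketch of the STDIN fallback, under the assumption that argparse is used for the command line. The function read_source and its stdin parameter are hypothetical; the parameter exists only so the example can run without a real pipe:

```python
import argparse
import io
import sys


def read_source(argv=None, stdin=None):
    """Return the code to review from the file argument, else from stdin."""
    parser = argparse.ArgumentParser()
    # nargs="?" makes the positional optional; None means "not given".
    parser.add_argument("filename", nargs="?", default=None)
    args = parser.parse_args(argv)
    if args.filename is None:
        stream = sys.stdin if stdin is None else stdin
        return stream.read()
    with open(args.filename) as f:
        return f.read()


# No filename given, so the text comes from the (simulated) standard input.
code = read_source([], stdin=io.StringIO("print 'hi'\n"))
lines = code.splitlines()  # split once, then pass lines to the checks
print(lines)
```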

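Self-review
python code_with_bad_style.py
You should consider using some docstrings.
Docstrings are multiline comments that explain what a function does,
they are of great help for the reader. They look like the following: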

def function(a, b):
    """Do X and return a list."""

In python there is a very powerful language feature called list comprehension.
The following:

result = []
for i in lst:
    if foo(i):
        result.append(bar(i))

should be replaced with:

result = [bar(i) for i in lst if foo(i)]

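There is a very convenient way of handling files in python:
the with statement. It handles closing automatically so you do not have to worry about it. It is very simple to use, look at the following example: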

with open("x.txt") as f:
    data = f.read()
    # do something with data

The self-review idea is genius. We humans should get ready: self-review now, and self-improving code shortly after! :)
– Caridorc
Dec 13, 2014 at 16:07

Two things you should be aware of, just FYI: the review question generator and the unanswered Python questions
– rolfl
Dec 13, 2014 at 16:12

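Answer #3

You can simplify and correct your code by using the ast module. The only error that cannot be reimplemented at the AST level is "print used without parentheses".

Sample code that checks for "built-in reassigned", "missing docstring" and __name__ not used: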

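import ast
import builtins

# Public built-in names, taken from the interpreter instead of a hand-written list.
BUILTIN_NAMES = [name for name in dir(builtins) if not name.startswith("_")]


class ErrorChecker(ast.NodeVisitor):
    """Collect style errors; reuses the message constants from the question."""

    def __init__(self):
        self.errors = set()
        self.name_used = False

    def visit_Name(self, node):
        # ast.Store marks a name that is being assigned to.
        if node.id in BUILTIN_NAMES and isinstance(node.ctx, ast.Store):
            # The constant is spelled BULTIN_... in the question's code.
            self.errors.add(BULTIN_REASSIGNED_ERROR.format(node.id, node.id))
        elif node.id == "__name__":
            self.name_used = True

    def visit_FunctionDef(self, node):
        if not ast.get_docstring(node):
            self.errors.add(NO_DOCS_ERROR)
        self.generic_visit(node)  # also check nested definitions

    visit_ClassDef = visit_FunctionDef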

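The other error checkers can be implemented at the AST level in a similar way, which keeps them from accidentally catching bad statements inside string literals or being confused by string literals.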
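
As an illustration of that point, here is a sketch of what a star-import check might look like on the AST. The class name StarImportChecker is invented for this example; note that the import statement hidden inside the string literal is not reported:

```python
import ast


class StarImportChecker(ast.NodeVisitor):
    """Collect the modules that are imported with 'from ... import *'."""

    def __init__(self):
        self.star_imports = []

    def visit_ImportFrom(self, node):
        # A star import shows up as an alias whose name is "*".
        if any(alias.name == "*" for alias in node.names):
            self.star_imports.append(node.module)


source = '''
from os import *
import re
text = "from sys import *"
'''

checker = StarImportChecker()
checker.visit(ast.parse(source))
print(checker.star_imports)  # ['os']
```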

The "missing parentheses around print" error can also be checked with the Python parser/compiler:

import __future__

try:
    compile(code, FILENAME, "exec", __future__.print_function.compiler_flag)
except SyntaxError:
    # The code uses print without parentheses (or has another syntax error).
    print(PRINT_BREAKING_PYTHON_3_ERROR)