Matching a generated random letter string to the input

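I wrote a program in Python and wanted it to be faster, so I rewrote it in C#, since that's compiled. To my surprise, the Python program is much faster. I guess something is wrong with my C# code, but it's so simple and straightforward that I can't see what. The two programs are structured roughly the same way.
C#: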

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Diagnostics;

//This program generates a string of random lowercase letters and matches it to the user's input
//It does this until it gets a match
//It also displays the closest guess so far and the time it took to guess

namespace Monkey
{
    class Program
    {
        static string userinput()
        {
            //Takes user input, makes sure it is all lowercase letters, returns string
            string input;

            while(true)
            {
                input = Console.ReadLine();

                if (Regex.IsMatch(input, @"^[a-z]+$"))
                {
                    return input;
                }
            }
        }

        static string generate(int len)
        {
            //generates string of random letters, returns the random string
            Random rnd = new Random();
            string alpha = "abcdefghijklmnopqrstuvwxyz";
            int letterInt;
            StringBuilder sb = new StringBuilder();


            for (int i = 0; i < len; i++)
            {
                letterInt = rnd.Next(26);
                sb.Append(alpha[letterInt]);
            }

            return sb.ToString();
        }

        static int count(int len, string s, string g)
        {
            //returns number of letters that match user input
            int same = 0;

            for (int i = 0; i < len; i++)
            {
                if(g[i] == s[i])
                {
                    same++;
                }
            }

            return same;
        }

        static void Main(string[] args)
        {
            Console.WriteLine("They say if you lock a monkey in a room with a typewriter and enough time,");
            Console.WriteLine("the monkey would eventually type a work of Shakespeare.");
            Console.WriteLine("Let's see how well C# does...");
            Console.WriteLine("Enter a word");
            Console.WriteLine("(3 letters or less is recommended)");
            string solution = userinput();

            int size = solution.Length;
            bool success = false;
            string guess = null;
            int correct;
            int best = 0;
            Stopwatch watch = Stopwatch.StartNew();

            while (!success)
            {
                guess = generate(size);
                correct = count(size, solution, guess);

                if (correct == size)
                {
                    success = true;
                }

                else if (correct > best)
                {
                    Console.Write("The best guess so far is: ");
                    Console.WriteLine(guess);
                    best = correct;
                }
            }

            watch.Stop();
            TimeSpan ts = watch.Elapsed;
            Console.WriteLine("Success!");
            Console.Write("It took " + ts.TotalSeconds + " seconds for the sharp C to type ");
            Console.WriteLine("\"" + guess + "\"");

            Console.ReadLine();
        }
    }
}

Python:

import random
import time
#This program generates a string of random letters and matches it with the user's string
#It does this until its guess is the same as the user's string
#It also displays closest guess so far and time it took to guess


def generate():
    # generate random letter for each char of string
    for c in range(size):
        guess[c] = random.choice(alpha)


def count():
    # count how many letters match
    same = 0
    for c in range(size):
        if guess[c] == solution[c]:
            same += 1
    return same


print("They say if you lock a monkey in a room with a typewriter and enough time,")
print("the monkey would eventually type a poem by Shakespeare")
print("Let's see how well a python does...'")

user = ""
badinput = True
while badinput:
    # Make sure user only inputs letters
    user = input("Enter a word\n(5 letters or less is recommended)\n")
    if user.isalpha():
        badinput = False

solution = list(user.lower())
size = len(solution)
guess = [""] * size
alpha = list("abcdefghijklmnopqrstuvwxyz")
random.seed()
success = False
best = 0    # largest number of correct letters so far
start = time.time()    # start timer

while not success:
    # if number of correct letters = length of word
    generate()
    correct = count()
    if correct == size:
        success = True
    elif correct > best:
        print("The best guess so far is: ", end="")
        print("".join(guess))
        best = correct

finish = time.time()    # stop timer
speed = finish - start

print("Success!")
print("It took " + str(speed) + " seconds for the python to type ", end="")
print("\"" + "".join(guess) + "\"")
input()

I guess you'd better enter Macbeth next!

You might find this StackOverflow question useful; it discusses the pros and cons of instantiating StringBuilder objects: stackoverflow.com/questions/550702/…

Use a profiler to answer performance questions. Anything else is guessing.

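@agentnega: You see what you're doing, right? You're guessing. Your guesses are good, educated guesses and probably correct, but they are still guesses. For all we know, the C# program is slow because something is wrong with Console.WriteLine that makes the output block, or something like that. I profile a lot of C# programs in a day, and very often (more than 10% of the time) my initial guess about the cause of a slowdown is completely wrong. Engineers solve problems by reasoning from facts, not by guessing.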
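As a minimal sketch of that advice on the Python side, the standard-library cProfile and pstats modules show where the time actually goes. The function names are hypothetical; generate_fresh_rng deliberately builds a new generator on every call to mimic the C# generate().

```python
import cProfile
import io
import pstats
import random
import string

ALPHA = string.ascii_lowercase


def generate_fresh_rng(size):
    # Hypothetical helper: builds a new generator per call, like the C# code
    rng = random.Random()
    return "".join(rng.choice(ALPHA) for _ in range(size))


def run(n_guesses, size=3):
    for _ in range(n_guesses):
        generate_fresh_rng(size)


def profile_report(n_guesses=20_000, top=8):
    """Profile run() and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    run(n_guesses)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

The report lists call counts and cumulative time per function, so you can see how much of generate_fresh_rng is spent constructing generators rather than guessing; the exact numbers depend on the machine.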

You said "I wrote it in C# because it's compiled"; Python is usually compiled too.
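To illustrate: CPython compiles source code to bytecode before running it (and caches that bytecode in .pyc files for imported modules), although the bytecode is then interpreted rather than turned into native machine code. A small sketch using the built-in compile() and the standard dis module on a one-line letter count:

```python
import dis

# A one-line version of the letter-count logic
source = "same = sum(g == s for g, s in zip(guess, solution))"

# compile() turns source text into a code object holding CPython bytecode
code = compile(source, "<monkey>", "exec")
print(type(code).__name__)  # code

# dis prints the bytecode instructions the interpreter will execute
dis.dis(code)
```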

Answer #1

I don't know Python, so I'll focus on the C# code.

Your program is upside down. One would expect void Main at the top, with the more specialized code below it.
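Method names in C# should always follow the PascalCasing convention. Only your Main method does; and even if you were going with a camelCasing convention, userinput would be better named userInput.
count takes a len parameter for the length of the given strings and does nothing to verify that the indexes are legal; at first glance it looks like it's asking for an IndexOutOfRangeException.

Performance-wise, I think you'd need to take the randomness out of the test for any benchmark to mean anything.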
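The Python analogue fails differently: zip() simply stops at the shorter string, so a length mismatch is silently ignored instead of raising an error. A minimal sketch of a count that validates its inputs up front (the name count_matching_letters is illustrative, not from the original code); on Python 3.10+ zip(..., strict=True) is another way to get an error on mismatched lengths.

```python
def count_matching_letters(solution, guess):
    """Count positions where solution and guess hold the same letter."""
    # Validate up front instead of trusting a separate length argument
    if len(solution) != len(guess):
        raise ValueError(f"length mismatch: {len(solution)} vs {len(guess)}")
    return sum(s == g for s, g in zip(solution, guess))
```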

It's random, but if you run them side by side, the Python version is always much faster than the C# one. Thanks for the tips. I moved the methods to the bottom and changed the names.
– bobpal
2014-02-25 19:29
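One way to take the randomness out of a side-by-side comparison, assuming the goal is to compare raw guessing speed, is to seed the generator with a fixed value and time a fixed number of guesses, so every run does identical work no matter how lucky it gets. A minimal Python sketch; the function name and default numbers are arbitrary choices for illustration:

```python
import random
import string
import time

ALPHA = string.ascii_lowercase


def guesses_per_second(solution, n_guesses=100_000, seed=12345):
    """Time a fixed amount of work instead of 'time until the first match'."""
    rng = random.Random(seed)  # fixed seed: every run makes the same guesses
    size = len(solution)
    matches = 0
    start = time.perf_counter()
    for _ in range(n_guesses):
        guess = [rng.choice(ALPHA) for _ in range(size)]
        if sum(g == s for g, s in zip(guess, solution)) == size:
            matches += 1
    elapsed = time.perf_counter() - start
    return n_guesses / elapsed, matches


if __name__ == "__main__":
    rate, matches = guesses_per_second("cat")
    print(f"{rate:,.0f} guesses/s ({matches} exact matches)")
```

The same idea carries over to C#: pass a fixed seed to new Random(seed) and loop a fixed number of times.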

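If you move the creation of the rnd and sb objects out of the loop as @vals suggests, is it the same? I suspect there's a seeding problem with all the Random instances you're creating: your loop may be running 10K times over the same "random" sequence. C# is also JIT-compiled from IL code, so yes, it's compiled, but to an intermediate language, and the CLR is doing its own work too. I'm not sure your comparison is fair.
– Mathieu Guindon♦
2014-02-25 19:35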
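The seeding suspicion can be illustrated in Python. As far as I know, in the .NET Framework the parameterless Random constructor uses a time-dependent default seed (later .NET versions changed this), so instances created within the same clock tick can start from the same seed. The sketch below simulates that by passing the same hypothetical tick value to a brand-new random.Random on every call; Python's own random.Random() seeds itself from the OS, so this is only a stand-in.

```python
import random
import string


def guess_from_new_generator(seed, size=3):
    # A brand-new generator per guess, as in the C# generate();
    # `seed` stands in for a clock reading that repeats within one tick.
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(size))


tick = 1000  # hypothetical clock value shared by several fast calls
guesses = [guess_from_new_generator(tick) for _ in range(5)]
print(guesses)  # the same "random" string five times
```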

Yes! The problem was initializing Random inside the method. Thanks for pointing that out. Stepping through it in the debugger didn't catch it. And yes, I understand C# and how it's compiled.
– bobpal
2014-02-25 20:04

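To be fair, @user2180125, it was really vals's answer that caught the performance problem. If you're accepting my answer for its code review, I'll take it; but if you're accepting it for the advice about taking the instantiation out of the loop, vals's answer should get the checkmark ;)
– Mathieu Guindon♦
2014-02-25 21:08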

@Mat'sMug That's fine, I should have written a more thorough answer :-)
– vals
2014-02-25 21:34

Answer #2

If you want better performance, don't create objects inside the inner loop:

    static string generate(int len)
    {
        Random rnd = new Random();                    // creating a new object
        string alpha = "abcdefghijklmnopqrstuvwxyz";
        int letterInt;
        StringBuilder sb = new StringBuilder();      // creating a new object
        ....

    }

Create them once and reuse them.
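A rough Python analogue of "create once and reuse", in case you want to measure the difference yourself: one function builds a new random.Random on every call, the other reuses a single module-level instance. The names are illustrative, and timings vary by machine, so none are quoted here.

```python
import functools
import random
import string
import timeit

ALPHA = string.ascii_lowercase
SHARED_RNG = random.Random()  # created once and reused for every guess


def generate_recreating(size):
    # Re-creates (and re-seeds) a generator on every call
    rng = random.Random()
    return "".join(rng.choice(ALPHA) for _ in range(size))


def generate_reusing(size):
    # Reuses the single module-level generator
    return "".join(SHARED_RNG.choice(ALPHA) for _ in range(size))


if __name__ == "__main__":
    for fn in (generate_recreating, generate_reusing):
        seconds = timeit.timeit(functools.partial(fn, 3), number=100_000)
        print(f"{fn.__name__}: {seconds:.3f} s")
```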

Argh. How did I miss that! Good catch!
– Mathieu Guindon♦
2014-02-25 18:50

Creating the objects once per user input is hardly a performance problem...
– Uri Agassi
2014-02-25 18:51

@UriAgassi They aren't created once per input. The objects are created on every iteration of the while loop that runs after the user input.
– Mathieu Guindon♦
2014-02-25 19:40

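@Phoshi I think my use of the plural in "reuse them" led you to believe I was suggesting object pooling. My suggestion is much simpler: just create one object of the Random class and one of the StringBuilder class, which means 2 objects in total.
– vals
2014-02-26 17:34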

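@Andris: "Don't pool or reuse objects by default; only do it when it's actually a problem." Random has semantic problems, not speed problems, and must absolutely never be created in a tight loop. But that doesn't apply to the vast majority of objects.
– Phoshi
2014-02-27 11:20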

Answer #3

To answer the question of why your C# code is so slow:

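It's this line. According to the profiler, about 90% of the execution time is spent re-creating the Random object. You'll notice that your Python code just uses random rather than re-creating it.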

Random rnd = new Random();

If you change the generate method to:

    static Random rnd = new Random();

    static string generate(int len)
    {
        //generates string of random letters, returns the random string
        string alpha = "abcdefghijklmnopqrstuvwxyz";
        int letterInt;
        StringBuilder sb = new StringBuilder();

        for (int i = 0; i < len; i++)
        {
            letterInt = rnd.Next(26);
            sb.Append(alpha[letterInt]);
        }

        return sb.ToString();
    }

With these changes, the two languages show similar performance.

Answer #4

In your Python code, you use a list of characters for guess. In your C# code, you build the string with a StringBuilder; try using a char[] instead.

Since this is a code review, I'll also share a few thoughts on your code:

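Naming conventions: C# methods are named in PascalCase and Python functions in snake_case, so UserInput() and user_input() respectively.
Meaningful names: generate and count don't convey what the code does the way GenerateRandomString and CountSimilarLetters would. The same goes for alpha, len, g, s, etc.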
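Applied to the Python side, those two suggestions might look like this. It's a sketch: the names follow the answer's suggestions converted to snake_case, and the optional rng parameter is an assumption added for illustration.

```python
import random
import string


def generate_random_string(length, rng=random):
    """Return `length` random lowercase ASCII letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def count_similar_letters(guess, solution):
    """Count the positions where guess and solution have the same letter."""
    return sum(g == s for g, s in zip(guess, solution))
```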

You've mixed up camelCase and PascalCase, but at least you use them correctly. And camelCase doesn't use underscores either; it just looks like a camel.
– Magus
2014-02-25 19:17

@Magus, fixed, thanks. CamelCase can be written with either an upper- or lowercase first character (en.wikipedia.org/wiki/CamelCase), and pascal_case was a typo...
– Uri Agassi
2014-02-25 19:19

Answer #5

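I would look into compiling your regex and making it a class-level member for the best reuse. Also, hoist the random number generator out the same way, since constantly re-creating it is never a good practice.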
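For comparison, the Python counterpart of hoisting the pattern is a module-level re.compile. This is a sketch with illustrative names; note that Python's re module also caches recently used patterns internally, so the gain here is mostly clarity, and Python has no direct counterpart to RegexOptions.Compiled.

```python
import re

# Compiled once at module level and reused, like the static readonly Regex
LOWERCASE_WORD = re.compile(r"[a-z]+")


def is_lowercase_word(text):
    """True if text is non-empty and consists only of ASCII lowercase letters."""
    # fullmatch anchors at both ends, avoiding the quirk that `$`
    # also matches just before a trailing newline
    return LOWERCASE_WORD.fullmatch(text) is not None
```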

Old code:

    static string userinput()
    {
        //Takes user input, makes sure it is all lowercase letters, returns string
        string input;

        while(true)
        {
            input = Console.ReadLine();

            if (Regex.IsMatch(input, @"^[a-z]+$"))
            {
                return input;
            }
        }
    }

    private static string generate(int len)
    {
        // generates string of random letters, returns the random string
        Random rnd = new Random();
        string alpha = "abcdefghijklmnopqrstuvwxyz";
        StringBuilder sb = new StringBuilder();
        int letterInt;

        for (int i = 0; i < len; i++)
        {
            letterInt = rnd.Next(26);
            sb.Append(alpha[letterInt]);
        }

        return sb.ToString();
    }

New code:

    private static readonly Regex regex = new Regex(@"^[a-z]+$", RegexOptions.Compiled);
    private static readonly Random rnd = new Random();

    static string userinput()
    {
        //Takes user input, makes sure it is all lowercase letters, returns string
        string input;

        while(true)
        {
            input = Console.ReadLine();

            if (regex.IsMatch(input))
            {
                return input;
            }
        }
    }

    private static string generate(int len)
    {
        // generates string of random letters, returns the random string
        const string alpha = "abcdefghijklmnopqrstuvwxyz";
        StringBuilder sb = new StringBuilder();
        int letterInt;

        for (int i = 0; i < len; i++)
        {
            letterInt = rnd.Next(26);
            sb.Append(alpha[letterInt]);
        }

        return sb.ToString();
    }

If the strings are 123a56 and 123b56, your count will be 3 and the OP's will be 5.
– ChrisW
2014-02-25 23:43

@ChrisW Dang it, that threw off the "best so far" display. Oops. Corrected.
– Jesse C. Slicer
2014-02-25 23:55
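ChrisW's example can be checked directly. In this sketch, positional_matches mirrors the OP's count, while prefix_matches is a hypothetical stand-in for a count that stops at the first mismatch (the earlier revision of the answer isn't shown here, so this only illustrates the difference):

```python
from itertools import takewhile


def positional_matches(a, b):
    # The OP's approach: every position is compared independently
    return sum(x == y for x, y in zip(a, b))


def prefix_matches(a, b):
    # Hypothetical alternative: stop counting at the first mismatch
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(a, b)))


print(positional_matches("123a56", "123b56"))  # 5
print(prefix_matches("123a56", "123b56"))      # 3
```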

Answer #6

Here's a review of your Python program, for a change.

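I don't know C#. Honestly, though, I'm surprised at how slow the C# version turned out, because what you've written is not how you'd implement a fast variant in Python.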

In Python, when things get slow you'll want NumPy. In fact, you'd want something like this:

"""
This program generates a string of random letters and matches it with the user's string.
It does this until its guess is the same as the user's string.
It also displays closest guess so far and time it took to guess.
"""

import numpy
import time
from string import ascii_lowercase

# Everything is ASCII, which is what the "c" means.
letters = numpy.array(list(ascii_lowercase), dtype="c")

def brute_force(solution):
    """
    BRRUUUUTTTEEE FFOOORRRCCCEEEE!!!!

    Repeatedly guess at a solution until it matches.
    """

    # Convert to char array
    solution = numpy.array(list(solution.casefold()), dtype="c")

    best = 0
    while True:
        # Do loads of guesses (100000) at once
        guesses = numpy.random.choice(letters, (100000, len(solution)))

        # Check all of the characters for equality, and count the number
        # of correct for each row
        corrects = (guesses == solution).sum(axis=1)

        # Gets the highest-so-far at each point
        maximums = numpy.maximum.accumulate(corrects)

        # Gets how much the maximum increased at each row; prepending -1
        # makes row 0 count as an increase, so the first guess of each
        # batch is not silently skipped (prepend needs NumPy >= 1.16)
        changes = numpy.diff(maximums, prepend=-1)

        # Indexes of the increases (these line up with the rows of guesses)
        # numpy.where returns a tuple of one element, so unpack it
        [when_increased] = numpy.where(changes > 0)

        for index in when_increased:
            guess = guesses[index]
            correct = corrects[index]

            if correct > best:
                yield str(guess, encoding="ascii")
                best = correct

            if correct == len(solution):
                return


print("They say if you lock a monkey in a room with a typewriter and enough time,")
print("the monkey would eventually type a poem by Shakespeare")
print("Let's see how well a python does...'")

while True:
    print("Enter a word")
    print("(5 letters or less is recommended)")
    user_input = input()

    # Make sure user only inputs letters
    if user_input.isalpha():
        break

start = time.time()

for solution in brute_force(user_input):
    print("The best guess so far is: ", solution)

elapsed_time = time.time() - start

print("Success!")
print("It took ", elapsed_time, " seconds for the python to type ", repr(solution))

It might look more daunting, but there are only about 30 logical lines, and most of them are trivial.
The basic idea is to make a huge batch of random choices at once:

guesses = numpy.random.choice(letters, (100000, len(solution)))

This is fast. It produces a 100000×N matrix, for example with N = 3:

h s l
w t x
a m e
i x t
  ⋮

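You can then work out the number of correct letters in each row by comparing the whole array against the solution (guesses == solution), which gives a bool array, and summing each row (.sum(axis=1)).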

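The extra parts after that (numpy.diff and friends) pretend that we did this sequentially, even though we didn't. In most practical situations that's irrelevant, and one could just print the best answer out of each batch.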
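For readers without NumPy at hand, here is a pure-Python sketch of what the comparison, row sum, running maximum and diff steps compute together. The function name and sample rows are made up for illustration, and this sketch treats row 0 as an improvement.

```python
from itertools import accumulate


def improving_rows(guesses, solution):
    """Return indexes of rows whose correct-letter count beats every earlier row."""
    # Per-row count of correct letters: what (guesses == solution).sum(axis=1) gives
    corrects = [sum(g == s for g, s in zip(row, solution)) for row in guesses]
    # Running maximum: what numpy.maximum.accumulate gives
    maximums = list(accumulate(corrects, max))
    # A row "improves" when the running maximum rises; row 0 always counts
    return [i for i, m in enumerate(maximums) if i == 0 or m > maximums[i - 1]]


rows = ["hsl", "wtx", "cme", "cxt", "cat"]
print(improving_rows(rows, "cat"))  # [0, 2, 3, 4]
```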

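Overall this is much faster than the original (>10×), and most of the time is spent in numpy.random.choice, which means it can't meaningfully be sped up further without replacing the high-quality random numbers NumPy generates.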

So don't assume Python is slow just because it's interpreted and dynamic. For programs like this, a good implementation can run fast.

Since this is a code review, I'll also point out a few things you could improve:

Docstrings, not comments.

Write

"""
This program generates a string of random letters and matches it with the user's string.
It does this until its guess is the same as the user's string.
It also displays closest guess so far and time it took to guess.
"""

instead of

# This program generates a string of random letters and matches it with the user's string.
# It does this until its guess is the same as the user's string.
# It also displays closest guess so far and time it took to guess.

This might sound pedantic, but it helps introspection.
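A small illustration of the introspection point (the generate signature here is a sketch, not the OP's exact code): a docstring is stored on the object and can be read back at run time, whereas a comment is discarded. Module docstrings work the same way through the module's __doc__ attribute.

```python
import inspect
import random
import string


def generate(size, rng=random):
    """Generate a random letter for each character of the string."""
    # This comment is thrown away when the function is compiled...
    return [rng.choice(string.ascii_lowercase) for _ in range(size)]


# ...but the docstring is kept on the function object, so help(),
# IDEs and documentation tools can read it
print(inspect.getdoc(generate))
```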

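Use fewer globals. Functions should (almost) never share state; it should be passed between them. Trust me. It means you can reuse functions and move them around without worrying about dependencies, and it keeps local changes local.
Keep your IO together. If you look at my variant, all the printing is in one place and the logic is separated out. That makes the program modular, so you can move the logic around and tinker with it without tripping over IO all the time.
random.seed() without a value is pointless here. It's only useful after a random.seed(some_value) call, to remove that fixed seed.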
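Putting those three points together, here is a pure-Python sketch with no NumPy dependency: the target word and the random generator are passed in rather than read from globals, the function yields improvements instead of printing them, and reproducibility comes from seeding a generator you create rather than calling random.seed(). The name monkey and the seed 2014 are arbitrary.

```python
import random
import string


def monkey(solution, rng):
    """Yield each guess that beats every earlier guess; the last one matches.

    No globals: the target word and the generator are parameters.
    No IO: the caller decides what to do with each improvement.
    """
    size = len(solution)
    best = -1  # so the very first guess is always reported
    while True:
        guess = "".join(rng.choice(string.ascii_lowercase) for _ in range(size))
        correct = sum(g == s for g, s in zip(guess, solution))
        if correct > best:
            best = correct
            yield guess
            if correct == size:
                return


if __name__ == "__main__":
    # random.Random(2014) gives a reproducible run; random.Random() with no
    # argument would seed itself from the OS instead
    for guess in monkey("cat", random.Random(2014)):
        print("The best guess so far is:", guess)
```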

This code:

done = False
while not done:
    if ...:
        done = True

is much better written as

while True:
    if ...:
        break

Some people disagree. They're wrong.

Something like

for c in range(size):
    guess[c] = random.choice(alpha)

which accesses each item in turn, is better written without indexes, for example:

guess[:] = [random.choice(alpha) for _ in range(size)]

But if you follow my earlier advice, this should really be

def generate(size):
    """Generate random letter for each character of string."""
    return [random.choice(alpha) for _ in range(size)]


The same goes for count, which can be

sum(g==s for g, s in zip(guess, solution))

Don't pointlessly preallocate values, as in guess = [""] * size. It only hides bugs.

If you find you need to, it's probably because you haven't followed my advice to avoid globals.
