用于理论建模的各种科学编程语言的基准

对于面向生物学专业学生的理论建模课程，我正在尝试确定哪种语言是在优雅和紧凑的同时进行进化模拟的最佳技术编程语言，同时仍然表现出色（这是用于模拟，我正在使用Mathematica进行分析的东西））。

作为一个简单的示例，我编写了一个模拟模块，该模块研究了离散时间下密度依赖性竞争下无性繁殖种群中连续性状的演变（即使用非重叠世代，使用递归）方程，其中父母全都同时繁殖，产生后代，然后父母全都死亡，后代成为新的父母），用4种不同的语言表达：R，Julia，Mathematica和Matlab。

但没有出现速度非常快，在我的简单实现中，没有一个能够模拟出100万个世代。欢迎对在任一实现中优化代码性能，内存使用或可伸缩性的任何想法，以及对其他高级技术计算语言（如Python / Numba / Cython）的任何建议或移植。优雅的基于C，C ++，Java或JVM OO的编程语言实现也很酷。

R实现：

# function to calculate new offspring trait values from parent population trait values tv
# given a density-dependent fitness function fitnessfunc,
# a mutation rate mutrate and standard deviation of mutational effects stdev
doStep = function (tv, fitnessfunc, mutrate, stdev) {
  n = length(tv) # current population size
  numberoffspring = rpois(n, fitnessfunc(R=tv, popsize=n)) # number of offspring of each parent
  newtv = rep(tv, times=numberoffspring) # offspring are copies of their parents
  # save occasional mutation which we apply below
  n = length(newtv)
  nrmutants = rbinom(1, n, mutrate)
  rnoise = rnorm(nrmutants, mean=0, sd=stdev)
  rndelem = sample(1:n, nrmutants, replace=FALSE)
  newtv[rndelem] = newtv[rndelem] + rnoise
  return(pmax(newtv,0))
}

# function to iterate this a given number of generations, starting with trait values tv
evolve = function (tv, fitnessfunc, mutrate, stdev, ngens) {
  sapply(1:ngens, function (gen) { tv <<- doStep(tv, fitnessfunc, mutrate, stdev) # slightly dodgy solution by defining tv globally, but works
                                   return(tv) })
}

# example parameters
psize = 1000 # pop size
ngens = 10000  # nr of generations to simulate
mutrate = 0.005 # mutation rate
stdev = 0.05 # st dev of mutant trait values
k = 2*psize # carrying capacity
init_traitv = runif(psize, 2.5, 2.6) # initital trait values
fitnessf = function (R, popsize, K=k) pmax((1 + R*(1 - popsize/K)), 0) # density-dependent fitness function

# do simulation and plot results
set.seed(1)
system.time(out <- evolve(init_traitv, fitnessf, mutrate, stdev, ngens)) 

popsizes = sapply(out,function(x) length(x))
# densities = sapply(out, function (x) hist(x, breaks=seq(0,3,length.out=100), plot=FALSE)$count/sum(x,na.rm=TRUE) )
densities = sapply(out, function (x) tabulate(findInterval(x, vec=seq(0,3,length.out=100)),nbins=99)/sum(x,na.rm=TRUE) ) # slightly faster
par(mfrow=c(1,2))
plot(popsizes, 1:ngens, type="l", xlab="Population size", ylab="Generation", col="steelblue", las=1, cex.axis=0.7, ylim=c(1,ngens), yaxs="i")
image(x=seq(0,3,length.out=99), y=1:ngens, z=densities, col=colorRampPalette(c("grey100","grey0"))(50), 
      useRaster=TRUE, xlim=c(0,3), las=1, cex.axis=0.7, xlab="Reproductive rates (R)", ylab="Generation")
box()

Julia实现：

## Pkg.add("Distribution")
using Distributions
# Pkg.clone("https://github.com/ChrisRackauckas/VectorizedRoutines.jl") # for R.rep function
# Will be Pkg.add("VectorizedRoutines") after being added to package system
using VectorizedRoutines

function doStep(tv, fitnessfunc, mutrate, stdev)
    n = length(tv) # current population size
    numberoffspring = [rand(Poisson(theta)) for theta in fitnessfunc(tv, n, k)] # number of offspring of each parent
    newtv = R.rep(tv, each = numberoffspring) # offspring are copies of their parents
    # but some of them mutate, so we also apply mutation
    n = length(newtv)
    nrmutants = rand(Binomial(n, mutrate),1)[1]
    rnoise = randn(nrmutants)*stdev
    rndelem = sample(1:n, nrmutants[1], replace=false)
    newtv[rndelem] = newtv[rndelem] + rnoise
  return(max(newtv,0))
end

function evolve(tv, fitnessfunc, mutrate, stdev, ngens)
    Results = Array(Array, ngens)
    Results[1] = tv
    for idx = 2:ngens
        Results[idx] = doStep(Results[idx - 1], fitnessfunc, mutrate, stdev)
    end
    return(Results)
end

psize = 1000 # population size;
ngens = 10000  # nr of generations to simulate;
mutrate = 0.005 # mutation rate;
stdev = 0.05 # st dev of mutant trait values;
k = 2*psize # carrying capacity;
srand(1);
init_traitv = rand(2.5:.1/psize:2.6,psize) # initial trait values;
fitnessfunc(R, popsize, K=k) = max((1 + R*(1 - popsize/K)), 0);

@time res = evolve(init_traitv, fitnessfunc, mutrate, stdev, ngens);

popsizes = [length(res[idx]) for idx = 1:length(res)] # population size;
densities = hcat([hist(res[idx],0:3/100:3)[2]/length(res[idx]) for idx = 1:length(res)]...)' # densities;

## Pkg.add("Plots")
using Plots
p1 = plot(popsizes, 1:ngens, lab = "", 
          xlabel = "Population size", yaxis = ("Generation",(0,ngens)));
p2 = heatmap(densities, xticks = (100/6:100/6:100, 0.5:0.5:3), fill = :grays,  # axis labels could probably be specified more elegantly
             xlabel = "Reproductive rate R");
plot(p1, p2)

Mathematica实现：

(* fast Poisson random number generator *)
fastPoisson = Compile[{{lambda, _Real}},
   Module[{b = 1., i, a = Exp[-lambda]}, 
    For[i = 0, b >= a, i++, b *= RandomReal[]]; i - 1],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

(* function to mutate trait values *)
mutate[traitvalues_, mutrate_, stdev_] := 
 Module[{tv, n, nrmutants, rnoise, rndelem},
  tv = traitvalues;
  n = Length[tv];
  nrmutants = RandomVariate[BinomialDistribution[n, mutrate]];
  rnoise = RandomReal[NormalDistribution[0, stdev], nrmutants];
  rndelem = RandomChoice[Range[n], nrmutants];
  tv[[rndelem]] += rnoise;
  Clip[tv, {0, 10^100}]]

(* function to calculate new offspring trait values from parent \
population with trait values *)
doStep[tv_, fitnessfunc_, mutrate_, stdev_] := 
 Module[{n, fitnessinds, numberoffspring, newtv},
  n = Length@tv;
  fitnessinds = fitnessfunc[tv, n];
  numberoffspring = fastPoisson[fitnessinds];
  newtv = Join @@ MapThread[ConstantArray, {tv, numberoffspring}];
  mutate[newtv, mutrate, stdev]]

(* function to iterate a number of generations *)
evolve[fitnessfunc_, initpop_, mutrate_, stdev_, generations_] := 
 NestList[doStep[#, fitnessfunc, mutrate, stdev] &, initpop, 
  generations]

(* function to plot results of one run *)
PlotResult[traitvalues_, maxphen_] := 
 Module[{generations, pop, maxscaleN, minscaleN, maxscaleP, 
   frequencydata},
  generations = Length[traitvalues];
  pop = Length /@ traitvalues;
  maxscaleN = Max[pop];
  minscaleN = Min[pop];
  maxscaleP = Max[Max[traitvalues], maxphen];
  frequencydata = (BinCounts[#, {0, maxscaleP, 1/100}] & /@ 
      traitvalues)/(pop + 0.00001);

  GraphicsRow[{ ListPlot[Transpose[{pop, Range[generations]}], 
    Joined -> True, Frame -> True, 
    FrameLabel -> {"Population size", "Generation"}, AspectRatio -> 2,
     PlotRange -> {{Clip[minscaleN - 50, {0, \[Infinity]}], 
       maxscaleN + 50}, {0, generations}}], 
   Show[ArrayPlot[frequencydata, 
     DataRange -> {{0, maxscaleP}, {1, generations}}, 
     DataReversed -> True], Frame -> True, FrameTicks -> True, 
    AspectRatio -> 2, 
    FrameLabel -> {"Phenotype frequency", "Generation"}] }] 
]

(* example parameters and simulation *)
psize = 1000; ngens = 10000; mutrate = 0.005; stdev = 0.05; k = 
 2*psize;
f[R_, popsize_] = 
  Clip[(1 + R (1 - popsize/k)), {0.000001, 10^100}];
First@AbsoluteTiming[
  traitvalues = 
    EvolveHapl[f, RandomReal[{2.5, 2.6}, psize], mutrate, stdev, 
     ngens];]
PlotResult[traitvalues, 3]

Matlab实现：

doStep.m函数文件：

function [ tv ] = doStep( tv, fitnessf, mutrate, stdev )
% Function to calculate new offspring trait values from parent population trait values tv
% given a density-dependent fitness function fitnessf,
% a mutation rate mutrate and standard deviation of mutational effects stdev

 n = length(tv); % current population size
 numberoffspring = poissrnd(fitnessf(tv, n)); % number of offspring of each parent
 newtv = repelem(tv, numberoffspring); % offspring are copies of their parents
 % save occasional mutation which we apply below
 n = length(newtv);
 nrmutants = binornd(n, mutrate);
 rnoise = normrnd(0, stdev, 1, nrmutants); 
 rndelem = datasample([1:n], nrmutants, 'Replace', false);
 newtv([rndelem]) = newtv([rndelem]) + rnoise;
 tv = max(newtv,0);

end

evolution.m函数文件：

function [ Results ] = evolve( tv, fitnessf, mutrate, stdev, ngens )
% Iterate evolution given number of generations and output all trait
% vectors in each generation as cell array Results
    out{1} = tv;
    for idx = 2:ngens
        out{idx} = doStep(out{idx - 1}, fitnessf, mutrate, stdev);
    end
    Results = out;

end

Matlab脚本文件：

% example parameters
psize = 1000; % pop size
ngens = 10000;  % nr of generations to simulate
mutrate = 0.005; % mutation rate
stdev = 0.05; % st dev of mutant trait values
k = 2*psize; % carrying capacity
rng('default');
init_traitv = 2.5 + (2.6-2.5).*rand(1,psize); % initital trait values
fitnessf = @(R, popsize) max((1 + R.*(1 - popsize/k)), 0) % density-dependent fitness function

% do simulation and plot results
tic
out = evolve(init_traitv, fitnessf, mutrate, stdev, ngens);
toc

popsizes = cellfun(@(el) length(el), out);
densities = reshape(cell2mat(cellfun(@(el) histcounts(el, 0:3/100:3), out, 'UniformOutput', false)),100,ngens)';

% plot results
clf
subplot(1,2,1)
plot(popsizes,1:ngens)
xlabel('Population size', 'FontSize', 16)
ylabel('Generation', 'FontSize', 16)
subplot(1,2,2)
imagesc(linspace(0,3,100), linspace(1,ngens,100), densities)
set(gca,'ydir','normal')
colormap(flipud(gray))
xlabel('Reproductive rate (R)', 'FontSize', 16)
% colorbar()
% ylabel(p1,'Density','FontSize', 16)

在Windows 8.1下我的Intel i7 2.4 GHz 8 Gb笔记本电脑上这4种实现的时间：

ngens = 10000和psize = 1000：

R 3.2.3：5.8 s（Microsoft R Open，具有多线程BLAS / LAPACK库和4个内核，但不确定是否已使用它们）
Julia 0.4.6：20.6 s
Matlab 2015a：31.9 s
Mathematica 10：96.5 s

ngens = 100000和psize = 1000：

R 3.2.3：38 s（Microsoft R Open，具有多线程BLAS / LAPACK库和4个内核，但不确定使用它们）
朱莉娅0.4.6：231 s
Matlab 2015a ：328 s
Mathematica 10：966 s

由此看来，R似乎是赢家，尽管朱莉娅的时机相对较差，令我感到惊讶，因为已公布的基准通常非常好（关闭到C ++），并且通常比R快得多。

是否有人想到我在Julia实现中做错了什么（类型稳定性问题或由于缺少类型声明而引起）？请注意，尽管热图可以接受一些下采样，但理想的绘制例程应该能够绘制一百万代的输出。

考虑进行概要分析以找出潜在的热点位置-这可能有助于集中解决问题：adv-r.had.co.nz/Profiling.html另请参见adv-r.had.co.nz/Performance.html

谢谢-我刚刚发现的一个瓶颈似乎是“ newtv = do.call（c，sapply（1：n，function（i）rep（tv [i]，numberoffspring [i]））））”这一行-任何想法这样更快/更好吗？

Ha刚找到它-带有时间参数的rep（）:-)

听起来您的代码无法正常工作。你能澄清一下吗？

您认为我做错了什么？对我来说，代码似乎有效...

#1 楼

我从您拥有的Julia代码开始，也得到了约20秒的时间，所以我认为我的时间安排与您的时间安排相似。让我逐步介绍如何执行此操作。

首先，请注意，如果您在REPL中运行代码，则定义的变量是全局的。这产生了良好的性能成本。有两种方法可以解决此问题：

1）将其全部包装在一个函数中，例如：

function plotThePops()
   #all code in here
end
plotThePops()

或2）更改一些全局变量转换为常量。

编译器需要这样做，因为否则它将无法正确推断这些变量的类型（在全局范围内，它们可以在任何地方更改！），这意味着它需要在各处编译一堆帮助程序代码。万一类型改变了。

我用了const路由，得到以下代码：

using Distributions
# Pkg.clone("https://github.com/ChrisRackauckas/VectorizedRoutines.jl") # for R.rep function
# Will be Pkg.add("VectorizedRoutines") after being added to package system
using VectorizedRoutines

function doStep(tv, fitnessfunc, mutrate, stdev)
    n = length(tv) # current population size
    numberoffspring = [rand(Poisson(theta)) for theta in fitnessfunc(tv, n)] # number of offspring of each parent
    newtv = R.rep(tv, each = numberoffspring) # offspring are copies of their parents
    # but some of them mutate, so we also apply mutation
    n = length(newtv)
    nrmutants = rand(Binomial(n, mutrate),1)[1]
    rnoise = randn(nrmutants)*stdev
    rndelem = sample(1:n, nrmutants[1], replace=false)
    newtv[rndelem] = newtv[rndelem] + rnoise
  return(max(newtv,0))
end

function evolve(init_traitv, fitnessfunc, mutrate, stdev, ngens)
    Results = Vector{Vector{Float64}}(ngens)
    Results[1] = init_traitv
    for idx = 2:ngens
        Results[idx] = doStep(Results[idx - 1], fitnessfunc, mutrate, stdev)
    end
    return(Results)
end

const psize = 1000 # population size;
const ngens = 10000  # nr of generations to simulate;
const mutrate = 0.005 # mutation rate;
const stdev = 0.05 # st dev of mutant trait values;
const k = 2*psize # carrying capacity;
srand(1);
const init_traitv = rand(2.5:.1/psize:2.6,psize) # initial trait values;
fitnessfunc(R, popsize; K=k) = max((1 + R*(1 - popsize/K)), 0)

@profile res = evolve(init_traitv, fitnessfunc, mutrate, stdev, ngens)
@time res = evolve(init_traitv, fitnessfunc, mutrate, stdev, ngens)

popsizes = Int[length(res[idx]) for idx = 1:length(res)] # population size;
densities = hcat([hist(res[idx],0:3/100:3)[2]/length(res[idx]) for idx = 1:length(res)]...)' # densities;



## Pkg.add("Plots")
## Pkg.add("GR") if you want to use the GR plotting library.
using Plots
gr()
p1 = plot(popsizes, 1:ngens, lab = "",
          xlabel = "Population size", yaxis = ("Generation",(0,ngens)));
p2 = heatmap(densities, xticks = (100/6:100/6:100, 0.5:0.5:3), fill = :grays,  # axis labels could probably be specified more elegantly
             xlabel = "Reproductive rate R");
plot(p1, p2)
gui()

请注意，我也切换到了GR绘图后端，重点是速度。那不是时间安排，只是想你应该知道。这样可以将计时降低到约17秒。不多，但是还可以。

那我们下一步要去哪里？要找出答案，请将时序线更改为性能分析：

@profile res = evolve(init_traitv, fitnessfunc, mutrate, stdev, ngens)
using ProfileView
ProfileView.view()

这会打开一个像这样的窗口：

这是您阅读该图的方式。块长度表示一行所占时间的百分比，如果您滚动浏览该块，则会告诉您它是哪一行。如果一个块位于另一个块的顶部，则意味着顶部的代码是从其下面的代码调用的。我圈出的行用于dostep函数，我圈出的部分用于以下行：

numberoffspring = [rand(Poisson(theta)) for theta in fitnessfunc(tv, n)]

这意味着〜80％的时间您的时间是由于这条线！显然，这意味着实施效率低下。实际上，这可以通过展开和关闭边界检查来改善。因此，我在VectorizedRoutines.jl中实现了以下功能：

function rpois(n::Int,p::Vector{Float64})
  out = Vector{Int}(n)
  for i in 1:n
    @inbounds out[i] = rand(Poisson(p[i]))
  end
  out
end

，然后将有问题的dostep行更改为：

numberoffspring = R.rpois(n,fitnessfunc(tv, n))

（如果要将函数放在脚本中，可以将其称为rpois）。现在，根据随机性，时间减少到2.8-4秒（而ngens = 100000则为〜32-34秒）。我再次剖析

那个大红色块是VectorizedRoutines.jl的R.rep函数。被发现的地方是：

  for j in eachindex(x)
    @inbounds for k in 1:each[j]
      index += 1
      @inbounds rval[index] = x[j]
    end
  end

这只是直线移动的数字，因此可能无法改进。

因此最终代码是这样的：

using Distributions
# Pkg.clone("https://github.com/ChrisRackauckas/VectorizedRoutines.jl") # for R.rep function
# Will be Pkg.add("VectorizedRoutines") after being added to package system
using VectorizedRoutines

function doStep(tv, fitnessfunc, mutrate, stdev)
    n = length(tv) # current population size
    numberoffspring = R.rpois(n,fitnessfunc(tv, n))
    newtv = R.rep(tv,numberoffspring) # offspring are copies of their parents
    # but some of them mutate, so we also apply mutation
    n = length(newtv)
    nrmutants = rand(Binomial(n, mutrate))
    rnoise = scale!(randn(nrmutants), stdev)
    rndelem = sample(1:n, nrmutants[1], replace=false)
    newtv[rndelem] += rnoise
  return(max(newtv,0.0))
end

function evolve(init_traitv, fitnessfunc, mutrate, stdev, ngens)
    Results = Vector{Vector{Float64}}(ngens)
    Results[1] = init_traitv
    for idx = 2:ngens
        Results[idx] = doStep(Results[idx - 1], fitnessfunc, mutrate, stdev)
    end
    return(Results)
end

const psize = 1000 # population size;
const ngens = 10000  # nr of generations to simulate;
const mutrate = 0.005 # mutation rate;
const stdev = 0.05 # st dev of mutant trait values;
const k = 2*psize # carrying capacity;
srand(1);
const init_traitv = rand(2.5:.1/psize:2.6,psize) # initial trait values;
fitnessfunc(R, popsize; K=k) = [max((1.0 + r*(1.0 - popsize/K)), 0.0) for r in R]

@profile res = evolve(init_traitv, fitnessfunc, mutrate, stdev, ngens)
@time res = evolve(init_traitv, fitnessfunc, mutrate, stdev, ngens)

popsizes = Int[length(res[idx]) for idx = 1:length(res)] # population size;
densities = hcat([hist(res[idx],0:3/100:3)[2]/length(res[idx]) for idx = 1:length(res)]...)' # densities;



## Pkg.add("Plots")
## Pkg.add("GR") if you want to use the GR plotting library.
using Plots
gr()
p1 = plot(popsizes, 1:ngens, lab = "",
          xlabel = "Population size", yaxis = ("Generation",(0,ngens)));
p2 = heatmap(densities, xticks = (100/6:100/6:100, 0.5:0.5:3), fill = :grays,  # axis labels could probably be specified more elegantly
             xlabel = "Reproductive rate R");
plot(p1, p2)
gui()

隐藏了一些复杂性，因为我制作了Poisson随机数生成的快速版本并将其添加到VectorizedRoutines.jl包中，但是即使如此功能很简单。这就是Julia的优点：那些“低级实用函数”可以用Julia编写，因此可以快速/轻松地直接在Julia中创建“快速”的事物版本，而对于R / MATLAB / Python，这些函数是用Julia编写的C或Fortran。

我希望这也能很好地概述如何优化Julia代码。我可以将其重新发布到我的博客中。我不知道您会得到什么时间/配置文件。

编辑

在王凤阳@TotalVerb的帮助下，我们通过Gitter改进了VectorizedRoutines中的R.rep函数。 jl。为此，只需运行Pkg.update()并将R.rep行更改为：

newtv = R.rep(tv,numberoffspring)

，您应该有所改进。现在，在我的计算机上，代码可以在1.1秒内运行，因此速度提高了3倍以上。给定已发布的内容，它应该很容易击败R的时间。

作为参考，新的配置文件如下所示：

我圈了圈RNG时间。如评论中所述，从Google Summer of Code页面可以看到，有一个项目可以使用更快的psudo-rngs来制作仓库RNG.jl。看起来它已经实现了一些新功能。因此，如果您在夏季末运行此代码，则可能会更快！

编辑2

最终代码中的doStep和fitnessfunc进行了少许更改以减少分配。我们削减了一些分配，并且rpois将Rmath库用于其RNG。此时，R和Julia正在进行基本相同的计算。

R之所以表现不错，是因为它直接调用了C库，但是从这些分析看来，优化的C代码可能只好了30％。因此开销不是问题，但是主要的问题是实际算法（和RNG）。一个新的时间可能会使Julia和R处于大约相同的时间。最好在不能很好向量化的算法上测试Julia vs R！

编辑3

为了完整起见，我使用in放置操作。虽然先前的版本在时间上与R匹配，但是它会做得更好，因为它以一种向量化函数调用无法实现的方式减少了分配。例如，我们可以将R.rpois更改为R.rpois！（！是就地函数的Julia约定）通过执行以下操作重用上一轮的newoffspring向量：

  function rpois!(n::Int,p::Vector{Float64},out::Vector{Int})
    resize!(out,n)
    for i in 1:n
      @inbounds out[i] = StatsFuns.RFunctions.poisrand(p[i]) #rand(Poisson(p[i])) #
    end
  end

这里是newoffspring。请注意，唯一的区别是我们在使用它之前调整大小，现在必须将它传递到任何地方。对于此代码的较小调整，我们减少了分配。我们可以对R.rep到R.rep做同样的事情！作者：

  function rep!{T1,T2}(x::AbstractVector{T1},rval::AbstractVector{T1},each::AbstractVector{T2} = ones(Int,length(x)))
length_old = length(x)
length_old != length(each) && throw(ArgumentError("If the argument 'each' is not a scalar, it must have the same length as 'x'."))
length_out = sum(each) #times =  1
index = length_out
resize!(rval,length_out)
resize!(x,length_out)
for j in length_old:-1:1
  @inbounds @simd for k in 1:each[j]
    @inbounds rval[index] = x[j]
    index -= 1
  end
end
x[1:length_out] = rval[1:length_out]
return(length_out)
end

现在，我们在第二个向量周围保留中间值（这是必需的，并且由于给定样本具有0个后代。如果不可能的话，您可以直接向x写入，以提高速度！）。

请注意，您可以作弊并知道，鉴于问题，rval永远不会大于5000，因此您可以制作此函数的特殊版本，在其中注释掉resize!(rval,length_out)行，将rval预先分配为。 5000，并且永远不必处理重新分配它。这有点“作弊”，因为该函数通常无法与此配合使用（rval需要调整大小时会出错，因此我没有将其添加到库中！），因此答案将要求您定义一个rep!函数这样做会增加答案的复杂性。但是，这确实节省了更多时间。

仍然，这仅节省了一点时间，因为大部分时间仍在RNG中。作为参考，具有就地函数调用的脚本是这样的：

using Distributions
# Pkg.clone("https://github.com/ChrisRackauckas/VectorizedRoutines.jl") # for R.rep function
# Will be Pkg.add("VectorizedRoutines") after being added to package system
using VectorizedRoutines

function doStep(tv, fitnessfunc, mutrate, stdev,numberoffspring,rval)
  n = length(tv) # current population size
  R.rpois!(n,fitnessfunc(tv, n),numberoffspring)
  n = R.rep!(tv,rval,numberoffspring)
  # but some of them mutate, so we also apply mutation
  nrmutants = rand(Binomial(n, mutrate))
  rnoise = scale!(randn(nrmutants), stdev)
  rndelem = sample(1:n, nrmutants[1], replace=false)
  tv[rndelem] += rnoise
  return(max(tv,0))
end

function evolve(init_traitv, fitnessfunc, mutrate, stdev, ngens)
    Results = Vector{Vector{Float64}}(ngens)
    Results[1] = init_traitv
    numberoffspring = Vector{Int}(5000)
    rval = similar(init_traitv)
    for idx = 2:ngens
        @inbounds Results[idx] = doStep(Results[idx - 1], fitnessfunc, mutrate, stdev,numberoffspring,rval)
    end
    return(Results)
end

const psize = 1000 # population size;
const ngens = 10000  # nr of generations to simulate;
const mutrate = 0.005 # mutation rate;
const stdev = 0.05 # st dev of mutant trait values;
const k = 2*psize # carrying capacity;
srand(1);
const init_traitv = rand(2.5:.1/psize:2.6,psize) # initial trait values;
fitnessfunc(R, popsize; K=k) = max((1 + R*(1 - popsize/K)), 0)

@profile res = evolve(init_traitv, fitnessfunc, mutrate, stdev, ngens)
@time res = evolve(init_traitv, fitnessfunc, mutrate, stdev, ngens)

\ $ \ begingroup \ $
评论不用于扩展讨论；此对话已移至聊天。
\ $ \ endgroup \ $
– Mathieu Guindon♦
16年7月21日在15:14

\ $ \ begingroup \ $
这是最后一个时间注释的注释，因为注释应该在此处发布。 @TomWenseleers：只需重新设置时间，Julia和R的确在完全相同的范围内，除了图形部分！也很高兴听到即使使用我的原始代码（使用const ...固定，这很容易），新版本也将能够获得这种性能！当可以调用R时，Julia也可以成为Rcpp的绝佳替代品-我得试试看，stackoverflow.com / questions / 9965747 / linking-r-and-julia
\ $ \ endgroup \ $
–克里斯·拉卡卡斯（Chris Rackauckas）
16年7月21日在15:20

#2 楼

在您的MATLAB代码中，可以跳过一些不必要的步骤。这是对doStep函数的修改，可以大大提高性能。每个修饰符都在代码中的注释中进行了说明。

doStep2.m函数文件：

Windows 10，MATLAB 2015a）以及ngens = 100000和psize = 1000的计时分别为：

doStep：81.004秒
doStep2：54.679秒

与R相比，效果仍然不理想，但更好。

编辑

这里有更详细的说明。在新版本doStep2中，行mutants = rand(1,n) < mutrate基本上是binornd函数在内部执行的操作（键入edit binornd进行检查）。但是，binornd抛出此向量，仅返回其和。在这里，我们保持此逻辑向量具有每个突变体的位置。然后，这可以避免使用datasample函数，该函数不必要地生成其他随机数来选择突变体的位置。总而言之，突变体的位置仅计算一次，而不是两次。

\ $ \ begingroup \ $
您能解释一下为什么您的函数版本更快吗？那会改善你的答案。
\ $ \ endgroup \ $
–pacmaninbw
16年8月23日在14:27

\ $ \ begingroup \ $
当然，@ pacmaninbw。我编辑了响应以添加更详细的说明。
\ $ \ endgroup \ $
–米格尔·杜斯·桑托斯（Miguel dos Santos）
16年8月23日14:50

\ $ \ begingroup \ $
非常好的答案！来自我的^ =）
\ $ \ endgroup \ $
– Steewie Griffin
2016年9月14日下午5:38

\ $ \ begingroup \ $
另外，请看看我的答案。我发现您使用了repelem并拥有R2015a。我在答案中包含的代码段应该可以提高性能，但是我不太确定。这里的向量大小不一定太多，但是值得一试。 :)
\ $ \ endgroup \ $
– Steewie Griffin
2016年9月14日下午5:48

#3 楼

关于rpois函数，我认为从均值λ的Poisson分布中模拟单个值的首选方法是
通过均值向量out的Poisson模拟值，可以使用Int的变异版本。在v0.4中，最好编写一个辅助函数

using Distributions
rand(Poisson(λ))

，然后将其称为p。在v0.5中，您可以轻松地在地图map中使用匿名函数。

\ $ \ begingroup \ $
我们使用StatsFuns.RFunctions.poisrand（p [i]）和rand（Poisson（p [i]））都测试了rpois。第一个最终速度更快，因为第二个实际上是此处定义的第一个语法糖。
\ $ \ endgroup \ $
–克里斯·拉卡卡斯（Chris Rackauckas）
16年7月22日在18:01

\ $ \ begingroup \ $
但是，并非总是这样定义的。目的是用本机的Julia代码替换对Rmath库中C函数的调用。
\ $ \ endgroup \ $
–道格拉斯·贝茨（Douglas Bates）
16年7月22日在19:20

\ $ \ begingroup \ $
是的，我将根据此处给出的答案尽快对其进行更新。可能会回到rand（Poisson（lambda）），但是方法会有所不同。
\ $ \ endgroup \ $
–克里斯·拉卡卡斯（Chris Rackauckas）
16年7月22日在20:54

#4 楼

首先，有关Julia实施的实用评论。该行：

numberoffspring = [rand(Poisson(theta)) for theta in fitnessfunc(tv, n, k)]

占用大量时间，因为fitnessfunc是用户定义的函数，在编译过程中对此知之甚少。这可以通过在结果中添加类型断言来解决：

numberoffspring = 
  [rand(Poisson(theta)) for theta in fitnessfunc(tv, n, k)::Vector{Float64}]

Chris Rackauckas提供了许多细节，尤其是带有R的rep函数实现的Package。最终，该有用功能将以自然的Julia形式出现。同时，将其添加到源代码中可能会更简单：

function repeach{T}(v::Vector{T},each::Vector{Int})
  length(v)==length(each) || throw(ArgumentError("values and each vectors must have same size"))
  n = length(each)
  s = sum(each)
  r = Array(T,s)
  iv = 1
  for i = 1:n
    @inbounds val = v[i]
    @inbounds for j=1:each[i]
      @inbounds r[iv] = val
      iv += 1
    end
  end
  return r
end

这有点难看，但是正如Chris所示，这通常是隐藏的。此功能的单行代码版本为

repeach2(v,each) = vcat([[v[i] for j=1:each[i]] for i=1:length(v)]...)

，但是效率较低。使用repeach的代码中的相关行可以是：

newtv = repeach(tv, numberoffspring) # offspring are copies of their parents

（顺便说一句，本文中写的这一行在使用Chris'Package时对我有错字）。

要了解运行时间的变化，发布者应运行修改后的版本。

最后，这些类型的跨语言基准测试问题含糊不清。最初的实现通常有利于作家的最初专业知识。优化通常会产生很大的不同，但是尚不清楚对于典型的语言用户而言，什么是自然的。也许语言基准测试应该考虑新手用户的性能？当专家编写问题代码时，很多时候语言都收敛于单个语言不可知的瓶颈（例如本例中的Poisson RNG），并且代码对于新手版本而言过于复杂。总而言之，这些比较是有问题的，但仍然有用且具有教育意义。

P.S.考虑到对这个问题的回答以及朱莉娅在其中的突出地位，如果您要根据社区的热情和未来对语言进行评分，朱莉娅会获得满分。

\ $ \ begingroup \ $
非常感谢您-可以尝试一下，并及时了解您的情况。在这种情况下，我的确很在意每种语言在初学者手中的表现（这是一门针对生物学学生的课程），但是使用每种语言都将使用的典型范例（例如，尽可能使用矢量化代码等）。我对Julia的使用也非常热情，但感觉可能还需要一两年的时间才能真正部署到教室中。
\ $ \ endgroup \ $
–汤姆·温瑟勒斯
16年7月21日在20:35

\ $ \ begingroup \ $
@DanGetz“（顺便说一句，帖子中写的这行使用Chris'Package对我有错字）”您是说回购中还有错字吗？如果您运行Pkg.update（），仍然会出现错误？我还没有进行任何单元测试，因此希望回购代码可以在其他系统上正常工作，就像在我的测试系统上一样！
\ $ \ endgroup \ $
–克里斯·拉卡卡斯（Chris Rackauckas）
16 Jul 22'0:36

\ $ \ begingroup \ $
“帖子中有错字”。该行是R.rep（tv，each = numberoffspring），尽管它是回购中的可选位置参数，但它显式命名了每个参数。此后，回购可能已更改。 BTW使用0.5。
\ $ \ endgroup \ $
–丹·盖茨（Dan Getz）
16年7月22日在14:20

\ $ \ begingroup \ $
是的，仓库已更改。在关键字参数上调度很奇怪，我现在不想处理。
\ $ \ endgroup \ $
–克里斯·拉卡卡斯（Chris Rackauckas）
16年7月22日在20:55

#5 楼

MATLAB

我没有统计工具箱，因此很遗憾，我无法进行任何基准测试来测试这一点，但是我相信以下几点将提高实现的性能。但是，它仍然比R慢得多。的时间。

要改善normrnd的性能，您可能会受益于在SO上执行此答案。

要改善poissrnd的性能，您可能会受益可以从头开始创建自定义函数，也可以在每次迭代中使用很少元素的循环内调用函数（是的，违反直觉的）。我链接到的函数很旧，因此Mathworks可能现在已经改进了自己的datasample函数，但是值得一试。我已经看到了用于提高性能的循环方法，但现在找不到任何参考。

我不知道您正在运行哪个版本的MATLAB。如果是R2015a，则normrnd并不是为大量元素创建poissrnd -vector的最快方法。如果数据大小大于10000个元素，则以下函数每秒可处理两倍的元素。

function out = rle_cumsum_diff(vals,runlens)
clens = cumsum(runlens);
idx(clens(end))=0;
idx([1 clens(1:end-1)+1]) = diff([0 vals]);
out = cumsum(idx);
return;

函数可能可以通过以下方式改进：将内存预先分配给poissrnd -cell阵列（使用NaN）。由于您在repelem -function调用之前不知道元素的数量，因此您必须超调一点，并在函数调用之后删除NaN元素。

这绝对不是代码中的瓶颈，但仅供参考：newtv比evolve快。

在Matlab中计时功能时使用out，比doStep更准确。

\ $ \ begingroup \ $
非常感谢您提供的好建议-我迟早会尝试的！
\ $ \ endgroup \ $
–汤姆·温瑟勒斯
16年7月26日在11:46

编程黑洞网