Good tools to crawl my site and help me find dead links and unlinked files [closed]

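I have a large legacy site with literally thousands of PDFs. They are sometimes accounted for in a database, but are often just linked from pages, and they are stored in most of the site's directories.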

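I've written a PHP crawler that follows all the links on my site and then compares the result against a dump of the directory structure, but is there an easier way?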
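The crawl-and-diff approach described in the question can be sketched in Python. This is a minimal sketch, not the asker's PHP crawler, and it assumes a static site whose document root is available locally (for example, the same tree you dump for the comparison): it follows links by parsing the HTML files on disk instead of fetching them over HTTP. `LinkCollector`, `crawl_local_site`, and `audit` are hypothetical names. Absolute links that spell out the live hostname are treated as external, and PDFs referenced only from the database will show up as false-positive orphans, so treat the orphan list as candidates to review rather than files to delete.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlparse

# Placeholder origin used only so urljoin can resolve relative links.
_BASE = "http://local.invalid/"


class LinkCollector(HTMLParser):
    """Collects every href/src attribute value in one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl_local_site(site_root, start="index.html"):
    """Follow same-site links from `start`, reading pages from disk.

    Returns (reachable, broken) as sets of site-relative paths such as
    "docs/report.pdf". External, mailto: and javascript: links are skipped;
    query strings and #fragments are dropped.
    """
    queue = [start]
    reachable, broken = set(), set()
    while queue:
        rel = queue.pop()
        if rel in reachable or rel in broken:
            continue
        path = os.path.join(site_root, *rel.split("/"))
        if not os.path.isfile(path):
            broken.add(rel)  # something links here, but no file exists
            continue
        reachable.add(rel)
        if not rel.lower().endswith((".html", ".htm")):
            continue  # PDFs and other assets are leaves, not parsed
        parser = LinkCollector()
        with open(path, encoding="utf-8", errors="replace") as f:
            parser.feed(f.read())
        for link in parser.links:
            url = urlparse(urljoin(_BASE + rel, link))
            if url.scheme != "http" or url.netloc != "local.invalid":
                continue  # points off-site (or isn't HTTP at all)
            target = unquote(url.path).lstrip("/")
            if target == "" or target.endswith("/"):
                target += "index.html"  # assumption: directory index name
            queue.append(target)
    return reachable, broken


def audit(site_root, start="index.html"):
    """Return (broken_links, orphan_files) as sorted lists."""
    reachable, broken = crawl_local_site(site_root, start)
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), site_root)
            on_disk.add(rel.replace(os.sep, "/"))
    return sorted(broken), sorted(on_disk - reachable)
```

Calling `audit(site_root)` returns two sorted lists: link targets that don't exist on disk, and files that no crawled page links to. Walking the tree with `os.walk` keeps the sketch self-contained; for a remote host you would point it at a local mirror.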

See also webmasters.stackexchange.com/questions/13310/…, which also asks about spell checking.

Answer #1

I've used Xenu's Link Sleuth. It works well; just be sure not to DoS yourself!

Check the "Orphan Files" option in the settings, and it will prompt you to log in to your site via FTP.
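The warning about not DoS-ing yourself applies to any checker that fetches pages from your own server. As a minimal sketch, assuming you already have a list of URLs to check (for instance from a crawl), a throttled status check in Python might look like this; `fetch_status`, `check_links`, and the `delay`/`timeout` defaults are illustrative assumptions, not settings from Xenu or any other tool in this thread.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for `url`, or None if unreachable.

    Tries HEAD first (no body is downloaded, which matters for large PDFs)
    and falls back to GET if the server rejects HEAD.
    """
    for method in ("HEAD", "GET"):
        req = urllib.request.Request(url, method=method)
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            if method == "HEAD" and err.code in (405, 501):
                continue  # HEAD not supported here; retry with GET
            return err.code
        except (urllib.error.URLError, OSError):
            return None  # DNS failure, refused connection, timeout, ...
    return None


def check_links(urls, delay=1.0, timeout=10.0):
    """Check each distinct URL once, sleeping `delay` seconds between requests."""
    results = {}
    for i, url in enumerate(dict.fromkeys(urls)):  # de-duplicate, keep order
        if i and delay:
            time.sleep(delay)  # throttle so the check doesn't hammer the server
        results[url] = fetch_status(url, timeout)
    return results
```

`urlopen` follows redirects, so the code reported for a redirected link is that of the final response. Raising `delay` is the simplest way to keep a large crawl from looking like an attack on your own host.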

–茎
Jul 9, 2010 at 10:27

Does this also work for sites where you have to log in to access the pages?

– Dony V.
Jul 18, 2010 at 20:49

@Jim How do you see which page a broken link is on?

– Rob
Jul 24, 2012 at 12:58

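When the crawl finishes, it produces a report that tells you. Depending on the size of the site and the number of broken links, the report can be hard to parse. I usually adjust the report options before crawling, then open the report as a text file (it's an .htm file) and delete everything irrelevant to make it more manageable. Hope this helps.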

– plntxt
Jul 24, 2012 at 13:54

Answer #2

If you're on Windows 7, the best tool is the IIS7 SEO Toolkit 1.0. It's a free download.

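It will scan any site and tell you where all the dead links are, which pages take a long time to load, which pages have missing or duplicate titles (and likewise for meta keywords and descriptions), and which pages have broken HTML.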

I've used it myself and it's great for scanning (and, of course, for SEO), but the FTP check in Xenu is what actually solves this problem.

–克里斯特先生
Jul 10, 2010 at 16:10

Answer #3

Try the W3C's open-source Link Checker. You can use it online or install it locally.

Answer #4

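If you have a Unix command line (I've used it on Linux, macOS, and FreeBSD), I'm a big fan of linklint for checking large static sites. See its website for installation instructions. Once it's installed, I create a file called check.ll and run: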

linklint @check.ll

Here's what my check.ll file looks like:

# linklint
-doc .
-delay 0
-http
-htmlonly
-limit 4000
-net
-host www.example.com
-timeout 10

This crawls www.example.com and generates HTML files with cross-referenced reports of broken links, missing files, and so on.

Answer #5

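Microsys has several products, in particular their A1 Sitemap Generator and A1 Website Analyzer, which will crawl your site and report on just about everything you can think of.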

That includes broken links, but also a tabular view of all pages, so you can compare identical <title> and meta description tags, nofollow links, meta noindex on pages, and a host of other ailments that just need a sharp eye and a quick fix.

Answer #6

Link Examiner is also a very good free program that can do what you need.