
How Search Works (Google)

 生财大掌柜 2020-09-20

Search is Google's core technology. So how does it work? In this episode, Matt Cutts, an engineer at Google, explains that Google creates an index of the web pages it can find and returns the most relevant results by evaluating more than 200 quality factors. Google's search results are impartial, clearly separated from ads, and returned in less than half a second.

Hi. My name is Matt Cutts. I’m an engineer in the quality group at Google, and I’d like to talk today about what happens when you do a web search. The first thing to understand is that when you do a Google search, you aren’t actually searching the web; you’re searching Google’s index of the web, or at least as much of it as we can find. We build this index with software programs called spiders. Spiders start by fetching a few webpages, then they follow the links on those pages and fetch the pages they point to, follow all the links on those pages and fetch the pages they link to, and so on, until we’ve indexed a pretty big chunk of the web: many billions of pages stored across thousands of machines.
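The crawl-and-index loop described above can be sketched in a few lines of Python. This is only an illustration, not Google's implementation: the `TOY_WEB` dictionary, the `*.example` URLs, and the `crawl` and `search` helpers are all made up for the example, and a real spider would fetch pages over HTTP rather than read them from memory.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("cheetah running speed facts", ["b.example", "c.example"]),
    "b.example": ("cheetah habitat and diet", ["c.example"]),
    "c.example": ("top running speed of land animals", ["a.example", "d.example"]),
    "d.example": ("gardening tips", []),
    "e.example": ("cheetah page nobody links to", []),
}

def crawl(seeds, web):
    """Breadth-first crawl from the seed pages, building an inverted index."""
    index = {}              # word -> set of URLs whose text contains it
    seen = set(seeds)       # URLs already queued, so each is fetched once
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url not in web:
            continue        # dead link: nothing to fetch
        text, links = web[url]
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        for link in links:  # follow the links found on this page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

def search(index, query):
    """Return the URLs that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

index = crawl(["a.example"], TOY_WEB)
print(sorted(search(index, "cheetah running speed")))  # ['a.example']
```

Note that `e.example` never enters the index, because no crawled page links to it. That is the sense in which a search covers the index rather than the whole web.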

Now, suppose I want to know how fast a cheetah can run. I type in my search, say "cheetah running speed", and hit return. Our software searches our index to find every page that includes those search terms. In this case, there are hundreds of thousands of possible results. How does Google decide which few documents I really want? By asking questions, more than two hundred of them, like: How many times does this page contain your keywords? Do the words appear in the title, in the URL, directly adjacent? Does the page include synonyms for those words? Is this page from a quality website, or is it low quality, even spammy? What is this page’s PageRank? That’s a formula invented by our founders, Larry Page and Sergey Brin, that rates a webpage’s importance by looking at how many outside links point to it and how important those links are. Finally, we combine all those factors together to produce each page’s overall score, and send you back your search results about half a second after you submit your search. At Google, we take our commitment to delivering useful and impartial search results very seriously. We don’t ever accept payment to add a site to our index, update it more often, or improve its ranking.
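To make the PageRank idea concrete, here is a minimal sketch of the textbook power-iteration formulation: each page repeatedly passes its current score to the pages it links to, so a link from an important page counts for more. The `LINKS` graph, the damping factor of 0.85, and the iteration count are assumptions chosen for illustration; this is not Google's production ranking, which combines many other factors.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}   # start with equal scores
    for _ in range(iterations):
        # Every page gets a small baseline share on each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page link graph.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page `c` ends up with the highest score because three pages link to it, and `d` ends up lowest because nothing links to it. Page `a` has only one inbound link, but it comes from `c`, so `a` still ranks above `b`.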

Let’s take a look at my search results.

Each entry includes a title, a URL and a snippet of the text to help me decide whether this page is what I’m looking for. I also see links to similar pages, Google’s most recent stored version of that page, and related searches that I might want to try next. And sometimes along the right and at the top, I’ll see ads.

We take our advertising business very seriously as well: both our commitment to deliver the best possible audience for advertisers, and our commitment to strive to show only ads that you really want to see. We’re very careful to distinguish ads from regular search results. And we won’t show you any ads at all if we can’t find any that we think will help you find the information you’re looking for, which in this case is that the cheetah’s top running speed is more than sixty miles an hour.

Thanks for watching. I hope this made Google a little bit more understandable.

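Translated draft

Search is Google's core technology. So how does it work? In this episode, Google engineer Matt Cutts explains. Google uses more than 200 signals to pick, from millions of web pages and other content, the answers most relevant to a query, then ranks the results by relevance and displays them within half a second. Google's search results are impartial and are kept clearly separate from ads.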

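Hello, my name is Matt Cutts. I'm an engineer in the quality group at Google, and today I'd like to talk about what happens when you use a search engine.

First, you need to understand that when you use Google Search, you are not searching the entire World Wide Web; you are searching Google's index of the web. The search process starts long before you type in a query. We use software robots (web crawlers, also called "spiders") to find web pages so that they can later be included in Google's search results. Google's software stores data about these pages in data centers. The web is like a book with trillions of pages, and our job is to write the index for that book. Spiders usually start from heavily used servers and popular pages, indexing the words on each page and following every link they find on the site. In this way the spiders quickly set off and spread across the most frequently visited parts of the web. This repeats until Google has added a large share of the web to its index.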

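Now suppose I want to know how fast a cheetah can run. I type "cheetah running speed" into the search box and press Enter. Our software finds every page in Google's index that contains those terms. In this case there are hundreds of thousands of results. How does Google choose the few documents I really want? By asking more than 200 questions. For example: How many times does the page contain your keywords? Do they appear in the title, in the URL, directly adjacent to one another? Does the page include synonyms? Does the page come from a high-quality website or a low-quality one, perhaps even a spam site? What is the page's PageRank? That is a formula invented by our founders, Larry Page and Sergey Brin, that rates a page's importance by how many outside links point to it and how important those links are. Finally, we combine all these factors into an overall score for each page and send back your search results about half a second after you submit the search. At Google, we take our commitment to providing useful and impartial search results very seriously. We never accept payment to add a site to our index, to update it more often, or to improve its ranking.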

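Let's take a look at my search results.

Each entry includes a title, a URL, and a snippet of text to help me decide whether the page is what I'm looking for. I can also see links to similar pages, Google's most recently stored version of the page, and related searches I might want to try next. Sometimes, along the right side and at the top, I also see ads. We take our advertising business very seriously as well.

We have two commitments: to deliver the best possible audience for advertisers, and to strive to show users only the ads they really want to see. We carefully distinguish ads from regular search results. And we won't show any ads at all if we can't find any that we think will help you find the information you're looking for, which in this case is that the cheetah's top running speed is more than sixty miles per hour.

Thanks for watching. I hope this talk helps you understand Google a little better.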
