AltaVista Public Search Service
- A White Paper

Discovering the Value and Promise of the Internet

Traversing the Internet has always been a bit like exploring outer space. One could wander indiscriminately and make many useful discoveries. Just as easily, though, hours (thankfully, not years) could pass with nothing of value to show for the effort.

Like its figurative cousin, the cyberspace of the Internet is vast; but unlike the world of Voyager and Magellan, the Internet is finite. And while the Internet remains essentially unstructured, it is possible --- with enough sophistication and power --- to catalogue the connected realm. To index every word on every page of every available web site. To bring order and meaning to an otherwise unwieldy behemoth.

Digital has proved it can be done --- and has done it! AltaVista Search Public Service --- launched on December 15, 1995 --- radically altered the way we view and use the Internet. On the surface, one sees a simple searching interface, not unlike other tools available through a standard web browser. Behind the scenes, however, the world's most sophisticated indexing and "super-spider" software and most powerful computer systems have compiled (and continue to update) the most complete index to date of the entire Internet. For the first time, it is possible to find and retrieve useful information from across the vast expanse of the Internet in seconds.

AltaVista Search Public Service has changed how we use the Internet. It is no longer necessary to know the address of a particular home page and then follow a trail of hyperlinks to your eventual goal. AltaVista Search Service takes you to precisely where you want to be from the start --- pointing you to relevant web pages regardless of where they reside on a particular site. You can then follow the links from there as desired. The painstaking task of classifying web pages into logical groups is a thing of the past. Today, AltaVista Search Public Service puts the contents of the Internet at your fingertips, transforming this information into a bona fide business, education, and entertainment resource.

A higher view of the Internet

What exactly is AltaVista Search Public Service? It is a whole new class of Internet technology, developed in the research laboratories of Digital Equipment Corporation. To better understand AltaVista Search Public Service, let's first look at it from the user's perspective.

Finding useful information on the Internet in seconds

AltaVista Search Public Service, from a user's point of view, is a query system for finding useful information on the Internet. Accessible through the World Wide Web from any standard web browser, AltaVista Search Public Service provides a simple interface for entering a few words of a query which specify your topic of interest. It then produces a prioritized list of all the web pages that mention the content of your inquiry. What's more, each reference in the list is hyperlinked to the actual web page, so you simply click and you're there. Bear in mind, the entire process takes only seconds. Want to know the annual rainfall in Nepal? How about the latest earnings from XYZ Corporation? Care to hear your favorite musician talk about her latest release? Or watch a clip from her video? Looking for the latest breakthrough on osteomalacia? Need data on the effects of carbon monoxide on evergreens? Want to find an old friend? Or catch up on the news from down under?

If it's on the Internet, you can find it in seconds using AltaVista Search Service. Moreover, if something new appears on the Internet --- AltaVista Search Service will know about it. That's because AltaVista Search Public Service maintains the most comprehensive database to date of web pages and their content on the Internet --- and this database is growing constantly.

At last count, the AltaVista Search Public Service database contained 17 billion words indexed from over 32 million web pages. This is a database of practically everything on the Internet (some pages are excluded, as discussed below). And it's accessible in an instant.

How does AltaVista Search Public Service do that?

The technology behind AltaVista Search Public Service is revolutionary. It's a combination of super-sophisticated software and super-fast computers.

Collecting data and making it useful
The software consists of the query tool described above, a "super-spider" (or data collector), and an indexer. The AltaVista Search Public Service data collector, dubbed Scooter by Digital, is the fastest known "super-spider" in existence. Scooter can look at over 3 million web pages per day and bring back the contents of those pages for indexing.
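
To put the crawl rate in perspective, the short calculation below converts it to pages per second and estimates how long one pass over the indexed pages would take. It is only a back-of-the-envelope sketch: it treats the "over 3 million" and "over 32 million" figures quoted in this paper as exact, and it assumes a constant rate with no page revisited.

```python
# Figures quoted in this paper, treated as exact for the estimate.
PAGES_PER_DAY = 3_000_000      # Scooter's stated daily crawl rate
INDEXED_PAGES = 32_000_000     # pages in the index "at last count"
SECONDS_PER_DAY = 24 * 60 * 60

# Average fetch rate, assuming the crawl runs evenly around the clock.
pages_per_second = PAGES_PER_DAY / SECONDS_PER_DAY

# Time for one full pass, assuming no page is visited twice.
days_for_full_pass = INDEXED_PAGES / PAGES_PER_DAY

print(f"{pages_per_second:.1f} pages per second")
print(f"{days_for_full_pass:.1f} days for one pass")
```

Under these assumptions the crawler averages roughly 35 pages per second and needs a little under eleven days to touch every indexed page once.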

Scooter is known as a "polite" spider; that is, it obeys the rules of the Standard for Robot Exclusion (SRE). This means that Scooter checks a special file at each web site before visiting any of its pages. This file may contain a listing of certain pages that the site's webmaster does not want traversed by a spider. If so, Scooter will not fetch any pages on the list.
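
The check described above can be sketched with Python's standard `urllib.robotparser` module. This is an illustration of the exclusion convention, not Scooter's own code: the file contents, the example URLs, and the helper name `allowed_pages` are made up for the example, and the "special file" is assumed to be the `robots.txt` file that the convention places at the root of a site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of a site's robots.txt file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/report.html
"""

def allowed_pages(robots_txt: str, agent: str, urls: list[str]) -> list[str]:
    """Return only the URLs that the exclusion file lets `agent` fetch."""
    parser = RobotFileParser()
    # Parse the rules from text, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]

urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/plans.html",
    "http://www.example.com/drafts/report.html",
]
print(allowed_pages(ROBOTS_TXT, "Scooter", urls))
```

Only the first URL survives the filter, because the other two fall under the `Disallow` rules.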

Scooter is polite in another respect, too. It accesses and fetches thousands of web pages simultaneously, yet it imposes minimal load on each web server to avoid inconveniencing the site in any way. To accomplish this, Scooter waits after performing a fetch before it retrieves another page from the same site. By invoking a delay that is a function of the duration of the fetch, Scooter accesses slower systems much less frequently than fast ones. In fact, Scooter never uses more than 1% of the resources of a given system while it is retrieving pages.
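
One way such a delay could work is sketched below. The exact function Scooter uses is not given here, so the formula is an assumption: it reads the 1% figure as a cap on the share of wall-clock time a site spends serving the crawler, which makes the wait proportional to how long the last fetch took. The name `politeness_delay` and the `max_share` parameter are hypothetical.

```python
def politeness_delay(fetch_seconds: float, max_share: float = 0.01) -> float:
    """Seconds to wait before the next fetch from the same site.

    Assumption: a site's load is approximated by the fraction of time it
    spends serving the crawler. Waiting fetch * (1/share - 1) keeps that
    fraction at or below max_share.
    """
    if not 0 < max_share <= 1:
        raise ValueError("max_share must be in (0, 1]")
    if fetch_seconds < 0:
        raise ValueError("fetch_seconds must be non-negative")
    return fetch_seconds * (1.0 / max_share - 1.0)

# A slow site (2 s per page) is revisited far less often than a fast one (0.1 s).
for seconds in (0.1, 2.0):
    print(f"fetch took {seconds} s, wait {politeness_delay(seconds):.1f} s")
```

With the 1% cap, the wait is 99 times the fetch duration, so a server that answers slowly is automatically visited less often.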

Now, what do we do with all this content? Enter Digital's indexing software. This software can index an astounding 1 gigabyte of text per hour, producing links to every word on every web page brought back by Scooter. This is the key software that allows you to enter a few words in the query interface and instantly retrieve a listing of relevant web pages.
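
A structure that links every word to the pages containing it is commonly called an inverted index. The toy version below shows the idea only; it is not Digital's indexing software, and the sample pages, the tokenizing rule, and the function names are assumptions made for the example.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Split text into lowercase words (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs on which it appears."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return dict(index)

# Hypothetical pages, as a crawler might have brought them back.
pages = {
    "http://www.example.com/nepal.html": "Annual rainfall in Nepal",
    "http://www.example.com/trees.html": "Carbon monoxide and evergreens",
    "http://www.example.com/weather.html": "Rainfall totals by region",
}
index = build_index(pages)
print(sorted(index["rainfall"]))
```

Looking up a word is then a single dictionary access, which is why a query can be answered without rereading any page.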

One of the most important features of Digital's indexing software is its ranking system. Ranking is essential for a system that can easily locate thousands of documents which match a query. The ranking system scores each document that is located according to the words specified in the query. Documents having a higher score are presented first, and those having a lower score last. This allows the user to immediately focus on documents which are more likely to be of interest.
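
The scoring rule itself is not described here, so the sketch below substitutes a very simple one: a document's score is the number of times the query words occur in it. The function names and sample documents are hypothetical, and a production ranking system would weigh many more signals.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(pages: dict[str, str], query: str) -> list[str]:
    """Return matching page names, highest score first.

    Assumed score: total occurrences of the query words in the page.
    Ties are broken alphabetically so the order is deterministic.
    """
    terms = set(tokenize(query))
    scored = []
    for name, text in pages.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]

pages = {
    "a": "Nepal rainfall: monthly rainfall tables",
    "b": "Trekking in Nepal",
    "c": "Evergreens and carbon monoxide",
}
print(rank(pages, "Nepal rainfall"))
```

Page "a" mentions the query words three times and page "b" once, so "a" is listed first and "c" is left out entirely.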

Providing results in an instant
The AltaVista Search Public Service software is optimized for Digital's 64-bit Alpha technology, which enables the entire system - query interface, web crawler, and the indexing software - to perform at unbelievable speeds.

To process inquiries from the AltaVista Search Public Service, we currently use a trio of AlphaStation 500/333 systems, each with 256 MB of RAM and 4 GB of hard disk. Running on the AlphaStation systems is a custom multi-threaded web server, which sends queries to the indexing software. With just these relatively small systems we easily handle millions of hits per day to the AltaVista Search Public Service. The search queries are forwarded to the index servers --- about 90% search the web and 10% search newsgroups.

The cornerstone of AltaVista's performance is a set of six AlphaServer 8400 5/300 systems, each with 10 processors, 6 GB of RAM, and 210 GB of hard disk in a RAID array. Each holds a complete copy of the web index (currently 55 GB in size) and is able to provide response times of less than a second.
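
Because every index server holds a complete copy of the index, any one of them can answer any query. How the front-end systems actually choose a server is not stated here; the sketch below assumes plain round-robin rotation purely for illustration, and the class name and server names are hypothetical.

```python
from itertools import cycle

class ReplicaPool:
    """Hand out index servers in rotation (assumed policy: round-robin)."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def pick(self) -> str:
        """Return the server that should receive the next query."""
        return next(self._rotation)

# Six full replicas, matching the six index servers described above.
pool = ReplicaPool([f"index{i}" for i in range(1, 7)])
picks = [pool.pick() for _ in range(8)]
print(picks)
```

Full replication keeps the dispatch logic this simple: no query has to be routed to a particular machine, and adding a replica adds capacity.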

Scooter runs on a DEC4100 AlphaStation with 1 GB of RAM and a 48 GB RAID array to ensure data integrity. The sole job of this computer is crawling the web, fetching content and sending it to the indexing system. The index is built on an AlphaServer 4100, with 2 processors and 1 GB of memory. The completed index is periodically copied to the index servers.

Our news server runs on an AlphaStation 600/333 system, with 512 MB RAM and 24 GB of hard disk with RAID 5. This server maintains a current news spool for the News Indexer and serves the articles via http to those users who simply want to read news using the ease of their standard web browser. The News Indexer runs on an AlphaStation 600/333 system, with 512 MB of RAM and 40 GB of hard disk. This machine keeps an up-to-date index of the news spool, handling the constant turnover of thousands of news articles to ensure the most current information is presented when you make a query.

Driving in the fast lane
In addition to innovative software and blazing computers, AltaVista Search Service uses high-performance network technology to handle the traffic for such a busy site. AltaVista is directly connected to Digital's Internet Gateway, the largest Internet gateway of its kind. The gateway has multiple connections to the Internet, with a total of over 100 megabits per second of bandwidth. The entire gateway complex is built from Digital's network technology, including: Digital's Alpha computers -- used as firewall routers, Digital's DEChub 900 modular hub -- including the high-density DECconcentrator 900FH FDDI concentrator, and a DECswitch 900EF FDDI-to-Ethernet switch.

A brief history

Where did AltaVista Search Public Service come from? AltaVista Search Service emerged from Digital's Palo Alto, California, research laboratories in the spring of 1995. Defying common wisdom, researchers at the Palo Alto laboratories wanted an answer to the question, "How big is the web?" and set out to index the entire web --- something long considered unattainable.

The right place at the right time

It was a research environment that fostered the growth of an innovative idea. And it was the talent and perseverance of one research facility in particular that transformed that idea into a practical reality. Typical of all of our research facilities, Digital's Palo Alto lab is filled with forward-looking scientists and engineers --- people who feel an urgency to move their projects into the real world as products or technology demonstrations.

The catalyst for AltaVista was an idea from Paul Flaherty, who wanted to develop a search engine based on keyword search capabilities for the Internet, to showcase the Very Large Memory Database of the AlphaServer. Keying off this idea, Louis Monier took it a step further and envisioned a full text search of the Web. He teamed up with other researchers at the labs to begin to explore this concept.

Louis Monier developed the super-spider, Scooter, "from scratch" with the sole intention of making AltaVista Search Public Service a reality. Leading the project with a team of experienced Internet experts inside Digital's research labs, Louis not only produced Scooter in record time but also developed the web front-end for Scooter and the indexing software that millions of people now know as AltaVista.

Digital's indexing software started out humbly as a way to organize e-mail, and then, because it worked so well, evolved into an indexing system for the newsgroups. These initial projects made the scientists realize, against conventional wisdom, that indexing the entire Web might well be feasible. The redesign of the indexing software was completed about the time that the AltaVista Search Public Service project was taking off, and the huge set of data that Scooter was bringing back from the Web proved to be the perfect test of the new software.

Glenn Trewitt and Stephen Stuart then entered the picture, designing the networks and putting together the hardware needed to run AltaVista Search Service. With their experience in network design and topology mapping, they set out to ensure that AltaVista would have the hardware and network connectivity needed for a successful debut. Stephen administers Digital's showcase Internet gateway in Palo Alto (another story in itself). Before the AltaVista launch, the connectivity was somewhat over 10 million bits per second - probably not enough to handle the load when AltaVista's real customers showed up on December 15, 1995. With just a few months' warning, Stephen, Glenn, and others worked to upgrade Digital's connection to the Internet to over 135 million bits per second by the day of the launch.

All the resources needed to create the world's fastest and most complete web crawler were there in one place. In fact, there is probably nowhere else in the world that could have supported such an effort. Through a combination of ideal research conditions, talent, need, and excellent timing, AltaVista Search Service had arrived.

A Long Future

AltaVista Search Public Service grew out of pure research, but it was no accident. It was born of an environment that values creativity and rewards practicality.

AltaVista Search Public Service, today, is without doubt a showcase for the technological leadership of Digital software and computers. And it will remain so. But it also holds fundamental value as a tool for using the Internet in our everyday business and personal lives. Its sudden and remarkable popularity --- attracting over eight million hits per day in its less than five months of operation --- without any promotion --- is a testament to that fact.

AltaVista Search Public Service marks a turning point in the way we view and use the Internet. It makes it possible for anyone to gain value from resources on the Internet, without wasting hour after hour in the process. It also has implications for how web sites are structured. For instance, the home page, which has been the traditional point of entry to a site --- defining it, setting the tone, and providing links to further detail --- may never be seen by the typical visitor. Through AltaVista Search Public Service, web travelers could land on a page anywhere in a particular site, based on their specific interest. With AltaVista Search Public Service, in fact, the entire web is treated as one huge site --- the only home page left is AltaVista Search Public Service itself.

More of a good thing

Where do we go from here? With the tremendous growth in popularity of the single AltaVista Search Public Service in Palo Alto, California, the natural course of action is to deliver more of a good thing. We will start by establishing mirror sites with our partners around the world.

With the development of mirror sites, the AltaVista Search Public Service will be available to more Internet users than ever before, all around the world --- truly establishing it as the leading Internet search technology. Particularly for users outside of the U.S., response times will improve significantly. And with geographic distribution of mirror sites, we will have partners who can localize the pages and include regional content.

AltaVista Search Public Service at work

If we can index the entire World Wide Web, and help the average Internet surfer find practically anything in a matter of seconds, just think what we could do for today's corporations, universities, and government agencies. With all that Internet technologies have brought to the public arena, many enterprises are recognizing the value of adopting the same capabilities inside their operations --- on intranets --- private networks within the boundaries of a corporate enterprise. Many have set up TCP/IP networks (the same technology as the Internet) and are installing desktops with standard web browsers. As the number of corporate intranet web pages grows ever more rapidly, businesses will find that for anyone on an intranet --- as on the Internet --- spending time just in pursuit of information is not as productive as using that information.

AltaVista Search Private eXtensions
Our award-winning AltaVista technology will be extended to private business environments, adapted to give business users unparalleled access to information and enable corporate networks to be as responsive and easy-to-use as the Internet at its best. Available in Intranet, Workgroup, and My Computer private extensions to our public service, AltaVista Search Private eXtensions will incorporate the fastest, most sophisticated software for locating various types of useful information anywhere on the Internet, your corporate intranet, your personal desktop, or any file server your desktop is connected to -- in seconds. With AltaVista Search Private eXtensions, distributed enterprises, workgroups, and individuals will all be able to unlock hidden information assets to maximize their value contribution and productivity.

No longer will a road warrior be left stranded without the latest corporate product data or pricing. With AltaVista Search -- Intranet Private eXtension, if the information you are looking for is somewhere on a server in your enterprise, a simple query will point instantly to the needed information. Project teams from around the world will be able to capitalize on research and previous work by quickly locating published reports or previously assembled data from any available server on an enterprise. Everyone throughout the corporation can get the information they need to work more productively and with greater effectiveness. So more time is spent adding value, rather than simply looking for information.

Getting the information you need from your workgroup, when you need it, is critical to being effective in your job. With AltaVista Search -- Workgroup Private eXtension, if it's on your group's intranet, it will be at your fingertips in seconds. Need your team's marketing plan from last year or from ten years back? Looking for an RFP that references critical competitive data? Want to check the engineering report on your newest product? AltaVista Search -- Workgroup Private eXtension ensures you will have complete team information available to you at your desktop --- limiting duplicate efforts, eliminating costly manual searches, and maximizing accuracy and productivity.

Imagine being able to instantly find a choice piece of data, buried in an e-mail message from three years ago. Or finding a file you thought was lost --- accidentally saved deep in an obscure directory. That's the power of AltaVista Search -- My Computer Private eXtension. This valuable desktop tool will help you preserve all your work, without worrying about finding it down the road. No need to waste hours looking for something you know you kept, but can't remember where. AltaVista Search -- My Computer Private eXtension saves time and frustration, and helps you get more value out of your work and resources. It's an invaluable personal productivity tool.

Wherever your business takes you, start your journey with AltaVista.

The AltaVista Search Public Service has spawned a new vision of the Internet, providing a higher view from which to locate and access valuable resources. And it has inspired broader use of Internet technologies across the enterprise and around the world.

Digital is building on the success of the AltaVista Search Public Service, with an entire family of software products, AltaVista Software. Using Internet technology as a common, ubiquitous environment, AltaVista Software provides users with dynamic, global capabilities for exchanging information and ideas. AltaVista Software combines the vast resources of the Internet, the navigational ease of standard web browsers, and the value of existing intranet assets.

The AltaVista Search Public Service is truly revolutionary in scope, breaking through traditional barriers that limit communication and information access. It is both a vision and a reality, offering immediate rewards as it inspires new, innovative technology. It is a key that unlocks all the Internet has to offer.



Digital Equipment Corporation
Copyright © Legal
AltaVista Internet Software, 30 Porter Road,
Littleton, MA Fax: (978) 506-2017