| Indexing features |
| Easy index building |
Building an index can be as simple as entering your home page URL. Even more refined indices are effortlessly created through an easy-to-use interface. |
| Real-time updates |
The user or program can submit URLs that are added to the index in real time. This allows a site to have a "newswire" search facility that is getting real-time data. |
| Indexer scheduling |
You can establish a schedule for each collection to specify that the indexer run only at certain times. This feature is useful if you would like to specify that a collection only index during off-hours. |
| Usenet collections |
Create collections that subscribe to Usenet groups. Simply point Autonomy Ultraseek at the Usenet server and indices will automatically be created and updated. |
| Multiple root URLs |
An index can be built from more than one root URL. You can adjust how deep the spider follows links by specifying the number of hops from the root, so you can build an index as small or as large as you want. |
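As an illustration of the hop limit, here is a minimal sketch of a breadth-first traversal that starts from several roots and stops following links after a fixed number of hops. The function name `crawl_within_hops`, the in-memory `LINKS` graph, and the example URLs are assumptions made for this sketch; they are not part of Autonomy Ultraseek.

```python
from collections import deque


def crawl_within_hops(roots, links, max_hops):
    """Return {url: hops} for every URL within max_hops of any root."""
    hops = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        url = queue.popleft()
        if hops[url] == max_hops:
            continue  # do not follow links beyond the hop limit
        for target in links.get(url, []):
            if target not in hops:  # already-seen pages are skipped, so link cycles end
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "http://a.example/": ["http://a.example/docs", "http://a.example/news"],
    "http://a.example/docs": ["http://a.example/docs/deep", "http://a.example/"],
    "http://b.example/": ["http://b.example/faq"],
}

index = crawl_within_hops(
    ["http://a.example/", "http://b.example/"], LINKS, max_hops=1
)
```

With `max_hops=1` the deeper page under `/docs` is left out, while raising the limit would bring it in. The link from `/docs` back to the root does not cause a loop because visited pages are never queued twice.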
| Multiple document types |
Autonomy Ultraseek supports plain text, HTML, XML, RTF, Microsoft Word, Microsoft Excel, Microsoft PowerPoint, Adobe Acrobat (PDF), Adobe FrameMaker (.MIF and .FMV formats, NT only), WordPerfect (NT and Solaris only), Lotus 1-2-3, Lotus WordPro, and Lotus Freelance Graphics. |
| Multiple document type parsing |
Autonomy Ultraseek can be modified to recognize non-standard MIME types. |
| Database Module |
Quickly and easily provide your users with a single search box that can query the unstructured content (HTML, XML, plain text, word-processing docs, spreadsheet docs, pdf files, etc.) on your intranet and your database content at the same time! |
| Multiple languages |
Autonomy Ultraseek supports English as standard. An additional component provides lexical analysis and localization for Dutch, French, German, Italian, Portuguese, Spanish, Swedish, Danish, Finnish, Norwegian, Traditional and Simplified Chinese, Japanese, and Korean, plus other languages. |
| Flexible URL filters |
URL filters give administrators the power to include or exclude content. So when you have a collection that includes any particular URL, you can still filter out given directories, sites, or file types within that same URL. |
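The include/exclude idea can be sketched with simple wildcard patterns. This is only an illustration: the pattern syntax, the `is_allowed` helper, and the example URLs are assumptions, and the actual filter syntax of Autonomy Ultraseek may differ.

```python
from fnmatch import fnmatchcase


def is_allowed(url, allow_patterns, disallow_patterns):
    """Disallow patterns win; otherwise the URL must match an allow pattern."""
    if any(fnmatchcase(url, pattern) for pattern in disallow_patterns):
        return False
    return any(fnmatchcase(url, pattern) for pattern in allow_patterns)


# Hypothetical filter set: index one site, but skip a directory and a file type.
ALLOW = ["http://www.example.com/*"]
DISALLOW = ["http://www.example.com/private/*", "*.zip"]

docs_ok = is_allowed("http://www.example.com/docs/index.html", ALLOW, DISALLOW)
private_ok = is_allowed("http://www.example.com/private/plan.html", ALLOW, DISALLOW)
```

Checking the disallow list first is a design choice in this sketch: it lets a broad allow rule for a site coexist with narrower exclusions for directories or file types inside it.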
| Thesaurus expansion |
Administrators can create custom thesauri, so your organization's special vocabulary can be integrated into search. For example, on an automobile Web site, a query for either hood or bonnet will return similar results. |
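One way to picture thesaurus expansion is as a lookup that turns each query term into a group of interchangeable terms. The sketch below assumes a hypothetical `THESAURUS` dictionary and `expand_query` helper; it does not show how Autonomy Ultraseek stores or applies its thesauri.

```python
# Hypothetical custom thesaurus: each term maps to its synonyms.
THESAURUS = {
    "hood": {"bonnet"},
    "bonnet": {"hood"},
    "trunk": {"boot"},
    "boot": {"trunk"},
}


def expand_query(query, thesaurus):
    """Return one set of alternatives per query term (term plus synonyms)."""
    groups = []
    for term in query.lower().split():
        groups.append({term} | thesaurus.get(term, set()))
    return groups


hood_groups = expand_query("hood latch", THESAURUS)
bonnet_groups = expand_query("bonnet latch", THESAURUS)
```

Because both queries expand to the same groups of alternatives, a search engine that treats each group as an OR of its members would match the same documents for either wording.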
| Smart filters |
Through the allow links filter, an administrator can provide the URL of a page containing links and have the spider visit all the linked URLs without indexing the initial page. You can also set the filter to allow only text, to allow everything, or to disallow a particular URL or directory. So, by creating a list of URLs in a database, you can easily have the spider index the database. |
| Number of links |
The indexer automatically adjusts document quality by looking at the number of links to a document from documents on other sites. Because of this, documents that are linked from other sites can appear more relevant in the display of search results than documents that are not linked. |
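The underlying signal can be sketched as a count of distinct other sites that link to each document. The source does not say how the indexer turns this count into a quality adjustment, so the code below only computes the count; the function name and the link data are assumptions for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def external_inlink_counts(links):
    """Count, per target URL, the distinct other hosts that link to it."""
    linking_hosts = defaultdict(set)
    for source, targets in links.items():
        source_host = urlsplit(source).netloc
        for target in targets:
            if urlsplit(target).netloc != source_host:  # ignore same-site links
                linking_hosts[target].add(source_host)
    return {url: len(hosts) for url, hosts in linking_hosts.items()}


# Hypothetical link data: page -> pages it links to.
PAGE_LINKS = {
    "http://a.example/index.html": ["http://c.example/guide", "http://a.example/about"],
    "http://b.example/links.html": ["http://c.example/guide"],
    "http://c.example/home": ["http://c.example/guide"],
}

counts = external_inlink_counts(PAGE_LINKS)
```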
| Administrative configuration |
All administrative configuration information can be viewed and changed at any time, and from any location. All user and administrative functions are available through a web interface. The easy-to-use interface makes server administration quick and efficient. |
| Multi-threaded spider |
A multi-threaded spider means better indexing performance, which means lower load on the system. The administrator can adjust the number of threads allowed, so you get the best performance possible for the amount of bandwidth available. |
| Designed for distributed search |
The search engine was designed from the ground up for distributed search, including collection selection and results merging. |
| Sitelist.txt support |
Autonomy Ultraseek allows for full sitelist.txt support, for efficient spidering of only pages that have changed. By using sitelist.txt, spider traffic is reduced, creating lower network load and better performance. |
| Robots.txt support |
Autonomy Ultraseek allows for full robots.txt support according to the robots.txt standard. Administrators can customize the user agent to meet their needs. |
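To show what honoring robots.txt with a specific user agent looks like in practice, here is a small sketch using Python's standard `urllib.robotparser` module. The user agent name `ExampleSpider` and the rules are made up for the example; this is not the spider's own implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named spider is allowed except for /private/,
# and every other robot is excluded from the whole site.
ROBOTS_TXT = """\
User-agent: ExampleSpider
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

docs_allowed = parser.can_fetch("ExampleSpider", "http://www.example.com/docs/index.html")
private_allowed = parser.can_fetch("ExampleSpider", "http://www.example.com/private/plan.html")
other_allowed = parser.can_fetch("OtherBot", "http://www.example.com/docs/index.html")
```

The same URL can be allowed for one user agent and refused for another, which is why a configurable user agent matters to both the spider's administrator and the administrators of the sites being indexed.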
| Customize revisits |
The administrator can tune the spider to update the index at regular intervals without intervention. Once set, the spider will automatically adjust each URL's revisit frequency based on how often the content changes. Revisits can also be run manually, so you don't have to wait for the indexer. |
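The source does not describe how the revisit frequency is adjusted. As one plausible policy, the sketch below halves the interval when a page has changed and doubles it when it has not, within fixed bounds. The function name, the halving and doubling rule, and the bounds are all assumptions.

```python
def next_revisit_interval(current_hours, changed, min_hours=1.0, max_hours=720.0):
    """Shorten the interval after a change, lengthen it otherwise (assumed policy)."""
    if changed:
        proposed = current_hours / 2  # content is changing: come back sooner
    else:
        proposed = current_hours * 2  # content is stable: come back later
    return max(min_hours, min(max_hours, proposed))


# A page checked daily that keeps changing is revisited more and more often.
interval = 24.0
for _ in range(3):
    interval = next_revisit_interval(interval, changed=True)
```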
| Index controls |
Administrators can control the size of indexes by telling the spider how many hops it can make from a specified page. This is useful for "black-hole" detection, where pages link back to one another in a circular fashion. |
| Network-friendly spider |
The spider issues only one outstanding HTTP request per server, and it suspends indexing of a site that is heavily loaded. So, no unnecessary load is placed on the server, and results are still quick and accurate. |
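The one-request-per-server rule can be sketched as a queue that refuses to hand out a URL while another URL on the same host is still in flight. The `PoliteQueue` class and its method names are hypothetical, and the sketch leaves out load detection and the network requests themselves.

```python
from urllib.parse import urlsplit


class PoliteQueue:
    """Hands out URLs so that each host has at most one outstanding request."""

    def __init__(self, urls):
        self.pending = list(urls)
        self.busy_hosts = set()

    def next_url(self):
        """Return the first pending URL whose host is idle, or None."""
        for position, url in enumerate(self.pending):
            host = urlsplit(url).netloc
            if host not in self.busy_hosts:
                self.busy_hosts.add(host)
                return self.pending.pop(position)
        return None

    def done(self, url):
        """Mark the request for this URL as finished, freeing its host."""
        self.busy_hosts.discard(urlsplit(url).netloc)


queue = PoliteQueue(
    ["http://a.example/1", "http://a.example/2", "http://b.example/1"]
)
first = queue.next_url()   # a.example is now busy
second = queue.next_url()  # skips a.example/2 and picks b.example/1
```

Worker threads in a multi-threaded spider could each call `next_url()` and `done()`; in that setting the two methods would need a lock around them, which this single-threaded sketch omits.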
| Spider throttle |
The spider can be told to slow down at certain URL patterns, to reduce unnecessary load on the server. |
| Cooperative spidering |
Multiple spidering instances of Autonomy Ultraseek can share information through cooperative spidering, reducing time spent in administration. |
| Revisit site |
Autonomy Ultraseek can be told to simply revisit a site, instead of revisiting an entire collection. This reduces system load by allowing you to revisit only the sites whose content has changed. |
| Proxies and Firewalls |
The spider can operate through network proxies and firewalls. Indexed sites can be in another part of the network or even outside the secure network, such as public Web servers. |
| Password authorization |
If desired, the spider can access content protected by passwords, allowing authorized users to search all material on the site. |
| Password administration |
Administrators can have multiple usernames and passwords. Administrators also have the ability to add or change their password. |
| Add URL form |
This form gives users the ability to add URLs to a collection, without having to go through an administrator. This makes it quick and easy to keep content up-to-date. Ultraseek automatically recognizes to which collections a document should be added. |
| URL status |
The URL status page gives users and administrators real-time information about a specified URL, such as when it was last modified, or why it was not indexed. This is a quick and efficient way for administrators to get information on a URL for maintenance and troubleshooting. |
| "User agent" and "from" fields |
Sites being indexed can quickly see who is issuing the HTTP requests. The site administrator can allow or disallow specific spiders, or contact the appropriate individual performing the indexing. |
| Intelligent duplicate elimination |
As documents are retrieved by the spider, they are checked to see if the same content is already in the index. If this is the case, only one copy will be maintained. The spider knows when a significant change has been made, and whether the document still qualifies as a duplicate. |
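A common way to detect that the same content is already in an index is to compare fingerprints of the document text. The sketch below uses a SHA-256 hash of whitespace-normalized, lowercased text; this is an assumed technique for illustration, and the source does not say how the spider decides what counts as a significant change.

```python
import hashlib


def content_fingerprint(text):
    """Hash the text after collapsing whitespace and case (assumed normalization)."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DedupIndex:
    """Keeps one URL per distinct content fingerprint."""

    def __init__(self):
        self.by_fingerprint = {}  # fingerprint -> first URL seen with that content

    def add(self, url, text):
        """Record the document and return the URL kept for its content."""
        return self.by_fingerprint.setdefault(content_fingerprint(text), url)


dedup = DedupIndex()
kept_first = dedup.add("http://www.example.com/a.html", "Hello   World\n")
kept_second = dedup.add("http://mirror.example.com/a.html", "hello world")
```

An exact hash only catches copies that are identical after normalization. Tolerating small edits would need a similarity measure rather than a single hash, which is outside this sketch.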
| Intelligent duplicate site recognition |
The spider can recognize duplicate site names. Domain aliases are identified and treated as duplicates, reducing server load. |
| Per-collection data directories |
You can specify data directories on a per-collection basis. You can even specify that a collection keep its data on a different file system from the rest of Autonomy Ultraseek’s data. This is useful if you want to spread out Ultraseek’s data across several file systems. |
| Intelligent obsolete recognition |
The system can synchronize the index with the Web server at any interval. Obsolete documents are deleted and new documents are added, and the collection is updated in real time. |
| Error recovery |
Autonomy Ultraseek recovers automatically from system crashes and maintains all index updates, so there is no need to worry about losing changes on large indexing runs. The server restarts automatically upon failure. |
| Scanner collections |
The file system scanner correctly deletes documents from the index if the directory they were in is deleted. |
| Manual index saves and merges in the background |
Manual index saves and merges are performed in the background. This avoids wasting administrators' time, and prevents Web browsers from automatically timing out the connection. |
| Collection display name |
You can make the name of your collection separate from the displayed name that users see. For example, a collection named "faq" might be displayed as "Autonomy Ultraseek FAQ". |
| Multiple collection types |
Collections can be created by spidering across a network, scanning a file system, following newsgroups, mirroring existing collections, or merging existing collections. |
| Standard syslog facility |
Indexing logs are both controllable and compatible with existing tools. |
| Optional spam detection |
Autonomy Ultraseek can be set to either detect or ignore spam. You can even set the levels of spam allowed for search results. |
| Simple Licensing |
Licensing is a simple process done through a serial number, and can be upgraded over the phone. Entering an e-mail address for administrative contact automatically registers your product. Administrators are automatically notified when license parameters are exceeded. |
| Automatic directory listing recognition |
Autonomy Ultraseek automatically recognizes those documents that simply act as directory listings. It also allows you to decide whether these documents should be included in the index and whether links should be followed. |