Textual Searches on File Data Using MS SQL Server 7.0Updated: July 19, 2001
by Margaret Li and Frank Pellow ![]() On This Page
AbstractMicrosoft® SQL Server™ version 7.0 introduces facilities that support textual queries on data in SQL Server as well as on data in the file system. This document describes searches on data in the file system. Several products and features have been brought together to support this capability, including SQL Server distributed queries, Windows NT® Server built-in Web server, Microsoft Internet Information Services (IIS) version 4.0, and Microsoft Indexing Service version 2.0. This document is for those who are familiar with SQL Server but not necessarily familiar with its textual search features and products. For more information about support of textual searches against data residing in SQL Server, see Extensions to SQL Server to Support Full-Text Search (part number 098-80764). This document introduces the types of textual searches supported by SQL Server and illustrates the roles performed by both the IIS and Indexing Service technologies. It also introduces SQL Server distributed queries and describes their use in the processing of textual searches. It then addresses the question of how SQL queries can be written against the file system when there are no tables in the file system. Finally, the SQL extensions to support full-text queries are described, and several examples, including examples that combine file data with database data, are provided. This document provides an overview of how to incorporate file data into SQL queries and of how the various components of SQL Server and other software interact to provide support for such queries. IntroductionA large portion of digitally stored information is in the form of unstructured data, primarily text, which is stored in the file system. This data is often related to data within the database, and there are requirements to support searches that include both sources. However, it is often inappropriate to import this data into the file system. SQL Server 7.0 distributed queries, coupled with extensions to the SQL language, make it possible to write such queries without the data. This capability is called file content search. There are two major types of textual searches:
File content search supports both these types of textual searches and couples them with the ability to incorporate such searches into a query that includes relational operations against database data. For example, the following search selects the names, sizes, and authors of all Microsoft Word files on drive D that contain the phrase "SQL Server" in close proximity to the word "text." It then joins this result with the writers table to obtain the author‘s citizenship. SELECT Q.FileName, Q.Size, Q.DocAuthor, W.Citizenship FROM OpenQuery(MyLinkedServer, ‘SELECT FileName, Size, DocAuthor FROM SCOPE(‘‘ "D:\" ‘‘) WHERE CONTAINS(‘‘"SQL Server" NEAR() text‘‘) AND FileName LIKE ‘‘%.doc%‘‘ ‘ ) AS Q, writers AS W WHERE Q.DocAuthor = W.writer_name File content search relies on the Microsoft OLE DB Provider for Indexing Service. It also relies on Indexing Service for support of underlying filters and full-text indexes. Notice that the OLE DB Provider gives Indexing Service 2.0 the ability to support SQL queries against data in the file system independent of SQL Server. The core extensions to the SQL language that support such queries are the same in Indexing Service and SQL Server. This document has the following purposes:
Only a small percentage of the facilities available in the supporting products are introduced in this document, and even when a facility is discussed, many of its options are not discussed. For more information, see the SQL Server documentation and the Indexing Service topic in the Windows NT Server documentation. Internet Information Services and Indexing ServiceMicrosoft Internet Information Services (IIS) 4.0 and Indexing Service 2.0 (both part of the Microsoft Windows NT 4.0 Option Pack) combine to provide property filtering and searching as well as full-text indexing and searching of file data. Windows NT 4.01 Service Pack 4 must be installed for proper interaction between Indexing Service and SQL Server. All of these capabilities are available completely independent of SQL Server. In particular, there are at least two ways to search that do not use SQL Server. One of these employs an Indexing Service specific query language; the other supports SQL-based queries within ActiveX® Data Objects (ADO). Neither alternative will be discussed in this document except to say that the SQL language used in ADO queries is consistent with the SQL extensions outlined here. This document will discuss property filtering and full-text indexing. Indexing Service provides filters for several file formats, including Microsoft Word, Microsoft PowerPoint®, Microsoft Excel, and HTML. Filters are also available for plain-text documents. Filters can be written by customers and third-party vendors for other formats such as Adobe Acrobat. Filters provide support for non-plain-text documents and capture property values both from the file content and about the files. Assuming that every file is a document, examples of properties include each document‘s title, the number of pages with notes in each PowerPoint presentation graphics program document, the number of paragraphs in each document, the last date and time each file was accessed, and the physical path to each file. A full list of file properties is provided later in this document. For more information, see the Indexing Service documentation. Full-text indexes are created by scanning file content. The process consists of tracking which significant words are used and where they are located. For example, a full-text index might indicate that the word "Canada" is found at word number 227, word number 473, and word number 1,017 in a given file. This index structure supports an efficient search for all items containing indexed words, as well as advanced search operations such as phrase searches and proximity searches. An example of a phrase search is looking for "white elephant," where "white" is immediately followed by "elephant." An example of a proximity search is looking for "big," where "big" occurs near "house." To prevent the full-text index from becoming bloated, noise words (words that are too common to expedite the search, such as "a," "and," "the," and "therefore") are ignored. Noise-word lists for many languages are available in the directory \Mssql\Ftdata\Sqlserver\Config. And the set of supported languages is growing. The choice of a particular noise-word list is based on the language of the material that is file-format dependent during the filtering process. Some files set the language per section or paragraph; some specify it for the entire document. These noise-word lists should be sufficient for most operations, but they can be modified. Administrators can use a regular text editor to modify the contents of a list. For example, a computer company can add the word "computer" to its noise-word list. Indexing Service stores indexes and property values in a text search catalog. By default, a text search catalog named Web is created when Indexing Service is installed. It is possible to specify more than one text search catalog, but this document confines itself to the use of Web and does not discuss the process used to create additional text search catalogs. A given text search catalog references one or more IIS virtual directories. A virtual directory references one or more physical directories and, optionally, other virtual directories. Once a real file is linked to the text search catalog by means of a virtual directory, Indexing Service is notified of any new files that need to be indexed and begins filtering and indexing the properties and content associated with those files. Indexing Service is also notified of any subsequent changes to the files and will refilter and reindex any updated files. The following screen fragment shows a directory for the default Web site on the computer Pellow-2. Notice that /Corpus is listed in both panes. /Corpus is the alias of a virtual directory that, in turn, points to the real directory, D:\Corpus. All the files in D:\Corpus have their properties and full-text indexes maintained in the Web text search catalog. The following screen fragment demonstrates how the Virtual Directory Wizard can be invoked to insert new virtual directories into the tree. This final screen fragment demonstrates the result of using the Virtual Directory Wizard to add the /SQL_standards virtual directory, which contains two virtual directories and one real directory. SQL Server Distributed QueriesSQL Server 7.0 supports access to data in multiple, heterogeneous data sources, which can be on either the same or different computers. The data can be stored in various relational and nonrelational data sources for which there is either an OLE DB provider or ODBC driver. OLE DB provider exposes its data in tabular objects called rowsets. SQL Server 7.0 allows a rowset from an OLE DB provider to be referenced in the FROM clause of an SQL query just as if it was a SQL Server table. OLE DB Provider for Indexing Service In the examples discussed in this document, OLE DB provider is supplied by Indexing Service. The sp_addlinkedserver stored procedure may be used to register data sources that will be referenced in distributed queries. In order to register OLE DB Provider for Indexing Service for the Web text search catalog on the same machine that SQL Server is running on, execute this statement: EXECUTE sp_AddLinkedServer FileSystem, ‘Indexing Service‘, ‘MSIDXS‘, ‘Web‘ Here are the definitions of the syntax terminology: FileSystem The linked_server_name assigned to this particular linked server. Indexing Service The product_name of the data source. MSIDXS The provider_name (PROGID) of OLE DB Provider for Indexing Service. Web The name of the text search catalog that will be used for this linked server. The OLE DB provider can now be referenced using the FileSystemlinked_server_name in the new OPENQUERY() result-set-valued function. For example: SELECT * FROM OpenQuery(FileSystem, ‘SELECT Directory, FileName, DocAuthor, Size, Create FROM SCOPE() WHERE CONTAINS( Contents, ‘‘Distributed‘‘ ) ‘ ) Notice that there are two SELECT statements. The inner SELECT statement (within the OPENQUERY() function) returns a result set as a table that can then be used like any other table in the FROM clause. In this case, the outer SELECT statement is a simple SELECT *, which passes on all the rows from the inner SELECT statement. Also notice that because the inner SELECT statement is specified as a constant parameter value within single quotes, all single quote characters within the inner SELECT statement must be doubled. That is why ‘ ‘Distributed‘ ‘ appears as it does. Two adjacent single quote marks (‘ ‘) are not the same as one double quote mark ("). In general, SQL Server distributed queries support both read and update access to the data source. In the case of Indexing Service, only read access is appropriate. Generally, distribution is supported to remote computers; however, the SQL Server file content search feature has been tested only with all components and all data residing in the same computer. Notice that OPENQUERY() does not work when running with a compatibility mode earlier than SQL Server 7.0. The compatibility mode can be set using the sp_dbcmptlevel stored procedure. Security There are special security considerations with OLE DB Provider for Indexing Service on the Windows NT operating system. SQL Server supplies a username and password on the current SQL Server login and on the login mapping set up in SQL Server of the form (current login, linked server) -> (remote login, remote password). However, OLE DB Provider for Indexing Service ignores the username and password and instead uses the Windows NT security context of the client (as if the client asked for a Windows NT Authentication mode connection). This means that OLE DB Provider for Indexing Service uses the Windows NT account under which SQL Server is running. Since this account is likely to be powerful, it can expose information about files to which the original SQL Server login has no privileges. This concern has been addressed by giving SQL Server administrators full control over who has access to OLE DB Provider for Indexing Service through SQL Server. The administrator can control the login mappings so that no one other than those who have explicit login mappings can gain access to the server (for example, an Indexing Service linked server). The administrator can also disable ad hoc access against a given provider so that no one can access Indexing Service through the ad hoc route without using a linked server. For example, if SQL Server is running under the Windows NT account sqlaccount, consider a linked server called mytextfiles that has been configured to point to a particular Indexing Service text search catalog. On Windows NT 4.0, when a SQL Server user executes a distributed query against mytextfiles, this query is executed under the privileges of the Windows NT account under which SQL Server is running (sqlaccount). Given this, the SQL Server security administrator must decide which SQL Server logins should have access to mytextfiles. This can be done by performing the following steps.
Defined TablesAt least one defined table must be specified in every SQL query. A defined table is a table where the number and types of columns is either known in advance or specified as part of the query. A relational database usually contains a number of defined tables, and metadata about the columns of these tables is stored in a schema. The collection of files in a file system does not generally have a predefined structure. The closest thing to columns are the properties of a file, but there is no deterministic set of properties for files. The closest thing to a row is a file, but files are usually not grouped in a homogeneous collection akin to rows in a table. Thus, in this case, the table concept is unclear, SELECT * is meaningless, and both the rows and columns are unbounded. Another way of looking at this is that a file system effectively has a universal schema consisting of every possible file property, both known and unknown. Indexing Service solves this problem by providing the SCOPE function as a means of defining the set of rows that makes up a virtual table and by providing file properties that substitute for columns. The SCOPE Function The SCOPE function is specified in the FROM clause of the Indexing Service query. It specifies the set of files that make up a virtual table. The syntax of the SCOPE function, simplified for this document, is as follows: Here are the definitions of the syntax terminology: () The virtual table consists of all the files that have been registered in the text search catalog and data source for the linked server specified in the OPENQUERY() function. DEEP TRAVERSAL OF The virtual table consists of all the files in the directory at the specified path or virtual directory as well as all the files in all the subdirectories (to any level) that are considered to be part of the virtual table. DEEP is the default. SHALLOW TRAVERSAL OF The virtual table consists only of the files in the top-level directory at the specified path or virtual directory that are considered to be part of the virtual table. physical_path A path to a real directory. If a real directory is specified, the filtering and indexing is done as part of the query processing, which can be time-consuming. virtual_directory The alias (or chain of aliases) assigned to a virtual directory that has been registered in the text search catalog and data source for the linked server specified in the OPENQUERY() function. In this case, the filtering and indexing will probably already have been done and, thus, the query will be much faster than when a physical path is specified. File Properties Indexing Service filters and maintains an excess of 50 file properties. All these can be specified in text file search queries. From the perspective of writing a SELECT statement, there are three types of file properties:
This list outlines some of the file properties.
Customers and third-party vendors can write filters to add to this set of file properties. They can also add properties, for example, by adding tags to an HTML document. In addition, to permit the query and retrieval of such user-defined file properties, the SQL extensions to Indexing Service include support for a SET statement that allows the specification of new file property names and their associated types. Virtual Tables It is possible to specify a query with the equivalent of a table in the file system, resulting in the select list and the FROM clause. For other parts of a SELECT statement, the properties can be used in place of columns in the WHERE and ORDER BY clauses. However, the GROUP BY and HAVING clauses are not supported by OLE DB Provider for Indexing Service. The following examples illustrate the use of all supported clauses. This query selects the full physical path and the file creation timestamp of all files in the /SQL-standards virtual directory and all its subdirectories, where the document contains the phrase "overloaded function." SELECT * FROM OpenQuery(FileSystem, ‘SELECT Path, Create FROM SCOPE(‘‘ "/SQL-standards" ‘‘) WHERE CONTAINS(Contents, ‘‘ "overloaded function" ‘‘) ‘ ) This query is similar to the previous query, except that only files directly in the /SQL-standards virtual directory are considered. SELECT * FROM OpenQuery(FileSystem, ‘SELECT Path, Create FROM SCOPE(‘‘ SHALLOW TRAVERSAL OF "/SQL-standards" ‘‘) WHERE CONTAINS(Contents, ‘‘ "overloaded function" ‘‘) ‘ ) This query is also similar, except that only files directly in the /SQL3 virtual subdirectory are considered. SELECT * FROM OpenQuery(FileSystem, ‘SELECT Path, Create FROM SCOPE(‘‘ "/SQL-standards/SQL3" ‘‘) WHERE CONTAINS(Contents, ‘‘ "overloaded function" ‘‘) ‘ ) This query selects author, title, subject, and file name of documents in all files that are either in the /Corpus virtual directory and its subdirectories or in the \Temp directory on drive C, where the document is at least 5,000 words, the author is either Wendy Vasse or Anas Abbar, and the rows representing those documents with the most pages are ordered highest. SELECT * FROM OpenQuery(FileSystem, ‘SELECT DocAuthor, DocTitle, DocSubject, FileName FROM SCOPE(‘‘ "/corpus" ‘‘, ‘‘ "C:\temp" ‘‘ ) WHERE DocWordCount >= 5000 AND ( DocAuthor = ‘‘Wendy Vasse‘‘ OR DocAuthor = ‘‘Anas Abbar‘‘ ) ORDER BY DocPageCount DESC ‘ ) Path through the Components This diagram illustrates the part that each component plays in the processing of a typical query. These items relate to the numbered items in the previous diagram.
SQL Extensions for Indexing Service Full-Text QueriesA point worth repeating is that the SQL extensions for Indexing Service are consistent with the SQL language supported for full-text search against relational database data. Furthermore, SQL support for full-text searching follows the SQL-3 functional methodology for full-text syntax extensions. The primary SQL extension consists of the CONTAINS and FREETEXT predicates. These predicates are used to find column values that match special full-text query criteria. To be consistent with similar features in other products and to make these predicates more extensible, functional notation is used. The high-level syntax is as follows: The flexibility of the functional style of these two predicates allows easy, upward-compatible future extensions for a third parameter to designate the language used for the query. The CONTAINS Predicate The CONTAINS predicate determines whether the content of files contains certain words and phrases. The CONTAINS predicate syntax is as follows:
Here are the definitions of the syntax terminology. property The property to be searched, whose data type is character-based. If property is not specified in the CONTENTS clause, then property is assumed. The value of the CONTENTS clause is the contents of the file after conversion to plain text (if necessary) by a filter. It is good practice to explicitly code the CONTENTS clause, rather than to accept it as a default. AND, OR, and AND NOT The Boolean operators used to join, or combine, terms. simple_term The term used to match the exact word or phrase being searched for. simple_term The simple_term syntax is as follows, where word refers to one or more characters without spaces or punctuation, and phrase refers to multiple words with spaces in between. Asian languages can have phrases made up of multiple words without any spaces in between. In keeping with the standard for full-text products, the search function is always case-insensitive. Here are some examples of simple terms used in the context of the CONTAINS predicate in a WHERE clause. WHERE CONTAINS( Contents, ‘hockey‘ ) WHERE CONTAINS( Contents, ‘ "ice hockey" ‘) For example, there is one file with a value of "This is a dissertation on the use of ice-cream sandwiches as hockey pucks" and another file with the value "Dissertation on new ways of splitting the atom." Since "this," "is," "a," and so on are noise words, they are not stored in the full-text index. Therefore, a query with this CONTAINS predicate: CONTAINS( Contents, ‘ "this is a dissertation" ‘ ) is the same as this query: CONTAINS( Contents, ‘dissertation‘ ) Both rows will be returned as hits because in the first query, the noise words are removed before processing the query. Combining Terms As with other SQL search conditions, more complex conditions can be specified by linking individual operands with Boolean operators. In this case, the operands are any of the types of terms being discussed. Except for the restriction that the OR NOT combination is not supported and that NOT cannot be specified before the first term, the rules are exactly the same as those used to combine individual predicates to form search conditions. For example, parentheses may be used to change the default priority order in which the operators are applied. Here are some examples of simple terms being combined within a CONTAINS predicate in a WHERE clause. WHERE CONTAINS( Contents, ‘hockey OR curling‘ ) WHERE CONTAINS( Contents, ‘hockey AND NOT field‘) WHERE CONTAINS( Contents, ‘ ("ice hockey" OR curling) AND NOT Canada ‘ ) prefix_term prefix_term is used to match words or phrases that begin with the specified text. The prefix_term syntax is as follows: A prefix term consists of a simple term appended with an asterisk (*) to activate prefix matching on a word or phrase. All text that starts with the material before the * is matched. The wildcard symbol (*) in this case is similar to the % symbol in the LIKE predicate in that it matches zero, one, or more characters of the root words in a word or phrase. In a phrase, each word within the phrase is considered to be a prefix; for example, the term "local bus*" matches "locality busy," "local bush," and "locale bust." Here are some examples of prefix terms used in the context of the CONTAINS predicate in a WHERE clause. WHERE CONTAINS( Contents, ‘ "atom*" ‘ ) This matches values that contain the word "atom," "atomic," "atomism," "atomy," and so on. WHERE CONTAINS( Contents, ‘ "wine*" OR "vine*" ‘) This matches values that contain the word "wine," "vine," or words such as "winery," "wines," "vineyard," or "vinegar." proximity_term proximity_term is used when the words or phrases being searched for must be close to one another. The proximity_term syntax is as follows: A proximity term is similar to an AND operator in that more than one word or phrase must exist in the value being searched. It differs from AND because the relevance of the match increases as the words appear closer together. The syntax is designed to be extensible for possible future support for specification of units of proximity such as words, sentences, paragraphs, chapters, and so on. NEAR() and ~ share the same meaning: the first word or phrase is close to the second word or phrase. "Close" is a purposefully vague term that can mean "within 50 words," but the algorithm is complicated. While words within the same sentence are one word distance apart, larger distances are assigned between units such as sentences, paragraphs, and chapters. Even if words or phrases are very far apart, the query is still considered to be satisfied; the row just has a low (zero) rank value. However, if the contains condition consists of proximity terms only, then SQL Server will not return rows with a rank value of zero. This can be avoided by specifying RANK > 0 as one of the predicates in the WHERE clause. It is possible to chain-code the proximity matching. For example," a ~ b ~ c " means that a should be near b, which should be near c. Because of the fuzzy nature of full-text searches, it is often desirable to see the rank values. This can be done by including the RANK property in the select list of the query. Here are some examples of proximity terms used in the context of the CONTAINS predicate in a WHERE clause. WHERE CONTAINS( Contents, ‘ hockey ~ player ‘ ) This matches values that contain the word "hockey" in close proximity to the word "player." WHERE CONTAINS( Contents, ‘ hockey ~ "play*" ‘) This matches values that contain the word "hockey" in close proximity to a word that starts with "play." WHERE CONTAINS( Contents, ‘ "great*" ~ "Mike Nash" ‘) AND Rank > 0 This matches values that contain words starting with "great" in close proximity to the phrase "Mike Nash." Values that meet the criteria but have a ranking of 0 do not have rows returned. generation_term generation_term is used when the words being searched for need to be expanded to include the variants of the original word. The generation_term syntax is as follows: The INFLECTIONAL predicate means that plural and singular forms of nouns and the various tenses of verbs will be matched. A single term will not match both exclusive noun and exclusive verb forms. The syntax is designed to be extensible enough to handle other linguistically generated forms, such as derivational, soundex, and thesaurus. Here is an example of a generation term used in the context of the CONTAINS predicate in a WHERE clause. WHERE CONTAINS(‘ FORMSOF (INFLECTIONAL, skate) ‘ ) This matches values that contain words such as "skate," "skates," "skated," and "skating." weighted_term weighted_term is used for queries that match a list of words and phrases, each optionally given its own weighting. Matching values must match only one element in the list. The weighted_term syntax is as follows, where n.nnn represents a decimal constant from zero through one.
A row is returned if there is a match on any one of the ISABOUT elements. Each component in the vector can be optionally weighted. The assigned weight forces a different measurement of the rank value that is assigned to each row that matches the query. Here are some examples of weighted terms used in the context of the CONTAINS predicate in a WHERE clause. WHERE CONTAINS( Contents, ‘ ISABOUT(hockey, puck, goalie) ‘ ) This matches article values that contain any of the words "hockey," "puck," or "goalie." The better matches will contain more than one of the words. WHERE CONTAINS( Contents, ‘ISABOUT("Canadian ice hockey" WEIGHT(1.0), "ice hockey" WEIGHT(.5), hockey WEIGHT(.2) ) ‘ ) This matches article values that may have information about Canadian ice hockey, with higher rank values assigned to articles that have more words from the phrase. The FREETEXT Predicate The FREETEXT predicate determines whether or not a value reflects the meaning, rather than the exact words, specified in the predicate. The FREETEXT predicate syntax is as follows:
This is a simple form of natural language query, where the index engine internally breaks the freetext string into a number of search terms, generates the stemmed form of the words, assigns heuristic weighting to each term, then finds the matches. Here is an example of a FREETEXT predicate used in a WHERE clause. WHERE FREETEXT( Contents, ‘ Who have been the most valuable ice hockey players from 1975 through 1982? ‘ ) Search Condition Support in Indexing Service The search_condition supported by Indexing Service is slightly different from the search_condition supported by SQL Server. Because these queries are distributed to the OLE DB provider for processing, the queries must follow the rules of the provider. The main difference is that OLE DB Provider for Indexing Service does not support the QUANTIFIED COMPARISON, BETWEEN, EXISTS, IN, or NULL predicates, but it does support two other predicates: MATCHES and ARRAY COMPARISON. These predicates are not yet directly supported by SQL Server. Following is an introduction to the search_condition syntax as supported by Indexing Service. Some aspects have been omitted, and the syntax of other aspects is incomplete. For more information, see the Indexing Service documentation. The search_condition syntax is as follows: The MATCHES and ARRAY COMPARISON Predicates This section briefly introduces the MATCHES and ARRAY COMPARISON predicates. For more information, see the Indexing Service topic in the Windows NT Server documentation. The MATCHES predicate provides more extensive pattern matching than the LIKE predicate. This added functionality bears the burden of a more complicated set of rules. The complete syntax and rules are not described here; rather, here are some examples to illustrate the use of this predicate. Here is a grouped match against more than one pattern, where it is known that the author‘s first name is Peggy, but the spelling of her second name is uncertain. WHERE MATCHES( DocAuthor, ‘Peggy |(MacK|,McK|,MacC|,McC|)arson‘ ) Here, it is uncertain if Pellow is spelled with one "l" or two. WHERE MATCHES( DocAuthor, ‘* Pel|{1,2|}ow‘ ) The ARRAY COMPARISON predicate is for use with the Indexing Service Vector properties. Some of the property values filtered by Indexing Service are multivalued. The data type of such values is a variable size array. SQL Server does not yet support such data types, but SQL-3 does. The SQL extension supported by OLE DB Provider for Indexing Service is consistent with that in SQL-3. For example, the virtual table contains a number of PowerPoint presentations, and you want to know the path to the presentations that contain any slide called "CONTAINS predicate," "FREETEXT predicate," or "Query Transformation." SELECT * FROM OpenQuery(FileSystem, ‘SELECT Path FROM SCOPE(‘‘ "/slide_presentations" ‘‘) WHERE DocPartTitles = SOME ARRAY[ ‘‘CONTAINS predicate‘‘, ‘‘FREETEXT predicate‘‘, ‘‘Query Transformation‘‘ ] ‘) For more information, see the Indexing Service documentation. Sample Full-Text Queries This section provides several examples that answer queries by combining database data and file data.
SQL Extensions for Site Server Full-Text QueriesMicrosoft Site Server version 3.0 also ships an OLE DB provider, the primary purpose of which is to allow users to write ADO application programs to query Web data. The OLE DB Provider for Site Server has not yet been tested with the SQL Server 7.0 query processor for distributed queries, and there is no official support for interoperability with SQL Server 7.0. However, users who wish to experiment with this configuration will find the following information useful. As with the OLE DB Provider for Indexing Service, the sp_addlinkedserver stored procedure is used to register OLE DB Provider for Site Server. For example, in order to register this provider for the WebTest text search catalog on the same machine that SQL Server is running on, the following statement will need to be executed: EXECUTE sp_addlinkedserver WebData, ‘Site Server‘, ‘MSSEARCHSQL‘, ‘WebTest‘ Here are the definitions of the syntax terminology. WebData The linked server name assigned to this particular linked sever. Site Server The product name of the data source. MSSEARCHSQL The provider name (PROGID) of OLE DB Provider for Site Server. WebTest The name of the text search catalog that will be used for this linked server. Notice that Site Server has additional syntax to support the union of results generated by a query across multiple catalogs. For more information, see the Site Server Search documentation. ConclusionThe textual search features supported by Microsoft SQL Server 7.0 are basic but serve many useful full-text searching purposes. Even this basic level allows inclusion of data both inside and outside the database. The SQL Server 7.0 query processor for distributed queries enables advanced capabilities by joining full-text searching results in the database with results generated by queries in the file system against Indexing Service 2.0. An excellent foundation has been laid during the integration of Microsoft information retrieval technologies into SQL Server 7.0, technologies that have already shipped in several Microsoft products and that will serve as the basis for other Microsoft products that support textual search. The component reuse strategy benefits the customer by providing seamless upgrade capability in the underlying index and search engine, and cross-product consistency. For more information about Microsoft SQL Server, see the SQL Server Web site at http://www.microsoft.com/sql/. |
|