SQL: Full-Text Searching in MS SQL Server 2005

Published by

Posted on May 17, 2008

1. Microsoft have a background document on Full Text Search concepts and I suggest that you read that first.
2. The fundamental fact of full-text indexing is that the database requires extra files called the catalog. You still have the normal .mdb and .ldb files, but you also have a folder called FTData which contains a folder for your catalog, which in turn contains lots of files that make up the full text index. These files are all made for you and kept up to date by SQL Server.
3. How to make the Catalog
Now must execute some SQL commands to create the full-text catalog. The catalog itself is a container to put one or more full-text indexes into. So, if you want to be able to search on the text of three of your database tables, you make a single catalog and create three indexes which go into that catalog. Note that if your indexes are going to be really huge you should make a separate catalog for each index, but to get started, stick with one catalog.
WARNING: Some of the information of the Microsoft website relates to the full version of SQL Server, such as the “Storage folder” that Express doesn’t have. Don’t panic, you just need to create the catalog by hand, rather than through the Management Interface. Also, you may find out-of-date information about SQL Server 2000, telling you that a table can only have one column full-text indexed. Ignore this, as it is not true under SQL Server 2005.
In this example I will make a catalog and put two indexes into it. Here is how:
1. Start SQL Server 2005 Management Studio Express, and connect to your database.
2. Click the “New Query” icon so that you can start entering SQL queries.
3. Type:
use mydatabase
EXEC sp_fulltext_database ‘enable’
4. Click “Execute” to run this SQL statement. This will enable fulltext searching on the database (it is not enabled by default), and then create the catalog, which will be called mycatalog. The catalog is just a folder called “mycatalog” in Microsoft SQL Server\SQL.1\FTData. Note here again that if your data tables are huge, this catalog could get huge too, so may be worth putting on a different disk to the one with the database on it, for performance reasons.
4. How to make the full-text index
The rules state that if you want to make a fulltext index on a table, that table must have a “unique, single-column, non-nullable index”. Very often your primary key will already be just like this, because all primary keys are unique and non-nullable. However, if your primary key is on more than one column, you have a slight problem. You will need to add an extra column to the table that is unique, and add a new index that is non-nullable on this column. The standard way to do this is to create a new column called “text_id” which is an integer column, and make it an identity with a seed of 1 and an increment of 1. If you do this, and add a new index on just that column, you will have a unique, single-column, non-nullable index on your table. Ok, so it wastes a bit of space, but you have to obey this rule or things don’t work.
If you already have a single-column primary key but it is on a really large field such as a GUID, it is suggested that you also add an int field and use that instead as the key field. This is because the full-text index will contain a copy of all the GUIDs, and so will take up a lot of space.

And now, how to create the fulltext index itself:

1. If you needed to add a new field, make sure you have created an index on it; otherwise just use the PK.
2. Execute this SQL:
CREATE FULLTEXT INDEX ON mydatabase.dbo.mytable
Language 2057
KEY INDEX myindex ON mycatalog
Here, mytable is the name of the table that we are building the index on, and column_to_be_indexed is the name of the column of text that you want to be able to search on. myindex is the name of the index to use (which is most likely your Primary Key), and mycatalog is the name of the catalog you previously created.
The “Language 2057” part tells SQL Server which language is used in the texts, such as English or French. This is the really clever part, because SQL Server knows that in English “mice” is the plural of “mouse”, so a search on “mouse” will return records that contain “mice” too. In this case I used 2057 which is the code for UK English. Microsoft have a table of different codes for different languages, and you should pick the appropriate one. There is also a code 0X0 which means “no language specified”.
3. Now, to verify that things have worked as planed, start SQL Server Manager interface and right-click on your table and select “Modify”. Find the column that you wanted to be indexed and right-click it, and choose “Full text index…”. Check that “Columns” is set to the column name you want indexed and check whether “Active” is set to “Yes”. If it isn’t then the index hasn’t yet been built.
4. You can find the status of your fulltext indexes like this:
exec sp_help_fulltext_columns mytable
This shows you which columns in given table are configured for full-text indexing. Also:
select * from sys.fulltext_indexes
shows you the status of the current indexes and when they were last updated. The key thing is to make sure that change_tracking_state_desc is set to “AUTO” and that has_crawl_completed = “1”. You can also see the time of the last crawl, meaning the time that the index was last updated.
5. You will have to wait a while for the index to be created. If you have a large table, the CPU usage on your PC will probably go up to 100% while this happens. There is a trouble-shooting section further down this page if you get problems.
1. Performing a Find on your index
The SQL SELECT statement to find records is quite simple:
SELECT mycolumn
FROM mytable
WHERE CONTAINS(mycolumn, ‘”find this”‘)
(You must put the search words in double-quotes within the single-quotes needed by SQL Server)
2. Eliminating Noise Words
There is a noise file for each language, and this contains simple words like “and” and “or” that are not worth searching for. If your text fields are in HTML or XML, then you should add words like “BR” and “P” to the noise words list.

reference : http://msdn2.microsoft.com/en-us/library/ms142571.aspx




Backup Failures