Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Other SQL Server Topics (2005)
 Full Text Index not working with nvarchar source

Author  Topic 

gavinharriss
Starting Member

17 Posts

Posted - 2007-07-11 : 18:23:33
Hi, I was wondering if any SQL Server gurus out there could help me...

I have a table I'm trying to apply a full text catalog to, however no results are ever returned due to the text column being cataloged being of varbinary(max) that's being populated from a converted nvarchar(max) value.

To re-create the problem quickly...

If I populate the column via
CONVERT(varbinary(max), 'test text')
then there is no problem, I get results as expected.

However if I populate the column via
CONVERT(varbinary(max), CAST('test text' as nvarchar(max)))
no results are ever returned.

Is this a bug with SQL Server 2005 Full Text Indexing? I'm happily creating full text catalogs when an nvarchar is not getting converted into a varbinary.

I'm setting the Document Type column to '.html' (I've tried changing this to '.txt' in case it was a fault with the html ifilter but the problem persists so I believe I can rule this out).

The reason I need to convert an nvarchar to varbinary is that the table holds multi-lingual text and I'm adding a html meta tag <META NAME="MS.LOCALE" CONTENT="ES"> to the beginning in order for the full text indexing word breaker to select the correct language to catalog the text with. The aim being to provide more relevant searches in users native languages (I've read a few articles that describe this technique, but it's the first time I've tried to apply it).

Any pointers / suggestions would be greatly appreciated. Cheers,
Gavin.

Below is a T-SQL script you can run to demonstrate the effect I'm
experiencing...

-- Create test database
CREATE DATABASE FullTextTest
GO
USE FullTextTest
GO

-- Create test data table
CREATE TABLE TestTable
(
pk UNIQUEIDENTIFIER NOT NULL CONSTRAINT tablePK PRIMARY KEY,
varbinarycol VARBINARY(MAX),
documentExtension VARCHAR(5),
)
GO

-- The below single entry WILL BE FOUND (the text source is being entered
directly)
INSERT INTO TestTable (pk, varbinarycol, documentExtension) VALUES (NEWID(),
CONVERT(VARBINARY(MAX),'<META NAME="MS.LOCALE" CONTENT="EN">test entry 1'),
'.html')

-- The bellow two entries below WILL NOT BE FOUND (the text source is taken
from an NVARCHAR(MAX) value)
INSERT INTO TestTable (pk, varbinarycol, documentExtension) VALUES (NEWID(),
CONVERT(VARBINARY(MAX), CAST('<META NAME="MS.LOCALE" CONTENT="EN">test entry
2' AS NVARCHAR(MAX))), '.html')
INSERT INTO TestTable (pk, varbinarycol, documentExtension) VALUES (NEWID(),
CONVERT(VARBINARY(MAX), CAST('<META NAME="MS.LOCALE" CONTENT="EN">test entry
3' AS NVARCHAR(MAX))), '.html')
GO

-- Create the full text catalog
sp_fulltext_database 'enable'
GO
CREATE FULLTEXT CATALOG TEST AS DEFAULT
GO
CREATE FULLTEXT INDEX ON TestTable (varbinarycol TYPE COLUMN
documentExtension LANGUAGE 1033)
KEY INDEX tablePK
GO

-- NOTE: You might need to give the catalog a chance to build before running
the script below.

-- Now do a search that SHOULD RETURN 3 ROWS of data, but ONLY 1 ROW IS
RETURNED
SELECT CAST(varbinarycol AS NVARCHAR(MAX)) FROM TestTable WHERE
CONTAINS(varbinarycol, 'test')


www.gavinharriss.com

gavinharriss
Starting Member

17 Posts

Posted - 2007-07-15 : 16:03:29
I've also tried adding the following extra meta tag - <META http-equiv="Content-Type" content="text/html; charset=utf-16">

But this seemed to break the Full Text Index even further. If applied to the example T-SQL I give above then even the non-nvarchar input is no longer returned:

INSERT INTO TestTable (pk, varbinarycol, documentExtension) VALUES (NEWID(), CONVERT(VARBINARY(MAX),'<META http-equiv="Content-Type" content="text/html; charset=utf-16"><META NAME="MS.LOCALE" CONTENT="EN">test entry 1'), '.html')

www.gavinharriss.com
Go to Top of Page
   

- Advertisement -