Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Developing a Search portal site ,Done Right in sql

Author  Topic 

Johnhamman
Starting Member

37 Posts

Posted - 2003-01-21 : 10:46:02
Hi all,
I have been given the task of developing a search portal site like that of aspin . com or 411asp . net . I would like to develope this in the most efficiant way possible. Does anyone have any suggestions for database design for this site, so that searching for items is as quick as possible. Here is more details of the site.
The site will have a directory structure like that of yahoo. Users can submit links to products and tutorial sites. (Possibly a spider will go to the page and extract keywords.) or Someone will visit the site and extract keywords and sutch of the page. The questions is how do i develope the most accurate search feature for this site (database design) that will be fast too. The site will have anything from a couple hundred pages to a 1000 or so pages to index and search thru. I have never done a search feature and have seen many tutorials but i dont know what is the best method and why.
I would love to hear suggestions on this subject also. Thanks.
john
PS this site will also have forums on it and i would like to include that in the search.


Onamuji
Aged Yak Warrior

504 Posts

Posted - 2003-01-21 : 10:55:19
Off the top of my head and from previous experience:

CREATE TABLE Keywords (Keyword VARCHAR(32) NOT NULL PRIMARY KEY CLUSTERED)
CREATE TABLE Pages (PageID INT NOT NULL IDENTITY(1,1) PRIMARY KEY NONCLUSTERED, Url VARCHAR(256) NOT NULL UNIQUE CLUSTERED, PRIMARY KEY CLUSTERED (PageID, Keyword)) -- add date fields for maintaining the page, like last crawled for keeping pages up to date
CREATE TABLE PageKeywords (PageID INT NOT NULL REFERENCES Pages(PageID), Keyword VARCHAR(32) NOT NULL REFERENCES Keywords(Keyword))

SELECT Pages.Url FROM Pages INNER JOIN PageKeywords ON Pages.PageID = PageKeywords.PageID WHERE PageKeywords.Keyword LIKE 'sql%'

You might also be able to use this other version for full text indexing

CREATE TABLE Keywords (KeywordID INT NOT NULL IDENTITY(1,1) PRIMARY KEY NONCLUSTERED, Value VARCHAR(128) NOT NULL UNIQUE CLUSTERED) -- use a VARCHAR size that is good for most common keywords ... like they really shouldn't be this large but you may get a key word like 'sql', 'server', 'sql server'.. you know compound keywords
-- leave the Pages table the same
CREATE TABLE PageKeywords (PageID INT NOT NULL REFERENCES Pages(PageID), KeywordID INT NOT NULL REFERENCES Keywords(KeywordID), PRIMARY KEY CLUSTERED (PageID, KeywordID))

SELECT Pages.Url FROM Pages INNER JOIN PageKeywords ON Pages.PageID = PageKeywords.PageID INNER JOIN Keywords ON PageKeywords.KeywordID = Keywords.KeywordID WHERE Keywords.Value LIKE 'sql%'

... just ideas ... maybe someone more experienced in this area has better schemas/design suggestions ... your schema will mimic the complexity you want to support ... the more complex solution you want to provide, the more complex your schema may be.

Go to Top of Page

Johnhamman
Starting Member

37 Posts

Posted - 2003-01-21 : 11:33:07
One thing that i want to do is build a spider that pulls back and enters into a database any metatag keywords, metatag descriptions, the page title, and then all text from the page and all alt tags. And use this information in the database to provide the search. I hope this little tidbit will help with suggestions.
thanks all

Go to Top of Page
   

- Advertisement -