Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
irs2k3
Starting Member
22 Posts |
Posted - 2010-04-03 : 03:56:50
|
Hi Guys,I have a table called News that stores all news fetched from RSS feeds now I want to record the number of clicks for each news item in the GUI.I have created another Table called News_MIS that contains newsID and #clicks.When I want to display the number of clicks I join the 2 tables to get the number of clicks for each news.For the news table with over 5 million records is this the best approach of should I place the Clicks column inside the news Table that way I query a Table and not a view ?Thanks |
|
Kristen
Test
22859 Posts |
Posted - 2010-04-03 : 04:20:08
|
I would put the clicks in a separate table. Its a very high update table, and I would not want to be updating the news table itself that often (it will have a TEXT or Varchar(MAX) column I presume, and may have Trigger to store old versions, etc., best not to share that with the high-volume updating columns.So a 1:1 "parallel" table for the Clicks with the same PK as the News (which is hopefully an Integer ID, and thus very "narrow"). |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-04-03 : 04:21:18
|
P.S. I would create the CLICKS table row at the same time as the News Record (with a CLICK COUNT of zero), so that the "Clicked" action just needs to increment it using UPDATE, never have to INSERT a row. |
|
|
irs2k3
Starting Member
22 Posts |
Posted - 2010-04-03 : 04:24:46
|
Thanks for the heads up.Because I heard that quering a table is better than querying a view or 2 tables?Is this true for a table of like 5 milions record? |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-04-03 : 05:20:27
|
Depends!I think the high volume of updates is the thing you need to optimise for, rather than the Select queryIf you want to find the number of clicks for News Item = 12345 then that is a single, clustered, primary key lookup - "instant", even if you have a huge number of rows.If you want to find the top-10 most popular news items that is a more involved query on a big table. You could index by the number of clicks - but then EVERY update to the Clicks has to also update the Index.I think it would be better to have a scheduled task than runs, say, every hour (or maybe only once a day) that gets the top 100 news stories, top 100 stories "new" in the last week / month, top 100 stories that have had the most increase in their Clicks in the last, say, week ... and store the results into a "TopNewsItems" table - and then query than whenever anyone wants the "Top 10" / "Top 100" stores / recent stories / biggest increasers in popularity etc. - kinda like a Cache.The VIEW thing is moot. If you have a VIEWCREATE VIEW MyViewASSELECT N.ID AS ID, N.NCol1, N.NCol2, ..., C.CCol1, C.CCol2, ...FROM NewsTable AS N JOIN NewsClicks AS C ON C.ID = N.ID thenSELECT ID, NCol1, CCol1 FROM MyView WHERE ID = 12345 is exactly the same as doingSELECT N.ID, NCol1, CCol1FROM NewsTable AS N JOIN NewsClicks AS C ON C.ID = N.IDWHERE N.ID = 12345 and assuming that ID is the Clustered PK on both NewsTable and NewsClicks it will be "instant" (for a single, specific, ID)AlsoUPDATE USET NumberOfClicks = NumberOfClicks + 1FROM NewsClicksWHERE ID = 12345 to increment the number-of-clicks on a given News ID record will be very fast. [number-of-clicks] is a fixed length, integer column, so will not increase in width, so SQL can just replace it with the new value in-situ - no page splits etc. That operation is Atomic, so will not block other queries etc.Finding top 100 popular news items isSELECT TOP 100 IDFROM NewsClicksORDER BY NumberOfClicks DESC this will be slow (on a big table, if there is no index on NumberOfClicks - and I don't think an index on NumberOfClicks is a good idea because of the huge number of updates on that column), hence I suggest caching it.However, because [NewsClicks] table is very "slim" (just News-ID and Number-of-Clicks columns) there will be lots of records oin each data page, so it will be very fast to do a table-scan - as compared to having the Clicks in the same table as the News Record / Text itself.But to reduce the frequency of such queries I would save them in a "caching table" - as I described above, using a scheduled task every hour / day. |
|
|
irs2k3
Starting Member
22 Posts |
Posted - 2010-04-03 : 05:32:49
|
Man, I appreciate your help. Thank you.I have a Table in the DB called TH_News_MIS with the following schema : ,[NewsID] ,[Clicks] ,[LastClickDate] ,[Trims] ,[LastTrimDate] ,[Comments] ,[LastCommentDate] When A User clicks a news this table will be updated if the news exist else a new row will be created in the table.Now to get the Top 100 News for example, I can Query this MIS table and get the ids of news and then query the news table.What do you think? |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-04-03 : 08:12:46
|
"When A User clicks a news this table will be updated if the news exist else a new row will be created in the table."I would create a ROW in TH_News_MIS when you create a row in News table - so that there is always a row in this table.Otherwise EVERY time you increment the count you have to do some sort of EXISTS test to decide whether to Insert or Update. Much faster to just do an UPDATE knowing that the row will exist."Now to get the Top 100 News for example, I can Query this MIS table and get the ids of news and then query the news table"Yes, but that will be slow on a large table, hence my suggestion to have a Caching Table for the "TOP 100" type queries, and refresh the cache table periodically.Then when a user checks the "Popular News" you just query the Cache Table and JOIN to the News Table by NewsID |
|
|
irs2k3
Starting Member
22 Posts |
Posted - 2010-04-03 : 09:40:18
|
Thanks a lot mate.I will try to implement this technique. |
|
|
|
|
|
|
|