Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Cleaner select distinct

Author  Topic 

rlull
Starting Member

39 Posts

Posted - 2014-04-30 : 18:59:08
I can accomplish what I want using the following syntax but I'm sure there's a more "pure" way to do this. Suggestions?

SELECT
DISTINCT(nv.NodeID), -- needed to filter out articles with multiple categories assigned
MIN(nv.NewsTitle) AS NewsTitle,
MIN(nv.NewsReleaseDate) AS NewsReleaseDate,
MIN(nv.NewsSummary) AS NewsSummary,
MIN(nv.NewsTeaser) AS NewsTeaser ,
MIN(dc.CategoryID) AS CategoryID,
MIN(nv.NodeAliasPath) AS NodeAliasPath
FROM View_CONTENT_News_Joined nv
JOIN CMS_DocumentCategory dc ON nv.DocumentID = dc.DocumentID
JOIN CMS_Category ct ON dc.CategoryID = ct.CategoryID
WHERE
nv.NodeSiteID = 1
AND nv.Published = 1
AND nv.DocumentCulture = N'en-US'
AND nv.NodeAliasPath LIKE N'/Blog/Article/%'
AND (ct.CategoryParentID = 68 OR dc.CategoryID = 68)
AND nv.NewsReleaseDate <= GETDATE()
GROUP BY NodeID, NewsReleaseDate
ORDER BY NewsReleaseDate DESC

stepson
Aged Yak Warrior

545 Posts

Posted - 2014-05-01 : 01:01:11
My first thought is about using Row_number(), but not sure enough

quote:

How to ask: http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx




sabinWeb MCP
Go to Top of Page

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-05-01 : 07:50:39
quote:
Originally posted by rlull

I can accomplish what I want using the following syntax but I'm sure there's a more "pure" way to do this. Suggestions?




If you're grouping by NodeId, NewsReleaseDate, you shouldn't need DISTINCT
Go to Top of Page

rlull
Starting Member

39 Posts

Posted - 2014-05-01 : 11:50:17
Ok, I've tried to put this in the recommended format:

How do I select the unique rows from the following 4 tables? (Because of the joins on the category tables, and because a blog article may have multiple categories assigned to it, I get multiple rows with the same article.) I am able to accomplish what I want with the query below but it feels "hacky" putting "MIN" on each field.

Tables:
 

CREATE TABLE [dbo].[CONTENT_News](
[NodeID] [int] IDENTITY(1,1) NOT NULL,
[NewsTitle] [nvarchar](450) NOT NULL,
[NewsReleaseDate] [datetime] NOT NULL,
[NewsSummary] [nvarchar](max) NOT NULL,
[NewsTeaser] [nvarchar](500) NULL,
[AuthorID] [int] NULL
CONSTRAINT [PK_CONTENT_News] PRIMARY KEY CLUSTERED
(
[NewsID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

------------------------

CREATE TABLE [dbo].[custom_Author](
[AuthorID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](50) NULL,
[LastName] [nvarchar](50) NOT NULL
CONSTRAINT [PK_custom_Author] PRIMARY KEY CLUSTERED
(
[AuthorID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

------------------------

CREATE TABLE [dbo].[CMS_DocumentCategory](
[DocumentID] [int] NOT NULL,
[CategoryID] [int] NOT NULL,
CONSTRAINT [PK_CMS_DocumentCategory] PRIMARY KEY CLUSTERED
(
[DocumentID] ASC,
[CategoryID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[CMS_DocumentCategory] WITH CHECK ADD CONSTRAINT [FK_CMS_DocumentCategory_CategoryID_CMS_Category] FOREIGN KEY([CategoryID])
REFERENCES [dbo].[CMS_Category] ([CategoryID])
GO

ALTER TABLE [dbo].[CMS_DocumentCategory] CHECK CONSTRAINT [FK_CMS_DocumentCategory_CategoryID_CMS_Category]
GO

------------------------

CREATE TABLE [dbo].[CMS_Category](
[CategoryID] [int] IDENTITY(1,1) NOT NULL,
[CategoryName] [nvarchar](250) NULL,
CONSTRAINT [PK_CMS_Category] PRIMARY KEY NONCLUSTERED
(
[CategoryID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


------------------------

My query that works but feels hacky
 
SELECT
nv.NodeID,
MIN(nv.NewsTitle) AS NewsTitle,
MIN(a.FirstName + ' ' + a.LastName) AS Author,
MIN(nv.NewsReleaseDate) AS NewsReleaseDate,
MIN(nv.NewsSummary) AS NewsSummary,
MIN(nv.NewsTeaser) AS NewsTeaser ,
MIN(dc.CategoryID) AS CategoryID,
MIN(nv.NodeAliasPath) AS NodeAliasPath
FROM CONTENT_News nv
JOIN CMS_DocumentCategory dc ON nv.DocumentID = dc.DocumentID
JOIN CMS_Category ct ON dc.CategoryID = ct.CategoryID
JOIN custom_Author a ON nv.AuthorID = a.AuthorID
WHERE
(ct.CategoryParentID = 68 OR dc.CategoryID = 68)
AND nv.NewsReleaseDate <= GETDATE()
GROUP BY NodeID, NewsReleaseDate
ORDER BY NewsReleaseDate DESC


Results in 1 row per article
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2014-05-01 : 12:04:10
quote:
How do I select the unique rows from the following 4 tables

Your question doesn't exactly match your query. While you are getting sub set of "unique" rows per NodeID and NewsReleaseDate. That really isn't all the unique rows (assuming the aggregate is removing some). If that is what you want then your query works just fine. But, I think there is a detail or two that is still missing here.

Also, your sample tables do not match your query (CONTENT_News doesn't have a DocumentID column). It's hard to tell from what you posted but, are we to assume that you have a 1-to-many relationship someplace that is causing "duplicates" (multiple rows per NodeID & NewsReleaseDate)? Or is there something going on with the NodeID & NewsReleaseDate date that isn't clear? I'd assume that NodeID is unique for CONTENT_News, but you have another nonexistent column as the primary key (NewsID).

1. Can you clean up your schema so that it is correct?
2. Can you post sample data that show where you are getting duplicates (so we can run queries against it)?
3. Can you define in words what Unique means across those four tables?

My guess is that this query hinges on one table and if we can figure out how to resolve that you won't need to do aggregates. But, hard to tell from the information provided thus far.
Go to Top of Page

rlull
Starting Member

39 Posts

Posted - 2014-05-01 : 12:47:38
Ok, sorry for the confusion. I tried to simplify for brevity. Let's try this again. There is a News table with articles in it. Each article has only 1 author associated with it. Each article can have one or more categories assigned to it. So if I don't filter by NodeID, which is unique, the same article will appear multiple times if it has more than one category assigned to it. I only want each article to appear once, regardless of how many categories are assigned to it.

Tables:
 

CREATE TABLE [dbo].[CONTENT_News](
[NodeID] [int] IDENTITY(1,1) NOT NULL,
[NewsTitle] [nvarchar](450) NOT NULL,
[NewsReleaseDate] [datetime] NOT NULL,
[NewsSummary] [nvarchar](max) NOT NULL,
[NewsTeaser] [nvarchar](500) NULL,
[AuthorID] [int] NULL
CONSTRAINT [PK_CONTENT_News] PRIMARY KEY CLUSTERED
(
[NodeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

------------------------

CREATE TABLE [dbo].[custom_Author](
[AuthorID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](50) NULL,
[LastName] [nvarchar](50) NOT NULL
CONSTRAINT [PK_custom_Author] PRIMARY KEY CLUSTERED
(
[AuthorID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

------------------------

CREATE TABLE [dbo].[CMS_DocumentCategory](
[NodeID] [int] NOT NULL,
[CategoryID] [int] NOT NULL,
CONSTRAINT [PK_CMS_DocumentCategory] PRIMARY KEY CLUSTERED
(
[NodeID] ASC,
[CategoryID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[CMS_DocumentCategory] WITH CHECK ADD CONSTRAINT [FK_CMS_DocumentCategory_CategoryID_CMS_Category] FOREIGN KEY([CategoryID])
REFERENCES [dbo].[CMS_Category] ([CategoryID])
GO

ALTER TABLE [dbo].[CMS_DocumentCategory] CHECK CONSTRAINT [FK_CMS_DocumentCategory_CategoryID_CMS_Category]
GO

------------------------

CREATE TABLE [dbo].[CMS_Category](
[CategoryID] [int] IDENTITY(1,1) NOT NULL,
[CategoryName] [nvarchar](250) NULL,
CONSTRAINT [PK_CMS_Category] PRIMARY KEY NONCLUSTERED
(
[CategoryID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


------------------------

This query results in multiple instances of the same article
 
SELECT
nv.NodeID,
nv.NewsTitle AS Title,
nv.NewsReleaseDate AS ReleaseDate,
LTRIM(a.FirstName + ' ' + a.LastName) AS Author,
nv.NewsSummary AS Summary,
nv.NewsTeaser AS SmallImagePath ,
dc.CategoryID AS CategoryId,
ct.CategoryDisplayName AS CategoryName,
nv.NodeAliasPath AS ArticleLink
FROM View_CONTENT_News_Joined nv
JOIN CMS_DocumentCategory dc ON nv.DocumentID = dc.DocumentID
JOIN CMS_Category ct ON dc.CategoryID = ct.CategoryID
JOIN custom_Author a ON nv.AuthorID = a.AuthorID
WHERE
nv.NodeSiteID = 1
AND nv.Published = 1
AND nv.DocumentCulture = N'en-US'
AND nv.NodeAliasPath LIKE N'/Blog/Article/%'
AND (ct.CategoryParentID = 8 OR dc.CategoryID = 8)
AND nv.NewsReleaseDate <= GETDATE()
ORDER BY NodeID, NewsReleaseDate DESC
Go to Top of Page

stepson
Aged Yak Warrior

545 Posts

Posted - 2014-05-02 : 01:41:24
quote:
I only want each article to appear once, regardless of how many categories are assigned to it.


Base on that, you need or not to display Category or display just one Category (any one) ?

dc.CategoryID AS CategoryId,
ct.CategoryDisplayName AS CategoryName,


Also, read the Lamprey's post.

And about the sample, some sample (few rows with dates -real/dummy- ) and the expected output.





sabinWeb MCP
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2014-05-02 : 11:58:27
Without sample data it's much harder to help. Plus your Schema doesn't math your query (CMS_DocumentCategory doesn't have a DocumentID column). I'm reading between the lines and I'm reading that you want to join all those tables together and only have one row per NodeID and that means selecting a random Category... Is that right?

If it were me I'd check performance of doing:
1. A Derived Table on CMS_DocumentCategory (sample below)
3. An OUTER APPLY
3. A Correlated Sub Query

Something like this:
SELECT 
nv.NodeID,
nv.NewsTitle AS Title,
nv.NewsReleaseDate AS ReleaseDate,
LTRIM(a.FirstName + ' ' + a.LastName) AS Author,
nv.NewsSummary AS Summary,
nv.NewsTeaser AS SmallImagePath ,
dc.CategoryID AS CategoryId,
ct.CategoryDisplayName AS CategoryName,
nv.NodeAliasPath AS ArticleLink
FROM
View_CONTENT_News_Joined AS nv
JOIN
(
SELECT
NodeID,
MIN(CategoryID) AS DocumentID --??
FROM
CMS_DocumentCategory
GROUP BY
NodeID
) AS dc
ON nv.DocumentID = dc.DocumentID
JOIN
CMS_Category AS ct
ON dc.CategoryID = ct.CategoryID
JOIN
custom_Author AS a
ON nv.AuthorID = a.AuthorID
Go to Top of Page
   

- Advertisement -