Cleaner select distinct - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Cleaner select distinct

Author

Topic

rlull
Starting Member

39 Posts

Posted - 2014-04-30 : 18:59:08

I can accomplish what I want using the following syntax but I'm sure there's a more "pure" way to do this. Suggestions?

SELECT
DISTINCT(nv.NodeID), -- needed to filter out articles with multiple categories assigned
MIN(nv.NewsTitle) AS NewsTitle,
MIN(nv.NewsReleaseDate) AS NewsReleaseDate,
MIN(nv.NewsSummary) AS NewsSummary,
MIN(nv.NewsTeaser) AS NewsTeaser ,
MIN(dc.CategoryID) AS CategoryID,
MIN(nv.NodeAliasPath) AS NodeAliasPath
FROM View_CONTENT_News_Joined nv
JOIN CMS_DocumentCategory dc ON nv.DocumentID = dc.DocumentID
JOIN CMS_Category ct ON dc.CategoryID = ct.CategoryID
WHERE
nv.NodeSiteID = 1
AND nv.Published = 1
AND nv.DocumentCulture = N'en-US'
AND nv.NodeAliasPath LIKE N'/Blog/Article/%'
AND (ct.CategoryParentID = 68 OR dc.CategoryID = 68)
AND nv.NewsReleaseDate <= GETDATE()
GROUP BY NodeID, NewsReleaseDate
ORDER BY NewsReleaseDate DESC

stepson
Aged Yak Warrior

545 Posts

Posted - 2014-05-01 : 01:01:11

My first thought is about using Row_number(), but not sure enough

quote:

How to ask: http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

sabinWeb MCP

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-05-01 : 07:50:39

quote:
Originally posted by rlull

I can accomplish what I want using the following syntax but I'm sure there's a more "pure" way to do this. Suggestions?

If you're grouping by NodeId, NewsReleaseDate, you shouldn't need DISTINCT

rlull
Starting Member

39 Posts

Posted - 2014-05-01 : 11:50:17

Ok, I've tried to put this in the recommended format:

How do I select the unique rows from the following 4 tables? (Because of the joins on the category tables, and because a blog article may have multiple categories assigned to it, I get multiple rows with the same article.) I am able to accomplish what I want with the query below but it feels "hacky" putting "MIN" on each field.

Tables:

 

CREATE TABLE [dbo].[CONTENT_News](
	[NodeID] [int] IDENTITY(1,1) NOT NULL,
	[NewsTitle] [nvarchar](450) NOT NULL,
	[NewsReleaseDate] [datetime] NOT NULL,
	[NewsSummary] [nvarchar](max) NOT NULL,
	[NewsTeaser] [nvarchar](500) NULL,
	[AuthorID] [int] NULL
 CONSTRAINT [PK_CONTENT_News] PRIMARY KEY CLUSTERED 
(
	[NewsID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

------------------------

CREATE TABLE [dbo].[custom_Author](
	[AuthorID] [int] IDENTITY(1,1) NOT NULL,
	[FirstName] [nvarchar](50) NULL,
	[LastName] [nvarchar](50) NOT NULL
 CONSTRAINT [PK_custom_Author] PRIMARY KEY CLUSTERED 
(
	[AuthorID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

------------------------

CREATE TABLE [dbo].[CMS_DocumentCategory](
	[DocumentID] [int] NOT NULL,
	[CategoryID] [int] NOT NULL,
 CONSTRAINT [PK_CMS_DocumentCategory] PRIMARY KEY CLUSTERED 
(
	[DocumentID] ASC,
	[CategoryID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[CMS_DocumentCategory]  WITH CHECK ADD  CONSTRAINT [FK_CMS_DocumentCategory_CategoryID_CMS_Category] FOREIGN KEY([CategoryID])
REFERENCES [dbo].[CMS_Category] ([CategoryID])
GO

ALTER TABLE [dbo].[CMS_DocumentCategory] CHECK CONSTRAINT [FK_CMS_DocumentCategory_CategoryID_CMS_Category]
GO

------------------------

CREATE TABLE [dbo].[CMS_Category](
	[CategoryID] [int] IDENTITY(1,1) NOT NULL,
	[CategoryName] [nvarchar](250) NULL,
 CONSTRAINT [PK_CMS_Category] PRIMARY KEY NONCLUSTERED 
(
	[CategoryID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

------------------------

My query that works but feels hacky

 
SELECT 
	nv.NodeID,
	MIN(nv.NewsTitle) AS NewsTitle,
	MIN(a.FirstName + ' ' + a.LastName) AS Author,
	MIN(nv.NewsReleaseDate) AS NewsReleaseDate,
	MIN(nv.NewsSummary) AS NewsSummary,
	MIN(nv.NewsTeaser) AS NewsTeaser ,
	MIN(dc.CategoryID) AS CategoryID,
	MIN(nv.NodeAliasPath) AS NodeAliasPath
FROM CONTENT_News nv
	JOIN CMS_DocumentCategory dc ON nv.DocumentID = dc.DocumentID
	JOIN CMS_Category ct ON dc.CategoryID = ct.CategoryID
	JOIN custom_Author a ON nv.AuthorID = a.AuthorID
WHERE 
	(ct.CategoryParentID = 68 OR dc.CategoryID = 68)
	AND nv.NewsReleaseDate <= GETDATE()
GROUP BY NodeID, NewsReleaseDate
ORDER BY NewsReleaseDate DESC

Results in 1 row per article

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2014-05-01 : 12:04:10

quote:
How do I select the unique rows from the following 4 tables

Your question doesn't exactly match your query. While you are getting sub set of "unique" rows per NodeID and NewsReleaseDate. That really isn't all the unique rows (assuming the aggregate is removing some). If that is what you want then your query works just fine. But, I think there is a detail or two that is still missing here.

Also, your sample tables do not match your query (CONTENT_News doesn't have a DocumentID column). It's hard to tell from what you posted but, are we to assume that you have a 1-to-many relationship someplace that is causing "duplicates" (multiple rows per NodeID & NewsReleaseDate)? Or is there something going on with the NodeID & NewsReleaseDate date that isn't clear? I'd assume that NodeID is unique for CONTENT_News, but you have another nonexistent column as the primary key (NewsID).

1. Can you clean up your schema so that it is correct?
2. Can you post sample data that show where you are getting duplicates (so we can run queries against it)?
3. Can you define in words what Unique means across those four tables?

My guess is that this query hinges on one table and if we can figure out how to resolve that you won't need to do aggregates. But, hard to tell from the information provided thus far.

rlull
Starting Member

39 Posts

Posted - 2014-05-01 : 12:47:38

Ok, sorry for the confusion. I tried to simplify for brevity. Let's try this again. There is a News table with articles in it. Each article has only 1 author associated with it. Each article can have one or more categories assigned to it. So if I don't filter by NodeID, which is unique, the same article will appear multiple times if it has more than one category assigned to it. I only want each article to appear once, regardless of how many categories are assigned to it.

Tables:

 

CREATE TABLE [dbo].[CONTENT_News](
	[NodeID] [int] IDENTITY(1,1) NOT NULL,
	[NewsTitle] [nvarchar](450) NOT NULL,
	[NewsReleaseDate] [datetime] NOT NULL,
	[NewsSummary] [nvarchar](max) NOT NULL,
	[NewsTeaser] [nvarchar](500) NULL,
	[AuthorID] [int] NULL
 CONSTRAINT [PK_CONTENT_News] PRIMARY KEY CLUSTERED 
(
	[NodeID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

------------------------

CREATE TABLE [dbo].[custom_Author](
	[AuthorID] [int] IDENTITY(1,1) NOT NULL,
	[FirstName] [nvarchar](50) NULL,
	[LastName] [nvarchar](50) NOT NULL
 CONSTRAINT [PK_custom_Author] PRIMARY KEY CLUSTERED 
(
	[AuthorID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

------------------------

CREATE TABLE [dbo].[CMS_DocumentCategory](
	[NodeID] [int] NOT NULL,
	[CategoryID] [int] NOT NULL,
 CONSTRAINT [PK_CMS_DocumentCategory] PRIMARY KEY CLUSTERED 
(
	[NodeID] ASC,
	[CategoryID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[CMS_DocumentCategory]  WITH CHECK ADD  CONSTRAINT [FK_CMS_DocumentCategory_CategoryID_CMS_Category] FOREIGN KEY([CategoryID])
REFERENCES [dbo].[CMS_Category] ([CategoryID])
GO

ALTER TABLE [dbo].[CMS_DocumentCategory] CHECK CONSTRAINT [FK_CMS_DocumentCategory_CategoryID_CMS_Category]
GO

------------------------

CREATE TABLE [dbo].[CMS_Category](
	[CategoryID] [int] IDENTITY(1,1) NOT NULL,
	[CategoryName] [nvarchar](250) NULL,
 CONSTRAINT [PK_CMS_Category] PRIMARY KEY NONCLUSTERED 
(
	[CategoryID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

------------------------

This query results in multiple instances of the same article

 
SELECT 
    nv.NodeID,
	nv.NewsTitle AS Title,
	nv.NewsReleaseDate AS ReleaseDate,
	LTRIM(a.FirstName + ' ' + a.LastName) AS Author,
	nv.NewsSummary AS Summary,
	nv.NewsTeaser AS SmallImagePath ,
	dc.CategoryID AS CategoryId,
	ct.CategoryDisplayName AS CategoryName,
	nv.NodeAliasPath AS ArticleLink
FROM View_CONTENT_News_Joined nv
	JOIN CMS_DocumentCategory dc ON nv.DocumentID = dc.DocumentID
	JOIN CMS_Category ct ON dc.CategoryID = ct.CategoryID
	JOIN custom_Author a ON nv.AuthorID = a.AuthorID
WHERE 
	nv.NodeSiteID = 1
	AND nv.Published = 1 
	AND nv.DocumentCulture = N'en-US' 
	AND nv.NodeAliasPath LIKE N'/Blog/Article/%' 
	AND (ct.CategoryParentID = 8 OR dc.CategoryID = 8)
	AND nv.NewsReleaseDate <= GETDATE()
ORDER BY NodeID, NewsReleaseDate DESC

stepson
Aged Yak Warrior

545 Posts

Posted - 2014-05-02 : 01:41:24

quote:
I only want each article to appear once, regardless of how many categories are assigned to it.

Base on that, you need or not to display Category or display just one Category (any one) ?


dc.CategoryID AS CategoryId,
ct.CategoryDisplayName AS CategoryName,

Also, read the Lamprey's post.

And about the sample, some sample (few rows with dates -real/dummy- ) and the expected output.

sabinWeb MCP

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2014-05-02 : 11:58:27

Without sample data it's much harder to help. Plus your Schema doesn't math your query (CMS_DocumentCategory doesn't have a DocumentID column). I'm reading between the lines and I'm reading that you want to join all those tables together and only have one row per NodeID and that means selecting a random Category... Is that right?

If it were me I'd check performance of doing:
1. A Derived Table on CMS_DocumentCategory (sample below)
3. An OUTER APPLY
3. A Correlated Sub Query

Something like this:

SELECT 
    nv.NodeID,
	nv.NewsTitle AS Title,
	nv.NewsReleaseDate AS ReleaseDate,
	LTRIM(a.FirstName + ' ' + a.LastName) AS Author,
	nv.NewsSummary AS Summary,
	nv.NewsTeaser AS SmallImagePath ,
	dc.CategoryID AS CategoryId,
	ct.CategoryDisplayName AS CategoryName,
	nv.NodeAliasPath AS ArticleLink
FROM 
	View_CONTENT_News_Joined AS nv
JOIN 
	(
		SELECT 
			NodeID,
			MIN(CategoryID) AS DocumentID --??
		FROM
			CMS_DocumentCategory 
		GROUP BY
			NodeID
	) AS dc 
	ON nv.DocumentID = dc.DocumentID
JOIN 
	CMS_Category AS ct 
	ON dc.CategoryID = ct.CategoryID
JOIN 
	custom_Author AS a 
	ON nv.AuthorID = a.AuthorID

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources