Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Ken Blum
Constraint Violating Yak Guru
383 Posts
Posted - 2006-02-14 : 13:31:14

Looking to speed up this dog of a SQL...

Q1 = main query
J1 = inner join on main query
Q2 = query for the TimeIn column of the previous record for the previous module, based on TimeStamp
Q3 = query for the TempIn column of the previous record for the previous module, based on TimeStamp (Q2 and Q3 will return the same record)

SELECT Q1.Module, J1.Description, Q1.TimeStamp AS OutTime, Q1.DataVal AS OutTemp,
       (SELECT TOP 1 TimeStamp
        FROM TraxData Q2
        WHERE Q2.Customer_No = Q1.Customer_No
          AND Q2.TimeStamp <= Q1.TimeStamp
          AND Q2.Module = Q1.Module - 1
        ORDER BY Q2.TimeStamp DESC) AS InTime,
       (SELECT TOP 1 DataVal
        FROM TraxData Q3
        WHERE Q3.Customer_No = Q1.Customer_No
          AND Q3.TimeStamp <= Q1.TimeStamp
          AND Q3.Module = Q1.Module - 1
        ORDER BY Q3.TimeStamp DESC) AS InTemp
FROM TraxData Q1
INNER JOIN TraxMods J1
    ON J1.Customer_No = Q1.Customer_No
    AND J1.Module = Q1.Module
WHERE Q1.Customer_No = 12186
  AND Q1.TimeStamp > '01/01/2006'
  AND Q1.Module = 5
ORDER BY 2, 3

Is there any way I can get the InTime and InTemp fields with one query? They would be coming from the same record. I already have indexes set up for TimeStamp and Module on the TraxData and TraxMods tables.

TIA!
Srinika
Master Smack Fu Yak Hacker
1378 Posts
Posted - 2006-02-14 : 13:45:52

Didn't you try inner joining that Q2 (or Q3)?
Ken Blum
Constraint Violating Yak Guru
383 Posts
Posted - 2006-02-14 : 14:03:52

Do you mean changing the WHERE clause of Q2 and Q3 to a JOIN on Q1? I'll try it, but why would that help? I don't think I understand. Can you show me what you mean?
tkizer
Almighty SQL Goddess
38200 Posts
Posted - 2006-02-14 : 14:21:05

Ken,

In order to help, we'll need the DDL for all tables involved, INSERT INTO statements for sample data, and the expected result set using that sample data. The sample data should contain enough rows to illustrate your problem.

Tara Kizer
aka tduggan
Ken Blum
Constraint Violating Yak Guru
383 Posts
Posted - 2006-02-14 : 15:31:30

OK, I can easily generate the script to create the tables, but how should I easily create the INSERT statements? Do I have to do that by hand? Can I send you an export?
tkizer
Almighty SQL Goddess
38200 Posts
Posted - 2006-02-14 : 15:36:39

It typically is done by hand. When I post questions, I provide between 5 and 10 rows in each of the tables, making sure that the rows can be joined together and that the sample rows illustrate my problem. Also make sure to exclude any columns that are not relevant to your problem so that the DDL and DML are short and simple. We answer these questions for free; sending us an export of your data, analyzing it, and coming up with a solution would not be free.

Tara Kizer
aka tduggan
Ken Blum
Constraint Violating Yak Guru
383 Posts
Posted - 2006-02-14 : 16:27:30

The question is: how do I get the previous timestamp and dataval from the previous module (21) based on the module and timestamp from module 22?

OK, here's the code to create the tables with some very simple data in them... Note the indexes. They are clustered on Customer_No & ChemWatch_ID for lookup/insertion purposes.

CREATE TABLE [dbo].[traxdata] (
    [module] [int] NULL,
    [timestamp] [datetime] NULL,
    [dataval] [numeric](11, 2) NULL,
    [chemwatch_id] [int] NOT NULL,
    [customer_no] [numeric](8, 0) NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[traxmods] (
    [module] [int] NULL,
    [description] [char](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [chemwatch_id] [int] NOT NULL,
    [customer_no] [numeric](8, 0) NOT NULL
) ON [PRIMARY]
GO

CREATE INDEX [IX_traxdata_1] ON [dbo].[traxdata]([customer_no], [module], [timestamp]) ON [PRIMARY]
GO

CREATE INDEX [IX_traxmods_1] ON [dbo].[traxmods]([customer_no], [module]) ON [PRIMARY]
GO

INSERT INTO TraxMods (Module, Description, Customer_No, ChemWatch_ID)
SELECT 22, 'Out Mod 22', 12186, 1 UNION ALL
SELECT 21, 'In Mod 21', 12186, 2
GO

INSERT INTO TraxData (Module, TimeStamp, DataVal, Customer_No, ChemWatch_ID)
SELECT 21, '01/01/2006 10:00', 1000, 12186, 1 UNION ALL
SELECT 21, '01/01/2006 11:00', 1100, 12186, 2 UNION ALL
SELECT 21, '01/01/2006 12:00', 1200, 12186, 3 UNION ALL
SELECT 21, '01/01/2006 13:00', 1300, 12186, 4 UNION ALL
SELECT 21, '01/01/2006 14:00', 1400, 12186, 5
GO

INSERT INTO TraxData (Module, TimeStamp, DataVal, Customer_No, ChemWatch_ID)
SELECT 22, '01/01/2006 10:01', 1001, 12186, 6 UNION ALL
SELECT 22, '01/01/2006 11:01', 1101, 12186, 7 UNION ALL
SELECT 22, '01/01/2006 12:01', 1202, 12186, 8 UNION ALL
SELECT 22, '01/01/2006 13:01', 1303, 12186, 9 UNION ALL
SELECT 22, '01/01/2006 14:01', 1404, 12186, 10
GO

SELECT * FROM TraxData WHERE Module > 20
SELECT * FROM TraxMods WHERE Module > 20

Here are the results I need...

22  Out Mod 22  2006-01-01 10:01:00.000  1001.00  2006-01-01 10:00:00.000  1000.00
22  Out Mod 22  2006-01-01 11:01:00.000  1101.00  2006-01-01 11:00:00.000  1100.00
22  Out Mod 22  2006-01-01 12:01:00.000  1202.00  2006-01-01 12:00:00.000  1200.00
22  Out Mod 22  2006-01-01 13:01:00.000  1303.00  2006-01-01 13:00:00.000  1300.00
22  Out Mod 22  2006-01-01 14:01:00.000  1404.00  2006-01-01 14:00:00.000  1400.00

From this SQL, which, while it works, runs way too slow on a table with 600,000+ records...

DECLARE @OutMod int
SET @OutMod = 22

SELECT Q1.Module, J1.Description, Q1.TimeStamp AS OutTime, Q1.DataVal AS OutTemp,
       (SELECT TOP 1 TimeStamp
        FROM TraxData Q2
        WHERE Q2.Customer_No = Q1.Customer_No
          AND Q2.TimeStamp <= Q1.TimeStamp
          AND Q2.Module = Q1.Module - 1
        ORDER BY Q2.TimeStamp DESC) AS InTime,
       (SELECT TOP 1 DataVal
        FROM TraxData Q3
        WHERE Q3.Customer_No = Q1.Customer_No
          AND Q3.TimeStamp <= Q1.TimeStamp
          AND Q3.Module = Q1.Module - 1
        ORDER BY Q3.TimeStamp DESC) AS InTemp
FROM TraxData Q1
INNER JOIN TraxMods J1
    ON J1.Customer_No = Q1.Customer_No
    AND J1.Module = Q1.Module
WHERE Q1.Customer_No = 12186
  AND Q1.TimeStamp > '01/01/2006'
  AND Q1.Module = @OutMod
ORDER BY 2, 3

Thanks Tara!
jsmith8858
Dr. Cross Join
7423 Posts
Posted - 2006-02-14 : 16:50:08

As always, break it down into steps.

Step 1: get the previous TimeStamp and Module for each TraxData transaction:

select t.*,
       (select Max(TimeStamp)
        from TraxData t2
        where t2.Module + 1 = t.Module
          and t2.Customer_No = t.Customer_No
          and t2.TimeStamp < t.TimeStamp) as PrevTimeStamp,
       t.Module - 1 as PrevModule
from TraxData t

And now, to get the DataVal, you should be able to say:

select a.*, t.DataVal as PreviousVal
from (above SQL) a
left outer join TraxData t
    on t.Customer_No = a.Customer_No
    and t.Module = a.PrevModule
    and t.TimeStamp = a.PrevTimeStamp
where a.Module = @OutMod

Join to other tables and add additional filters in the last step. It should be more efficient than what you have.
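Putting the two steps above together into one runnable statement against the TraxData DDL posted earlier gives something like the following sketch. Note the assumptions: the placeholder "(above SQL)" is expanded into a derived table, and Ken's Customer_No and TimeStamp filters from the original query are pushed into that derived table (the filter placement is an editorial guess, not part of Jeff's post).

```sql
-- Sketch: Jeff's step 1 as a derived table (a), joined back to TraxData
-- to pick up the DataVal belonging to the previous module's latest row.
DECLARE @OutMod int
SET @OutMod = 22

SELECT a.*, t.DataVal AS PreviousVal
FROM
(
    SELECT t.*,
           (SELECT MAX(TimeStamp)
            FROM TraxData t2
            WHERE t2.Customer_No = t.Customer_No
              AND t2.Module + 1 = t.Module
              AND t2.TimeStamp < t.TimeStamp) AS PrevTimeStamp,
           t.Module - 1 AS PrevModule
    FROM TraxData t
    WHERE t.Customer_No = 12186          -- filters pushed in (assumption)
      AND t.Module = @OutMod
      AND t.TimeStamp > '01/01/2006'
) a
LEFT OUTER JOIN TraxData t
    ON t.Customer_No = a.Customer_No
    AND t.Module = a.PrevModule
    AND t.TimeStamp = a.PrevTimeStamp
```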
jsmith8858
Dr. Cross Join
7423 Posts
Posted - 2006-02-14 : 16:50:30

By the way -- thanks for the great DML and DDL! Much appreciated.
tkizer
Almighty SQL Goddess
38200 Posts
Posted - 2006-02-14 : 16:59:50

I don't understand the data, but to show you that I tried:

SELECT m.module, m.[description], d1.[timestamp], d1.dataval, d2.[timestamp], d2.dataval
FROM traxdata d1
INNER JOIN
(
    SELECT customer_no, MIN(chemwatch_id) AS chemwatch_id
    FROM traxdata m
    WHERE module = 22
    GROUP BY customer_no
) t
    ON d1.customer_no = t.customer_no
    AND d1.chemwatch_id = t.chemwatch_id
INNER JOIN traxmods m
    ON d1.module = m.module
INNER JOIN traxdata d2
    ON m.customer_no = d2.customer_no
    AND m.chemwatch_id = d2.chemwatch_id

Mine only gets one row, though, but perhaps it gives you some ideas.

Tara Kizer
aka tduggan
Ken Blum
Constraint Violating Yak Guru
383 Posts
Posted - 2006-02-14 : 17:18:31

Tara - I threw you off by including the ChemWatch_ID column; ignore it. I only included it because the tables are clustered on Customer_No, ChemWatch_ID.

Module 21 is a temperature that measures water coming in. Module 22 is a temperature that measures water going out. I need to know what the last temperature coming in was for each temperature going out. They will never have the same timestamp, because data is only recorded when the temperature fluctuates 10 degrees on output, and 2 degrees on input. So there could be a lot more "in" temp records than "out" temp records.

Thanks a bunch!
tkizer
Almighty SQL Goddess
38200 Posts
Posted - 2006-02-14 : 17:22:01

Does Jeff's solution work for you?

Tara Kizer
aka tduggan
Ken Blum
Constraint Violating Yak Guru
383 Posts
Posted - 2006-02-14 : 17:25:38

JSmith - I executed the first step like this...

DECLARE @OutMod int
SET @OutMod = 5

select t.*,
       (select Max(TimeStamp)
        from TraxData t2
        where t2.Module + 1 = t.Module
          and t2.Customer_No = t.Customer_No
          and t2.TimeStamp < t.TimeStamp) as PrevTimeStamp,
       t.Module - 1 as PrevModule
from TraxData t
where t.Module = @OutMod

24 minutes later and it's still not done. Module 5 has 97,819 rows; Module 4 has 92,374 rows. The table has a total of 678,000 rows. Is it actually looking through all 92,374 Module 4 records for the latest record, once for each of the 97,819 Module 5 records? I would think there would be a more efficient way of doing this.

Thanks for your help!
Ken Blum
Constraint Violating Yak Guru
383 Posts
Posted - 2006-02-14 : 17:41:23

JSmith/Tara - Another question: in Jeff's example, would the filters in step 2 be applied to step 1 when the queries are combined?

Gotta go for now - it's V-Day & the fam is waiting. Will try again tomorrow morning. Thanks again, guys.

Tara - If the last name change is due to marriage, then congratulations! Get off the computer & go spend time with your new family!
Ken Blum
Constraint Violating Yak Guru
383 Posts
Posted - 2006-02-15 : 08:57:39

I'm back! That step 1 SQL took 4:45 hours. Obviously I would never generate it for the 97,000+ rows, but even generating it for a month takes a long time. Looking at the execution plan, 84% was dedicated to a process titled Index Spool/Eager Spool, which claims that it creates a temporary index, and had a row count of over 988 million. Ouch!

As you can see from the DDL, I have an index on the TraxData table which combines the Customer_No, Module, and TimeStamp columns. Shouldn't that be sufficient? I am going to run the Index Tuning Wizard on it now.
jsmith8858
Dr. Cross Join
7423 Posts
Posted - 2006-02-15 : 09:13:42

First off -- aren't you running this for one module, for a certain period of time? How many rows are there for that? Never test a SQL statement by just running it for all of your rows (especially if you have millions) -- run it for a realistic amount based on what the final query will need to do. For example, if you are writing a report that multiplies units by rate to come up with an Amount, and you want to test it, you don't run it for every row of data in your table, right? And that final report probably won't run for every row in your table, right? They both will be filtered.

Even though I said take it in steps and labelled the first as "step 1", it doesn't mean that you should execute that SQL for all of your rows -- how could you test it and verify the results? Run it for a small sample, check the results, make sure it works. Then plug it into the next SQL statement, again with proper filters, and run them together. Use QA to help you tweak the query and see how it affects the execution plan.

What is the primary key of these tables, by the way? I see indexes -- are those unique indexes?

Finally, one more thing you can do is to adjust the inner select (step 1) like this:

select t.*,
       (select Max(TimeStamp)
        from TraxData t2
        where t2.Module = t.Module - 1
          and t2.Customer_No = t.Customer_No
          and t2.TimeStamp < t.TimeStamp) as PrevTimeStamp,
       t.Module - 1 as PrevModule
from TraxData t

Note the subtle change -- this may allow for better index use on the table in the correlated subquery.

Either way, with your sample data and DDL, my example showed a big performance increase over what you started with.
Ken Blum
Constraint Violating Yak Guru
383 Posts
Posted - 2006-02-15 : 10:56:47

Thanks Jeff, & I totally agree with you. However, I was curious to see the worst-case scenario, and since I was leaving the office I thought I would just see what happens.

I just ran the following SQL, which would be a "real-world" situation: one module for the current month. Module 5 has 5,102 rows and module 4 has 3,589 rows for February. It took over 4 minutes until I got rid of that Eager Spool thing in the execution plan by setting the WHERE clause filter order to correlate with the order of the columns in the composite index. Now it runs in 3 minutes, and the execution plan says 92% is an Index Seek on my composite index. I did not know that the order of the filters was relevant when attempting to have the SQL use a composite index. Now I do!

select t.*,
       (select Max(TimeStamp)
        from TraxData t2
        where t2.Customer_No = t.Customer_No
          and t2.Module = t.Module - 1
          and t2.TimeStamp < t.TimeStamp) as PrevTimeStamp,
       t.Module - 1 as PrevModule
from TraxData t
where t.Module = 5
  and t.Customer_No = 12186
  and t.TimeStamp > '02/01/2006'

So I ran the Index Tuning Wizard on this query and it did not give me any new recommendations. It appears to me that I have hit my limit in optimizing this query. I think the base problem is in the data, where module 4 is sending too much data -- sometimes every 10 seconds. I'm thinking of creating views which will produce readings every 15 minutes, or maybe only where the value of the reading changes by 5-10 degrees, and then running the SQL off of the views. Would you agree with this approach?

I really appreciate your time and expertise. Basically I wanted to make sure that I wasn't missing something obvious to a SQL expert. Thanks again to both you and Tara.
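For reference, a view along the lines described above -- thinning TraxData to at most one reading per 15-minute bucket -- could be sketched as below. This is an editorial sketch against the posted DDL, not code from the thread; the view name and the DATEDIFF-based bucketing (integer division of minutes since a base date, which works on SQL Server 2000) are assumptions about how the thinning would be done.

```sql
-- Sketch: keep only the latest reading per customer/module/15-minute bucket.
CREATE VIEW dbo.TraxData15Min
AS
SELECT d.Customer_No, d.Module, d.TimeStamp, d.DataVal
FROM dbo.TraxData d
INNER JOIN
(
    SELECT Customer_No, Module,
           MAX(TimeStamp) AS TimeStamp
    FROM dbo.TraxData
    GROUP BY Customer_No, Module,
             -- 15-minute bucket number, counted from an arbitrary base date
             DATEDIFF(minute, '20000101', TimeStamp) / 15
) b
    ON b.Customer_No = d.Customer_No
    AND b.Module = d.Module
    AND b.TimeStamp = d.TimeStamp
```

Running the previous-reading query against a view like this trades precision (the "previous" reading can be up to 15 minutes stale) for a large reduction in the number of module 4 rows the correlated subquery has to scan.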