Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
yonabout
Posting Yak Master
112 Posts

Posted - 2005-12-13 : 07:18:19
I'm developing a report application with a VB front end. Part of it allows the users to create their own reports (kind of like MS Query, but specific to my company) without having to worry about joins etc., and part of it allows them to run their 'Standard' reports, most of which are stupidly complicated.

There are 20 standards, and all of them are fine except 3, which are based on the same selection. They all join on to a particular table which is currently around a million rows big, but which will grow to many times that. A 'Select * from' statement on this table takes around 20 seconds to return the data. Each of the reports joins onto a bunch of other tables as well.

The reports are written as stored procedures so they can accept variables passed in from the VB front end. As I've had to use ##tables to store the data so it can be accessed by VB recordsets, to get around the problem of users sharing the tables I've made them unique by appending usernames and timestamps using dynamic SQL within the procedures. The reports each generate 8 slightly different selections based on the variables input into the procedure.

When these reports are run, they take about 2 - 3 minutes on a good day, and up to 10 if the server's running slow.

So far I've run the Profiler and analysed the traces when running these reports, but the indexing suggestions only make a 1 percent increase in performance. I've tried separating the 8 sub-queries out, in case the server is trying to parse everything instead of just the one it needs. I've done this by writing an individual procedure for each query and calling sub-procedures for each one, but the performance only improves by about 3 - 4 seconds.

I've also thought about aliasing the tables, so instead of joining onto this massive table, they just join on to a selection of it based on the user's inputs. I think this would work, but I'm still going to have the same problem if anyone uses parameters which select the entire table anyway.

Any ideas?

Cheers,
Yonabout
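The "usernames and timestamps" approach described above might be sketched roughly as follows. This is not the poster's actual code - the table name, column list and variable names are invented for illustration, and because the name is concatenated into dynamic SQL, @UserID would need to be restricted to safe characters:

```sql
-- Hypothetical sketch of building a user-unique ##table name and creating it
-- via dynamic SQL. All identifiers here are illustrative, not from the thread.
DECLARE @TableName sysname
DECLARE @Sql       nvarchar(4000)

-- e.g. ##TransSel_jbloggs_20051213071819
SET @TableName = '##TransSel_' + LTRIM(RTRIM(@UserID)) + '_'
    + REPLACE(REPLACE(REPLACE(CONVERT(char(19), GETDATE(), 120), '-', ''), ':', ''), ' ', '')

SET @Sql = 'CREATE TABLE [' + @TableName + '] '
         + '(Personuid int NOT NULL, Conts decimal(10,2) NULL)'
EXEC (@Sql)
```

The timestamp portion makes the name unique per run, and the username makes it recognisable for cleanup; the VB front end can then open a recordset against the generated name.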
Kristen
Test
22859 Posts

Posted - 2005-12-13 : 08:38:28
"A 'Select * from' statement on this table takes around 20 seconds to return the data."That's constrained by the time to physically shift the data from the server to your machine, and probably display it too, so don't worry about the "20 seconds" for that one.Please post the DDL of the tables in question, and if possible a few INSERT statements, a sample query [as constructed by your application] and then we can easily chuck that into SQL Server and "have a go" - and then make some fantastic suggestions which will cause your query to run in merely a few milliseconds Kristen |
yonabout
Posting Yak Master
112 Posts

Posted - 2005-12-13 : 10:48:28
OK, thanks. This is a bit of a monster, so I'll post this - if you need more, please let me know. I'm guessing you can't attach text files, so here's the DDL of the largest table (from the execution plan, this is where 99% of the resources are getting used):

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[InvestmentHistory]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[InvestmentHistory]
GO

CREATE TABLE [dbo].[InvestmentHistory] (
    [ParentInvestmentUID] [int] NOT NULL ,
    [DateReceived] [datetime] NOT NULL ,
    [ContributionTypeCatID] [int] NOT NULL ,
    [ContributionReasonCatID] [int] NOT NULL ,
    [TransactionTypeCatID] [int] NOT NULL ,
    [TransactionEventCatID] [int] NOT NULL ,
    [Amount] [money] NULL ,
    [AmountCurrency] [char] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DateInvested] [datetime] NULL ,
    [Units] [decimal](18, 6) NULL ,
    [IsTxUID] [int] NULL ,
    [RowIdentity] [int] IDENTITY (1, 1) NOT NULL ,
    [WFCreatorID] [int] NOT NULL ,
    [CreatorStepID] [int] NOT NULL ,
    [WFOwnerID] [int] NULL ,
    [OwnerStepID] [int] NULL ,
    [CreationStamp] [datetime] NULL ,
    [RecType] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [AccountTransactionUID] [int] NULL ,
    [TaxDate] [datetime] NULL
) ON [PRIMARY]
GO

This is what I've done in the procedures. The user is passing in variables from the front end - a from and to date, and a contribution reason, transaction type and transaction event. (The user selects these as text strings, but they get joined to a code table within the query as they are integers in the database.)

The dates are used to get the selection based on either the DateReceived or DateInvested in the table. The other 3 parameters determine the rest of the selection. Because there are 3 parameters, there are 8 permutations of query, i.e.:

1 1 1
1 1 0
1 0 1
1 0 0
0 1 1
0 1 0
0 0 1
0 0 0

There are 2 other parameters that get passed in, but these are to deal with the dynamic SQL and creating user-unique ##tables. I'm going to give you an undynamic version, as the speed is pretty much the same either way. Here we go.

Pass the parameters in and create the #table:

@TempPersonTable char (50),
@UserID char (20),
@FromDate char (50),
@ToDate char (50),
@TransType char (50),
AS
Begin
begin
    /* Create a temporary table to hold the raw data extracted according to the criteria. */
    Create table [dbo].[#TransactionSelection] (
        Personuid int, NINO char(9), EERef char(20), surname char(40), inits char(5),
        Fund char(40), RecDate datetime, InvDate datetime, TransType char(50),
        TransEvent char(50), ContSource char(50), ContType char(50), Conts decimal(10,2),
        PayCentre char(100), Location char(100), Scheme char(100), Benefit char(100),
        FundManager char(100)
    )
end

Next there are a raft of IF statements, one for each of the 8 permutations I mentioned above. I've included the first couple so you can see my logic:

IF @Transtype <> 'All' and @TransEvent <> 'All' and @ContSrc <> 'All'
begin
    Statement1
end

IF @Transtype = 'All' and @TransEvent <> 'All' and @ContSrc <> 'All'
begin
    Statement2
end

etc. etc.

Finally, here is a typical statement inserting data into the #table (there are 8 of these, varying slightly dependent on the IF statements). I'm not convinced that this is the best way to do it, but in my defence, the user can select 'All' as one of the inputs. As 'All' doesn't actually exist in the database, I had to make sure they could select specific values or everything - I couldn't just say where TransEvent = @this and ContSrc = @that, because if @this and @that were 'All' they don't exist.

Insert into #TransactionSelection (Personuid, NINO, EERef, surname, inits, Fund, RecDate, InvDate, TransType, TransEvent, ContSource, ContType, Conts, PayCentre, Location, Scheme, Benefit, FundManager)
select p.Personuid, p.NationalIDNumber, e.employeereference, p.Surname, p.Initials, i.name,
       ih.datereceived, ih.dateinvested,
       d.longdesc as TransactionType, d1.longdesc as TransactionEvent,
       d2.longdesc as ContributionSource, d3.longdesc as ContributionType,
       isnull(sum(ih.amount), 0.00) as Conts,
       pc.name, l.name, s.name, b.name, fm.name
from person p
    inner join employee e on p.personuid = e.personuid
    inner join schememember sm on e.employeeuid = sm.employeeuid
    left join location l on e.employeeuid = l.parentlocationuid
    left join paycentre pc on e.employeeuid = pc.parentpaycentreuid
    inner join scheme s on sm.schemeuid = s.schemeuid
    inner join memberbenefit mb on sm.memberuid = mb.memberuid
    inner join memberinvestment mi on mb.memberbenefituid = mi.memberbenefituid
    inner join benefit b on mb.benefituid = b.benefituid
    inner join investment i on mi.investmentuid = i.investmentuid
    inner join fundmanager fm on i.fundmanageruid = fm.fundmanageruid
    inner join investmenthistory ih on mi.memberinvestmentuid = ih.parentinvestmentuid
    inner join description d on ih.transactiontypecatid = d.parentuid
    inner join description d1 on ih.transactioneventcatid = d1.parentuid
    inner join description d2 on ih.contributionreasoncatid = d2.parentuid
    inner join description d3 on ih.contributiontypecatid = d3.parentuid

There is another join which improved the performance of the query. When the user selects the people they want to run the report across, the application generates a temporary table of UIDs associated with those people. Prior to this, I was selecting data for everyone, then cutting it down later. I've included it for completeness, but you can ignore it:

    inner join ' + @TempPersonTable + ' on p.personuid = ' + @TempPersonTable + '.personuid

This query is for where 'All' has NOT been selected as an option, therefore the values in the variables are valid within the database:

where (dbo.FuncRealDate(ih.datereceived) >= convert(datetime, @fromdate, 103)
   and dbo.FuncRealDate(ih.datereceived) <= convert(datetime, @todate, 103))
  and d.longdesc in (@transtype)
  and ltrim(rtrim(d1.longdesc)) in (@transevent)
  and ltrim(rtrim(d2.longdesc)) in (@contSrc)

FuncRealDate is a function to strip timestamps from dates in the database, because if a user wanted data <= 13/12/2005, a timestamped date like 13/12/2005 00:00:01 is classed as being after the 13th and isn't picked up.

group by p.Personuid, p.NationalIDNumber, e.employeereference, p.Surname, p.Initials,
         pc.name, l.name, s.name, b.name, i.name, fm.name,
         ih.datereceived, ih.dateinvested,
         d.longdesc, d1.longdesc, d2.longdesc, d3.longdesc

There is other stuff in the procedure - there's a bit that pivots the data into something less normalised so the users can read it - but the section I've just laid out is the bit that's killing it.

Thanks for your time.
Yonabout
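As an aside, the eight IF permutations can often be collapsed into a single statement with "catch-all" predicates. This is not from the thread - just a sketch reusing the post's column and variable names:

```sql
-- One WHERE clause replaces the 8 branches: each filter collapses to TRUE
-- when the corresponding parameter is 'All', so 'All' never has to exist
-- as a value in the description table.
where (d.longdesc               = @transtype  or @transtype  = 'All')
  and (ltrim(rtrim(d1.longdesc)) = @transevent or @transevent = 'All')
  and (ltrim(rtrim(d2.longdesc)) = @contSrc    or @contSrc    = 'All')
```

The trade-off is that one cached plan must then serve all eight shapes of the query, which can perform worse than separate sub-procedures that each get their own plan; it mainly buys maintainability, not speed.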
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts

Posted - 2005-12-13 : 11:16:47
| Why the "group by"...you're not aggregating data are you?The use of the "funcrealdate" would be also a killer re using any suitable index.There are alternatives....search here (or mindsdoor.net) for examples (and discussions as to when/why not to use functions in SQL for performance) |
yonabout
Posting Yak Master
112 Posts

Posted - 2005-12-13 : 11:30:58
The group by is for the isnull(sum(ih.amount), 0.00) as Conts bit.

I know that user functions can cripple things (because they aren't transactional?), but it was running like a dog before I added that bit - it was a recent addition in response to user comments (other than the ones threatening to kill me unless I speed it up!).
Kristen
Test
22859 Posts

Posted - 2005-12-13 : 14:20:31
"Create table [dbo]. [#TransactionSelection] ... "Put a PRIMARY KEY on it (often makes a surprising difference to performance on #Temp Tables). Set any columns to NOT NULL that you can (helps the optimiser). For "best practice" put a COLLATE on all the CHAR/VARCHAR columns [just in case you run it on a server/TEMPDB with weird collation in the future]"IF @Transtype <> 'All' and @TransEvent <> 'All' and @ContSrc <> 'All'beginStatement1end"I would propose that each of these "Statement1" chappies is an EXEC to a separate Sproc. That way each child Sproc can get its own Query Plan that is not dependant on the execution path - I have no idea if this is important, but I've had good results with it in the past."from dbo.person p inner join dbo.employee e on p.personuid = e.personuid ...Put the OWNER on each table - supposedly increases the chance of the query plan being cached."where (dbo.FuncRealDate(ih.datereceived) >= convert(datetime, @fromdate, 103) and dbo.FuncRealDate(ih.datereceived) <= convert(datetime, @todate, 103))"Make @fromdate and @todate into datatype "datetime" so that there is no CONVERT in the WHERE. Shouldn't make a difference as it should be optimised out - but if the optimiser misses it it WILL make a difference. (If you need a CHAR parameter to the Sproc use a "working variable" within the Sproc to hold the converted values as a datetime, and use that in the WHERE clause)There is no such thing as a third parameter in CONVERTing to a datetime, so if you need to enforce the style of the date, in the conversion, use SET DATEFORMAT xxx...... 
CONVERT(datetime, @fromdate) ..."and d.longdesc in (@transtype)and ltrim(rtrim(d1.longdesc)) in ( @transevent) and ltrim(rtrim(d2.longdesc)) in ( @contSrc)"You can't use a variable in this construction to represent a comma delimited list of values - SQL is just going to do an EQUALS, so you might as well use "=" to give the optimiser a potentially better "route"(If you NEED to have the query work with a comma delimited set of values for any of these three parameters ask again and we can point you at how to "split" CSV variables).You only provided the DDL for the [InvestmentHistory] table, so I don't know what the characteristics ofd.longdesc, d1.longdesc, d2.longdesc, d3.longdescare, but they look fairly big, and thus probably "CPU intensive", in a Group By. You might want to consider just storing enough PK information [and the SUM] in your Temporary Table so that you could then JOIN it back to the "details" tables to get the final output data (which might also reduce the number of tables you need in your initial query, as the WHERE only covers ih, d, d1 and d2 - and that appears to make about half the JOIN'ed tables redundant (unless they are there to enforce some referential integrity)Finally, make sure that BOTH sides of each JOIN have indexes (ideally clustered indexes) ... so:from person p inner join employee e on p.personuid = e.personuid needs indexes onp.personuidand e.personuidto help the optimiser.Kristen |
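Applied to the procedure in this thread, the temp-table and date-parameter suggestions above might look roughly like this. A sketch only - the column list is abbreviated and the key choice is illustrative; the real key must be a column set that is genuinely unique per row of the result:

```sql
-- Temp table with NOT NULL, COLLATE and a PRIMARY KEY, per the advice above.
-- The PK is left unnamed so concurrent sessions don't clash on the constraint name.
Create table #TransactionSelection (
    Personuid  int NOT NULL,
    RecDate    datetime NOT NULL,
    TransType  char(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    Conts      decimal(10,2) NOT NULL,
    PRIMARY KEY (Personuid, RecDate, TransType)  -- illustrative key only
)

-- Convert the CHAR parameters once into "working variables", then use the
-- plain datetime variables in the WHERE clause with no CONVERT around them.
DECLARE @dtFrom datetime, @dtTo datetime
SET DATEFORMAT dmy
SET @dtFrom = CONVERT(datetime, LTRIM(RTRIM(@FromDate)))
SET @dtTo   = CONVERT(datetime, LTRIM(RTRIM(@ToDate)))
```

With the dates held as datetime variables, the comparison in the WHERE is a straight column-to-variable test, which gives the optimiser its best chance of using an index on the date column.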
yonabout
Posting Yak Master
112 Posts

Posted - 2005-12-14 : 04:58:36
Cheers,

I have tried splitting the statements out into individual sprocs before, but I've never set the temp tables up with keys or constraints, and I never knew that putting the owner in front of the table name could make a difference, so I'll see what happens.

Thanks for your time,
Yonabout