SQL XML optimization - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

Database Design and Application Architecture

SQL XML optimization

Author

Topic

Maleks
Starting Member

10 Posts

Posted - 2008-10-28 : 06:26:27

Hey there,

I've recently come across a bit of a problem using stored XML in SQL for an application. We needed a way to store differing amounts of form fields so settled on XML.

The XML follows the form:
<details>
<detail mapping="reference" label="Invoice Number">
<value>5675678</value>
</detail>
<detail label="Other number">
<value />
</detail>
<detail mapping="name" label="Name">
<value>Test Testor</value>
</detail>
<detail mapping="address" label="Address">
<value>12345 Test St</value>
</detail>
<detail mapping="postcode" label="Postcode">
<value>987 123</value>
</detail>
<detail label="Telephone Number">
<value>9876789</value>
</detail>
<detail mapping="email" label="Email">
<value>test@test.com</value>
</detail>
</details>

The problem naturally arises with very slow query times when cherry picking values from the XML with .value and also using these in where clauses.

The typical .value statement looks something like:
Details.value('(/details/detail[@mapping="email"]/value)[1]', 'varchar(255)')

I've done a bunch of research and found and implemented the following:

- Typing the XML (i.e using a schema). Resulted in speed increase
- XML index on primary and secondary on path. Resulted in speed increase
- Creating computed columns (based on functions returning values from the XML) and indexing these columns. Obviously querying these columns instead of using .value . Resulted in a dramatic decrease in speed
- Reducing calls to .value evaluation by including a sub query in the where clause and then refering to its results. Resulted in no noticable shift in speed

Some things which may increase speed but can't do:
- Using less generic node structure. So instead of <detail mapping="email"> it would be <email>
- Shifting XPath clauses [] to the end of the evaluations

This has the potential to be quite a useful discussion so please chime in and help out if you can.

Thanks,
Alex

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-10-28 : 07:58:50

XML queries can be faster when using CROSS APPLY.

Try this

DECLARE	@xml XML

SET	@xml =	'<details>
			<detail mapping="reference" label="Invoice Number">
				<value>5675678</value>
			</detail>
			<detail label="Other number">
				<value />
			</detail>
			<detail mapping="name" label="Name">
				<value>Test Testor</value>
			</detail>
			<detail mapping="address" label="Address">
				<value>12345 Test St</value>
			</detail>
			<detail mapping="postcode" label="Postcode">
				<value>987 123</value>
			</detail>
			<detail label="Telephone Number">
				<value>9876789</value>
			</detail>
			<detail mapping="email" label="Email">
				<value>test@test.com</value>
			</detail>
		</details>'

SELECT		d.a.value('@mapping', 'nvarchar(max)') AS Mapping,
		d.a.value('@label', 'nvarchar(max)') AS Label,
		v.e.value('.', 'nvarchar(max)') AS Value
FROM		@xml.nodes('/details/detail') AS d(a)
CROSS APPLY	d.a.nodes('value') AS v(e)


Mapping		Label			Value
--------	----------------	-------------
reference	Invoice Number		5675678
NULL		Other number		
name		Name			Test Testor
address		Address			12345 Test St
postcode	Postcode		987 123
NULL		Telephone Number	9876789
email		Email			test@test.com

E 12°55'05.63"
N 56°04'39.26"

Maleks
Starting Member

10 Posts

Posted - 2008-10-28 : 08:45:18

Cross apply looks interesting. Thanks.

I realise I should probably expand on the table and query details.

The table looks like:

Payment
----------------
Id INT,
Date SMALLDATETIME,
User_Id INT,
Amount DECIMAL(18,2),
Details XML

The query as is looks like:

SELECT
Payment.Id,
[Date],
User.Email,
[User_Id],
Details.value('(/details/detail[@mapping="email"]/value)[1]', 'varchar(255)') AS Details_Email,
Details.value('(/details/detail[@mapping="name"]/value)[1]', 'varchar(255)') AS Details_Name,
Details.value('(/details/detail[@mapping="reference"]/value)[1]', 'varchar(255)') AS Details_Reference,
Amount
FROM
Payment LEFT OUTER JOIN
User ON Payment.User_Id = User.Id
WHERE
[Date] >= '01/01/1900'
AND [Date] < '01/02/2079'
AND Payment.Deleted = 0
AND (
Payment.Details.value('(/details/detail[@mapping="email"]/value)[1]', 'varchar(255)') LIKE '%est%'
OR Payment.Details.value('(/details/detail[@mapping="name"]/value)[1]', 'varchar(255)') LIKE '%est%'
OR Payment.Details.value('(/details/detail[@mapping="reference"]/value)[1]', 'varchar(255)') LIKE '%est%'
)
ORDER BY
Details.value('(/details/detail[@mapping="name"]/value)[1]', 'varchar(255)') DESC

Your help is much appreciated

LoztInSpace
Aged Yak Warrior

940 Posts

Posted - 2008-10-29 : 07:59:22

Generally these generic solutions are always a bit crap though. It's not really how relational databases are designed to work.
Did you consider storing the values in named, indexed columns instead of XML? If you have a static structure then this is the way to go. If not then maybe try having a bunch of different string/int/date columns and put your XML values in there. You can then use your XML schema/meta data to work out which columns refer to which tags in your XML and form a query accordingly.

Maleks
Starting Member

10 Posts

Posted - 2008-10-29 : 08:18:25

Yeah it's definitely not an ideal solution. We used it because of the age old problem that the data collected from implementation to implementation may well vary.

The added hitch is that this system is used across multiple applications and multiple sites so major restructuring in the DB and business layer will be one heck of a headache.

The solution we have settled on:

Creating computed columns for key bits of data based on functions which return a specific value from the XML using something like:

Details.value('(/details/detail[@mapping="email"]/value)[1]', 'varchar(255)')

These computed columns are then indexed and persisted. They are never updated.

Does anyone know of any major drawbacks to this method?

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources