 Adding a new node to a replication set


Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2011-07-01 : 07:07:35
(The title is probably hopelessly misleading.) Also, this is a *long* problem description with a quite simple question at the end. :)

Hi everyone. Looking for some high-level DBA help here.

I'm designing an (OLAP-like) system with a pretty horrible set of performance requirements:

1) A rolling set of roughly 250 million quotes (airline flight quotes) -- about 10 days' worth.

2) Near real-time updates (merges, really) -- from quote discovery we get new quotes constantly (millions per day) that have to be incorporated into the mart. (A mix of updates and inserts -- using MERGE seems to be a good solution.)
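For reference, the kind of MERGE I'm imagining looks like this. The table and column names follow the query further down, and #NewQuotes / [LastSeen] are made up for the sketch, so treat it as an illustration rather than the actual DDL:

```sql
-- Upsert a batch of freshly discovered quotes into the fact table.
-- The join is on the natural key (all the dimension FKs plus Stops).
MERGE tStar.tbQuotesStar AS tgt
USING #NewQuotes AS src
    ON  tgt.[FK_dimFromAirportID]   = src.[FK_dimFromAirportID]
    AND tgt.[FK_dimToAirportID]     = src.[FK_dimToAirportID]
    AND tgt.[FK_dimAirlineID]       = src.[FK_dimAirlineID]
    AND tgt.[FK_dimWebsiteID]       = src.[FK_dimWebsiteID]
    AND tgt.[Stops]                 = src.[Stops]
    AND tgt.[FK_dimDepartureDateID] = src.[FK_dimDepartureDateID]
    AND tgt.[FK_dimReturnDateID]    = src.[FK_dimReturnDateID]
WHEN MATCHED THEN
    -- Quote already known: refresh its price and last-seen timestamp.
    UPDATE SET tgt.[Quote] = src.[Quote], tgt.[LastSeen] = src.[LastSeen]
WHEN NOT MATCHED BY TARGET THEN
    -- Brand new quote: insert it.
    INSERT ([FK_dimFromAirportID], [FK_dimToAirportID], [FK_dimAirlineID],
            [FK_dimWebsiteID], [Stops], [FK_dimDepartureDateID],
            [FK_dimReturnDateID], [Quote], [LastSeen])
    VALUES (src.[FK_dimFromAirportID], src.[FK_dimToAirportID], src.[FK_dimAirlineID],
            src.[FK_dimWebsiteID], src.[Stops], src.[FK_dimDepartureDateID],
            src.[FK_dimReturnDateID], src.[Quote], src.[LastSeen]);
```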

3) Aggregation in any dimension. (A quote is from an airport, to an airport, on a departure date, with a return date, flown by a carrier, sold by a website, with a set number of hops.) The *natural* key for a quote would be: FromAirport | ToAirport | Carrier | Website | Stops | DepartureDate | ReturnDate

Queries will be in the form of a filter set (which will be a number of different table-valued parameters) and a request for SETS of facets, rolled up with the minimum quote price and the number of quotes that match the criteria.

It's really hard to explain in words so here is some SQL to illustrate a sample query:

SELECT
fa.[AirportName] AS [From Airport]
, ta.[CityName] AS [To City]
, a.[AirlineName] AS [Airlines]
, d.[DepartureYearMonth] AS [Departure]
, s.[Stops] AS [Stops]
, COUNT_BIG(*) AS [Facet Counts]
, MIN(s.[Quote]) AS [Min Quote]
FROM
tStar.tbQuotesStar WITH (NOLOCK) AS s
JOIN tStar.tbDimFromAirports AS fa ON fa.[PK_AirportID] = s.[FK_dimFromAirportID]
JOIN tStar.tbDimToAirports AS ta ON ta.[PK_AirportID] = s.[FK_dimToAirportID]
JOIN tStar.tbDimDepartureDate AS d ON d.[PK_DepartureDateID] = s.[FK_dimDepartureDateID]
JOIN tStar.tbDimAirlines AS a ON a.[PK_AirlineID] = s.[FK_dimAirlineID]
WHERE
ta.[CityID] IN ('GLAS')
AND fa.[CountryID] IN ('UZ')
AND d.[DepartureYear] IN ( 2011 )
GROUP BY GROUPING SETS (
( fa.[AirportName] )
, ( d.[DepartureYearMonth] )
, ( ta.[CityName] )
, ( a.[AirlineName] )
, ( s.[Stops] )
-- , ( fa.[AirportName], ta.[CityName], a.[AirlineName] )
, ()
)
ORDER BY
[From Airport]
, [To City]
, [Airlines]
, [Departure]
, [Stops]


Run against a simplified star with 250 million (fake, random) quotes, we get this result set:


From Airport  To City  Airlines                         Departure  Stops  Facet Counts  Min Quote
------------  -------  -------------------------------  ---------  -----  ------------  ---------
NULL          NULL     NULL                             NULL       NULL   5             13.8477
NULL          NULL     NULL                             NULL       0      1             13.8477
NULL          NULL     NULL                             NULL       1      3             152.538
NULL          NULL     NULL                             NULL       2      1             266.4022
NULL          NULL     NULL                             1109       NULL   1             892.3342
NULL          NULL     NULL                             1111       NULL   1             1511.6661
NULL          NULL     NULL                             1112       NULL   3             13.8477
NULL          NULL     AEROMOST-KHARKOV                 NULL       NULL   1             1511.6661
NULL          NULL     Air Wisconsin                    NULL       NULL   1             892.3342
NULL          NULL     Jet4you                          NULL       NULL   1             152.538
NULL          NULL     Valuair                          NULL       NULL   1             13.8477
NULL          NULL     Winair/Windward Islands Airways  NULL       NULL   1             266.4022
NULL          Glasgow  NULL                             NULL       NULL   5             13.8477
Andizhan      NULL     NULL                             NULL       NULL   2             152.538
Fergana       NULL     NULL                             NULL       NULL   1             1511.6661
Samarkand     NULL     NULL                             NULL       NULL   1             266.4022
Tashkent      NULL     NULL                             NULL       NULL   1             13.8477

I can read off the cheapest quote and the number of quotes matching filters for each group. GROUPING SETS are amazing!
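(One wrinkle worth noting for anyone reading along: a NULL in that output could in principle be a real NULL in the data rather than a rollup placeholder. GROUPING() disambiguates the two -- a trimmed-down sketch against the same star:)

```sql
-- GROUPING(col) returns 1 when the NULL was produced by the rollup,
-- 0 when it came from the data itself.
SELECT
  fa.[AirportName]            AS [From Airport]
, GROUPING(fa.[AirportName])  AS [Is Rollup Row]
, COUNT_BIG(*)                AS [Facet Counts]
FROM tStar.tbQuotesStar AS s
JOIN tStar.tbDimFromAirports AS fa ON fa.[PK_AirportID] = s.[FK_dimFromAirportID]
GROUP BY GROUPING SETS ( ( fa.[AirportName] ), () );
```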

4) All queries need to come back in < 200 ms.

5) The solution needs to scale horizontally. We are anticipating very high concurrency for requests and we need to be able to add more capacity quickly.

-- This leads me to my actual question :)

For number (5) I'm thinking of having a pool of machines built from commodity hardware (2 SSDs plugged into the PCIe lanes, with a decent commodity CPU and RAM). I've seen demonstrations that get ~800 MB/s throughput on random reads, which I'm going to need (I can't organise the data because of the NRT requirement; this has to be an online star, so the quotes will be more or less randomly scattered around the disks).

SSDs fail. Commodity hardware fails.

So I want redundancy through sheer number of units, and the ability to add more on demand / replace failed ones while disks are being reinstalled / whatever.

The real question........ :)
If I have these 'slaves' in some sort of transactional replication scheme taking changes from a master store (which would be a traditional active/passive cluster), how do I plug a new machine in and get it spooled up with the complete 250-million-quote set? Do I have to have a snapshot publication ready to go on demand *as well as* the transactional publications?
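From what I've read, initializing a new transactional subscriber doesn't have to come from a snapshot -- @sync_type on sp_addsubscription can point at a backup instead. Something like this is what I imagine (the publication, subscriber, database, and backup path are all invented for the sketch):

```sql
-- Run at the publisher, on the publication database.
-- @sync_type = 'initialize with backup' tells replication to start
-- delivering commands from the point of the given backup, instead of
-- waiting for a snapshot to be generated and applied.
EXEC sp_addsubscription
    @publication       = N'QuotesStarPub'
,   @subscriber        = N'NEWSLAVE01'
,   @destination_db    = N'QuotesMart'
,   @subscription_type = N'push'
,   @sync_type         = N'initialize with backup'
,   @backupdevicetype  = N'disk'
,   @backupdevicename  = N'\\fileshare\backups\QuotesMart.bak';
```

Is that the right mechanism, or is there a better way to bring a fresh slave online?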

Is there such a thing as replication multicasting?

I don't think I've described this very well so I hope you understand what I'm asking.

Help!
Charlie.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
   
