(Title is probably hopelessly misleading.) Also, this is a *long* problem description with a quite simple question at the end. :)

Hi everyone. Looking for some high-level DBA help here.

I'm designing an (OLAP-like) system with a pretty horrible performance umbrella requirement:

1) A rolling set of roughly 250 million quotes (airline flight quotes) -- about 10 days' worth.

2) Near-real-time updates (merges, really). From quote discovery we get new quotes constantly (millions per day) that have to be incorporated into the mart. (A mix of updates and inserts -- using MERGE seems to be a good solution.)

3) Aggregation on any dimension. (A quote is from an airport, to an airport, on a departure date, with a return date, flown by a carrier, sold by a website, with a set number of hops.) The *natural* key for a quote would be:

    FromAirport | ToAirport | Carrier | Website | Stops | DepartureDate | ReturnDate

Queries will take the form of a filter set (passed in as a number of different table-valued parameters) plus a request for SETS of facets, rolled up with the minimum quote price and the number of quotes matching that criteria.

It's really hard to explain in words, so here is some SQL to illustrate a sample query:

SELECT    fa.[AirportName]        AS [From Airport]
        , ta.[CityName]           AS [To City]
        , a.[AirlineName]         AS [Airlines]
        , d.[DepartureYearMonth]  AS [Departure]
        , s.[Stops]               AS [Stops]
        , COUNT_BIG(*)            AS [Facet Counts]
        , MIN(s.[Quote])          AS [Min Quote]
FROM      tStar.tbQuotesStar          AS s  WITH (NOLOCK)
JOIN      tStar.tbDimFromAirports     AS fa ON fa.[PK_AirportID]      = s.[FK_dimFromAirportID]
JOIN      tStar.tbDimToAirports       AS ta ON ta.[PK_AirportID]      = s.[FK_dimToAirportID]
JOIN      tStar.tbDimDepartureDate    AS d  ON d.[PK_DepartureDateID] = s.[FK_dimDepartureDateID]
JOIN      tStar.tbDimAirlines         AS a  ON a.[PK_AirlineID]       = s.[FK_dimAirlineID]
WHERE     ta.[CityID]        IN ('GLAS')
AND       fa.[CountryID]     IN ('UZ')
AND       d.[DepartureYear]  IN ( 2011 )
GROUP BY GROUPING SETS
        (
          ( fa.[AirportName] )
        , ( d.[DepartureYearMonth] )
        , ( ta.[CityName] )
        , ( a.[AirlineName] )
        , ( s.[Stops] )
     -- , ( fa.[AirportName], ta.[CityName], a.[AirlineName] )
        , ()
        )
ORDER BY  [From Airport]
        , [To City]
        , [Airlines]
        , [Departure]
        , [Stops]
Run against a simplified star with 250 million (fake, random) quotes, we get this result set:

From Airport  To City  Airlines                         Departure  Stops  Facet Counts  Min Quote
------------  -------  -------------------------------  ---------  -----  ------------  ---------
NULL          NULL     NULL                             NULL       NULL   5             13.8477
NULL          NULL     NULL                             NULL       0      1             13.8477
NULL          NULL     NULL                             NULL       1      3             152.538
NULL          NULL     NULL                             NULL       2      1             266.4022
NULL          NULL     NULL                             1109       NULL   1             892.3342
NULL          NULL     NULL                             1111       NULL   1             1511.6661
NULL          NULL     NULL                             1112       NULL   3             13.8477
NULL          NULL     AEROMOST-KHARKOV                 NULL       NULL   1             1511.6661
NULL          NULL     Air Wisconsin                    NULL       NULL   1             892.3342
NULL          NULL     Jet4you                          NULL       NULL   1             152.538
NULL          NULL     Valuair                          NULL       NULL   1             13.8477
NULL          NULL     Winair/Windward Islands Airways  NULL       NULL   1             266.4022
NULL          Glasgow  NULL                             NULL       NULL   5             13.8477
Andizhan      NULL     NULL                             NULL       NULL   2             152.538
Fergana       NULL     NULL                             NULL       NULL   1             1511.6661
Samarkand     NULL     NULL                             NULL       NULL   1             266.4022
Tashkent      NULL     NULL                             NULL       NULL   1             13.8477
I can read off the cheapest quote and the number of quotes matching the filters for each group. GROUPING SETS are amazing!

4) All queries need to come back in < 200 ms.

5) The solution needs to scale horizontally. We are anticipating very high concurrency for requests, and we need to be able to add more capacity quickly.

-- This leads me to my actual question. :)

For (5) I'm thinking of having a pool of machines on commodity hardware: two SSDs plugged into the PCIe lanes, with a decent commodity CPU and RAM. I've seen demonstrations that get ~800 MB/s throughput on random reads, which I'm going to need. (I can't organise the data on disk because of the NRT requirement; this has to be an online star, so the quotes will be more or less randomly scattered around the disks.)

SSDs fail. Commodity hardware fails. So I want redundancy through sheer numbers of units, plus the ability to add more on demand / replace the ones that fail while disks are being reinstalled / whatever.

The real question........ :)

If I have these 'slaves' in some sort of transactional replication scheme taking changes from a master store (which would be a traditional active/passive cluster), how do I plug a new machine in and get it spooled up with the complete 250-million-quote set? Do I have to have a snapshot replication publication ready to go on demand *as well as* the transactional publications?

Is there such a thing as replication multicasting?

I don't think I've described this very well, so I hope you understand what I'm asking.

Help!

Charlie.
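For reference, the merge in (2) would be something along these lines. This is only a sketch: @NewQuotes (a staging TVP of freshly discovered quotes) and the Website/ReturnDate FK columns are my guesses at the schema, not the real thing.

```sql
-- Sketch of the NRT merge in requirement (2).  @NewQuotes and the
-- [FK_dimWebsiteID]/[FK_dimReturnDateID] columns are assumptions.
MERGE tStar.tbQuotesStar AS tgt
USING @NewQuotes AS src
   ON tgt.[FK_dimFromAirportID]   = src.[FK_dimFromAirportID]
  AND tgt.[FK_dimToAirportID]     = src.[FK_dimToAirportID]
  AND tgt.[FK_dimAirlineID]       = src.[FK_dimAirlineID]
  AND tgt.[FK_dimWebsiteID]       = src.[FK_dimWebsiteID]
  AND tgt.[Stops]                 = src.[Stops]
  AND tgt.[FK_dimDepartureDateID] = src.[FK_dimDepartureDateID]
  AND tgt.[FK_dimReturnDateID]    = src.[FK_dimReturnDateID]
WHEN MATCHED THEN
    -- An existing quote for this natural key: take the new price.
    UPDATE SET tgt.[Quote] = src.[Quote]
WHEN NOT MATCHED BY TARGET THEN
    -- A brand-new quote: insert it into the fact table.
    INSERT ( [FK_dimFromAirportID], [FK_dimToAirportID], [FK_dimAirlineID]
           , [FK_dimWebsiteID], [Stops], [FK_dimDepartureDateID]
           , [FK_dimReturnDateID], [Quote] )
    VALUES ( src.[FK_dimFromAirportID], src.[FK_dimToAirportID], src.[FK_dimAirlineID]
           , src.[FK_dimWebsiteID], src.[Stops], src.[FK_dimDepartureDateID]
           , src.[FK_dimReturnDateID], src.[Quote] );
```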
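On the spool-up question: the only alternative I'm aware of to keeping a snapshot publication warm is initialising the subscription from a restored backup. If I understand it correctly that would look roughly like the following -- publication, server, and path names are all placeholders:

```sql
-- What I *think* spooling up a new slave without a snapshot looks like:
-- restore a recent full backup on the new box, then add the subscription
-- with 'initialize with backup' so the distributor only replays the delta.
-- All names below are placeholders, not a working configuration.
EXEC sp_addsubscription
      @publication       = N'QuotesStarPub'
    , @subscriber        = N'NEWSLAVE01'
    , @destination_db    = N'QuotesStar'
    , @subscription_type = N'push'
    , @sync_type         = N'initialize with backup'
    , @backupdevicetype  = N'disk'
    , @backupdevicename  = N'\\backups\QuotesStar.bak';
```

Is that the intended approach, or is there something better for standing up many identical read-only subscribers?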