
 MANY tables (50k+) or MANY rows

Author  Topic 

WastelandWanderer
Starting Member

3 Posts

Posted - 2009-03-19 : 18:26:07
Project Summary:
Create a database (or multiple databases) to handle:
- 100k+ user ids with many columns (200+) of frequently changing data
- very VERY large scale grid-style terrain info that frequently changes
- +/- 25M locations possible, 8M likely
- each location could have many objects (500+)

Typical action by client:
Move to the next location.
Server sends overall general terrain data first (I've got this part).
Server then needs to send specific object data (unknown).

There is a built-in client lag of 10-40 seconds before the results need to be there.

For the questions below, assume that only the first column in a table will ever be used to pull data. That data will be parsed to pull more data from another table using only the first column.

Questions:
1. Is it better to store 250,000 unique users in one table, a few tables split by name ([0-9], [a-m], [n-z]), or many tables ([a-e], [s-t], etc.)? Keep in mind that user data will change frequently (every minute or less for an active user).

Let's take it up a notch:

2. Is it better to have 3,000,000 tables with 9 rows of frequently changing data, or 150,000 tables with 225 rows?

3. Would creating more columns in a table be better (90+ columns)?

Thanks in advance for your input.




So much programming, so few years.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-03-19 : 21:14:22
Briefly:

1. One table, unless you provide a lot more detail on these updates.
2. Neither option is sane, much less reasonable, and nowhere near practical.
3. "Better" depends on the details of the updates.

This sounds like a game or simulation of some kind. If so, I don't think a relational database would be a good storage solution for you, at least not with the requirements you're posting. More details would help.
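To illustrate the "one table" answer to point 1: a single users table with a primary key on the ID handles point lookups and single-row updates without any alphabet-splitting. This is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL; the table and column names are hypothetical.

```python
# One-table design for users: the PRIMARY KEY gives an indexed lookup path,
# so reads and writes touch exactly one row regardless of table size.
# sqlite3 is used here only as a convenient stand-in for MySQL.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT,
        loc  TEXT
    )
""")
cur.executemany(
    "INSERT INTO users (id, name, loc) VALUES (?, ?, ?)",
    [(i, f"user{i}", "00000") for i in range(10_000)],
)

# A frequent update targets one row by key; no other rows are touched.
cur.execute("UPDATE users SET loc = ? WHERE id = ?", ("A1B2C", 1234))

row = cur.execute("SELECT name, loc FROM users WHERE id = ?", (1234,)).fetchone()
print(row)  # ('user1234', 'A1B2C')
```

The same key-based access pattern works whether the table holds ten thousand rows or a million; splitting users across tables by name would only complicate the queries.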

WastelandWanderer
Starting Member

3 Posts

Posted - 2009-03-19 : 23:02:41
Hey, robvolk. Yes, the database will be supporting an online game that's mostly text-based with a very stripped-down, Google Maps-style terrain navigation. Just about all of the calculation of what is needed is done through a PHP interface that is passed values by a JavaScript client. The numbers really ARE that crazy, though I could scrub some at the expense of the size of the "world."

I'm estimating the DB size to be maybe 20 GB after a few months and 50,000 users (should I be that successful).

A user will have the following columns:
id, name, picURL, (stats-combined), LOCation, inventory, terrainMX[0001]...[nnnn], meaning one terrainMX per grid (a minimum of probably 100, and I could compile all of that into just one column)

The stats (25 delimited integers) could update frequently in combat (every 10 seconds); the inventory (large string), LOC (5-char string), and terrainMX (25 x 3-digit delimited hex) could update on average every 30 seconds (depending on whether the character is moving and how fast).

The smallest terrain unit is a "parcel," which contains the only object data. Nine parcels to a block (3x3), 25 blocks max (5x5) per grid (1 sq mile), however many grids I feel like designing per zone, and 4 zones with the option for more.

I know the DB can handle the volume of data storage, and I have done my best so far to condense the data, leaving the client to decode and array the results. My main concern is how quickly the engine could search through, hypothetically, 1 million+ rows to find the one with the ID requested by the user. Then that parcel data may need to get updated (a person enters, leaves, drops an item, picks one up, etc.). If the table locks while 100,000+ users are pulling data and doing the same from the same table, I was thinking I should make more tables instead of more rows.

With the built-in lag time between parcels (3 mph walking speed to 50 mph car), there should be (I'm hoping) enough time to search through the parcel list to get to the data in spite of the locks. I just didn't know whether the engine would respond better to looking up table names or row IDs, so I can snatch the data and be lock-free ASAP.

The expected parcel data return would be 100% of 8 columns, with 1 for the ID:
[id] [squads] [persons] [npcs] [animals] [eqpt] [vehicles] [structures] [misc]
Each column would have a delimited list of the IDs of objects of that class, which I could put a limit on if, for some reason, 4,000 people showed up in a 352' x 352' area.
Those lists would be parsed and looked up in other object data tables to get the object's values and the final result to be DOMed and sent to the client to make sense of it.
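The parcel layout described above can be sketched as follows — a minimal, hypothetical example using sqlite3 as a stand-in for MySQL, where each class column holds a delimited ID list and a "write" touches one column of one row:

```python
# Hypothetical parcel table: one row per parcel, one delimited ID list per
# object class. A parcel write appends or removes a single ID in one column.
# sqlite3 stands in for MySQL here; all names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE parcels (
        id     TEXT PRIMARY KEY,
        squads TEXT, persons    TEXT, npcs TEXT, animals TEXT,
        eqpt   TEXT, vehicles   TEXT, structures TEXT, misc TEXT
    )
""")
cur.execute(
    "INSERT INTO parcels (id, persons, eqpt) VALUES (?, ?, ?)",
    ("A1B2C", "101,205", "77"),
)

def add_person(parcel_id, person_id):
    """A parcel write: append one ID to the delimited persons list."""
    current = cur.execute(
        "SELECT persons FROM parcels WHERE id = ?", (parcel_id,)
    ).fetchone()[0]
    updated = f"{current},{person_id}" if current else str(person_id)
    cur.execute("UPDATE parcels SET persons = ? WHERE id = ?",
                (updated, parcel_id))

add_person("A1B2C", 342)
print(cur.execute(
    "SELECT persons FROM parcels WHERE id = 'A1B2C'").fetchone()[0])
# 101,205,342
```

Note the read-modify-write on the delimited list: in a concurrent setting, that pair of statements would need to run inside a transaction so two simultaneous writers don't clobber each other's appends.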

A parcel data write would be 1 item added to or taken from one of the column lists. So that brings another question:

Does the lock take place the second the engine begins looking through the table to write, or at the point it has found what it needs and is preparing to write?

Oh, and write requests will have to wait in line like read requests.

If using a relational database is not the answer, I really don't know what else would be, and I've certainly never been accused of being sane :) For a little bit of personal trivia, this is my first experience with MySQL, and I'm learning as I go. This is week 2.

No matter what type of system I use, at some point, there's going to be a list of around 150,000 items that will need to be looked through, whether filenames in a directory, tags in XML, or rows in an ID column.

Thanks again for your assistance.



So much programming, so little time.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-03-19 : 23:51:12
The closest database application I can think of to what you're describing is the original TerraServer app:

http://terraserver-usa.com/about.aspx?n=AboutBody

I'd suggest reading through the whitepapers and documents. They're very informative and interesting and may provide ideas on how to implement your game. However, it's important to recognize that TerraServer is NOT a typical relational database application, and while they made it work, it has a lot of big iron behind it.

One thing I can tell you is, if you're expecting your game to be responsive, then physical disk I/O is your enemy, and with the numbers you're posting, there's no DB design I can think of that would be helpful. If you can't fit all or most of your data in RAM at one time, it's likely going to crawl. Like TerraServer, you're going to need a server farm or one frickin' huge honking box with lots o' RAM.

If you do go with the DB anyway, fewer tables are better for you, and here's why: SQL Server stores objects on 8K pages and does I/O on them in 64K extents (8 x 8K pages). The minimum storage used for one table is one page, so if your table has less than 8K of data, the rest of that page is wasted. Packing more rows into fewer tables makes better use of memory and disk.
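The arithmetic makes the point starkly. Using the 8 KB-per-table minimum from the previous paragraph, the per-table overhead alone of the many-table designs from question 2 dwarfs any actual data:

```python
# Minimum page footprint of each design from question 2, before storing any
# real data, assuming an 8 KB minimum allocation per table (SQL Server's
# page size; MySQL internals differ, but the order of magnitude is the point).
PAGE = 8 * 1024  # bytes

designs = {
    "3,000,000 tables x 9 rows":  3_000_000,
    "150,000 tables x 225 rows":  150_000,
    "1 table":                    1,
}
for name, tables in designs.items():
    min_bytes = tables * PAGE
    print(f"{name}: at least {min_bytes / 2**30:.1f} GiB of pages")
```

Three million tables cost roughly 23 GiB of mostly empty pages before a single byte of game data is stored — more than the poster's entire projected database size.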

And unless you need to save/persist user data, like a saved game, I don't see any need for a database at all...your game engine has to work with this data constantly, there's no reason for it to save to disk for every item that changes. I've never designed a game so I could be wrong, but I do know that high-performance game programmers (Carmack, the Blizzard crowd) are ruthless in efficient resource use and eliminating even the slightest bottlenecks. I'd hazard a guess that they never write to disk except to save a game, and the only other write activity is memory paging, which the OS would handle.

Sorry I couldn't be more encouraging. Good luck though, hopefully I'm totally wrong and you make this thing FLY.

edit: I missed the part that said you're using MySQL. So the 8K page of data probably wouldn't apply, I don't know MySQL internals at all. I'm fairly confident the rest will still apply though.

Since you're already formatting/serializing your data as small as possible, if you keep your IDs indexed and minimize the writes, then it may work fine, although I doubt it will be ideal from a programming perspective. Doing a lot of SELECT and UPDATE stuff carries a lot of overhead that regular arrays, mem-maps and structs don't.
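The "keep your IDs indexed" point can be seen directly in a query plan: without an index the engine scans every row to find one ID; with an index it seeks straight to it. A small sketch, again using sqlite3 as a stand-in for MySQL (MySQL's `EXPLAIN` output looks different but tells the same story):

```python
# Show the query plan for an ID lookup before and after adding an index.
# sqlite3 stands in for MySQL; table and index names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE parcels (id TEXT, data TEXT)")  # no index yet
cur.executemany("INSERT INTO parcels VALUES (?, ?)",
                [(f"P{i:05d}", "x") for i in range(1000)])

def plan(sql, *args):
    """Return the engine's plan description for a query."""
    return " ".join(r[-1] for r in cur.execute("EXPLAIN QUERY PLAN " + sql, args))

before = plan("SELECT data FROM parcels WHERE id = ?", "P00500")
cur.execute("CREATE INDEX idx_parcels_id ON parcels (id)")
after = plan("SELECT data FROM parcels WHERE id = ?", "P00500")

print(before)  # a full scan of parcels
print(after)   # a search using idx_parcels_id
```

The scan cost grows with the row count; the index seek stays essentially flat, which is what makes a million-row lookup feasible in the first place.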

WastelandWanderer
Starting Member

3 Posts

Posted - 2009-03-20 : 00:29:36
Thanks again, robvolk. Hmmm. I'll just have to compress the data more. I'm hoping the programmed lag time will slow down server requests to where it's manageable. I am just hard-headed enough to try it a little at a time and see how well it does. At worst, it'll be an interesting experiment. I've seen some BBSs that use SQL databases that are quite large, and the turnaround is very quick, though the numbers are smaller.

Since the game is web-based, I can't rely on the client to have more than, let's say, 100 MB available, and I certainly don't want to use that much. The users' characters will remain active in the game even if they are not present, so it's a persistent environment. The hosting server at the moment is shared, but I plan to migrate to my own server by fall for open beta. I'll post some updates as to how this SQL implementation performs, for inquiring minds. For now, though, I'll go read about TerraServer.

I hope you are wrong too, but it sounds like you really know your stuff, and I tend to be the square peg in the round hole + hammer type. Also, I have no illusions about being the next gaming sensation. I just want to get this idea on screen.

Thanks again for all!





So much programming, so little time.

darkdusky
Aged Yak Warrior

591 Posts

Posted - 2009-03-20 : 05:33:45
Table partitioning would help spread the load of large tables across multiple disks.
http://dev.mysql.com/doc/refman/5.1/en/partitioning-overview.html
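A minimal sketch of what that could look like for the parcel table (names are hypothetical; MySQL 5.1 syntax per the linked manual page):

```sql
-- Hypothetical example: hash-partition the parcel table on its key so rows
-- (and their I/O) are spread across several partitions, which MySQL can
-- place on different disks.
CREATE TABLE parcels (
    id   CHAR(5) NOT NULL,
    data TEXT,
    PRIMARY KEY (id)
)
PARTITION BY KEY (id)
PARTITIONS 8;
```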

enmadrid
Starting Member

5 Posts

Posted - 2009-03-20 : 15:31:38
This sounds awesome! Keep us posted.