Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
dmiller
Starting Member
6 Posts |
Posted - 2011-12-24 : 01:18:36
|
Hi forum,New poster but I have read a lot from here will certainly try and give back what I have learned as I can!I am pretty studied up on normalization...although the more complicated aspects of it kind of make my head spin (BNCF seems like something simple to explain that has been severely overly complicated)Anyway, to my question. I am working on database to store legislative records and the votes on bills. For example, Congress has all these bills and at least 538 potential votes on any bill. (435 in house, 100 in Senate). There can be dozens of votes on any given bill and thousands of bills per year. The system I have now has like 15 tables capturing all sorts of goodies about the bills, from the actual text to a brief history in the life of the measure. The vote tables, as they are laid out now, include one votesMaster table which produces a unique, auto inc key, and other vital information such as the date, tally, description, etc. The voteID is used in two other tables - votesHouse and votesSenate - one table for each chamber. I also have 2 tables - infoSenators and infoReps - which list every member in both chambers, also with an auto inc p key. Now for the heart of my question: I am trying to figure out the best way to lay out the database which holds the actual roll calls - how everyone voted. I have kicked the ideas around in my head so many times and I am getting nowhere, and as Google did not shine any light on the subject, so I thought I would ask the wisdom of the intertubes...Should I lay the table out as RollID | VoteID | 1 | 2 | 3 | 4 | 5 ... n1 | 1 | y | n | y | y | y ... Where each column name refers to the senatorID or repID? Remember this is not a fixed number of legislators. Next year I might need to add 50 new names and 50 new columns. This seems to violate all sorts of principals of not making lists out of columns (eg order1, order2, etc). Plus it is complicated by abstracting out the name of the legislator.Option 2 - Similar, but use actual names...RollID | VoteID | Yeager | Sununu | Reid | Someguy ...1 | 1 | y | y | y | y you get the idea... and then make the name (maybe the year and the name) as the primary key in the senators and reps tables, thereby removing the abstraction. Same problems, but more readable. Option 3 - Treat each vote as an individual entry:RollID | VoteID | LegID | Vote1 | 1 | 1 | Y et cetera.This looks a lot nicer. The Vote is an enum of about 6 poss values. But could this get too big? We are talking a LOT of entries - being as I want to go back like 15 years with this database and leave room for the future. A rough calc reveals we are talking about 25 million entries AT LEAST! for this sort of thing. This seems to follow the rules a lot better, and if everything is done right then it should still be pretty fast to operate.... However, keeping all legislators in a row would cut down on the number of entries by a factor of about 500. What does the board think, and does anyone have any thoughts on a better way to do this? Thanks a bunch! |
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2011-12-24 : 02:58:17
|
i would go for certainly option 3 as the number of voters is not certain and keeps on changingI would keep a separate table for Voters with fields VoterID,VoterName,Active and any other attributes that can be associated with a Voter. and VoterID will be linked via FK to this table from your RollCall table. If Voter is expelled, I can even represent that in Voters table by changing Active bit to 0.In case you dont need a seperate table you can link VoterID from RollCall to votesHouse and votesSenate tables themselves but in that case you have to have two id columns and link them via FK to reelevant tables to maintain referential integrity and in addition you should have a check constraint to make sure at least one of them is NOT NULL------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
|
|
dmiller
Starting Member
6 Posts |
Posted - 2011-12-24 : 03:51:13
|
Great! Honestly, when I actually wrote out the three schemes I did immediately see the benefits of using option 3. I was worried about the size of the db but when i calc'd it all up, it looks like it will only come out to be around 54 MB. Easily within modern db engine capabilities.The active voter idea is good, actually. This is another thorny issue. I need to know what year each legislator is active. I thought about a low and high range, or specific date...Perhaps an additional table which lists the year/session, voterID, active as boolean, to determine who a roll call list is referencing when a name is given (2011-smith-james or 2011-smith-james-jr). It's either do that or put a low and high date (took office, left office) column in the voters table. |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2011-12-24 : 04:08:28
|
quote: Originally posted by dmiller Great! Honestly, when I actually wrote out the three schemes I did immediately see the benefits of using option 3. I was worried about the size of the db but when i calc'd it all up, it looks like it will only come out to be around 54 MB. Easily within modern db engine capabilities.The active voter idea is good, actually. This is another thorny issue. I need to know what year each legislator is active. I thought about a low and high range, or specific date...Perhaps an additional table which lists the year/session, voterID, active as boolean, to determine who a roll call list is referencing when a name is given (2011-smith-james or 2011-smith-james-jr). It's either do that or put a low and high date (took office, left office) column in the voters table.
Ok one question hereAre you interested in retrospective analysis of when all a particular senator was inactive or are you interested only in current status?------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
|
|
dmiller
Starting Member
6 Posts |
Posted - 2011-12-24 : 13:28:02
|
Retrospective primarily. I'll be parsing the source web files going backwards about 10 years. It also needs to be able to go forward without much difficulty - say I start to parse the bill information now, in the middle of this particular Congress's life, then every bill gathered which has not been passed and made into settled law can have future votes and amendments, etc. So it needs to be progressive as well to pick up midway through a session, even going so far as to gather more information in real time or on a daily basis. That make sense? |
|
|
|
|
|
|
|