Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Unique key algorithm

Author  Topic 

M.E.
Aged Yak Warrior

539 Posts

Posted - 2002-10-11 : 19:19:04
This has become a pain. I have a date time value (16 chars)

for example...
2001100113450134

bueatiful date time hey? This field is unique and it makes a great natural key. However an application being used can only accept a 32 bit integer for it's key. Does anybody have any clue what type of algorithm could be used to take a datetime stamp and turn it into a 32 bit integer. This process should be reversable so that you could come up with that date time from that integer. Any ideas at all?

-----------------------
SQL isn't just a hobby, It's an addiction

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-10-11 : 20:12:07
Well, if you could get by with 1 second accuracy, AND your date ranges fall within a 136-year span, you could've used DateDiff to find the difference in seconds between the date and a base date:

SELECT DateDiff(second, '19000101 00:00:00', CONVERT(datetime, '20011001 13:45:01')) AS IntKeyValue --this is missing the last two digits of the original (34)

You'd probably have to pick a base date close to the average date value, something that would allow for both negative and positive differences, in order to get the most out of the capacity of an int (+/- 2 billion). The problem is that the date value you have seems to go to 1/100th of a second, and it greatly limits the date ranges you can work with (1.3 years)

That's the best I can come up with. Why does the application HAVE TO have a 32 bit integer key? If so, why are you trying to make these values fit into that kind of design.

Go to Top of Page

M.E.
Aged Yak Warrior

539 Posts

Posted - 2002-10-15 : 12:56:17
Requirement of 3rd party app. Stupid thing don't work for a 16 byte and needs 32bit (4byte). I think I've got the coding... I'l post it up here (or in the scripts section) when it's complete

-----------------------
SQL isn't just a hobby, It's an addiction
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-10-15 : 16:42:42
No, Rob's right. If you need to encode a date to a precision of 0.01 seconds then 4 bytes will buy you 497 days. Unless you can exclude a large (and regular) proportion of times, or don't need that much precision, or can live with a hashed value (which won't be reversible) and the chance of a collision, then that's the best you can do. And if you don't believe me, ask Claude -- actually, don't bother, he died in February last year... over 497 days ago.




Edited by - Arnold Fribble on 10/15/2002 16:43:18
Go to Top of Page

M.E.
Aged Yak Warrior

539 Posts

Posted - 2002-10-16 : 11:37:17
Well, after reading this a couple hundred time I finally understand it. I guess the 2 way translation isn't needed, but that doesn't help much... Theres still a chance of collisions if I hash it. I got it down to 46 bits... about 556 billion different combinations seem to be possible.

I guess hashing would be the way to go.. Although I don't particularly enjoy the off chance of collision.

Little bit further explanation (I'm finding it out as I go)... The .01 seconds part is how the record input into this IDMS were made unique. If 5 records were inserted at the same time, one would get .01, then .02, then .03 and so on. Little bit screwy... But I have some identical dates with only the milliseconds are different. I'll have to keep that in consideration when I create a hash formula... no?

If you have anymore info/advice on the subject, I'd be very appriciative to hear it.

-----------------------
SQL isn't just a hobby, It's an addiction
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-10-16 : 12:17:25
Here's some questions about the date values you ACTUALLY have: what are the minimum and maximum date values you have now, and how far into the future could the max date extend? What about the timestamps for this stuff? Does it span a 24-hour clock or is it totally concentrated over business hours, or thereabouts? Same thing about business days, do these dates include weekends?

I have a (sort of) algorithm I'm looking at that just makes it, but it requires some compromises on the range of dates (8 year time period max, 16 hour day max, etc.) and it limits how many of those 1/100 second values you can have...since these don't actually reflect a timestamp they can be safely reduced...I think????

Go to Top of Page

M.E.
Aged Yak Warrior

539 Posts

Posted - 2002-10-16 : 12:36:43
Well, heres what I've got so far. I brok the date down into sets of 2
20 01 10 01 13 45 01 34
These are the maximum digits
Century - 19 or 20. Can be represented with 1 bit (0 or 1)
year - 89 through 20(can be represented with 5 bits. Range of 32)
Month - 12 monthes can be represented by 4 bits
Day - 31 days max... 5 bits
Hour - 24 hours, 5 bits
Minutes - 60 - 6 bits
Second 60 - 6 bits
miliseconds - 1 through 99 = 7 bits.

1+5+4+5+5+6+6+7 = 39 bits


hmm, down to 39 bits now. But after that I'm kinda screwed. I'm wondering if theres some way of combining minutes and seconds (2 6 bit fields) into one with a max of 120 (7 bit). That would help somewhat.

-----------------------
SQL isn't just a hobby, It's an addiction

Edited by - m.e. on 10/16/2002 12:54:39
Go to Top of Page

M.E.
Aged Yak Warrior

539 Posts

Posted - 2002-10-16 : 12:54:01
Little bit more.

Maximum date would be
2020123123595999

Broken to binary

20 = 1
20 = 00000 -what I did here to conserve numbers.. I'll explain at bottom
12 = 1100
31 = 11111
23 = 11001
59 = 111011
59 = 111011
99 = 1100001

What I did for year is 17 = max and 89 (our earliest year) is 11111

all together thats 100000110011111110011110111110111100001
this is 28,184,5049,761

Our minimum is
1989010101000000

19 = 0
99 = 10101
01 = 0001
01 = 00001
01 = 00001
00 = 000000
00 = 000000
00 = 0000000

note, because I went backwqards with the year the lowest number will be 1999 (010101). 1989 would be 011111

combined is 010101000100001000010000000000000000000
180,942,798,848


That leaves a range of 100,902,250,913

Still 25* more then what I need. Getting alot closer though.





-----------------------
SQL isn't just a hobby, It's an addiction
Go to Top of Page

aiken
Aged Yak Warrior

525 Posts

Posted - 2002-10-16 : 14:13:04
You're not going to be able to squeeze minutes and seconds together; to represent every combination of 0-59 for seconds and 0-59 for minutes, you need to be able to express 3600 distinct combinations. 3600 in binary = 111000010000, or 12 bits. If you combine mintues, seconds, and hours, 24*60*60 = 86400 = 10101000110000000, still 17 bits (though *close*).

As Rob pointed out, if you can eliminate some hours from the clock, you can save bits. For instance, if you fit everything into an 18 hour day, 18*60*60 = 64800 = 1111110100100000, or 16 bits, saving 1 bit.

However, why not just use an identity column rather than insisting on a natural key?

-b

Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-10-16 : 15:00:49
quote:
However, why not just use an identity column rather than insisting on a natural key?
Because he is a conscientious and considerate database developer trying to maintain integrity in his data and database design.

Not to beat a dead horse, but that question is EXACTLY what pisses me off about database developers today. They reach for an easy out that not only doesn't solve their problem, but introduces new ones without their knowledge.

Suppose you used an identity column. How do you relate that identity value to the existing datetime value? You'd end up having to store the datetime, which is already unique, and then add ANOTHER unique column with totally arbitrary, meaningless values. Frankly I don't see any logic to that at all, and I don't think that accommodating the limitations of the program are a valid reason either. And without a unique index on the datetime column, nothing would prevent you from inserting the same rows multiple times.

I'm not aiming this at you personally, but that's why identity is such a bad thing: it makes people STOP THINKING, and prevents them from STOPPING AND THINKING.

Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-10-16 : 15:03:37
Claude E Shannon! You [M.E.] don't give up, do you?

Ignoring the years:
quote:

Month - 12 monthes can be represented by 4 bits
Day - 31 days max... 5 bits
Hour - 24 hours, 5 bits
Minutes - 60 - 6 bits
Second 60 - 6 bits
miliseconds - 1 through 99 = 7 bits.


By your reckoning:
4+5+5+6+6+7 = 33 bits
i.e. you've already spent more than 32 bits on one year.
The best you can do is to encode the time as a number of milliseconds which will cost:
SELECT LOG(100.0*60*60*24*366) / LOG(2)
= 31.558 bits
for a leap year.




Edited by - Arnold Fribble on 10/16/2002 15:07:46
Go to Top of Page

M.E.
Aged Yak Warrior

539 Posts

Posted - 2002-10-16 : 15:54:32
Heh, go Rob.

Aiken, the reason is simple. I have 2 tables coming from flat files. These 2 tables only have one thing in common. 2 guesses what that might be. Our entire database here is designed on natural keys (and thats brought up alot of discussion/conflict between the database designers and the developers). No use/need for an identity here.

Arnold.. I gotta ask.. who the $%@!# is claude e shannon? Heh, and no way do I ever give up. Well, I will if someone buys me a .

Nice use of a log function. I kinda forgot that knowledge with the rest of my university math years ago. I'm now thinking turning the minutes,seconds, and milliseconds into just milseconds and storing like that. As for hours, I think I can store it as 4 bit. Great, Now I get to make new calculations. That would be 19 bits for minutes on. + 4 for the hour + 5 for the day and +4 for the month. That would be 28 bits. wait a tick... somethings gotta be wrong with that calculation. Thats what... 32 bits... aww damnit + my 4 bits year... Y'know, never had 4 1's and 0's been such a big pain in the ass

-----------------------
SQL isn't just a hobby, It's an addiction

Edited by - m.e. on 10/16/2002 15:56:46
Go to Top of Page

M.E.
Aged Yak Warrior

539 Posts

Posted - 2002-10-16 : 16:27:20
Aww damnit... Apparently people here work 24/7 and on holidays. I got an entry in here for decemeber 25th at 7 in the morning. Theres a few right around newyears eve too. It's like theres a contest for first entry of the millinium. Geez, I'm suddenly scared of my co-workers

-----------------------
SQL isn't just a hobby, It's an addiction
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-10-16 : 17:02:00
Bell Labs mathematician, juggler and unicyclist. Invented information theory in 1948.


Go to Top of Page
   

- Advertisement -