Author |
Topic |
rtpnc
Starting Member
19 Posts |
Posted - 2007-09-11 : 14:36:09
|
Well, we have run accross a tab delimited file that has multiple fields spread through 4 lines of data. Just wondering if someone could give me an example of how to extract data from each line. Or if this is even possible with dts.01000ALAMANCE 2005CR 058991PRIDE,TERMONE,O'BRIAN 716 DURHAM ST BURLINGTON NC272170000BM19870506*****3514 20070122WW 20070119 TPBM3514O-000 03012212FBREAKING AND OR ENTERING (F) 14-54(A) 2212FBREAKING AND OR ENTERING (F) 14-54(A) 20050927 03022356FLARCENY AFTER BREAK/ENTER 14-72(B)(2) 2356FLARCENY AFTER BREAK/ENTER 14-72(B)(2) 20050927 04LLOYD,ROGER SFF Thank you. |
|
rmiao
Master Smack Fu Yak Hacker
7266 Posts |
Posted - 2007-09-12 : 23:51:22
|
You can specify proper end of line character. |
|
|
rtpnc
Starting Member
19 Posts |
Posted - 2007-09-14 : 10:16:28
|
I'm sorry but could you help me out a little more. Each end of line has a crlf, but how deos this help?Explain more detail please.Thank you. |
|
|
rmiao
Master Smack Fu Yak Hacker
7266 Posts |
Posted - 2007-09-15 : 00:40:10
|
Which tool do you use to load data from file? |
|
|
rtpnc
Starting Member
19 Posts |
Posted - 2007-09-17 : 16:31:53
|
Data Transformation Services for SQL 2000 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2007-09-17 : 16:40:05
|
quote: multiple fields spread through 4 lines of data
Could you explain this a little bit more?Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/ |
|
|
rtpnc
Starting Member
19 Posts |
Posted - 2007-09-18 : 10:05:30
|
quote: Originally posted by tkizer
quote: multiple fields spread through 4 lines of data
Could you explain this a little bit more?Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/
Sure,Line1 contains fields we need to extractLine2 contains fields we need to extractLine3 contains fields we need to extractLine4 contains fields we need to extractAll of these fields extracted should be one record in the database.example:ALAMANCE 2005 CR 05899103012212 BREAKING AND ENTERING 14-54In the database we would want:ALAMANCE 2005 CR 058991 03012212 BREAKING AND ENTERING 14-54 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2007-09-18 : 12:08:13
|
If the data is across multiple lines, then it's not going to be easy using SSIS for this. I think you'll need to write a custom utility to do this.Can't you just fix the file though? Who is generating this weird format?Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/ |
|
|
X002548
Not Just a Number
15586 Posts |
|
rtpnc
Starting Member
19 Posts |
Posted - 2007-09-18 : 15:23:23
|
Does it really matter if its SQL 2K or 2K5. We have both but if it helps to use one or the other please let me know.Cant really edit in notepad, there can be thousands of records each day.Now come to find out the data can span from 2 lines up to 6 line depending on the type of data it is.If the record begins with 01 its type1 if the record begins with type 02 its another type of layout all together.Anymore ideas? |
|
|
X002548
Not Just a Number
15586 Posts |
|
rtpnc
Starting Member
19 Posts |
Posted - 2007-09-18 : 16:00:30
|
Here is some more information:Because the amount of data per case could exceed the maximum sizeallowed for a record, the data has been grouped into six categories: case, alias, offense, witness, probation, sentencing,and special conditions information. Each data type has its own record. Every case will have a case record, one or more offense records, and most likely a witness record. There may or may not be any alias records attached to the case.The maximum record length is 394. The first field in each record is the record type, which indicates what type of information is contained in the recordRecord type code (present in each record):00 = CASE DELETE01 = CASE DATA02 = ALIAS DATA03 = OFFENSE DATA04 = WITNESS DATA05 = JUDGMENT DATA06 = SPECIAL CONDITIONS99 = TRAILER (END-OF-FILE) RECORDStart Position Ending Position Length Description3 5 3 County Code6 17 12 County Name18 21 4 Case Year22 24 3 Case TypeCR = Criminal District CourtCRS = Criminal Superior Court25 30 6 Case SequenceDATA RECORD LAYOUT FOR CASE DATA (‘01’ Record Type):Start Position Ending Position Length Description3 5 3 County Code6 17 12 County Name18 21 4 Case Year22 24 3 Case TypeCR = Criminal District CourtCRS = Criminal Superior Court25 30 6 Case Sequence31 119 88 Case Description120 120 1 Race Code121 121 1 Sex Code122 129 8 Defendant Date of Birth/Age: DOB = CCYY****Age = Right justified, zero filled130 134 5 Filler135 138 4 Last 4 digits of Social Security Number (Implemented 11/01/2004)139 163 25 Defendant Driver’s License Number164 165 2 Defendant Driver’s License State166 173 8 Citation Number174 181 8 Trial Date: CCYYMMDD182 182 1 District Court Case Process Type183 183 1 Superior Court Case Process Type184 193 10 Defendant State Identifier194 194 1 Defendant In-Jail Indicator195 202 8 Date Case Served on Defendant: CCYYMMDD203 209 7 Check Digit Number210 224 15 LID (Local Identification) Number225 233 9 ORI (Originating Agency Routing Identifier) Number234 235 2 Case Office Location Code236 236 1 Domestic Violence Charged Flag ‘Y’ indicates that that the defendant was charged with domestic violence)237 237 1 Victims Rights Act flag (‘Y’ indicates that this is a VRA case)DATA RECORD LAYOUT FOR ALIAS DATA (‘02’ Record Type):Start Position Ending Position Length Description3 30 28 Defendant Alias Name: Last Name(Comma)First Name(Comma)Middle Name(Comma)SuffixDATA RECORD LAYOUT FOR OFFENSE DATA (‘03’ Record Type):Start Position Ending Position Length Description3 4 2 Offense Sequence Number:01 – 49 District Court Case Offense51 – 99 Superior Court Case Offense5 8 4 Charged Offense Code9 9 1 Charged Offense Type:F = FelonyM = MisdemeanorT = Traffic MisdemeanorI = Infractions10 54 45 Charged Offense Description55 69 15 Charged Offense Statute Number70 73 4 Arraigned Offense Code74 74 1 Arraigned Offense Type:F = FelonyM = MisdemeanorT = Traffic MisdemeanorI = Infractions75 119 45 Arraigned Offense Description120 134 15 Arraigned Offense Statute Number135 138 4 Convicted Offense Code139 139 1 Convicted Offense Type:F = FelonyM = MisdemeanorT = Traffic MisdemeanorI = Infractions140 184 45 Convicted Offense Description185 199 15 Convicted Offense Statute Number200 207 8 Called and Failed Date: CCYYMMDD208 215 8 Failure to Appear Date: CCYYMMDD216 223 8 Order for Arrest Date: CCYYMMDD224 225 2 Convicted Offense Class226 227 2 Plea Code228 229 2 Verdict Code230 231 2 Method of Disposition Code232 239 8 Offense Disposition Date: CCYYMMDD240 244 5 Fine Amount (Whole Dollars)245 251 7 Court Costs Amount (Dollars (.) Cents)252 260 9 Restitution Amount (Dollars (.) Cents)261 261 1 Monies Paid Indicator262 269 8 Monies To-Be-Paid Date: CCYYMMDD270 277 8 Non-Motor Vehicle Fail-To-Comply Date: CCYYMMDD278 285 8 Show-Cause Order Date: CCYYMMDD286 293 8 Probation Violation Date: CCYYMMDD294 301 8 Motor-Vehicle Failure-To-Comply Date: CCYYMMDD302 360 59 Special Condition361 368 8 Charged Offense Date: CCYYMMDD369 369 1 Blood Alcohol >= .16370 371 2 Prior Points372 374 3 Judge Initials375 377 3 ADA initials378 378 1 Domestic Violence Convicted Flag (‘Y’ indicates that the defendant was convicted of domestic violence).DATA RECORD LAYOUT FOR WITNESS DATA (‘04’ Record Type):Start Position Ending Position Length Description3 30 28 Complainant Witness Name: Last Name(Comma)First Name(Comma)Middle Name(Comma)Suffix31 33 3 Agency CodeDATA RECORD LAYOUT FOR JUDGMENT DATA (‘05’ Record Type):Start Position Ending Position Length Description3 4 2 Offense Sequence Number:01 - 49 District Court Case Offense51 - 99 Superior Court Case Offense5 17 13 Consolidated for Judgment – File Number 18 19 2 Consolidated for Judgment – Offense Number20 22 3 Sentence Length Minimum23 23 1 Sentence Length Minimum Frame24 26 3 Sentence Length Maximum27 27 1 Sentence Length Maximum Frame28 28 1 Sentence Type29 31 3 Probation Length32 32 1 Probation Length Frame33 33 1 Supervised Probation Flag34 37 4 Credit for Time Served38 38 1 Consecutive Sentencing Indicator39 42 4 Community Service Length43 45 3 Community Service to be Completed Within46 46 1 Assessment47 54 8 Community Service License Revocation55 62 8 Community Service License Revocation Compliance63 63 1 Enhanced Firearm Penalty64 64 1 Sentencing Service Plan65 68 4 Special Probation Length69 69 1 Special Probation Active Term Frame70 70 1 Custody of71 71 1 Impact Program72 121 50 Program Name122 125 4 Residential Program Length126 126 1 Residential Program Length Frame127 130 4 Electronic Monitoring/House Arrest131 131 1 Electronic Monitoring/House Arrest Frame132 135 4 Intensive Probation Supervision136 139 4 Day Reporting Center140 140 1 Day Reporting Center Frame141 141 1 Special Conditions for Sex Offender142 142 1 Drug Treatment CourtDATA RECORD LAYOUT FOR SPECIAL CONDITION DATA (‘06’ Record Type):Start Position Ending Position Length Description3 4 2 Offense Sequence Number:01 - 49 District Court Case Offense51 - 99 Superior Court Case Offense5 63 59 Special Condition 264 122 59 Special Condition 3123 181 59 Special Condition 4TAILER RECORD LAYOUT (‘99’ Record Type):Start Position Ending Position Length Description3 15 13 Count of Total records contained in file (excluding the trailer)16 28 13 Count of Delete/Expunge records contained in file29 41 13 Count of Case records contained in file42 54 13 Count of Alias records contained in file55 67 13 Count of Offense records contained in file68 80 13 Count of Witness records contained in file81 93 13 Count of Judgment records contained in file94 106 13 Count of Special Condition records contained in fileAm I missing something here? |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2007-09-18 : 16:06:46
|
This reminds me of Mainframe processingIs there a key that joins all of the "records" together?Make the mainframe programmers put the data in relational formatI mean, you could do it, but I'm suspect about you wanting to "combine" the data into 1 rowI would load each type to it's own table to start, then I would just join on the keyBrett8-)Hint: Want your questions answered fast? Follow the direction in this linkhttp://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspxAdd yourself!http://www.frappr.com/sqlteam |
|
|
rtpnc
Starting Member
19 Posts |
Posted - 2007-09-18 : 16:22:25
|
X002548,Thanks, I know the key field that ties the record together but it is only on one line. I dont even know how to start to "I would load each type to it's own table to start, then I would just join on the key"Got any more tips? |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2007-09-18 : 16:29:50
|
IOf the "key" is only on 1 record type, howdo you plan to tie the data together?Oh, let me guess, the file is in sequential order...I'm right aren't IIf that's true, then it's problematic for any database.If the file is really coming from a mainframe, they have to format the data for you, using FIFo processing with break logic to make sure the data stays together.If you load the data, the rows my load out of order (although I've never seen), but it is possibleBrett8-)Hint: Want your questions answered fast? Follow the direction in this linkhttp://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspxAdd yourself!http://www.frappr.com/sqlteam |
|
|
cat_jesus
Aged Yak Warrior
547 Posts |
Posted - 2007-09-20 : 10:51:06
|
quote: Originally posted by X002548 IOf the "key" is only on 1 record type, howdo you plan to tie the data together?Oh, let me guess, the file is in sequential order...I'm right aren't IIf that's true, then it's problematic for any database.
This is actually pretty easy to do as long as the first record in the series has the key value. What you do is load the data in order into a table with an auto numbered key, with a clustered index on that column. This is to ensure that the data is actually stored in sequential order. Then you add another column for your natural key that ties all your segmented records together. Then you run an update similar to this.declare @key as varchar(25)update raw_tableset naturalKey = @key = case when recordtype = '00' then SourceKeyfield else @key endI haven't checked the syntax but you should be able to get the idea. As you run down the table(and you will do it sequentially because of the clustered index) when you hit the 00 record you will update your variable and set the natural key for all the records that belong together. Then you can set abut writing your stored procedures to split all the different record types into their individual tables and normalize the data, or port it.I've done this type of thing dozens of times(with police and court data too) and once you get it, the rest falls into place very easily. |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2007-09-20 : 10:54:22
|
quote: Originally posted by cat_jesusThis is actually pretty easy to do as long as the first recordin the series has the key value. What you do is load the data in order into a table with an auto numbered key, with a clustered index on that column.This is to ensure that the data is actually stored in sequential order.
There is no guarentee that it will load in the right order. The clustered index would need to be on a key, which they don't have.So your assertions are falseBrett8-)Hint: Want your questions answered fast? Follow the direction in this linkhttp://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspxAdd yourself!http://www.frappr.com/sqlteam |
|
|
cat_jesus
Aged Yak Warrior
547 Posts |
Posted - 2007-09-20 : 10:56:51
|
BCP will load it in the order it comes in.But, let's say you're very worried that BCP won't load it in order(which it does). It's a simple matter to put a record number on the input file before you load it into SQL Server and then you just use that field to put your clustered index on. What's fun is de-normalizing a database and putting it into one of those mainframe formats. Bleh. |
|
|
Van
Constraint Violating Yak Guru
462 Posts |
Posted - 2007-09-20 : 11:19:03
|
There's a piece of software called Monarch that should be able to handle this. You can take a report (not a report with one record per line, but like a report with one record per page with fields spread all over the page) and define each field where it's at on the page. You have to specify what signals a new record and all. Then it takes each defined record, frome fields spread out on the page, and puts it into a row. You can then export the records to a txt, xls, (bunch of different formats) file to be imported to SQL Server. Monarch can be ran via cmd prompt once you get you report and templates and all set up. I used to use it with Access alot years ago. A file would come automatically, click a button in Access to do the Monarch part and create a .txt file with one row per line, then click a button in Access to import the file. That could all be set up with DTS/SSIS too. It'd be an extra step to deal with these types of files, but it will get them into a format you can import. |
|
|
cat_jesus
Aged Yak Warrior
547 Posts |
Posted - 2007-09-20 : 11:34:16
|
My experience with monarch is that it drops records and/or fields. If the data needs to go into SQL server then my experience is that it's best to put it in SQL server first, then write SPs to split the data out into the tables they need to be in.YMMVMonarch... Oh the horrors I've seen created by Monarch.Seriously though, if you aren't too worried about data integrity and it's a one off job, then Monarch may be fine. But if it's going to be a regular production process, stay away from tools like Monarch. |
|
|
Van
Constraint Violating Yak Guru
462 Posts |
Posted - 2007-09-20 : 11:53:20
|
I've saw that happen with monarch but it was because the model wasn't set up right. Trust me, I hate monarch and would much rather put it in sql server and deal with it there, but sometimes that isn't an option depending on the report/txt file. |
|
|
Next Page
|