Author |
Topic |
lemondash
Posting Yak Master
159 Posts |
Posted - 2013-01-14 : 15:27:23
|
Hello.I have a table where there is one column which unfortunately has the full name and in some cases has the titles or affiliations also included in the column. There are number of different permutations on how its stored in that one column.Whats is the best way to tidy this data up and put each part in is correct field. The main issue i'm having is finding the positions of the spaces because i'm assuming that will help split the values up ? Any help or advice would be great.Examples of dataColumnDr FirstName LastnameMiss FirstName MiddleName LastNameFirstName LastName |
|
jimf
Master Smack Fu Yak Hacker
2875 Posts |
Posted - 2013-01-14 : 17:04:30
|
There may be a best way, but there certainly isn't a good way. You'd have to pull it apart piece by piece, making a lot of assumptions. For example, if there's one space, then it's FirstName,LastName. If there's 2 spaces, then if the first part is in ('Dr','Miss', etc) then that's the title, else it's a first name. Even doing this agonizing process while give inconsistent results. For the name Oscar de la Hoya, how you can program that Oscar is the first name and de la Hoya is the last?JimP.S. Shoot the person that designed this table, I won't tell anyone.Everyday I learn something that somebody else already knew |
|
|
Jeff Moden
Aged Yak Warrior
652 Posts |
Posted - 2013-01-14 : 22:46:52
|
quote: Originally posted by jimfP.S. Shoot the person that designed this table, I won't tell anyone.
I'll tell everyone. "SEE??? This is what happens to you when you design stupid stuff!" --Jeff Moden RBAR is pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row".First step towards the paradigm shift of writing Set Based code:"Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column." When writing schedules, keep the following in mind:"If you want it real bad, that's the way you'll likely get it." |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2013-01-14 : 23:42:20
|
quote: Originally posted by lemondash Hello.I have a table where there is one column which unfortunately has the full name and in some cases has the titles or affiliations also included in the column. There are number of different permutations on how its stored in that one column.Whats is the best way to tidy this data up and put each part in is correct field. The main issue i'm having is finding the positions of the spaces because i'm assuming that will help split the values up ? Any help or advice would be great.Examples of dataColumnDr FirstName LastnameMiss FirstName MiddleName LastNameFirstName LastName
Ask the designer to read on normalisation and first normal form------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
|
|
lemondash
Posting Yak Master
159 Posts |
Posted - 2013-01-15 : 04:33:38
|
I agree with all your comments above but due to his poor DB design this person has been kept many miles away from this data. But it does leave me with the problem of tidying it up.Is there away i could count the spaces with in the column and based on the result split the column up. Using this theory will clean most of the mess up. |
|
|
lemondash
Posting Yak Master
159 Posts |
Posted - 2013-01-15 : 05:15:34
|
This is what i have put together so far.SELECT vContact , PARSENAME(REPLACE(vContact, ' ', '.'), 2) AS Firstname , PARSENAME(REPLACE(vContact, ' ', '.'), 1) AS LastName , CASE WHEN vContact LIKE 'Miss%' THEN 'Miss' WHEN vContact LIKE 'Dr%' THEN 'Dr' WHEN vContact LIKE 'Mrs%' THEN 'Mrs' WHEN vContact LIKE 'Mr%' THEN 'Mr' WHEN vContact LIKE 'Ms%' THEN 'Ms' END AS [S]FROM table1 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2013-01-15 : 05:39:50
|
quote: Originally posted by lemondash I agree with all your comments above but due to his poor DB design this person has been kept many miles away from this data. But it does leave me with the problem of tidying it up.Is there away i could count the spaces with in the column and based on the result split the column up. Using this theory will clean most of the mess up.
see two methods here to split long string based on delimiterhttp://visakhm.blogspot.in/2013/01/delimited-string-split-xml-parsing.htmlhttp://visakhm.blogspot.in/2010/02/parsing-delimited-string.html------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
|
|
jimf
Master Smack Fu Yak Hacker
2875 Posts |
Posted - 2013-01-15 : 07:02:31
|
SELECT LEN(vContact) - LEN(REPLACE(vContact,' ','')) will tell you how many spaces you have.JimEveryday I learn something that somebody else already knew |
|
|
lemondash
Posting Yak Master
159 Posts |
Posted - 2013-01-16 : 04:13:33
|
Thank you for all your help. |
|
|
erikhaselhofer
Starting Member
30 Posts |
Posted - 2013-01-17 : 15:19:22
|
I might be too late but I would propose, and see if it got anywhere, some other ideas.1. Don't fix it. Instead, rename the column, to say lastname, and treat the output differently if the firstname field is NULL. Going forward the interface can handle new accounts correctly and be backwards compatible with the junk. It might cause a few minor problems with reporting, an interface tweak or two, and maybe some work with mailing list but it could be done. You probably won't win this one.2. Add a flag to the user table to the effect of "forceUpdate", set it to yes, and require all users, on login, to update their account. Assuming the interface is being changed to reflect a new database design, you could simply have the users fix the problem. You could win this one.3. Put it in an Excel spreadsheet and toss the problem down the line. Maybe there's someone with some free time on their hands who can just plow it over a week. You might win this one too.I don't know the source of your data, it's purpose, or what you are trying to do going forward, but the odds are very good that a chunk of your data is junk. Even in rock-solid, tight businesses that can often be the case. And given your situation I'd bet you have a lot of junk, but, I could be wrong.So, before I did anything, I'd run some queries to try and estimate how much work I actually had. Sometimes you can make a business case for not fixing the past. |
|
|
erikhaselhofer
Starting Member
30 Posts |
Posted - 2013-01-17 : 15:39:25
|
Upon further thought, trying to tweak the users to update the data could be a bit of a loopy solution as having "Mr. John Jones" show up in a single column could cause people to scratch their heads a bit. You probably wouldn't want to do that in this case. |
|
|
|