StrataFrame Forum

How to parse alphanumeric value?

http://forum.strataframe.net/Topic27445.aspx

By Edhy Rijo - 6/15/2010

I have a field which may have alphanumeric values like this:



AMDNY-0001253

AMDNJ-0011253

000001253

XXHAM 1253





I need to extract the numeric part to do some calculations and then reassemble the alpha and numeric parts. Is there a handy .NET function that would help with this, instead of looping backwards through the value to pick out only the numeric part (e.g. 1253 in the examples above)?
By Greg McGuffey - 6/15/2010

Regex! It is a bit of a pain to learn, but it rocks at solving problems like this. You use a regex pattern to match parts of an input string.

This would match the first two examples.



([A-Z]*)-([0-9]*)



This would match the third example:



(0*)([0-9]*)



This would match the last example:



([A-Z]*) ([0-9]*)



This would match either of the first two or the last one:



([A-Z]*)(-| )([0-9]*)





The parentheses capture the matched text, so in each of these examples you can access the prefix separately from the suffix. For example, with the first pattern and the input AMDNY-0001253, you could pull out AMDNY and 0001253 separately.
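As a rough illustration of those capture groups, here is a minimal sketch that runs the combined pattern over the four sample values. The ^ and $ anchors, the module name and the Describe helper are assumptions added for the example, not part of the patterns above; without the anchors, a pattern built only from * quantifiers can also match an empty string.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CaptureDemo
    ' Hypothetical helper: returns "prefix|delimiter|digits", or "no match".
    Function Describe(ByVal input As String) As String
        ' Anchored version of ([A-Z]*)(-| )([0-9]*); the anchors are an added assumption.
        Dim m As Match = Regex.Match(input, "^([A-Z]*)(-| )([0-9]*)$")
        If Not m.Success Then Return "no match"

        ' Groups(0) is the whole match; the numbered capture groups start at 1.
        Return m.Groups(1).Value & "|" & m.Groups(2).Value & "|" & m.Groups(3).Value
    End Function

    Sub Main()
        Dim samples() As String = {"AMDNY-0001253", "AMDNJ-0011253", "000001253", "XXHAM 1253"}
        For Each s As String In samples
            Console.WriteLine(s & " -> " & Describe(s))
        Next
    End Sub
End Module
```

The third sample has no delimiter, so this pattern reports it as "no match", which is why it needs a pattern of its own.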



I highly recommend RegExBuddy as a tool to help figure the patterns out.



You then use the Regex object in .NET to work with this stuff:

Dim re As New Regex("(?[A-Z]*)-(?[0-9]*)")

Dim mc As MatchCollection = re.Matches("AMDNY-0001253")

Dim m As Match = mc(0)

Dim prodGroup as Group = m.Groups("prodgroup")

Dim delimGroup as Group = m.Groups("delimiter")

Dim idGroup as Group = m.Groups("id")



MessageBox.Show(prodGroup.Value)

MessageBox.Show(idGroup.Value)




Regex is a big subject, but hopefully this will get you started.
By Greg McGuffey - 6/15/2010

Whoops, the code example got messed up because the forum stripped the less-than/greater-than signs from the named groups. Here it is again:

' Needs Imports System.Text.RegularExpressions at the top of the file
Dim re As New Regex("(?<prodgroup>[A-Z]*)-(?<id>[0-9]*)")
Dim mc As MatchCollection = re.Matches("AMDNY-0001253")
Dim m As Match = mc(0)
Dim prodGroup As Group = m.Groups("prodgroup")
Dim idGroup As Group = m.Groups("id")

MessageBox.Show(prodGroup.Value)   ' shows AMDNY
MessageBox.Show(idGroup.Value)     ' shows 0001253
By Edhy Rijo - 6/15/2010

Thanks Greg,



I will investigate RegEx, even though in my case I don't have a constant delimiter (see sample 3) and I need just the numeric part to do the calculations and generate new numbers to append to the alpha part. If I had a delimiter I could use String.Split(), but I guess I will need a function that scans each character from right to left, testing with IsNumeric() or Char.IsDigit() until an alpha is found, and then splits the whole value.

It would be really cool if a RegEx expression could get it right.
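For what it is worth, a single pattern can cope with the missing delimiter by anchoring on the run of digits at the end of the value. The following is only a sketch under a few assumptions: the numeric part is always the trailing digits, leading zeros are kept with the number and re-padded after the calculation (rather than left in the alpha part), and NextValue plus the group names "alpha" and "number" are hypothetical names chosen for the example.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module TrailingNumberDemo
    ' A lazy prefix followed by the run of digits that ends the string.
    Private ReadOnly TrailingDigits As New Regex("^(?<alpha>.*?)(?<number>[0-9]+)$")

    ' Hypothetical helper: adds "increment" to the trailing number and re-pads
    ' with zeros so the original width is kept where possible.
    Function NextValue(ByVal value As String, ByVal increment As Long) As String
        Dim m As Match = TrailingDigits.Match(value)
        If Not m.Success Then Return value   ' no trailing digits: leave unchanged

        Dim digits As String = m.Groups("number").Value
        Dim n As Long = Long.Parse(digits, CultureInfo.InvariantCulture) + increment
        Dim padded As String = n.ToString(CultureInfo.InvariantCulture).PadLeft(digits.Length, "0"c)
        Return m.Groups("alpha").Value & padded
    End Function

    Sub Main()
        Dim samples() As String = {"AMDNY-0001253", "AMDNJ-0011253", "000001253", "XXHAM 1253"}
        For Each s As String In samples
            Console.WriteLine(s & " -> " & NextValue(s, 1))
        Next
    End Sub
End Module
```

With an increment of 1, AMDNY-0001253 becomes AMDNY-0001254 and 000001253 becomes 000001254; the delimiter, if any, simply stays in the alpha part.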
By Edhy Rijo - 6/15/2010

Greg, once again, thanks for the RegEx samples but I decided to create a small function to parse the values. Here is the code I came up with:



Private OriginalValues() As String = {"AMDNY-0001253", "AMDNJ-0001253", "0001253", "XXHAM 1253", "1253"}

Private Sub ParseValues()
    Dim sb As New System.Text.StringBuilder

    '-- Loop over each original value and separate the alpha part from the numeric part
    For Each stringItem As String In OriginalValues
        Dim CharacterEnumerator As CharEnumerator = stringItem.GetEnumerator

        '-- Counts the characters that belong to the alpha part (leading zeros included)
        Dim AlphaValueCounter As Integer = 0
        While (CharacterEnumerator.MoveNext())
            If Char.IsDigit(CharacterEnumerator.Current) Then
                '-- Zeros are skipped; the first non-zero digit marks the position
                '-- at which to split (SubString) the original value
                If CharacterEnumerator.Current <> "0"c Then
                    Exit While
                End If
            End If
            AlphaValueCounter += 1
        End While

        '-- Build the output that shows the different values
        sb.AppendLine("Original Value = " & stringItem)
        sb.AppendLine("Alpha value = " & stringItem.Substring(0, AlphaValueCounter))
        sb.AppendLine("Numeric value = " & stringItem.Substring(AlphaValueCounter))
        sb.AppendLine("---------------------------------------------")
        sb.AppendLine()
    Next

    MessageBox.Show(sb.ToString)
End Sub





If you have this code in a form, call it with Me.ParseValues(); it will simply show a message box with all the original values and the parsed ones.

If anybody comes up with a simpler approach, please let us know. It will be greatly appreciated.
By Greg McGuffey - 6/15/2010

I'm glad you got it going.



I'd suggest that when you have some time, you might learn some more about regex. I just barely touched on the power of them. They can get very hairy to work with, but they have tons of power. They are also fast. And this is pretty much exactly what they were designed to do. The SF SyntaxEditor uses regex to do the color highlighting. While the initial learning can be a bear, once you have an understanding of them, they turn many hard problems into rather simple ones.
By Edhy Rijo - 6/15/2010

Greg McGuffey (06/15/2010)
I'd suggest that when you have some time, you might learn some more about regex...




I agree. I have been wanting to get into RegEx since I was working with VFP, and I have read some documentation about it, but like you said, it may take some deep testing to get the matching expressions correct.



This particular fix looks pretty simple, but when calling this method/function hundreds of times in a process it could take more time than using a proper RegEx expression, so I will explore it too this week.
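If the parsing does end up being called hundreds of times, one common arrangement is to build the Regex once and reuse it on every call. The sketch below is an assumption-laden illustration, not a measurement: TrySplit and the group names are hypothetical, the pattern keeps leading zeros in the alpha part like the function posted earlier, and whether it actually beats the character loop would need to be timed on real data.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReusedRegexDemo
    ' Built once and reused for every call. RegexOptions.Compiled costs more at
    ' construction time in exchange for faster matching afterwards.
    Private ReadOnly Splitter As New Regex("^(?<alpha>.*?)(?<number>[1-9][0-9]*)$", RegexOptions.Compiled)

    ' Hypothetical helper: returns True when the value ends in a number.
    ' Leading zeros stay in "alpha"; "number" starts at the first non-zero digit.
    Function TrySplit(ByVal value As String, ByRef alpha As String, ByRef number As String) As Boolean
        Dim m As Match = Splitter.Match(value)
        If Not m.Success Then
            alpha = value
            number = String.Empty
            Return False
        End If
        alpha = m.Groups("alpha").Value
        number = m.Groups("number").Value
        Return True
    End Function

    Sub Main()
        Dim alpha As String = Nothing
        Dim number As String = Nothing
        Dim samples() As String = {"AMDNY-0001253", "AMDNJ-0001253", "0001253", "XXHAM 1253", "1253"}
        For Each s As String In samples
            TrySplit(s, alpha, number)
            Console.WriteLine("Alpha = '" & alpha & "', Numeric = '" & number & "'")
        Next
    End Sub
End Module
```

One difference from the character loop: this pattern splits at the last run of digits, so a value with digits in the middle (a hypothetical A1B1253) yields 1253 rather than 1B1253.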
By Charles R Hankey - 6/16/2010

Hey Edhy



As you explore regex be sure to check out this free tool - Expresso



http://www.ultrapico.com/Expresso.htm
By Greg McGuffey - 6/16/2010

Using some kind of tool definitely makes using Regex easier. The tool I use is Regex Buddy:



http://www.regexbuddy.com/



The same company also has a new tool Regex Magic that looks promising:



http://www.regexmagic.com/



It's nice to have choices.


By Ivan George Borges - 6/16/2010

Hey, good to know. I've got RegexBuddy, will have a look at RegexMagic too.
By Edhy Rijo - 6/16/2010

Thanks Charles,



Believe me, from a quick look at RegexMagic and RegexBuddy, they look very powerful, and I like how they highlight all the parts to be matched.

Expresso looks good, but I am under the impression that you need to be a more experienced Regex developer to get the most out of that tool.

The Just-Great-Software tools feel more intuitive. Even though there is no trial or demo version, the money-back policy looks good and the price is very affordable for such a tool. Now the question is which one to choose: RegexBuddy or RegexMagic?
By Charles R Hankey - 6/16/2010

Whatever tool you use, notice there is a regex tutorial link on the Ultrapico page.
By Greg McGuffey - 6/16/2010

I think I'd try RegexMagic. It looks really accessible, with the ability to build regex patterns without having to actually know regex: they've abstracted many pattern-matching concepts to a higher level, so you can search for an SSN rather than having to know the regex [0-9]{3}-[0-9]{2}-[0-9]{4}. It can also generate code snippets for you. And there is a free trial... Sounds like it's worth a try to me!



http://www.regexmagic.com/

http://www.regexmagic.com/download.html



Regex Buddy is more useful if you already know regex (or want to learn it) and it doesn't have a free trial (but does have a nice money back guarantee).



My guess is that you'd still want to learn regex, but RegexMagic might make it possible to start using regex without the hours of trying to figure out how to build a pattern, which might actually speed up how quickly you learn it.
By Edhy Rijo - 6/22/2010

Hi Greg and all,



Just to let you know that today I finally had the opportunity to test a Regex match and replace. With the help of RegexMagic I was able to create the correct pattern and replace the previous flat condition, which had to match several characters using String.Replace().Replace(), etc.
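The actual pattern is not shown in this thread, so the following is only a sketch of the general idea, assuming a hypothetical set of separator characters (hyphen, dot, slash and space) that the chained String.Replace() calls used to strip one at a time. The function names are likewise made up for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplaceDemo
    ' Hypothetical set of separators; inside [...] a leading "-" is a literal hyphen.
    Private ReadOnly Separators As New Regex("[-./ ]")

    ' Chained version: one Replace call per character to remove.
    Function StripChained(ByVal value As String) As String
        Return value.Replace("-", "").Replace(".", "").Replace("/", "").Replace(" ", "")
    End Function

    ' Regex version: a single pattern covers every separator.
    Function StripWithRegex(ByVal value As String) As String
        Return Separators.Replace(value, String.Empty)
    End Function

    Sub Main()
        Dim samples() As String = {"AMDNY-0001253", "XX HAM.12/53"}
        For Each s As String In samples
            Console.WriteLine(s & " -> " & StripWithRegex(s))
        Next
    End Sub
End Module
```

Adding another character to strip then means editing the character class rather than appending one more Replace() call.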



Once again, thanks for the contribution and for nicely pushing us to move in the right direction.
By Greg McGuffey - 6/22/2010

Glad to hear it! Sounds like RegexMagic is quite helpful.