Monday, October 22, 2012

Reformatting output using Perl and regex

Perl can be very handy when working with strings: one can reformat almost any string using regular expressions. I recently had to work with MIB values and feed them into Cacti. I would query the MIB on a network device and get a list of values like:
TERADICI-PCOIPv2-MIB::pcoipGenDevicesName.1 = STRING: "pcoip-host-0030040e079e"
TERADICI-PCOIPv2-MIB::pcoipGenDevicesDescription.1 = ""
TERADICI-PCOIPv2-MIB::pcoipGenDevicesGenericTag.1 = ""
TERADICI-PCOIPv2-MIB::pcoipGenDevicesPartNumber.1 = STRING: "TERA1200 revision 1.0 (128 MB)"
TERADICI-PCOIPv2-MIB::pcoipGenDevicesFwPartNumber.1 = STRING: "Leadtek rev M host card with copper"
TERADICI-PCOIPv2-MIB::pcoipGenDevicesSerialNumber.1 = STRING: "L12060000489"
TERADICI-PCOIPv2-MIB::pcoipGenDevicesHardwareVersion.1 = STRING: "62917013120-D"
TERADICI-PCOIPv2-MIB::pcoipGenDevicesFirmwareVersion.1 = STRING: "4.0.2"
TERADICI-PCOIPv2-MIB::pcoipGenDevicesUniqueID.1 = STRING: "00-30-04-0E-07-9E"
TERADICI-PCOIPv2-MIB::pcoipGenDevicesMAC.1 = STRING: "00-30-04-0E-07-9E"
TERADICI-PCOIPv2-MIB::pcoipGenDevicesUptime.1 = Counter64: 264895
TERADICI-PCOIPv2-MIB::pcoipImagingDevicesIndex.1 = INTEGER: 1
TERADICI-PCOIPv2-MIB::pcoipImagingDevicesIndex.2 = INTEGER: 2
TERADICI-PCOIPv2-MIB::pcoipImagingDevicesIndex.3 = INTEGER: 3
TERADICI-PCOIPv2-MIB::pcoipImagingDevicesIndex.4 = INTEGER: 4


To pass these values to Cacti, I needed to reformat these lines so they would display as name:value pairs.

To replace a substring in Perl, we use the substitution operator s/search/replace/, bound to a variable with =~. Adding an "i" flag at the end makes the search case-insensitive: =~ s/search/replace/i.
A simple example would be:
$output = "Characters replacement in perl";
$output =~ s/Characters/String/i;
print $output;

This would print:
String replacement in perl
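To see what the i flag actually changes, here is a small sketch (the variable names are only for illustration) that runs the same lower-case search on the example string with and without it. It also keeps the value that s/// returns: the number of substitutions made, or false when nothing matched, which is a handy way to tell whether a line matched at all.

```perl
use strict;
use warnings;

my $original = "Characters replacement in perl";

# Case-sensitive: "characters" (lower-case c) does not match "Characters",
# so this copy is left unchanged and s/// returns false.
my $sensitive = $original;
my $n_sensitive = ($sensitive =~ s/characters/String/);

# Case-insensitive: with /i the same pattern matches "Characters".
my $insensitive = $original;
my $n_insensitive = ($insensitive =~ s/characters/String/i);

print "$sensitive\n";       # Characters replacement in perl
print "$insensitive\n";     # String replacement in perl
print "$n_insensitive\n";   # 1 (one substitution was made)
print $n_sensitive ? "matched\n" : "no match\n";   # no match
```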

All the power comes from the special characters we can use in the search part of the pattern. The most common ones are:
.   Match any character
\w  Match "word" character (alphanumeric plus "_")
\W  Match non-word character
\s  Match whitespace character
\S  Match non-whitespace character
\d  Match digit character
\D  Match non-digit character
\t  Match tab
\n  Match newline
\r  Match return
\f  Match formfeed
\a  Match alarm (bell, beep, etc)
\e  Match escape
\021  Match octal char (in this case octal 21)
\xf0  Match hex char (in this case hexadecimal f0)
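As a quick illustration, here is a sketch that applies a few of these escapes to one of the MIB lines above. Each escape matches exactly one character, so counting the matches of \d and \s counts digits and whitespace; the last two checks show why a literal dot has to be written \. (the MIB statements further down rely on that). The variable names are only for illustration.

```perl
use strict;
use warnings;

# A line taken from the MIB output above.
my $line = 'TERADICI-PCOIPv2-MIB::pcoipGenDevicesUptime.1 = Counter64: 264895';

# Each escape matches ONE character; /g finds every occurrence and the
# "() =" idiom counts how many matches there were.
my $digits = () = $line =~ /\d/g;   # digit characters
my $spaces = () = $line =~ /\s/g;   # whitespace characters

# "." matches any character, so use "\." when you need a literal dot.
my $any_dot     = ('UptimeX1' =~ /Uptime.1/)  ? 1 : 0;   # 1: "." matched "X"
my $literal_dot = ('UptimeX1' =~ /Uptime\.1/) ? 1 : 0;   # 0: "X" is not a dot

print "digits=$digits spaces=$spaces any_dot=$any_dot literal_dot=$literal_dot\n";
# prints: digits=10 spaces=3 any_dot=1 literal_dot=0
```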
You can follow any character or wildcard (or a group of them in parentheses) with a quantifier to repeat it. Here's where you start getting some power:
*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times
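Here is a sketch of these quantifiers on values from the MIB listing above. It also uses a character class, [0-9A-F], which matches any one of the listed characters (a hex digit here); that construct is not in the table above but is standard Perl. The checks are illustrative, not a complete validation of MAC addresses or version strings.

```perl
use strict;
use warnings;

# [0-9A-F]{2} is exactly two hex digits; the group (-[0-9A-F]{2}) must then
# repeat exactly five times, giving six pairs in total.
my $is_mac = ('00-30-04-0E-07-9E' =~ /^[0-9A-F]{2}(-[0-9A-F]{2}){5}$/) ? 1 : 0;

# + versus *: \d+ needs at least one digit after ": ", \d* accepts none.
my $plus = ('Counter64: ' =~ /: \d+$/) ? 1 : 0;   # 0: no digit to match
my $star = ('Counter64: ' =~ /: \d*$/) ? 1 : 0;   # 1: zero digits is enough

# {n,m}: each part of the firmware version is one to three digits.
my $fw_ok = ('4.0.2' =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}$/) ? 1 : 0;

print "is_mac=$is_mac plus=$plus star=$star fw_ok=$fw_ok\n";
# prints: is_mac=1 plus=0 star=1 fw_ok=1
```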
 
So, if we want to get rid of some text in a line, we could use:
$output = "Today blabla is a 354446 nice day!";
$output =~ s/Today \w+ is a \d+ nice day!/Today is a nice day!/i;
print $output;
 
This would output:
Today is a nice day!
 
Another very powerful element is the pair of parentheses (). Whatever the enclosed part of the pattern matches is captured into the numbered variables $1, $2, and so on (inside the pattern itself, the same groups are referred to as \1, \2).
Let's see another example:
$output = "I want to keep this value and this other value";
$output =~ s/I want to keep (this value) and this (other value)/$1 AND $2/i;
print $output;
 
This would print:
this value AND other value
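To connect this back to the MIB lines, here is a sketch that pulls the OID name, index and value out of the firmware-version line above with three capture groups, then does the same in list context, where the match returns the captures directly. The last two checks show \1 used inside a pattern as a backreference. Variable names are only for illustration.

```perl
use strict;
use warnings;

# A line taken from the MIB output above.
my $line = 'TERADICI-PCOIPv2-MIB::pcoipGenDevicesFirmwareVersion.1 = STRING: "4.0.2"';

# Three groups: $1 = OID name, $2 = index, $3 = the value inside the quotes.
if ($line =~ /::(\w+)\.(\d+) = STRING: "(.*)"$/) {
    print "name=$1 index=$2 value=$3\n";
    # prints: name=pcoipGenDevicesFirmwareVersion index=1 value=4.0.2
}

# In list context the match returns the captured groups directly,
# so they can go straight into named variables.
my ($name, $index, $value) = $line =~ /::(\w+)\.(\d+) = STRING: "(.*)"$/;

# Inside a pattern, \1 means "the same text that group 1 matched".
my $same_halves = ('07-07' =~ /^(\d\d)-\1$/) ? 1 : 0;   # 1
my $diff_halves = ('07-9E' =~ /^(\d\d)-\1$/) ? 1 : 0;   # 0
```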
 
So, back to our MIB values. In order to replace:
TERADICI-PCOIPv2-MIB::pcoipImagingDevicesDisplayHeight.1 = INTEGER12: 1080
with:
pcoipImagingDevicesDisplayHeight.1:1080
We would use the following statement:
$_ =~ s/^TERADICI-PCOIPv2-MIB::(\w+\.\d+) = \w+: (\d+)$/$1:$2/i;

One last step. After that statement, I wanted to keep only the OID name (pcoipImagingDevicesDisplayHeight) without the .1 for every OID ending in .1, and for every other OID I wanted to append the index number directly to the name, dropping the dot. This is done with the following two lines:

$_ =~ s/^(\w+)\.1:(\d*)$/$1:$2/i;
$_ =~ s/^(\w+)\.(\d+):(.*)$/$1$2:$3/i;
 
Because the first substitution only matches names ending in .1, it only affects those lines. The second one takes care of the others:
pcoipImagingDevicesDisplayProcessRate.1 -> pcoipImagingDevicesDisplayProcessRate
pcoipImagingDevicesDisplayProcessRate.2 -> pcoipImagingDevicesDisplayProcessRate2
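Putting the three substitutions together, here is a sketch of the whole reformatting step as a small script. It assumes, as the first statement above does, that only lines with a plain integer value are wanted (string values are skipped), and it uses a colon between name and value throughout. The subroutine names reformat_line and to_cacti are hypothetical, and joining all pairs on one line separated by spaces is an assumption about Cacti's script data input format, so check it against your own Cacti data input method.

```perl
use strict;
use warnings;

# Turn one line of MIB output into "name:value". Returns nothing
# (empty list, or undef in scalar context) when the line does not
# carry a plain integer value.
sub reformat_line {
    my ($line) = @_;
    chomp $line;

    # Step 1: keep "OIDname.index" and the numeric value.
    return unless $line =~ s/^TERADICI-PCOIPv2-MIB::(\w+\.\d+) = \w+: (\d+)$/$1:$2/;

    # Step 2: drop the ".1" suffix...
    $line =~ s/^(\w+)\.1:(\d+)$/$1:$2/;
    # ...and glue any other index straight onto the name.
    $line =~ s/^(\w+)\.(\d+):(\d+)$/$1$2:$3/;

    return $line;
}

# Assumed Cacti format: all name:value pairs on one line, space-separated.
sub to_cacti {
    my @pairs = map { reformat_line($_) } @_;   # skipped lines add nothing
    return join ' ', @pairs;
}

# A few lines from the MIB listing at the top of this post.
my @sample = (
    'TERADICI-PCOIPv2-MIB::pcoipGenDevicesUptime.1 = Counter64: 264895',
    'TERADICI-PCOIPv2-MIB::pcoipGenDevicesFirmwareVersion.1 = STRING: "4.0.2"',
    'TERADICI-PCOIPv2-MIB::pcoipImagingDevicesIndex.1 = INTEGER: 1',
    'TERADICI-PCOIPv2-MIB::pcoipImagingDevicesIndex.2 = INTEGER: 2',
);

print to_cacti(@sample), "\n";
# prints: pcoipGenDevicesUptime:264895 pcoipImagingDevicesIndex:1 pcoipImagingDevicesIndex2:2
```

With a real device, the hard-coded @sample list would be replaced by the actual MIB output, for example by piping it into the script and calling print to_cacti(<STDIN>), "\n"; instead.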