Today I thought I’d give a demonstration of regular expressions [reference page here]. Regular expressions are essentially a compact pattern language for finding and replacing complex text strings, and are built into much of today’s software that involves a lot of text editing. They are a fabulously handy tool for computer users and are especially useful for programmers. I believe RegExps originally gained their popularity through the Perl programming language. I also recently heard that the new version of C++ (C++0x) will definitely have native library support for regular expressions, yay!
Since I posted yesterday on DNS stuff, and have the examples from it handy, I figured I’d use those :-).
Let’s say you had a group of .com domains and wanted to find out their name servers. (I’ve had to do this when switching to new name servers, to make sure all the domains we did not control at the registrar level had their name servers set to the new ones.) For this example, we will use the following domains: “castledragmire.com”, “riaboy.com”, “NonExistantDomainA.com”, and “dakusan.com”.
- First, we need the list of domains. For this example, one domain per line is used.
castledragmire.com
riaboy.com
NonExistantDomainA.com
dakusan.com
- Next, we need to turn them into a bash (Linux) script to grab all the information we need.
Replace: “^(.*)$”
With: “echo '!?$1?!'; host -t ns $1 a.gtld-servers.net | grep ' name server ';”
Sample output (the !? and ?! are markers for easier viewing and parsing):
echo '!?castledragmire.com?!'; host -t ns castledragmire.com a.gtld-servers.net | grep ' name server ';
echo '!?riaboy.com?!'; host -t ns riaboy.com a.gtld-servers.net | grep ' name server ';
echo '!?NonExistantDomainA.com?!'; host -t ns NonExistantDomainA.com a.gtld-servers.net | grep ' name server ';
echo '!?dakusan.com?!'; host -t ns dakusan.com a.gtld-servers.net | grep ' name server ';
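For anyone who would rather script this step than do it in a text editor, here is a rough Python sketch of the same replacement. Using Python's `re` module is my own assumption here (any regex-capable tool would do); note that Python writes backreferences in the replacement as `\1` instead of `$1`.

```python
import re

# Domain list from the example, one domain per line (no trailing newline).
domains = "\n".join([
    "castledragmire.com",
    "riaboy.com",
    "NonExistantDomainA.com",
    "dakusan.com",
])

# re.MULTILINE makes ^ and $ match at the start and end of every line,
# so the pattern captures each whole line as group 1.
template = r"echo '!?\1?!'; host -t ns \1 a.gtld-servers.net | grep ' name server ';"
script = re.sub(r"^(.*)$", template, domains, flags=re.MULTILINE)

print(script)
```

The printed text is the same bash script as the sample output above, ready to paste into a shell.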
- Next, we run the script, which outputs the following:
!?castledragmire.com?!
castledragmire.com name server ns3.deltaarc.com.
castledragmire.com name server ns4.deltaarc.com.
!?riaboy.com?!
riaboy.com name server ns3.deltaarc.com.
riaboy.com name server ns4.deltaarc.com.
!?NonExistantDomainA.com?!
!?dakusan.com?!
dakusan.com name server ns3.deltaarc.com.
dakusan.com name server ns4.deltaarc.com.
- Next, we would keep running the following regular expression until no more replacements are found.
This would combine all domains with multiple name servers onto one line with name servers separated by spaces.
Replace: “(.*?) name server (.*)\n\1 name server (.*)”
With: “$1 name server $2 $3”
It would output the following:
!?castledragmire.com?!
castledragmire.com name server ns3.deltaarc.com. ns4.deltaarc.com.
!?riaboy.com?!
riaboy.com name server ns3.deltaarc.com. ns4.deltaarc.com.
!?NonExistantDomainA.com?!
!?dakusan.com?!
dakusan.com name server ns3.deltaarc.com. ns4.deltaarc.com.
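The "keep running it until nothing changes" part is easy to automate. Below is a small Python sketch of that loop, assuming the script output from the earlier step has been saved into a string (the variable names are just for illustration). `subn` returns the number of replacements made, which tells us when to stop.

```python
import re

# Output of the host/grep script from the previous step.
host_output = """\
!?castledragmire.com?!
castledragmire.com name server ns3.deltaarc.com.
castledragmire.com name server ns4.deltaarc.com.
!?riaboy.com?!
riaboy.com name server ns3.deltaarc.com.
riaboy.com name server ns4.deltaarc.com.
!?NonExistantDomainA.com?!
!?dakusan.com?!
dakusan.com name server ns3.deltaarc.com.
dakusan.com name server ns4.deltaarc.com."""

# \1 inside the pattern requires the next line to start with the same domain.
pattern = re.compile(r"(.*?) name server (.*)\n\1 name server (.*)")

merged = host_output
while True:
    # Each pass folds one extra name server line into the line above it.
    merged, count = pattern.subn(r"\1 name server \2 \3", merged)
    if count == 0:
        break  # no more replacements found, so we are done

print(merged)
```

With only two name servers per domain, one pass does all the work and the second pass just confirms there is nothing left to merge. Domains with three or more name servers would need the extra passes.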
- The final regular expression turns the output into a single line per domain, followed by its name servers. The marker line that currently precedes each domain’s name servers is there to help spot any domains that did not return name servers: their marker lines are left unchanged.
Replace: “!\?(.*?)\?!\n\1 name server (.*)”
With: “#$1 \t $2”
Which would output the following final data:
#castledragmire.com ns3.deltaarc.com. ns4.deltaarc.com.
#riaboy.com ns3.deltaarc.com. ns4.deltaarc.com.
!?NonExistantDomainA.com?!
#dakusan.com ns3.deltaarc.com. ns4.deltaarc.com.
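Here is the same final replacement as a Python sketch, again assuming the merged text from the previous step is sitting in a string. In Python's replacement syntax the `\t` becomes a real tab character, so each output line is the domain, then space-tab-space, then the name servers.

```python
import re

# Text after the name server lines have been merged in the previous step.
merged = """\
!?castledragmire.com?!
castledragmire.com name server ns3.deltaarc.com. ns4.deltaarc.com.
!?riaboy.com?!
riaboy.com name server ns3.deltaarc.com. ns4.deltaarc.com.
!?NonExistantDomainA.com?!
!?dakusan.com?!
dakusan.com name server ns3.deltaarc.com. ns4.deltaarc.com."""

# The marker line must be followed by a line starting with the same domain (\1).
# A marker with no name server line under it does not match and is left as-is.
final = re.sub(r"!\?(.*?)\?!\n\1 name server (.*)", r"#\1 \t \2", merged)

print(final)

# Lines that still start with the !? marker are domains with no name servers.
missing = [line for line in final.splitlines() if line.startswith("!?")]
print(missing)
```

The `missing` list at the end is just a convenience I added for spotting the leftover marker lines without eyeballing the output.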
This data can be pasted directly into Excel, where the tab separator puts the domains in the first column and the name servers in the second.