Home Page
  • April 18, 2024, 07:53:09 pm *
  • Welcome, Guest
Please login or register.

Login with username, password and session length
Advanced search  

News:

Official site launch very soon, hurrah!


Author Topic: Better Regular Expression Lists  (Read 13222 times)

Dakusan

  • Programmer Person
  • Administrator
  • Hero Member
  • *****
  • Posts: 536
    • View Profile
    • Dakusan's Domain
Better Regular Expression Lists
« on: November 10, 2016, 04:52:15 pm »

Original post for Better Regular Expression Lists can be found at https://www.castledragmire.com/Posts/Better_Regular_Expression_Lists.
Originally posted on: 11/10/16

Regular expressions have been one of my favorite programming tools since I first discovered them. They are wonderfully robust and things can usually be done with them in many ways. For example, here are multiple ways to match an IPv4 address:
  • ^\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?$
  • ^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$
  • ^(\d{1,3}\.){3}\d{1,3}$
  • ^([0-9]{1,3}\.){3}[0-9]{1,3}$

One of my major annoyances though has always been lists. I have always done them like ^(REGEX,)*REGEX$.

For example, I would do a list of IP addresses like this: ^(\d{1,3}\.){3}\d{1,3},)*\d{1,3}\.){3}\d{1,3}$.

I recently realized however that a list can much more elegantly be done as follows: ^(REGEX(,|$))+(?<!,)$. I would describe this as working by:

  • ^: Start of the statement (test string)
  • (REGEX(,|$))+: A list of items separated by either a comma or EOS (end of statement). If we keep this regular expression as not-multi-line (the default), then the EOS can only happen at the end of the statement.
  • (?<!,): This is a look-behind assertion saying that the last character before the EOS cannot be a comma. If we didn’t have this, the list could look like this, with a comma at the end: “ITEM,ITEM,ITEM,”.
  • $: The end of the statement

So the new version of the IP address list would look like this ^((\d{1,3}\.){3}\d{1,3}(,|$))+(?<!,)$ instead of this ^((\d{1,3}\.){3}\d{1,3},)*(\d{1,3}\.){3}\d{1,3}$.


Also, since an IP address is just a list of numbers separated by periods, it could also look like this: ^(\d{1,3}(\.|$)){4}(?<!\.)$.

Logged