Programming and accessibility tips

from the Cynthia Network


Create an array of all matches of a regular expression in Perl

This post illustrates how to take a string in Perl and construct an array of substrings matching a given pattern. We start with a review of regular expressions in Perl.

A regular expression describes a set of strings matching a pattern. In a computer program, regular expressions can be used to test whether a string contains a certain pattern as a substring or to alter substrings matching a given pattern.
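As a quick illustration of both uses, the following sketch first tests a string for a pattern with =~ and then alters the matching substrings with the s/// substitution operator. The input string and variable names are invented for this example.

```perl
use strict;
use warnings;

my $input = '( 2, 3 )';    # hypothetical input with stray spaces

# Test: does the string contain a digit anywhere?
my $has_digit = ( $input =~ /\d/ ) ? 1 : 0;

# Alter: copy the string, then delete every run of whitespace in the copy
# (the /g modifier makes s/// replace all matches, not just the first)
( my $clean = $input ) =~ s/\s+//g;

print "has digit: $has_digit\n";    # prints "has digit: 1"
print "$clean\n";                   # prints "(2,3)"
```

The original string is left unchanged because the substitution is applied to the copy in $clean.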

Consider, as an example, the ordered pair (2,3). The following Perl code uses a regular expression to check whether the string (2,3) is a well-formed ordered pair of non-negative integers:

my $pair = '(2,3)';
# ^ and $ anchor the pattern; \( and \) match literal parentheses
if ( $pair !~ /^\(\d+,\d+\)$/ ) {
  print "syntax error\n";   # not a proper ordered pair
}
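To see which strings the pattern accepts, the check can be wrapped in a small subroutine and run over several candidates. This is only a sketch; the subroutine name is_ordered_pair and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string is an ordered pair of
# non-negative integers such as "(2,3)", with no spaces allowed
sub is_ordered_pair {
    my ($s) = @_;
    return $s =~ /^\(\d+,\d+\)$/ ? 1 : 0;
}

for my $candidate ( '(2,3)', '(10,42)', '(2,3', '(2, 3)', '(a,b)' ) {
    printf "%-8s %s\n", $candidate,
        is_ordered_pair($candidate) ? 'ok' : 'syntax error';
}
```

Only (2,3) and (10,42) pass. Note that $ also matches just before a trailing newline, so "(2,3)\n" is accepted too; anchoring with \z instead of $ would reject it.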

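The next code uses a regular expression with capture groups to extract the x-coordinate and the y-coordinate from the ordered pair. Each pair of unescaped parentheses in the pattern captures the text it matches, and Perl stores the captured substrings in $1 and $2:

my $pair = '(2,3)';
$_ = $pair;
my ( $x, $y );
if ( /^\((\d+),(\d+)\)$/ ) {   # a bare match is applied to $_
  $x = $1;   # first capture group: the x-coordinate, 2
  $y = $2;   # second capture group: the y-coordinate, 3
}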
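A more compact alternative is to assign the match directly to a list. In list context a successful match returns the captured substrings, and a failed match returns an empty list, so the variables stay undefined rather than picking up stale values of $1 and $2 left over from an earlier successful match. A minimal sketch:

```perl
use strict;
use warnings;

my $pair = '(2,3)';

# In list context the match returns ($1, $2), or an empty list if it fails
my ( $x, $y ) = $pair =~ /^\((\d+),(\d+)\)$/;

if ( defined $x ) {
    print "x = $x, y = $y\n";    # prints "x = 2, y = 3"
}
else {
    print "syntax error\n";
}
```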

Regular expressions can also be used to break strings apart. The split() function splits a string at every occurrence of a given pattern (which may be as simple as a single character such as a comma) and returns a list of the substrings that lie between those occurrences.

The following code splits a string containing a comma-separated list of numbers into an array of numbers:

my $string = '1,2,3,4,5';
my @array = split( /,/, $string );   # @array is (1, 2, 3, 4, 5)
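Because the first argument to split is a pattern, the separator does not have to be a literal character. The sketch below assumes a hypothetical input in which the commas may be surrounded by spaces, and splits on a comma together with any whitespace around it.

```perl
use strict;
use warnings;

# Hypothetical input where the separators also contain optional spaces
my $string = '1, 2 ,3,  4,5';

# \s*,\s* matches a comma together with any whitespace on either side
my @array = split /\s*,\s*/, $string;

print join( '|', @array ), "\n";    # prints "1|2|3|4|5"
```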

In the above example, the commas only served to separate the numbers in the list. Now suppose that we have a string representing the following list of ordered pairs:

(1,2),(3,4),(5,6)

Separating the above string into an array of ordered pairs is trickier because the comma serves two purposes: it separates the coordinates within each ordered pair, and it also separates one ordered pair from the next. Splitting on every comma therefore cuts each pair in half, producing pieces such as '(1' and '2)' instead of the ordered pairs we want.
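The problem is easy to see by running split on the list of pairs. This short sketch splits on every comma and prints the resulting pieces.

```perl
use strict;
use warnings;

my $string = '(1,2),(3,4),(5,6)';

# Splitting on every comma also cuts each ordered pair in half
my @pieces = split /,/, $string;

print scalar(@pieces), " pieces: @pieces\n";    # prints "6 pieces: (1 2) (3 4) (5 6)"
```

Instead of three ordered pairs, split returns six fragments: '(1', '2)', '(3', '4)', '(5', and '6)'.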

The following code uses a regular expression to test whether a string contains an ordered pair as a substring:

my $string = '(1,2),(3,4),(5,6)';
if ( $string =~ /\(\d+,\d+\)/ ) {
  # this string contains at least one ordered pair
  print "found an ordered pair\n";
}
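Without a modifier, a match stops at the first occurrence of the pattern. Wrapping the whole pattern in a capture group and assigning the match in list context retrieves that first pair; the variable name $first is chosen only for this sketch.

```perl
use strict;
use warnings;

my $string = '(1,2),(3,4),(5,6)';

# The outer (...) is a capture group; the escaped \( and \) match literal parentheses
my ($first) = $string =~ /(\(\d+,\d+\))/;

print "$first\n";    # prints "(1,2)"
```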

Adding Perl's /g ("global") modifier to a match makes Perl look for every occurrence of the pattern instead of stopping after the first one, as in the preceding example. In list context, such as an assignment to an array, a /g match returns a list of all matching substrings; if the pattern contains capture groups, it instead returns the captured substrings from every match.

The following code produces an array of strings, each of which represents an ordered pair:

my $string = '(1,2),(3,4),(5,6)';
my @array = ( $string =~ /\(\d+,\d+\)/g );   # list context: returns every match

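The resulting array contains the three strings (1,2), (3,4), and (5,6). Because the match is assigned to an array, it is evaluated in list context, so it returns every match found with /g, and those matches are stored in @array. The parentheses around the match only group the expression; it is the list context that collects all of the matches, so my @array = $string =~ /\(\d+,\d+\)/g; works the same way.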
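When each pair should also be broken into its coordinates, one option is to add capture groups and loop with /g in scalar context. Each call resumes where the previous match ended, and $1 and $2 hold the coordinates of the current pair. Storing the results as an array of [x, y] array references (@pairs) is just one possible layout, assumed for this sketch.

```perl
use strict;
use warnings;

my $string = '(1,2),(3,4),(5,6)';
my @pairs;

# In scalar context, m//g finds one match per call and remembers where it stopped
while ( $string =~ /\((\d+),(\d+)\)/g ) {
    push @pairs, [ $1, $2 ];    # store each pair as an [x, y] array reference
}

for my $p (@pairs) {
    print "x = $p->[0], y = $p->[1]\n";
}
```

If the grouping into pairs is not needed, evaluating the same pattern in list context, as in my @coords = $string =~ /\((\d+),(\d+)\)/g;, returns the flat list (1, 2, 3, 4, 5, 6), since a /g match with capture groups returns the captured substrings of every match.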


By Bill Hollingsworth
© 2016 by Cynsight, LLC.
Cynthia™ is a trademark of Cynsight, LLC.