I frequently encounter situations where I have a lot of code that needs to be refactored. Often the task isn’t one that the refactoring tools built into most IDEs support, so I enlist that double-edged sword, regular expressions, to make things easier and faster.

The example I’ll use below is not necessarily a great case for using this approach, and I want to be clear about that. For a slightly more realistic context, imagine you have a lengthy list of constants that you need to make available as symbols for your application. You can do it manually in 10 or 15 minutes, or you can knock it out rapidly using regex find/replace.

For these examples, I’m going to cover how I would do this in my favorite IDE, IntelliJ (Android Studio). It shouldn’t take much effort to find the specifics for your favorite editor or IDE. Regular expressions are, more or less, universal, but how you enable regex in your find/replace dialog, and some details of the replacement syntax, will likely differ depending on the editor.

Today’s Example

Consider the following example code:

class Scratch {
    public static void main(String[] args) {
        MyObject myObject = new MyObject();
        myObject.setStringValue("a string value");
        myObject.setDoubleValue(3.14159);
        myObject.setIntegerValue(42);
        myObject.setBooleanValue(false);
    }
}

class MyObject {
    String stringValue;
    double doubleValue;
    int integerValue;
    boolean booleanValue;

    void setStringValue(String value) {
        this.stringValue = value;
    }

    void setDoubleValue(double value) {
        this.doubleValue = value;
    }

    void setIntegerValue(int value) {
        this.integerValue = value;
    }

    void setBooleanValue(boolean value) {
        this.booleanValue = value;
    }
}

Let’s assume I would like to eliminate the setters and use direct member access (but let’s not argue about why we made that choice; our senior has issued the directive). I can type out every line of code or I can save myself some time and use regex replace to make the task faster.

Again, I don’t necessarily recommend using regex replace for so few lines on a regular, pun intended, basis. It’s probably just as fast, or faster, to just make the change by hand for small cases.

So, here is the result I’m targeting for the example above:

class Scratch {
    public static void main(String[] args) {
        MyObject myObject = new MyObject();
        myObject.stringValue = "a string value";
        myObject.doubleValue = 3.14159;
        myObject.integerValue = 42;
        myObject.booleanValue = false;
    }
}

My Approach

To get to my target code from the existing code, I need to remove some characters in a few places, add the assignment operator, and change the case of the member name embedded in the setter method name. It wouldn’t take too long to do it by hand (again: at least not for this case — only four lines of code), but doing it that way is tedious and boring. Aside from that, we can make ourselves more productive by practicing our regex skills!

Now, I need to craft a regex that manipulates existing lines to get our desired code. The first step here is to understand what the desired code needs to look like. For this example, I want to change the method call to be an assignment statement. I know the member name I’ll be assigning a value to is embedded in the setter method name. To do that I need to:

  • keep the single method parameter to use as the assignment value, but get rid of the parentheses that surround it
  • insert an assignment operator prior to the value
  • change the method name into the property/member name that needs to be assigned a value
    • I’ll have to remove set and then change the case of the next character of the method name
  • keep the semicolon on the end because this example is Java

Regex Captures

Most users of a modern IDE or text editor will be familiar with find and replace functionality. Access to the feature usually lives in a menu somewhere — typically the Edit menu since it is an editing function. With a simple find and replace, the target text is found and blindly replaced with the desired new text. Nothing about the original text is maintained after the operation.

Leveraging the power of regular expressions changes this process to allow a properly formed regex to carry over parts of the original text into the replacement text! Using captures allows us to hold on to chunks of the original text and then easily use them in the replacement text. Captures are created by surrounding a portion of the regex in parentheses and then referencing that capture positionally in the replacement expression. If I wanted to match and capture the word foobar, I could do it with a regex like this one: (foobar).

Remember, from our example, we want to keep the parameter that is passed in to the setter. We can do that by entering a regex for the source text that is something like: .*\((.*)\);. That regex will match every character of each setter line in our example and capture the parameter. We would then reference the parameter as the first capture: $1. If we entered $1 as our replacement text and then clicked Replace All, we would effectively throw away everything on those lines that was not inside the parentheses of the method call. IntelliJ helpfully shows what the replacement will be for a given match so you have a simple check that your regex is reasonable.
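If you want to try this capture outside the IDE, here is a minimal sketch using java.util.regex, which also understands $1 in replacement text. The class and method names (CaptureParameterDemo, keepOnlyParameter) are made up for illustration; only the pattern and the replacement come from the text above.

```java
import java.util.regex.Pattern;

class CaptureParameterDemo {
    // The find pattern from the text: anything, a literal "(", the captured
    // parameter, then a literal ");". Backslashes are doubled for Java strings.
    static final Pattern FIND = Pattern.compile(".*\\((.*)\\);");

    // Replaces the whole match with the first capture only.
    static String keepOnlyParameter(String line) {
        return FIND.matcher(line).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints "a string value" (including the double quotes)
        System.out.println(keepOnlyParameter("        myObject.setStringValue(\"a string value\");"));
        // prints 42
        System.out.println(keepOnlyParameter("        myObject.setIntegerValue(42);"));
    }
}
```

Note that the leading indentation disappears too, because .* matched it as part of the line.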

I want to take a moment to unpack the regex I used to search the example code and capture the parameter. I’ll quickly break it down into its component parts to give you something to grow with as you move on from this blog post — unfortunately a full tutorial on regular expressions is a little outside the scope here, but there are tons of awesome references available online, and I’ll link you to my favorite at the end.

First, we see .*. This component of the expression matches any run of characters: . means any character (other than a line break), and * matches it 0 or more times. Next, we see \( because we expect there to be a literal left parenthesis, and we need to match it and start capturing after it. The backslash is what makes the parenthesis literal rather than the start of a capture. Now, we see our capture (.*) followed by a literal right parenthesis and semicolon: \);.

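We capture any characters that appear between the left parenthesis and the final right parenthesis followed by a semicolon on a given line. Because the leading .* is greedy, a line with several left parentheses would be captured starting from the last one that still lets the rest of the pattern match, not the first. We could be more specific about what gets captured, but it would limit what we could do. For this case, we know there is a single parameter of varying types for each line, and we need to capture it.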
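It is worth checking how this pattern behaves when a line has more than one left parenthesis, since the leading .* will grab as much as it can. The sketch below compares it with one possible stricter alternative, [^(]*\((.*)\);, which stops at the first left parenthesis. The nested call wrap("x") is a hypothetical input that is not part of the example code, and the class and method names are invented for the demo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyCaptureDemo {
    // Pattern from the text: the leading .* is greedy.
    static final Pattern GREEDY = Pattern.compile(".*\\((.*)\\);");
    // Assumed alternative: [^(]* cannot run past the first "(".
    static final Pattern FIRST_PAREN = Pattern.compile("[^(]*\\((.*)\\);");

    // Returns the first capture, or null when the line does not match.
    static String capture(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "myObject.setStringValue(wrap(\"x\"));";
        // prints "x")  because the capture starts after the last "("
        System.out.println(capture(GREEDY, line));
        // prints wrap("x")  because the capture starts after the first "("
        System.out.println(capture(FIRST_PAREN, line));
    }
}
```

For the four setter lines in the example both patterns capture the same text, since each line has only one left parenthesis.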

Now, we need to capture the member variable name from the method name and then start crafting a better replacement text. I prefer to start matching with the beginning of the setter name: set. I also prefer to be a bit more specific about the match for the name of the member encoded in that method name. We know there are only alphabetic characters so we can limit the match to those by specifying a character class.

Character classes in regular expressions are quite useful and you specify them within square brackets. In this case we need alphabetic characters, upper and lower case, and multiple of them: [A-Za-z]+. So our capture for the name will be formed like this and now we can build our entire regex to find: set([A-Za-z]+)\((.*)\);

Our full regex for this case is similar to the previous regex example except we’ve replaced any characters at the start (.*) with specific characters including a new capture: set([A-Za-z]+). Now we’re looking for any line that contains set then one or more alphabetic characters followed by left parenthesis, some arbitrary characters, and then ends with right parenthesis and semicolon.

So our replacement text is our first capture (the member name), the assignment operator, and then the value, followed by our friend the semicolon because this is Java. This replacement text is: $1 = $2; Each capture is referenced positionally: use $ followed by a number to specify the capture.
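The same find and replace pair can be sketched in plain Java to see what it produces before the case fix. The class and method names here are hypothetical; the pattern and the replacement text are the ones from the paragraphs above.

```java
import java.util.regex.Pattern;

class MemberAssignmentDemo {
    // Capture 1: the name after "set". Capture 2: the parameter.
    static final Pattern SETTER_CALL = Pattern.compile("set([A-Za-z]+)\\((.*)\\);");

    // Only the matched setter call is replaced; text before it is kept.
    static String toAssignment(String line) {
        return SETTER_CALL.matcher(line).replaceAll("$1 = $2;");
    }

    public static void main(String[] args) {
        // prints myObject.DoubleValue = 3.14159;
        System.out.println(toAssignment("myObject.setDoubleValue(3.14159);"));
        // prints myObject.BooleanValue = false;
        System.out.println(toAssignment("myObject.setBooleanValue(false);"));
    }
}
```

The member names still start with an uppercase letter at this point, which is the remaining problem to solve.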

We are most of the way to our goal. We need to make one change to the case of the member name. It is currently PascalCase inside the setter name, and we need camelCase in the member name. Thankfully, IntelliJ’s replacement syntax provides a convenient way to take care of this: we just put \l before the capture whose first character we want to change to lowercase: \l$1 = $2;. So the first character of the first capture is lowercased (this is a feature of the replacement dialect rather than of regular expressions in general, so check your editor for support).
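Not every dialect has \l. Java’s own replacement strings, for example, only understand group references such as $1, so the lowercasing has to happen in code. The following is a rough sketch of the equivalent transformation, assuming Java 9 or later for Matcher.replaceAll with a function argument; the class and method names are invented for the demo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetterToAssignmentDemo {
    static final Pattern SETTER_CALL = Pattern.compile("set([A-Za-z]+)\\((.*)\\);");

    // Rewrites "setFoo(value);" as "foo = value;" within a single line.
    static String rewrite(String line) {
        return SETTER_CALL.matcher(line).replaceAll(match -> {
            String name = match.group(1);
            // Stand-in for \l: lowercase only the first character of capture 1.
            String member = Character.toLowerCase(name.charAt(0)) + name.substring(1);
            // quoteReplacement keeps any "$" or "\" in the value literal.
            return Matcher.quoteReplacement(member + " = " + match.group(2) + ";");
        });
    }

    public static void main(String[] args) {
        String[] lines = {
            "        myObject.setStringValue(\"a string value\");",
            "        myObject.setDoubleValue(3.14159);",
            "        myObject.setIntegerValue(42);",
            "        myObject.setBooleanValue(false);"
        };
        for (String line : lines) {
            System.out.println(rewrite(line));
        }
    }
}
```

Running it on the four setter calls prints the four assignment lines from the target code, indentation included.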

A note on more complex captures

This is a simple example of using captures. Captures can be far more complex. It might be helpful to note that captures are numbered by the position of their opening parenthesis, so they work from outside to inside and from left to right. Here is a somewhat contrived example to illustrate that: \.(([A-Za-z]+)\((.*)\);). This regex starts at a literal . and captures everything after it, up to and including the closing right parenthesis and semicolon (capture 1). It also captures the first set of one or more alphabetic characters following the literal . (capture 2). It matches a left parenthesis. It captures zero or more arbitrary characters (capture 3) until it encounters a right parenthesis and semicolon, which it matches. If the text contains something that fits this pattern, the match succeeds. This regex is naive and contrived, but would be a good starting point for matching a method call and the list of parameters that it was called with. We would capture the entire thing as well as separately capture the method name and the parameter list.
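To see the numbering concretely, here is a small sketch that prints each capture for one of the example lines. The class and method names are hypothetical; the pattern is the contrived one from the paragraph above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedCaptureDemo {
    // Group 1 opens first (outermost), then group 2, then group 3.
    static final Pattern CALL = Pattern.compile("\\.(([A-Za-z]+)\\((.*)\\);)");

    // Returns captures 1..3 in order, or an empty array if nothing matches.
    static String[] captures(String line) {
        Matcher matcher = CALL.matcher(line);
        if (!matcher.find()) {
            return new String[0];
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        String[] groups = captures("myObject.setIntegerValue(42);");
        System.out.println(groups[0]); // prints setIntegerValue(42);
        System.out.println(groups[1]); // prints setIntegerValue
        System.out.println(groups[2]); // prints 42
    }
}
```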

Conclusion

If you didn’t know about or were hesitant to use the regex find/replace functionality of your IDE, then I hope your appetite has been whetted. I’d recommend creating a sandbox project and exploring this new territory. With a little effort, you will become more comfortable with regular expressions and be able to leverage them to make yourself more productive in a variety of circumstances. The example I’ve outlined is just one way you might use regex to save time when solving a problem.

I encourage you to get out there and learn more about regular expressions. You’ll find that it is uncomfortable at first. You will make some mistakes. Study and search for internet resources and examples to help you accomplish a specific task using regular expressions until you start getting more comfortable and making fewer mistakes. You’ll eventually find that the simpler cases (like what I’ve outlined above) become easy and you have a foundation of knowledge to build on for the more complex cases.

For cases where the regex you are creating is important to some function of your application I highly recommend using a tool such as regex101.com. Regex101 lets you save a library of commonly used regular expressions and even create unit tests against them. A tool like this allows you to validate your assumptions about your regex and document cases of how you are using it. When you eventually find a case your regex doesn’t match or matches unexpectedly, you can dissect the regex and create new unit tests to validate any changes you make.

Helpful Links:
