Regular expressions (regex or regexp) can help save time and effort when working with Google Analytics, Google Tag Manager and now Google Data Studio. They add extra flexibility to the way you create definitions or include & exclude data from your reports.
You can think of regex as specific sequences of characters that broadly or narrowly match patterns in your data. And to build a pattern you need learn the regex metacharacters. In this post, I will be covering most of the metacharacters along with practical examples from Google Data Studio.
Before we start, I recommend you have a look and understand the formulas in the article below, because all of our regex examples are based on these formulas.
Create Advanced Dimensions in Google Data Studio via these 6 Formulas
Caret ^
It matches the start of a string. In other words, it means something begins with.
^social matches social media and social network, but not online social.
Practical Use in Google Data Studio
Let’s say you have 3 paid social sources in your Google Analytics (facebook / paid-social, instagram/paid-social, twitter/paid-social). And you want to group any medium that starts with the string paid under a channel grouping called “Paid” and anything else under “Organic”.
CASE
WHEN REGEXP_MATCH(meduim,"^paid") THEN "Paid"
ELSE "Organic"
End
Dot .
It matches any single character.
go.gle matches google, goagle and gosgle but not gogle.
The power of using the dot lies in combining it together with other characters.
Practical Use in Google Data Studio
Google Data Studio by default has a formula to convert all text to uppercase, but what if you want to capitalize only the first letter of your traffic sources in Google Analytics and create a report of the same.
This can be done in three steps using the caret and the dot together ^. (which means select the first letter):
1- Extract the first letter using REGEXP_EXTRACT() with the regular expression ^. which captures the first letter.
2- Capitalize the extracted letter using UPPER().
3- Replace the first letter in Source with nothing using REGEXP_REPLACE() and concatenate it with the extracted CAP letter using CONCAT().
So if we put all the formulas altogether, the final syntax will be as below:
CONCAT(UPPER(REGEXP_EXTRACT(Source, "(^.)")), REGEXP_REPLACE(Source, "^.", ""))
Asterisk *
It matches the preceding character 0 or more times.
go*gle matches with google, gooogle, gooooogle and gogle, but not goegle.
One of the most powerful combinations of asterisk is with the dot (.*) It actually matches everything,
go.* matches with google, goooogle, goosssf and goelge.
Practical Use in Google Data Studio
You have search and display campaigns running within the same AdWords account, you want to review and compare the performance of each channel separately. Your campaigns naming includes the acronym SN for search and CN for display.
^.* means text starts with anything, then contains (SN, CN) and finally ends with any string.
So basically, the regex below means search for text contains my acronyms (SN, CN) in the campaign name.
CASE
WHEN REGEXP_MATCH(Campaign,"^.*(SN).*") THEN "Search"
WHEN REGEXP_MATCH(Campaign,"^.*(CN).*") THEN "Display"
ELSE "Undefined"
End
Pipe |
It means OR.
men|women match with men clothes or clothes for women.
Practical Use in Google Data Studio
You can run a performance report of your landing pages grouped into unified dimensions. For example, you’re running a fashion website that sells dresses, shirts, boots, sandals..etc. You want to compare the performance of your website categories and review which category has the most sessions or transactions.
CASE
WHEN REGEXP_MATCH(Landing Page,"^.*(shirt|dress|jeans") THEN "Clothes"
WHEN REGEXP_MATCH(Landing Page,"^.*(boot|sandal|sneaker") THEN "Shoes" ELSE "Undefined"
End
Dollar Sign $
It means text ends with.
color$ matches red color and green color, but not color yellow.
Practical Use in Google Data Studio
You want to run a URL performance report of your website, but some of your links end with .php and you want to standardize all the URLs by removing .php only from the end.
REGEXP_REPLACE(Landing Page, "(.php$)", "")
Question Mark ?
It matches the preceding character zero or one time. In other words, it means the previous character is optional.
colo?r matches colr and color, but not colour.
Practical Use in Google Data Studio
You want to run a keywords performance report of your branded and non-branded terms.
Optimization ?up matches both; optimization up with space and optimizationup without space.
CASE
WHEN REGEXP_MATCH(Keyword,".*(optimization ?up).*$") THEN "Paid"
ELSE "Organic"
End
Square brackets []
Matches the enclosed characters in any order anywhere in a string.
col[oau]r matches color, colar and colur, but not coloaur.
Square brackets are very powerful when they are combined with dashes. Dashes create a range of characters between the brackets.
[a-z] matches any lowercase letter between a to z.
[0-9] matches any number from 0 to 9.
Practical Use in Google Data Studio
Let’s assume you sell cars and you want to create a performance report for all the sold car models between 2010 and 2017. You can create a filter for your query using the regular expression below.
Parentheses ()
Matches the enclosed characters in exact order anywhere in a string. It also uses to group characters or string together.
g(oo)gle matches only google.
The power of parentheses lies in combing it with different metacharacters.
Practical Use in Google Data Studio
Let’s say you want to run a report of the categories included in your campaigns names. Your campaign naming looks like below:
SN-EN_Category:Cars_Exact
SN-EN_Category:Bikes_BMM
To extract the category name from the campaign name, we can write something as follows:
REGEXP_EXTRACT(Campaign, "Category:(.*?[^_]*)")
.*? means capture everything after Category: The ? is added to make .* non-greedy by matching little data.
[^_]* to stop at the _ underscore. Please note when the caret comes between the square brackets [^], it means doesn’t contain.

Hey Ahmed,
Thanks for your article that helped me fine up my Data Studio report 🙂
I have a question :
In your last example you illustrate with an formula that allows to extract everything AFTER some specific text.
What would be the formula to extract everything BEFORE some specific text?
For example in ‘SN-EN_Category:Cars_Exact’ I’d like to keep ‘SN-EN_Category:Cars’ and remove ‘_Exact’?
Thank you in advance for your help!
Hi Patrick
Thanks for your comment.
I would use something like this REGEXP_EXTRACT(Campaign, “(.*?[^_]*)”) to extract everything before the underscore _
Let me know if you need anything else.
Best,
Ahmed
it isn’t working. I have a null result… :/
Can you please share with me the exact name of your campaign or the field you want to extract the text from?
actually I’m trying to strip the ‘Page Title’ dimension from Google Analytics (so ‘Title’ field from Google Data Studio.
let’s says I have 4 pages with the following titles:
‘Home | Mywebsite’
‘About | Mywebsite’
‘Services | Mywebsite’
‘Contact | Mywebsite’
so I’d like to display only ‘Home’, ‘About’, ‘Services’ & ‘Contact’
what would be the formula to remove ‘| Mywebsite’ from my report?
thank you in advance!
Hi Partick,
This should work REGEXP_EXTRACT(Page Title, “(^.*?[^|]*)”)
I’ve tested it out on my blog pages, see below:
https://goo.gl/photos/PYAWiW5gK1aXfqJQ6
Great article with great insights. In this line we’ve developed a Free Google Data Studio Template for Ecommerce. You just have to connect your own Google Data sources and start using it here: https://goo.gl/O04k38. What do you think?
I want to group 3 landing pages to return a single value (there are 3 URLs that are all the home page) but I want to return the actual Landing Page value in all other cases. I tried:
CASE
WHEN REGEXP_MATCH(Landing Page,”^\/(en-us((\/index)?\.html)|index\.html)”) THEN “Home Page”
ELSE (Landing Page)
End
However, it doesn’t seem to like my ELSE statement. Any ideas?
Hello,
Add the false condition before ELSE and it should work:
CASE
WHEN REGEXP_MATCH(Landing Page,”^\/(en-us((\/index)?\.html)|index\.html)”) THEN “Home Page”
WHEN REGEXP_MATCH(Landing Page,”^.*[^index].*”) THEN (Landing Page)
ELSE “Undefined”
End
Let me know hot it goes.
Best,
Ahmed
Thanks for sharing! I have a question – maybe you or someone here might help me to figure out whats the best way to manage it with data studio. I have a field that could contain one ID or even an aggregation of IDs in the data source but I want to separate them. So if the value is “1” it should count to “abc”, if the value is “2” to “xyz” and so on…. same with the value “1,2,3”. That works so far but if I manage the separation with a case statement the “1,2,3” does of course count for “abc” but not for “xyz” anymore. Furtheron “2,1,3” doesn’t count for “abc”. I’m thankful for any help!
Hi Tony,
I am sorry, I haven’t fully understood what you’re trying to achieve here, can you please elaborate more with examples?
Can you also share the function you’ve written?
Thanks,
Ahmed
Hi,
I’d like to create custom dimension where I;ve got 3 types of campaign:
1. Search
2. GDN
3. APP
All campaigns have a prefix:
1. Search starts [Search]…
2. GDN starts [GDN]…
3. APP starts [APP]…
I create regex:
CASE
WHEN REGEXP_MATCH(Campaign, “^.*(Search).*”) THEN “Search”
WHEN REGEXP_MATCH(Campaign type, “^.*(GDN).*”) THEN “GDN”
WHEN REGEXP_MATCH(Campaign type, “^.*(APP).*”) THEN “APP”
ELSE “Other” END
But there is error
Unable to aggregate ratio metrics in the request. Please select another metric.
Error ID: 068984b8
In standard regex I’d use “\” to negation brace but in DC it’s not working :/
So I use .* to work around the problem with brackets.
Any idea what I working wrong?
Hi Xeo,
You can use double \\ to escape characters, please try it and let me know how it goes.
Best,
Ahmed
Hi @Ahmed Ali,
thank you for your reply. Here my problem with an example – hope that makes things clearer…
I try to extract different numbers out of a row of numbers.
E.g.: I try to extract 8 and use
CASE WHEN REGEXP_MATCH(Text “Source”, “^.*8.*”) THEN “Hello” ELSE “Others” END
or maybe just CASE WHEN REGEXP_MATCH(Text “Id [Courses]”, “.*8.*”) THEN “Hello” ELSE “Others” END
For my understanding these expressions should extract the 8 out of every series but it doesn’t… I have the following options with 8 in the series:
39x(7,8) + 3x(8) + 2x(8,1) + 3x(8,9) + 1x(11,8) + 3x(11,7,8) + 11x(11,7,8,9) + 31x(11,7,8,9,10) + 1x(11,7,8,9,10,12) + 2x(7,8,10) + 224x(7,8,9) + 13x(7,8,9,10) = 334 times the 8 in a series.
But my result is 48 so my assumption is that only (7,8), (8), (8,1), (8,9) and (11,8) are detected…
In the data studio forum I got a reply to try this:
CASE
WHEN REGEXP_MATCH(Text “Source”, “x\\([^8\s]*8[^8\s]*\\)”) THEN “Hello”
ELSE “0”
END
but unfortunetly data studio is not parsing this function.
I also want to detect the othere numbersfrom above(7,9,11,…) so if there is a good solution to make this I’ll be gratefull
Do you or does any body know why? :/ Thank you for any advice.
P.S. The hint is working properly here
https://regex101.com/r/lMjiRs/3
And my solution as well (https://regex101.com/r/lMjiRs/3) even if the “88” is of course also detected then – but that is not a possible value in my example anyway