The C++11 standard introduced user-defined string suffixes (user-defined literals). It also added regular expressions to the C++ standard library. I wanted to have fun and see whether we could combine these features.
Regular expressions are useful to check whether a given string matches a pattern. For example, the expression \d+ checks that the string is made of one or more digits. Unfortunately, the backslash character needs to be escaped in C++ string literals, so the pattern \d+ must be written as "\\d+". Alternatively, you may use a raw string literal, which starts with R"( and ends with )", so you can write R"(\d+)". For complicated expressions, a raw string is usually easier to read.
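To make the escaping point concrete, here is a minimal sketch showing that the escaped and raw spellings produce the same pattern. The names escaped_digits, raw_digits, is_all_digits and run_examples are made up for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Escaped form: the compiler turns "\\d+" into the three characters \d+.
const std::regex escaped_digits("\\d+");
// Raw form: everything between R"( and )" is taken verbatim.
const std::regex raw_digits(R"(\d+)");

// std::regex_match succeeds only when the whole string matches the pattern.
bool is_all_digits(const std::string &s, const std::regex &re) {
  return std::regex_match(s, re);
}

// Hypothetical helper that exercises both spellings.
void run_examples() {
  assert(std::string("\\d+") == R"(\d+)");   // same three characters
  assert(is_all_digits("123", escaped_digits));
  assert(is_all_digits("123", raw_digits));
  assert(!is_all_digits("a23", raw_digits));
  assert(!is_all_digits("", raw_digits));    // + requires at least one digit
}
```

Both regex objects use the default ECMAScript grammar, which is where \d comes from.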
A user-defined string literal is a way to specialize a string literal according to your own needs. It is effectively a convenient way to design your own “string types”. You can code it up as:
myclass operator"" _mysuffix(const char *str, size_t len) { return myclass(str, len); }
And once it is defined, instead of writing myclass("mystring", 8), you can write "mystring"_mysuffix.
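As a rough sketch of how this fits together, the following fills in a stand-in for myclass that simply stores the text. The member text and the helper run_examples are assumptions made for the illustration. The operator is spelled without a space between "" and the suffix, which current compilers also accept.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for myclass: it only keeps a copy of the characters.
struct myclass {
  myclass(const char *str, std::size_t len) : text(str, len) {}
  std::string text;
};

// The compiler passes the literal's characters and its length,
// not counting the terminating null character.
myclass operator""_mysuffix(const char *str, std::size_t len) {
  return myclass(str, len);
}

// Hypothetical helper checking that both spellings build the same object.
void run_examples() {
  assert("mystring"_mysuffix.text == "mystring");
  assert("mystring"_mysuffix.text.size() == 8);
  assert("mystring"_mysuffix.text == myclass("mystring", 8).text);
}
```

Suffixes that do not start with an underscore are reserved for the standard library, which is why user code writes _mysuffix rather than mysuffix.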
In any case, we would like to have a syntax such as this:
bool is_digit = "\\d+"_re("123");
I can start with a user-defined string suffix:
convenience_matcher operator "" _re(const char *str, size_t) { return convenience_matcher(str); }
I want my convenience_matcher to construct a regular expression instance, and to call the matching function whenever a parameter is passed in parentheses. The following class might work:
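#include <regex>
#include <string>

struct convenience_matcher {
  convenience_matcher(const char *str) : re(str) {}
  // True only when the whole string matches the pattern.
  bool match(const std::string &s) const {
    std::smatch base_match;
    return std::regex_match(s, base_match, re);
  }
  // Lets the matcher be called like a function: matcher("123").
  bool operator()(const std::string &s) const { return match(s); }
  std::regex re;
};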
And that is all. The following expressions will then return a Boolean value indicating whether we have the required pattern:
"\\d+"_re("123") // true "\\d+"_re("a23") // false R"(\d+)"_re("123") // true R"(\d+)"_re("a23") // false
I have posted a complete example. It is just for illustration and I do not recommend using this code for anything serious. I am sure that you can do better!
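For readers who want something to compile, here is one possible way to assemble the pieces. It is a variant, not the posted example: it assumes we pass the literal's length to the std::regex constructor instead of relying on the terminating null character, and the helper run_examples is made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

struct convenience_matcher {
  // Assumption: build the regex from pointer plus length.
  convenience_matcher(const char *str, std::size_t len) : re(str, len) {}
  // Calling the matcher checks whether the whole string matches.
  bool operator()(const std::string &s) const {
    return std::regex_match(s, re);
  }
  std::regex re;
};

// Written without a space after "", which current compilers also accept.
convenience_matcher operator""_re(const char *str, std::size_t len) {
  return convenience_matcher(str, len);
}

// Hypothetical helper reproducing the four expressions from the text.
void run_examples() {
  assert("\\d+"_re("123"));
  assert(!"\\d+"_re("a23"));
  assert(R"(\d+)"_re("123"));
  assert(!R"(\d+)"_re("a23"));

  // The matcher can be stored and reused, so the regex is built once.
  const auto digits = R"(\d+)"_re;
  assert(digits("2024"));
  assert(!digits("20x4"));
  assert(!digits(""));
}
```

Constructing a std::regex is relatively expensive, so if the same pattern is checked many times, keeping the matcher in a variable as in the last lines is likely preferable to writing the literal at every call site.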
The very first sentence contains a typo ‘used-defined’ which should be changed to ‘user-defined’. The second paragraph contains another typo ‘so you can write R”(\d+)” ‘ which should be changed to ‘so you can write R”(\d+)” ‘
Thanks.
Thanks Daniel! Do you know of any practical applications for operator""? Operator overloading has become frowned upon for its obscurity, but this one has the potential to carry meaningful names.
I think that’s different from standard operator overloading. You have to specify a suffix, so you can’t use it by accident.
We use it in the simdjson library.
Off topic, but what’s your opinion of adding operator overloading to C, in a way that the user would name the function that implements that operator, and therefore doesn’t require name mangling?
It’s an idea I’ve been kicking around.
Operator overloading in C or in C++?
In regular, old-fashioned C.
I’m thinking something like: _Operator = UTF8String_Init;
Where UTF8String_Init is a previously declared function.
Implementation details could still be hidden, the _Operator declaration could be in a header too, and no name mangling necessary since the function has already been named by the programmer.
No need for a class to contain it either, since the types could be deduced from the parameters of the named function, e.g.: UTF8String UTF8String_Init(char8_t *Characters);
And for strings, I don’t see why there couldn’t be multiple variants for the same operator.
Like:
UTF8String UTF8String_InitFromChars(char8_t *Chars);
UTF8String UTF8String_InitFromChar(char8_t Char);
_Overload = UTF8String_InitFromChar;
_Overload = UTF8String_InitFromChars;
Basically, soft function overloading, but with better names.