For example, AnyOf("\r\n").Negate() creates a CharMatcher that matches any character except return or newline. You can then do things like cm.IndexIn(str) or cm.CountIn(str)
Some of this you can do directly with Go libraries. The unicode package provides some standard "matchers" like IsDigit and IsLetter. And the strings package has functions like IndexFunc and TrimFunc that take predicates.
But they don't do everything that CharMatcher does, so as an exercise I thought I'd try implementing something like CharMatcher in Go.
My first approach was basically an object-oriented style like I'd use in Java with CharMatch as an interface.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
type CharMatch interface { | |
Match(c rune) bool | |
} | |
type is_cm struct { | |
c rune | |
} | |
func (cm is_cm) Match(c rune) bool { | |
return c == cm.c | |
} | |
func Is(c rune) CharMatch { | |
return is_cm{c} | |
} |
My next thought was to get rid of the interface and have a generic struct containing a matching function as a member. This uses closures to store the matcher parameters rather than structs.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
type CharMatch struct { | |
fn func(rune) bool | |
} | |
func (cm CharMatch) Match(c rune) bool { | |
return cm.fn(c) | |
} | |
func Is(c rune) CharMatch { | |
return CharMatch{ func (c2 rune) bool { return c == c2 }} | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
type CharMatch func(rune) bool | |
// Predefined CharMatch's | |
var ( | |
SPACE CharMatch = AnyOf(" \t\r\n") | |
DIGIT CharMatch = InRange('0', '9') | |
LETTER CharMatch = unicode.IsLetter | |
LOWER CharMatch = unicode.IsLower | |
UPPER CharMatch = unicode.IsUpper | |
) | |
// Match returns true if the character matches, otherwise false | |
func (cm CharMatch) Match(c rune) bool { | |
return cm(c) | |
} | |
// Is returns a CharMatch that matches a specific character | |
func Is(c1 rune) CharMatch { | |
return func(c2 rune) bool { return c1 == c2 } | |
} | |
// AnyOf returns a CharMatch that matches any character in a string | |
func AnyOf(s string) CharMatch { | |
return func(c rune) bool { return strings.ContainsRune(s, c) } | |
} | |
// InRange returns a CharMatch that matches any character in the range (inclusive) | |
func InRange(from rune, to rune) CharMatch { | |
return func(c rune) bool { return from <= c && c <= to } | |
} | |
// Negate returns a CharMatch that matches any character not matched by this one | |
func (cm CharMatch) Negate() CharMatch { | |
return func(c rune) bool { return !cm.Match(c) } | |
} | |
// Or returns a CharMatch that matches any character that matches either CharMatch | |
func (cm1 CharMatch) Or(cm2 CharMatch) CharMatch { | |
return func(c rune) bool { return cm1.Match(c) || cm2.Match(c) } | |
} | |
// CountIn returns the number of characters in the string that match | |
func (cm CharMatch) CountIn(s string) int { | |
n := 0 | |
for _, c := range s { | |
if cm.Match(c) { | |
n++ | |
} | |
} | |
return n | |
} | |
// IndexIn returns the index of the first character that matches | |
// or -1 if no match is found | |
func (cm CharMatch) IndexIn(s string) int { | |
return strings.IndexFunc(s, cm) | |
} |
IndexIn is an example of a method that just wraps a strings package function, whereas CountIn has no strings equivalent.
The tests give some examples of how it's used.
One potential drawback of this approach is that the matcher parameters are "buried" in closures. This makes it impossible to do any processing or optimization (like the Guava CharMatcher precomputed method). For example, Is('a').Or(Is('b')) could be folded into AnyOf('ab'). If you wanted to do this, I think you'd have to go back to using structs (like my first approach).
No comments:
Post a Comment