Friday, July 29, 2011

Efficient Regex Pattern for Getting Hashtags

After digging around the Internet for a while without finding a regex pattern that captures all of the hashtags in a String, I finally created my own, based on information I gathered from a few other places.


My sources include the following:

I took this information and created a method in Salesforce that grabs all of the hashtags from a String and returns them in a Set, as shown below.

/**
 * Get the Set of hashtags (including
 * the '#' character) used within a String in
 * all lower case, for ease of comparison.
 * @param  text The String text to analyze.
 * @return      The Set of hashtags
 *              used within the text.
 */
public static Set<String> getHashtagSet(
        String text) {
    // Instantiate the resulting set.
    Set<String> hashtagSet = new Set<String>();
    // Only look for hashtags if text is given.
    if (text != null) {
        Pattern hashtagPattern = Pattern.compile(
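        Matcher hashtagMatcher =
                hashtagPattern.matcher(text);
        while (hashtagMatcher.find()) {
            // Store each match, '#' included, in lower case.
            hashtagSet.add(
                    hashtagMatcher.group().toLowerCase());
        }   // while (hashtagMatcher.find())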
    }   // if (text != null)
    // Return the results.
    System.debug('hashtagSet = ' + hashtagSet);
    return hashtagSet;
}   // public static Set<String> getHashtagSet(String)
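
For anyone who wants to try the same approach outside of Salesforce, here is a minimal sketch in plain Java, which works because Apex's Pattern and Matcher classes are modeled on java.util.regex. Note the assumptions: the regex below is a simple stand-in of my own choosing for illustration (a '#' that is not preceded by a word character, followed by one or more word characters), not necessarily the pattern used in the method above, and the class name HashtagSketch is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagSketch {
    // ASSUMPTION: stand-in pattern for illustration only.
    // Matches a '#' that is not preceded by a word character,
    // followed by one or more word characters (letters, digits, '_').
    private static final Pattern HASHTAG_PATTERN =
            Pattern.compile("(?<!\\w)#\\w+");

    /** Returns the lower-cased hashtags (including '#') found in text. */
    static Set<String> getHashtagSet(String text) {
        // LinkedHashSet drops duplicates and keeps first-seen order.
        Set<String> hashtagSet = new LinkedHashSet<>();
        // Only look for hashtags if text is given.
        if (text != null) {
            Matcher hashtagMatcher = HASHTAG_PATTERN.matcher(text);
            while (hashtagMatcher.find()) {
                // group() is the whole match, '#' included.
                hashtagSet.add(
                        hashtagMatcher.group().toLowerCase(Locale.ROOT));
            }
        }
        return hashtagSet;
    }

    public static void main(String[] args) {
        // Prints: [#apex, #dreamforce11]
        System.out.println(getHashtagSet(
                "Loving #Apex and #apex at #Dreamforce11! a#b"));
    }
}
```

Because every match is lower-cased before it goes into the Set, "#Apex" and "#apex" collapse into a single entry, and the lookbehind keeps the "#b" in "a#b" from being treated as a hashtag. This stand-in pattern would still accept purely numeric tags such as "#1", so tighten it if that matters for your use case.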