A programming thread!
Pilchard123:
Try http://www.regexr.com/ - it's not perfect (it uses a JavaScript regex engine which can be a bit flaky with lookarounds), but does the job for most things I've needed.
EDIT:
Must it be all regex? Could you use something like this (code not checked, but hopefully you get the idea)?
--- Code: ---// needs: import java.util.StringTokenizer;
String toGetWords = "a long string of tokens that blah blah blah blah blah";
String delimiters = "a string containing all the delimiter characters";
StringTokenizer st = new StringTokenizer(toGetWords, delimiters);
while (st.hasMoreTokens()) {
    String token = st.nextToken();
    // Trim a trailing hyphen, or discard the token if the trailing hyphen disqualifies it.
    // Then use a regex to check that the token contains [A-ZÄÖÜ], case insensitive.
}
--- End code ---
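For illustration, here is one way the tokenizer idea above could be fleshed out. It is only a sketch under assumptions: the class name `TokenizerWords` is made up, the delimiter set is the list from the problem with the hyphen left out (so hyphenated words stay in one token), and hyphens are stripped from both ends of a token rather than just the end. The umlauts are written as Unicode escapes so the source stays ASCII-only, and `UNICODE_CASE` is added because `CASE_INSENSITIVE` on its own only folds ASCII letters in Java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

class TokenizerWords {
    // Assumption: space plus the listed delimiters, without the hyphen,
    // so that "this-is-one-word" survives as a single token.
    private static final String DELIMITERS = " .,;:_?!";

    // At least one letter: a-z plus a-umlaut, o-umlaut, u-umlaut (as escapes).
    private static final Pattern HAS_LETTER = Pattern.compile(
            "[a-z\u00e4\u00f6\u00fc]",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens()) {
            // Strip hyphens at either end of the token.
            String token = st.nextToken().replaceAll("^-+|-+$", "");
            // Keep the token only if it contains at least one letter.
            if (HAS_LETTER.matcher(token).find()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("well-known fact- 42 ok!"));
    }
}
```

Called on `"well-known fact- 42 ok!"` this returns `[well-known, fact, ok]`. A token such as `a--b` would still come through as one word, so this version does not cover every edge case.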
Pilchard123:
Doublepost for great justice!
http://codegolf.stackexchange.com/
ankhtahr:
I think it should be relatively easy. My idea is to allow hyphens only if they are followed by another symbol. I just don't know how to put this into a regex. And I feel like the other solution would stumble over a few other cases.
Huh, I just noticed that my regex has another problem. Words which start with a number won't be recognised.
Edit: found a solution:
--- Code: ---\d*[a-zäöü]+([a-zäöü0-9]|\-[a-zäöü0-9])*
--- End code ---
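As a rough check, the regex above can be run through `Pattern` and `Matcher` like this. The class name `HyphenWords` and the sample input are invented for the example, and `UNICODE_CASE` is added on the assumption that upper-case umlauts should match too, since `CASE_INSENSITIVE` alone only covers ASCII in Java. The umlauts are written as Unicode escapes to keep the source ASCII-only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenWords {
    // a-z plus a-umlaut, o-umlaut, u-umlaut, for use inside a character class.
    private static final String LETTERS = "a-z\u00e4\u00f6\u00fc";

    // Same shape as the regex in the post:
    // \d*[letters]+([letters0-9]|\-[letters0-9])*
    private static final Pattern WORD = Pattern.compile(
            "\\d*[" + LETTERS + "]+([" + LETTERS + "0-9]|\\-[" + LETTERS + "0-9])*",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("this-is-all-one-word, 3D-Drucker - 42 end-"));
    }
}
```

For the sample input this gives `[this-is-all-one-word, 3D-Drucker, end]`: the lone hyphen and the bare number are skipped, and the trailing hyphen on `end-` is left out of the match.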
Pilchard123:
Have a look at lookaheads, though I can't remember how to also capture those.
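A lookahead does not consume anything, so nothing extra has to be captured: the hyphen is matched, and the lookahead only checks what comes next. A minimal sketch of that idea, assuming the same letter set as above (class name `LookaheadWords` is made up, umlauts written as Unicode escapes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadWords {
    // a-z plus a-umlaut, o-umlaut, u-umlaut, for use inside a character class.
    private static final String LETTERS = "a-z\u00e4\u00f6\u00fc";

    // A hyphen is accepted only when (?=...) sees a letter or digit right after it.
    private static final Pattern WORD = Pattern.compile(
            "\\d*[" + LETTERS + "]+(?:[" + LETTERS + "0-9]|-(?=[" + LETTERS + "0-9]))*",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("end- a--b x-1"));
    }
}
```

On `"end- a--b x-1"` this yields `[end, a, b, x-1]`, since a hyphen followed by a space or by another hyphen fails the lookahead and ends the word.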
Tea For The Tillerman:
--- Quote from: ankhtahr on 29 Aug 2014, 11:28 ---Anyone here who knows their (Java) regexes better than I do?
I need to extract every "word" from a very long string (the strings may only contain the symbols mentioned in the description here). A word delimiter is either a space or any of the following symbols:
--- Code: ---.,;:_-?!
--- End code ---
Words which are connected with a hyphen are considered one word. (e.g. this-is-all-one-word) A word may contain numbers and umlauts, but must consist of at least one letter. My current regex for that is this one:
--- Code: ---[A-ZÄÖÜ]+[^.,;:_?!\ ]*
--- End code ---
(I use CASE_INSENSITIVE of course)
I still need a way to cut off a hyphen at the end of a word. I tried it with a "-?" at the end, putting the rest in parenthesis, but that doesn't work of course, because the "not these special characters" is greedy.
--- End quote ---
If this regex works:
--- Code: ---[A-ZÄÖÜ]+[^.,;:_?!\ ]*
--- End code ---
(gonna call this block "word" from now on)
Couldn't you write something like this:
--- Code: ---<word>(-<word>)*
--- End code ---
Basically saying "must have one word, but can have multiples as long as subsequent words start with a hyphen".
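Spelled out in Java, with the repetition written as a real regex group, that composition might look roughly like the sketch below. One assumption is needed to make it work: the hyphen has to be added to the excluded characters of the "word" block, otherwise the block itself already swallows hyphens and the trailing one is never cut off. The class name `ComposedWords` is invented, and the umlauts are written as Unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComposedWords {
    // "word" block: starts with letters, then anything that is not a delimiter.
    // Assumption: the hyphen is excluded here too (it sits last in the class).
    private static final String WORD = "[a-z\u00e4\u00f6\u00fc]+[^.,;:_?! -]*";

    // One word, then any number of further words each introduced by a hyphen.
    private static final Pattern HYPHENATED = Pattern.compile(
            WORD + "(?:-" + WORD + ")*",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = HYPHENATED.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("this-is-all-one-word end-"));
    }
}
```

For `"this-is-all-one-word end-"` this returns `[this-is-all-one-word, end]`. Note that with this definition every hyphenated part has to start with a letter, so `x-1` would be cut down to `x`.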