A programming thread!

Pilchard123:
Try http://www.regexr.com/ - it's not perfect (it uses a JavaScript regex engine which can be a bit flaky with lookarounds), but does the job for most things I've needed.


EDIT:

Must it be all regex? Could you use something like this (code not checked, but hopefully you get the idea):

--- Code: ---String toGetWords = "a long string of tokens that blah blah blah blah blah";
String delimiters = "a string containing all the delimiter characters";
StringTokenizer st = new StringTokenizer(toGetWords, delimiters);

while(st.hasMoreTokens()){
    String token = st.nextToken();
    //Trim trailing hyphen, or discard the token if the trailing hyphen disqualifies it.

    //Then use a regex to check if the token contains [A-ZÄÖÜ], case insensitive.
}

--- End code ---
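A slightly fuller sketch of the same idea, in case it helps. It assumes the hyphen is left out of the delimiter string (otherwise hyphenated words would be split apart), and that a token counts as a word only if it contains at least one letter. The class name TokenizerWords and the sample sentence are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

class TokenizerWords {
    // Assumption: every delimiter from the question except the hyphen,
    // so that hyphenated words survive as a single token.
    private static final String DELIMITERS = " .,;:_?!";

    // A token only counts as a word if it contains at least one letter.
    // The umlauts are written as Unicode escapes so the file compiles
    // regardless of the source encoding.
    private static final Pattern HAS_LETTER = Pattern.compile(
            "[a-z\u00e4\u00f6\u00fc]",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens()) {
            // Strip leading and trailing hyphens; inner hyphens are kept.
            String token = st.nextToken().replaceAll("^-+|-+$", "");
            if (HAS_LETTER.matcher(token).find()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extractWords(
                "this-is-all-one-word, trailing- 3D-Drucker 123 \u00c4rger"));
    }
}
```

Note that this is more lenient than a strict regex: a token such as "a--b" would be kept as one word.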

Pilchard123:
Doublepost for great justice!

http://codegolf.stackexchange.com/

ankhtahr:
I think it should be relatively easy. My idea is to allow hyphens only if they are followed by another symbol. I just don't know how to put this into a regex. And I feel like the other solution would stumble over a few other cases.

Huh, I just noticed that my regex has another problem. Words which start with a number won't be recognised.

Edit: found a solution:

--- Code: ---\d*[a-zäöü]+([a-zäöü0-9]|\-[a-zäöü0-9])*
--- End code ---
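For anyone who wants to try it, here is a minimal sketch that runs this regex with Pattern and Matcher. The class name RegexWords and the sample sentence are made up. One assumption worth flagging: Java's CASE_INSENSITIVE only covers US-ASCII by default, so UNICODE_CASE is added here to make the umlauts match in upper case too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWords {
    // The regex from the post; umlauts are written as Unicode escapes
    // so the file compiles regardless of the source encoding.
    private static final Pattern WORD = Pattern.compile(
            "\\d*[a-z\u00e4\u00f6\u00fc]+"
            + "([a-z\u00e4\u00f6\u00fc0-9]|\\-[a-z\u00e4\u00f6\u00fc0-9])*",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        // find() moves through the input one match at a time.
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extractWords(
                "this-is-all-one-word, trailing- 3D-Drucker 123 \u00c4rger"));
    }
}
```

The trailing hyphen in "trailing-" is dropped because a hyphen is only accepted together with the letter or digit after it, and "123" is skipped because it contains no letter.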

Pilchard123:
Have a look at lookaheads, though I can't remember how to also capture those.
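A rough sketch of what the lookahead version might look like, under the assumption that the goal is "accept a hyphen only if a letter or digit follows". A lookahead such as (?=...) is zero-width: it checks what comes next without consuming it, so there is nothing to capture. The character it peeks at is simply matched by the next round of the loop. The class name LookaheadWords is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadWords {
    // Letters and digits allowed inside a word (umlauts as Unicode escapes).
    private static final String CHARS = "[a-z\u00e4\u00f6\u00fc0-9]";

    // A hyphen is consumed only when the lookahead sees a word character
    // right after it; the lookahead itself consumes nothing.
    private static final Pattern WORD = Pattern.compile(
            "\\d*[a-z\u00e4\u00f6\u00fc]+(?:" + CHARS + "|-(?=" + CHARS + "))*",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extractWords("well-known- fact, 3D-Druck-"));
    }
}
```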

Tea For The Tillerman:

--- Quote from: ankhtahr on 29 Aug 2014, 11:28 ---Anyone here who knows their (Java) regexes better than I do?

I need to extract every "word" from a very long string (the strings may only contain the symbols mentioned in the description here). A word delimiter is either a space or any of the following symbols:
--- Code: ---.,;:_-?!
--- End code ---
Words which are connected with a hyphen are considered one word (e.g. this-is-all-one-word). A word may contain numbers and umlauts, but must contain at least one letter. My current regex for that is this one:

--- Code: ---[A-ZÄÖÜ]+[^.,;:_?!\ ]*
--- End code ---
(I use CASE_INSENSITIVE of course)
I still need a way to cut off a hyphen at the end of a word. I tried it with a "-?" at the end, putting the rest in parentheses, but that doesn't work of course, because the "not these special characters" part is greedy.

--- End quote ---

If this regex works:

--- Code: ---[A-ZÄÖÜ]+[^.,;:_?!\ ]*
--- End code ---
(gonna call this block "word" from now on)

Couldn't you write something like this:

--- Code: ---<word>(-<word>)*
--- End code ---
Basically saying "must have one word, but can have multiples as long as subsequent words start with a hyphen".
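A minimal sketch of how that composition could be built in Java by gluing the "word" block together as a string. Two assumptions that are not in the posts above: the hyphen is added to the excluded characters of the block (otherwise the block swallows the hyphens itself and a trailing one stays attached), and UNICODE_CASE is set so the umlauts match case-insensitively. The names ComposedWords and PART are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComposedWords {
    // The "word" block, with the hyphen added to the excluded characters
    // (assumption, see above). Umlauts are written as Unicode escapes.
    private static final String PART =
            "[A-Z\u00c4\u00d6\u00dc]+[^.,;:_?!\\- ]*";

    // One block, optionally followed by more blocks that each start
    // with a hyphen.
    private static final Pattern WORD = Pattern.compile(
            PART + "(?:-" + PART + ")*",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extractWords(
                "this-is-all-one-word, end- \u00fcber Satz2!"));
    }
}
```

One limitation of this sketch: every block has to start with a letter, so in something like "well-4you" the part after the hyphen would not be joined to "well".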
