java - Need to clean malformed tags using regular expression


I am looking for the proper regular expression for the following situation:

I have to clean up some tags in free-flowing text. For example, I have two important tags within the text: <2004:04:12> and <Person's name>. Unfortunately, some tags are missing the "<" or ">" delimiter.

For example, some are as follows:

  1)  

I tried to use the following for case 1:

  String regex = "<\\d{4}-\\d{2}-\\d{2}\\w*{2}[^>]";
  String output = content.replaceAll(regex, "$0>");

This found all instances of "<2004:04:12" and the result was "<2004:04:12 >". However, I need to eliminate the space before the closing delimiter.

I am not sure this is the best way. Any suggestions?

Thanks

Indeed, you need a negative lookahead, like this:

  String regex = "<\\d{4}-\\d{2}-\\d{2}(?!>)";
  String output = content.replaceAll(regex, "$0>");

This will work for the numeric "tags", but no regex can be intelligent enough to match an arbitrary name, so you either have to define what a name can look like, or accept that fixing the "name" tags is impossible.
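
To make the lookahead idea concrete, here is a minimal, self-contained sketch. Several things in it are assumptions rather than anything stated in the question: the class name TagCleaner, the sample text, the choice to accept either ":" or "-" as the date separator (the question's text shows colons while its regex uses hyphens), and above all the definition of a "name" as capitalized words separated by single spaces. That name rule is only one possible definition, and it will misfire on text where a capitalized word directly follows an unclosed name tag.

```java
import java.util.regex.Pattern;

final class TagCleaner {

    // A date tag that is NOT immediately followed by ">".
    // (?!>) only peeks at the next character; it does not consume it,
    // so "$0" never carries a trailing space into the replacement.
    // Assumption: the separator may be ":" or "-".
    private static final Pattern OPEN_DATE =
            Pattern.compile("<\\d{4}[:-]\\d{2}[:-]\\d{2}(?!>)");

    // Hypothetical definition of a name: capitalized words separated by
    // single spaces. The possessive quantifiers (++ and *+) stop the engine
    // from backtracking to a shorter match such as "<John Smit", which
    // would otherwise satisfy the lookahead and corrupt a well-formed tag.
    private static final Pattern OPEN_NAME =
            Pattern.compile("<[A-Z][a-z]++(?: [A-Z][a-z]++)*+(?!>)");

    // Appends the missing ">" to unclosed date tags.
    public static String fixDateTags(String content) {
        return OPEN_DATE.matcher(content).replaceAll("$0>");
    }

    // Appends the missing ">" to unclosed name tags (under the assumed name rule).
    public static String fixNameTags(String content) {
        return OPEN_NAME.matcher(content).replaceAll("$0>");
    }

    public static void main(String[] args) {
        // Made-up sample text: two unclosed tags and two well-formed ones.
        String content =
                "<2004-04-12 met <John Smith at noon, <2004-04-13> was <Jane Doe>";
        System.out.println(fixNameTags(fixDateTags(content)));
        // prints: <2004-04-12> met <John Smith> at noon, <2004-04-13> was <Jane Doe>
    }
}
```

The date pattern needs no possessive quantifiers because every part has a fixed width, so there is nothing to backtrack into. The name pattern does need them, which illustrates the point above: the name fix is only as good as the definition of a name you are willing to commit to.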

