regex - I need to remove JavaScript tags using regular expressions and JRegex


I need to remove all JavaScript tags and their content from the HTML code of web pages, along with the noscript and style tags. So far I have come up with this expression:

  "(<[\r\n \t]*script([\r\n \t>]|>){1,}([\r\n \t]|.)*?</[\r\n \t]*script[\r\n \t]*>)|(<[\r\n \t]*noscript([\r\n \t>]|>){1,}([\r\n \t]|.)*?</[\r\n \t]*noscript[\r\n \t]*>)|(<[\r\n \t]*style([\r\n \t>]|>){1,}([\r\n \t]|.)*?</[\r\n \t]*style[\r\n \t]*>)"

I use the JRegex library to work with regular expressions. When I test the expression in any regex tester it works fine, but as soon as I run my program it crashes with this error report:

  Exception in thread "Thread-0" java.lang.StackOverflowError
      at java.util.regex.Pattern$Branch.match(Unknown Source)
      at java.util.regex.Pattern$BmpCharProperty.match(Unknown Source)
      at java.util.regex.Pattern$BranchConn.match(Unknown Source)
      at java.util.regex.Pattern$GroupTail.match(Unknown Source)
      at java.util.regex.Pattern$LazyLoop.match(Unknown Source)
      at java.util.regex.Pattern$GroupHead.match(Unknown Source)
      at java.util.regex.Pattern$BranchConn.match(Unknown Source)
      at java.util.regex.Pattern$CharProperty.match(Unknown Source)
      at java.util.regex.Pattern$Branch.match(Unknown Source)
      at java.util.regex.Pattern$GroupHead.match(Unknown Source)
      at java.util.regex.Pattern$LazyLoop.match(Unknown Source)
      ...

And it keeps going like that. Can anybody give me some advice on this? I would be very grateful.

Why not use an HTML parser and simply remove the <script> and <style> nodes?
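A parser is the more robust route. If the regex has to stay for now, the overflow itself can probably be avoided. The repeating `Branch`, `GroupHead` and `LazyLoop` frames in the trace suggest that the lazily repeated group containing an alternation is being matched recursively, roughly one level per character of script body, which is a known way to exhaust the stack in `java.util.regex` on large inputs. The sketch below is one possible rewrite, not the asker's code: it assumes `java.util.regex` (the package shown in the trace, rather than JRegex), and the class name `TagStripper` and method `strip` are made up for illustration. It replaces the alternation with a plain `.*?` under the DOTALL flag, which is matched in a loop instead of by recursion.

```java
import java.util.regex.Pattern;

public class TagStripper {
    // (?i) = case-insensitive, (?s) = DOTALL, so '.' also matches line breaks
    // and no "([\r\n \t]|.)" alternation is needed inside the repetition.
    // \1 requires the closing tag name to match the one that was opened.
    private static final Pattern BLOCKS = Pattern.compile(
        "(?is)<\\s*(script|noscript|style)\\b[^>]*>.*?<\\s*/\\s*\\1\\s*>");

    // Hypothetical helper: returns the input with all matched blocks removed.
    public static String strip(String html) {
        return BLOCKS.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Build a large script body, the kind of input that tends to
        // trigger the overflow with an alternation inside a lazy loop.
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 200_000; i++) {
            body.append("var x = 1;\n");
        }
        String html = "<p>a</p><script type=\"text/javascript\">"
            + body + "</script><p>b</p>";
        System.out.println(strip(html));
    }
}
```

This still shares the usual limits of regex-based HTML handling: a literal `</script>` inside a JavaScript string, for example, would end the match early, which is one more reason to prefer a parser where possible.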

