regex - I need to remove JavaScript tags using regular expressions and JRegex
I have to remove all JavaScript tags and the content in between them, as well as style tags, from the HTML code of web pages. So far I have come up with this expression:
"(<[\r\n\t]*script([\r\n\t>]|>){1,}([\r\n\t]|.)*?</[\r\n\t]*script[\r\n\t]*>)|(<[\r\n\t]*noscript([\r\n\t>]|>){1,}([\r\n\t]|.)*?</[\r\n\t]*noscript[\r\n\t]*>)|(<[\r\n\t]*style([\r\n\t>]|>){1,}([\r\n\t]|.)*?</[\r\n\t]*style[\r\n\t]*>)"
I use the JRegex library to work with regular expressions. When I test the expression in any regex tester it works fine, but once I run my program it crashes with this error report:
Exception in thread "Thread-0" java.lang.StackOverflowError
    at java.util.regex.Pattern$Branch.match(Unknown Source)
    at java.util.regex.Pattern$BmpCharProperty.match(Unknown Source)
    at java.util.regex.Pattern$BranchConn.match(Unknown Source)
    at java.util.regex.Pattern$GroupTail.match(Unknown Source)
    at java.util.regex.Pattern$LazyLoop.match(Unknown Source)
    at java.util.regex.Pattern$GroupHead.match(Unknown Source)
    at java.util.regex.Pattern$BranchConn.match(Unknown Source)
    at java.util.regex.Pattern$CharProperty.match(Unknown Source)
    at java.util.regex.Pattern$Branch.match(Unknown Source)
    at java.util.regex.Pattern$GroupHead.match(Unknown Source)
    at java.util.regex.Pattern$LazyLoop.match(Unknown Source)
    ...
And it goes on like that forever. Can anybody give me some advice on this? I would be very grateful.
Why not use an HTML parser and simply remove the <script> and <style> nodes?
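If the regex approach has to stay for now, note that the stack trace in the question comes from java.util.regex, and the overflow is most likely caused by the ([\r\n\t]|.)*? group: java.util.regex tends to recurse once per repetition of a group that contains an alternation, so a long script body can exhaust the thread's stack. Below is a minimal sketch, assuming plain java.util.regex rather than JRegex, that lets '.' match line breaks through the DOTALL flag instead. The class name ScriptStripper and the method strip are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

public class ScriptStripper {
    // (?i) makes the match case-insensitive; (?s) (DOTALL) lets '.' match
    // line terminators too, so the lazy ".*?" needs no alternation inside it.
    // \1 refers back to whichever tag name opened the block.
    private static final Pattern BLOCKS = Pattern.compile(
        "(?is)<\\s*(script|noscript|style)\\b[^>]*>.*?<\\s*/\\s*\\1\\s*>");

    // Hypothetical helper: returns the input with script, noscript and
    // style elements (tags and content) removed.
    public static String strip(String html) {
        return BLOCKS.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>a</p><script type=\"text/javascript\">\nvar x = 1;\n</script>"
                    + "<STYLE>p { color: red; }</STYLE><p>b</p>";
        System.out.println(strip(html)); // prints <p>a</p><p>b</p>
    }
}
```

This is still a regex working on HTML, so it can be fooled by input such as a literal "</script>" inside a JavaScript string; a real HTML parser, as suggested above, remains the more robust choice.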