I search HTML documents and often get HTML tags, or pieces of them, in the highlighter results, for example:
Абсолютная истина.<br />Не-противостояние и растворение напряжений</h1...Абсолютная истина</h4...В этом мире нет
Most often, these tags or tag pieces are:
</p, <br />, </h1, </h2 and so on, </em></strong>
I switched to using SentenceFragmenter (which is also more suitable for my needs):
results.fragmenter = highlight.SentenceFragmenter(maxchars=240, sentencechars='</>.!?', charlimit=None)
I expected this to filter all of that out, but it doesn't. I even tried escaping those characters like this:
sentencechars='\<\/\>.!?'
That didn't help either. It seems I will have to resort to an additional search-and-replace pass.
Here's how I clean it: https://gist.github.com/chang-zhao/2a18dcab0b40e3011decefb65c91b4ca
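For readers who only need the general idea, a search-and-replace pass of the kind described above could look roughly like the sketch below. It is not the code from the gist. It uses only the standard `re` module, and the names `FULL_TAG`, `CUT_TAG` and `clean_fragment` are made up for illustration. It assumes the text contains no literal `<` outside of tags, and that it is applied to text that does not carry the highlighter's own markup (otherwise the highlight tags would be stripped too).

```python
import re

# Complete tags, e.g. <br />, </em>, <h1 class="title">.
FULL_TAG = re.compile(r"</?[A-Za-z][^<>]*>")
# Tags cut off before their closing ">", e.g. "</p" or "</h1".
# Assumption: any remaining "<" followed by a letter starts such a piece.
CUT_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*")


def clean_fragment(text):
    """Strip HTML tags and cut-off tag openings from a text fragment."""
    text = FULL_TAG.sub(" ", text)  # remove whole tags first
    text = CUT_TAG.sub(" ", text)   # then the dangling tag pieces
    # Collapse the whitespace the substitutions leave behind.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sample = "Truth.<br />No tension</h1...Truth</h4...In this world"
    print(clean_fragment(sample))
```

One option that avoids touching the highlight markup is to clean the stored document text first and hand the cleaned text to the highlighter. As far as I know, Whoosh's `Hit.highlights()` accepts a `text` argument for this purpose, but treat that as an assumption and check it against the version you use.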