Feature #4818
eit: allow use of scraping regex as a filter
Description
Currently, scraping stops at the first matching regex, and that regex's matched subpatterns are returned as the scraping result.
In some scraping scenarios, regexes could be kept much simpler if a regex could be flagged as a filter. When a filter regex matches, its matched subpatterns would be passed to the subsequent regexes as their input text instead of being returned as the result.
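A minimal Python sketch of the proposed semantics. It assumes that patterns are tried in order and that a filter's captured subpatterns are concatenated to form the new input text; the request does not say how several subpatterns would be combined. `ScrapePattern`, `is_filter` and `scrape` are hypothetical names, not the existing EIT scraper API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScrapePattern:
    regex: str
    # Hypothetical flag: on a match, feed the captured groups to later
    # patterns instead of returning them as the result.
    is_filter: bool = False


def scrape(text: str, patterns: List[ScrapePattern]) -> Optional[List[str]]:
    for pattern in patterns:
        match = re.search(pattern.regex, text)
        if match is None:
            continue  # no match: try the next pattern on the same text
        groups = [g for g in match.groups() if g is not None]
        if pattern.is_filter:
            # Assumption: captured groups are joined to form the new input text.
            text = "".join(groups)
            continue
        # Ordinary pattern: stop at the first match and return its groups,
        # mirroring the current behaviour.
        return groups
    return None  # nothing matched


patterns = [
    # Filter: drop a leading "...continuation." sentence and keep the rest.
    ScrapePattern(r"^\.\.\.[^.]*\.\s*(.*)$", is_filter=True),
    # Ordinary pattern: first sentence of whatever text remains.
    ScrapePattern(r"^([^.]+)\."),
]
```

With the filter in place, `scrape("...into the summary. Which has further text.", patterns)` returns `["Which has further text"]`. Without it (`patterns[1:]`), the second regex cannot match at all, because the text starts with `...`.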
For example, consider a UK Freeview title continuation:
title: This Title continues...
summary: ...into the summary. Which has further text.
When processing the summary in this case, the regexes become considerably simpler if an early filter regex can remove "...into the summary. " from the text. Subsequent regexes looking for subtitles, etc. then see only the remainder.
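To illustrate why a filter keeps regexes simple, this sketch contrasts two hypothetical ways of extracting a subtitle from the Freeview summary. Without a filter, every subtitle regex has to allow for the optional continuation prefix itself. With a filter, the prefix is removed once and later regexes can assume clean text. The regexes and function names are illustrative assumptions, not patterns from any shipped scraper configuration.

```python
import re

# Hypothetical continuation prefix: "..." followed by text up to the first full stop.
CONTINUATION = r"\.\.\.[^.]*\.\s*"

# Without a filter: each subtitle regex must embed the optional prefix.
# Only one is shown here; a real configuration with several regexes
# would repeat the prefix in every one of them.
SUBTITLE_WITHOUT_FILTER = r"^(?:" + CONTINUATION + r")?([^.]+)\."

# With a filter: strip the prefix once, then keep later regexes simple.
FILTER = r"^(?:" + CONTINUATION + r")?(.*)$"
SUBTITLE_WITH_FILTER = r"^([^.]+)\."


def subtitle_without_filter(summary):
    match = re.search(SUBTITLE_WITHOUT_FILTER, summary)
    return match.group(1) if match else None


def subtitle_with_filter(summary):
    # The prefix group is optional and DOTALL lets (.*) span newlines,
    # so the filter always matches; group 1 is the text passed on.
    filtered = re.search(FILTER, summary, re.DOTALL).group(1)
    match = re.search(SUBTITLE_WITH_FILTER, filtered)
    return match.group(1) if match else None
```

Both functions return `"Which has further text"` for the example summary. The difference is that only the filter regex needs to know about the continuation prefix.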
(I have a change in preparation that does this).