Problems with regexp
Added by H. Fux about 5 years ago
Hi folks,
I am trying to create an Autorec that does a full-text search that MUST contain the word "Tatort" and MUST contain at least one of the words "Boerne" and "Thiel" (Germans will immediately understand what I want to achieve :-)). To my understanding the regexp
(?=.*Tatort)((?=.*Boerne)|(?=.*Thiel))
should do the trick. However, if I enter this regexp, the Autorec just "swallows" it, i.e. the "Title" field stays empty. I experimented a bit, but this turned out to be difficult, since tvheadend seems to go into some infinite loop, eating up 100% CPU, if I enter something like
(.*Thiel)|(.*Boerne)
I use version 4.2.4-dmo1~bpo9+1~rpt1 on Raspbian Stretch.
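For readers comparing regex flavours: the lookahead pattern above means "contains Tatort AND contains Boerne or Thiel". Lookaheads are a Perl/PCRE feature and are not part of POSIX regular expressions, so a POSIX-only build cannot express this in a single pattern. The sketch below uses Python's re module (which, like PCRE, supports lookaheads) purely as a stand-in for tvheadend's engine, and shows that the same AND can be obtained with two plain searches. The helper names are hypothetical.

```python
import re

# PCRE-style version: each (?=...) asserts that the rest of the string contains
# the given text, without consuming anything. "^" makes both checks start at
# the beginning of the string.
LOOKAHEAD = re.compile(r"^(?=.*Tatort)(?=.*(?:Boerne|Thiel))")


def matches_lookahead(text):
    return LOOKAHEAD.search(text) is not None


def matches_all(text, patterns):
    """Hypothetical helper: True only if every pattern matches somewhere in text.
    Each pattern is plain enough to be valid POSIX ERE as well."""
    return all(re.search(p, text) for p in patterns)


for sample in ("Tatort mit Thiel und Boerne", "Tatort aus Ludwigshafen", "Boerne liest vor"):
    print(sample, matches_lookahead(sample), matches_all(sample, ["Tatort", "Boerne|Thiel"]))
```

Both approaches agree on the samples; the second is the shape a POSIX-only engine would need, because the AND is done by the caller rather than inside the pattern.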
How would I achieve the above search?
Thanks for your help!
- Hauke
Replies (9)
RE: Problems with regexp - Added by Em Smith about 5 years ago
You could try:
Tatort.*(Boerne|Thiel)
(Assuming Boerne or Thiel always occur after Tatort).
If that does not work, you could do two rules:
Tatort.*Boerne
Tatort.*Thiel
If "Tatort" is always the first word then you should "anchor" it as "^Tatort".
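Em's single pattern and the two-rule alternative can be compared side by side. In this sketch Python's re stands in for the regex engine (the patterns use only features that POSIX extended regular expressions also have), and "any rule matches" models two separate autorecs, each of which can trigger a recording on its own. The function names are hypothetical.

```python
import re

# Single pattern: "Tatort" followed, somewhere later, by Boerne or Thiel.
SINGLE = re.compile(r"Tatort.*(Boerne|Thiel)")

# Two separate rules: an event is picked up if either rule matches.
TWO_RULES = [re.compile(r"Tatort.*Boerne"), re.compile(r"Tatort.*Thiel")]


def single_rule(text):
    return SINGLE.search(text) is not None


def any_rule(text):
    return any(rule.search(text) for rule in TWO_RULES)


for text in ("Tatort mit Thiel und Boerne", "Tatort - Boerne", "Thiel und Boerne im Tatort"):
    print(f"{text!r}: single={single_rule(text)} two_rules={any_rule(text)}")
```

Both forms give the same answers, including the ordering caveat: "Thiel und Boerne im Tatort" is missed by both because neither name appears after "Tatort".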
RE: Problems with regexp - Added by H. Fux about 5 years ago
Thanks for the suggestions!
Tatort is not always the first word, I suppose... I will need to think about it and look at a few examples. It may well be...
Two rules: Would duplicate detection still work?
RE: Problems with regexp - Added by H. Fux about 5 years ago
- If I create a rule with full-text search and the regexp ".*Tatort", it finds everything that has Tatort in either the title or the description. Fine.
- If I now extend the rule to ".*Tatort.*Boerne" or change it to "Tatort.*Boerne", nothing is found anymore.
- If I use the filter ".*Tatort.*Schule", I find a TV show that has both "Tatort" and "Schule" in the description.
So I infer that the regexp is applied to the title and the description separately, so either the title or the description must match the full regexp. For the search I'd like to achieve, the title would contain "Tatort" and the description either "Boerne" or "Thiel". Is there a way to ensure that both conditions are met?
Btw.: Is there some documentation about the tvheadend regexp? It does not seem to fully follow the standard...
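The behaviour inferred above can be modelled as "try the pattern on each field separately, fire if any field matches". The event dictionary and the function below are hypothetical, and Python's re stands in for tvheadend's regex engine.

```python
import re

# Hypothetical EPG event with the three text fields stored separately.
EVENT = {
    "title": "Tatort",
    "subtitle": "Folge 1",
    "description": "Thiel und Boerne ermitteln.",
}


def fulltext_match(pattern, event):
    """Try the pattern on each field on its own; succeed if any field matches."""
    rx = re.compile(pattern)
    return any(rx.search(text) for text in event.values())


print(fulltext_match(r"Tatort", EVENT))          # True: the title matches
print(fulltext_match(r"Boerne", EVENT))          # True: the description matches
print(fulltext_match(r"Tatort.*Boerne", EVENT))  # False: no single field has both
```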
RE: Problems with regexp - Added by Em Smith about 5 years ago
I think 4.2 only uses POSIX regex. You might be able to "man regex" (or web search that phrase) to see the documentation. The developer/unstable branch (4.3) uses pcre (if available) which is probably the regex with which you are more familiar since it allows a much more advanced syntax.
As you infer, the regexp is applied separately to title, sub-title, and description and cannot operate across a combination of all three fields. The regex is an "unanchored" search, so you don't need ".*" at the start. "Tatort" will match anywhere on the line, so it will match "ABC Tatort DEF" or even "ABCTatortDEF" (mid-word).
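A quick illustration of the unanchored search described above. Python's re.search, like POSIX regexec(), reports a match anywhere in the string, so a leading ".*" makes no difference to whether a field matches, while "^" pins the match to the start of the field. The sample strings are made up.

```python
import re

SAMPLES = ("ABC Tatort DEF", "ABCTatortDEF", "Tatort am Sonntag")


def matching(pattern, samples=SAMPLES):
    """Return the samples in which the pattern matches anywhere."""
    return [s for s in samples if re.search(pattern, s)]


print(matching(r"Tatort"))    # all three, including the mid-word hit
print(matching(r".*Tatort"))  # same result: the leading ".*" adds nothing
print(matching(r"^Tatort"))   # only "Tatort am Sonntag"
```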
Regular expressions tend to be "line based" (not "multi-line based"), so even searching across a multi-line description won't work for you. Unfortunately, combining the title, sub-title, and description inside the code would either increase memory usage (since we would need to keep the originals for sending to clients as well as the combined string) or cause a slowdown (if we regenerated the string for every EPG event every time we do a match), so it hasn't been implemented.
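To see the "line based" behaviour: in Python's re, "." does not match a newline unless re.DOTALL is given, which corresponds to POSIX regcomp() with the REG_NEWLINE flag set (without REG_NEWLINE, POSIX treats a newline as an ordinary character). Whether tvheadend sets that flag is not stated in this thread, so treat this as an illustration of the general behaviour, with a made-up description.

```python
import re

# A made-up two-line description.
DESCRIPTION = "Ein neuer Tatort.\nThiel und Boerne ermitteln."

# By default "." does not match "\n", so the pattern cannot reach the second line.
print(bool(re.search(r"Tatort.*Boerne", DESCRIPTION)))             # False
# With re.DOTALL, "." also matches "\n" and the pattern spans both lines.
print(bool(re.search(r"Tatort.*Boerne", DESCRIPTION, re.DOTALL)))  # True
```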
Two separate rules would apply their own duplicate matching policy. So, if you have "unique description" for both rules then each recording rule applies the duplicate policy and only one would be scheduled to record even if they both matched an EPG event.
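A hypothetical sketch of the duplicate handling described above, not tvheadend's actual code: two rules share a "unique description" policy, so an event is scheduled once even when both rules match it, and a repeat with the same description is skipped. Matching against the title only is a simplification made for this example.

```python
import re

# Hypothetical rules; both use a "unique description" duplicate policy.
RULES = {
    "boerne": re.compile(r"Tatort.*Boerne"),
    "thiel": re.compile(r"Tatort.*Thiel"),
}


def schedule(events):
    """Return {description: rule name} for the events that would be recorded.
    An event whose description is already scheduled is skipped, whichever
    rule matched it."""
    scheduled = {}
    for event in events:
        for name, rx in RULES.items():
            if event["description"] in scheduled:
                break  # duplicate: already scheduled by this or another rule
            if rx.search(event["title"]):
                scheduled[event["description"]] = name
    return scheduled


EVENTS = [
    {"title": "Tatort mit Thiel und Boerne", "description": "Folge A"},
    {"title": "Tatort mit Thiel und Boerne", "description": "Folge A"},  # repeat
    {"title": "Tatort: Thiel allein", "description": "Folge B"},
]
print(schedule(EVENTS))
```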
BTW: I liked your article on IPTV and ffmpeg. I don't know if you'd be interested in adding a few notes/screenshots to the wiki in case your blog disappears?
https://tvheadend.org/projects/tvheadend/wiki/Automatic_IPTV_Network
RE: Problems with regexp - Added by H. Fux about 5 years ago
Hi Em,
thanks for the detailed explanations!
I can fully understand that the processing overhead for concatenating the title, subtitle and description would be considerable. Question: Would it be acceptable to offer regexp fields for title, subtitle and description separately? In my case, this would already solve my "problem".
Regarding the wiki: Will have a look!
Cheers, Hauke
RE: Problems with regexp - Added by Em Smith almost 5 years ago
Initially, I liked your idea. But then I remembered there was a feature request (which I can't find at the moment) to also allow multiple positive/negative searches, something along the lines of "I want movies with the title 'bob' whose description must not contain 'Deutschland'". So we would end up with lots of fields and tickboxes for "must have/must not have". Also, if we added multiple regexp fields, people couldn't easily test them on the EPG grid and would have to create autorecs manually.
One alternative, which I describe here technically for future reference or a feature request, might be to change the internal structure of title/sub-title/description from three separate strings into one contiguously allocated buffer, with each sub-field addressed by an offset into the buffer and individually null-terminated. On each EPG query we would replace the nulls between the fields with spaces, run the regex, then put the nulls back. That means we only change a couple of bytes per query, so it is fast, and a regex search would then match across title, sub-title, and description using the "(*ANYCRLF)" modifier. The only bigger overhead would be when EPG entries change, since the whole title/sub-title/description buffer would need to be regenerated, but EPG events rarely change.
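To illustrate why per-field regexps plus must/must-not options add up quickly, here is a hypothetical rule format in which each condition names a field, a pattern, and whether it must or must not match; a rule fires only if all conditions hold. None of these names exist in tvheadend; the sketch only shows how Hauke's search and the "bob"/"Deutschland" request would be expressed, with Python's re standing in for the regex engine.

```python
import re
from dataclasses import dataclass


@dataclass
class Condition:
    field: str               # "title", "subtitle" or "description"
    pattern: str
    must_match: bool = True  # False means "must not have"


def rule_matches(conditions, event):
    """Logical AND over all conditions; a missing field counts as empty text."""
    for c in conditions:
        found = re.search(c.pattern, event.get(c.field, "")) is not None
        if found != c.must_match:
            return False
    return True


# Title contains Tatort AND description mentions Boerne or Thiel.
TATORT_MUENSTER = [Condition("title", r"Tatort"),
                   Condition("description", r"Boerne|Thiel")]
# Title contains "bob" AND description must not contain Deutschland.
BOB = [Condition("title", r"bob"),
       Condition("description", r"Deutschland", must_match=False)]

print(rule_matches(TATORT_MUENSTER, {"title": "Tatort", "description": "Thiel und Boerne ermitteln."}))
```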
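A rough Python sketch of the proposed buffer layout, with hypothetical names throughout. Python's re does not stop at a NUL byte the way C string handling (and POSIX regexec()) does, so search_each() emulates today's per-field behaviour by extracting each NUL-terminated field, while search_all() performs the proposed steps: swap the separators for spaces, run one search, restore the NULs. The PCRE "(*ANYCRLF)" newline setting is not modelled.

```python
import re


class EpgText:
    """Hypothetical layout from the proposal: title, sub-title and description
    share one buffer, each field NUL-terminated and addressed by its offset."""

    def __init__(self, title, subtitle, description):
        self.buf = bytearray()
        self.offsets = []
        for text in (title, subtitle, description):
            self.offsets.append(len(self.buf))
            self.buf += text.encode("utf-8") + b"\0"

    def field(self, i):
        start = self.offsets[i]
        end = self.buf.index(0, start)  # position of this field's NUL
        return self.buf[start:end].decode("utf-8")

    def search_each(self, pattern):
        """Current behaviour: the pattern is tried on each field separately."""
        rx = re.compile(pattern)
        return any(rx.search(self.field(i)) for i in range(len(self.offsets)))

    def search_all(self, pattern):
        """Proposed behaviour: turn the NULs between fields into spaces, run one
        search over the whole buffer, then put the NULs back."""
        rx = re.compile(pattern.encode("utf-8"))
        separators = [off - 1 for off in self.offsets[1:]]
        for pos in separators:
            self.buf[pos] = ord(" ")
        try:
            # endpos excludes the final NUL that terminates the description.
            return rx.search(self.buf, 0, len(self.buf) - 1) is not None
        finally:
            for pos in separators:
                self.buf[pos] = 0


ev = EpgText("Tatort", "Folge 1", "Thiel und Boerne ermitteln.")
print(ev.search_each(r"Tatort.*Boerne"), ev.search_all(r"Tatort.*Boerne"))
```

Because the swap mutates the shared buffer, a real implementation would have to make sure nothing else reads the fields while a search is in progress.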
RE: Problems with regexp - Added by H. Fux almost 5 years ago
Hi Em,
I just had a look at the wiki page you pointed me to. I cannot find any way to edit it, and no article that describes how to register for this function. Can you help me here?
Btw.: The article on my German IPTV solution is a bit outdated. One of my readers pointed me to a much easier solution; the corresponding tutorial can be found here: https://projects.webvoss.de/2019/03/23/le-potato-media-center-german-iptv-re-revisited/
Thanks!
Hauke
RE: Problems with regexp - Added by Em Smith almost 5 years ago
Sorry you can't edit it. I can't see how to add you to wiki editors.
Mark Clarkstone: Can you add wiki edit privileges for H. Fux?
Thanks.
RE: Problems with regexp - Added by Mark Clarkstone almost 5 years ago
Done as requested. H. Fux, Em is now your wiki mentor. hehe.