Bug #5172
episode and season from EIT scraping is 1 character short
100%
Description
I had EIT scraping from freesat and canal digital working with 4.3.1001~93ff1f4d (2018-1-25).
I installed 4.3-1292~g9b9ee6859~bionic yesterday and find that season and episode no longer scrape from EIT.
On ITV2 I see that the Ellen DeGeneres Show runs with the description "A talk-variety show featuring special guests and celebrities and hosted by Ellen DeGeneres. [S] S15 Ep46"
but the episode is calculated as s01.e04.
This is generic: the last character of both the season and the episode number is dropped for all EIT-based data.
I'm using "Over the air EIT freesat" with scraper configuration UK, and "EIT: DVB grabber" for Canal Digital with a parsing configuration that worked with 4.3.1001.
Files
History
Updated by g siviero over 6 years ago
- File Screenshot from 2018-08-08 17-01-01.png added
- File Screenshot from 2018-08-08 17-01-57.png added
- File Screenshot from 2018-08-08 17-02-33.png added
- File Screenshot from 2018-08-08 17-08-52.png added
I'm using version 4.3-1288~g66d6161c5 and can confirm that EIT season/episode scraping no longer works correctly.
I also noticed the final character missing from EPG title entries with scraper "OpenTV Sky IT" (Examples: Italian fishing TV, Euronews English HD, Euronews greek, TV5 Monde, teleSUR, Telepace HD, TVR International, ...).
Updated by g siviero over 6 years ago
On a system with version 4.3-1152~g2baa719-dirty season/episode works correctly.
Updated by g siviero over 6 years ago
The epggrab trace suggests that the scraper drops one character (the last?) from both the episode and the series number. For example:
Episode and series numbers with only one digit are never matched:
2018-08-09 10:17:19.407 [ TRACE]:epggrab: pattern "\[?S([0-9]+)\]?" matches '' from 'S2 Ep6 Mara e Carlo - In ogni episodio una coppia in crisi cerca di risolvere i propri problemi sessuali con l'aiuto di una squadra di esperti.'
For episode and series numbers with two digits, only the first digit is kept (S10 -> Season 1 | Ep24 -> Episode 2):
2018-08-09 10:17:19.407 [ TRACE]:epggrab: pattern "\[?S([0-9]+)\]?" matches '1' from 'S10 Ep24 - Incontriamo coppie desiderose di investire nel mercato immobiliare a trovare la casa giusta da ristrutturare e da affittare. Il makeover non e' mai stato tanto redditizio!'
2018-08-09 10:17:19.407 [ TRACE]:epggrab: pattern " ?[Ee]p\.? ?([0-9]+)" matches '2' from 'S10 Ep24 - Incontriamo coppie desiderose di investire nel mercato immobiliare a trovare la casa giusta da ristrutturare e da affittare. Il makeover non e' mai stato tanto redditizio!'
At the moment I don't know if the last character of the string passed for match searching is removed or if the repetition character "+" from the regex is ignored (so only the first matching character is considered).
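One way to tell the two hypotheses apart is to run the same pattern through the POSIX regex API directly and look at the capture offsets. The sketch below is only an illustration: `first_group_span` is a hypothetical helper, not a tvheadend function, and it assumes the pattern is compiled as a POSIX extended regex. If the capture for `Ep24` spans two characters here, the `+` is being honoured and the character must be lost later, when the match is copied out.

```c
#include <assert.h>
#include <regex.h>

/*
 * Hypothetical helper (not part of tvheadend): compile `pattern` as a POSIX
 * extended regex, match it against `text`, and report the start and end
 * offsets of the first capture group.
 * Returns 0 on success, -1 if the pattern does not compile or does not match.
 */
static int first_group_span(const char *pattern, const char *text,
                            int *so, int *eo)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = first capture group */
    int rc = -1;

    assert(pattern && text && so && eo);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, text, 2, m, 0) == 0 && m[1].rm_so != -1) {
        *so = (int)m[1].rm_so;
        *eo = (int)m[1].rm_eo;   /* one past the last matched character */
        rc = 0;
    }

    regfree(&re);
    return rc;
}
```

Called with the episode pattern from the trace, `" ?[Ee]p\\.? ?([0-9]+)"`, and a text starting with `S10 Ep24 - `, the helper reports a capture from offset 6 to offset 8, i.e. both digits of `24`. The regex engine itself is therefore not dropping the character.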
Updated by g siviero over 6 years ago
The code is quite intricate. At first glance I would suggest checking, for example:
src/epggrab/module/eitpatternlist.c
Revision bff42221 src/epggrab/module/eitpatternlist.c
line 146, before commit:
strncat(buf, matchbuf, size_buf - len - 1);
line 145, after commit:
strlcat(buf, matchbuf, size_buf - len);
and following lines.
Updated by Rob vh over 6 years ago
A trace epggrab illustrates where it goes wrong:
intention to grab the season as 12
2018-08-09 12:21:29.862 [ TRACE]:epggrab: pattern "\(s ?([0-9]+),? afl ?[0-9]+/[0-9]+\)" matches '1' from '(s 12, afl 11/21) (USA - 2013)'
intention to grab the year as 2013
2018-08-09 12:21:29.862 [ TRACE]:epggrab: pattern "\([a-zA-Z]* ?- ?([0-9][0-9][0-9][0-9])\)" matches '201' from '(s 12, afl 10/21) (USA - 2013)'
Updated by g siviero over 6 years ago
The problem must be in the function eit_pattern_apply_list in src/epggrab/module/eitpatternlist.c.
Updated by g siviero over 6 years ago
Update:
The problem shows up in eit_pattern_apply_list because it calls regex_match_substring from src/wrappers.c, and that is where the bug is.
Commit bff42221 modified lines 607-608:
before:
memcpy(buf, regex->re_posix_text + regex->re_posix_match[number].rm_so, size); buf[size] = '\0';
after:
strlcpy(buf, regex->re_posix_text + regex->re_posix_match[number].rm_so, size);
This seems to be the origin of the missing character: strlcpy treats its last argument as the size of the destination buffer, so it copies at most size - 1 characters and then adds the terminator. With the previous code restored, season and episode numbers are correct again:
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: eit: dtag 4D dlen 195
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 69 74 61 0F 42 61 6E 63 6F 20 64 65 69 20 70 75 ita.Banco dei pu
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 67 6E 69 AF 53 74 2E 31 33 20 45 70 2E 38 20 27 gni.St.13 Ep.8 '
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 47 75 65 72 72 61 27 20 2D 20 41 73 68 6C 65 79 Guerra' - Ashley
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 20 73 69 20 69 6E 74 72 6F 6D 65 74 74 65 20 74 si intromette t
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 72 61 20 64 75 65 20 69 6D 70 69 65 67 61 74 69 ra due impiegati
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 20 73 63 61 74 65 6E 61 6E 64 6F 20 75 6E 61 20 scatenando una
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 6C 6F 74 74 61 20 66 72 61 20 6C 6F 72 6F 20 64 lotta fra loro d
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 75 65 2E 20 49 6E 74 61 6E 74 6F 2C 20 53 65 74 ue. Intanto, Set
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 68 20 74 65 6E 74 61 20 64 69 20 67 75 61 64 61 h tenta di guada
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 67 6E 61 72 65 20 73 75 20 75 6E 20 61 66 66 61 gnare su un affa
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 72 65 20 73 75 20 61 6C 63 75 6E 69 20 6F 67 67 re su alcuni ogg
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 65 74 74 69 20 64 61 20 63 6F 6C 6C 65 7A 69 6F etti da collezio
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 6E 65 2E ne.
Aug 13 10:09:20 openHABianPi tvheadend[28945]: epggrab: pattern "\[?St\.([0-9]+)\]?" matches '13' from 'St.13 Ep.8 'Guerra' - Ashley si intromette tra due impiegati scatenando una lotta fra loro due. Intanto, Seth tenta di guadagnare su un affare su alcuni oggetti da collezione.'
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: extract season number 13 using eit
Aug 13 10:09:20 openHABianPi tvheadend[28945]: epggrab: pattern " ?[Ee]p\.? ?([0-9]+)" matches '8' from 'St.13 Ep.8 'Guerra' - Ashley si intromette tra due impiegati scatenando una lotta fra loro due. Intanto, Seth tenta di guadagnare su un affare su alcuni oggetti da collezione.'
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: extract episode number 8 using eit
Updated by g siviero over 6 years ago
From my tests, the call can keep using strlcpy if the size argument is increased by one to leave room for the terminator (src/wrappers.c, line 607):
strlcpy(buf, regex->re_posix_text + regex->re_posix_match[number].rm_so, size+1);
Aug 13 10:23:42 openHABianPi tvheadend[1453]: epggrab: pattern "\[?St\.([0-9]+)\]?" matches '7' from 'St.7 Ep.13 - Gli agenti sospettano che una donna stia cercando di entrare in Australia illegalmente usando una carta d'identita' falsa. Un passeggero e' un po' troppo vago nelle sue risposte...'
Aug 13 10:23:42 openHABianPi tvheadend[1453]: tbl-eit: extract season number 7 using eit
Aug 13 10:23:42 openHABianPi tvheadend[1453]: epggrab: pattern " ?[Ee]p\.? ?([0-9]+)" matches '13' from 'St.7 Ep.13 - Gli agenti sospettano che una donna stia cercando di entrare in Australia illegalmente usando una carta d'identita' falsa. Un passeggero e' un po' troppo vago nelle sue risposte...'
Aug 13 10:23:42 openHABianPi tvheadend[1453]: tbl-eit: extract episode number 13 using eit
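To make the off-by-one easier to see in isolation, here is a minimal sketch of the three variants discussed in this thread. It is not the tvheadend code: `sketch_strlcpy` is a stand-in written here because strlcpy is not available in every libc, the `extract_*` names are hypothetical, and `so`/`size` play the role of `rm_so` and the match length. It also assumes the caller's buffer holds at least `size + 1` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in with strlcpy() semantics: dstsize is the size of the destination
 * buffer, so at most dstsize - 1 characters are copied and the result is
 * always NUL-terminated when dstsize > 0. Returns strlen(src).
 */
static size_t sketch_strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);

    if (dstsize > 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Variant before the commit: copy exactly `size` characters, then terminate. */
static void extract_memcpy(char *buf, const char *text, size_t so, size_t size)
{
    assert(buf && text);
    memcpy(buf, text + so, size);
    buf[size] = '\0';
}

/* Variant after the commit: the match length is passed as the buffer size,
 * so only size - 1 characters survive. */
static void extract_strlcpy_short(char *buf, const char *text, size_t so, size_t size)
{
    assert(buf && text);
    sketch_strlcpy(buf, text + so, size);
}

/* Proposed correction: size + 1 leaves room for the terminator. */
static void extract_strlcpy_fixed(char *buf, const char *text, size_t so, size_t size)
{
    assert(buf && text);
    sketch_strlcpy(buf, text + so, size + 1);
}
```

For the text `S10 Ep24` with `so = 1` and `size = 2`, the memcpy and the `size + 1` variants both yield `10`, while the short variant yields `1`. With a one-digit match such as `S2` (`so = 1`, `size = 1`) the short variant yields an empty string, which matches the `matches ''` line in the earlier trace.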
Updated by Jaroslav Kysela about 6 years ago
- Status changed from New to Fixed
- % Done changed from 0 to 100
Applied in changeset commit:tvheadend|771080aa77cc9de6dfa259b2d2416895e3c2667b.
Updated by Rob vh about 6 years ago
Ack. Season + Episode numbers scraped from EIT and Freesat are correct again.
Thank you very much!