Bug #5172

episode and season from EIT scraping is 1 character short

Added by Rob vh over 6 years ago. Updated about 6 years ago.

Status: Fixed
Priority: Normal
Assignee:
Category: EPG - Grabbers
Target version: -
Start date: 2018-08-07
Due date:
% Done: 100%
Estimated time:
Found in version: 4.3-1292~g9b9ee6859~bionic
Affected Versions:

Description

I had EIT scraping from Freesat and Canal Digital working with 4.3.1001~93ff1f4d (2018-1-25).
I installed 4.3-1292~g9b9ee6859~bionic yesterday and found that season and episode are no longer scraped correctly from EIT.

On ITV2, the Ellen DeGeneres show carries the description "A talk-variety show featuring special guests and celebrities and hosted by Ellen DeGeneres. [S] S15 Ep46",
but the episode is calculated as s01.e04.

This is generic: the last character of both the season and the episode is removed for all EIT-based data.

I'm using "Over the air EIT freesat" with scraper configuration UK, and "EIT: DVB grabber" for Canal Digital with a parsing configuration that worked with 4.3.1001.


History

#1

Updated by g siviero about 6 years ago

I am using version 4.3-1288~g66d6161c5 and can confirm that EIT season/episode scraping no longer works correctly.

I also noticed the final character missing from EPG title entries with scraper "OpenTV Sky IT" (Examples: Italian fishing TV, Euronews English HD, Euronews greek, TV5 Monde, teleSUR, Telepace HD, TVR International, ...).

#2

Updated by g siviero about 6 years ago

On a system with version 4.3-1152~g2baa719-dirty season/episode works correctly.

#3

Updated by g siviero about 6 years ago

The epggrab trace suggests that the scraper drops one character (apparently the last) from both the episode and the season number. For example:

Episode and season numbers with only one digit are never matched:

2018-08-09 10:17:19.407 [  TRACE]:epggrab:   pattern "\[?S([0-9]+)\]?" matches '' from 'S2 Ep6 Mara e Carlo - In ogni episodio una coppia in crisi cerca di risolvere i propri problemi sessuali con l'aiuto di una squadra di esperti.'

For episode and season numbers with two digits, only the first digit is kept (S10 -> Season 1 | Ep24 -> Episode 2):

2018-08-09 10:17:19.407 [  TRACE]:epggrab:   pattern "\[?S([0-9]+)\]?" matches '1' from 'S10 Ep24 - Incontriamo coppie desiderose di investire nel mercato immobiliare a trovare la casa giusta da ristrutturare e da affittare. Il makeover non e' mai stato tanto redditizio!'
2018-08-09 10:17:19.407 [  TRACE]:epggrab:   pattern " ?[Ee]p\.? ?([0-9]+)" matches '2' from 'S10 Ep24 - Incontriamo coppie desiderose di investire nel mercato immobiliare a trovare la casa giusta da ristrutturare e da affittare. Il makeover non e' mai stato tanto redditizio!'

At the moment I don't know if the last character of the string passed for match searching is removed or if the repetition character "+" from the regex is ignored (so only the first matching character is considered).
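
One way to separate the two hypotheses is to run the same kind of pattern through the POSIX regex API directly and look at the capture group offsets. The sketch below assumes the pattern is compiled with REG_EXTENDED (an assumption, not taken from the tvheadend source), and capture_group_length is a hypothetical helper name. If it reports a length of 2 for "Ep24", the "+" is being honoured and the character must be lost later, when the match is copied out.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns the length in bytes of capture group 1,
 * or -1 when the pattern fails to compile or does not match.
 * REG_EXTENDED is an assumption about how the patterns are compiled. */
int capture_group_length(const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m[2];   /* m[0] = whole match, m[1] = first capture group */
  int len = -1;

  if (regcomp(&re, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec(&re, text, 2, m, 0) == 0 && m[1].rm_so != -1)
    len = (int)(m[1].rm_eo - m[1].rm_so);
  regfree(&re);
  return len;
}
```

Called with the pattern " ?[Ee]p\.? ?([0-9]+)" and a text starting with 'S10 Ep24', this helper returns 2, so the regex engine itself sees both digits.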

#4

Updated by g siviero about 6 years ago

The code is quite intricate. At first glance, I would suggest checking, for example:

src/epggrab/module/eitpatternlist.c

Revision bff42221 src/epggrab/module/eitpatternlist.c

line 146, before commit:
strncat(buf, matchbuf, size_buf - len - 1);

line 145, after commit:
strlcat(buf, matchbuf, size_buf - len);

and following lines.

#5

Updated by Rob vh about 6 years ago

An epggrab trace illustrates where it goes wrong.

The intention is to grab the season as 12:

2018-08-09 12:21:29.862 [  TRACE]:epggrab:   pattern "\(s ?([0-9]+),? afl ?[0-9]+/[0-9]+\)" matches '1' from '(s 12, afl 11/21) (USA - 2013)'

The intention is to grab the year as 2013:

2018-08-09 12:21:29.862 [  TRACE]:epggrab:   pattern "\([a-zA-Z]* ?- ?([0-9][0-9][0-9][0-9])\)" matches '201' from '(s 12, afl 10/21) (USA - 2013)'

#6

Updated by g siviero about 6 years ago

The problem must be in the function eit_pattern_apply_list in src/epggrab/module/eitpatternlist.c.

#7

Updated by g siviero about 6 years ago

Update:
The problem shows up in eit_pattern_apply_list because it calls regex_match_substring from src/wrappers.c, and that is where the bug is.

Commit bff42221 modified lines 607-608:

before:

memcpy(buf, regex->re_posix_text + regex->re_posix_match[number].rm_so, size);
buf[size] = '\0';

after:

strlcpy(buf, regex->re_posix_text + regex->re_posix_match[number].rm_so, size);

This seems to be the origin of the missing character: if I revert to the previous code, season and episode numbers are correct again, as the trace below shows.

Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: eit:  dtag 4D dlen 195
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 69 74 61 0F 42 61 6E 63 6F 20 64 65 69 20 70 75 ita.Banco dei pu
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 67 6E 69 AF 53 74 2E 31 33 20 45 70 2E 38 20 27 gni.St.13 Ep.8 '
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 47 75 65 72 72 61 27 20 2D 20 41 73 68 6C 65 79 Guerra' - Ashley
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 20 73 69 20 69 6E 74 72 6F 6D 65 74 74 65 20 74  si intromette t
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 72 61 20 64 75 65 20 69 6D 70 69 65 67 61 74 69 ra due impiegati
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 20 73 63 61 74 65 6E 61 6E 64 6F 20 75 6E 61 20  scatenando una
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 6C 6F 74 74 61 20 66 72 61 20 6C 6F 72 6F 20 64 lotta fra loro d
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 75 65 2E 20 49 6E 74 61 6E 74 6F 2C 20 53 65 74 ue. Intanto, Set
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 68 20 74 65 6E 74 61 20 64 69 20 67 75 61 64 61 h tenta di guada
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 67 6E 61 72 65 20 73 75 20 75 6E 20 61 66 66 61 gnare su un affa
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 72 65 20 73 75 20 61 6C 63 75 6E 69 20 6F 67 67 re su alcuni ogg
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 65 74 74 69 20 64 61 20 63 6F 6C 6C 65 7A 69 6F etti da collezio
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit: 6E 65 2E                                        ne.
Aug 13 10:09:20 openHABianPi tvheadend[28945]: epggrab:   pattern "\[?St\.([0-9]+)\]?" matches '13' from 'St.13 Ep.8 'Guerra' - Ashley si intromette tra due impiegati scatenando una lotta fra loro due. Intanto, Seth tenta di guadagnare su un affare su alcuni oggetti da collezione.'
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit:   extract season number 13 using eit
Aug 13 10:09:20 openHABianPi tvheadend[28945]: epggrab:   pattern " ?[Ee]p\.? ?([0-9]+)" matches '8' from 'St.13 Ep.8 'Guerra' - Ashley si intromette tra due impiegati scatenando una lotta fra loro due. Intanto, Seth tenta di guadagnare su un affare su alcuni oggetti da collezione.'
Aug 13 10:09:20 openHABianPi tvheadend[28945]: tbl-eit:   extract episode number 8 using eit
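
The off-by-one in this comment comes from how the two calls interpret the length argument: memcpy copies exactly size bytes and the terminator is written separately, while strlcpy treats its last argument as the full buffer size and reserves one byte of it for the terminator. A minimal sketch of the difference follows. sketch_strlcpy is a hypothetical stand-in written to mirror BSD strlcpy semantics (not every libc provides strlcpy), and copy_match_old / copy_match_new are illustrative names, not functions from src/wrappers.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BSD strlcpy: copies at most dst_size - 1
 * bytes and always NUL-terminates when dst_size > 0.
 * Returns strlen(src), like the BSD function. */
size_t sketch_strlcpy(char *dst, const char *src, size_t dst_size)
{
  size_t src_len = strlen(src);
  if (dst_size > 0) {
    size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
  }
  return src_len;
}

/* Behaviour before the change: copy exactly `size` bytes of the match,
 * then terminate. The caller must provide room for size + 1 bytes. */
void copy_match_old(char *buf, const char *text, size_t so, size_t size)
{
  memcpy(buf, text + so, size);
  buf[size] = '\0';
}

/* Behaviour after the change: `size` is now read as the buffer size,
 * so only size - 1 bytes of the match survive. */
void copy_match_new(char *buf, const char *text, size_t so, size_t size)
{
  sketch_strlcpy(buf, text + so, size);
}
```

With the text 'S10 Ep24', offset 1 and size 2, copy_match_old yields '10' while copy_match_new yields '1', which matches the trace in comment #3. With a one-digit match (size 1), the new variant yields an empty string.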

#8

Updated by g siviero about 6 years ago

From my tests, a fix that keeps strlcpy would be (src/wrappers.c, line 607):

    strlcpy(buf, regex->re_posix_text + regex->re_posix_match[number].rm_so, size+1);

With this change, the trace shows the expected values:

Aug 13 10:23:42 openHABianPi tvheadend[1453]: epggrab:   pattern "\[?St\.([0-9]+)\]?" matches '7' from 'St.7 Ep.13 - Gli agenti sospettano che una donna stia cercando di entrare in Australia illegalmente usando una carta d'identita' falsa. Un passeggero e' un po' troppo vago nelle sue risposte...'
Aug 13 10:23:42 openHABianPi tvheadend[1453]: tbl-eit:   extract season number 7 using eit
Aug 13 10:23:42 openHABianPi tvheadend[1453]: epggrab:   pattern " ?[Ee]p\.? ?([0-9]+)" matches '13' from 'St.7 Ep.13 - Gli agenti sospettano che una donna stia cercando di entrare in Australia illegalmente usando una carta d'identita' falsa. Un passeggero e' un po' troppo vago nelle sue risposte...'
Aug 13 10:23:42 openHABianPi tvheadend[1453]: tbl-eit:   extract episode number 13 using eit

#9

Updated by Jaroslav Kysela about 6 years ago

  • Status changed from New to Fixed
  • % Done changed from 0 to 100

Applied in changeset commit:tvheadend|771080aa77cc9de6dfa259b2d2416895e3c2667b.

#10

Updated by Rob vh about 6 years ago

Ack. Season + Episode numbers scraped from EIT and Freesat are correct again.
Thank you very much!
