Feature #2584

Feature #4287: Feature: allow parsing season/episode from Title/Subtitle/Description

scraping season/episode info from the event description in EPG data, for EIT streams

Added by Rob vh almost 10 years ago. Updated about 7 years ago.

Status:
Fixed
Priority:
Normal
Assignee:
Category:
EPG - Grabbers
Target version:
Start date:
2015-01-01
Due date:
% Done:

0%

Estimated time:

Description

Feature 2270 implemented user-specified parsing of the event description field in OpenTV EPG data, searching for unique strings that indicate the location of season and episode numbers.
It would be great if a similar parsing mechanism were available for EIT-based titles. For example, Canal Digitaal stores a string such as
(s 3/afl 4)
to indicate season 3, episode 4.
If we could create a file with patterns in epggrab/eit/dict or similar, and the EIT parser could use it, that would be very helpful.
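
To make the request concrete, here is a minimal sketch in Python of the kind of pattern matching meant above. It is not tvheadend code: the `PATTERNS` table and the `parse_season_episode` function are hypothetical names, and the single regular expression is written only from the Canal Digitaal sample string.

```python
import re

# Hypothetical pattern table. In the request, patterns like this would be
# read from a user-editable file (e.g. somewhere under epggrab/eit/dict).
# Named groups mark where the season and episode numbers sit.
PATTERNS = [
    # Matches the Canal Digitaal form "(s 3/afl 4)"
    re.compile(
        r"\(s\s*(?P<season>\d+)\s*/\s*afl\s*(?P<episode>\d+)\)",
        re.IGNORECASE,
    ),
]


def parse_season_episode(description):
    """Return (season, episode) from the first matching pattern, else None."""
    for pattern in PATTERNS:
        match = pattern.search(description)
        if match:
            return int(match.group("season")), int(match.group("episode"))
    return None
```

Calling `parse_season_episode("Een aflevering. (s 3/afl 4)")` returns `(3, 4)`, and a description without such a marker returns `None`. Keeping the patterns in a list means other broadcasters' conventions could be added without touching the matching loop.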

XMLTV does not help us much: the networks seem to strip the season/episode info before sending it to tvgids.nl, tvgids.tv and others.
See also Feature 508 and 766 for a sample from Poland.


Files

episodes.pl (1.34 KB) Rob vh, 2015-01-04 14:48
epg.png (98.5 KB) Sample EIT EPG, Damian Gołda, 2015-01-12 21:21

History

#1

Updated by Rob vh almost 10 years ago

From the JSON data in dvr/log I was able to parse the (Dutch) description field and rename the recording files so that my NFS-based clients can at least see the SnEnn info. Obviously this doesn't update the info that HTS clients will see, but then, I don't have those :P. The attached Perl script now runs every morning to rename the previous day's catch.

#2

Updated by Damian Gołda almost 10 years ago

If I understand correctly, https://github.com/tvheadend/tvheadend/blob/master/src/epggrab/module/opentv.c#L391-L416 does more or less the same for OpenTV as what I proposed 3 years ago for EIT in https://tvheadend.org/issues/766

The difference is that opentv.c performs matching on the summary, and I need it on the title.

#4

Updated by Rob vh almost 10 years ago

Damian, are the season and episode numbers in the title? Really?
For me they are in the description field (see the Perl program I attached).

#5

Updated by Damian Gołda almost 10 years ago

See attached screenshot with current EPG (from EIT):

Sample EIT EPG

You can see for example:
Title: "CSI: Kryminalne zagadki Las Vegas - s. V, odc. 7"
Summary: "W pokoju hotelowym znaleziono zwłoki dziewczyny. Została uduszona. Okazuje się, że była na imprezie, z której uprowadzono również jej koleżankę. Ojciec porwanej nie jest przejęty tym faktem."

"s. V, odc. 7" uses the abbreviations "s" for "sezon/seria" (season) and "odc" for "odcinek" (episode), and means "season 5, episode 7".

Other examples:
  • "Fala zbrodni - s. III, odc. 31" - season 3, episode 31
  • "M jak miłość - odc. 1109" - episode 1109 (season unknown)
  • "Obsesje - odc. 5/6" - episode 5 of 6 total
  • "Graceland - s. I, odc. 7/13" - season 1, episode 7 of 13 total
  • "Autostrada do nieba - odc. 3, Dotknąć księżyca" - episode 3, episode title: "Dotknąć księżyca"
Another example:
  • "Libera - Przewodnik po sztuce - (s. III, odc. 2) - ERNA ROSENSTEIN" - numbers in parentheses and episode title: "ERNA ROSENSTEIN"
Less common examples:
  • episode between slashes: "Muzeum Polskiej Piosenki - /14/ - "Dziwny jest ten świat" - Czesław Niemen"
  • episode number without the "odc." abbreviation (odc. for odcinek/episode): "Polonia w Komie - (647) Emigracze 2"

All of them have the episode and season numbers in the title. The description/summary has no information about episodes.

This is common for Polish DVB-T EIT EPG.

Using Roman numerals (I, II, III, IV, V ...) for the season number is also common.
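
As an illustration of the title formats listed above, here is a rough Python sketch. It is not tvheadend's implementation, and `TITLE_RE`, `roman_to_int` and `parse_title` are hypothetical names. It covers only the "s. <Roman>, odc. <n>[/<total>]" family; the less common "/14/" and "(647)" forms would need patterns of their own.

```python
import re

# Values of the Roman digits that can plausibly occur in a season number.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}


def roman_to_int(text):
    """Convert a Roman numeral such as 'IV' or 'XII' to an integer."""
    values = [ROMAN_VALUES[ch] for ch in text.upper()]
    total = 0
    for i, value in enumerate(values):
        # A smaller digit before a larger one is subtracted (IV = 4).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


# Optional "s. III, " prefix, mandatory "odc. 7", optional "/13" total.
TITLE_RE = re.compile(
    r"(?:\bs\.\s*(?P<season>[IVXLC]+),\s*)?"
    r"\bodc\.\s*(?P<episode>\d+)"
    r"(?:/(?P<total>\d+))?"
)


def parse_title(title):
    """Return a dict with season, episode and total (None when absent)."""
    match = TITLE_RE.search(title)
    if match is None:
        return None
    season = match.group("season")
    total = match.group("total")
    return {
        "season": roman_to_int(season) if season else None,
        "episode": int(match.group("episode")),
        "total": int(total) if total else None,
    }
```

For "Graceland - s. I, odc. 7/13" this yields season 1, episode 7, total 13, while "M jak miłość - odc. 1109" yields episode 1109 with season and total left as `None`.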

#6

Updated by Rob vh almost 10 years ago

Wow. I understand why the new directory option in the autorec and timed recordings list is so useful in your country.

#7

Updated by Jaroslav Kysela about 9 years ago

  • Target version set to 4.4

#8

Updated by Jaroslav Kysela about 7 years ago

  • Status changed from New to Fixed
  • Parent task set to #4287

The latest 4.3 branch has this feature: see #4287, #4578, #4592 and probably others.
