XMLTV, Kazer & French categories
Added by Stephane Chauveau almost 11 years ago
SEE THE REPLY POSTS BELOW FOR AN UPDATED SCRIPT THAT CAN PROCESS ANY INPUT LANGUAGE.
The following information is mostly intended for French users of www.kazer.org, but the scripts below can probably be adapted to other TV services. I am on Ubuntu/Linux using MythTV as frontend.
I assume in the following that the user has a Kazer account and that the tv_grab_fr_kazer command (from package xmltv-utils) is already configured. If so, running the following command should give you a nice XML file.
tv_grab_fr_kazer > tv.xml
Some XBMC themes such as Confluence can colorize the TV programs according to their categories, but unfortunately that does not work well with Kazer because the categories are given in French instead of using the names defined in ETSI standard EN 300 468.
Ideally, it should be possible to configure Tvheadend to accept other strings, but this is not yet implemented (see the array _epg_genre_names in epg.c), so I made a quick and dirty Perl script to translate the categories.
The first step is to create an executable script /usr/local/bin/tv_grab_fr_kazer_2 containing:
#!/bin/bash
if [ "$1" == "--description" ] ; then
    echo "France (Kazer2)"
elif [ "$#" == 0 ] ; then
    /usr/bin/tv_grab_fr_kazer | /usr/local/bin/category-filter.pl
else
    /usr/bin/tv_grab_fr_kazer "$@"
fi
The conditions for that script to be recognized as a grabber by XMLTV are:
- it must be executable and located in one of the $PATH directories used when running tvheadend
- its name must start with tv_grab_
XMLTV and Tvheadend should now be aware of a new grabber named "France (Kazer2)", which can be checked from the command line by running tv_find_grabbers:
$ tv_find_grabbers
/usr/local/bin/tv_grab_fr_kazer_2|France (Kazer2)
/usr/bin/tv_grab_ch_search|Switzerland (tv.search.ch)
/usr/bin/tv_grab_es_laguiatv|Spain (laguiatv.com)
/usr/bin/tv_grab_huro|Hungary/Romania
...
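The two conditions and the --description branch can also be checked without XMLTV installed. The following is only a minimal sketch in a throwaway directory: tv_grab_demo is a hypothetical stand-in grabber invented for the illustration, not a real XMLTV command.

```shell
#!/bin/sh
# Throwaway directory that plays the role of /usr/local/bin.
dir=$(mktemp -d)

# Hypothetical stand-in grabber following the same contract as the wrapper above.
cat > "$dir/tv_grab_demo" <<'EOF'
#!/bin/sh
if [ "$1" = "--description" ]; then
    echo "Demo (stand-in grabber)"
else
    echo "<tv></tv>"
fi
EOF
chmod +x "$dir/tv_grab_demo"

# Condition 1: the script is executable and reachable through $PATH.
PATH="$dir:$PATH"
found=$(command -v tv_grab_demo)

# Condition 2: the file name starts with tv_grab_.
case "$(basename "$found")" in
    tv_grab_*) name_ok=yes ;;
    *)         name_ok=no ;;
esac

# The description is the text that tv_find_grabbers prints after the "|".
desc=$(tv_grab_demo --description)
echo "$found|$desc (name ok: $name_ok)"

rm -rf "$dir"
```

For the real wrapper, the equivalent manual check is simply running tv_grab_fr_kazer_2 --description in the same environment as Tvheadend.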
The file /usr/local/bin/category-filter.pl is given below. It is a perl script that reads an xml file from standard input, translates the categories and emits the result to standard output.
#!/usr/bin/perl -w
#
# The categories recognized by tvheadend (see epg.c)
#
my $MOVIE             = "Movie / Drama";
my $THRILLER          = "Detective / Thriller";
my $ADVENTURE         = "Adventure / Western / War";
my $SF                = "Science fiction / Fantasy / Horror";
my $COMEDY            = "Comedy";
my $SOAP              = "Soap / Melodrama / Folkloric";
my $ROMANCE           = "Romance";
my $HISTORICAL        = "Serious / Classical / Religious / Historical movie / Drama";
my $XXX               = "Adult movie / Drama";

my $NEWS              = "News / Current affairs";
my $WEATHER           = "News / Weather report";
my $NEWS_MAGAZINE     = "News magazine";
my $DOCUMENTARY       = "Documentary";
my $DEBATE            = "Discussion / Interview / Debate";
my $INTERVIEW         = $DEBATE ;

my $SHOW              = "Show / Game show";
my $GAME              = "Game show / Quiz / Contest";
my $VARIETY           = "Variety show";
my $TALKSHOW          = "Talk show";

my $SPORT             = "Sports";
my $SPORT_SPECIAL     = "Special events (Olympic Games; World Cup; etc.)";
my $SPORT_MAGAZINE    = "Sports magazines";
my $FOOTBALL          = "Football / Soccer";
my $TENNIS            = "Tennis / Squash";
my $SPORT_TEAM        = "Team sports (excluding football)";
my $ATHLETICS         = "Athletics";
my $SPORT_MOTOR       = "Motor sport";
my $SPORT_WATER       = "Water sport";

my $KIDS              = "Children's / Youth programmes";
my $KIDS_0_5          = "Pre-school children's programmes";
my $KIDS_6_14         = "Entertainment programmes for 6 to 14";
my $KIDS_10_16        = "Entertainment programmes for 10 to 16";
my $EDUCATIONAL       = "Informational / Educational / School programmes";
my $CARTOON           = "Cartoons / Puppets";

my $MUSIC             = "Music / Ballet / Dance";
my $ROCK_POP          = "Rock / Pop";
my $CLASSICAL         = "Serious music / Classical music";
my $FOLK              = "Folk / Traditional music";
my $JAZZ              = "Jazz";
my $OPERA             = "Musical / Opera";

my $CULTURE           = "Arts / Culture (without music)";
my $PERFORMING        = "Performing arts";
my $FINE_ARTS         = "Fine arts";
my $RELIGION          = "Religion";
my $POPULAR_ART       = "Popular culture / Traditional arts";
my $LITERATURE        = "Literature";
my $FILM              = "Film / Cinema";
my $EXPERIMENTAL_FILM = "Experimental film / Video";
my $BROADCASTING      = "Broadcasting / Press";

my $SOCIAL            = "Social / Political issues / Economics";
my $MAGAZINE          = "Magazines / Reports / Documentary";
my $ECONOMIC          = "Economics / Social advisory";
my $VIP               = "Remarkable people";

my $SCIENCE           = "Education / Science / Factual topics";
my $NATURE            = "Nature / Animals / Environment";
my $TECHNOLOGY        = "Technology / Natural sciences";
my $DIOLOGY           = $TECHNOLOGY;
my $MEDECINE          = "Medicine / Physiology / Psychology";
my $FOREIGN           = "Foreign countries / Expeditions";
my $SPIRITUAL         = "Social / Spiritual sciences";
my $FURTHER_EDUCATION = "Further education";
my $LANGUAGES         = "Languages";

my $HOBBIES           = "Leisure hobbies";
my $TRAVEL            = "Tourism / Travel";
my $HANDICRAF         = "Handicraft";
my $MOTORING          = "Motoring";
my $FITNESS           = "Fitness and health";
my $COOKING           = "Cooking";
my $SHOPPING          = "Advertisement / Shopping";
my $GARDENING         = "Gardening";

#
# This is the translation table from the French (Kazer) categories
# to the categories above. Use 0 for categories you do not care about.
#
my %REPLACE=(
    "Météo"              => $WEATHER ,
    "Film"               => $MOVIE ,
    "Théâtre"            => $PERFORMING,
    "Ballet"             => $OPERA ,
    "Clips"              => $MUSIC ,
    "Concert"            => $MUSIC ,
    "Court métrage"      => $EXPERIMENTAL_FILM,
    "Débat"              => $SOCIAL ,
    "Dessin animé"       => $CARTOON ,
    "Divertissement"     => $VARIETY ,
    "Documentaire"       => $DOCUMENTARY ,
    "Drame"              => $SOAP ,
    "Émission"           => 0,
    "Feuilleton"         => $SOAP ,
    "Fin"                => 0,
    "Fin des programmes" => 0 ,
    "Interview"          => $INTERVIEW ,
    "Jeu"                => $GAME ,
    "Jeunesse"           => $KIDS ,
    "Journal"            => $NEWS ,
    "Loterie"            => 0 ,
    "Magazine"           => $MAGAZINE ,
    "Opéra"              => $OPERA ,
    "Série"              => $MOVIE ,
    "Spectacle"          => $PERFORMING ,
    "Sport"              => $SPORT ,
    "Talk show"          => $TALKSHOW ,
#   "Téléfilm"           => $MOVIE ,
    "Télé-réalité"       => $VARIETY ,
    "Téléréalité"        => $VARIETY ,
    "Tiercé"             => $SPORT ,
    "Variétés"           => $VARIETY ,
) ;

# The opening and closing tags that surround a category name.
my $PRE  = '<category lang=\"fr\">' ;
my $POST = '</category>' ;

# Return the translation of a category, or the category itself
# (with a warning on standard error) when it is not in %REPLACE.
sub myfilter {
    my ($a) = @_;
    if ( exists $REPLACE{$a} ) {
        return $REPLACE{$a} ;
    } else {
        print STDERR "Warning: Unmanaged category: '$a'\n" ;
        return $a ;
    }
}

# Filter standard input line by line.
while (<>) {
    my $line = $_ ;
    $line =~ s/($PRE)(.*)($POST)/"$1".myfilter("$2")."$3"/ge ;
    print $line;
}
Assuming that you have generated a Kazer XML file as indicated above, you can try the script manually as follows:
/usr/local/bin/category-filter.pl < tv.xml > new.xml
The resulting file new.xml should contain categories following the ETSI standard EN 300 468.
Categories that were not recognized, if any, are printed on standard error.
The variables such as $MOVIE and $THRILLER are the EN 300 468 categories. They should not be modified.
The hash %REPLACE can be modified. It provides the translations from the French categories to the EN 300 468 categories. Use 0 for categories that you do not care about. Be aware that Tvheadend (or is it XBMC?) does not manage sub-categories well. In practice, that means that all categories from the same group will have the same color in XBMC.
The variables $PRE and $POST specify the regular expression used to perform the replacement. They may have to be modified if you want to adapt the script to a service other than Kazer.
For information, the categories in Kazer XML files look like this:
<category lang="fr">Magazine</category>
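Before adapting $PRE to another service, it helps to know which opening category tags a guide file really contains. The following is a minimal sketch, assuming one category element per line as in the Kazer output; sample.xml and its contents are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical sample data; a real file would come from the grabber.
cat > sample.xml <<'EOF'
<category lang="fr">Magazine</category>
<category lang="en">Movie / Drama</category>
<category lang="fr">Film</category>
<category>Sports</category>
EOF

# Print each distinct opening <category ...> tag with its number of occurrences.
list_category_tags() {
    sed -n 's/.*\(<category[^>]*>\).*/\1/p' "$1" | sort | uniq -c
}

tags=$(list_category_tags sample.xml)
echo "$tags"
```

Each distinct tag printed here is a candidate value for $PRE; a file that only ever shows one tag needs no further change to the regular expression.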
Using regular expressions to perform the replacements is ugly but simple. In the future, I may write a longer version using a proper XML parser and advanced features such as selecting the category according to multiple criteria (title, duration, channel, ...).
Replies (85)
RE: XMLTV, Kazer & French categories - Added by c0m m0n over 10 years ago
Great, gonna test that right now !
RE: XMLTV, Kazer & French categories - Added by c0m m0n over 10 years ago
Updated the script
#!/usr/bin/perl -w
#
# The categories recognized by tvheadend (see epg.c)
#
my $MOVIE             = "Movie / Drama";
my $THRILLER          = "Detective / Thriller";
my $ADVENTURE         = "Adventure / Western / War";
my $SF                = "Science fiction / Fantasy / Horror";
my $COMEDY            = "Comedy";
my $SOAP              = "Soap / Melodrama / Folkloric";
my $ROMANCE           = "Romance";
my $HISTORICAL        = "Serious / Classical / Religious / Historical movie / Drama";
my $XXX               = "Adult movie / Drama";

my $NEWS              = "News / Current affairs";
my $WEATHER           = "News / Weather report";
my $NEWS_MAGAZINE     = "News magazine";
my $DOCUMENTARY       = "Documentary";
my $DEBATE            = "Discussion / Interview / Debate";
my $INTERVIEW         = $DEBATE ;

my $SHOW              = "Show / Game show";
my $GAME              = "Game show / Quiz / Contest";
my $VARIETY           = "Variety show";
my $TALKSHOW          = "Talk show";

my $SPORT             = "Sports";
my $SPORT_SPECIAL     = "Special events (Olympic Games; World Cup; etc.)";
my $SPORT_MAGAZINE    = "Sports magazines";
my $FOOTBALL          = "Football / Soccer";
my $TENNIS            = "Tennis / Squash";
my $SPORT_TEAM        = "Team sports (excluding football)";
my $ATHLETICS         = "Athletics";
my $SPORT_MOTOR       = "Motor sport";
my $SPORT_WATER       = "Water sport";

my $KIDS              = "Children's / Youth programmes";
my $KIDS_0_5          = "Pre-school children's programmes";
my $KIDS_6_14         = "Entertainment programmes for 6 to 14";
my $KIDS_10_16        = "Entertainment programmes for 10 to 16";
my $EDUCATIONAL       = "Informational / Educational / School programmes";
my $CARTOON           = "Cartoons / Puppets";

my $MUSIC             = "Music / Ballet / Dance";
my $ROCK_POP          = "Rock / Pop";
my $CLASSICAL         = "Serious music / Classical music";
my $FOLK              = "Folk / Traditional music";
my $JAZZ              = "Jazz";
my $OPERA             = "Musical / Opera";

my $CULTURE           = "Arts / Culture (without music)";
my $PERFORMING        = "Performing arts";
my $FINE_ARTS         = "Fine arts";
my $RELIGION          = "Religion";
my $POPULAR_ART       = "Popular culture / Traditional arts";
my $LITERATURE        = "Literature";
my $FILM              = "Film / Cinema";
my $EXPERIMENTAL_FILM = "Experimental film / Video";
my $BROADCASTING      = "Broadcasting / Press";

my $SOCIAL            = "Social / Political issues / Economics";
my $MAGAZINE          = "Magazines / Reports / Documentary";
my $ECONOMIC          = "Economics / Social advisory";
my $VIP               = "Remarkable people";

my $SCIENCE           = "Education / Science / Factual topics";
my $NATURE            = "Nature / Animals / Environment";
my $TECHNOLOGY        = "Technology / Natural sciences";
my $DIOLOGY           = $TECHNOLOGY;
my $MEDECINE          = "Medicine / Physiology / Psychology";
my $FOREIGN           = "Foreign countries / Expeditions";
my $SPIRITUAL         = "Social / Spiritual sciences";
my $FURTHER_EDUCATION = "Further education";
my $LANGUAGES         = "Languages";

my $HOBBIES           = "Leisure hobbies";
my $TRAVEL            = "Tourism / Travel";
my $HANDICRAF         = "Handicraft";
my $MOTORING          = "Motoring";
my $FITNESS           = "Fitness and health";
my $COOKING           = "Cooking";
my $SHOPPING          = "Advertisement / Shopping";
my $GARDENING         = "Gardening";

#
# This is the translation table from the French (Kazer) categories
# to the categories above. Use 0 for categories you do not care about.
#
my %REPLACE=(
    "Météo"              => $WEATHER ,
    "Film"               => $MOVIE ,
    "Théâtre"            => $PERFORMING,
    "Ballet"             => $OPERA ,
    "Clips"              => $MUSIC ,
    "Concert"            => $MUSIC ,
    "Court métrage"      => $EXPERIMENTAL_FILM,
    "Débat"              => $SOCIAL ,
    "Dessin animé"       => $CARTOON ,
    "Divertissement"     => $VARIETY ,
    "Documentaire"       => $DOCUMENTARY ,
    "Drame"              => $SOAP ,
    "Émission"           => 0,
    "Feuilleton"         => $SOAP ,
    "Fin"                => 0,
    "Fin des programmes" => 0 ,
    "Interview"          => $INTERVIEW ,
    "Jeu"                => $GAME ,
    "Jeunesse"           => $KIDS ,
    "Journal"            => $NEWS ,
    "Loterie"            => 0 ,
    "Magazine"           => $MAGAZINE ,
    "Opéra"              => $OPERA ,
    "Série"              => $MOVIE ,
    "Spectacle"          => $PERFORMING ,
    "Sport"              => $SPORT ,
    "Talk show"          => $TALKSHOW ,
    "Téléfilm"           => $MOVIE ,
    "Télé-réalité"       => $VARIETY ,
    "Téléréalité"        => $VARIETY ,
    "Tiercé"             => $SPORT ,
    "Divers"             => $VARIETY ,
    "Variétés"           => $VARIETY ,
    "Emission politique" => $SOCIAL,
    "Politique"          => $SOCIAL ,
    "Religion"           => $HISTORICAL ,
    "Musique"            => $MUSIC ,
    "Fitness"            => $FITNESS ,
    "Sports"             => $SPORT ,
    "Clip"               => $MUSIC ,
    "Anime"              => $CARTOON ,
    "Humour"             => $COMEDY ,
) ;

# The opening and closing tags that surround a category name.
my $PRE  = '<category lang=\"fr\">' ;
my $POST = '</category>' ;

# Return the translation of a category, or the category itself
# (with a warning on standard error) when it is not in %REPLACE.
sub myfilter {
    my ($a) = @_;
    if ( exists $REPLACE{$a} ) {
        return $REPLACE{$a} ;
    } else {
        print STDERR "Warning: Unmanaged category: '$a'\n" ;
        return $a ;
    }
}

# Filter standard input line by line.
while (<>) {
    my $line = $_ ;
    $line =~ s/($PRE)(.*)($POST)/"$1".myfilter("$2")."$3"/ge ;
    print $line;
}
RE: XMLTV, Kazer & French categories - Added by c0m m0n over 10 years ago
Confirmed working for HTS Tvheadend 3.9.788~g385c190~trusty
Thanks !
RE: XMLTV, Kazer & French categories - Added by Nicolas Rioja over 9 years ago
Hi,
I'm running Tvheadend 3.4.27~gfbda802~precise on Ubuntu 12.04.
I was searching for the file epg.c but I didn't find it.
So I have the following question:
Can I run your script with my XML file generated with WebGrabPlus instead of XMLTV?
Thanks
RE: XMLTV, Kazer & French categories - Added by Nicolas Rioja over 9 years ago
Nicolas Rioja wrote:
Can I run your script with my XML file generated with WebGrabPlus instead of XMLTV?
OK, I'll answer myself... yes, it's possible.
RE: XMLTV, Kazer & French categories - Added by Nicolas Rioja over 9 years ago
Stephane Chauveau wrote:
Categories that were not recognized, if any, are printed on standard error.
Hi Stephane,
Could you tell me how to modify your script so that the errors are written to a file instead of being printed on the screen?
My intention is to check that file after several days, collect the new categories reported there, and add them to the script with the correct classification.
I've been trying several solutions from this page to do it by myself:
http://stackoverflow.com/questions/10682087/how-to-redirect-console-output-to-a-text-file
But since my knowledge of Perl is zero, I had no success.
Stephane Chauveau wrote:
Using regular expressions to perform the replacements is ugly but simple. In the future, I may write a longer version using a proper XML parser and advanced features such as selecting the category according to multiple criteria (title, duration, channel, ...).
By the way... have you made any progress with the longer version of your script?
Well, that's all... congratulations on your script.
Thanks in advance for your help.
Regards
RE: XMLTV, Kazer & French categories - Added by David jrm about 9 years ago
c0m m0n wrote:
Confirmed working for HTS Tvheadend 3.9.788~g385c190~trusty
Thanks !
Hi,
I would really appreciate it if someone could help me identify why several categories are reported as not recognized even though they are included in the .pl file.
Please find 2 files attached, categorias.pl and warning errors.txt; the latter lists several categories that are included in the .pl script.
Is there anything wrong in the script?
Thanks!
categorias.pl (6.28 KB)
warning errors.txt (24.6 KB)
RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau about 9 years ago
The changes you made to the script look good. I cannot reproduce the problem.
Your Perl script is a DOS file so I assume that you are on Windows. I am on Linux so there could be a difference in the character encoding. However, most of the failing categories do not contain any special characters so this is unlikely to be an encoding issue.
Could you post a sample XML input file that shows the problem?
RE: XMLTV, Kazer & French categories - Added by David jrm about 9 years ago
Thanks for the quick response, Stephane.
I'm running on Linux as well... Please see the attached XML file.
Should I create the file again directly on Linux? I created it on Windows and then copied it to Linux, but I am not sure whether that is causing the issue.
RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau about 9 years ago
When I run "perl categorias.pl < ../guide.xml > /dev/null" the only reported errors are for the categories 'Cultural\/Educativo', 'Debates', 'Magazines' and 'Todas'.
The last 3 are not handled by the perl script (there is 'Debate' and 'Magazine' without the trailing 's')
For 'Cultural\/Educativo', the problem is simply that \ is the escape character in a Perl double-quoted string, so it must be doubled:
...
"Cultural\\/Educativo" => $VARIETY ,
...
RE: XMLTV, Kazer & French categories - Added by David jrm about 9 years ago
Well, you are getting only these because I sent you just an example. Let me change that one to use the double "\" (thanks for that) and recheck. If the outcome is the same, could I send you the full XML file?
Thx
RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau about 9 years ago
Ok.
If you want to reduce the size of the xml file, keep only the 'category' lines
grep category guide.xml > guide2.xml
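The reduced file can be condensed further into a list of distinct category names, which shows at a glance which keys %REPLACE needs. The following is a minimal sketch, assuming one category element per line; sample-guide.xml and its contents are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical sample data standing in for guide.xml.
cat > sample-guide.xml <<'EOF'
<category lang="es">Magazine</category>
<category lang="es">Todas</category>
<category lang="es">Magazine</category>
<title lang="es">Some title</title>
EOF

# Keep only the text between <category ...> and </category>,
# then count how many times each distinct name occurs (most frequent first).
list_categories() {
    sed -n 's/.*<category[^>]*>\(.*\)<\/category>.*/\1/p' "$1" \
        | sort | uniq -c | sort -rn
}

names=$(list_categories sample-guide.xml)
echo "$names"
```

The resulting list is usually only a few dozen lines long, which makes it easier to compare against the table in the Perl script than the full guide.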
RE: XMLTV, Kazer & French categories - Added by David jrm about 9 years ago
Same result.
See attached both the .pl file with the above change included and the categories file guide2.xml.
categorias.pl (6.36 KB)
guide2.xml (190 KB)
RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau about 9 years ago
For me, the only missing category is Todas:
$ perl categorias.pl guide2.xml > /dev/null
Warning: Unmanaged category: 'Todas'
(the same warning is repeated 25 more times)
Are you seeing something different?
RE: XMLTV, Kazer & French categories - Added by David jrm about 9 years ago
Really strange... I was getting several unmanaged categories like the ones in my first post. After your confirmation I reinstalled Perl and it is working now (I added an exception for Todas), so I am not sure why this was happening.
Thanks again for your time and nice to see this working
RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau about 9 years ago
If you really see more errors, that could be caused by:
(1) a different version of Perl. Mine is v5.18.2 but this is very simple Perl code so I do not expect any differences within Perl 5. I cannot say for Perl 6.
(2) the Perl script being in DOS format. I know that a lot of Linux scripting languages do not really like DOS files (e.g. bash). However, I would expect a more explicit error if that was the case for Perl.
(3) a problem with your locale settings. They can affect how regular expressions are matched. I assume that Perl could also use the locale setting as the default encoding for files. Try doing "export LC_ALL=C".
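Point (2) is easy to check and to rule out. The following is a minimal sketch for a POSIX shell; demo.pl is a throwaway file created only for the demonstration, not the real script.

```shell
#!/bin/sh
# Create a two-line demo file with DOS (CRLF) line endings.
printf '#!/usr/bin/perl -w\r\nprint "ok\\n";\r\n' > demo.pl

# A literal carriage return, used as the search pattern below.
cr=$(printf '\r')

# Succeed when the given file contains at least one carriage return.
has_crlf() {
    grep -q "$cr" "$1"
}

if has_crlf demo.pl; then
    echo "demo.pl has DOS line endings, converting"
    # Strip the carriage returns into a new file, then replace the original.
    tr -d '\r' < demo.pl > demo.pl.unix && mv demo.pl.unix demo.pl
fi

has_crlf demo.pl || echo "demo.pl now has Unix line endings"
```

Where it is installed, the dos2unix command performs the same conversion in one step.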
RE: XMLTV, Kazer & French categories - Added by David jrm about 9 years ago
great! understood and thanks a lot for your help!
RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau about 9 years ago
Then it could be a problem with the way Perl caches precompiled bytecode. I think that Perl 6 now does that. The symptoms are consistent with running bytecode that corresponds to an old, obsolete version of the Perl script. Reinstalling Perl probably caused the bytecode to be invalidated.
PS: these kinds of strange problems are very common on old systems with a faulty clock (e.g. the battery on the motherboard is dead). If the system clock is reset at each reboot then old bytecode files may actually have a newer date than the recently modified Perl script.
RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau about 9 years ago
Using a network filesystem (NFS, SMB/CIFS, ...) with a badly synchronized clock could also cause similar issues.
RE: XMLTV, Kazer & French categories - Added by David jrm about 9 years ago
The system is pretty new and as far as I know it doesn't have clock issues at all... What is clear is that reinstalling Perl fixed the issue. I have run the script several times with different files and it always works perfectly.
I will take note of the above if I see further issues, but for now it is up and working.
Thx
RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau about 9 years ago
For Nicolas,
The easiest way to log the errors is to redirect the error output stream (number 2) to a file.
For example, you can run the script as follows:
/usr/local/bin/category-filter.pl 2> /tmp/category-filter.log
or if you want to APPEND to the log file, use a double >> instead
/usr/local/bin/category-filter.pl 2>> /tmp/category-filter.log
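Once the warnings are in a file, the repeated lines can be collapsed with standard tools. The following is only a minimal sketch: fake_filter is a hypothetical stand-in for category-filter.pl that prints a few fixed warnings, and the log is a temporary file rather than /tmp/category-filter.log.

```shell
#!/bin/sh
# Hypothetical stand-in for category-filter.pl: XML goes to standard output,
# warnings go to standard error, as in the real script.
fake_filter() {
    echo '<category lang="es">Todas</category>'
    echo "Warning: Unmanaged category: 'Todas'" >&2
    echo "Warning: Unmanaged category: 'Debates'" >&2
    echo "Warning: Unmanaged category: 'Todas'" >&2
}

log=$(mktemp)

# Standard output is discarded here; standard error is appended to the log.
fake_filter > /dev/null 2>> "$log"

# One line per distinct warning, most frequent first.
summary=$(sort "$log" | uniq -c | sort -rn)
echo "$summary"

rm -f "$log"
```

With the real script, the same sort and uniq pipeline can be run on /tmp/category-filter.log after a few days of grabbing.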
The alternative is to modify the perl script itself.
Add the following line at the beginning of the script to open the log file in append mode:
open(LOG, ">>", "/tmp/category-filter.log") or die "Can't open LOG file: $!";
Then clone the 'print STDERR' line and replace 'STDERR' by 'LOG':
print STDERR "Warning: Unmanaged category: '$a'\n" ;
print LOG "Warning: Unmanaged category: '$a'\n" ;
If you do not want to repeat the same error hundreds or thousands of times, you can memorize the wrong categories as follows to emit a single error for each.
At the beginning of the script, create an empty map:
my %BAD ;
Then modify the prints to STDERR and LOG as follows:
if ( ! exists $BAD{$a} ) {
    print STDERR "Warning: Unmanaged category: '$a'\n" ;
    print LOG "Warning: Unmanaged category: '$a'\n" ;
    # Record in BAD map so next error won't produce a message
    $BAD{$a} = 1 ;
}
RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau about 9 years ago
I made a syntax error in the open command. I edited the previous post but no email is sent for edits, so I repeat the corrected line here:
open(LOG, ">>", "/tmp/category-filter.log") or die "Can't open LOG file: $!";
RE: XMLTV, Kazer & French categories - Added by Nicolas Rioja almost 9 years ago
Thank you very much Stephane.
I really appreciate your help.
Best regards.
RE: XMLTV, Kazer & French categories - Added by Pablo R. almost 9 years ago
Could I use this script with multiple languages at once, for example English and French at the same time? And how?
Thanks.
RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau almost 9 years ago
I am not sure I understand what you mean by "multiple languages at once".
Do you mean that the XML file contains <category> tags with different lang values?
If so, it should be possible to modify the script to take the lang value into account. The attached file is a quick attempt.
The idea with those modifications is that the myfilter function now receives the language (as defined by the lang attribute in the <category> tag) and the value. The entries in %REPLACE should now be prefixed with the "language:".
Remark: be aware that $PRE now contains 2 hidden levels of "( )", so in the $line regex, $1 is $PRE, $3 is the lang value, $4 is the category name and $5 is the $POST value.
I also modified the myfilter function to ignore missing entries for the 'en' language since most of them are likely to remain identical. If you prefer to have an error message for all missing entries, comment out those 2 lines and all known English categories in %REPLACE.
I also assumed that the lang attribute may be omitted for 'en', but I do not have any examples so that may not work as expected.
category-filter.pl (8.07 KB)