Feature #4615
Use fuzzy logic to link epg data source to channel
Description
It would be nice if the epg source could automatically link to the channel based on more than just the channel name. For example, have it read the channel number and link that way. Or even have it use fuzzy matching to match part of the channel name. In the US, the ATSC channel name is often different from the name used on channel guides (and data sources like Schedules Direct), so they don't automatically link - you have to manually set the epg source. If it could match based on channel number, this source mapping process would be automated once the grabber is enabled.
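As a rough illustration of the simplest kind of fuzzy name matching, the C sketch below compares two names while ignoring case and punctuation. The function name names_match_loose is hypothetical and is not part of Tvheadend. It only covers spelling differences such as "KNSD-DT" versus "KNSDDT"; it cannot help when the broadcast name is unrelated to the guide name.

```c
#include <ctype.h>

/* Hypothetical helper: compare two channel names, ignoring case and any
 * character that is not a letter or digit.
 * Returns 1 if the names are equal after normalisation, 0 otherwise. */
static int names_match_loose(const char *a, const char *b)
{
  for (;;) {
    /* Skip punctuation and spaces on both sides. */
    while (*a && !isalnum((unsigned char)*a)) a++;
    while (*b && !isalnum((unsigned char)*b)) b++;

    /* If either string is exhausted, they match only if both are. */
    if (!*a || !*b)
      return *a == *b;

    if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
}
```

For example, names_match_loose("KNSD-DT", "KNSDDT") returns 1, while names_match_loose("COZI-TV", "KNSDDT2") returns 0.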
History
Updated by Em Smith about 7 years ago
Schedules Direct knows what channels I can receive from which satellite, several alternative names for the channel and what channel numbers they should have.
I'm wondering if they also have tuning information.
If the tuning information were supplied then presumably everything would be simpler to match since it is a one-to-one comparison. Maybe fire the question up to SD?
So they know my channel is on this satellite on this frequency with this id and has been assigned xyz.json.schedulesdirect.org. The tv grabber could put it in the url field, maybe use a format similar to the urls used for sat>ip tuning. Tvheadend then just compares the two and has a direct match, falling back to using existing logic.
Maybe it's far more complicated than that for channels over an aerial.
I've seen wiki pages elsewhere that people have manually created name/id to xmltvid mappings. I wonder if a simple solution would be to have an ability to parse a config file afterwards such as for channel "Dave" we'd have name, frequency, polarity, pid, xmltvid, or whatever is necessary for tuning:
Dave:11523.25:H:7620:xyz.json.schedulesdirect.org
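To make the idea concrete, here is a minimal C sketch that parses one line of such a file. The file format itself is only the suggestion above, and the struct chan_map and chan_map_parse names are hypothetical; nothing like this exists in Tvheadend as far as this thread shows.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of the proposed mapping file:
 * name:frequency:polarity:pid:xmltvid */
struct chan_map {
  char   name[64];
  double freq_mhz;
  char   polarity;
  int    pid;
  char   xmltvid[128];
};

/* Parse one colon-separated line into *out.
 * Returns 1 if all five fields were read, 0 otherwise. */
static int chan_map_parse(const char *line, struct chan_map *out)
{
  int n;

  memset(out, 0, sizeof(*out));
  /* Field widths are capped so the fixed buffers cannot overflow. */
  n = sscanf(line, "%63[^:]:%lf:%c:%d:%127[^:\n]",
             out->name, &out->freq_mhz, &out->polarity,
             &out->pid, out->xmltvid);
  return n == 5;
}
```

Given the example line above, this would fill in name "Dave", polarity 'H', pid 7620 and the xmltvid string; a line with missing fields is rejected.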
The one problem with channel number matching is that it's quite common (here) to have multiple channels with the same channel number! So I have several completely different channels with the same number due to having an aerial and satellite; and people often have several feeds from different satellites.
With xmltv you can get your channel numbers from the xmltv file (channel number heuristics). So it matches the name and then assigns the channel number. I guess ATSC scan is giving you the channel number over the air so you have that option set to disabled.
Updated by edit4ever ! about 7 years ago
Maybe we can make the level of fuzzy logic used for matching an option. I understand that in Europe you mix a lot of different TV sources so the channel number matching might not be best. But in the US, we need an easier way to automate setting systems up (and specifically epg sources) in the 210 different ATSC TV markets. Unfortunately our TV stations do not provide real epg data in the ATSC feed - just generic show title data - so we need to use other sources for epg guides.
The heuristics won't help if the epg source isn't already assigned to the channel.
As an example of our challenge - here are a few stations:
ATSC Number | ATSC Channel Name | Schedules Direct (and Zap2it) Name
39.1        | KNSD-DT           | 39.1 KNSDDT
39.2        | COZI-TV           | 39.2 KNSDDT2
15.1        | KPBSHD            | 15.1 KPBSDT
Stations don't use their FCC ID in the transmitted ATSC name - they can make it say whatever they want. So matching by name won't work here.
Hopefully adding a channel number matching option wouldn't be too hard. The match would be between the local channel number and local minor channel number - which I believe is what makes the initial mapped channel number - and the xmltv display-name (which contains the number).
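A minimal C sketch of that comparison, assuming the service's major and minor numbers are already available as integers. The function name display_name_has_number is hypothetical and this is not Tvheadend code; it simply checks whether a display-name starts with "major.minor".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: does an xmltv display-name begin with the
 * "major.minor" number of the service?
 * Returns 1 on a match, 0 otherwise. */
static int display_name_has_number(const char *display_name,
                                   int major, int minor)
{
  char want[32];
  int len = snprintf(want, sizeof(want), "%d.%d", major, minor);

  if (len < 0 || (size_t)len >= sizeof(want))
    return 0;
  if (strncmp(display_name, want, (size_t)len) != 0)
    return 0;
  /* The number must end here, so "39.10" does not match 39.1. */
  return display_name[len] == '\0' || display_name[len] == ' ';
}
```

With the stations listed above, display_name_has_number("39.1 KNSDDT", 39, 1) returns 1 and display_name_has_number("39.2 KNSDDT2", 39, 1) returns 0.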
Updated by Em Smith about 7 years ago
That's very interesting.
I haven't looked at that specific code so I don't know if I can help. But I might have some time next weekend to take a quick look since I wanted to see if I could make the remainder of my mappings work (many don't match), either in the code or modifying the config files outside tvh. I can't promise anything but it might be easy enough. Are you able to compile and test a patch?
Unfortunately I have to ask a few basic questions just to make sure I understand.
Are you using tv_grab_zz_sdjson_sqlite? I'm guessing the zap2it name is where you configure it via the SD web page instead of through the tv_grab_zz_sdjson_sqlite.
Can you get access to the xmltv file? Does SD also have multiple display-names for your channels and the channel number as a separate field or is it only in the first display name or only in the channel id field?
For example I have the following right near the top of the output file where "308" and "12" are the channel numbers for the two channels:
<channel id="I34230.json.schedulesdirect.org">
<display-name lang="en">Comedy Central Extra</display-name>
<display-name lang="en">COMCNX</display-name>
<display-name lang="en">308</display-name>
<icon src="https://s3.amazonaws.com/schedulesdirect/assets/stationLogos/s34230_h3_aa.png" width="360" height="270" />
</channel>
<channel id="I24305.json.schedulesdirect.org">
<display-name lang="en">Dave</display-name>
<display-name lang="en">DAVE</display-name>
<display-name lang="en">12</display-name>
<icon src="https://s3.amazonaws.com/schedulesdirect/assets/stationLogos/s24305_h3_aa.png" width="360" height="270" />
</channel>
That was retrieved via:
tv_grab_zz_sdjson_sqlite --days 1 --output /tmp/sample1.xml; head -30 /tmp/sample1.xml
Whereas if I sign up with zip code 90210 and select "OTA Aerial" using the command line tool then I get for example:
<channel id="I73580.json.schedulesdirect.org">
<display-name lang="en">KIIOLD (KIIO-LD)</display-name>
<display-name lang="en">KIIOLD</display-name>
<display-name lang="en">10.1</display-name>
</channel>
<channel id="I100092.json.schedulesdirect.org">
<display-name lang="en">KDOCDT7 (KDOC-DT7)</display-name>
<display-name lang="en">KDOCDT7</display-name>
<display-name lang="en">56.7</display-name>
</channel>
So the channel number is a separate field for that too (assuming 10.1 and 56.7 are channels). So I'm wondering if your data is the same when you use zap2it ids, i.e., with a separate field containing the channel number. Could you paste an example?
Final question: in configuration->dvb inputs->services, click on KNSD-DT, then "expert", then "read only info", presumably "local channel number" is 39, "local channel minor" is 1, and "service id" and "ATSC source id" are (effectively) random (not immediately recognizable as relating to the channel).
Updated by edit4ever ! about 7 years ago
Thanks for taking a look - here are the answers to your questions.
I'm not using tv_grab_zz_sdjson_sqlite - I actually use a combination for different testing of two grabbers that I built for the LibreELEC platform. One uses zap2it data and one uses schedules direct data.
Both generate multiple display names for the channels. Here is a sample of the KNSD info:
<channel id="I39.1.21213.schedulesdirect.org">
<display-name>39.1 KNSDDT</display-name>
<display-name>39.1</display-name>
<display-name>40 KNSDDT fcc</display-name>
<display-name>KNSDDT</display-name>
<display-name>KNSDDT (KNSD-DT)</display-name>
</channel>
<channel id="I21213.labs.zap2it.com">
<display-name>39.1 KNSDDT</display-name>
<display-name>39.1</display-name>
<display-name>KNSDDT</display-name>
</channel>
Yes, local channel number is 39, local channel minor is 1, and service id/source id are different for each channel/subchannel. (Those are stream identifiers in the ATSC signal and aren't used in channel mapping.)
One other thought: even though tvh doesn't internally use the number field from the channel info window (it is created to be used by the client), it effectively creates the real channel number that we want to match to. I assume it just combines the major/minor numbers from the service info. I would think accessing that channel number field and matching to it should be possible.
Let me know if you need anything else. While I can build a grabber in python - I don't know the first thing about digging into the tvh code!
Updated by Em Smith about 7 years ago
Looks like the code probably already exists, just doesn't work for atsc channels.
In the xmltv parsing code we have:
save |= epggrab_channel_set_number(ch, atoi(name), 0);
where atoi(name) is converting the major number to an integer, and 0 is the hard-coded minor number.
So then the epggrab_channel_match_number fails since it doesn't match the channel number.
Basically I think it just needs to parse "major.minor" as well as European-style "major", should be easy enough. I'll try and do a patch in a couple of days for you to try. Then if it doesn't work we'll have to add some debug.
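A minimal C sketch of that kind of parsing, assuming the number is at the start of the display-name and is followed by either the end of the string or a space. This is an illustration only, not the Tvheadend code; the function name parse_chnum is hypothetical, and treating 0 as "no number" is an assumption of this sketch.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse "major" or "major.minor" from the start of
 * a display-name. Returns 1 and fills *major / *minor on success,
 * 0 if the name does not begin with a usable channel number. */
static int parse_chnum(const char *name, int *major, int *minor)
{
  char *end;
  long maj, min = 0;

  if (!isdigit((unsigned char)name[0]))
    return 0;

  maj = strtol(name, &end, 10);

  /* Optional ".minor" part, as used by ATSC (e.g. "39.1"). */
  if (*end == '.' && isdigit((unsigned char)end[1]))
    min = strtol(end + 1, &end, 10);

  /* Accept "59.2" and "59.2 bob", but not names like "5USA". */
  if (*end != '\0' && *end != ' ')
    return 0;

  /* Assumption in this sketch: channel 0 means "no number". */
  if (maj <= 0 || maj > INT_MAX || min > INT_MAX)
    return 0;

  *major = (int)maj;
  *minor = (int)min;
  return 1;
}
```

A European-style "308" gives major 308 and minor 0, while "39.1 KNSDDT" gives major 39 and minor 1; a plain name such as "KNSDDT" is rejected.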
Updated by Em Smith about 7 years ago
- File 0001-xmltv-Parse-atsc-style-numbers.-4615.patch added
Try the attached patch. I ran it in a small test harness and it seemed to work for "59", "59.2", "59.2 bob" and "bob", but can't test the actual patch in the code.
The only difference is that the old code handled channel 0, the new code assumes channel can't be zero.
Although I just noticed that in your recent example for knsddt it has 39.1 and '40 knsddt fcc', so I expect it will get channel 40. Is that example typical?
Updated by edit4ever ! about 7 years ago
I'll try to set up a build environment later today - I've gotten used to getting my tvh builds from CvH at the LibreELEC project.
As for the channel 40 issue: Schedules Direct includes the fcc channel name and transmitting frequency - so yes, it is common. Unfortunately we've done a great job of building a horrible over the air tv system in the US - including not going with cofdm like Europe when we transitioned to digital. As part of that mess, stations got to keep using their old analog channel numbers so the public wouldn't be confused during the transition. As a result, stations that were assigned to new digital transmitting frequencies kept using their analog channel number as the name of the station so the digital tv would show the old number. In this example, KNSD transmitted analog on RF channel 39 and then switched to transmitting digital on RF channel 40. So even though KNSD transmits their digital signal on RF channel 40 - it shows on your system as channel 39.1 - doh!
It seems Schedules Direct has chosen to include the fcc transmitting frequency reference as one of the names of the stations. Other guide sources do not include this name - so we'll see what happens when I get the patch built. As long as it starts with the major.minor that tvh got from the service and looks for a match to that number, it should work fine, since all the fcc names will only have a number without a .minor as they are a pure transmitting channel reference.
Thanks for giving this a go!
Updated by Em Smith about 7 years ago
Unfortunately it looks like it takes the last one it finds. So if it doesn't work add the following just before the save around line 697 in src/epggrab/module/xmltv.c:
if (!strstr(cur, " fcc"))
so it becomes:
if (!strstr(cur, " fcc")) save |= epggrab_channel_set_number(ch, major, minor);
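To show the effect of that check outside of the Tvheadend source, here is a small self-contained C sketch. It walks a list of display-names, skips any that contain " fcc", and returns the first remaining one that starts with a number. The function name pick_chnum and the array-based interface are hypothetical; the real code works on the parsed xmltv tree.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: choose a channel number from a list of xmltv
 * display-names, ignoring fcc transmitter references.
 * Returns 1 and fills *major / *minor, or 0 if nothing usable is found. */
static int pick_chnum(const char *const *names, size_t count,
                      int *major, int *minor)
{
  size_t i;

  for (i = 0; i < count; i++) {
    int maj = 0, min = 0;

    /* e.g. "40 KNSDDT fcc" is the RF channel, not the channel number. */
    if (strstr(names[i], " fcc"))
      continue;

    /* Accept "major" or "major.minor" at the start of the name;
     * min stays 0 when there is no ".minor" part. */
    if (sscanf(names[i], "%d.%d", &maj, &min) >= 1 && maj > 0) {
      *major = maj;
      *minor = min;
      return 1;
    }
  }
  return 0;
}
```

With the KNSD display-names from the earlier sample ("39.1 KNSDDT", "39.1", "40 KNSDDT fcc", "KNSDDT", "KNSDDT (KNSD-DT)"), this picks 39.1 regardless of where the fcc entry appears in the list.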
Updated by edit4ever ! about 7 years ago
OK this worked for zap2it generated xmltv.xml files (which don't have the fcc transmitting number) - so now I will go and set up my Schedules Direct account again and test!
Nicely done!!
Updated by edit4ever ! about 7 years ago
And success #2! Worked with Schedules Direct and with only the original patch!
This is great - hopefully this can be added to the next commit.
Thanks for your help!!
Updated by Em Smith about 7 years ago
That's fantastic news. I did a bit more testing and it uses the first channel name which is why it correctly worked for 39.1 vs. 40.
I've submitted the pull request.
As you say the atsc does seem a bit of a mess, having channel 40 also be called 39.1 rather than just rename it once. Whereas here, channels change numbers and frequency every year-or-so; and some rename even more frequently such as Movies channel->Christmas Movies->Star Wars Movies.
Updated by Kick4U 2 almost 7 years ago
Any news on this request? I would really like to see this implemented soon.
Updated by Kick4U 2 almost 7 years ago
Any news on this request? I would really like to see the minorchannels being updated within TVHeadend.
Thanks!