Feature #4577
openEIT scraper basic test harness
Start date:
2017-09-09
Due date:
% Done:
0%
Estimated time:
Description
#4287 implemented a basic EIT scraper that relies on regexes.
I have a work in progress on a very basic test harness that checks we extract the correct data. I think it will be easy to fix a regex for one programme or region but accidentally break it for another. Keeping samples of what we are trying to extract should help avoid such breakage. It should also help in crafting configurations for other regions, since we can at least collect examples even where there is no working scraper yet.
Does committing a complete programme title/subtitle/description from the EIT to git count as "fair use" under copyright? I assume so.
The basic format of the JSON test file is as follows. From this EIT summary we expect to extract S13E11.
{
"summary": "S13, E11. Lorem Ipsum",
"season": "13", "episode": "11"
}
And from this we expect S5E31.
{
"summary": "Lorem Ipsum. (S5, E31)",
"season": "5", "episode": "31"
}
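As a rough illustration of how such a harness could consume these test entries, here is a minimal Python sketch. The patterns in SEASON_PATTERNS and EPISODE_PATTERNS, and the helper names first_match and run_cases, are hypothetical and only cover the two samples above; they are not the regexes or code from #4287, and wrapping the entries in a JSON list is an assumption about the file layout.

```python
import json
import re

# Hypothetical patterns for illustration only; NOT the regexes from #4287.
# Each matches forms such as "S13, E11" or "(S5, E31)".
SEASON_PATTERNS = [re.compile(r"\bS(\d+)\s*,?\s*E\d+")]
EPISODE_PATTERNS = [re.compile(r"\bS\d+\s*,?\s*E(\d+)")]


def first_match(patterns, text):
    """Return group 1 of the first pattern that matches, or None."""
    for pattern in patterns:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None


def run_cases(cases):
    """Check every case; return a list of (summary, key, expected, got)."""
    failures = []
    for case in cases:
        summary = case["summary"]
        got = {
            "season": first_match(SEASON_PATTERNS, summary),
            "episode": first_match(EPISODE_PATTERNS, summary),
        }
        # Only compare the keys that the test entry actually specifies.
        for key in ("season", "episode"):
            if key in case and got[key] != case[key]:
                failures.append((summary, key, case[key], got[key]))
    return failures


# The two samples from this ticket, wrapped in a list (assumed layout).
SAMPLE = json.loads('''
[
  {"summary": "S13, E11. Lorem Ipsum", "season": "13", "episode": "11"},
  {"summary": "Lorem Ipsum. (S5, E31)", "season": "5", "episode": "31"}
]
''')

if __name__ == "__main__":
    for summary, key, expected, got in run_cases(SAMPLE):
        print(f"FAIL {key}: expected {expected!r}, got {got!r} in {summary!r}")
```

Returning the failures as a list, rather than stopping at the first mismatch, means a regex change that fixes one region while breaking another shows every affected sample in a single run.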