Mailing List Archive

Mythfilldatabase and Shepherd (tv_grab_au)
Hello,

I was asked to post here regarding why Shepherd (tv_grab_au) relies on
the '--graboptions' arg that was recently removed from mythfilldatabase:

http://code.mythtv.org/trac/ticket/9853

Specifically, I was asked why Shepherd sets up a custom cron job on the
user's system to call mythfilldatabase, rather than relying on MythTV's
in-built scheduling system.

Firstly, I should say that we have a workaround now, so this is not a
request to keep --graboptions or anything. We can deal with that fine.
I'm just responding to stuartm's request for info.

So: there are a few reasons. Some may be obsolete now, because Shepherd
was written years ago and I haven't followed MythTV's development very
closely. But at the time, at least, this is why:

(1) By default, MythTV triggers mythfilldatabase at 2am. Shepherd phones
home with stats, and our graphs were showing an order of magnitude spike
in usage at 2am, as a great many Shepherd users Australia-wide all hit
the datasources at once. Please bear in mind that each Shepherd user is
not simply downloading one XMLTV file, but rather compiling XMLTV by
scraping dozens or hundreds or thousands of different web pages.

This behavior was a problem both for the datasource (which could be
overwhelmed with traffic) and for us. It's a problem for us because TV
guide data is not freely available in Australia: it's fiercely defended
by the TV networks, who don't want it to end up in home theater PCs. In
a nutshell, the only way to get high-quality TV guide data into an
Australian home theater PC has been to scrape the networks' web pages,
but they actively block any scrapers they detect. If a bunch of scrapers
hit them at precisely 2am, that's easy to block.

This is a key point that is often overlooked by non-Australians, who
don't appreciate the different environment here. For example, it would
make a lot of sense for us to run just one scraper to gather TV guide
data each day, convert it into XMLTV, and offer that for download to all
Australian users. However, that would be illegal under Australian
copyright law.

I point this out because in my experience, people overseas tend to
respond to our situation with sentiments like, "That copyright law is
stupid, you should get it changed," or similarly accurate but wildly
unhelpful observations. The reality is that before Shepherd, Australians
had no reliable high-quality source of TV guide data. We do it this way
because it's been our only option.

(2) MythTV assumes it will be powered on for the scheduled MFDB run, and
if it's not--e.g. it's a system that shuts down when idle--it skips that
day. Users could thus see a dwindling supply of guide data "days" and
think something was wrong with Shepherd. It's a particular problem when
combined with (1) above, because systems that auto-poweroff are often
off at 2am.

(This seems like a universal problem, not Australia-specific, so very
possibly it's been addressed since I last looked at it.)

(3) Similarly, MythTV is locked to one grab per day. Users with
unreliable internet connections tend to have Shepherd time out
occasionally (it takes a long time to run, often hours), and thus
encounter the same problem as above: a shrinking number of guide data
days. Shepherd's cron job, by contrast, runs once per hour, so that it
can try again more quickly in the event of a transient network failure.
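
As an illustration of the hourly-retry idea, here is a minimal Python
sketch. It is not Shepherd's actual code (Shepherd's internal logic is
not shown in this thread); the function name should_grab and the
24-hour threshold are assumptions made for the example. The job wakes
every hour but only attempts a grab if the last successful one is old
enough, so a failed run is retried an hour later instead of a day later.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_grab(last_success: Optional[datetime],
                now: datetime,
                min_interval: timedelta = timedelta(hours=24)) -> bool:
    """Decide, on an hourly wake-up, whether to attempt a grab.

    last_success is the time of the last grab that completed, or None
    if no grab has ever succeeded. (Hypothetical helper, for illustration.)
    """
    if last_success is None:
        return True  # never succeeded: always try
    # Only try again once the previous good data is at least min_interval old.
    return now - last_success >= min_interval
```

Because a failed run never updates last_success, the next hourly
wake-up tries again automatically.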

(4) It's very hard to configure grabbers via mythtv-setup. For example,
every time you go into the relevant MythTV Settings page, a terminal
window is triggered running 'tv_grab_au --configure'. On my system, at
least, this terminal window was invisible until you exited mythtv-setup.
Also, the process of matching channels in MythTV to XMLTVIDs in Shepherd
was very torturous and sometimes involved races between the two
applications.

Once Shepherd became relatively stable, we found that the great majority
of our mailing list traffic was requests for help configuring MythTV,
due to the issues listed above. I didn't see this as my role, or
Shepherd's role; Shepherd's job is simply to deliver an XMLTV file.
However, people saw Shepherd-MythTV integration problems as Shepherd
problems, and eventually I gave in and added some code to automatically
configure MythTV to run Shepherd. This entails:

- creating a tv_grab_au symlink to Shepherd in a relevant path

- scanning the user's MythTV DB for channels, figuring out matches with
Shepherd TV guide data channels, and setting XMLTVIDs appropriately

- turning off MythTV scheduled updates

- setting up a cron job to run Shepherd once per hour at a randomized
time. This is where we used '--graboptions' to pass a '--daily' argument
to Shepherd, when that became its new default behavior. Our fix for the
removal of --graboptions from MFDB will simply be to alter Shepherd such
that --daily is implied.
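
The channel-matching step can be sketched roughly as follows. This is
a simplified Python illustration, not Shepherd's real matching code:
the function names, the normalisation rule, and the example channel
names and XMLTV IDs in the usage note are all assumptions.

```python
import re
from typing import Dict, List, Optional


def normalise(name: str) -> str:
    """Lower-case a channel name and drop everything but letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_channels(myth_channels: List[str],
                   guide_ids: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each MythTV channel name to an XMLTV ID, or None if unmatched.

    guide_ids maps guide-data channel names to their XMLTV IDs.
    """
    lookup = {normalise(name): xmltvid for name, xmltvid in guide_ids.items()}
    return {ch: lookup.get(normalise(ch)) for ch in myth_channels}
```

For example, match_channels(["ABC 1", "Unknown"], {"abc1":
"abc1.example"}) pairs "ABC 1" with "abc1.example" and leaves
"Unknown" as None, to be resolved by hand.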

This has worked very well, as we no longer see so many emails from
people needing help configuring MythTV.

Our #1 user complaint today is that once Shepherd is installed and the
user tries to run it via 'mythfilldatabase', it seems to hang, because
mythfilldatabase suppresses all Shepherd output. (Shepherd can take
several hours to complete its first run.) At this point, some people
give up and terminate the process, then write to us seeking help on what
went wrong. So if I do have a request, it is that MFDB stop suppressing
grabber output on the command-line, so users can see that it is actually
doing something.
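
To make the request concrete, here is a small Python sketch of a
wrapper that passes a child process's output through as it arrives
rather than discarding it. It only illustrates the behavior being
asked for; it is not how mythfilldatabase is implemented, and the
function name run_and_stream is made up for this example.

```python
import subprocess
import sys
from typing import List


def run_and_stream(cmd: List[str]) -> int:
    """Run cmd, echoing each line of its output immediately.

    stderr is merged into stdout so progress messages from either
    stream are visible. Returns the child's exit code.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        sys.stdout.write(line)
        sys.stdout.flush()  # show progress right away
    return proc.wait()
```

With something like this, a grabber that takes hours would still show
its progress lines, so the user can tell it has not hung.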

Thanks,

Max.

_______________________________________________
mythtv-dev mailing list
mythtv-dev@mythtv.org
http://www.mythtv.org/mailman/listinfo/mythtv-dev
Re: Mythfilldatabase and Shepherd (tv_grab_au)
On 07/16/2011 05:17 PM, Max Barry wrote:
> I was asked to post here regarding why Shepherd (tv_grab_au) relies on
> the '--graboptions' arg that was recently removed from mythfilldatabase:
>
> http://code.mythtv.org/trac/ticket/9853
>
> Specifically, I was asked why Shepherd sets up a custom cron job on the
> user's system to call mythfilldatabase, rather than relying on MythTV's
> in-built scheduling system.
>
> Firstly, I should say that we have a workaround now, so this is not a
> request to keep --graboptions or anything. We can deal with that fine.
> I'm just responding to stuartm's request for info.
>
> So: there are a few reasons. Some may be obsolete now, because Shepherd
> was written years ago and I haven't followed MythTV's development very
> closely. But at the time, at least, this is why:
>
> (1) By default, MythTV triggers mythfilldatabase at 2am.

"Automatically run mythfilldatabase" defaults to enabled
"mythfilldatabase run frequency (days)" defaults to 1 (daily runs)
"mythfilldatabase execution start" defaults to 2
"mythfilldatabase execution end" defaults to 5

which leaves a 3hr window to run mythfilldatabase. Coupled with:

> Shepherd phones
> home with stats, and our graphs were showing an order of magnitude spike
> in usage at 2am, as a great many Shepherd users Australia-wide all hit
> the datasources at once. Please bear in mind that each Shepherd user is
> not simply downloading one XMLTV file, but rather compiling XMLTV by
> scraping dozens or hundreds or thousands of different web pages.
>

http://code.mythtv.org/trac/ticket/2194#comment:7

runs should be spread out within that window.

That said, I'm all for changing the defaults for execution start and end
to 0/23 (meaning it's allowed to run at any time of day--so runs would
be spread out over the whole day, where each would be "seeded" by the
time the user first ran mythbackend). I'd prefer to see
mythfilldatabase configured to just work when users run
mythbackend--without having to realize there's a buried setting that
limits it to only run in a small time period. Then, any users who
choose inappropriate/underpowered systems to run the master backend
and/or database server can find the settings and limit the run. IMHO,
we should default to a configuration that assumes properly-spec'ed
systems. :)

> (2) MythTV assumes it will be powered on for the scheduled MFDB run, and
> if it's not--e.g. it's a system that shuts down when idle--it skips that
> day. Users could thus see a dwindling supply of guide data "days" and
> think something was wrong with Shepherd. It's a particular problem when
> combined with (1) above, because systems that auto-poweroff are often
> off at 2am.
>
> (This seems like a universal problem, not Australia-specific, so very
> possibly it's been addressed since I last looked at it.)

http://code.mythtv.org/trac/ticket/4961#comment:2

I'll let someone else respond to the other issues.

Mike

Re: Mythfilldatabase and Shepherd (tv_grab_au)
On 07/18/2011 04:01 PM, Michael T. Dean wrote:
> On 07/16/2011 05:17 PM, Max Barry wrote:
>> Shepherd phones
>> home with stats, and our graphs were showing an order of magnitude spike
>> in usage at 2am, as a great many Shepherd users Australia-wide all hit
>> the datasources at once. Please bear in mind that each Shepherd user is
>> not simply downloading one XMLTV file, but rather compiling XMLTV by
>> scraping dozens or hundreds or thousands of different web pages.
>>
> http://code.mythtv.org/trac/ticket/2194#comment:7
>
> runs should be spread out within that window.
>
> That said, I'm all for changing the defaults for execution start and end
> to 0/23 (meaning it's allowed to run at any time of day--so runs would
> be spread out over the whole day, where each would be "seeded" by the
> time the user first ran mythbackend).

Oh, and I should probably mention:

Run mythfilldatabase at time suggested by the grabber.
If enabled, allow a DataDirect guide data provider to specify the next
download time in order to distribute load on their servers.
mythfilldatabase Execution Start/End times are also ignored.

which defaults to enabled. Currently, it only works with Schedules
Direct data, but if the XMLTV community wants to come up with a
standardized approach for providing this data, I can add support for it
into mythfilldatabase. Then, each grabber can choose a time using
whatever approach they want (random times, phone home to the Shepherd
server and ask it for the recommended next run time, ...).

Mike

Re: Mythfilldatabase and Shepherd (tv_grab_au)
Hi Mike,

mtdean wrote:
> On 07/16/2011 05:17 PM, Max Barry wrote:
> That said, I'm all for changing the defaults for execution start and end
> to 0/23 (meaning it's allowed to run at any time of day--so runs would
> be spread out over the whole day, where each would be "seeded" by the
> time the user first ran mythbackend). I'd prefer to see
> mythfilldatabase configured to just work when users run
> mythbackend--without having to realize there's a buried setting that
> limits it to only run in a small time period. Then, any users who
> choose inappropriate/underpowered systems to run the master backend
> and/or database server can find the settings and limit the run. IMHO,
> we should default to a configuration that assumes properly-spec'ed
> systems. :)

I agree, for what it's worth. I can't imagine any grabber is more
CPU-intensive than Shepherd (which has to laboriously compile XMLTV from
web pages), and I don't notice slowdowns on my six-year-old hardware
even though it runs at all different times of the day.

>> (2) MythTV assumes it will be powered on for the scheduled MFDB run, and
>> if it's not--e.g. it's a system that shuts down when idle--it skips that
>> day. Users could thus see a dwindling supply of guide data "days" and
>> think something was wrong with Shepherd. It's a particular problem when
>> combined with (1) above, because systems that auto-poweroff are often
>> off at 2am.
>>
>> (This seems like a universal problem, not Australia-specific, so very
>> possibly it's been addressed since I last looked at it.)
>
> http://code.mythtv.org/trac/ticket/4961#comment:2

That still requires the system be on at some point during the MFDB valid
window, though. Assuming a default install, if the user's system tends
to be powered down from 2-5am (as mine is), it will run out of guide data.
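
A small Python sketch of why this happens. It assumes the window
start hour is inclusive and the end hour exclusive, which matches the
"3hr window" reading of the 2/5 defaults earlier in the thread;
MythTV's real boundary handling may differ, and the function names
are invented for the example.

```python
from typing import Iterable, Set


def window_hours(start: int, end: int) -> Set[int]:
    """Hours (0-23) in which a run may start; handles wrap past midnight."""
    if start <= end:
        return set(range(start, end))
    return set(range(start, 24)) | set(range(0, end))


def can_ever_run(start: int, end: int, powered_on: Iterable[int]) -> bool:
    """True if the machine is on during at least one hour of the window."""
    return bool(window_hours(start, end) & set(powered_on))
```

A machine that is on for every hour except 2, 3 and 4 never overlaps
the default 2-5 window, so can_ever_run(2, 5, ...) is False for it,
while a 0/23 window gives True.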

I guess users with auto-shutdown systems should have been making sure
they changed their MFDB valid window to 0/23, but for whatever reason
some people clearly don't know that.

> Oh, and I should probably mention:
>
> Run mythfilldatabase at time suggested by the grabber.
> If enabled, allow a DataDirect guide data provider to specify the next
> download time in order to distribute load on their servers.
> mythfilldatabase Execution Start/End times are also ignored.
>
> which defaults to enabled. Currently, it only works with Schedules
> Direct data, but if the XMLTV community wants to come up with a
> standardized approach for providing this data, I can add support for it
> into mythfilldatabase. Then, each grabber can choose a time using
> whatever approach they want (random times, phone home to the Shepherd
> server and ask it for the recommended next run time, ...).

That wouldn't be ideal for us since we're not the datasource; we don't
have any special knowledge about when would be a good time to run. We
just don't want everybody piling on to a particular source at once.
Changing the default MFDB window to 0/23 would be simpler and better
from our point of view.

Max.

Re: Mythfilldatabase and Shepherd (tv_grab_au)
On Tuesday 19 Jul 2011 11:42:00 Max Barry wrote:
> That wouldn't be ideal for us since we're not the datasource; we don't
> have any special knowledge about when would be a good time to run. We
> just don't want everybody piling on to a particular source at once. Changing
> the default MFDB window to 0/23 would be simpler and better
> from our point of view.

I assumed that this is, in effect, already what your scripts do. Or do you
just ask users to set up the cron job at a random time? The time you feed back
to MythTV doesn't have to be provided by the data source; it could just come
from the grabber, which generates a relatively random time ~24 hours after the
last run.

The way this would be implemented in xmltv would be through the --capabilities
API - http://wiki.xmltv.org/index.php/XmltvCapabilities. A new
'suggestnextrun' capability for example.
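
A rough Python sketch of what a grabber might compute for such a
capability. Note that 'suggestnextrun' is only a proposal in this
thread, not an existing XMLTV capability, and the function name and
the two-hour jitter below are assumptions chosen for illustration.

```python
import random
from datetime import datetime, timedelta
from typing import Optional


def suggest_next_run(last_run: datetime,
                     max_jitter_minutes: int = 120,
                     rng: Optional[random.Random] = None) -> datetime:
    """Pick a next-run time roughly 24 hours after last_run.

    A random offset of up to +/- max_jitter_minutes keeps different
    installations from converging on the same time of day.
    """
    if rng is None:
        rng = random.Random()
    jitter = rng.randint(-max_jitter_minutes, max_jitter_minutes)
    return last_run + timedelta(hours=24, minutes=jitter)
```

The grabber needs no knowledge of the data source's load for this; the
randomness alone spreads the requests out.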

The ultimate aim here is to simplify things for the end-user so that all xmltv
grabbers can operate in exactly the same way and with minimal configuration.
For most xmltv grabbers this means simply selecting them from a list of
available grabbers in mythtv-setup and selecting which channels to grab data
for. This is also cross-platform: it should work the same way on Windows, BSD
and OSX.
--
Stuart Morgan
MythTV

Re: Mythfilldatabase and Shepherd (tv_grab_au)
Hi Stuart,

stuart at tase wrote:
> On Tuesday 19 Jul 2011 11:42:00 Max Barry wrote:
>> That wouldn't be ideal for us since we're not the datasource; we don't
>> have any special knowledge about when would be a good time to run. We
>> just don't want everybody piling on to a particular source at once. Changing
>> the default MFDB window to 0/23 would be simpler and better
>> from our point of view.
>
> I assumed that this is in effect already what the scripts you use do, or do you
> just ask users to set up the cron job at a random time?

During installation, Shepherd creates a cron job for the user, which by
default looks like this:

21 * * * * nice /usr/bin/mythfilldatabase > /dev/null

The "21" above is randomized.
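
For illustration, generating such a line could look like the Python
sketch below. This is not Shepherd's installer code; the function
name hourly_cron_line is an assumption. Only the minute field is
randomized, so the job still runs every hour.

```python
import random
from typing import Optional


def hourly_cron_line(command: str,
                     rng: Optional[random.Random] = None) -> str:
    """Build a crontab entry that runs command hourly at a random minute."""
    if rng is None:
        rng = random.Random()
    minute = rng.randrange(60)  # 0..59, chosen once at install time
    return f"{minute} * * * * {command}"
```

Calling hourly_cron_line("nice /usr/bin/mythfilldatabase > /dev/null")
yields a line of the same shape as the one above, with a different
minute on each installation.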

Until yesterday, we used --graboptions to send a '--daily' flag to
Shepherd, but we've removed that, since --graboptions is no longer
supported.

> The time you feed back
> to MythTV doesn't have to be provided by the data source, it could just come
> from the grabber which generates a relatively random time ~24 hours from the last
> run.
>
> The way this would be implemented in xmltv would be through the --capabilities
> API - http://wiki.xmltv.org/index.php/XmltvCapabilities. A new
> 'suggestnextrun' capability for example.

Would that override the MFDB valid window? I.e. if a user has a default
install, with a valid window of 2-5am, and Shepherd responds with some
--capabilities flag that says, "Run at 6:32am," which would take
precedence? Either way may have problems.

> The ultimate aim here is to simplify things for the end-user so that all xmltv
> grabbers can operate in exactly the same way and with minimal configuration.
> For most xmltv grabbers this means simply by selecting them from a list of
> available grabbers in mythtv-setup and selecting which channels to grab data
> for. This is also cross-platform, it should work the same way on Windows, BSD
> and OSX.

Right, and we certainly tried to do this, but couldn't find a way to
work with mythtv-setup that wasn't filled with gotchas for users.
Matching grabber XMLTV IDs to MythTV channels, for example, was a common
source of anguish: users had to scan channels, then select a grabber,
configure channels there, and carefully copy each channel's XMLTV ID
back into the appropriate place in MythTV.

I just had a quick browse of the MythTV wiki, to see whether things have
changed, but it seems not. (Please correct me if I'm wrong!) The UK
grabber's installation instructions, for example, include this:

http://www.mythtv.org/wiki/Uk_xmltv#The_next_step

... which is exactly what we used to ask our users to do. I also see
they advise users NOT to run the grabber config from within
mythtv-setup: we encountered a few problems with that, too.

We ended up automating the installation process to remove error-prone
gotchas such as setting up symlinks, matching XMLTV IDs, and modifying
scheduling. It's a little brittle, because it relies upon certain fields
in the MythTV DB (like MythFillEnabled) not changing, but it has made
life a lot easier for us and our users.

Max.

Re: Mythfilldatabase and Shepherd (tv_grab_au)
On 20/07/2011 10:25 AM, Max Barry wrote:
> We ended up automating the installation process to remove error-prone
> gotchas such as setting up symlinks, matching XMLTV IDs, and modifying
> scheduling. It's a little brittle, because it relies upon certain fields
> in the MythTV DB (like MythFillEnabled) not changing, but it has made
> life a lot easier for us and our users.

And a fantastic job you did too, Max. Thank you immensely for shepherd.

Johan
