Project

General

Profile

Bug #4537

Out of memory: Kill process 9197 (tvheadend) score 41 or sacrifice child

Added by cy clic about 7 years ago. Updated about 7 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Crashes
Target version:
-
Start date:
2017-08-18
Due date:
% Done:

0%

Estimated time:
Found in version:
HTS Tvheadend 4.2.3-20~g407c8a3 and earlier
Affected Versions:

Description

tvheadend service dies on pi 2 server, kodi client tv can hang (with circle),
9981 access of course fails also.
If #service tvheadend restart - then it does correctly restart until next death.
Fails on "HTS Tvheadend 4.2.3-20~g407c8a3" - Raspberry Pi 2 - Ubuntu 16.4 LTS
Earlier versions have also had similar death.
Client is kodi on pi 3 - does not require restart.
i.e. when server has manual service tvheadend restart then kodi comes back to life.
Suspicion following dvr activity e.g. reservation, start or stop.

Same version does not seem to fail on Intel i7 - Kubuntu 16.04 LTS


Files

Truncated Syslog up to tvheadend dies.txt (28.2 KB) Truncated Syslog up to tvheadend dies.txt cy clic, 2017-08-18 20:27
tvheadend-status-summary.txt (3.23 KB) tvheadend-status-summary.txt cy clic, 2017-08-30 20:10
tvheadend-syslog-grep (112 KB) tvheadend-syslog-grep cy clic, 2017-08-30 20:10
memory-log-clean (35.6 KB) memory-log-clean cy clic, 2017-09-10 21:32

History

#1

Updated by Mark Clarkstone about 7 years ago

cr c wrote:
... snip ...

Are you doing recordings?

#2

Updated by cy clic about 7 years ago

Mark Clarkstone wrote:

Are you doing recordings?

Affirmative - Recordings are good

#3

Updated by Mark Clarkstone about 7 years ago

cr c wrote:

Mark Clarkstone wrote:

Are you doing recordings?

Affirmative - Recordings are good

Ok, what "Cache scheme" are you using for the profile? Try with it set to "Don't keep".

#4

Updated by cy clic about 7 years ago

Mark Clarkstone wrote:
Ok, what "Cache scheme" are you using for the profile? Try with it set to "Don't keep".

Cache was set to System - as per help doc
I have set to Don't Keep as test per your suggestion

#5

Updated by cy clic about 7 years ago

With cache at "Don't keep" it still fails with status "(Exited)".

One death had last log relating to our well known 11425H:

Aug 24 13:01:00 h1 tvheadend13597: subscription: 028B: "" subscribing to mux "11425H", weight: 100, adapter: "SAT>IP DVB-S Tuner #5 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 24 13:11:00 h1 tvheadend13597: subscription: 028B: "" unsubscribing

as the last log.

Whether there was a channel left playing out seems irrelevant.

One had end of recording (unsubscribe) as last log

One had sequence
Aug 24 01:01:45 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:04:40 h1 tvheadend10552: epggrab: UK: Freesat - data completion timeout for 11126.5V in 28.2E
Aug 24 01:04:40 h1 tvheadend10552: epggrab: EIT: DVB Grabber - data completion timeout for 11126.5V in 28.2E
Aug 24 01:07:07 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:07:35 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:07:45 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:09:30 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:09:51 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:10:23 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:11:00 h1 tvheadend10552: subscription: 0298: "" unsubscribing
as last log.

Typical time from restart to death was around 7 hours.

Any help?

#6

Updated by Mark Clarkstone about 7 years ago

cr c wrote:

With cache at "Don't keep" it still fails with status "(Exited)".

One death had last log relating to our well known 11425H:

Aug 24 13:01:00 h1 tvheadend13597: subscription: 028B: "" subscribing to mux "11425H", weight: 100, adapter: "SAT>IP DVB-S Tuner #5 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 24 13:11:00 h1 tvheadend13597: subscription: 028B: "" unsubscribing

as the last log.

Whether there was a channel left playing out seems irrelevant.

One had end of recording (unsubscribe) as last log

One had sequence
Aug 24 01:01:45 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:04:40 h1 tvheadend10552: epggrab: UK: Freesat - data completion timeout for 11126.5V in 28.2E
Aug 24 01:04:40 h1 tvheadend10552: epggrab: EIT: DVB Grabber - data completion timeout for 11126.5V in 28.2E
Aug 24 01:07:07 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:07:35 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:07:45 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:09:30 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:09:51 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:10:23 h1 tvheadend10552: mpegts: too much queued table input data (over 2MB), discarding new
Aug 24 01:11:00 h1 tvheadend10552: subscription: 0298: "" unsubscribing
as last log.

Typical time from restart to death was around 7 hours.

Any help?

Could you try upgrading to unstable to see if it happens there? I've just installed 4.3.358 on my Banana Pi M1 to test.

#7

Updated by cy clic about 7 years ago

_Mark Clarkstone wrote:

Could you try upgrading to unstable to see if it happens there? I've just installed 4.3.358 on my Banana Pi M1 to test._

I now have 4.3-357~g3df769a8c~xenial running.
(does 358 have build failed?)
I will let 357 run overnight

#8

Updated by Mark Clarkstone about 7 years ago

cr c wrote:

_Mark Clarkstone wrote:

Could you try upgrading to unstable to see if it happens there? I've just installed 4.3.358 on my Banana Pi M1 to test._

I now have 4.3-357~g3df769a8c~xenial running.

> (does 358 have build failed?)

I will let 357 run overnight

Not a failure as such, the build script pulls in some previously compiled libs from bintray but we've hit our quota so it's failing for the moment.

#9

Updated by cy clic about 7 years ago

Per your request:
Testing 4.3-357~g3df769a8c~xenial on pi2 satip 4 tuners, recording to smb, client kodi on pi3:
First run lasted 19 hrs - failed
Second run lasted 1 day 14 hours - failed
Both failed showing status "(Exited)"
First was not playing out TV at exit
Second was playing out TV at exit - which remained frozen
Both most recent subscription was "epggrab"

It seems therefore that there exists a superficially similar fail mode in the 4.3.357.
Perhaps there is a difference in that the fail occurs after a longer period of operation. (3 to 6 times longer).

#10

Updated by Mark Clarkstone about 7 years ago

cr c wrote:

Per your request:
Testing 4.3-357~g3df769a8c~xenial on pi2 satip 4 tuners, recording to smb, client kodi on pi3:
First run lasted 19 hrs - failed
Second run lasted 1 day 14 hours - failed
Both failed showing status "(Exited)"
First was not playing out TV at exit
Second was playing out TV at exit - which remained frozen
Both most recent subscription was "epggrab"

It seems therefore that there exists a superficially similar fail mode in the 4.3.357.
Perhaps there is a difference in that the fail occurs after a longer period of operation. (3 to 6 times longer).

If it's the epggrab you could maybe trigger this by pressing the "Force EPG grabber" button multiple times (let it finish each time).

I shall hook up my spare SAT tuner & see if I can get mine to OOM. I'll also upgrade my main system to the latest, still only using 332.

EDIT:

Ok, I'm running 4.3.361 built with clang and set it up to look for leaks, let's see if I can find anything that'll help Jaroslav fix this issue :).

Ideally I think he'd need access to arm hardware (as in the RPI) to debug these kind of problems, I'm not sure he has anything though.

#11

Updated by cy clic about 7 years ago

_Mark Clarkstone wrote:

If it's the epggrab you could maybe trigger this by pressing the "Force EPG grabber" button multiple times (let it finish each time)_

Given that it fails without intervention needed,
I used a script to log service status periodically.
HTS Tvheadend 4.3-363~ge1ad2342d~xenial on pi2 with Ubuntu 16.04
Memory usage hovered around 65MB for 15 hours
(Usual scans epggrab occasional live playout)
Inspection of status summary attached shows memory
32 13:52 64.6 MB
33 14:02 64.6 MB
34 14:12 65.1 MB
35 14:22 451.4 MB
36 14:32 734.4 MB
37 14:42 exited
If leak is linear - probably started soon after 14:12
Inspection of the syslog tvheadend grep shows
There are lots of epggrabs but curiously an uncommon dvr at 14:15
Total death is at 14:37
_ Aug 30 14:37:37 h1 kernel: [2184818.225668] [29666] 112 29666 74342 15967 128 0 0 0 tvheadend
Aug 30 14:37:37 h1 kernel: [2184818.225705] Out of memory: Kill process 29666 (tvheadend) score 68 or sacrifice child
Aug 30 14:37:37 h1 kernel: [2184818.248345] Killed process 29666 (tvheadend) total-vm:297368kB, anon-rss:62028kB, file-rss:1840kB_
Truncated files are attached.
I'll rerun.

#12

Updated by cy clic about 7 years ago

cr c wrote:

Truncated files are attached.

#13

Updated by Mark Clarkstone about 7 years ago

-- Removed as not really related to issue --

#14

Updated by Jaroslav Kysela about 7 years ago

Mark, use dvbcsa library instead ffdecsa (--enable-dvbcsa) for ARM.

#15

Updated by Mark Clarkstone about 7 years ago

Jaroslav Kysela wrote:

Mark, use dvbcsa library instead ffdecsa (--enable-dvbcsa) for ARM.

I just disabled it in this case, thought you might want to be aware of it (assumed it as a bug).

#16

Updated by cy clic about 7 years ago

cr c wrote:

I'll rerun.

Re-run started on HTS Tvheadend 4.3-363~ge1ad2342d~xenial on pi2
Restart at 19:41 - 30Aug17
Starts scans as usual, Memory settled at ~ 64-66MB (normal?)
*Things happened at ~14:15 31Aug17 (19 hours running
Memory ramped to from 68 MB to 779 MB over 30 minutes
That's ~ 23 MB per minute

Note:
Before 14:15 syslog seem normal - usual epggrab:
At 14:15 there is pass: and dvr: (epggrab: continue as before)*
I have the full status and syslog files if that helps
but below is snipit at crucial moments.

Jaroslav,
Is this helpful?
--- Cyclic


Memory Every 10 Minutes during the last hour
follows:+

Thu Aug 31 13:41:34 UTC 2017
Memory: 66.3M
Thu Aug 31 13:51:34 UTC 2017
Memory: 66.3M
Thu Aug 31 14:01:34 UTC 2017
Memory: 66.6M
Thu Aug 31 14:11:34 UTC 2017
Memory: 68.1M
Thu Aug 31 14:21:35 UTC 2017
Memory: 444.8M
Thu Aug 31 14:31:35 UTC 2017
Memory: 734.5M
Thu Aug 31 14:41:35 UTC 2017
Memory: 779.6M
Thu Aug 31 14:51:36 UTC 2017

Syslog for tvheadend for last 47 minutes which includes memory ramp and death:

Aug 31 14:01:00 h1 tvheadend1041: mpegts: 11425H in 28.2E - tuning on SAT>IP DVB-S Tuner #6 (x.x.x.x)
Aug 31 14:01:00 h1 tvheadend1041: subscription: 0447: "" subscribing to mux "11425H", weight: 100, adapter: "SAT>IP DVB-S Tuner #6 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 31 14:04:00 h1 tvheadend1041: mpegts: 10714.25H in 28.2E - tuning on SAT>IP DVB-S Tuner #7 (x.x.x.x)
Aug 31 14:04:00 h1 tvheadend1041: subscription: 044A: "epggrab" subscribing to mux "10714.25H", weight: 4, adapter: "SAT>IP DVB-S Tuner #7 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 31 14:04:00 h1 tvheadend1041: mpegts: 10773H in 28.2E - tuning on SAT>IP DVB-S Tuner #8 (x.x.x.x)
Aug 31 14:04:00 h1 tvheadend1041: subscription: 044B: "epggrab" subscribing to mux "10773H", weight: 4, adapter: "SAT>IP DVB-S Tuner #8 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 31 14:11:00 h1 tvheadend1041: subscription: 0447: "" unsubscribing
Aug 31 14:11:00 h1 tvheadend1041: mpegts: 11425H in 28.2E - tuning on SAT>IP DVB-S Tuner #6 (x.x.x.x)
Aug 31 14:11:00 h1 tvheadend1041: subscription: 0450: "epggrab" subscribing to mux "11425H", weight: 4, adapter: "SAT>IP DVB-S Tuner #6 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 31 14:14:10 h1 tvheadend1041: epggrab: UK: Freesat - data completion timeout for 10773H in 28.2E
Aug 31 14:14:10 h1 tvheadend1041: epggrab: EIT: DVB Grabber - data completion timeout for 10773H in 28.2E
Aug 31 14:14:10 h1 tvheadend1041: subscription: 044B: "epggrab" unsubscribing
Aug 31 14:14:10 h1 tvheadend1041: epggrab: UK: Freesat - data completion timeout for 10714.25H in 28.2E
Aug 31 14:14:10 h1 tvheadend1041: epggrab: EIT: DVB Grabber - data completion timeout for 10714.25H in 28.2E
Aug 31 14:14:10 h1 tvheadend1041: subscription: 044A: "epggrab" unsubscribing
Aug 31 14:14:11 h1 tvheadend1041: mpegts: 10803H in 28.2E - tuning on SAT>IP DVB-S Tuner #7 (x.x.x.x)
Aug 31 14:14:11 h1 tvheadend1041: subscription: 0454: "epggrab" subscribing to mux "10803H", weight: 4, adapter: "SAT>IP DVB-S Tuner #7 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 31 14:14:11 h1 tvheadend1041: mpegts: 10891H in 28.2E - tuning on SAT>IP DVB-S Tuner #8 (x.x.x.x)
Aug 31 14:14:11 h1 tvheadend1041: subscription: 0455: "epggrab" subscribing to mux "10891H", weight: 4, adapter: "SAT>IP DVB-S Tuner #8 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 31 14:14:30 h1 tvheadend1041: dvr: "Badge of Betrayal" on "Channel 5HD" recorder starting
Aug 31 14:14:30 h1 tvheadend1041: mpegts: 10964H in 28.2E - tuning on SAT>IP DVB-S Tuner #8 (x.x.x.x)
Aug 31 14:14:30 h1 tvheadend1041: subscription: 0455: "epggrab" unsubscribing
Aug 31 14:14:30 h1 tvheadend1041: subscription: 0457: "DVR: Badge of Betrayal" subscribing on channel "Channel 5HD", weight: 300, adapter: "SAT>IP DVB-S Tuner #8 (x.x.x.x)", network: "28.2E", mux: "10964H", provider: "BSkyB", service: "Channel 5HD", profile="pass"
Aug 31 14:15:00 h1 tvheadend1041: pass: /mnt/tttt/Video/hts/Badge of Betrayal-1.ts: Unable to change permissions -- Operation not permitted
Aug 31 14:15:00 h1 tvheadend1041: dvr: /mnt/tttt/Video/hts/Badge of Betrayal-1.ts from adapter: "SAT>IP DVB-S Tuner #8 (x.x.x.x)", network: "28.2E", mux: "10964H", provider: "BSkyB", service: "Channel 5HD"
Aug 31 14:15:00 h1 tvheadend1041: dvr: # type lang resolution aspect ratio sample rate channels
Aug 31 14:15:00 h1 tvheadend1041: dvr: 1 H264 1920x1080 ?
Aug 31 14:15:00 h1 tvheadend1041: dvr: 2 MPEG2AUDIO nar ? ?
Aug 31 14:15:00 h1 tvheadend1041: dvr: 3 DVBSUB eng
Aug 31 14:15:00 h1 tvheadend1041: dvr: 4 TELETEXT
Aug 31 14:15:00 h1 tvheadend1041: dvr: 5 AC3 eng ? ?
Aug 31 14:15:00 h1 tvheadend1041: dvr: 6 TEXTSUB eng
Aug 31 14:21:10 h1 tvheadend1041: epggrab: UK: Freesat - data completion timeout for 11425H in 28.2E
Aug 31 14:21:10 h1 tvheadend1041: epggrab: EIT: DVB Grabber - data completion timeout for 11425H in 28.2E
Aug 31 14:21:10 h1 tvheadend1041: subscription: 0450: "epggrab" unsubscribing
Aug 31 14:21:11 h1 tvheadend1041: mpegts: 10891H in 28.2E - tuning on SAT>IP DVB-S Tuner #6 (x.x.x.x)
Aug 31 14:21:11 h1 tvheadend1041: subscription: 0462: "epggrab" subscribing to mux "10891H", weight: 4, adapter: "SAT>IP DVB-S Tuner #6 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 31 14:24:21 h1 tvheadend1041: epggrab: EIT: DVB Grabber - data completion timeout for 10803H in 28.2E
Aug 31 14:24:21 h1 tvheadend1041: epggrab: UK: Freesat - data completion timeout for 10803H in 28.2E
Aug 31 14:24:21 h1 tvheadend1041: subscription: 0454: "epggrab" unsubscribing
Aug 31 14:24:22 h1 tvheadend1041: mpegts: 11024H in 28.2E - tuning on SAT>IP DVB-S Tuner #7 (x.x.x.x)
Aug 31 14:24:22 h1 tvheadend1041: subscription: 0466: "epggrab" subscribing to mux "11024H", weight: 4, adapter: "SAT>IP DVB-S Tuner #7 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 31 14:24:40 h1 tvheadend1041: epggrab: UK: Freesat - data completion timeout for 10964H in 28.2E
Aug 31 14:24:40 h1 tvheadend1041: epggrab: EIT: DVB Grabber - data completion timeout for 10964H in 28.2E
Aug 31 14:30:00 h1 tvheadend1041: subscription: 041F: "DVR: The Adventures of Tintin: Secret of..." unsubscribing from "BBC Two HD"
Aug 31 14:30:00 h1 tvheadend1041: dvr: "The Adventures of Tintin: Secret of..." on "BBC Two HD": End of program: Time missed
Aug 31 14:31:21 h1 tvheadend1041: epggrab: UK: Freesat - data completion timeout for 10891H in 28.2E
Aug 31 14:31:21 h1 tvheadend1041: epggrab: EIT: DVB Grabber - data completion timeout for 10891H in 28.2E
Aug 31 14:31:21 h1 tvheadend1041: subscription: 0462: "epggrab" unsubscribing
Aug 31 14:31:22 h1 tvheadend1041: mpegts: 11052.75H in 28.2E - tuning on SAT>IP DVB-S Tuner #6 (x.x.x.x)
Aug 31 14:31:22 h1 tvheadend1041: subscription: 0472: "epggrab" subscribing to mux "11052.75H", weight: 4, adapter: "SAT>IP DVB-S Tuner #6 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 31 14:34:32 h1 tvheadend1041: epggrab: EIT: DVB Grabber - data completion timeout for 11024H in 28.2E
Aug 31 14:34:32 h1 tvheadend1041: epggrab: UK: Freesat - data completion timeout for 11024H in 28.2E
Aug 31 14:34:32 h1 tvheadend1041: subscription: 0466: "epggrab" unsubscribing
Aug 31 14:34:33 h1 tvheadend1041: mpegts: 11081H in 28.2E - tuning on SAT>IP DVB-S Tuner #7 (x.x.x.x)
Aug 31 14:34:33 h1 tvheadend1041: subscription: 0476: "epggrab" subscribing to mux "11081H", weight: 4, adapter: "SAT>IP DVB-S Tuner #7 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 31 14:41:33 h1 tvheadend1041: epggrab: UK: Freesat - data completion timeout for 11052.75H in 28.2E
Aug 31 14:41:33 h1 tvheadend1041: epggrab: EIT: DVB Grabber - data completion timeout for 11052.75H in 28.2E
Aug 31 14:41:33 h1 tvheadend1041: subscription: 0472: "epggrab" unsubscribing
Aug 31 14:41:33 h1 tvheadend1041: mpegts: 11141.25H in 28.2E - tuning on SAT>IP DVB-S Tuner #6 (x.x.x.x)
Aug 31 14:41:33 h1 tvheadend1041: subscription: 047B: "epggrab" subscribing to mux "11141.25H", weight: 4, adapter: "SAT>IP DVB-S Tuner #6 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 31 14:44:43 h1 tvheadend1041: epggrab: UK: Freesat - data completion timeout for 11081H in 28.2E
Aug 31 14:44:43 h1 tvheadend1041: epggrab: EIT: DVB Grabber - data completion timeout for 11081H in 28.2E
Aug 31 14:44:43 h1 tvheadend1041: subscription: 0476: "epggrab" unsubscribing
Aug 31 14:44:44 h1 tvheadend1041: mpegts: 11170.75H in 28.2E - tuning on SAT>IP DVB-S Tuner #7 (x.x.x.x)
Aug 31 14:44:44 h1 tvheadend1041: subscription: 047F: "epggrab" subscribing to mux "11170.75H", weight: 4, adapter: "SAT>IP DVB-S Tuner #7 (x.x.x.x)", network: "28.2E", service: "Raw PID Subscription"
Aug 31 14:47:41 h1 kernel: [2271822.332175] [ 1041] 112 1041 74301 17147 129 0 0 0 tvheadend
Aug 31 14:47:41 h1 kernel: [2271822.332217] Out of memory: Kill process 1041 (tvheadend) score 73 or sacrifice child
Aug 31 14:47:41 h1 kernel: [2271822.355864] Killed process 1041 (tvheadend) total-vm:297204kB, anon-rss:66452kB, file-rss:2724kB

#17

Updated by Mark Clarkstone about 7 years ago

cr c wrote:

... snip ...

Ah I didn't notice you were using a SAT-IP server. I have a feeling there is some kind of backlog going on. Try disabling OTA epg grabbing on all but one satip tuner, see if that helps.

#18

Updated by cy clic about 7 years ago

_Mark Clarkstone wrote:

cr c wrote:

... snip ...

Ah I didn't notice you were using a SAT-IP server. I have a feeling there is some kind of backlog going on. Try disabling OTA epg grabbing on all but one satip tuner, see if that helps._

Mark,
I don't see any correlation evidence that it is connected with SatIP Tuners or epg grabbing.

I looked elsewhere and my best bet conclusion is as below.

1. This Crash scenario is always proceeded by a "recording" which has just "completed".
Typical example:
tvheadend24528: subscription: 0269: "DVR: Happy Feet" subscribing on channel "ITV HD", weight: 300, adapter: "SAT>IP DVB-S Tuner #5 (xx.xx.xx.xx)", network: "28.2E", mux: "11052.75H", provider: "BSkyB", service: "ITV HD", profile="pass"

2. On completion, "pass:" attempts to modify file permissions which fails.
Typical example:
tvheadend24528: pass: /mnt/zzzz/Video/hts/Happy Feet-1.ts: Unable to change permissions -- Operation not permitted

3. Followed by dvr: activity.
Typical Example:
tvheadend24528: dvr: /mnt/t2-ChanAll/Video/hts/Happy Feet-1.ts from adapter: "SAT>IP DVB-S Tuner #5 (192.168.0.50)", network: "28.2E", mux: "11052.75H", provider: "BSkyB", service: "ITV HD"
tvheadend24528: dvr: # type lang resolution aspect ratio sample rate channels
tvheadend24528: dvr: 2 MPEG2AUDIO nar ? ?
tvheadend24528: dvr: 3 DVBSUB eng
tvheadend24528: dvr: 4 TELETEXT
tvheadend24528: dvr: 5 AC3 eng ? ?
tvheadend24528: dvr: 6 TEXTSUB eng

4. After this pass/dvr action a significant memory leak starts at ~20MB /min.
Within typically 10 minutes the memory leak triggers the kernel killing.
Typical example:
Sep 3 12:08:09 h1 kernel: [2521450.666478] [24528] 112 24528 71922 18167 124 0 0 0 tvheadend
Sep 3 12:08:09 h1 kernel: [2521450.666500] Out of memory: Kill process 24528 (tvheadend) score 77 or sacrifice child
Sep 3 12:08:09 h1 kernel: [2521450.686111] Killed process 24528 (tvheadend) total-vm:287688kB, anon-rss:65608kB, file-rss:7060kB

Conclusion
I think that the code for changing file permission as setup under "Recording file options" does not handle a failure to change the permission condition "gracefully" (such as when recording to a mounted NAS e.g. a Terrastation supporting smb but not the changing of permissions, as mounted.
Instead the code enters a deadly loop which takes up memory at 20MB/min until killed by the kernel.

If this analysis is correct and a patch made I would be happy to test.
cyclic

#19

Updated by Jaroslav Kysela about 7 years ago

There is not other bottleneck in the code when the permission error is printed. I guess that the whole problem is that your network mount is really slow and tvh is not able to write data in time so they're stacked in the main memory. Could you do some i/o benchmark ?

#20

Updated by Mark Clarkstone about 7 years ago

Jaroslav Kysela wrote:

There is not other bottleneck in the code when the permission error is printed. I guess that the whole problem is that your network mount is really slow and tvh is not able to write data in time so they're stacked in the main memory. Could you do some i/o benchmark ?

I had similar thinking but more along the lines of the RPi's one usb bus getting overloaded & tvh ending up with a backlog.

#21

Updated by cy clic about 7 years ago

_Jaroslav Kysela wrote:

Could you do some i/o benchmark ?_

Test Config
pi2 with hts running and 2 NAS devices both mounted.
fstab mounts are identical except share and mount point.
Net is 1G but of course pi2 is only 100M !!
Tuners are SAT>IP - 4 tuners are allocated to the pi2 hts
Kodi client is also connected
also PC with web portal to hts
also PC with ssh to hts
This is all on a live network with other traffic.

Simple I/O:
NAS1 The Terastation (which will not allow permission change) - raid5 4 x sata 300
NAS2 6yr old Atom Ion dual core with zfs mirror 2 x sata 300 using samba
test was "time cp" 1GB from the pi2 USB to NAS.
NAS 1: typ 1m52sec, NAS 2: typ 1m34sec - So not too dissimilar

HTS I/O test is to test multiple concurrent HD recordings:

Live Recording - hts on pi2 -> NAS1
Memory usage is stable until after attempt to change file permissions at first recording.
In this configuration memory reached 600MB after 40 minutes.
This is slightly different from the status/syslog from the live configuration.

Live Recording - hts on pi2 -> NAS2
3 concurrent HD recording - Memory stays at 66MB - No crash
A 4th recording starts does memory increasing

Do you have any stats on the net benefit of pi3 over pi2 to tvheadend?

#22

Updated by Jaroslav Kysela about 7 years ago

Could you use 'cifsiostat -m 1' command on rpi to watch the I/O throughput when recordings is running?

#23

Updated by cy clic about 7 years ago

_Jaroslav Kysela wrote:

Could you use 'cifsiostat -m 1' command on rpi to watch the I/O throughput when recordings is running?_

Sorry for the delay.

I see 2 indicators which I recognise to show that tvheadend has failed.
1/ The service tvheadend status - displays "exited"
2/ The 9981 web interface ceases to respond on pc or client

I have been using 2 nas servers:
a) NAS1: Terastation which rejects file mode change and fails in hours
b) NAS2: UbuntuServer+Samba which accepts file mode change and fails in days
I am now using NAS2 for tests.

It can record 3 HD programmes at the same time and they do not fail.
However when a programme comes to the end there is a "failure".
The others continue for a while (buffers?) afterwards.
(I inspect size and date/time stamp on the NAS files)

If tvheadend is restarted, the unfinished recording are resurrected.
Sometimes on end of first file, sometimes with -n on end of file name.
This seems like a (separate) bug??

So if I restart the service (rather than reboot)
then one can end up with 3 or 4 files of the same programme.

Attached is text file with "free -m" & "cifsiostat -m" every 10 seconds.
Memory used seems to increase as expected but when all unused
memory is allocated to buffers, tvheadeend does not automatically crash.
However if a recording finishes at this point it does fail.
I am not sure what cisfiostat is saying?

On the attached file at the beginning 2 old recordings are re continued.
@Count =1
Then 2 more HD are set recording.
@Count=~24
So 4 HD are then recording.
Shortly before the end of this file a recording "Grand Prix" finishes,
Count=~41 -~42
Than Tvheadend crashes and memory recovers to tickover.

Not sure how this relates to the code...
Version is HTS Tvheadend 4.3-363~ge1ad2342d~xenial
H/w is pi2 on unbuntu server 16.04.03
Cyclic

#24

Updated by cy clic about 7 years ago

cr c wrote:

_Jaroslav Kysela wrote:

Could you use 'cifsiostat -m 1' command on rpi to watch the I/O throughput when recordings is running?_

Jaroslav,
I think that I have found something significant.
(Still testing so thoughts below not conclusive)

I have been watching memory usage while trying lots of scenario for abnormalities.
Using free /-m was showing "used" as changing somewhat (~100MB +/-) but not silly.
On recording cache/buffers would ramp up, return (jump) to lower value repeatedly
The higher this figure, the more likely it is to fail.

I looked at the swap and reconfigured it all from scratch - partition on usb.
Since then it has not failed.

The combine figure for cache and buffer can I think be "separated" out by accessing the nas share on another machine as a mounted share and displayed say by dolphin.
A refresh on that dolphin always resets the combined figure to a lower value which I assume is the buffer usage (on the assumption that the refresh makes the cache component to be out of date). Is this correct?

That lower figure I assumed to be the buffer excluding cache and is typically around 55MB - 120MB depending on activity including recordings.

Does this make sense?
Should tvheadend on pi require swap to be configured or should the cache allocation be managed without swap?
cyclic

#25

Updated by Jaroslav Kysela about 7 years ago

The "disk" cache is generally managed in kernel, so you might tune the kernel parameters to "flush" data in short intervals - look for 'vm.dirty_expire_centisecs' sysctl parameter (and probably all 'vm.dirty*' variables). Also, I often tune 'vm.swappiness=10' to avoid using swap.

#26

Updated by marco raap about 7 years ago

Same Problem here.

it uses while recording all the available memory until it all crashed

#27

Updated by marco raap about 7 years ago

It seems like the problem lies in the fstab config.

remove the sync option on mounted devices. that keeps the memory free.
But still testing

#28

Updated by cy clic about 7 years ago

_Jaroslav Kysela wrote:

The "disk" cache is generally managed in kernel, so you might tune the kernel parameters to "flush" data in short intervals - look for 'vm.dirty_expire_centisecs' sysctl parameter (and probably all 'vm.dirty*' variables). Also, I often tune 'vm.swappiness=10' to avoid using swap._

Thanks for your steer. I have extended tested various configurations playing with sysctl settings.
For others, the solution I have on rpi2 tested on 4.2.3-113 is -

Setting vm.dirty_expire_centisecs to 300 seems to flush well enough.
I finally left vm.swappiness at 1 which does do occasional swapping using up to 20MB or so swap.
The result is no crash.

Finally I added :
vm.dirty_expire_centisecs = 300
vm.swappiness = 1
to the tail of /etc/sysctl.conf to keep on reboot.

I am all for things working "out of the box" especially for users new to tvh who might otherwise give up, and I wonder is it not possible to effect the recording without triggering the nugatory caching which seems to be the base issue?

Also available in: Atom PDF