Bug #4607
TVHeadend crash (out of memory)
0%
Description
I have a problem with some of my TVHeadend VMs. They regularly crash with an out-of-memory message.
I have 1 ESXi host with the following config:
1. Linux Debian 9 + DVBSky S960 + TVheadend 4.3-330~g1c6c91508
2. Linux Debian 9 + DVBSky S960 + TVheadend 4.3-295~ga9dffb7e9
3. Linux Debian 9 + Terratec H7 + TVHeadend 4.3-322~g096079129
4. Linux Debian 9 + Terratec H7 + TVheadend 4.3-358~g99d191a46
5. Linux Debian 8 + TBS Qbox 5980 + TVHeadend 4.1-2449~gda58349
+ a few other VMs that have nothing to do with TVHeadend
Furthermore, I also have a RPi + DVBSky S960 + Tvheadend 4.3-117~gddf420b
This config has been in use for a few years, and never caused any real issues.
My problems started a few days after I upgraded from Debian 8 to Debian 9.
I was unable to get the drivers of the TBS working, so that VM (#5) stayed on Debian 8. It has not been affected by this problem.
After the upgrade to Debian 9, I recompiled the drivers for the DVB hardware and TVHeadend itself. All VMs (except #5) were at 4.3-295 then.
3 of the 4 other VMs have crashed multiple times since the upgrade to Debian 9. I don't think VM #2 ever crashed, even though it has the same hardware and config as #1.
After the crash I upgraded and recompiled TVheadend, hence all the different versions I have now.
#1 seems to be most affected. It crashed at least 3 times yesterday.
This is a part from the messages log from yesterday
Sep 17 09:38:03 caladan kernel: [28719.338548] ntpd invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0 Sep 17 09:38:03 caladan kernel: [28719.338549] ntpd cpuset=/ mems_allowed=0 Sep 17 09:38:03 caladan kernel: [28719.338559] CPU: 0 PID: 514 Comm: ntpd Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3 Sep 17 09:38:03 caladan kernel: [28719.338560] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015 Sep 17 09:38:03 caladan kernel: [28719.338561] 0000000000000000 ffffffff8b928574 ffff9e66c0bdfc38 ffff8bffda5d3080 Sep 17 09:38:03 caladan kernel: [28719.338563] ffffffff8b7fe050 0000000000000000 0000000000000000 0000000c024213ca Sep 17 09:38:03 caladan kernel: [28719.338564] ffffffff8b7844e7 00000042df117040 0000000000000000 7e26a2f7f02060be Sep 17 09:38:03 caladan kernel: [28719.338565] Call Trace: Sep 17 09:38:03 caladan kernel: [28719.338593] [<ffffffff8b928574>] ? dump_stack+0x5c/0x78 Sep 17 09:38:03 caladan kernel: [28719.338600] [<ffffffff8b7fe050>] ? dump_header+0x78/0x1fd Sep 17 09:38:03 caladan kernel: [28719.338606] [<ffffffff8b7844e7>] ? get_page_from_freelist+0x3f7/0xb40 Sep 17 09:38:03 caladan kernel: [28719.338610] [<ffffffff8b78047a>] ? oom_kill_process+0x21a/0x3e0 Sep 17 09:38:03 caladan kernel: [28719.338611] [<ffffffff8b7800fd>] ? oom_badness+0xed/0x170 Sep 17 09:38:03 caladan kernel: [28719.338612] [<ffffffff8b780911>] ? out_of_memory+0x111/0x470 Sep 17 09:38:03 caladan kernel: [28719.338613] [<ffffffff8b785b4f>] ? __alloc_pages_slowpath+0xb7f/0xbc0 Sep 17 09:38:03 caladan kernel: [28719.338614] [<ffffffff8b785d8e>] ? __alloc_pages_nodemask+0x1fe/0x260 Sep 17 09:38:03 caladan kernel: [28719.338619] [<ffffffff8b7d5ec1>] ? alloc_pages_current+0x91/0x140 Sep 17 09:38:03 caladan kernel: [28719.338621] [<ffffffff8b77e940>] ? filemap_fault+0x300/0x5a0 Sep 17 09:38:03 caladan kernel: [28719.338635] [<ffffffffc05f9791>] ? ext4_filemap_fault+0x31/0x50 [ext4] Sep 17 09:38:03 caladan kernel: [28719.338638] [<ffffffff8b7aeef1>] ? __do_fault+0x81/0x170 Sep 17 09:38:03 caladan kernel: [28719.338640] [<ffffffff8b7b3671>] ? handle_mm_fault+0xdd1/0x1350 Sep 17 09:38:03 caladan kernel: [28719.338647] [<ffffffff8b65fd84>] ? __do_page_fault+0x2a4/0x510 Sep 17 09:38:03 caladan kernel: [28719.338656] [<ffffffff8bc07658>] ? page_fault+0x28/0x30 Sep 17 09:38:03 caladan kernel: [28719.338657] Mem-Info: Sep 17 09:38:03 caladan kernel: [28719.338661] active_anon:55502 inactive_anon:55559 isolated_anon:0 Sep 17 09:38:03 caladan kernel: [28719.338661] active_file:203 inactive_file:227 isolated_file:0 Sep 17 09:38:03 caladan kernel: [28719.338661] unevictable:0 dirty:0 writeback:0 unstable:2 Sep 17 09:38:03 caladan kernel: [28719.338661] slab_reclaimable:2182 slab_unreclaimable:3214 Sep 17 09:38:03 caladan kernel: [28719.338661] mapped:265 shmem:75 pagetables:1076 bounce:0 Sep 17 09:38:03 caladan kernel: [28719.338661] free:1116 free_pcp:70 free_cma:0 Sep 17 09:38:03 caladan kernel: [28719.338664] Node 0 active_anon:222008kB inactive_anon:222236kB active_file:812kB inactive_file:908kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1060kB dirty:0kB writeback:0kB shmem:300kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:8kB pages_scanned:13301 all_unreclaimable? yes Sep 17 09:38:03 caladan kernel: [28719.338665] Node 0 DMA free:1876kB min:88kB low:108kB high:128kB active_anon:6328kB inactive_anon:7100kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:24kB slab_unreclaimable:12kB kernel_stack:16kB pagetables:84kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB Sep 17 09:38:03 caladan kernel: [28719.338667] lowmem_reserve[]: 0 455 455 455 455 Sep 17 09:38:03 caladan kernel: [28719.338669] Node 0 DMA32 free:2588kB min:2684kB low:3352kB high:4020kB active_anon:215680kB inactive_anon:215136kB active_file:812kB inactive_file:908kB unevictable:0kB writepending:0kB present:507840kB managed:488500kB mlocked:0kB slab_reclaimable:8704kB slab_unreclaimable:12844kB kernel_stack:1632kB pagetables:4220kB bounce:0kB free_pcp:280kB local_pcp:280kB free_cma:0kB Sep 17 09:38:03 caladan kernel: [28719.338671] lowmem_reserve[]: 0 0 0 0 0 Sep 17 09:38:03 caladan kernel: [28719.338672] Node 0 DMA: 27*4kB (UME) 17*8kB (UME) 16*16kB (U) 3*32kB (UME) 12*64kB (UME) 4*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1876kB Sep 17 09:38:03 caladan kernel: [28719.338687] Node 0 DMA32: 647*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2588kB Sep 17 09:38:03 caladan kernel: [28719.338690] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Sep 17 09:38:03 caladan kernel: [28719.338691] 1113 total pagecache pages Sep 17 09:38:03 caladan kernel: [28719.338691] 608 pages in swap cache Sep 17 09:38:03 caladan kernel: [28719.338692] Swap cache stats: add 198871, delete 198263, find 7938/13916 Sep 17 09:38:03 caladan kernel: [28719.338692] Free swap = 0kB Sep 17 09:38:03 caladan kernel: [28719.338693] Total swap = 731132kB Sep 17 09:38:03 caladan kernel: [28719.338693] 130958 pages RAM Sep 17 09:38:03 caladan kernel: [28719.338693] 0 pages HighMem/MovableOnly Sep 17 09:38:03 caladan kernel: [28719.338693] 4856 pages reserved Sep 17 09:38:03 caladan kernel: [28719.338694] 0 pages hwpoisoned Sep 17 09:38:03 caladan kernel: [28719.338694] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name Sep 17 09:38:03 caladan kernel: [28719.338697] [ 196] 0 196 13890 114 24 3 92 0 systemd-journal Sep 17 09:38:03 caladan kernel: [28719.338698] [ 199] 0 199 33443 84 56 3 261 0 vmtoolsd Sep 17 09:38:03 caladan kernel: [28719.338699] [ 240] 0 240 11669 13 22 3 469 -1000 systemd-udevd Sep 17 09:38:03 caladan kernel: [28719.338699] [ 255] 0 255 12468 0 28 3 115 0 rpcbind Sep 17 09:38:03 caladan kernel: [28719.338700] [ 311] 0 311 38372 1 65 3 865 0 VGAuthService Sep 17 09:38:03 caladan kernel: [28719.338701] [ 312] 0 312 1054 0 8 3 36 0 acpid Sep 17 09:38:03 caladan kernel: [28719.338702] [ 313] 0 313 62529 47 30 3 160 0 rsyslogd Sep 17 09:38:03 caladan kernel: [28719.338703] [ 314] 106 314 11283 23 26 3 93 -900 dbus-daemon Sep 17 09:38:03 caladan kernel: [28719.338704] [ 319] 0 319 7409 17 18 3 45 0 cron Sep 17 09:38:03 caladan kernel: [28719.338705] [ 321] 0 321 9495 30 22 3 100 0 systemd-logind Sep 17 09:38:03 caladan kernel: [28719.338706] [ 456] 0 456 17486 0 37 3 194 -1000 sshd Sep 17 09:38:03 caladan kernel: [28719.338707] [ 475] 108 475 8858 0 22 3 202 0 rpc.statd Sep 17 09:38:03 caladan kernel: [28719.338708] [ 502] 0 502 3634 0 12 3 40 0 agetty Sep 17 09:38:03 caladan kernel: [28719.338709] [ 512] 107 512 444318 109652 654 5 178829 0 tvheadend Sep 17 09:38:03 caladan kernel: [28719.338709] [ 514] 105 514 24463 33 21 3 99 0 ntpd Sep 17 09:38:03 caladan kernel: [28719.373963] oom_reaper: reaped process 512 (tvheadend), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kBA few more things I have tried beside upgrading/recompiling TVH:
- switch the DVB-hardware between #1 and #2
- switch the DVB-hardware between #1 and the RPi
- move everything (VMs + DVB-hardware) to a new ESXi-host
- memory test on the old ESXi-host => no problems found
- harddrive test on the old ESXi-host => no problems found
VM#1 and VM#2 have been assigned 512MB of RAM, VM#3 and VM#4 have been assigned 1GB of RAM.
While 512MB may be a bit marginal (is it?), the problem also occurs on VM#3 and #4 (but not on #2 - as of today)
The ESXi-host is not oversubscribed, it has 64GB of RAM
I am not sure if this is a problem with TVHeadend itself.
As far as I'm concerned it can also be something to do with the Kernel. Debian 9 uses 4.9, Debian 8 is still at 3.16
Over at the LibreElec forum there is a topic about issues with DVB-adapters and Kernel 4.9
This one: https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/
These are the current values of the top command:
VM #1
top - 20:46:25 up 3:28, 1 user, load average: 0.11, 0.11, 0.09 Tasks: 76 total, 2 running, 74 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.7 us, 2.1 sy, 0.0 ni, 93.1 id, 3.8 wa, 0.0 hi, 0.3 si, 0.0 st KiB Mem : 504408 total, 6748 free, 466604 used, 31056 buff/cache KiB Swap: 731132 total, 712908 free, 18224 used. 23576 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 512 hts 20 0 1048324 412448 768 S 4.7 81.8 5:23.21 tvheadend 931 root 20 0 0 0 0 S 0.3 0.0 0:01.37 kworker/0:1 934 root 20 0 44916 3396 2768 R 0.3 0.7 0:00.07 top 1 root 20 0 57020 1688 816 S 0.0 0.3 0:00.44 systemd 2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
VM #2
top - 20:47:18 up 73 days, 3:09, 1 user, load average: 0.00, 0.00, 0.00 Threads: 115 total, 1 running, 114 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem : 504408 total, 6692 free, 231556 used, 266160 buff/cache KiB Swap: 731132 total, 687656 free, 43476 used. 254768 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 540 hts 20 0 854320 169684 0 S 0.3 33.6 185:53.61 tvheadend 1 root 20 0 57148 4544 2932 S 0.0 0.9 0:21.13 systemd 2 root 20 0 0 0 0 S 0.0 0.0 0:01.06 kthreadd 3 root 20 0 0 0 0 S 0.0 0.0 0:31.59 ksoftirqd/0 5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
VM#3
top - 20:52:35 up 1 day, 19:09, 1 user, load average: 0.02, 0.07, 0.08 Tasks: 98 total, 1 running, 97 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem : 1020488 total, 313844 free, 355928 used, 350716 buff/cache KiB Swap: 731132 total, 731132 free, 0 used. 512592 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 137 root 20 0 0 0 0 S 0.3 0.0 0:36.75 kworker/0:3 377 root 20 0 0 0 0 S 0.3 0.0 7:17.52 kdvb-ca-0:0 541 hts 20 0 915392 304216 8840 S 0.3 29.8 12:37.94 tvheadend 4613 root 20 0 44916 3804 3148 R 0.3 0.4 0:00.01 top 1 root 20 0 57028 6776 5224 S 0.0 0.7 0:00.81 systemd 2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
VM #4
top - 20:54:23 up 25 days, 1:31, 1 user, load average: 0.11, 0.09, 0.09 Tasks: 90 total, 1 running, 89 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.0 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st KiB Mem : 1020488 total, 74744 free, 86772 used, 858972 buff/cache KiB Swap: 731132 total, 731132 free, 0 used. 777576 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 574 hts 20 0 718312 30400 8672 S 0.7 3.0 317:54.87 tvheadend 414 root 20 0 0 0 0 S 0.3 0.0 102:14.88 kdvb-ca-0:0 18768 root 20 0 0 0 0 S 0.3 0.0 0:00.13 kworker/u2:2 1 root 20 0 57020 6836 5284 S 0.0 0.7 0:06.67 systemd 2 root 20 0 0 0 0 S 0.0 0.0 0:00.30 kthreadd
VM #5
top - 20:49:09 up 41 days, 8:10, 1 user, load average: 0.11, 0.11, 0.13 Tasks: 78 total, 1 running, 77 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.4 us, 0.4 sy, 0.0 ni, 94.4 id, 4.9 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 506280 total, 499980 used, 6300 free, 41928 buffers KiB Swap: 731132 total, 14044 used, 717088 free. 352520 cached Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 478 tvheade+ 20 0 873196 32900 1600 S 3.3 6.5 352:57.67 tvheadend 1 root 20 0 28296 2748 1876 S 0.0 0.5 0:25.99 systemd 2 root 20 0 0 0 0 S 0.0 0.0 0:00.54 kthreadd 3 root 20 0 0 0 0 S 0.0 0.0 78:07.46 ksoftirqd/0 5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
RPi
top - 20:50:47 up 1 day, 1:36, 1 user, load average: 0.00, 0.00, 0.00 Tasks: 114 total, 1 running, 113 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.2 us, 0.0 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 933352 total, 832384 used, 100968 free, 93328 buffers KiB Swap: 102396 total, 0 used, 102396 free. 673768 cached Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10218 pi 20 0 5240 2592 2148 R 1.3 0.3 0:00.16 top 6 root 20 0 0 0 0 S 0.3 0.0 0:00.68 kworker/u8:0 757 tvheade+ 20 0 317648 25748 6760 S 0.3 2.8 59:24.99 tvheadend 10205 root 20 0 0 0 0 S 0.3 0.0 0:00.04 kworker/0:2 1 root 20 0 5400 3916 2788 S 0.0 0.4 0:09.07 systemd 2 root 20 0 0 0 0 S 0.0 0.0 0:00.12 kthreadd
The uptime is an indication of the last crash on each VM.
The RPi's last reboot had nothing to do with TVheadend or a crash and can be ignored.
Files
History
Updated by Gert Jansen over 7 years ago
It just happened again:
[110148.675391] tvh:tcp-start invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0 [110148.675393] tvh:tcp-start cpuset=/ mems_allowed=0 [110148.675401] CPU: 0 PID: 1452 Comm: tvh:tcp-start Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3 [110148.675402] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015 [110148.675405] 0000000000000000 ffffffff9a928574 ffffb8aa42173c38 ffff9bbb9f41a040 [110148.675407] ffffffff9a7fe050 0000000000000000 0000000000000000 0000000c024213ca [110148.675408] ffffffff9a7844e7 000000429c9ad000 0000000000000000 f5688d9f446e9a94 [110148.675409] Call Trace: [110148.675437] [<ffffffff9a928574>] ? dump_stack+0x5c/0x78 [110148.675444] [<ffffffff9a7fe050>] ? dump_header+0x78/0x1fd [110148.675450] [<ffffffff9a7844e7>] ? get_page_from_freelist+0x3f7/0xb40 [110148.675454] [<ffffffff9a78047a>] ? oom_kill_process+0x21a/0x3e0 [110148.675455] [<ffffffff9a7800fd>] ? oom_badness+0xed/0x170 [110148.675456] [<ffffffff9a780911>] ? out_of_memory+0x111/0x470 [110148.675457] [<ffffffff9a785b4f>] ? __alloc_pages_slowpath+0xb7f/0xbc0 [110148.675458] [<ffffffff9a785d8e>] ? __alloc_pages_nodemask+0x1fe/0x260 [110148.675463] [<ffffffff9a7d5ec1>] ? alloc_pages_current+0x91/0x140 [110148.675464] [<ffffffff9a77e940>] ? filemap_fault+0x300/0x5a0 [110148.675511] [<ffffffffc02ee791>] ? ext4_filemap_fault+0x31/0x50 [ext4] [110148.675514] [<ffffffff9a7aeef1>] ? __do_fault+0x81/0x170 [110148.675516] [<ffffffff9a7b3671>] ? handle_mm_fault+0xdd1/0x1350 [110148.675524] [<ffffffff9a65fd84>] ? __do_page_fault+0x2a4/0x510 [110148.675531] [<ffffffff9ac07658>] ? page_fault+0x28/0x30 [110148.675558] Mem-Info: [110148.675564] active_anon:55464 inactive_anon:55480 isolated_anon:0 active_file:153 inactive_file:189 isolated_file:0 unevictable:0 dirty:0 writeback:2 unstable:9 slab_reclaimable:2203 slab_unreclaimable:3226 mapped:160 shmem:258 pagetables:1099 bounce:0 free:1121 free_pcp:169 free_cma:0 [110148.675567] Node 0 active_anon:221856kB inactive_anon:221920kB active_file:612kB inactive_file:756kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:640kB dirty:0kB writeback:8kB shmem:1032kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:36kB pages_scanned:2993 all_unreclaimable? yes [110148.675568] Node 0 DMA free:1908kB min:88kB low:108kB high:128kB active_anon:7740kB inactive_anon:5376kB active_file:0kB inactive_file:12kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:20kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:80kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [110148.675570] lowmem_reserve[]: 0 455 455 455 455 [110148.675572] Node 0 DMA32 free:2576kB min:2684kB low:3352kB high:4020kB active_anon:214116kB inactive_anon:216544kB active_file:612kB inactive_file:744kB unevictable:0kB writepending:8kB present:507840kB managed:488500kB mlocked:0kB slab_reclaimable:8792kB slab_unreclaimable:12896kB kernel_stack:1680kB pagetables:4316kB bounce:0kB free_pcp:676kB local_pcp:676kB free_cma:0kB [110148.675574] lowmem_reserve[]: 0 0 0 0 0 [110148.675575] Node 0 DMA: 3*4kB (U) 41*8kB (UM) 30*16kB (UME) 6*32kB (UME) 6*64kB (UME) 2*128kB (ME) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1908kB [110148.675580] Node 0 DMA32: 644*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2576kB [110148.675583] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [110148.675584] 1863 total pagecache pages [110148.675584] 1254 pages in swap cache [110148.675585] Swap cache stats: add 225647, delete 224393, find 21067/37197 [110148.675585] Free swap = 0kB [110148.675585] Total swap = 731132kB [110148.675586] 130958 pages RAM [110148.675586] 0 pages HighMem/MovableOnly [110148.675586] 4856 pages reserved [110148.675586] 0 pages hwpoisoned [110148.675587] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name [110148.675590] [ 201] 0 201 33443 84 58 3 261 0 vmtoolsd [110148.675591] [ 204] 0 204 13731 68 22 3 84 0 systemd-journal [110148.675592] [ 240] 0 240 11973 11 25 3 757 -1000 systemd-udevd [110148.675593] [ 268] 0 268 12468 15 29 3 114 0 rpcbind [110148.675594] [ 317] 0 317 38372 1 65 3 866 0 VGAuthService [110148.675595] [ 319] 0 319 7409 18 19 3 44 0 cron [110148.675596] [ 320] 0 320 1054 0 8 3 37 0 acpid [110148.675597] [ 321] 106 321 11283 47 26 3 71 -900 dbus-daemon [110148.675598] [ 328] 0 328 11630 33 27 3 111 0 systemd-logind [110148.675598] [ 329] 0 329 62529 43 29 4 186 0 rsyslogd [110148.675599] [ 458] 0 458 17486 0 38 3 198 -1000 sshd [110148.675600] [ 475] 108 475 8858 0 22 3 201 0 rpc.statd [110148.675601] [ 500] 0 500 3634 0 12 3 38 0 agetty [110148.675602] [ 512] 107 512 442457 108792 667 4 178529 0 tvheadend [110148.675603] [ 514] 105 514 24463 65 22 4 99 0 ntpd [110148.675605] Out of memory: Kill process 512 (tvheadend) score 932 or sacrifice child [110148.675687] Killed process 512 (tvheadend) total-vm:1769828kB, anon-rss:435168kB, file-rss:0kB, shmem-rss:0kB [110148.706219] oom_reaper: reaped process 512 (tvheadend), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Is there anything I can do to gather more informtation? Any log file or error message you are interested in? Anything else I can do, or should not do?
The easiest thing I could is probably reinstall everything with Debian 8. That would maybe solve my problems, but if this is a bug then that won't help anyone.
Updated by Gert Jansen over 7 years ago
I am prepared to assist as much as I can with testing, log files etc...
However, I cannot afford to just sit and wait and have my TVHeadend crashing while no progress is being made.
If there is no time or no interest to invest time in this issue, please let me know and I'll revert back to Debian 8 and see how it goes from there.
Updated by Mark Clarkstone over 7 years ago
Gert Jansen wrote:
I am prepared to assist as much as I can with testing, log files etc...
However, I cannot afford to just sit and wait and have my TVHeadend crashing while no progress is being made.
I completely understand, if possible could you collect debugging information? See Debugging & Traces for info.
If there is no time or no interest to invest time in this issue, please let me know and I'll revert back to Debian 8 and see how it goes from there.
It's not that there is no interest, it's lack of man power, there is currently mostly only one person working on tvheadend, with a few others helping where possible.
Out of interest, have you tried the stable-4.2 release and/or the latest 4.3.488? I personally have been running 4.3.x for a while with very few issues on Debian 9 (but not in a VM). If you're able keep a snap shot of the failing VM & go back to a working one for now.
Updated by Gert Jansen over 7 years ago
Mark Clarkstone wrote:
Out of interest, have you tried the stable-4.2 release and/or the latest 4.3.488? I personally have been running 4.3.x for a while with very few issues on Debian 9 (but not in a VM). If you're able keep a snap shot of the failing VM & go back to a working one for now.
No, I haven't tried 4.3.488 yet.
As you can see, I have lots of different versions running at the moment, and after a while I gave up on the upgrading since that was not providing a solution.
For now, I have concentrated on VM#1.
I have stopped the TVH service, copied the tvheadend directory to my user profiles, and started tvheadend manually from there.
I have used the ulimit -c unlimited command, and started TVH with the ./build.linux/tvheadend -l /tmp/tvheadend.log --trace htsp,subscription,pat,pmt,bat command.
However, I got a message that pat, pmt, bat were unknown subsystem options (or something similar)
I get the same message when I try to add them to one of the other TVH machines.
2017-09-23 11:53:49.088 config: uknown subsystem 'pat' 2017-09-23 11:53:49.088 config: uknown subsystem 'pmt' 2017-09-23 11:53:49.088 config: uknown subsystem 'bat'
One thing I noticed is that TVH seems to be using a lot more memory than it did before.
Top show a %Mem of 33.6 for VM#1 (after roughly 2 hours), while that is only 6.5 on the VM#5 (the Debian 8 machine) and 2.8 on the RPi.
It was as high as 87.8 before I stopped it to enable the tracing/debugging things...
Updated by Gert Jansen over 7 years ago
Gert Jansen wrote:
No, I haven't tried 4.3.488 yet.
I can try 4.3.488 or the then-current (latest) version if/when this one crashes and we have some useful logs and trace files
Updated by Jaroslav Kysela over 7 years ago
Could you look to 'Configuration / Debugging / Memory information entries' and compare values after process start and after awhile ? (VM#1) - install debug version of tvh, if you don't have such tab in the Debugging.
Also, it may help to know, how the tvh is used (HTTP streaming only, DVR, etc.).
Updated by Gert Jansen over 7 years ago
- File tvh_01.PNG tvh_01.PNG added
Jaroslav Kysela wrote:
Could you look to 'Configuration / Debugging / Memory information entries' and compare values after process start and after awhile ? (VM#1) - install debug version of tvh, if you don't have such tab in the Debugging.
Do you mean like the attached file? (tvh_01.png)
This is the current situation, about 3 hours after I started TVH manually.
Since then, no recordings have been made.
Is there an easier way to get all the values apart from making a screenshot?
Shall I stop and restart TVH and create a new overview just after the start, and a new after half an hour?
Jaroslav Kysela wrote:
Also, it may help to know, how the tvh is used (HTTP streaming only, DVR, etc.).
I only use it to record programs. No streaming, no live watching.
Updated by Jaroslav Kysela over 7 years ago
Gert Jansen wrote:
Jaroslav Kysela wrote:
Could you look to 'Configuration / Debugging / Memory information entries' and compare values after process start and after awhile ? (VM#1) - install debug version of tvh, if you don't have such tab in the Debugging.
Do you mean like the attached file? (tvh_01.png)
This is the current situation, about 3 hours after I started TVH manually.
Since then, no recordings have been made.Is there an easier way to get all the values apart from making a screenshot?
wget -O a.txt http://localhost:9981/api/memoryinfo/grid
Shall I stop and restart TVH and create a new overview just after the start, and a new after half an hour?
Yes, probably.
Jaroslav Kysela wrote:
Also, it may help to know, how the tvh is used (HTTP streaming only, DVR, etc.).
I only use it to record programs. No streaming, no live watching.
OK. We had another report and the culprit was the slow disk I/O. TVH tries to buffer everything to memory in this case (I have to probably implement a buffer threshold to drop data to avoid such situations).
Updated by Gert Jansen over 7 years ago
Jaroslav Kysela wrote:
Gert Jansen wrote:
Shall I stop and restart TVH and create a new overview just after the start, and a new after half an hour?
Yes, probably.
OK, will do so in about 30 minutes.
Jaroslav Kysela wrote:
Also, it may help to know, how the tvh is used (HTTP streaming only, DVR, etc.).
I only use it to record programs. No streaming, no live watching.
OK. We had another report and the culprit was the slow disk I/O. TVH tries to buffer everything to memory in this case (I have to probably implement a buffer threshold to drop data to avoid such situations).
The disk the VMs record to is on the same ESXi host, so it is 'local' (no actual network connection needed) for all the VMs (not for the RPi).
I have a cronjob which moves the recordings from that disk to my NAS during the night.
Updated by Gert Jansen over 7 years ago
- File tvh_01_start.PNG tvh_01_start.PNG added
The file attached is from a few seconds after the start of TVH (tvh_01_start.PNG)
I got a 401 Unauthorized (Username/Password Authentication failed) when trying the wget command.
Updated by Gert Jansen over 7 years ago
- File tvh_01_h01.PNG tvh_01_h01.PNG added
And this is about an hour later
Updated by Jaroslav Kysela over 7 years ago
The contents is identical. RSS/RES (memory used by the tvh process) increases?
Updated by Gert Jansen over 7 years ago
Jaroslav Kysela wrote:
The contents is identical. RSS/RES (memory used by the tvh process) increases?
And it is still exactly the same at this moment.
The memory usage had increased, but just a bit.
I only know the %MEM, sorry. That was 31.3 when it started, and now it is:
top - 20:26:47 up 2 days, 2:15, 2 users, load average: 0.00, 0.00, 0.00 Threads: 111 total, 1 running, 110 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.0 us, 0.7 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem : 504408 total, 20068 free, 268068 used, 216272 buff/cache KiB Swap: 731132 total, 715156 free, 15976 used. 218364 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3169 gert 20 0 844692 227284 11332 S 0.3 45.1 0:11.88 tvheadend
Updated by Gert Jansen over 7 years ago
I seem not to be able to edit my post once it is submitted?
Anyhow, I have 2 recordings coming up soon. Both on the same channel.
I'll see what happens there.
Updated by Jaroslav Kysela over 7 years ago
The 'RES' column in top is RSS. You can obtain RSS also through ps like:
ps -aly | grep -E "(RSS|tvheadend)"
If tvh process is using over 200MB of memory, but the memory entries shows than only few MB are allocated, then something is really wrong and it might not be easy to determine what's going on in the tvh process.
Updated by Gert Jansen over 7 years ago
VM#1 - currently recording
ps -aly | grep -E "(RSS|tvheadend)" S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD S 1000 3169 576 2 80 0 236316 211173 - pts/0 00:07:16 tvheadend
VM#2 - currently recording
ps -aly -u hts | grep -E "(RSS|tvheadend)" S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD S 107 540 1 0 80 0 168620 213580 - ? 17:14:42 tvheadend
VM#3 - currently idle
ps -aly -u hts | grep -E "(RSS|tvheadend)" S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD S 107 553 1 0 80 0 380296 245332 - ? 00:45:43 tvheadend
VM#4 - currently idle
ps -aly -u hts | grep -E "(RSS|tvheadend)" S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD S 107 561 1 0 80 0 29172 179061 - ? 01:02:23 tvheadend
VM#5 - currently recording
ps -aly -u tvheadend | grep -E "(RSS|tvheadend)" S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD S 1001 478 1 0 80 0 31676 222397 - ? 06:39:01 tvheadend
RPi - currently idle
ps -aly -u tvheadend | grep -E "(RSS|tvheadend)" S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD S 1001 757 1 3 80 0 25524 79940 - ? 04:47:00 tvheadend
Updated by Jaroslav Kysela over 7 years ago
What is in 'Memory information entries' for VM#1 now ? (Don't forget to refresh the contents if you have already open the page in browser - the auto-refresh does not work).
Updated by Gert Jansen over 7 years ago
- File tvh_01_h07.PNG tvh_01_h07.PNG added
See attachment.
The first recording is finished, the second one is still busy.
ps -aly | grep -E "(RSS|tvheadend)" S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD S 1000 3169 576 3 80 0 249208 228830 - pts/0 00:10:31 tvheadend
Updated by Gert Jansen over 7 years ago
- File tvh_01_h27.PNG tvh_01_h27.PNG added
Not much happened in the last 24 hours. VM#1 still has not crashed (yet)
ps -aly | grep -E "(RSS|tvheadend)" S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD S 1000 3169 576 1 80 0 413048 261598 - pts/0 00:28:07 tvheadend
top - 22:28:41 up 3 days, 4:17, 3 users, load average: 0.18, 0.23, 0.25 Threads: 118 total, 2 running, 116 sleeping, 0 stopped, 0 zombie %Cpu(s): 2.5 us, 2.5 sy, 0.0 ni, 86.3 id, 8.5 wa, 0.0 hi, 0.4 si, 0.0 st KiB Mem : 504408 total, 6536 free, 471104 used, 26768 buff/cache KiB Swap: 731132 total, 699708 free, 31424 used. 17824 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4252 gert 20 0 1046392 415708 0 R 4.3 82.4 6:32.06 tvh:lnxdvb-+ 3208 gert 20 0 1046392 415708 0 S 1.7 82.4 3:40.56 tvh:mi-main 4297 root 20 0 0 0 0 S 1.0 0.0 0:05.40 kworker/0:3 4283 gert 20 0 1046392 415708 0 S 0.7 82.4 0:08.08 tvh:dvr 138 root 0 -20 0 0 0 S 0.3 0.0 1:22.86 kworker/0:1H
Debugging - Memory Infornation Entries attached.
Updated by Jaroslav Kysela over 7 years ago
Which streaming profile do you use for DVR? (matroska? pass?)
Updated by Gert Jansen over 7 years ago
Jaroslav Kysela wrote:
Which streaming profile do you use for DVR? (matroska? pass?)
I use pass for stream profile. I have set cache scheme to "Don't Keep" and adapted the pre- and post-recording padding times. Apart from that I think most values in my recording profile are still at the default values.
Updated by Gert Jansen over 7 years ago
Eventually, it (VM #1) did crash.
The last few messages in the session where I started TVHeadend from:
2017-09-25 23:38:54.034 [WARNING] TS: LNB2 = Diseqc B = Astra 3/12187H/Veronica/DisneyXD HD: H264 @ #517 Continuity counter error (total 2) 2017-09-25 23:42:45.361 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:42:55.373 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:43:05.360 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:43:15.359 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:43:25.383 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:43:35.377 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:43:45.374 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:43:55.389 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:44:05.398 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:44:15.404 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:44:25.405 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:44:35.409 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:44:45.419 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:44:55.461 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new 2017-09-25 23:45:05.557 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new Killed
At 23:38 it was recording a movie (start time 20:21:30, scheduled end time: 23:42:00). There was no other recording going on on that VM at the same time.
On VM#5, there was a recording from 22:41:30 until 23:52 and another one from 23:26:30 until 00:42:00. Both are absolutely fine. No errors, no warnings. Just perfect.
All other VMs and the RPi were idle.
I have not yet restarted the VM.
Is there any information I need to get from it before I can safely restart it?
Where can I find the logs/traces that are interesting?
Updated by Jaroslav Kysela over 7 years ago
You may try valgrind https://tvheadend.org/projects/tvheadend/wiki/Debugging#Memory-leaks-or-corruption to detect the memory leak. Note that it's slow, so I suggest to do a test recording (5 minutes or so) and then terminate tvh. Run tvh binary from command line (not in background).
Updated by Gert Jansen over 7 years ago
May try that later today. Can't do that while working, sadly.
I may try to sneak it in during a break later today.
Is there anything useful like logfiles or traces that I have to save before I reboot the VM or restart TVH ?
Updated by Gert Jansen over 7 years ago
- File tvh.log tvh.log added
- File valgrind_tvh.log valgrind_tvh.log added
- tvh.log = the last 250 lines of the log/trace/debugfile that ended with yesterday morning's crash
- valgrind_tvh.log = the file that was generated while doing the 5 minute recording
If needed, I can upload the full logfile instead of just the last 250 lines, but it's rather big: 61MB
Updated by Jaroslav Kysela over 7 years ago
The valgrind does not show any memory leaks which can explain the behaviour. Could you observe, when RES/RSS is being increased exactly? Try to split activities like DVR, HTTP access, EPG grab etc...
Updated by Gert Jansen about 7 years ago
Despite all the time that went on, I have been unable to exactly pinpoint why or when it goes wrong.
I have made a few changes in the mean time:- added a second ESXi server
- split the TVHeadend clients across the 2 hosts: each host has a client with a DVBSky and a Terratec adapter. The TBS adapter stayed where it was
- upgraded all TVHeadend clients to newer versions (except the TBS one)
- reinstalled a VM from scratch for one of the DVBSky clients.
The crashes seem to occur less frequently but have not gone away.
As far as I can tell, they only happen on the VMs with DVBSky adapters.
I have no idea if the problem is caused by- something in TVHeadend
- Debian 9 - or more precisely: something in kernel 4.9.x
- the drivers of the DVBSky adapters.
Previously, the DVBSky VMs were running Debian 8 (Kernel 3.16.x) and drivers compiled from the sources provided by DVBSky.
Nowadays the DVBSky is included in the kernel (since 3.18.x IIRC), and doesn't need a driver anymore.
Yesterday I have created 2 new VMs with Debian 8, and again drivers compiled from the sources provided by DVBSky.
I have built the latest TVHeadend (4.3-861~g715a4a8) for both of them.
I'll see how it goes from here.