Bug #4607

TVHeadend crash (out of memory)

Added by Gert Jansen about 7 years ago. Updated almost 7 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Crashes
Target version:
-
Start date:
2017-09-18
Due date:
% Done:

0%

Estimated time:
Found in version:
4.3-330~g1c6c91508
Affected Versions:

Description

I have a problem with some of my TVHeadend VMs. They regularly crash with an out-of-memory message.

I have 1 ESXi host with the following config:
1. Linux Debian 9 + DVBSky S960 + TVheadend 4.3-330~g1c6c91508
2. Linux Debian 9 + DVBSky S960 + TVheadend 4.3-295~ga9dffb7e9
3. Linux Debian 9 + Terratec H7 + TVHeadend 4.3-322~g096079129
4. Linux Debian 9 + Terratec H7 + TVheadend 4.3-358~g99d191a46
5. Linux Debian 8 + TBS Qbox 5980 + TVHeadend 4.1-2449~gda58349
+ a few other VMs that have nothing to do with TVHeadend

Furthermore, I also have a RPi + DVBSky S960 + Tvheadend 4.3-117~gddf420b

This config has been in use for a few years, and never caused any real issues.
My problems started a few days after I upgraded from Debian 8 to Debian 9.

I was unable to get the drivers of the TBS working, so that VM (#5) stayed on Debian 8. It has not been affected by this problem.

After the upgrade to Debian 9, I recompiled the drivers for the DVB hardware and TVHeadend itself. All VMs (except #5) were at 4.3-295 then.
3 of the 4 other VMs have crashed multiple times since the upgrade to Debian 9. I don't think VM #2 ever crashed, even though it has the same hardware and config as #1.

After the crash I upgraded and recompiled TVheadend, hence all the different versions I have now.

#1 seems to be most affected. It crashed at least 3 times yesterday.

This is part of the messages log from yesterday:

Sep 17 09:38:03 caladan kernel: [28719.338548] ntpd invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
Sep 17 09:38:03 caladan kernel: [28719.338549] ntpd cpuset=/ mems_allowed=0
Sep 17 09:38:03 caladan kernel: [28719.338559] CPU: 0 PID: 514 Comm: ntpd Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
Sep 17 09:38:03 caladan kernel: [28719.338560] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
Sep 17 09:38:03 caladan kernel: [28719.338561]  0000000000000000 ffffffff8b928574 ffff9e66c0bdfc38 ffff8bffda5d3080
Sep 17 09:38:03 caladan kernel: [28719.338563]  ffffffff8b7fe050 0000000000000000 0000000000000000 0000000c024213ca
Sep 17 09:38:03 caladan kernel: [28719.338564]  ffffffff8b7844e7 00000042df117040 0000000000000000 7e26a2f7f02060be
Sep 17 09:38:03 caladan kernel: [28719.338565] Call Trace:
Sep 17 09:38:03 caladan kernel: [28719.338593]  [<ffffffff8b928574>] ? dump_stack+0x5c/0x78
Sep 17 09:38:03 caladan kernel: [28719.338600]  [<ffffffff8b7fe050>] ? dump_header+0x78/0x1fd
Sep 17 09:38:03 caladan kernel: [28719.338606]  [<ffffffff8b7844e7>] ? get_page_from_freelist+0x3f7/0xb40
Sep 17 09:38:03 caladan kernel: [28719.338610]  [<ffffffff8b78047a>] ? oom_kill_process+0x21a/0x3e0
Sep 17 09:38:03 caladan kernel: [28719.338611]  [<ffffffff8b7800fd>] ? oom_badness+0xed/0x170
Sep 17 09:38:03 caladan kernel: [28719.338612]  [<ffffffff8b780911>] ? out_of_memory+0x111/0x470
Sep 17 09:38:03 caladan kernel: [28719.338613]  [<ffffffff8b785b4f>] ? __alloc_pages_slowpath+0xb7f/0xbc0
Sep 17 09:38:03 caladan kernel: [28719.338614]  [<ffffffff8b785d8e>] ? __alloc_pages_nodemask+0x1fe/0x260
Sep 17 09:38:03 caladan kernel: [28719.338619]  [<ffffffff8b7d5ec1>] ? alloc_pages_current+0x91/0x140
Sep 17 09:38:03 caladan kernel: [28719.338621]  [<ffffffff8b77e940>] ? filemap_fault+0x300/0x5a0
Sep 17 09:38:03 caladan kernel: [28719.338635]  [<ffffffffc05f9791>] ? ext4_filemap_fault+0x31/0x50 [ext4]
Sep 17 09:38:03 caladan kernel: [28719.338638]  [<ffffffff8b7aeef1>] ? __do_fault+0x81/0x170
Sep 17 09:38:03 caladan kernel: [28719.338640]  [<ffffffff8b7b3671>] ? handle_mm_fault+0xdd1/0x1350
Sep 17 09:38:03 caladan kernel: [28719.338647]  [<ffffffff8b65fd84>] ? __do_page_fault+0x2a4/0x510
Sep 17 09:38:03 caladan kernel: [28719.338656]  [<ffffffff8bc07658>] ? page_fault+0x28/0x30
Sep 17 09:38:03 caladan kernel: [28719.338657] Mem-Info:
Sep 17 09:38:03 caladan kernel: [28719.338661] active_anon:55502 inactive_anon:55559 isolated_anon:0
Sep 17 09:38:03 caladan kernel: [28719.338661]  active_file:203 inactive_file:227 isolated_file:0
Sep 17 09:38:03 caladan kernel: [28719.338661]  unevictable:0 dirty:0 writeback:0 unstable:2
Sep 17 09:38:03 caladan kernel: [28719.338661]  slab_reclaimable:2182 slab_unreclaimable:3214
Sep 17 09:38:03 caladan kernel: [28719.338661]  mapped:265 shmem:75 pagetables:1076 bounce:0
Sep 17 09:38:03 caladan kernel: [28719.338661]  free:1116 free_pcp:70 free_cma:0
Sep 17 09:38:03 caladan kernel: [28719.338664] Node 0 active_anon:222008kB inactive_anon:222236kB active_file:812kB inactive_file:908kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1060kB dirty:0kB writeback:0kB shmem:300kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:8kB pages_scanned:13301 all_unreclaimable? yes
Sep 17 09:38:03 caladan kernel: [28719.338665] Node 0 DMA free:1876kB min:88kB low:108kB high:128kB active_anon:6328kB inactive_anon:7100kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:24kB slab_unreclaimable:12kB kernel_stack:16kB pagetables:84kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Sep 17 09:38:03 caladan kernel: [28719.338667] lowmem_reserve[]: 0 455 455 455 455
Sep 17 09:38:03 caladan kernel: [28719.338669] Node 0 DMA32 free:2588kB min:2684kB low:3352kB high:4020kB active_anon:215680kB inactive_anon:215136kB active_file:812kB inactive_file:908kB unevictable:0kB writepending:0kB present:507840kB managed:488500kB mlocked:0kB slab_reclaimable:8704kB slab_unreclaimable:12844kB kernel_stack:1632kB pagetables:4220kB bounce:0kB free_pcp:280kB local_pcp:280kB free_cma:0kB
Sep 17 09:38:03 caladan kernel: [28719.338671] lowmem_reserve[]: 0 0 0 0 0
Sep 17 09:38:03 caladan kernel: [28719.338672] Node 0 DMA: 27*4kB (UME) 17*8kB (UME) 16*16kB (U) 3*32kB (UME) 12*64kB (UME) 4*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1876kB
Sep 17 09:38:03 caladan kernel: [28719.338687] Node 0 DMA32: 647*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2588kB
Sep 17 09:38:03 caladan kernel: [28719.338690] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 17 09:38:03 caladan kernel: [28719.338691] 1113 total pagecache pages
Sep 17 09:38:03 caladan kernel: [28719.338691] 608 pages in swap cache
Sep 17 09:38:03 caladan kernel: [28719.338692] Swap cache stats: add 198871, delete 198263, find 7938/13916
Sep 17 09:38:03 caladan kernel: [28719.338692] Free swap  = 0kB
Sep 17 09:38:03 caladan kernel: [28719.338693] Total swap = 731132kB
Sep 17 09:38:03 caladan kernel: [28719.338693] 130958 pages RAM
Sep 17 09:38:03 caladan kernel: [28719.338693] 0 pages HighMem/MovableOnly
Sep 17 09:38:03 caladan kernel: [28719.338693] 4856 pages reserved
Sep 17 09:38:03 caladan kernel: [28719.338694] 0 pages hwpoisoned
Sep 17 09:38:03 caladan kernel: [28719.338694] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Sep 17 09:38:03 caladan kernel: [28719.338697] [  196]     0   196    13890      114      24       3       92             0 systemd-journal
Sep 17 09:38:03 caladan kernel: [28719.338698] [  199]     0   199    33443       84      56       3      261             0 vmtoolsd
Sep 17 09:38:03 caladan kernel: [28719.338699] [  240]     0   240    11669       13      22       3      469         -1000 systemd-udevd
Sep 17 09:38:03 caladan kernel: [28719.338699] [  255]     0   255    12468        0      28       3      115             0 rpcbind
Sep 17 09:38:03 caladan kernel: [28719.338700] [  311]     0   311    38372        1      65       3      865             0 VGAuthService
Sep 17 09:38:03 caladan kernel: [28719.338701] [  312]     0   312     1054        0       8       3       36             0 acpid
Sep 17 09:38:03 caladan kernel: [28719.338702] [  313]     0   313    62529       47      30       3      160             0 rsyslogd
Sep 17 09:38:03 caladan kernel: [28719.338703] [  314]   106   314    11283       23      26       3       93          -900 dbus-daemon
Sep 17 09:38:03 caladan kernel: [28719.338704] [  319]     0   319     7409       17      18       3       45             0 cron
Sep 17 09:38:03 caladan kernel: [28719.338705] [  321]     0   321     9495       30      22       3      100             0 systemd-logind
Sep 17 09:38:03 caladan kernel: [28719.338706] [  456]     0   456    17486        0      37       3      194         -1000 sshd
Sep 17 09:38:03 caladan kernel: [28719.338707] [  475]   108   475     8858        0      22       3      202             0 rpc.statd
Sep 17 09:38:03 caladan kernel: [28719.338708] [  502]     0   502     3634        0      12       3       40             0 agetty
Sep 17 09:38:03 caladan kernel: [28719.338709] [  512]   107   512   444318   109652     654       5   178829             0 tvheadend
Sep 17 09:38:03 caladan kernel: [28719.338709] [  514]   105   514    24463       33      21       3       99             0 ntpd
Sep 17 09:38:03 caladan kernel: [28719.373963] oom_reaper: reaped process 512 (tvheadend), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

A few more things I have tried besides upgrading/recompiling TVH:
  • switch the DVB-hardware between #1 and #2
  • switch the DVB-hardware between #1 and the RPi
  • move everything (VMs + DVB-hardware) to a new ESXi-host
  • memory test on the old ESXi-host => no problems found
  • harddrive test on the old ESXi-host => no problems found

VM#1 and VM#2 have been assigned 512MB of RAM, VM#3 and VM#4 have been assigned 1GB of RAM.
While 512MB may be a bit marginal (is it?), the problem also occurs on VM#3 and #4 (but not on #2 - as of today)

The ESXi-host is not oversubscribed, it has 64GB of RAM

I am not sure if this is a problem with TVHeadend itself.
As far as I can tell, it could also be something to do with the kernel. Debian 9 uses 4.9, while Debian 8 is still at 3.16.

Over at the LibreElec forum there is a topic about issues with DVB adapters and kernel 4.9:
https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/

These are the current values of the top command:

VM #1

top - 20:46:25 up  3:28,  1 user,  load average: 0.11, 0.11, 0.09
Tasks:  76 total,   2 running,  74 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us,  2.1 sy,  0.0 ni, 93.1 id,  3.8 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem :   504408 total,     6748 free,   466604 used,    31056 buff/cache
KiB Swap:   731132 total,   712908 free,    18224 used.    23576 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
  512 hts       20   0 1048324 412448    768 S  4.7 81.8   5:23.21 tvheadend
  931 root      20   0       0      0      0 S  0.3  0.0   0:01.37 kworker/0:1
  934 root      20   0   44916   3396   2768 R  0.3  0.7   0:00.07 top
    1 root      20   0   57020   1688    816 S  0.0  0.3   0:00.44 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd

VM #2

top - 20:47:18 up 73 days,  3:09,  1 user,  load average: 0.00, 0.00, 0.00
Threads: 115 total,   1 running, 114 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :   504408 total,     6692 free,   231556 used,   266160 buff/cache
KiB Swap:   731132 total,   687656 free,    43476 used.   254768 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
  540 hts       20   0  854320 169684      0 S  0.3 33.6 185:53.61 tvheadend
    1 root      20   0   57148   4544   2932 S  0.0  0.9   0:21.13 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:01.06 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:31.59 ksoftirqd/0
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H

VM#3

top - 20:52:35 up 1 day, 19:09,  1 user,  load average: 0.02, 0.07, 0.08
Tasks:  98 total,   1 running,  97 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1020488 total,   313844 free,   355928 used,   350716 buff/cache
KiB Swap:   731132 total,   731132 free,        0 used.   512592 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
  137 root      20   0       0      0      0 S  0.3  0.0   0:36.75 kworker/0:3
  377 root      20   0       0      0      0 S  0.3  0.0   7:17.52 kdvb-ca-0:0
  541 hts       20   0  915392 304216   8840 S  0.3 29.8  12:37.94 tvheadend
 4613 root      20   0   44916   3804   3148 R  0.3  0.4   0:00.01 top
    1 root      20   0   57028   6776   5224 S  0.0  0.7   0:00.81 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.02 kthreadd

VM #4

top - 20:54:23 up 25 days,  1:31,  1 user,  load average: 0.11, 0.09, 0.09
Tasks:  90 total,   1 running,  89 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem :  1020488 total,    74744 free,    86772 used,   858972 buff/cache
KiB Swap:   731132 total,   731132 free,        0 used.   777576 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
  574 hts       20   0  718312  30400   8672 S  0.7  3.0 317:54.87 tvheadend
  414 root      20   0       0      0      0 S  0.3  0.0 102:14.88 kdvb-ca-0:0
18768 root      20   0       0      0      0 S  0.3  0.0   0:00.13 kworker/u2:2
    1 root      20   0   57020   6836   5284 S  0.0  0.7   0:06.67 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.30 kthreadd

VM #5

top - 20:49:09 up 41 days,  8:10,  1 user,  load average: 0.11, 0.11, 0.13
Tasks:  78 total,   1 running,  77 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.4 sy,  0.0 ni, 94.4 id,  4.9 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    506280 total,   499980 used,     6300 free,    41928 buffers
KiB Swap:   731132 total,    14044 used,   717088 free.   352520 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
  478 tvheade+  20   0  873196  32900   1600 S  3.3  6.5 352:57.67 tvheadend
    1 root      20   0   28296   2748   1876 S  0.0  0.5   0:25.99 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.54 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0  78:07.46 ksoftirqd/0
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H

RPi

top - 20:50:47 up 1 day,  1:36,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 114 total,   1 running, 113 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.0 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    933352 total,   832384 used,   100968 free,    93328 buffers
KiB Swap:   102396 total,        0 used,   102396 free.   673768 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10218 pi        20   0    5240   2592   2148 R   1.3  0.3   0:00.16 top
    6 root      20   0       0      0      0 S   0.3  0.0   0:00.68 kworker/u8:0
  757 tvheade+  20   0  317648  25748   6760 S   0.3  2.8  59:24.99 tvheadend
10205 root      20   0       0      0      0 S   0.3  0.0   0:00.04 kworker/0:2
    1 root      20   0    5400   3916   2788 S   0.0  0.4   0:09.07 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.12 kthreadd

The uptime is an indication of the last crash on each VM.
The RPi's last reboot had nothing to do with TVheadend or a crash and can be ignored.


Files

tvh_01.PNG (24.9 KB) tvh_01.PNG Gert Jansen, 2017-09-23 14:57
tvh_01_start.PNG (12.5 KB) tvh_01_start.PNG Gert Jansen, 2017-09-23 16:53
tvh_01_h01.PNG (12.4 KB) tvh_01_h01.PNG Gert Jansen, 2017-09-23 18:21
tvh_01_h07.PNG (13.1 KB) tvh_01_h07.PNG Gert Jansen, 2017-09-23 22:25
tvh_01_h27.PNG (12.5 KB) tvh_01_h27.PNG Gert Jansen, 2017-09-24 22:28
tvh.log (31.1 KB) tvh.log Gert Jansen, 2017-09-27 11:36
valgrind_tvh.log (30.7 KB) valgrind_tvh.log Gert Jansen, 2017-09-27 11:36

History

#1

Updated by Gert Jansen about 7 years ago

It just happened again:

[110148.675391] tvh:tcp-start invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
[110148.675393] tvh:tcp-start cpuset=/ mems_allowed=0
[110148.675401] CPU: 0 PID: 1452 Comm: tvh:tcp-start Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[110148.675402] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[110148.675405]  0000000000000000 ffffffff9a928574 ffffb8aa42173c38 ffff9bbb9f41a040
[110148.675407]  ffffffff9a7fe050 0000000000000000 0000000000000000 0000000c024213ca
[110148.675408]  ffffffff9a7844e7 000000429c9ad000 0000000000000000 f5688d9f446e9a94
[110148.675409] Call Trace:
[110148.675437]  [<ffffffff9a928574>] ? dump_stack+0x5c/0x78
[110148.675444]  [<ffffffff9a7fe050>] ? dump_header+0x78/0x1fd
[110148.675450]  [<ffffffff9a7844e7>] ? get_page_from_freelist+0x3f7/0xb40
[110148.675454]  [<ffffffff9a78047a>] ? oom_kill_process+0x21a/0x3e0
[110148.675455]  [<ffffffff9a7800fd>] ? oom_badness+0xed/0x170
[110148.675456]  [<ffffffff9a780911>] ? out_of_memory+0x111/0x470
[110148.675457]  [<ffffffff9a785b4f>] ? __alloc_pages_slowpath+0xb7f/0xbc0
[110148.675458]  [<ffffffff9a785d8e>] ? __alloc_pages_nodemask+0x1fe/0x260
[110148.675463]  [<ffffffff9a7d5ec1>] ? alloc_pages_current+0x91/0x140
[110148.675464]  [<ffffffff9a77e940>] ? filemap_fault+0x300/0x5a0
[110148.675511]  [<ffffffffc02ee791>] ? ext4_filemap_fault+0x31/0x50 [ext4]
[110148.675514]  [<ffffffff9a7aeef1>] ? __do_fault+0x81/0x170
[110148.675516]  [<ffffffff9a7b3671>] ? handle_mm_fault+0xdd1/0x1350
[110148.675524]  [<ffffffff9a65fd84>] ? __do_page_fault+0x2a4/0x510
[110148.675531]  [<ffffffff9ac07658>] ? page_fault+0x28/0x30
[110148.675558] Mem-Info:
[110148.675564] active_anon:55464 inactive_anon:55480 isolated_anon:0
                 active_file:153 inactive_file:189 isolated_file:0
                 unevictable:0 dirty:0 writeback:2 unstable:9
                 slab_reclaimable:2203 slab_unreclaimable:3226
                 mapped:160 shmem:258 pagetables:1099 bounce:0
                 free:1121 free_pcp:169 free_cma:0
[110148.675567] Node 0 active_anon:221856kB inactive_anon:221920kB active_file:612kB inactive_file:756kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:640kB dirty:0kB writeback:8kB shmem:1032kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:36kB pages_scanned:2993 all_unreclaimable? yes
[110148.675568] Node 0 DMA free:1908kB min:88kB low:108kB high:128kB active_anon:7740kB inactive_anon:5376kB active_file:0kB inactive_file:12kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:20kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:80kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[110148.675570] lowmem_reserve[]: 0 455 455 455 455
[110148.675572] Node 0 DMA32 free:2576kB min:2684kB low:3352kB high:4020kB active_anon:214116kB inactive_anon:216544kB active_file:612kB inactive_file:744kB unevictable:0kB writepending:8kB present:507840kB managed:488500kB mlocked:0kB slab_reclaimable:8792kB slab_unreclaimable:12896kB kernel_stack:1680kB pagetables:4316kB bounce:0kB free_pcp:676kB local_pcp:676kB free_cma:0kB
[110148.675574] lowmem_reserve[]: 0 0 0 0 0
[110148.675575] Node 0 DMA: 3*4kB (U) 41*8kB (UM) 30*16kB (UME) 6*32kB (UME) 6*64kB (UME) 2*128kB (ME) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1908kB
[110148.675580] Node 0 DMA32: 644*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2576kB
[110148.675583] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[110148.675584] 1863 total pagecache pages
[110148.675584] 1254 pages in swap cache
[110148.675585] Swap cache stats: add 225647, delete 224393, find 21067/37197
[110148.675585] Free swap  = 0kB
[110148.675585] Total swap = 731132kB
[110148.675586] 130958 pages RAM
[110148.675586] 0 pages HighMem/MovableOnly
[110148.675586] 4856 pages reserved
[110148.675586] 0 pages hwpoisoned
[110148.675587] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[110148.675590] [  201]     0   201    33443       84      58       3      261             0 vmtoolsd
[110148.675591] [  204]     0   204    13731       68      22       3       84             0 systemd-journal
[110148.675592] [  240]     0   240    11973       11      25       3      757         -1000 systemd-udevd
[110148.675593] [  268]     0   268    12468       15      29       3      114             0 rpcbind
[110148.675594] [  317]     0   317    38372        1      65       3      866             0 VGAuthService
[110148.675595] [  319]     0   319     7409       18      19       3       44             0 cron
[110148.675596] [  320]     0   320     1054        0       8       3       37             0 acpid
[110148.675597] [  321]   106   321    11283       47      26       3       71          -900 dbus-daemon
[110148.675598] [  328]     0   328    11630       33      27       3      111             0 systemd-logind
[110148.675598] [  329]     0   329    62529       43      29       4      186             0 rsyslogd
[110148.675599] [  458]     0   458    17486        0      38       3      198         -1000 sshd
[110148.675600] [  475]   108   475     8858        0      22       3      201             0 rpc.statd
[110148.675601] [  500]     0   500     3634        0      12       3       38             0 agetty
[110148.675602] [  512]   107   512   442457   108792     667       4   178529             0 tvheadend
[110148.675603] [  514]   105   514    24463       65      22       4       99             0 ntpd
[110148.675605] Out of memory: Kill process 512 (tvheadend) score 932 or sacrifice child
[110148.675687] Killed process 512 (tvheadend) total-vm:1769828kB, anon-rss:435168kB, file-rss:0kB, shmem-rss:0kB
[110148.706219] oom_reaper: reaped process 512 (tvheadend), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Is there anything I can do to gather more information? Any log file or error message you are interested in? Anything else I can do, or should not do?

The easiest thing I could do is probably reinstall everything with Debian 8. That might solve my problems, but if this is a bug then it won't help anyone.

#2

Updated by Gert Jansen about 7 years ago

I am prepared to assist as much as I can with testing, log files etc...
However, I cannot afford to just sit and wait and have my TVHeadend crashing while no progress is being made.

If there is no time or interest to invest in this issue, please let me know and I'll revert to Debian 8 and see how it goes from there.

#3

Updated by Mark Clarkstone about 7 years ago

Gert Jansen wrote:

I am prepared to assist as much as I can with testing, log files etc...
However, I cannot afford to just sit and wait and have my TVHeadend crashing while no progress is being made.

I completely understand. If possible, could you collect debugging information? See Debugging & Traces for info.

If there is no time or interest to invest in this issue, please let me know and I'll revert to Debian 8 and see how it goes from there.

It's not that there is no interest, it's a lack of manpower. There is currently mostly only one person working on tvheadend, with a few others helping where possible.

Out of interest, have you tried the stable-4.2 release and/or the latest 4.3.488? I personally have been running 4.3.x for a while with very few issues on Debian 9 (but not in a VM). If you're able, keep a snapshot of the failing VM and go back to a working one for now.

#4

Updated by Gert Jansen about 7 years ago

Mark Clarkstone wrote:

Out of interest, have you tried the stable-4.2 release and/or the latest 4.3.488? I personally have been running 4.3.x for a while with very few issues on Debian 9 (but not in a VM). If you're able, keep a snapshot of the failing VM and go back to a working one for now.

No, I haven't tried 4.3.488 yet.
As you can see, I have lots of different versions running at the moment; after a while I gave up on upgrading since that was not providing a solution.

For now, I have concentrated on VM#1.
I have stopped the TVH service, copied the tvheadend directory to my user profile, and started tvheadend manually from there.

I have used the ulimit -c unlimited command and started TVH with the ./build.linux/tvheadend -l /tmp/tvheadend.log --trace htsp,subscription,pat,pmt,bat command.
However, I got a message that pat, pmt and bat were unknown subsystem options (or something similar).

I get the same message when I try to add them to one of the other TVH machines.

2017-09-23 11:53:49.088 config: uknown subsystem 'pat'
2017-09-23 11:53:49.088 config: uknown subsystem 'pmt'
2017-09-23 11:53:49.088 config: uknown subsystem 'bat'
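
For what it's worth, the trace can presumably still be enabled with just the subsystems this build accepted, i.e. the same invocation with the rejected options dropped (a sketch based on the commands above):

# enable core dumps, then start tvh in the foreground with the accepted trace subsystems only
ulimit -c unlimited
./build.linux/tvheadend -l /tmp/tvheadend.log --trace htsp,subscription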

One thing I noticed is that TVH seems to be using a lot more memory than it did before.
top shows a %MEM of 33.6 for VM#1 (after roughly 2 hours), while it is only 6.5 on VM#5 (the Debian 8 machine) and 2.8 on the RPi.
It was as high as 87.8 before I stopped it to enable the tracing/debugging things...

#5

Updated by Gert Jansen about 7 years ago

Gert Jansen wrote:

No, I haven't tried 4.3.488 yet.

I can try 4.3.488 or the then-current (latest) version if/when this one crashes and we have some useful logs and trace files

#6

Updated by Jaroslav Kysela about 7 years ago

Could you look at 'Configuration / Debugging / Memory information entries' and compare the values after process start and after a while? (VM#1) - install a debug version of tvh if you don't have such a tab in the Debugging section.

Also, it may help to know how tvh is used (HTTP streaming only, DVR, etc.).

#7

Updated by Gert Jansen about 7 years ago

Jaroslav Kysela wrote:

Could you look at 'Configuration / Debugging / Memory information entries' and compare the values after process start and after a while? (VM#1) - install a debug version of tvh if you don't have such a tab in the Debugging section.

Do you mean like the attached file? (tvh_01.png)
This is the current situation, about 3 hours after I started TVH manually.
Since then, no recordings have been made.

Is there an easier way to get all the values apart from making a screenshot?
Shall I stop and restart TVH and create a new overview just after the start, and a new one after half an hour?

Jaroslav Kysela wrote:

Also, it may help to know how tvh is used (HTTP streaming only, DVR, etc.).

I only use it to record programs. No streaming, no live watching.

#8

Updated by Jaroslav Kysela about 7 years ago

Gert Jansen wrote:

Jaroslav Kysela wrote:

Could you look at 'Configuration / Debugging / Memory information entries' and compare the values after process start and after a while? (VM#1) - install a debug version of tvh if you don't have such a tab in the Debugging section.

Do you mean like the attached file? (tvh_01.png)
This is the current situation, about 3 hours after I started TVH manually.
Since then, no recordings have been made.

Is there an easier way to get all the values apart from making a screenshot?

wget -O a.txt http://localhost:9981/api/memoryinfo/grid

Shall I stop and restart TVH and create a new overview just after the start, and a new one after half an hour?

Yes, probably.

Jaroslav Kysela wrote:

Also, it may help to know how tvh is used (HTTP streaming only, DVR, etc.).

I only use it to record programs. No streaming, no live watching.

OK. We had another report where the culprit was slow disk I/O. TVH tries to buffer everything in memory in that case (I probably have to implement a buffer threshold to drop data and avoid such situations).
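
If slow disk I/O is a suspect, a quick sanity check is to time a direct write to the recording disk with dd (a rough sketch; the recording path is a placeholder, and oflag=direct bypasses the page cache so the figure reflects the device itself):

# write 1 GB straight to the recording disk and report the throughput
dd if=/dev/zero of=/path/to/recordings/ddtest.bin bs=1M count=1024 oflag=direct
rm /path/to/recordings/ddtest.bin

A result well below the bitrate of the recorded streams would make the in-memory buffering described above plausible.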

#9

Updated by Gert Jansen about 7 years ago

Jaroslav Kysela wrote:

Gert Jansen wrote:

Shall I stop and restart TVH and create a new overview just after the start, and a new one after half an hour?

Yes, probably.

OK, will do so in about 30 minutes.

Jaroslav Kysela wrote:

Also, it may help to know how tvh is used (HTTP streaming only, DVR, etc.).

I only use it to record programs. No streaming, no live watching.

OK. We had another report where the culprit was slow disk I/O. TVH tries to buffer everything in memory in that case (I probably have to implement a buffer threshold to drop data and avoid such situations).

The disk the VMs record to is on the same ESXi host, so it is 'local' (no actual network connection needed) for all the VMs (not for the RPi).
I have a cronjob which moves the recordings from that disk to my NAS during the night.

#10

Updated by Gert Jansen about 7 years ago

The attached file is from a few seconds after the start of TVH (tvh_01_start.PNG).

I got a 401 Unauthorized (Username/Password Authentication failed) when trying the wget command.
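
For reference, wget can pass credentials to that endpoint; a sketch, assuming an account with admin access (ADMIN_USER/ADMIN_PASS are placeholders, and wget negotiates basic or digest auth depending on what the server asks for):

wget --user=ADMIN_USER --password=ADMIN_PASS -O a.txt http://localhost:9981/api/memoryinfo/grid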

#11

Updated by Gert Jansen about 7 years ago

And this is about an hour later

#12

Updated by Jaroslav Kysela about 7 years ago

The contents are identical. Does RSS/RES (the memory used by the tvh process) increase?

#13

Updated by Gert Jansen about 7 years ago

Jaroslav Kysela wrote:

The contents are identical. Does RSS/RES (the memory used by the tvh process) increase?

And it is still exactly the same at this moment.

The memory usage had increased, but just a bit.
I only know the %MEM, sorry. That was 31.3 when it started, and now it is:

top - 20:26:47 up 2 days,  2:15,  2 users,  load average: 0.00, 0.00, 0.00
Threads: 111 total,   1 running, 110 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.7 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :   504408 total,    20068 free,   268068 used,   216272 buff/cache
KiB Swap:   731132 total,   715156 free,    15976 used.   218364 avail Mem

PID   USER     PR  NI  VIRT    RES    SHR    S %CPU %MEM TIME+     COMMAND
3169  gert     20  0   844692  227284 11332  S 0.3  45.1 0:11.88   tvheadend
#14

Updated by Gert Jansen about 7 years ago

I don't seem to be able to edit my post once it is submitted?

Anyhow, I have 2 recordings coming up soon. Both on the same channel.
I'll see what happens there.

#15

Updated by Jaroslav Kysela about 7 years ago

The 'RES' column in top is RSS. You can also obtain RSS through ps, like:

ps -aly | grep -E "(RSS|tvheadend)" 

If the tvh process is using over 200MB of memory but the memory entries show that only a few MB are allocated, then something is really wrong and it might not be easy to determine what's going on in the tvh process.
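
The same figure can also be read directly from /proc (a minimal sketch, assuming a single tvheadend process so pidof returns one PID):

grep VmRSS /proc/$(pidof tvheadend)/status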

#16

Updated by Gert Jansen about 7 years ago

VM#1 - currently recording

ps -aly | grep -E "(RSS|tvheadend)" 
S   UID   PID  PPID  C PRI  NI   RSS    SZ WCHAN  TTY          TIME CMD
S  1000  3169   576  2  80   0 236316 211173 -    pts/0    00:07:16 tvheadend

VM#2 - currently recording

ps -aly -u hts | grep -E "(RSS|tvheadend)" 
S   UID   PID  PPID  C PRI  NI   RSS    SZ WCHAN  TTY          TIME CMD
S   107   540     1  0  80   0 168620 213580 -    ?        17:14:42 tvheadend

VM#3 - currently idle

 ps -aly -u hts | grep -E "(RSS|tvheadend)" 
S   UID   PID  PPID  C PRI  NI   RSS    SZ WCHAN  TTY          TIME CMD
S   107   553     1  0  80   0 380296 245332 -    ?        00:45:43 tvheadend

VM#4 - currently idle

 ps -aly -u hts | grep -E "(RSS|tvheadend)" 
S   UID   PID  PPID  C PRI  NI   RSS    SZ WCHAN  TTY          TIME CMD
S   107   561     1  0  80   0 29172 179061 -     ?        01:02:23 tvheadend

VM#5 - currently recording

ps -aly -u tvheadend | grep -E "(RSS|tvheadend)" 
S   UID   PID  PPID  C PRI  NI   RSS    SZ WCHAN  TTY          TIME CMD
S  1001   478     1  0  80   0 31676 222397 -     ?        06:39:01 tvheadend

RPi - currently idle

ps -aly -u tvheadend | grep -E "(RSS|tvheadend)" 
S   UID   PID  PPID  C PRI  NI   RSS    SZ WCHAN  TTY          TIME CMD
S  1001   757     1  3  80   0 25524 79940 -      ?        04:47:00 tvheadend

#17

Updated by Jaroslav Kysela about 7 years ago

What is in 'Memory information entries' for VM#1 now? (Don't forget to refresh the contents if you already have the page open in a browser - the auto-refresh does not work).

#18

Updated by Gert Jansen about 7 years ago

See attachment.

The first recording is finished, the second one is still busy.

 ps -aly | grep -E "(RSS|tvheadend)" 
S   UID   PID  PPID  C PRI  NI   RSS    SZ WCHAN  TTY          TIME CMD
S  1000  3169   576  3  80   0 249208 228830 -    pts/0    00:10:31 tvheadend
#19

Updated by Gert Jansen about 7 years ago

Not much happened in the last 24 hours. VM#1 still has not crashed (yet)

ps -aly | grep -E "(RSS|tvheadend)" 
S   UID   PID  PPID  C PRI  NI   RSS    SZ WCHAN  TTY          TIME CMD
S  1000  3169   576  1  80   0 413048 261598 -    pts/0    00:28:07 tvheadend
top - 22:28:41 up 3 days,  4:17,  3 users,  load average: 0.18, 0.23, 0.25
Threads: 118 total,   2 running, 116 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.5 us,  2.5 sy,  0.0 ni, 86.3 id,  8.5 wa,  0.0 hi,  0.4 si,  0.0 st
KiB Mem :   504408 total,     6536 free,   471104 used,    26768 buff/cache
KiB Swap:   731132 total,   699708 free,    31424 used.    17824 avail Mem

PID   USER     PR  NI  VIRT    RES    SHR    S %CPU %MEM TIME+     COMMAND
4252  gert     20  0   1046392 415708 0      R 4.3  82.4 6:32.06   tvh:lnxdvb-+
3208  gert     20  0   1046392 415708 0      S 1.7  82.4 3:40.56   tvh:mi-main
4297  root     20  0   0       0      0      S 1.0  0.0  0:05.40   kworker/0:3
4283  gert     20  0   1046392 415708 0      S 0.7  82.4 0:08.08   tvh:dvr
138   root     0   -20 0       0      0      S 0.3  0.0  1:22.86   kworker/0:1H

Debugging - Memory Information Entries attached.

#20

Updated by Jaroslav Kysela about 7 years ago

Which streaming profile do you use for DVR? (matroska? pass?)

#21

Updated by Gert Jansen about 7 years ago

Jaroslav Kysela wrote:

Which streaming profile do you use for DVR? (matroska? pass?)

I use the pass stream profile. I have set the cache scheme to "Don't Keep" and adjusted the pre- and post-recording padding times. Apart from that, I think most values in my recording profile are still at their defaults.

#22

Updated by Gert Jansen about 7 years ago

Eventually, it (VM #1) did crash.

The last few messages in the session where I started TVHeadend from:

2017-09-25 23:38:54.034 [WARNING] TS: LNB2 = Diseqc B = Astra 3/12187H/Veronica/DisneyXD HD: H264 @ #517 Continuity counter error (total 2)
2017-09-25 23:42:45.361 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:42:55.373 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:43:05.360 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:43:15.359 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:43:25.383 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:43:35.377 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:43:45.374 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:43:55.389 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:44:05.398 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:44:15.404 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:44:25.405 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:44:35.409 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:44:45.419 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:44:55.461 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
2017-09-25 23:45:05.557 [WARNING] mpegts: too much queued table input data (over 2MB), discarding new
Killed

At 23:38 it was recording a movie (start time 20:21:30, scheduled end time 23:42:00). No other recording was running on that VM at the same time.

On VM#5, there was a recording from 22:41:30 until 23:52 and another one from 23:26:30 until 00:42:00. Both are absolutely fine. No errors, no warnings. Just perfect.

All other VMs and the RPi were idle.

I have not yet restarted the VM.
Is there any information I need to get from it before I can safely restart it?
Where can I find the logs/traces that are interesting?

#23

Updated by Jaroslav Kysela about 7 years ago

You may try valgrind https://tvheadend.org/projects/tvheadend/wiki/Debugging#Memory-leaks-or-corruption to detect the memory leak. Note that it's slow, so I suggest doing a test recording (5 minutes or so) and then terminating tvh. Run the tvh binary from the command line (not in the background).
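
A possible invocation, following that wiki page and the manual-start setup from comment #4 (the paths and valgrind options here are illustrative, not the only correct ones):

# run the foreground tvh binary under valgrind and write the leak report to a file
valgrind --leak-check=full --log-file=/tmp/valgrind_tvh.log ./build.linux/tvheadend -l /tmp/tvheadend.log
# do a short (~5 minute) test recording, then stop tvh with Ctrl+C so valgrind can print its leak summary on exit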

#24

Updated by Gert Jansen about 7 years ago

May try that later today. Can't do that while working, sadly.
I may try to sneak it in during a break later today.

Is there anything useful like logfiles or traces that I have to save before I reboot the VM or restart TVH ?

#25

Updated by Gert Jansen about 7 years ago

I have attached two files:
  • tvh.log = the last 250 lines of the log/trace/debugfile that ended with yesterday morning's crash
  • valgrind_tvh.log = the file that was generated while doing the 5 minute recording

If needed, I can upload the full logfile instead of just the last 250 lines, but it's rather big: 61MB

#26

Updated by Jaroslav Kysela about 7 years ago

The valgrind log does not show any memory leaks that could explain the behaviour. Could you observe exactly when RES/RSS increases? Try to split activities like DVR, HTTP access, EPG grabbing, etc.
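
One low-effort way to capture that, as a sketch (assuming a single tvheadend process and standard ps/pidof), is to log the RSS once a minute and later line the timestamps up against DVR/EPG/HTTP activity in the TVH log:

# append a timestamped RSS sample (in KiB) for tvheadend every 60 seconds
while true; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(ps -o rss= -p $(pidof tvheadend))" >> /tmp/tvh_rss.log
    sleep 60
done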

#27

Updated by Gert Jansen almost 7 years ago

Despite all the time that has passed, I have been unable to pinpoint exactly why or when it goes wrong.

I have made a few changes in the meantime:
  • added a second ESXi server
  • split the TVHeadend clients across the 2 hosts: each host has a client with a DVBSky and a Terratec adapter. The TBS adapter stayed where it was
  • upgraded all TVHeadend clients to newer versions (except the TBS one)
  • reinstalled a VM from scratch for one of the DVBSky clients.

The crashes seem to occur less frequently but have not gone away.

As far as I can tell, they only happen on the VMs with DVBSky adapters.

I have no idea if the problem is caused by
  • something in TVHeadend
  • Debian 9 - or more precisely: something in kernel 4.9.x
  • the drivers of the DVBSky adapters.

Previously, the DVBSky VMs were running Debian 8 (kernel 3.16.x) and drivers compiled from the sources provided by DVBSky.
Nowadays DVBSky support is included in the kernel (since 3.18.x IIRC), so a separate driver is no longer needed.

Yesterday I created 2 new VMs with Debian 8, again with drivers compiled from the sources provided by DVBSky.
I have built the latest TVHeadend (4.3-861~g715a4a8) for both of them.

I'll see how it goes from here.
