Project

General

Profile

Bug #5651

CPU at 100% after some hours

Added by deganza 11 over 5 years ago. Updated over 3 years ago.

Status:
Invalid
Priority:
Normal
Assignee:
-
Category:
General
Target version:
-
Start date:
2019-06-09
Due date:
% Done:

0%

Estimated time:
Found in version:
4.3-1792
Affected Versions:

Description

I'm using Tvheadend 4.3-1792~g466a01431~cosmic on an Ubuntu machine with an I7 with 12GB RAM.
Every morning I've the same problem: the CPU is on 100% and I've to restart with "systemctl restart tvheadend" the TVH-Server to cool down the machine.

Do you know why the machine get's such high CPU usage? After the restart everything is fine until the next end of the day.


Files

tvhhtop100.jpg (319 KB) tvhhtop100.jpg deganza 11, 2019-06-12 04:18
tvh_CPU.jpg (243 KB) tvh_CPU.jpg deganza 11, 2019-06-30 09:58
truss.gz (139 KB) truss.gz truss -p pid/tid Martin Rehak, 2021-03-31 08:05

History

#1

Updated by saen acro over 5 years ago

try with

top -H

Very strange way to restart service normally command is

service tvheadend restart 

#2

Updated by deganza 11 over 5 years ago

saen acro wrote:

try with
[...]

Very strange way to restart service normally command is
[...]

saen acro wrote:

try with
[...]

Very strange way to restart service normally command is
[...]

What happens if write top -H ?

deganza 11 wrote:

I'm using Tvheadend 4.3-1792~g466a01431~cosmic on an Ubuntu machine with an I7 with 12GB RAM.
Every morning I've the same problem: the CPU is on 100% and I've to restart with "systemctl restart tvheadend" the TVH-Server to cool down the machine.

Do you know why the machine get's such high CPU usage? After the restart everything is fine until the next end of the day.

#3

Updated by saen acro over 5 years ago

top -H show sub-treads of TVH with will point to problematic part

if you want to be more detailed use HTOP and on setting set to show custom treads names


#4

Updated by deganza 11 over 5 years ago

saen acro wrote:

top -H show sub-treads of TVH with will point to problematic part

if you want to be more detailed use HTOP and on setting set to show custom treads names


ok, thank you. The most ressources are used when TVH is recording programms.
So if TVH is recording 5 programms at the same time, the CPU loads at 100%.

But after the recording are finished, the CPU still stays very high. But there are no recordings anymore.

What are the ideal sytemrequirements if i want to record 15 programms at the same time?
Is an i7 with 12 GB RAM enough?

#5

Updated by saen acro over 5 years ago

External RAID controller ;)
Depend on bit-rate of recorded content.
Still guessing what operating system you use.

#6

Updated by deganza 11 over 5 years ago

saen acro wrote:

External RAID controller ;)
Depend on bit-rate of recorded content.
Still guessing what operating system you use.

I'm using Ubuntu 18.10.

With htoo it's showing that 11 processes are on. But tvh isn't recording nothing anymore.
Any idea? It happens every day.

I'm not able to upload the screenshot. It gives me everytime an error.

#7

Updated by saen acro over 5 years ago

So with is process is with high cpu use?
https://postimages.org

#8

Updated by deganza 11 over 5 years ago

saen acro wrote:

So with is process is with high cpu use?
https://postimages.org

Here are the processes

#9

Updated by Pablo R. over 5 years ago

Use

top -H

when the problem occurs

#10

Updated by Pablo R. over 5 years ago

But, from your last screenshot is a problem with DVR...

#11

Updated by deganza 11 over 5 years ago

Pablo R. wrote:

But, from your last screenshot is a problem with DVR...

correct, it's a problem with DVR. This processes in the screenshot are all terminated but still consume the whole CPU.

#12

Updated by Flole Systems over 5 years ago

Use gdb to check what they are doing. Then you know why they're "locked up".

#13

Updated by deganza 11 over 5 years ago

Flole Systems wrote:

Use gdb to check what they are doing. Then you know why they're "locked up".

I'm sorry but I don't know how to use this command with tvh.

#14

Updated by deganza 11 over 5 years ago

deganza 11 wrote:

Flole Systems wrote:

Use gdb to check what they are doing. Then you know why they're "locked up".

I'm sorry but I don't know how to use this command with tvh.

deganza 11 wrote:

Flole Systems wrote:

Use gdb to check what they are doing. Then you know why they're "locked up".

I'm sorry but I don't know how to use this command with tvh.

Here's you see what I mean. This are the used cpu ressources of the last 24 hours (red).
I'ver every day the same pattern. The cpu rises and then stays all the time at 90%.

#16

Updated by bas co over 5 years ago

Same here, did the HTOP thing, thv.dvr process at 100%. using master build of today.
tried different setting, but nothing improved the situation so far.
i'm using mergerfs. do my initial thoughts where that tvh does not cope well with mergerfs.
but no errors in mergerfs log.

#17

Updated by Flole Systems over 5 years ago

Then same thing for you aswell:

Flole Systems wrote:

Use gdb to check what they are doing. Then you know why they're "locked up".

#18

Updated by bas co over 5 years ago

ok, i did some research.
i'm using mergerfs/fuse to store my recordings.

this is what i found:

as soon as one of the disks in my jbod that tvheadend is recording onto is full, the recording thread stops recording and the cpu goes to 100%.

this is what i did:

i configured mergerfs to always use the disk that has most free space (category.create=mfs instead of the disk that has the correct directory tree).
so tvheadend does not encounter full disks anymore, except the one occasion where my entire jbod is full.

#19

Updated by Jonathan S almost 5 years ago

This bug also surfaces when the user running Tvheadend does not have write permissions on the recording directory.

#20

Updated by Hugh Phoenix-Hulme over 4 years ago

I'm also getting 99% from two instances of tvh:pvr. However, I have no "funny business" in the pipeline, it's just writing to a local disk to which it has permission to write. Two threads are chewing up 100% CPU, IO is quite low. There are also two idle instances of tvh:pvr.

The stream profile is 'pass', and by comparison, ffmpeg streaming h264 1080p video from video cameras to the same disk is hardly taking any CPU time at all. Those processes should be doing roughly the same: copying the data directly onto disk.

I get this when there's one programme recording (I imagine the other is for Timeshift) - I'll see what happens with no recordings tomorrow (or else I'll have to wail til 1am)

#21

Updated by Flole Systems over 4 years ago

Same thing for you then: Use gdb to find out what exactly is looping so fast/using that much CPU.

#22

Updated by Hugh Phoenix-Hulme over 4 years ago

I get it with no recordings. Just two instances of htv:pvr sitting there at 99% CPU.

Disabling Timeshift makes no difference.

Restarting the service brings it back down.

Enabling Timeshift doesn't make it increase.

Recording doesn't make it increase.

Deleting the recording doesn't make it increase.

#23

Updated by Flole Systems over 4 years ago

  • Status changed from New to Invalid

Information needed are not provided. This can be re-opened once gdb traces are provided.

#24

Updated by Hugh Phoenix-Hulme over 4 years ago

Unfortunately "use gdb" isn't itself enough information...

#25

Updated by Stephen West over 4 years ago

I think I'm getting hit with this issue:

Flole Systems wrote:

Use gdb to check what they are doing. Then you know why they're "locked up".

I'm not sure how to gdb this right now, but I can strace:


(gdb) bt full
#0  0xb7f2bbb5 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb5e62208 in futex_wake (private=-1243144192, processes_to_wake=2147483647, 
    futex_word=0x21f4238) at ../sysdeps/unix/sysv/linux/futex-internal.h:231
        resultvar = <optimized out>
        __ret = <optimized out>
        res = <optimized out>
        res = <optimized out>
        __ret = <optimized out>
        resultvar = <optimized out>
#2  __condvar_confirm_wakeup (cond=cond@entry=0x21f4214, private=private@entry=0)
    at pthread_cond_wait.c:55
No locals.
#3  0xb5e629e3 in __pthread_cond_wait_common (abstime=0x0, mutex=0x21f41d4, 
    cond=0x21f4214) at pthread_cond_wait.c:419
        err = 22
        g = 0
        flags = <optimized out>
        g1_start = <optimized out>
        signals = <optimized out>
        result = 0
        wseq = <optimized out>
        seq = 4384125476
        private = <optimized out>
        maxspin = <optimized out>
        err = <optimized out>
        result = <optimized out>
        wseq = <optimized out>
        g = <optimized out>
        seq = <optimized out>
        flags = <optimized out>
        private = <optimized out>
        signals = <optimized out>
        g1_start = <optimized out>
        __atg101_val = <optimized out>
        __atg100_val = <optimized out>
        spin = <optimized out>
        buffer = <optimized out>
        cbuffer = <optimized out>
        __atg101_val = <optimized out>
        __atg100_val = <optimized out>
        __atg101_val = <optimized out>
        __atg100_val = <optimized out>
        rt = <optimized out>
        __atg101_val = <optimized out>
        __atg100_val = <optimized out>
        __atg102_expected = <optimized out>
        s = <optimized out>
        __atg100_val = <optimized out>
        __atg102_expected = <optimized out>
#4  __pthread_cond_wait (cond=0x21f4214, mutex=0x21f41d4) at pthread_cond_wait.c:655
No locals.
#5  0x004c3286 in ?? ()
No symbol table info available.
#6  0x0055e2f6 in ?? ()
---Type <return> to continue, or q <return> to quit---
No symbol table info available.
#7  0x004c242a in ?? ()
No symbol table info available.
#8  0xb5e5c3bd in start_thread (arg=0xa54ffb40) at pthread_create.c:463
        pd = 0xa54ffb40
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-1243144192, -1521484992, 
                -1243144192, -1521489624, -1199842656, -332536703}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#9  0xb5c66fe6 in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:108
No locals.
(gdb) 

I don't think that was helpful, I don't have symbols for the middle part (#5,6,7).

Is there anything else to try? I'll just restart the service and hope it doesn't come back.

#26

Updated by Stephen West over 4 years ago

forgot I'm running:
HTS Tvheadend 4.3.0~pre+202007131151-0~built202007140102~gitce09077~ubuntu18.04.1

...maybe I shouldn't be running the unstable branch. I just realized that.

#27

Updated by Flole Systems over 4 years ago

That's perfectly fine, the real issue is that you are running a version that contains some kinds of modifications. With that it's impossible to find the issue. Also it seems like for the gdb trace the debug symbols are missing.

#28

Updated by Stephen West over 4 years ago

I didn't think I modified anything. How would I go about getting an unmodified version? I tried to follow the instructions at https://tvheadend.org/projects/tvheadend/wiki/AptRepositories , I think I did the "sudo apt-add-repository ppa:mamarley/tvheadend-git" instead of the Doozer packages. Would it be helpful/dangerous to switch to the official Doozer package?

#29

Updated by Flole Systems over 4 years ago

When you use an official version this would be the right place to ask for support, otherwise you would have to contact the maintainer of the package. You will notice that the version number is completely different on the official repository. There shouldn't be any issues switching the repository but you should make a config backup just to be safe.

#30

Updated by saen acro over 4 years ago

Stephen West wrote:

From attached image is clear that only one of two cores is on 100%
With can mean that cpu is slow for task.

#31

Updated by Martin Klayer over 4 years ago

Flole Systems Systems I also followed the instruction at https://tvheadend.org/projects/tvheadend/wiki/AptRepositories .
What else could I do? What is the correct repro to use.

I am seeing also the 100 % CPU utilization once a recording has be started. But even after the recording has finished and not other tasks are running on the tvhead server, the utilization is staying at 100%

The only workaround ist to perform a cron job which executes systemctl restart tvhead.service.

#32

Updated by Flole Systems over 4 years ago

The correct repo is obviously the one under the official section. Once you have a version from that you should reproduce the issue, note down the PID, identify the Thread that's using the CPU and use gdb to trace what that thread is doing.

#33

Updated by Rene Wagler almost 4 years ago

anything new here ?

#34

Updated by Flole Systems almost 4 years ago

No, still no support for unofficial or modified versions here. It's up to the maintainer to provide support for their custom and modified builds. That will most likely not change anytime soon.

#35

Updated by Erwin Mueller almost 4 years ago

Hello,
I have the same issue but I am using Build: 4.2.8-36~g5bdcfd8ac directly from tvheadend repository (stable) via apt-get.
So it should be an official version.

Is there a version out there without having this issue?

#36

Updated by Flole Systems almost 4 years ago

Update to latest master.

#37

Updated by Martin Rehak almost 4 years ago

Attaching a truss output from the tvheadend:dvr taking 100% core on FreeBSD. This is from the git repo pull on March 29.

commit dbaa0f850394af8ab845df802f5f781ac0218ec4 (HEAD -> master, origin/master, origin/HEAD)

Does it help anyhow? How to use gdb to find what the thread is doing? Is the stack trace enough?

#38

Updated by Martin Schlatter over 3 years ago

Same here: after a few days, CPU goes up to 100%, the subthread is :dvr, no recording is active.

tvheadend/focal,now 4.3-1979~g8fc2dfa7e~focal

#39

Updated by Martin Schlatter over 3 years ago

Here is a part of the syslog. What is remarkable is that the movie title contains the special character "ä" ("läuft").
And also that the path "/media/nuc/16ef5d00-a19a-492a-a7cb-7770e06165f5" does not exist.
The problem are not the permissions, normally recording works.

Oct 11 18:04:30 nuc tvheadend[602567]: dvr: "Deutschland 24/7  Ohne uns läuft nichts!" on "DMAX" recorder starting
Oct 11 18:04:30 nuc tvheadend[602567]: mpegts: 282MHz in DVBC - tuning on DRXK DVB-C DVB-T #2 : DVB-C #0
Oct 11 18:04:30 nuc tvheadend[602567]: subscription: 0346: "DVR: Deutschland 24/7  Ohne uns läuft nichts!" subscribing on channel "DMAX", weight: 300, adapter: "DRXK DVB-C DVB-T #2 : DVB-C #0", network: "DVBC", mux: "282MHz", provider: "Unitymedia", service: "DMAX", profile="pass", username="" 

Oct 11 18:05:00 nuc tvheadend[602567]: pass: /media/video-ssd/Deutschland 24/7  Ohne uns lauft nichts!/Deutschland 24-7  Ohne uns lauft nichts!.ts: Unable to create file, open failed -- Permission denied
Oct 11 18:05:00 nuc tvheadend[602567]: dvr: Recording error: "/media/video-ssd/Deutschland 24/7  Ohne uns lauft nichts!/Deutschland 24-7  Ohne uns lauft nichts!.ts": Unable to open file
Oct 11 18:05:00 nuc tvheadend[602567]: subscription: 0346: "DVR: Deutschland 24/7  Ohne uns läuft nichts!" unsubscribing from "DMAX", username="" 
Oct 11 18:05:00 nuc tvheadend[602567]: dvr: unable to stat file '/media/video-ssd/Deutschland 24/7  Ohne uns lauft nichts!/Deutschland 24-7  Ohne uns lauft nichts!.ts'
Oct 11 18:05:00 nuc tvheadend[602567]: dvr: "Deutschland 24/7  Ohne uns läuft nichts!" on "DMAX": End of program: File not created

Okt 11 18:47:30 nuc tvheadend[1102820]: dvr: unable to stat file '/media/video-ssd/Deutschland 24/7  Ohne uns lauft nichts!/Deutschland 24-7  Ohne uns lauft nichts!.ts'
Okt 11 18:47:30 nuc tvheadend[1102820]: dvr: unable to stat file '/media/video-ssd/Deutschland 24/7  Ohne uns lauft nichts!/Deutschland 24-7  Ohne uns lauft nichts!.ts'
Okt 11 18:47:30 nuc tvheadend[1102820]: dvr: unable to stat file '/media/video-ssd/Deutschland 24/7  Ohne uns lauft nichts!/Deutschland 24-7  Ohne uns lauft nichts!.ts'
Okt 11 18:47:30 nuc tvheadend[1102820]: dvr: unable to stat file '/media/nuc/16ef5d00-a19a-492a-a7cb-7770e06165f5/Deutschland 24/7  Ohne uns lauft nichts!/Deutschland 24-7  Ohne uns lauft nichts!-2.ts'
Okt 11 18:47:30 nuc tvheadend[1102820]: dvr: unable to stat file '/media/video-ssd/Deutschland 24/7  Ohne uns lauft nichts!/Deutschland 24-7  Ohne uns lauft nichts!.ts'
Okt 11 18:47:30 nuc tvheadend[1102820]: dvr: unable to stat file '/media/video-ssd/Deutschland 24/7  Ohne uns lauft nichts!/Deutschland 24-7  Ohne uns lauft nichts!.ts'
#40

Updated by Martin Schlatter over 3 years ago

Oh and it is a serie and the title contains a "/" so the constructed path is wrong.

#41

Updated by Martin Schlatter over 3 years ago

The directory "Deutschland 24" was created by another user, so there was indeed a permission problem at 18:05. This caused a looping "stat" at 18:47

#42

Updated by Martin Schlatter over 3 years ago

https://tvheadend.org/projects/tvheadend/repository/tvheadend/revisions/master/entry/src/dvr/dvr_vfsmgr.c

In dvr_vfsmgr.c there is a function dvr_vfs_refresh_entry . this function loops over files and does a check stat(filename, &st).
If the return value from stat is <0, it calls tvherror(LS_DVR, "unable to stat file '%s'", filename) and ignores the file.
But if the return code from stat is <0 because the file is there but cannot be accessed because permission is denied, the loop is endless.

#43

Updated by saen acro over 3 years ago

Martin Schlatter wrote:

https://tvheadend.org/projects/tvheadend/repository/tvheadend/revisions/master/entry/src/dvr/dvr_vfsmgr.c

In dvr_vfsmgr.c there is a function dvr_vfs_refresh_entry . this function loops over files and does a check stat(filename, &st).
If the return value from stat is <0, it calls tvherror(LS_DVR, "unable to stat file '%s'", filename) and ignores the file.
But if the return code from stat is <0 because the file is there but cannot be accessed because permission is denied, the loop is endless.

https://github.com/tvheadend/tvheadend/blob/master/src/dvr/dvr_vfsmgr.c#L85
can you suggest PR with fix
And message: permission denied

#44

Updated by Flole Systems over 3 years ago

I'm not really sure why that loop should be endless? It's a foreach-loop, so it shouldn't be endless.

#45

Updated by Martin Schlatter over 3 years ago

TAILQ_FOREACH tries to traverse a directory, but does not have permission.
Why is he doing

        htsmsg_delete_field(m, "fsid");
        htsmsg_delete_field(m, "size");

and what happens if there is no fsid and no size?

Also available in: Atom PDF