Bug #5651
CPU at 100% after some hours
0%
Description
I'm using Tvheadend 4.3-1792~g466a01431~cosmic on an Ubuntu machine with an I7 with 12GB RAM.
Every morning I've the same problem: the CPU is on 100% and I've to restart with "systemctl restart tvheadend" the TVH-Server to cool down the machine.
Do you know why the machine get's such high CPU usage? After the restart everything is fine until the next end of the day.
Files
History
Updated by saen acro over 5 years ago
try with
top -H
Very strange way to restart service normally command is
service tvheadend restart
Updated by deganza 11 over 5 years ago
saen acro wrote:
try with
[...]Very strange way to restart service normally command is
[...]
saen acro wrote:
try with
[...]Very strange way to restart service normally command is
[...]
What happens if write top -H ?
deganza 11 wrote:
I'm using Tvheadend 4.3-1792~g466a01431~cosmic on an Ubuntu machine with an I7 with 12GB RAM.
Every morning I've the same problem: the CPU is on 100% and I've to restart with "systemctl restart tvheadend" the TVH-Server to cool down the machine.Do you know why the machine get's such high CPU usage? After the restart everything is fine until the next end of the day.
Updated by saen acro over 5 years ago
top -H show sub-treads of TVH with will point to problematic part
if you want to be more detailed use HTOP and on setting set to show custom treads names
Updated by deganza 11 over 5 years ago
saen acro wrote:
top -H show sub-treads of TVH with will point to problematic part
if you want to be more detailed use HTOP and on setting set to show custom treads names
ok, thank you. The most ressources are used when TVH is recording programms.
So if TVH is recording 5 programms at the same time, the CPU loads at 100%.
But after the recording are finished, the CPU still stays very high. But there are no recordings anymore.
What are the ideal sytemrequirements if i want to record 15 programms at the same time?
Is an i7 with 12 GB RAM enough?
Updated by saen acro over 5 years ago
External RAID controller ;)
Depend on bit-rate of recorded content.
Still guessing what operating system you use.
Updated by deganza 11 over 5 years ago
saen acro wrote:
External RAID controller ;)
Depend on bit-rate of recorded content.
Still guessing what operating system you use.
I'm using Ubuntu 18.10.
With htoo it's showing that 11 processes are on. But tvh isn't recording nothing anymore.
Any idea? It happens every day.
I'm not able to upload the screenshot. It gives me everytime an error.
Updated by saen acro over 5 years ago
So with is process is with high cpu use?
https://postimages.org
Updated by deganza 11 over 5 years ago
- File tvhhtop100.jpg tvhhtop100.jpg added
saen acro wrote:
So with is process is with high cpu use?
https://postimages.org
Here are the processes
Updated by deganza 11 over 5 years ago
Pablo R. wrote:
But, from your last screenshot is a problem with DVR...
correct, it's a problem with DVR. This processes in the screenshot are all terminated but still consume the whole CPU.
Updated by Flole Systems over 5 years ago
Use gdb to check what they are doing. Then you know why they're "locked up".
Updated by deganza 11 over 5 years ago
Flole Systems wrote:
Use gdb to check what they are doing. Then you know why they're "locked up".
I'm sorry but I don't know how to use this command with tvh.
Updated by deganza 11 over 5 years ago
- File tvh_CPU.jpg tvh_CPU.jpg added
deganza 11 wrote:
Flole Systems wrote:
Use gdb to check what they are doing. Then you know why they're "locked up".
I'm sorry but I don't know how to use this command with tvh.
deganza 11 wrote:
Flole Systems wrote:
Use gdb to check what they are doing. Then you know why they're "locked up".
I'm sorry but I don't know how to use this command with tvh.
Here's you see what I mean. This are the used cpu ressources of the last 24 hours (red).
I'ver every day the same pattern. The cpu rises and then stays all the time at 90%.
Updated by bas co about 5 years ago
Same here, did the HTOP thing, thv.dvr process at 100%. using master build of today.
tried different setting, but nothing improved the situation so far.
i'm using mergerfs. do my initial thoughts where that tvh does not cope well with mergerfs.
but no errors in mergerfs log.
Updated by Flole Systems about 5 years ago
Then same thing for you aswell:
Flole Systems wrote:
Use gdb to check what they are doing. Then you know why they're "locked up".
Updated by bas co about 5 years ago
ok, i did some research.
i'm using mergerfs/fuse to store my recordings.
this is what i found:
as soon as one of the disks in my jbod that tvheadend is recording onto is full, the recording thread stops recording and the cpu goes to 100%.
this is what i did:
i configured mergerfs to always use the disk that has most free space (category.create=mfs instead of the disk that has the correct directory tree).
so tvheadend does not encounter full disks anymore, except the one occasion where my entire jbod is full.
Updated by Jonathan S over 4 years ago
This bug also surfaces when the user running Tvheadend does not have write permissions on the recording directory.
Updated by Hugh Phoenix-Hulme over 4 years ago
I'm also getting 99% from two instances of tvh:pvr. However, I have no "funny business" in the pipeline, it's just writing to a local disk to which it has permission to write. Two threads are chewing up 100% CPU, IO is quite low. There are also two idle instances of tvh:pvr.
The stream profile is 'pass', and by comparison, ffmpeg streaming h264 1080p video from video cameras to the same disk is hardly taking any CPU time at all. Those processes should be doing roughly the same: copying the data directly onto disk.
I get this when there's one programme recording (I imagine the other is for Timeshift) - I'll see what happens with no recordings tomorrow (or else I'll have to wail til 1am)
Updated by Flole Systems over 4 years ago
Same thing for you then: Use gdb to find out what exactly is looping so fast/using that much CPU.
Updated by Hugh Phoenix-Hulme over 4 years ago
I get it with no recordings. Just two instances of htv:pvr sitting there at 99% CPU.
Disabling Timeshift makes no difference.
Restarting the service brings it back down.
Enabling Timeshift doesn't make it increase.
Recording doesn't make it increase.
Deleting the recording doesn't make it increase.
Updated by Flole Systems over 4 years ago
- Status changed from New to Invalid
Information needed are not provided. This can be re-opened once gdb traces are provided.
Updated by Hugh Phoenix-Hulme over 4 years ago
Unfortunately "use gdb" isn't itself enough information...
Updated by Stephen West over 4 years ago
I think I'm getting hit with this issue:
Flole Systems wrote:
Use gdb to check what they are doing. Then you know why they're "locked up".
I'm not sure how to gdb this right now, but I can strace:
(gdb) bt full #0 0xb7f2bbb5 in __kernel_vsyscall () No symbol table info available. #1 0xb5e62208 in futex_wake (private=-1243144192, processes_to_wake=2147483647, futex_word=0x21f4238) at ../sysdeps/unix/sysv/linux/futex-internal.h:231 resultvar = <optimized out> __ret = <optimized out> res = <optimized out> res = <optimized out> __ret = <optimized out> resultvar = <optimized out> #2 __condvar_confirm_wakeup (cond=cond@entry=0x21f4214, private=private@entry=0) at pthread_cond_wait.c:55 No locals. #3 0xb5e629e3 in __pthread_cond_wait_common (abstime=0x0, mutex=0x21f41d4, cond=0x21f4214) at pthread_cond_wait.c:419 err = 22 g = 0 flags = <optimized out> g1_start = <optimized out> signals = <optimized out> result = 0 wseq = <optimized out> seq = 4384125476 private = <optimized out> maxspin = <optimized out> err = <optimized out> result = <optimized out> wseq = <optimized out> g = <optimized out> seq = <optimized out> flags = <optimized out> private = <optimized out> signals = <optimized out> g1_start = <optimized out> __atg101_val = <optimized out> __atg100_val = <optimized out> spin = <optimized out> buffer = <optimized out> cbuffer = <optimized out> __atg101_val = <optimized out> __atg100_val = <optimized out> __atg101_val = <optimized out> __atg100_val = <optimized out> rt = <optimized out> __atg101_val = <optimized out> __atg100_val = <optimized out> __atg102_expected = <optimized out> s = <optimized out> __atg100_val = <optimized out> __atg102_expected = <optimized out> #4 __pthread_cond_wait (cond=0x21f4214, mutex=0x21f41d4) at pthread_cond_wait.c:655 No locals. #5 0x004c3286 in ?? () No symbol table info available. #6 0x0055e2f6 in ?? () ---Type <return> to continue, or q <return> to quit--- No symbol table info available. #7 0x004c242a in ?? () No symbol table info available. #8 0xb5e5c3bd in start_thread (arg=0xa54ffb40) at pthread_create.c:463 pd = 0xa54ffb40 now = <optimized out> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-1243144192, -1521484992, -1243144192, -1521489624, -1199842656, -332536703}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = { prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = 0 #9 0xb5c66fe6 in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:108 No locals. (gdb)
I don't think that was helpful, I don't have symbols for the middle part (#5,6,7).
Is there anything else to try? I'll just restart the service and hope it doesn't come back.
Updated by Stephen West over 4 years ago
forgot I'm running:
HTS Tvheadend 4.3.0~pre+202007131151-0~built202007140102~gitce09077~ubuntu18.04.1
...maybe I shouldn't be running the unstable branch. I just realized that.
Updated by Flole Systems over 4 years ago
That's perfectly fine, the real issue is that you are running a version that contains some kinds of modifications. With that it's impossible to find the issue. Also it seems like for the gdb trace the debug symbols are missing.
Updated by Stephen West over 4 years ago
I didn't think I modified anything. How would I go about getting an unmodified version? I tried to follow the instructions at https://tvheadend.org/projects/tvheadend/wiki/AptRepositories , I think I did the "sudo apt-add-repository ppa:mamarley/tvheadend-git" instead of the Doozer packages. Would it be helpful/dangerous to switch to the official Doozer package?
Updated by Flole Systems over 4 years ago
When you use an official version this would be the right place to ask for support, otherwise you would have to contact the maintainer of the package. You will notice that the version number is completely different on the official repository. There shouldn't be any issues switching the repository but you should make a config backup just to be safe.
Updated by saen acro over 4 years ago
Stephen West wrote:
From attached image is clear that only one of two cores is on 100%
With can mean that cpu is slow for task.
Updated by Martin Klayer over 4 years ago
Flole Systems Systems I also followed the instruction at https://tvheadend.org/projects/tvheadend/wiki/AptRepositories .
What else could I do? What is the correct repro to use.
I am seeing also the 100 % CPU utilization once a recording has be started. But even after the recording has finished and not other tasks are running on the tvhead server, the utilization is staying at 100%
The only workaround ist to perform a cron job which executes systemctl restart tvhead.service.
Updated by Flole Systems over 4 years ago
The correct repo is obviously the one under the official section. Once you have a version from that you should reproduce the issue, note down the PID, identify the Thread that's using the CPU and use gdb to trace what that thread is doing.
Updated by Flole Systems almost 4 years ago
No, still no support for unofficial or modified versions here. It's up to the maintainer to provide support for their custom and modified builds. That will most likely not change anytime soon.
Updated by Erwin Mueller over 3 years ago
Hello,
I have the same issue but I am using Build: 4.2.8-36~g5bdcfd8ac directly from tvheadend repository (stable) via apt-get.
So it should be an official version.
Is there a version out there without having this issue?
Updated by Martin Rehak over 3 years ago
Attaching a truss output from the tvheadend:dvr taking 100% core on FreeBSD. This is from the git repo pull on March 29.
commit dbaa0f850394af8ab845df802f5f781ac0218ec4 (HEAD -> master, origin/master, origin/HEAD)
Does it help anyhow? How to use gdb to find what the thread is doing? Is the stack trace enough?
Updated by Martin Schlatter about 3 years ago
Same here: after a few days, CPU goes up to 100%, the subthread is :dvr, no recording is active.
tvheadend/focal,now 4.3-1979~g8fc2dfa7e~focal
Updated by Martin Schlatter about 3 years ago
Here is a part of the syslog. What is remarkable is that the movie title contains the special character "ä" ("läuft").
And also that the path "/media/nuc/16ef5d00-a19a-492a-a7cb-7770e06165f5" does not exist.
The problem are not the permissions, normally recording works.
Oct 11 18:04:30 nuc tvheadend[602567]: dvr: "Deutschland 24/7 Ohne uns läuft nichts!" on "DMAX" recorder starting Oct 11 18:04:30 nuc tvheadend[602567]: mpegts: 282MHz in DVBC - tuning on DRXK DVB-C DVB-T #2 : DVB-C #0 Oct 11 18:04:30 nuc tvheadend[602567]: subscription: 0346: "DVR: Deutschland 24/7 Ohne uns läuft nichts!" subscribing on channel "DMAX", weight: 300, adapter: "DRXK DVB-C DVB-T #2 : DVB-C #0", network: "DVBC", mux: "282MHz", provider: "Unitymedia", service: "DMAX", profile="pass", username="" Oct 11 18:05:00 nuc tvheadend[602567]: pass: /media/video-ssd/Deutschland 24/7 Ohne uns lauft nichts!/Deutschland 24-7 Ohne uns lauft nichts!.ts: Unable to create file, open failed -- Permission denied Oct 11 18:05:00 nuc tvheadend[602567]: dvr: Recording error: "/media/video-ssd/Deutschland 24/7 Ohne uns lauft nichts!/Deutschland 24-7 Ohne uns lauft nichts!.ts": Unable to open file Oct 11 18:05:00 nuc tvheadend[602567]: subscription: 0346: "DVR: Deutschland 24/7 Ohne uns läuft nichts!" unsubscribing from "DMAX", username="" Oct 11 18:05:00 nuc tvheadend[602567]: dvr: unable to stat file '/media/video-ssd/Deutschland 24/7 Ohne uns lauft nichts!/Deutschland 24-7 Ohne uns lauft nichts!.ts' Oct 11 18:05:00 nuc tvheadend[602567]: dvr: "Deutschland 24/7 Ohne uns läuft nichts!" on "DMAX": End of program: File not created Okt 11 18:47:30 nuc tvheadend[1102820]: dvr: unable to stat file '/media/video-ssd/Deutschland 24/7 Ohne uns lauft nichts!/Deutschland 24-7 Ohne uns lauft nichts!.ts' Okt 11 18:47:30 nuc tvheadend[1102820]: dvr: unable to stat file '/media/video-ssd/Deutschland 24/7 Ohne uns lauft nichts!/Deutschland 24-7 Ohne uns lauft nichts!.ts' Okt 11 18:47:30 nuc tvheadend[1102820]: dvr: unable to stat file '/media/video-ssd/Deutschland 24/7 Ohne uns lauft nichts!/Deutschland 24-7 Ohne uns lauft nichts!.ts' Okt 11 18:47:30 nuc tvheadend[1102820]: dvr: unable to stat file '/media/nuc/16ef5d00-a19a-492a-a7cb-7770e06165f5/Deutschland 24/7 Ohne uns lauft nichts!/Deutschland 24-7 Ohne uns lauft nichts!-2.ts' Okt 11 18:47:30 nuc tvheadend[1102820]: dvr: unable to stat file '/media/video-ssd/Deutschland 24/7 Ohne uns lauft nichts!/Deutschland 24-7 Ohne uns lauft nichts!.ts' Okt 11 18:47:30 nuc tvheadend[1102820]: dvr: unable to stat file '/media/video-ssd/Deutschland 24/7 Ohne uns lauft nichts!/Deutschland 24-7 Ohne uns lauft nichts!.ts'
Updated by Martin Schlatter about 3 years ago
Oh and it is a serie and the title contains a "/" so the constructed path is wrong.
Updated by Martin Schlatter about 3 years ago
The directory "Deutschland 24" was created by another user, so there was indeed a permission problem at 18:05. This caused a looping "stat" at 18:47
Updated by Martin Schlatter about 3 years ago
In dvr_vfsmgr.c
there is a function dvr_vfs_refresh_entry
. this function loops over files and does a check stat(filename, &st).
If the return value from stat is <0, it calls tvherror(LS_DVR, "unable to stat file '%s'", filename) and ignores the file.
But if the return code from stat is <0 because the file is there but cannot be accessed because permission is denied, the loop is endless.
Updated by saen acro about 3 years ago
Martin Schlatter wrote:
In
dvr_vfsmgr.c
there is a functiondvr_vfs_refresh_entry
. this function loops over files and does a check stat(filename, &st).
If the return value from stat is <0, it calls tvherror(LS_DVR, "unable to stat file '%s'", filename) and ignores the file.
But if the return code from stat is <0 because the file is there but cannot be accessed because permission is denied, the loop is endless.
https://github.com/tvheadend/tvheadend/blob/master/src/dvr/dvr_vfsmgr.c#L85
can you suggest PR with fix
And message: permission denied
Updated by Flole Systems about 3 years ago
I'm not really sure why that loop should be endless? It's a foreach-loop, so it shouldn't be endless.
Updated by Martin Schlatter about 3 years ago
TAILQ_FOREACH tries to traverse a directory, but does not have permission.
Why is he doing
htsmsg_delete_field(m, "fsid"); htsmsg_delete_field(m, "size");
and what happens if there is no fsid and no size?