Bug #6009
[DVR] 100%CPU after running some days
0%
Description
I'm using tvheadend for a long time. Every now and then this issue happens. But the last weeks this issue reoccurs on a daily basis.
I think that's because over the time we collect over 5000 recordings - but I don't like to delete them to verify...
System: Ubuntu 20.04.2 LTS
Memory: 32 GB
Disk: 24 TB btrfs raid 5
top -H shows tvh:dvr on first place
strace tvheadend repeats the lines
futex(0x5631f0daed60, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x5631f0c4e028, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1613940659, tv_nsec=0}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT
strace tvh:dvr repeats the line
futex(0x5631f3e0adf4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
killing the tvh:dvr thread or restarting tvheadend solves the issue for a while.
History
Updated by Flole Systems over 3 years ago
Update to latest master which contains many improvements and fixes.
Then please check with gdb what the Thread is doing.
Updated by Robert Heel over 3 years ago
As the cpu load is high again I just checked. I will now update to commit 00b35ec7803388eb08e4835a1df821283ddef4a9 and check these days again.
A endless loop in src/dvr/dvr_rec.c
1667 while(run) {
(gdb) n
1668 sm = TAILQ_FIRST(&sq->sq_queue);
(gdb) n
1669 if(sm == NULL) {
(gdb) n
1670 tvh_cond_wait(&sq->sq_cond, &sq->sq_mutex);
(gdb) n
1671 continue;
(gdb) n
1667 while(run) {
(gdb) n
1668 sm = TAILQ_FIRST(&sq->sq_queue);
(gdb) n
1669 if(sm == NULL) {
(gdb) n
1670 tvh_cond_wait(&sq->sq_cond, &sq->sq_mutex);
(gdb) n
1671 continue;
(gdb) n
Updated by Flole Systems over 3 years ago
Something is waking up the thread apparently. If that's done constantly you need to see what exactly is sending those wakeup instructions.
Updated by Robert Heel over 3 years ago
How? I have some coding skills, but I'm not familiar with threads...
TAILQ_FIRST should not be NULL? as long as it is NULL, code will loop ...
New version runs 22 hours without hang - will try next time to dig deeper.
Updated by Flole Systems over 3 years ago
The tvh_cond_wait should wait until there's a wakeup event. So you need to figure out where that wakeup event is coming from.
Updated by Robert Heel over 3 years ago
Also happens with current git version.
1669 if(sm == NULL) {
(gdb) n
1670 tvh_cond_wait(&sq->sq_cond, &sq->sq_mutex);
(gdb) s
tvh_cond_wait (cond=cond@entry=0x7f1b900e7020, mutex=mutex@entry=0x7f1b900e6fb0) at src/tvh_thread.c:353
353 {
(gdb) n
357 const char *filename = NULL;
(gdb) n
359 tvh_mutex_check_magic(mutex, NULL, 0);
(gdb) n
360 if (tvh_thread_debug > 0)
(gdb) n
363 r = pthread_cond_wait(&cond->cond, &mutex->mutex);
(gdb) p cond->cond
$1 = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_MONOTONIC, Shared = No}
(gdb) p mutex->mutex
$2 = pthread_mutex_t = {Type = Adaptive, Status = Destroyed}
(gdb) p r
$3 = <optimized out>
Is it correct that the system call pthread_cond_wait should wait?
Found an error in syslog around the time this happend, too:
Feb 28 04:09:00 server tvheadend602547: dvr: unable to create unique name (missing $n in format string?)
Feb 28 04:09:00 server tvheadend602547: 2021-02-28 04:09:00.027 [ ERROR] dvr: unable to create unique name (missing $n in format string?)
I will create a watchdog to write something in the syslog so maybe I can see if this happens before or after...
Updated by Robert Heel over 3 years ago
Everytime the error "unable to create unique name" occurs, the thread uses 100% CPU
Mar 1 23:45:01 server root: tvheadend 0,0 % CPU
Mar 1 23:46:01 server root: tvheadend 0,0 % CPU
Mar 1 23:46:14 server tvheadend685657: dvr: "Young Sheldon" on "ProSieben" recorder starting
Mar 1 23:46:14 server tvheadend685657: 2021-03-01 23:46:14.004 [ INFO] dvr: "Young Sheldon" on "ProSieben" recorder starting
Mar 1 23:46:14 server tvheadend685657: 2021-03-01 23:46:14.005 [ INFO] mpegts: 12544.75H in DVB-S Network - tuning on TurboSight TBS 6902 DVB-S/S2 #1 : DVB-S #0
Mar 1 23:46:14 server tvheadend685657: mpegts: 12544.75H in DVB-S Network - tuning on TurboSight TBS 6902 DVB-S/S2 #1 : DVB-S #0
Mar 1 23:46:14 server tvheadend685657: subscription: 0017: "DVR: Young Sheldon" subscribing on channel "ProSieben", weight: 500, adapter: "TurboSight TBS 6902 DVB-S/S2 #1 : DVB-S #0", network: "DVB-S Network", mux: "12544.75H", provider: "ProSiebenSat.1", service: "ProSieben", profile="matroska", username="martina"
Mar 1 23:46:14 server tvheadend685657: 2021-03-01 23:46:14.352 [ INFO] subscription: 0017: "DVR: Young Sheldon" subscribing on channel "ProSieben", weight: 500, adapter: "TurboSight TBS 6902 DVB-S/S2 #1 : DVB-S #0", network: "DVB-S Network", mux: "12544.75H", provider: "ProSiebenSat.1", service: "ProSieben", profile="matroska", username="martina"
Mar 1 23:46:44 server tvheadend685657: dvr: unable to create unique name (missing $n in format string?)
Mar 1 23:46:44 server tvheadend685657: 2021-03-01 23:46:44.034 [ ERROR] dvr: unable to create unique name (missing $n in format string?)
Mar 1 23:46:44 server tvheadend685657: 2021-03-01 23:46:44.034 [ ERROR] dvr: Recording error: "Young Sheldon": Unable to create file
Mar 1 23:46:44 server tvheadend685657: 2021-03-01 23:46:44.034 [ INFO] subscription: 0017: "DVR: Young Sheldon" unsubscribing from "ProSieben", username="martina"
Mar 1 23:46:44 server tvheadend685657: 2021-03-01 23:46:44.034 [ INFO] dvr: "Young Sheldon" on "ProSieben": End of program: File not created
Mar 1 23:46:44 server tvheadend685657: dvr: Recording error: "Young Sheldon": Unable to create file
Mar 1 23:46:44 server tvheadend685657: subscription: 0017: "DVR: Young Sheldon" unsubscribing from "ProSieben", username="martina"
Mar 1 23:46:44 server tvheadend685657: dvr: "Young Sheldon" on "ProSieben": End of program: File not created
Mar 1 23:47:01 server root: tvheadend 93,3 % CPU
Mar 1 23:48:01 server root: tvheadend 100,0 % CPU
Updated by Flole Systems over 3 years ago
That issue is known. You need to figure out why it keeps looping if no file can be created.
Updated by Robert Heel over 3 years ago
IMHO there are two issues:
1) Something goes wrongs when the filename is generated (I will try to figure that out)
2) The "return -1;" in dvr_rec.c line 1194 (after "unable to create unique name" message) seems not to end the recording thread.
Updated by Robert Heel over 3 years ago
1) Now I got the error message "missing $n in format string?" - Yes, I missed the $n in the recording file name configuration. I have only the data configured. The series is send twice a day, so the second one failes.
2) I had to debug the content of *sp (streaming_queue_t) anyway.
while(run) {
sm = TAILQ_FIRST(&sq->sq_queue);
if(sm == NULL) {
tvh_cond_wait(&sq->sq_cond, &sq->sq_mutex);
continue;
}
sq_queue seems to be empty, but sq_cond does not wait
How can I debug *sq in gdb, or how can I write a log with the content of *sq ?
Updated by Marv Teich over 3 years ago
Same Problem here with tvheadend: version 4.3-1923~gaaca05cc1
Any suggestions on how i can support you with informations? I'm not familiar with the gdb and strace things.
Updated by Flole Systems over 3 years ago
No need to provide additional information. The cause seems pretty clear (Tvheadend being unable to create the recording file) and it's kinda a misconfiguration that causes it. While Tvheadend shouldn't utilize the entire CPU in that case it needs further investigating to see why it's not sleeping/constantly waking up.
Updated by Marv Teich over 3 years ago
Hey guys,
just to let you know. As already mentioned i had the same problem. I now noticed that this issue appears everytime when there's an "ß" in the name of the show. I'm saving onto a NAS via NFS.
I looked through the recordingsettings and noticed that the charset is set to "ASCII". I changed it to "UTF-8" and tested with a record of a show with a "ß" in the name and it worked without any issues anymore.
Maybe this helps someone.
Updated by Flole Systems over 3 years ago
In the end it's the same issue: Tvheadend is unable to create the file and then causes that issue. It's kinda both, configuration and Tvheadend issue. You should not configure settings that don't work and Tvheadend should not go completely crazy if you do.