Project

General

Profile

Bug #5492

Crash after mutex magic-error

Added by Flole Systems almost 6 years ago. Updated over 5 years ago.

Status:
Fixed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
2019-01-11
Due date:
% Done:

100%

Estimated time:
Found in version:
4.3-1718~g8e0dd2bee
Affected Versions:

Description

I think I just had a mutex magic-error which caused a crash:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
120     ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory
[Current thread is 1 (Thread 0x7fb9c5eeb700 (LWP 19048))]
(gdb) bt
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
#1  0x00007fb9d276d4d3 in _IO_vfprintf_internal (s=s@entry=0x7fb9c5eea2e0, format=format@entry=0x55a5fc5aa5a0 "mutex %p locked in: %s:%i (thread %ld)\n",
    ap=ap@entry=0x7fb9c5eea488) at vfprintf.c:1643
#2  0x00007fb9d2798910 in _IO_vsnprintf (string=0x7fb9c5eea4a0 "mutex 0x55a600068640 locked in: tsdemux.c:265)\n", maxlen=<optimized out>,
    format=0x55a5fc5aa5a0 "mutex %p locked in: %s:%i (thread %ld)\n", args=0x7fb9c5eea488) at vsnprintf.c:114
#3  0x000055a5fb5fe591 in htsbuf_vqprintf (hq=0x7fb9c5eea640, fmt=0x55a5fc5aa5a0 "mutex %p locked in: %s:%i (thread %ld)\n", ap0=0x7fb9c5eea530)
    at src/htsbuf.c:258
#4  0x000055a5fb5fe7bc in htsbuf_qprintf (hq=0x7fb9c5eea640, fmt=0x55a5fc5aa5a0 "mutex %p locked in: %s:%i (thread %ld)\n") at src/htsbuf.c:301
#5  0x000055a5fb5aa873 in tvh_thread_mutex_failed (mutex=0x55a600068640, reason=0x55a5fc5aa4df "magic",
    filename=0x55a5fc676b80 "src/input/mpegts/tsdemux.c", lineno=265) at src/tvh_thread.c:480
#6  0x000055a5fb5a9ca1 in tvh_mutex_check_magic (mutex=0x55a600068640, filename=0x55a5fc676b80 "src/input/mpegts/tsdemux.c", lineno=265)
    at src/tvh_thread.c:148
#7  0x000055a5fb5aa15b in tvh__mutex_lock (mutex=0x55a600068640, filename=0x55a5fc676b80 "src/input/mpegts/tsdemux.c", lineno=265) at src/tvh_thread.c:252
#8  0x000055a5fb6bbc68 in ts_recv_raw (t=0x55a600068450, tspos=20829924172, tsb=0x55a60019e179 "G@\021\025", len=188) at src/input/mpegts/tsdemux.c:265
#9  0x000055a5fb6b84ec in mpegts_input_process (mi=0x55a600571b70, mpkt=0x55a60019d420) at src/input/mpegts/mpegts_input.c:1519
#10 0x000055a5fb6b91de in mpegts_input_thread (p=0x55a600571b70) at src/input/mpegts/mpegts_input.c:1746
#11 0x000055a5fb5a9ac8 in thread_wrapper (p=0x55a6006461f0) at src/tvh_thread.c:91
#12 0x00007fb9d3dad6db in start_thread (arg=0x7fb9c5eeb700) at pthread_create.c:463
#13 0x00007fb9d283188f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Also if the magic check really failed, is it possible to look what's written into the mutex? Maybe it becomes more clear where the memory corruption is coming from.

History

#1

Updated by Jaroslav Kysela almost 6 years ago

This is exactly what the code which crashes it trying to do. You may inspect the structures though gdb in this case using the core dump or run tvh inside gdb.

#2

Updated by Flole Systems almost 6 years ago

It is not supposed to crash in vsnprintf.c, is it? I would expect it to log something and then crash but not crash while trying to write "mutex %p locked in: %s:%i (thread %ld)\n" somewhere.

Any Idea what I might be looking at here?

(gdb) x/fs 0x55a600068640
0x55a600068640: "\354\205ì\300\025\367\332H6*Y\353\230\371\201dr\200ǒ\273H\314\033\346\314\340\t\037Oj&\375\305\035\250\026\344<mp/\263\265\031َV\235\310BF\004\341\275J\315\364\024\375\023_݅\031\b\017/\177G]\023\264P)\275\020G5DJp!,\030\005\a\312\321f\277i\317Z\340\354\345)\231'N1w{W\216Km\275\360\324[L\242\030\213\262oH\202\063CfP" 

#3

Updated by Jaroslav Kysela almost 6 years ago

It's better to go to frame 8 and inspect the real problem (why the mutex is invalid). The printf receives probably a string pointer which points outside the valid memory range, thus it crashes.

#4

Updated by Flole Systems almost 6 years ago

The printf receives a string pointer which is outside the valid memory range because you are not reacting in tvh_thread_mutex_failed to the reason: magic properly: From that point on you know that the mutex has been corrupted (otherwise the magic check wouldn't fail), you can not try to get information from the corrupted mutex then, instead I would suggest simply writing "mutex was corrupted, this should never happen! Please report this as a bug." to a log and then crashing (so even the average user knows what happens and can post an issue here).

With the

(gdb) x/fs 0x55a600068640
0x55a600068640: "\354\205ì\300\025\367\332H6*Y\353\230\371\201dr\200ǒ\273H\314\033\346\314\340\t\037Oj&\375\305\035\250\026\344<mp/\263\265\031َV\235\310BF\004\341\275J\315\364\024\375\023_݅\031\b\017/\177G]\023\264P)\275\020G5DJp!,\030\005\a\312\321f\277i\317Z\340\354\345)\231'N1w{W\216Km\275\360\324[L\242\030\213\262oH\202\063CfP" 

I am already looking at the mutex, so now I need to know what exactly that is that is in the memory region where the mutex is supposed to be at this point. It could of course also be that t got corrupted and therefor t->s_stream_mutex aswell but as I do not know how it should look like I do not know if it has been corrupted or not.

#5

Updated by Jaroslav Kysela almost 6 years ago

The mutex is invalid, so you need to inspect the higher structures (using the lines from the backtrace in the issue description):

gdb) frame 8               # rollback to stack at src/input/mpegts/tsdemux.c:265
gdb) print *t              # print the mpegts_service_t structure contents
#6

Updated by Flole Systems almost 6 years ago

Alright, i think it corrupted a little more than just the mutex there:

(gdb) print *t
$1 = {{s_id = {in_uuid = {bin = "\267\321ì{\374\230V\213\071@\377j\030\350J"}, in_link = {left = 0x8897f0125293c33, right = 0x72bb4b0f522dc814, parent = 0x4577ef26dc7b816d, color = -1088868606}, in_domain_link = {left = 0x11d6c1f3964f4335, right = 0x5217fb04c7f55982, parent = 0xa60ff1b52e12f4ae, color = -1274100400}, in_domain = 0xe36d3c9497d5d646, in_class = 0x7000000000000f0, in_access = 0x128917c083004c03, in_save = 0xa3502ad945d0ca28}, s_all_link = {tqe_next = 0x4ff9a80952c681d8, tqe_prev = 0xee1a8fe081607bfd}, s_status = (SERVICE_RUNNING | SERVICE_ZOMBIE | unknown: 1351412796), s_refcount = 1346580412, s_type = (STYPE_RAW_REMOVED | unknown: 1998324272), s_source_type = (S_MPEG_PS | S_OTHER | unknown: 1082087108), s_servicetype = 2126421834, s_enabled = 254110891, s_verified = -1606877480, s_auto = -536440540, s_prio = 206113048, s_type_user = -684286477, s_pts_shift = 659527190, s_active_link = {le_next = 0x63289ebfe1a449e4, le_prev = 0xa5076748f13d4ecc}, s_subscriptions = {lh_first = 0xfac43aeea859cf5d}, s_is_enabled = 0xd4d6db40db74680c, s_enlist = 0xb804fa2fd14acb0a, s_start_feed = 0xbedc439a92844f67, s_refresh_feed = 0x7afafd65e5a3ed49, s_stop_feed = 0x41558d113bd5de59, s_config_save = 0x97db02041e42cabd, s_setsourceinfo = 0x6b459027d07aa6e1, s_grace_period = 0xc31ffab005e0bc2, s_delete = 0x226fb14db6354876, s_unref = 0xdebcef7bcd7f0fc0, s_pid_list = 0xc363a0e28665248a, s_satip_source = 0xcd010260e24c8a34, s_memoryinfo = 0x7e3ea02824e14aff, s_unseen = 0x4dbeb7fca474d90, s_channel_number = 0x6264ff71d778040f, s_channel_name = 0x1090811a7ae185, s_source = 0xa681030000000000, s_channel_epgid = 0x8d5f2bee5d200c2b, s_channel_tags = 0xa53f67eed71bc257, s_provider_name = 0x394d21e2db69f495, s_channel_icon = 0x54f97cba6154a22c, s_mapped = 0xafcee77b17683d1b, s_nicename = 0x9fcfe7f3f97c3e9f <error: Cannot access memory at address 0x9fcfe7f3f97c3e9f>, s_nicename_prefidx = -201753538, s_tt_commercial_advice = (COMMERCIAL_YES | COMMERCIAL_NO | unknown: 1050660836), s_tt_clock = 8952768824557697404, s_channels = {lh_first = 0xbdbb18a713e8f3f9}, s_sm_onqueue = -1239038037, s_ps_onqueue = 761985246, s_ps_link = {tqe_next = 0x41a437d9450c51a6, tqe_prev = 0x148e12642bb3a119}, s_receive_timer = {mti_link = {le_next = 0x2804e0c3153a4eba, le_prev = 0xe2089c385d4e58ad}, mti_magic1 = 3274517595, mti_callback = 0xab991641ca9ec430, mti_opaque = 0xfe8eab53edf99fd6, mti_expire = -8304337808350014708}, s_timeout = -1347012676, s_grace_delay = -2030189328, s_start_time = 3465222498043998977, s_stream_mutex = {mutex = pthread_mutex_t = {Type = Normal, Status = Acquired, possibly with waiters, Owner ID = 751011308, Robust = No, Shared = No, Protocol = Priority inherit}, magic1 = 3006230637, tid = -4764522230966608554, filename = 0xdd5f13fd14f4cd4a <error: Cannot access memory at address 0xdd5f13fd14f4cd4a>, lineno = 252189061, tstamp = 3839055611984655379, waiters = {lh_first = 0x705182c21704a44}, link = {tqe_next = 0xe05acf69bf66d1ca, tqe_prev = 0x77314e279929e5ec}, magic2 = 1267619707}, s_streaming_status = 413289563, s_streaming_live = 1215279755, s_running = 1715680130, s_pending_restart = 80, s_dvb_forcecaid = 0, s_descramblers = {lh_first = 0xa5de01596f2c514f}, s_scrambled_seen = 15 '\017', s_scrambled_pass = 136 '\210', s_descramble = 0xdf40fd3f452d15f0, s_descrambler = 0xa4d44f8e087487fa, s_descramble_info = 0x27efb70197838349, s_components = {set_all = {tqh_first = 0x983b75241961ba9, tqh_last = 0xe0e00c0001ee2fb0}, set_filter = {tqh_first = 0xf14765bdbba2067a, tqh_last = 0x9d203c882700de98}, set_pmt_pid = 12906, set_pcr_pid = 46489, set_service_id = 53963, set_subsys = -308630774, set_nicename = 0xc8af6493e161f21f <error: Cannot access memory at address 0xc8af6493e161f21f>, set_service = 0x1ad0bb32c0b4195e, set_last_pid = 34785, set_last_es = 0x4050653390efb3b4}, s_streaming_pad = {sp_targets = {lh_first = 0x140862e7a6118e56}, sp_ntargets = -1218133845, sp_reject_filter = -864484105}, s_tei_log = {last = 8871964237665079435, count = 15127680404592788718}, s_lcns = {lh_first = 0x39fe9a8bb62c09d}, s_hbbtv = 0xac64ff48c00a3cba}, s_update_pids = 0xff90f884009f20dd, s_link = 0x77f248c0c5414083, s_unlink = 0x1c10518e08a68eda, s_dvb_subscription_flags = 1487052526, s_dvb_subscription_weight = -1123831074, s_pids = 0x5c7cc2e129101f4a, s_masters = {is_array = 0x81fa7f8a5a2ae0dd, is_alloc = 32959, is_count = 75611494798474752, is_sorted = 174 '\256'}, s_slaves = {is_array = 0x2f22859a5a4f3f49, is_alloc = 10267091284095922826, is_count = 14932421766499655051, is_sorted = 26 '\032'}, s_slaves_pids = 0xc8d6fd49d976077f, s_dvb_channel_num = 903652653, s_dvb_channel_minor = 14376, s_dvb_channel_dtag = 83 'S', s_dvb_svcname = 0xfcb6844e0072b953 <error: Cannot access memory at address 0xfcb6844e0072b953>, s_dvb_provider = 0x97c0eb9595aa9837 <error: Cannot access memory at address 0x97c0eb9595aa9837>, s_dvb_cridauth = 0x8582d0f4a852ffee <error: Cannot access memory at address 0x8582d0f4a852ffee>, s_dvb_servicetype = 18640, s_dvb_ignore_eit = -1059459688, s_dvb_charset = 0xf8968c07fc396e1a <error: Cannot access memory at address 0xf8968c07fc396e1a>, s_dvb_prefcapid = 8070, s_dvb_prefcapid_lock = -299213203, s_dvb_created = 8790250450600574959, s_dvb_last_seen = -496721123882617122, s_dvb_check_seen = 5926914129195786226, s_dvb_eit_enable = -1733629967, s_dvb_opentv_chnum = 10003236624828515327, s_dvb_opentv_id = 15608, s_atsc_source_id = 26475, s_dvb_mux_link = {le_next = 0x8ae3c0a804806eec, le_prev = 0xc0dfdc5184bad8d5}, s_dvb_mux = 0x9f65c377634afa22, s_dvb_active_input = 0xafc701946b8a1bfa, s_tsbuf = {sb_data = 0x18c223a6d44eb4db <error: Cannot access memory at address 0x18c223a6d44eb4db>, sb_ptr = -1128255409, sb_size = -1655963153, sb_err = 55197, sb_bswap = 169 '\251'}, s_tsbuf_last = -499265956088638113, s_pcr_drift = 8447899027268900637, s_cat_opened = 158 '\236', s_pmt_mon = 0xca5100f301000000, s_cat_mon = 0x7d}

I just received another one at the exact same location:

(gdb) bt
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
#1  0x00007f8bff1894d3 in _IO_vfprintf_internal (s=s@entry=0x7f8bf23032e0,
    format=format@entry=0x560b66bf65a0 "mutex %p locked in: %s:%i (thread %ld)\n", ap=ap@entry=0x7f8bf2303488) at vfprintf.c:1643
#2  0x00007f8bff1b4910 in _IO_vsnprintf (
    string=0x7f8bf23034a0 "mutex 0x560b6b3e8780 locked in: tsdemux.c:265)\n",
    maxlen=<optimized out>,
    format=0x560b66bf65a0 "mutex %p locked in: %s:%i (thread %ld)\n",
    args=0x7f8bf2303488) at vsnprintf.c:114
#3  0x0000560b65c4a591 in htsbuf_vqprintf (hq=0x7f8bf2303640,
    fmt=0x560b66bf65a0 "mutex %p locked in: %s:%i (thread %ld)\n",
    ap0=0x7f8bf2303530) at src/htsbuf.c:258
#4  0x0000560b65c4a7bc in htsbuf_qprintf (hq=0x7f8bf2303640,
    fmt=0x560b66bf65a0 "mutex %p locked in: %s:%i (thread %ld)\n")
    at src/htsbuf.c:301
#5  0x0000560b65bf6873 in tvh_thread_mutex_failed (mutex=0x560b6b3e8780,
    reason=0x560b66bf64df "magic",
    filename=0x560b66cc2b80 "src/input/mpegts/tsdemux.c", lineno=265)
    at src/tvh_thread.c:480
#6  0x0000560b65bf5ca1 in tvh_mutex_check_magic (mutex=0x560b6b3e8780,
    filename=0x560b66cc2b80 "src/input/mpegts/tsdemux.c", lineno=265)
    at src/tvh_thread.c:148
#7  0x0000560b65bf615b in tvh__mutex_lock (mutex=0x560b6b3e8780,
    filename=0x560b66cc2b80 "src/input/mpegts/tsdemux.c", lineno=265)
    at src/tvh_thread.c:252
#8  0x0000560b65d07c68 in ts_recv_raw (t=0x560b6b3e8590, tspos=21436341108,
    tsb=0x560b6c16724d "G@\021\032", len=188) at src/input/mpegts/tsdemux.c:265
#9  0x0000560b65d044ec in mpegts_input_process (mi=0x560b6bc84470,
    mpkt=0x560b6c164270) at src/input/mpegts/mpegts_input.c:1519
#10 0x0000560b65d051de in mpegts_input_thread (p=0x560b6bc84470)
    at src/input/mpegts/mpegts_input.c:1746
#11 0x0000560b65bf5ac8 in thread_wrapper (p=0x560b6bd28420) at src/tvh_thread.c:91
#12 0x00007f8c007c96db in start_thread (arg=0x7f8bf2304700)
    at pthread_create.c:463
#13 0x00007f8bff24d88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) set pagination off
(gdb) frame 8
#8  0x0000560b65d07c68 in ts_recv_raw (t=0x560b6b3e8590, tspos=21436341108, tsb=0x560b6c16724d "G@\021\032", len=188) at src/input/mpegts/tsdemux.c:265
265       tvh_mutex_lock(&t->s_stream_mutex);
(gdb) print *t
$1 = {{s_id = {in_uuid = {bin = "\260\032\316i\vV\000\000`\n\362i\vV\000"}, in_link = {left = 0x0, right = 0x0, parent = 0x1525821180fe0f7f, color = 1946353136}, in_domain_link = {left = 0x362ff5f0f869df1, right = 0xf67d0154b20a0a74, parent = 0x7e28a852740ee9da, color = -703592352}, in_domain = 0x4fc3be97f8ef94ba, in_class = 0xdf8f08fc26f9099b, in_access = 0x550effb0b375a41, in_save = 0x8b8827816232298c}, s_all_link = {tqe_next = 0x3777a6142c8c76ac, tqe_prev = 0xd766dada1aa94408}, s_status = (SERVICE_ZOMBIE | unknown: 816179620), s_refcount = -136717567, s_type = (STYPE_RAW | unknown: 1821566948), s_source_type = (S_MPEG_PS | unknown: 4278998708), s_servicetype = -346051318, s_enabled = -519584778, s_verified = 249321452, s_auto = 1278182081, s_prio = 318992572, s_type_user = -1824954900, s_pts_shift = 1983673491, s_active_link = {le_next = 0x2b4609105d4488bd, le_prev = 0x9f4a46c246b4d103}, s_subscriptions = {lh_first = 0x6545d4a6bd87bc11}, s_is_enabled = 0x7fcf4f9c556bb19b, s_enlist = 0xc4db2b534106c283, s_start_feed = 0xcdf3d3a128c3ec58, s_refresh_feed = 0x55c392a6f4b46f1f, s_stop_feed = 0xb1656aa777ebcdad, s_config_save = 0x11d9a19049db2f00, s_setsourceinfo = 0x86c641e7afe86581, s_grace_period = 0xa37f371ebe0cc80c, s_delete = 0x8803d38761eeff87, s_unref = 0xd60af6b2991aa85d, s_pid_list = 0x5c5b915d595e00ca, s_satip_source = 0x77e2bf951f49b8c4, s_memoryinfo = 0xfe0f57a1cc11e8dd, s_unseen = 0x7cc08fb6ee58a9ca, s_channel_number = 0x34e0f681004969d8, s_channel_name = 0xdb812dc217ec1588, s_source = 0x3ca69607af9d1f19, s_channel_epgid = 0x8cc2823b2f564027, s_channel_tags = 0xd373030f5409d8fa, s_provider_name = 0xa75966571e69e931, s_channel_icon = 0x29a9d6c38da40597, s_mapped = 0x690f049548ee0280, s_nicename = 0x1e0698b791d49a0e <error: Cannot access memory at address 0x1e0698b791d49a0e>, s_nicename_prefidx = -426360444, s_tt_commercial_advice = (COMMERCIAL_NO | unknown: 3325056780), s_tt_clock = 8706009446930076134, s_channels = {lh_first = 0x5ec7abbd1ff19d49}, s_sm_onqueue = -333898461, s_ps_onqueue = -414165256, s_ps_link = {tqe_next = 0x7292a6e345b8fe70, tqe_prev = 0xb703489af05f0e09}, s_receive_timer = {mti_link = {le_next = 0x563ae3c0ef95f69c, le_prev = 0xba36d4e7ea2e6356}, mti_magic1 = 4172316394, mti_callback = 0x1a831d47c61ee050, mti_opaque = 0xf8a3bfead1836bca, mti_expire = -4643286756010577732}, s_timeout = 1978773879, s_grace_delay = 944813899, s_start_time = -5694461432956054224, s_stream_mutex = {mutex = pthread_mutex_t = {Type = Adaptive, Status = Acquired, possibly with waiters, Owner ID = 126151714 (dead), Robust = Yes, Shared = No, Protocol = Priority protect, Priority ceiling = -1379}, magic1 = 2446637583, tid = 6438146630930546822, filename = 0xbb1794762296d7 <error: Cannot access memory at address 0xbb1794762296d7>, lineno = -1899819390, tstamp = -8286160976956479065, waiters = {lh_first = 0x24528a4b3c131a80}, link = {tqe_next = 0x2872d1952c33a7c4, tqe_prev = 0x8ec132b28db01199}, magic2 = 3049348283}, s_streaming_status = -1643122082, s_streaming_live = 284239755, s_running = 1234702203, s_pending_restart = -383118787, s_dvb_forcecaid = 1600, s_descramblers = {lh_first = 0xe06e0cb9bd8b50e1}, s_scrambled_seen = 209 '\321', s_scrambled_pass = 21 '\025', s_descramble = 0x7947b0c261906d7d, s_descrambler = 0xe7ff2765a9881f3, s_descramble_info = 0xee6a67e4d0370a01, s_components = {set_all = {tqh_first = 0x619ce8e95edf56a6, tqh_last = 0x50b89af4c0215edd}, set_filter = {tqh_first = 0x4268099d3352065f, tqh_last = 0xee623048c8b4f304}, set_pmt_pid = 43567, set_pcr_pid = 52486, set_service_id = 37042, set_subsys = -262001669, set_nicename = 0x476a5bb3e4788c5c <error: Cannot access memory at address 0x476a5bb3e4788c5c>, set_service = 0xccc76fd13b88e21, set_last_pid = 54548, set_last_es = 0x23104822dea9f71d}, s_streaming_pad = {sp_targets = {lh_first = 0xc366150ebd9502e7}, sp_ntargets = -1028839355, sp_reject_filter = 57940094}, s_tei_log = {last = -3463219256616753410, count = 1163163164579473387}, s_lcns = {lh_first = 0x8a848ff1fa1d5b5b}, s_hbbtv = 0x422419921badb183}, s_update_pids = 0x146b15a771f72f6, s_link = 0xf2ef37e6bcdbba6f, s_unlink = 0x66ffd203167b8e40, s_dvb_subscription_flags = 1416508311, s_dvb_subscription_weight = 1891750308, s_pids = 0x8d0d57d484e44151, s_masters = {is_array = 0xf34306010a04607c, is_alloc = 3568576246908909009, is_count = 762837407003813613, is_sorted = 138 '\212'}, s_slaves = {is_array = 0x1819b9afb6be3dff, is_alloc = 212931547687522496, is_count = 5011618731290287702, is_sorted = 152 '\230'}, s_slaves_pids = 0x130362410500f8f1, s_dvb_channel_num = 3692030721, s_dvb_channel_minor = 32850, s_dvb_channel_dtag = 10 '\n', s_dvb_svcname = 0xf1e500411fcc1760 <error: Cannot access memory at address 0xf1e500411fcc1760>, s_dvb_provider = 0xfab2cef7d411eaf7 <error: Cannot access memory at address 0xfab2cef7d411eaf7>, s_dvb_cridauth = 0x21b05a257de89e20 <error: Cannot access memory at address 0x21b05a257de89e20>, s_dvb_servicetype = 50017, s_dvb_ignore_eit = 1974573550, s_dvb_charset = 0x33c820fc8f9dbd32 <error: Cannot access memory at address 0x33c820fc8f9dbd32>, s_dvb_prefcapid = 54523, s_dvb_prefcapid_lock = -261314733, s_dvb_created = 7371880113509142288, s_dvb_last_seen = 3108637607182612724, s_dvb_check_seen = 5190608349874739831, s_dvb_eit_enable = 489398714, s_dvb_opentv_chnum = 11439299058252548538, s_dvb_opentv_id = 59365, s_atsc_source_id = 32238, s_dvb_mux_link = {le_next = 0x9953424b9d7a2b5, le_prev = 0xdadc9aa277c0df4a}, s_dvb_mux = 0x356688023fb2300c, s_dvb_active_input = 0xe79c608c0922918f, s_tsbuf = {sb_data = 0xe149c3153d972a75 <error: Cannot access memory at address 0xe149c3153d972a75>, sb_ptr = 1485253010, sb_size = 1331290211, sb_err = 39595, sb_bswap = 56 '8'}, s_tsbuf_last = -2791923520388877994, s_pcr_drift = 44789113648963875, s_cat_opened = 0 '\000', s_pmt_mon = 0x21, s_cat_mon = 0x560b6a17caa0}

Can't we just look at what's written in the memory right before that to see what has to overflow to corrupt memory there? Of course it wouldn't help if something is freed and used afterwards though.

#7

Updated by Jaroslav Kysela almost 6 years ago

So the problem is also in mpegts_input_process() - the service structure is wrong in frame 8.

gdb) frame 9
gdb) print *mps

Perhaps also frame 10 (mpegts_input_thread):

gdb) frame 10
gdb) print *mp
gdb) print *mp->mp_mux
#8

Updated by Flole Systems almost 6 years ago

Alright, this is from the first crash:

(gdb) frame 9
#9  0x000055a5fb6b84ec in mpegts_input_process (mi=0x55a600571b70, mpkt=0x55a60019d420) at src/input/mpegts/mpegts_input.c:1519
1519              ts_recv_raw((mpegts_service_t *)mps->mps_owner, tspos, tsb, llen);
(gdb) print *mps
$4 = {mps_link = {left = 0x0, right = 0x0, parent = 0x55a5ff8e6b60, color = 1}, mps_raw_link = {le_next = 0x0, le_prev = 0x55a5ff642080}, mps_svcraw_link = {le_next = 0x0, le_prev = 0x0}, mps_type = 2, mps_weight = 400, mps_owner = 0x55a600068450}
(gdb) frame 10
#10 0x000055a5fb6b91de in mpegts_input_thread (p=0x55a600571b70) at src/input/mpegts/mpegts_input.c:1746
1746        bytes += mpegts_input_process(mi, mp);
(gdb) print *mp
$5 = {mp_link = {tqe_next = 0x0, tqe_prev = 0x55a600571d78}, mp_len = 21056, mp_mux = 0x55a5fd665500, mp_cc_restart = 0 '\000', mp_data = 0x55a60019d441 "G\031\361\336KU9\264\365\207\244\227q/\317A\332\004\344\344\070\313\063\254t\206^\346\341X\b\377\006\362\177\062B\250I\321\016\322\020\036\257j\377\264\220\a\306\070FU\004\365G_%^\022\335Tq\246\016\332\341\254\"\230\v\213O\362\200\247\027R\205\273\272\270\300L5q2\215\274\242\205E\370\204\022\"\226\017\261\225nW`\217#\vs\366\261a\004\351F\224\353\037\070\f\\\025\375\247_6\232\261\323\347\334zj\277\027\325\t\351}yfuGӢ\025n\243\062\n\017\032\227\021\343\346k?\232\062\352\251Uz\202\017o\306P05\260v0\223\351\no*\230!ӻ.\035\362G\031\361\337n\253\203\222yT\227\253"...}
(gdb) print *mp->mp_mux
$6 = {mm_id = {in_uuid = {bin = "5\274\215\371\020J\032\344\376\031\016\245\250\361E\274"}, in_link = {left = 0x55a5fd51ec40, right = 0x55a5fd5afd70, parent = 0x55a5fd1aeda0, color = 0}, in_domain_link = {left = 0x55a5fd380b70, right = 0x55a5fd6c7430, parent = 0x55a5fd16b790, color = 0}, in_domain = 0x55a5fd182af8, in_class = 0x55a5fcb35e80 <dvb_mux_dvbc_class>, in_access = 0x0, in_save = 0x0}, mm_refcount = 298, mm_network_link = {le_next = 0x55a5fd661b10, le_prev = 0x55a5fd669a48}, mm_network = 0x55a5fd162810, mm_nicename = 0x55a5fd664df0 "346MHz in VodafoneKabel", mm_provider_network_name = 0x55a5ffabcc90 "", mm_onid = 61441, mm_tsid = 10015, mm_tsid_checks = -100, mm_tsid_accept_zero_value = 0, mm_tsid_loglimit = {last = 0, count = 0}, mm_start_monoclock = 7450694915970, mm_created = 1516975005, mm_update_pids_flag = 0, mm_update_pids_timer = {mti_link = {le_next = 0x55a600572090, le_prev = 0x55a5fcc1c3a0 <mtimers>}, mti_magic1 = 224569822, mti_callback = 0x0, mti_opaque = 0x55a5fd665500, mti_expire = 7465947773935}, mm_services = {lh_first = 0x55a5fd668eb0}, mm_scan_first = 1516975014, mm_scan_last_seen = 1547084983, mm_scan_result = MM_SCAN_OK, mm_scan_weight = 0, mm_scan_flags = 0, mm_scan_init = 0, mm_scan_timeout = {mti_link = {le_next = 0x55a60062b7d0, le_prev = 0x55a6006375b0}, mti_magic1 = 224569822, mti_callback = 0x0, mti_opaque = 0x55a5fd665500, mti_expire = 7301580994112}, mm_scan_link = {tqe_next = 0x55a5fd48dec0, tqe_prev = 0x55a5fd1628c8}, mm_scan_state = MM_SCAN_STATE_IDLE, mm_dmc_origin = 0x0, mm_dmc_origin_expire = 0, mm_fastscan_muxes = 0x0, mm_instances = {lh_first = 0x55a6006658c0}, mm_active = 0x55a6005e9100, mm_transports = {lh_first = 0x55a5ff1f2c20}, mm_raw_subs = {lh_first = 0x55a61d11efa0}, mm_input_pos = 20829920788, mm_pids = {first = 0x55a5fdecf040, last = 0x55a5fe341c60, root = 0x55a5ffa02770, entries = 21}, mm_all_subs = {lh_first = 0x0}, mm_last_pid = 17, mm_last_mp = 0x55a5ff8c3d20, mm_num_tables = 13, mm_tables = {lh_first = 0x55a5fdae0d10}, mm_defer_tables = {tqh_first = 0x0, tqh_last = 0x55a5fd665700}, mm_tables_lock = {mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}, magic1 = 3614061450, tid = 19048, filename = 0x0, lineno = 0, tstamp = 7465947914909, waiters = {lh_first = 0x0}, link = {tqe_next = 0x0, tqe_prev = 0x0}, magic2 = 4181353505}, mm_table_queue = {tqh_first = 0x0, tqh_last = 0x55a5fd665780}, mm_descrambler_caids = {lh_first = 0x0}, mm_descrambler_tables = {tqh_first = 0x55a5ff241a50, tqh_last = 0x55a5ff241a50}, mm_descrambler_emms = {tqh_first = 0x0, tqh_last = 0x55a5fd6657a8}, mm_descrambler_lock = {mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}, magic1 = 3614061450, tid = 19048, filename = 0x0, lineno = 0, tstamp = 7465947854908, waiters = {lh_first = 0x0}, link = {tqe_next = 0x0, tqe_prev = 0x0}, magic2 = 4181353505}, mm_descrambler_flush = 0, mm_delete = 0x55a5fb6e4d26 <dvb_mux_delete>, mm_free = 0x55a5fb6c3cbe <mpegts_mux_free>, mm_config_save = 0x55a5fb6e4a2e <dvb_mux_config_save>, mm_display_name = 0x55a5fb6e4b2f <dvb_mux_display_name>, mm_is_enabled = 0x55a5fb6c3e47 <mpegts_mux_is_enabled>, mm_stop = 0x55a5fb6c3f9c <mpegts_mux_stop>, mm_open_table = 0x55a5fb6c62bd <mpegts_mux_open_table>, mm_unsubscribe_table = 0x55a5fb6c664c <mpegts_mux_unsubscribe_table>, mm_close_table = 0x55a5fb6c682e <mpegts_mux_close_table>, mm_create_instances = 0x55a5fb6c3e78 <mpegts_mux_create_instances>, mm_is_epg = 0x55a5fb6c3e64 <mpegts_mux_is_epg>, mm_crid_authority = 0x0, mm_enabled = 1, mm_epg = 101, mm_epg_module_id = 0x55a5fd6651f0 "eit", mm_charset = 0x0, mm_pmt_ac3 = 0, mm_eit_tsid_nocheck = 0, mm_sid_filter = 0}

and this is from the second

(gdb) frame 9
#9  0x0000560b65d044ec in mpegts_input_process (mi=0x560b6bc84470, mpkt=0x560b6c164270) at src/input/mpegts/mpegts_input.c:1519
1519              ts_recv_raw((mpegts_service_t *)mps->mps_owner, tspos, tsb, llen);
(gdb) print *mps
$1 = {mps_link = {left = 0x0, right = 0x0, parent = 0x560b69a08f60, color = 1}, mps_raw_link = {le_next = 0x0, le_prev = 0x560b6bac9270}, mps_svcraw_link = {le_next = 0x0, le_prev = 0x0}, mps_type = 2, mps_weight = 400, mps_owner = 0x560b6b3e8590}
(gdb) frame 10
#10 0x0000560b65d051de in mpegts_input_thread (p=0x560b6bc84470) at src/input/mpegts/mpegts_input.c:1746
1746        bytes += mpegts_input_process(mi, mp);
(gdb) print *mp
$2 = {mp_link = {tqe_next = 0x0, tqe_prev = 0x560b6bc84678}, mp_len = 19740, mp_mux = 0x560b68f451e0, mp_cc_restart = 0 '\000', mp_data = 0x560b6c164291 "G\031\361\221\375O\n\356Oo\270\363\bG\373ψ\005vZQD\004\244@h\204\063\022l\265i\206\rH\353\260>%b\\\335\301\200\247\230\362b\367\230\246\016w\375+'\371\257\270\373\061\035\306#̿\252\233\350}2\274%W\253u\266\065{`*{\330\327T.\265Y\020\264t贗\005\213\302\327`Ț\vc"}
(gdb) print *mp->mp_mux
$3 = {mm_id = {in_uuid = {bin = "5\274\215\371\020J\032\344\376\031\016\245\250\361E\274"}, in_link = {left = 0x560b68e706c0, right = 0x560b68f76230, parent = 0x560b68e796c0, color = 0}, in_domain_link = {left = 0x560b68c735a0, right = 0x560b68fb1060, parent = 0x560b68d858d0, color = 1}, in_domain = 0x560b68a65e48, in_class = 0x560b67181e80 <dvb_mux_dvbc_class>, in_access = 0x0, in_save = 0x0}, mm_refcount = 161, mm_network_link = {le_next = 0x560b68f41870, le_prev = 0x560b68f49748}, mm_network = 0x560b68a56570, mm_nicename = 0x560b690d0890 "346MHz in VodafoneKabel", mm_provider_network_name = 0x560b6b30abd0 "Kabel Deutschland", mm_onid = 61441, mm_tsid = 10015, mm_tsid_checks = -100, mm_tsid_accept_zero_value = 0, mm_tsid_loglimit = {last = 0, count = 0}, mm_start_monoclock = 7535401352293, mm_created = 1516975005, mm_update_pids_flag = 0, mm_update_pids_timer = {mti_link = {le_next = 0x560b67269220 <service_raw_remove_timer>, le_prev = 0x560b672683a0 <mtimers>}, mti_magic1 = 224569822, mti_callback = 0x0, mti_opaque = 0x560b68f451e0, mti_expire = 7551086134662}, mm_services = {lh_first = 0x560b68f48ae0}, mm_scan_first = 1516975014, mm_scan_last_seen = 1547249429, mm_scan_result = MM_SCAN_OK, mm_scan_weight = 0, mm_scan_flags = 0, mm_scan_init = 0, mm_scan_timeout = {mti_link = {le_next = 0x560b68c09130, le_prev = 0x560b6bcf2b40}, mti_magic1 = 224569822, mti_callback = 0x0, mti_opaque = 0x560b68f451e0, mti_expire = 7466029005994}, mm_scan_link = {tqe_next = 0x0, tqe_prev = 0x560b68a56628}, mm_scan_state = MM_SCAN_STATE_IDLE, mm_dmc_origin = 0x0, mm_dmc_origin_expire = 0, mm_fastscan_muxes = 0x0, mm_instances = {lh_first = 0x560b6bced150}, mm_active = 0x560b6bced150, mm_transports = {lh_first = 0x560b68f46f70}, mm_raw_subs = {lh_first = 0x0}, mm_input_pos = 21436328888, mm_pids = {first = 0x560b6c564f10, last = 0x560b69dd7090, root = 0x560b68b44b30, entries = 11}, mm_all_subs = {lh_first = 0x0}, mm_last_pid = 17, mm_last_mp = 0x560b6bac9240, mm_num_tables = 13, mm_tables = {lh_first = 0x560b6b6d5b40}, mm_defer_tables = {tqh_first = 0x0, tqh_last = 0x560b68f453e0}, mm_tables_lock = {mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}, magic1 = 3614061450, tid = 17471, filename = 0x0, lineno = 0, tstamp = 7551088179222, waiters = {lh_first = 0x0}, link = {tqe_next = 0x0, tqe_prev = 0x0}, magic2 = 4181353505}, mm_table_queue = {tqh_first = 0x0, tqh_last = 0x560b68f45460}, mm_descrambler_caids = {lh_first = 0x0}, mm_descrambler_tables = {tqh_first = 0x560b6b33cf60, tqh_last = 0x560b6b33cf60}, mm_descrambler_emms = {tqh_first = 0x0, tqh_last = 0x560b68f45488}, mm_descrambler_lock = {mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}, magic1 = 3614061450, tid = 17471, filename = 0x0, lineno = 0, tstamp = 7551088179222, waiters = {lh_first = 0x0}, link = {tqe_next = 0x0, tqe_prev = 0x0}, magic2 = 4181353505}, mm_descrambler_flush = 0, mm_delete = 0x560b65d30d26 <dvb_mux_delete>, mm_free = 0x560b65d0fcbe <mpegts_mux_free>, mm_config_save = 0x560b65d30a2e <dvb_mux_config_save>, mm_display_name = 0x560b65d30b2f <dvb_mux_display_name>, mm_is_enabled = 0x560b65d0fe47 <mpegts_mux_is_enabled>, mm_stop = 0x560b65d0ff9c <mpegts_mux_stop>, mm_open_table = 0x560b65d122bd <mpegts_mux_open_table>, mm_unsubscribe_table = 0x560b65d1264c <mpegts_mux_unsubscribe_table>, mm_close_table = 0x560b65d1282e <mpegts_mux_close_table>, mm_create_instances = 0x560b65d0fe78 <mpegts_mux_create_instances>, mm_is_epg = 0x560b65d0fe64 <mpegts_mux_is_epg>, mm_crid_authority = 0x0, mm_enabled = 1, mm_epg = 101, mm_epg_module_id = 0x560b68f44ed0 "eit", mm_charset = 0x0, mm_pmt_ac3 = 0, mm_eit_tsid_nocheck = 0, mm_sid_filter = 0}
#9

Updated by Jaroslav Kysela almost 6 years ago

Ok, so the mps_owner pointer to the service is no longer valid. Could you also print variable mp from the frame 9 to see which PID subscription causes this?

#10

Updated by Flole Systems almost 6 years ago

Sure, first crash:

(gdb) frame 9
#9  0x000055a5fb6b84ec in mpegts_input_process (mi=0x55a600571b70, mpkt=0x55a60019d420) at src/input/mpegts/mpegts_input.c:1519
1519              ts_recv_raw((mpegts_service_t *)mps->mps_owner, tspos, tsb, llen);
(gdb) print mp
$1 = (mpegts_pid_t *) 0x55a5ff8c3d20
(gdb) print *mp
$2 = {mp_pid = 17, mp_type = 22, mp_cc = 6 '\006', mp_subs = {first = 0x55a5ff642060, last = 0x55a5fe0d4c60, root = 0x55a5ff8e6b60, entries = 4}, mp_raw_subs = {lh_first = 0x55a5ff642060}, mp_svc_subs = {lh_first = 0x0}, mp_link = {left = 0x0, right = 0x0, parent = 0x55a5fea4eaa0, color = 0}} 

second crash:
(gdb) frame 9
#9  0x0000560b65d044ec in mpegts_input_process (mi=0x560b6bc84470, mpkt=0x560b6c164270) at src/input/mpegts/mpegts_input.c:1519
1519              ts_recv_raw((mpegts_service_t *)mps->mps_owner, tspos, tsb, llen);
(gdb) print *mp
$1 = {mp_pid = 17, mp_type = 22, mp_cc = 11 '\v', mp_subs = {first = 0x560b69bc2bd0, last = 0x560b72987980, root = 0x560b69a08f60, entries = 3}, mp_raw_subs = {lh_first = 0x560b72987980}, mp_svc_subs = {lh_first = 0x0}, mp_link = {left = 0x0, right = 0x0, parent = 0x560b6aec6a70, color = 0}}

#11

Updated by Flole Systems almost 6 years ago

I just had another one and I think I am slowly starting to figure out whats exactly causing it:

At the time of the crash I had a recording, a HTSP client and a SAT-IP Client all watching the same channel. When the SAT-IP Client unsubscribed, the memory corruption occured. However, I can still not reproduce this reliably as this doesn't happen everytime the SAT-IP Client disconnects. Maybe there is a check missing if someone else is still subscribed somewhere?

#12

Updated by Flole Systems almost 6 years ago

Just had another one, same situation as yesterday: One recording, one Sat-IP Client and one htsp client all on the same channel, once i turned the sat-ip device off the crash occurred

#13

Updated by Flole Systems almost 6 years ago

Could you help me setup a watchpoint in gdb for this case? I will know what channel is being watched at the time of the crash so I will know what mutex will get corrupted, so I need something that normally doesn't change during locking/unlocking which i can use to set a watch on to know what exactly writes in that area. This will not detect if "free" is called on that region and it is reallocated afterwards, but I just hope that this is not the case here.

#14

Updated by Flole Systems almost 6 years ago

I'm finally able to (almost) reliably reproduce this, the steps necessary are (I recommend waiting 10+ seconds between them):
1. Use SAT-IP Client to Stream a Channel from tvheadend (For example a TV)
2. Use HTSP to watch the same channel
3. Use HTTP to watch the same channel
4. Turn of/change channel on SAT-IP Client
5. Crash!

@Jaroslav: Could you please try if you are able to reproduce this now aswell? I would really love to see this fixed, this has already caused so many issues for me and until I recently discovered this I was feeling like a i was chasing a phantom.

#15

Updated by Flole Systems almost 6 years ago

I just reproduced this with valgrind running and 1 SAT-IP Client (DVB Viewer Pro Demo) and 1 HTSP Client (I had to try it 2 times though in order to trigger it), and this is what valgrind outputs:

==22730== Thread 56 tvh:mi-main:
==22730== Invalid read of size 4
==22730==    at 0x6235FA0: pthread_mutex_lock (pthread_mutex_lock.c:65)
==22730==    by 0x3FCCEE: ts_recv_raw (tsdemux.c:265)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc3615a0 is 512 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid read of size 4
==22730==    at 0x6235FBE: pthread_mutex_lock (pthread_mutex_lock.c:73)
==22730==    by 0x3FCCEE: ts_recv_raw (tsdemux.c:265)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc3615a0 is 512 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid read of size 4
==22730==    at 0x623600D: pthread_mutex_lock (pthread_mutex_lock.c:78)
==22730==    by 0x3FCCEE: ts_recv_raw (tsdemux.c:265)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc361590 is 496 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid read of size 4
==22730==    at 0x623602A: pthread_mutex_lock (pthread_mutex_lock.c:79)
==22730==    by 0x3FCCEE: ts_recv_raw (tsdemux.c:265)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc361598 is 504 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid write of size 4
==22730==    at 0x623603E: pthread_mutex_lock (pthread_mutex_lock.c:157)
==22730==    by 0x3FCCEE: ts_recv_raw (tsdemux.c:265)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc361598 is 504 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid read of size 4
==22730==    at 0x6236042: pthread_mutex_lock (pthread_mutex_lock.c:159)
==22730==    by 0x3FCCEE: ts_recv_raw (tsdemux.c:265)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc36159c is 508 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid read of size 4
==22730==    at 0x3FC277: service_set_streaming_status_flags (service.h:462)
==22730==    by 0x3FCD1F: ts_recv_raw (tsdemux.c:266)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc361600 is 608 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid read of size 8
==22730==    at 0x3FC255: idnode_set_empty (idnode.h:326)
==22730==    by 0x3FCD31: ts_recv_raw (tsdemux.c:267)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc361718 is 888 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid read of size 4
==22730==    at 0x3FC230: streaming_pad_probe_type (streaming.h:477)
==22730==    by 0x3FCDC5: ts_recv_raw (tsdemux.c:277)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc36169c is 764 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid read of size 4
==22730==    at 0x3FC277: service_set_streaming_status_flags (service.h:462)
==22730==    by 0x3FCDF4: ts_recv_raw (tsdemux.c:281)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc361600 is 608 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid read of size 4
==22730==    at 0x3FCDF9: ts_recv_raw (tsdemux.c:282)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc361604 is 612 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid write of size 4
==22730==    at 0x3FCE08: ts_recv_raw (tsdemux.c:282)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc361604 is 612 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid read of size 4
==22730==    at 0x62377A0: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:38)
==22730==    by 0x62377A0: pthread_mutex_unlock (pthread_mutex_unlock.c:345)
==22730==    by 0x3FC0E5: tvh_mutex_unlock (tvh_thread.h:133)
==22730==    by 0x3FCE1F: ts_recv_raw (tsdemux.c:285)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc3615a0 is 512 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid write of size 4
==22730==    at 0x62377BA: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:48)
==22730==    by 0x62377BA: pthread_mutex_unlock (pthread_mutex_unlock.c:345)
==22730==    by 0x3FC0E5: tvh_mutex_unlock (tvh_thread.h:133)
==22730==    by 0x3FCE1F: ts_recv_raw (tsdemux.c:285)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc361598 is 504 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid read of size 4
==22730==    at 0x62377C1: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:51)
==22730==    by 0x62377C1: pthread_mutex_unlock (pthread_mutex_unlock.c:345)
==22730==    by 0x3FC0E5: tvh_mutex_unlock (tvh_thread.h:133)
==22730==    by 0x3FCE1F: ts_recv_raw (tsdemux.c:285)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc36159c is 508 bytes inside an unallocated block of size 1,168 in arena "client" 
==22730==
==22730== Invalid read of size 4
==22730==    at 0x62377CB: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:54)
==22730==    by 0x62377CB: pthread_mutex_unlock (pthread_mutex_unlock.c:345)
==22730==    by 0x3FC0E5: tvh_mutex_unlock (tvh_thread.h:133)
==22730==    by 0x3FCE1F: ts_recv_raw (tsdemux.c:285)
==22730==    by 0x3F9592: mpegts_input_process (mpegts_input.c:1519)
==22730==    by 0x3FA284: mpegts_input_thread (mpegts_input.c:1746)
==22730==    by 0x2EAB27: thread_wrapper (tvh_thread.c:91)
==22730==    by 0x62336DA: start_thread (pthread_create.c:463)
==22730==    by 0x781188E: clone (clone.S:95)
==22730==  Address 0xc361590 is 496 bytes inside an unallocated block of size 1,168 in arena "client" 

This did not yet cause a crash, most likely because the data there hasn't been overwritten yet?
So it looks like when the SAT-IP Client unsubscribes, the mutex is free'd even though the HTSP client is still subscribed and using that?

#16

Updated by Flole Systems almost 6 years ago

I tried to use gdb's record feature and mozillas rr tool in order to record and replay what exactly is happening (setting a breakpoint and doing a reverse run up to that point), unfortunately those tools are unusable with tvheadend:

gdb record and continue causes "no matching process found" error, and rr causes tvheadends deadlock detector to detect this:

REASON: deadlock (src/tvh_thread.c:512)
mutex 0x55d55ad2c1c0 locked in: src/main.c:1224 (thread 3816)
mutex 0x55d55ad2c1c0   waiting in: src/service_mapper.c:369 (thread 3912)
mutex 0x55d55ad2c1c0   waiting in: src/input/mpegts/scanfile.c:914 (thread 3847)
mutex 0x55d55ad2c1c0   waiting in: src/idnode.c:1956 (thread 3837)
mutex 0x55d55ad2c1c0   waiting in: src/notify.c:103 (thread 3831)

So it seems like I am stuck for now, there's nothing I can think of to further debug this now. Valgrind is not telling me what free'd that memory (I assume that is what happened), and my few attempts to skip some lines using gdb that I thought could cause issues did not help at all.

#17

Updated by saen acro almost 6 years ago

@Flote Systems
Try with https://www.sonarqube.org/ to check source code for looping and dead-ends.

#18

Updated by Flole Systems almost 6 years ago

I highly doubt that static code analysis could find this, the issue is probably in shared memory between 2 threads (one thread free's the memory while the other one still uses it), which would be impossible to find by static analysis. Anyways, I'm running it right now, and will probably open a new issue soon with all the stuff it finds

#19

Updated by Jaroslav Kysela almost 6 years ago

Flole Systems wrote:

I tried to use gdb's record feature and mozillas rr tool in order to record and replay what exactly is happening (setting a breakpoint and doing a reverse run up to that point), unfortunately those tools are unusable with tvheadend:

gdb record and continue causes "no matching process found" error, and rr causes tvheadends deadlock detector to detect this:
[...]

Add 'export TVHEADEND_THREAD_WATCH_LIMIT=60' to your debug script (the value is in seconds).

#20

Updated by Jaroslav Kysela almost 6 years ago

The issue is because the service_t / mpegts_service_t is probably freed and reused. Because it's a runtime error, it would be probably good to trace the service free / creating calls - src/service.c - and the PID management (owner) in src/input/mpegts/mpegts_input.c . The PID open/close calls are already printed to traces (--trace mpegts), so it might be enough to verify, if all PIDs are closed.

#21

Updated by Flole Systems almost 6 years ago

I am not sure if I get what you mean: "verify if all PIDs are closed", so you think there is a PID still "open" even though it is free'd and when data for that PID arrives, it causes a mutex error as the mutex is no longer there? I was thinking about something like this: The mutex of both subscriptions is somehow destroyed because the other subscription is causing a free. The logs kinda suggest that:

Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - add raw service
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - tuning on SAT>IP DVB-C Tuner #1 
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - started
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 0000 (0) [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: subscription: 0045: "SAT>IP" subscribing to mux "330MHz", weight: 100, adapter: "SAT>IP DVB-C Tuner #1 ", network: "MainNetwork", service: "Raw
PID Subscription", hostname="10.0.0.2" 
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - close PID 0000 (0) [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 0012 (18) [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0012 (18) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 0000 (0) [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 0011 (17) [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13EC (5100) [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13ED (5101) [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13EE (5102) [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13F0 (5104) [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0000 (0) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0011 (17) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0012 (18) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EC (5100) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13ED (5101) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EE (5102) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13F0 (5104) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 0000 (0) [20/0x5604c841d910]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 0001 (1) [16/0x5604c8420c70]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 0010 (16) [20/0x5604c8422860]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 0011 (17) [20/0x5604c84248c0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 0011 (17) [16/0x5604c8426a80]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 0001 (1) [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0000 (0) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0001 (1) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0012 (18) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EC (5100) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13ED (5101) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EE (5102) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:51 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13F0 (5104) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - close PID 0001 (1) [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0000 (0) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0012 (18) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EC (5100) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13ED (5101) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EE (5102) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13F0 (5104) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13EF (5103) [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0000 (0) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0012 (18) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EC (5100) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13ED (5101) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EE (5102) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EF (5103) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13F0 (5104) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13F1 (5105) [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0000 (0) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0012 (18) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EC (5100) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13ED (5101) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EE (5102) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EF (5103) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13F0 (5104) weight 400 [2/0x5604c76997d0]
Jan 24 15:06:52 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13F1 (5105) weight 400 [2/0x5604c76997d0]

Jan 24 15:07:03 tvheadend[9836]: mpegts: 330MHz in MainNetwork - already active
Jan 24 15:07:03 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13EC (5100) [8/0x5604c3d46cd0]
Jan 24 15:07:03 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13ED (5101) [8/0x5604c3d46cd0]
Jan 24 15:07:03 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13EE (5102) [8/0x5604c3d46cd0]
Jan 24 15:07:03 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13EF (5103) [8/0x5604c3d46cd0]
Jan 24 15:07:03 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13F0 (5104) [8/0x5604c3d46cd0]
Jan 24 15:07:03 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13F1 (5105) [8/0x5604c3d46cd0]
Jan 24 15:07:03 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 13EC (5100) [16/0x5604c7568cd0]
Jan 24 15:07:03 tvheadend[9836]: subscription: 0046: "10.0.0.2 [ TestUser | Kodi Media Center ]" subscribing on channel "Das Erste HD", weight: 150, adapter: "SAT>IP DVB-C Tuner #1 ", network: "MainNetwork", mux: "330MHz", provider: "ARD", service: "Das Erste HD", profile="htsp", hostname="10.0.0.2", username="TestUser", client="Kodi Media Center" 
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - close PID 0000 (0) [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - close PID 13ED (5101) [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - close PID 13EE (5102) [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - close PID 13EF (5103) [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - close PID 13F0 (5104) [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - close PID 13F1 (5105) [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0012 (18) weight 400 [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 13EC (5100) weight 400 [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - close PID 13EC (5100) [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0012 (18) weight 400 [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - close PID 0012 (18) [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - open PID 0000 (0) [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - update PID 0000 (0) weight 400 [2/0x5604c76997d0]
Jan 24 15:07:09 tvheadend[9836]: subscription: 0045: "SAT>IP" unsubscribing, hostname="10.0.0.2" 
Jan 24 15:07:09 tvheadend[9836]: mpegts: 330MHz in MainNetwork - close PID 0000 (0) [2/0x5604c76997d0]

Now looking at how it should be (HTTP and HTSP Scubscription), there is no closing of the PIDs until the lasz client exits:

Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - tuning on SAT>IP DVB-C Tuner #1 
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - started
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - open PID 13EC (5100) [8/0x56342bc53e30]
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - open PID 13ED (5101) [8/0x56342bc53e30]
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - open PID 13EE (5102) [8/0x56342bc53e30]
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - open PID 13EF (5103) [8/0x56342bc53e30]
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - open PID 13F0 (5104) [8/0x56342bc53e30]
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - open PID 13F1 (5105) [8/0x56342bc53e30]
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - open PID 13EC (5100) [16/0x56342f552d60]
Jan 24 15:11:01 tvheadend[15273]: subscription: 0046: "HTTP" subscribing on channel "Das Erste HD", weight: 100, adapter: "SAT>IP DVB-C Tuner #1", network: "MainNetwork", mux:"330MHz", provider: "ARD", service: "Das Erste HD", profile="NoTranscode", hostname="x.x.x.x", username="TestUser", client="VLC/3.0.6 LibVLC/3.0.6" 
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - open PID 0000 (0) [20/0x56342f4b0650]
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - open PID 0001 (1) [16/0x56342f56de00]
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - open PID 0010 (16) [20/0x56342f455d70]
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - open PID 0011 (17) [20/0x56342f41a220]
Jan 24 15:11:01 tvheadend[15273]: mpegts: 330MHz in MainNetwork - open PID 0011 (17) [16/0x56342f4c7c20]
Jan 24 15:11:05 tvheadend[15273]: subscription: 0047: "10.0.0.2 [ TestUser | Kodi Media Center ]" subscribing on channel "Das Erste HD", weight: 150, adapter: "SAT>IP DVB-C Tuner #1 ", network: "MainNetwork", mux: "330MHz", provider: "ARD", service: "Das Erste HD", profile="htsp", hostname="10.0.0.2", username="TestUser", client="Kodi Media Center" 
Jan 24 15:11:19 tvheadend[15273]: subscription: 0046: "HTTP" unsubscribing from "Das Erste HD", hostname="10.0.0.2", username="TestUser", client="VLC/3.0.6 LibVLC/3.0.6" 
Jan 24 15:11:23 tvheadend[15273]: subscription: 0047: "10.0.0.2 [ TestUser | Kodi Media Center ]" unsubscribing from "Das Erste HD", hostname="10.0.0.2", username="TestUser", client="Kodi MediaCenter" 
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - close PID 13EC (5100) [16/0x56342f552d60]
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - close PID 13EC (5100) [8/0x56342bc53e30]
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - close PID 13ED (5101) [8/0x56342bc53e30]
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - close PID 13EE (5102) [8/0x56342bc53e30]
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - close PID 13EF (5103) [8/0x56342bc53e30]
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - close PID 13F0 (5104) [8/0x56342bc53e30]
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - close PID 13F1 (5105) [8/0x56342bc53e30]
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - stopping mux
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - close PID 0000 (0) [20/0x56342f4b0650]
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - close PID 0001 (1) [16/0x56342f56de00]
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - close PID 0010 (16) [20/0x56342f455d70]
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - close PID 0011 (17) [16/0x56342f4c7c20]
Jan 24 15:11:23 tvheadend[15273]: mpegts: 330MHz in MainNetwork - close PID 0011 (17) [20/0x56342f41a220]

So is it possible that unsubscription of the SAT-IP Client destroys the subscriptions from my HTSP client and by doing so causes the crash?

Unfortunately when running under rr, it is not happening, so not much help there :(

#22

Updated by Flole Systems almost 6 years ago

I think I am getting closer: While on the SAT-IP Client in the Subscription PIDs 0,1,16,17,18 are listed, those aren't listed in the HTSP Subscription. So what I tried next was using VLC to just subscribe to 5100,5101,5102 and then I am no longer getting that crash. However, if I add 0,1,16,17,18 to the VLC Subscription, I am experiencing the crash again. In both cases the subscription lists 0,1,16,17,18 though in the web interface, but with missing PIDs 0,1,16,17,18 VLC is not able to play the stream.

#23

Updated by Jaroslav Kysela almost 6 years ago

Each open must match with close (PID + owner):

open PID 0000 (0)  [2/0x5604c76997d0]
close PID 0000 (0) [2/0x5604c76997d0]
          ^^^^        ^^^^^^^^^^^^^^
          PID         owner
#24

Updated by Flole Systems almost 6 years ago

After close of PID 0 there is already the crash, so the others are not closed then because the crash occurs before that. It seems to have something to do with close of that PID 0.

Have you tried to reproduce this? As you know how the code should behave I would really appreciate it if you can have a look at it, the chance that you find the bug there is probably a lot higher than the chance that I'll find it (even though I think I am close).

#25

Updated by Flole Systems almost 6 years ago

Disregard my last message, I didn't read the logs right.

So here's what I found out, I am not sure if my conclusions are correct or not though: It seems like PID 17 is causing the issues: As you can see there is no deletion of it and it is opened by 2 different owners, but it never deletes it again. I double-checked this by removing PID 17 from my VLC Test Subscription and it is working perfectly fine then (of course VLC doesn't get the SDT then). So all that is left now is to try to figure out why exactly PID 17 is not deleted when it comes to the unsubscribe.

Normally this is not a problem as the mux deletion will clear them anyways, but when there is another client getting data this becomes a problem as it's trying to distribute the data then and tries to access an already free'd mutex? (I have absolutely no clue if this is how it works, but the basic idea should be correct)

#26

Updated by Oleg Antonov over 5 years ago

I think I am having the same issue (SAT-IP unsubscribe while another client is still subscribed causes crash). Has this been fixed yet? If not, is there anything I can provide to help?

#27

Updated by Flole Systems over 5 years ago

Unfortunately I am stuck debugging, I have figured out that PID 17 is causing those issues but I am unable to figure out why it's not unsubscribed properly, I suppose it has something to do with the flags ( the [20...] and [16....] in above logs, while 20 and 16 are the flags) but I am not sure at all. All I can see is that it is not even trying to unsubscribe from pid 17.....

@Jaroslav: If you have some time, could you take a look at this please? I am stuck here and can't figure out why it won't unsubscribe. I have tried looking into it using gdb and step through the unsubscription stuff but had no luck.

#28

Updated by Oleg Antonov over 5 years ago

Thanks for your answer. Sounds like this is a difficult problem and not something easy like I thought in the beginning. I hope we get a fix here soon :)

Big thanks to everyone working on this!!!! And also big thanks to the entire tvheadend team, this is the first real issue I am experiencing since quite a few months of usage, it is just great!

#29

Updated by Flole Systems over 5 years ago

I am in mpegts_input_close_pid right now in line 597 and unfortunately the RB_FIND doesn't find the entry, even though it is definitely there (there are 3 nodes, root and right of root is the searched one).
Root has type 20, left of root has type 16, right of root has type 2 (RAW, this is the one we are searching for).
Now we are at root with type 20 and mpegts_mps_cmp is called. 2 is obviously not 20, so we do 2 & 0x08 now, which is 0, so we return -1 here. Now the function to search in the tree interprets -1 as "go to the left", which is does. Unfortunately this is wrong and therefor the correct knot on the right is never found.

I made a few pretty sketchy drawings to try to understand what's wrong here and came up with this fix, however I have absolutely no idea if this will have any side-effects (so far I haven't noticed any but I will keep testing)

--- src/input/mpegts/mpegts_input.c
+++ src/input/mpegts/mpegts_input.c
@@ -465,7 +465,7 @@
 mpegts_mps_cmp ( mpegts_pid_sub_t *a, mpegts_pid_sub_t *b )
 {
   if (a->mps_type != b->mps_type) {
-    if (a->mps_type & MPS_SERVICE)
+    if (a->mps_type & (MPS_SERVICE | MPS_RAW))
       return 1;
     else
       return -1;

@Jaroslav: Would you consider this a proper fix or do you consider this patch "dangerous"? If a Type 2 entry is on the left it will no longer be found, but I am not sure if there is any valid scenario where a type 2 entry ends up on the left. I can share my sketchy drawings with you if you want, they show the 3 nodes with each address, type and owner, not sure if that would be any kind of help to you. I made them while using gdb right before a crash happened.

(If anybody wonders why this comment seems like there are multiple comments in one: I wrote them while troubleshooting, everytime I did something or found something I mentioned it here, so it's a few hours that this comment was written in)

#30

Updated by Jaroslav Kysela over 5 years ago

  • Status changed from New to Fixed
  • % Done changed from 0 to 100

Applied in changeset commit:tvheadend|57b766ab7e8ab3dbec2476cc269eaf8101d48b64.

#31

Updated by Jaroslav Kysela over 5 years ago

Thanks, I think that you hit the real culprit. Could you retest with latest? The above change should resolve this.

Also available in: Atom PDF