Project

General

Profile

Bug #5259

Bus Error at startup when compiled for ARMHF architecture

Added by Shaun Schembri over 6 years ago. Updated over 6 years ago.

Status:
Fixed
Priority:
Normal
Assignee:
-
Category:
Crashes
Target version:
-
Start date:
2018-10-14
Due date:
% Done:

0%

Estimated time:
Found in version:
4.3-926
Affected Versions:

Description

Hi everyone,

I have been trying to crosscompile for my mediabox (KII Pro Amlogic S905 running Libreelec 8.2), the latest version from the master branch and while it successfully compiles, I do get a bus error immediately as I execute the binary. I have analysed via gdb the coredump created and the output point at this particular line the code.

Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `./tvheadend -c tvh'.
Program terminated with signal SIGBUS, Bus error.
#0 htsmsg_field_set_msg (sub=0xac8c22c8, f=<optimized out>, f=<optimized out>;) at src/htsmsg.c:604

Code in src/htsmsg.c:604 is m->hm_data_size = 0; which seems innocent enough.

I have eventually traced back that the last version that works for me is 4.3-925. The subsequent commit (https://github.com/tvheadend/tvheadend/commit/d35359fd828ea4011a6ae22e723269d2b04b8b31) broke tvheadend for me, with the above mentioned error.

Since my knowledge of C is almost non-existent, I was wondering if you guys are aware of this issue and what can be done to fix it. Are there any compiler flags I can use, or does the code needs to be modified to work for my box? I have already tried using both the toolchain built by LibreELEC's build process and the the cross-compiler supplied by Ubuntu 18.04.

Thanks for your help guys and keep up the awesome work.

History

#1

Updated by Luis Alves over 6 years ago

The line "m->hm_data_size = 0;" isn't that innocent if the location of the variable pointer is an unaligned memory address
You can try to change line 81 on htsmsg.h from "size_t hmf_edata_size;" to "size_t hmf_edata_size attribute((packed));"
If that fails, you could try the compiler flag "-mno-unaligned-access".

On arm, the linux kernel can usually deal with unaligned accesses by software emulation. Can you paste here the output of:
cat /proc/cpu/alignment

#2

Updated by Luis Alves over 6 years ago

The keywork "attribute" above should be enclosed by double "_" (forgot to use code formating)
size_t hmf_edata_size __attribute__((packed));

#3

Updated by Luis Alves over 6 years ago

Forgot to mention that you should be able to control behaviour with:
echo N > /proc/cpu/alignment

where N is:
0 - Do nothing (default behavior)
1 - Warning in kernel-log with PC and Memory-Address printed.
2 - Fixup error
3 - Warn and Fixup
4 - Send a SIGBUS to the process
5 - Send SIGBUS and output Warning

Using 2 should allow you to run tvh (although with some overhead to fix mem access)

More info @ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/arm/mem_alignment

Paste the output of the current alignment state and then try:
echo 2 > /proc/cpu/alignment

#4

Updated by Jaroslav Kysela over 6 years ago

Luis Alves wrote:

The line "m->hm_data_size = 0;" isn't that innocent if the location of the variable pointer is an unaligned memory address
You can try to change line 81 on htsmsg.h from "size_t hmf_edata_size;" to "size_t hmf_edata_size attribute((packed));"

The '__attribute__((packed))' forces the unaligned offsets for the variables!!! The compiler should align variables properly when this attribute is not used.

#5

Updated by Jaroslav Kysela over 6 years ago

It looks like a broken compiler or a bad compiler configuration. Could you print addresses of 'm' and 'm->hm_data_size' in gdb?

Run program in gdb, wait for the signal.
   (gdb) bt
   (gdb) frame 1           # might be different number - analyze 'bt'
   (gdb) print &m
   (gdb) print &m->hm_data_size
#6

Updated by Shaun Schembri over 6 years ago

Unfortunately under LibreELEC /proc/cpu/alignment is unavailable. I did look for it in other places but setting it or trying to cat from it fails. I guess its disabled when they build the kernel for it.

I tried Luis change and by adding the attribute you mentioned and the compiler flag it worked but I got the same error buserror 2 lines further down. Grrrrrrrr

But there is some light....I managed to get a working binary (at least it starts and doesn't crash at startup) by compiling using -O1 optimisations instead of -O2. I don't know how unoptimised the binary is, but tonight I try to pin-point which optimiser flag actually causes the problem. I think if I manage to get a binary with -O2 and exclude the problematic optimisation I should be good to go.

Thanks again

#7

Updated by Luis Alves over 6 years ago

Jaroslav Kysela wrote:

The '__attribute__((packed))' forces the unaligned offsets for the variables!!! The compiler should align variables properly when this attribute is not used.

Not true when casting an arbitrary data block to something else.
Example on line 144 (https://github.com/tvheadend/tvheadend/blob/master/src/htsmsg.c#L144)

f->hmf_msg = (htsmsg_t *)(f->hmf_name + nsize);

It's being assigned to the "msg" pointer something based on a size which sometimes will end up on an unaligned address...

Shaun,
As workaround, you can change the htsmsg.h "typedef struct htsmsg" to this: https://pastebin.com/2J8Shcf1
(add the packed attribute to all fields)

#9

Updated by Luis Alves over 6 years ago

Forgot to post my "hack"...
https://pastebin.com/YisGtCuq

It basically aligns the htsmsg_t pointer to the cpu word boundary.
Wastes some padding bytes but you don't have to force the compiler to do aligned accesses with the packed attribute and performance wise should be better for all architectures)

#10

Updated by Luis Alves over 6 years ago

ps: the above hack is not a fix (other parts of the code would need adjutments), just a proof of concept...

#11

Updated by Shaun Schembri over 6 years ago

This is the gdb output Jaroslav requested...hope its what you needed

(gdb) bt
#0 htsmsg_field_set_msg (sub=0xac5f8ce8, f=<optimized out>, f=<optimized out>;) at src/htsmsg.c:604
#1 0xaadc860a in json_parse_value (
s=0xac5fa510 " [\n\t\t\"login\",\n\t\t\"storage\",\n\t\t\"time\"\n\t],\n\t\"chname_num\": true,\n\t\"chname_src\": false,\n\t\"date_mask\": \"\",\n\t\"label_formatting\": false,\n\t\"language\": [\n\t],\n\t\"epg_cutwindow\": 300,\n\t\"epg_window\": 86400,\n\t\"prefe"..., parent=parent@entry=0xac5f8b18, name=name@entry=0xac5f8c48 "info_area", jd=jd@entry=0xabf4f9e0 <json_to_htsmsg>, opaque=opaque@entry=0x0,
failp=failp@entry=0xff9dee10, failmsg=failmsg@entry=0xff9dee0c) at src/misc/json.c:367
#2 0xaadc8848 in json_parse_map (s=<optimized out>,
s@entry=0xac5fa438 "{\n\t\"server_name\": \"Tvheadend\",\n\t\"version\": 24,\n\t\"full_version\": \"4.3-1468~g3f74523d2-dirty\",\n\t\"theme_ui\": \"blue\",\n\t\"ui_quicktips\": true,\n\t\"uilevel\": 0,\n\t\"uilevel_nochange\": false,\n\t\"caclient_ui\": fals"..., endp=endp@entry=0xff9dee08, jd=jd@entry=0xabf4f9e0 <json_to_htsmsg>, opaque=opaque@entry=0x0, failp=failp@entry=0xff9dee10,
failmsg=failmsg@entry=0xff9dee0c) at src/misc/json.c:194
#3 0xaadc8a06 in json_deserialize (
src=src@entry=0xac5fa438 "{\n\t\"server_name\": \"Tvheadend\",\n\t\"version\": 24,\n\t\"full_version\": \"4.3-1468~g3f74523d2-dirty\",\n\t\"theme_ui\": \"blue\",\n\t\"ui_quicktips\": true,\n\t\"uilevel\": 0,\n\t\"uilevel_nochange\": false,\n\t\"caclient_ui\": fals"..., jd=0xabf4f9e0 <json_to_htsmsg>, opaque=opaque@entry=0x0, errbuf=errbuf@entry=0x0, errlen=errlen@entry=0) at src/misc/json.c:415
#4 0xaadc6f12 in htsmsg_json_deserialize (
src=src@entry=0xac5fa438 "{\n\t\"server_name\": \"Tvheadend\",\n\t\"version\": 24,\n\t\"full_version\": \"4.3-1468~g3f74523d2-dirty\",\n\t\"theme_ui\": \"blue\",\n\t\"ui_quicktips\": true,\n\t\"uilevel\": 0,\n\t\"uilevel_nochange\": false,\n\t\"caclient_ui\": fals"...) at src/htsmsg_json.c:213
#5 0xaadc9536 in hts_settings_load_one (filename=filename@entry=0xff9dee8c "/tmp/tvh/config") at src/settings.c:257
#6 0xaadc9684 in hts_settings_load_path (fullpath=0xff9dee8c "/tmp/tvh/config", depth=0) at src/settings.c:314
#7 0xaadc96f6 in hts_settings_vload (pathfmt=0xaae92988 "config", pathfmt@entry=0xabf82668 "8\325\"\001", ap=..., ap@entry=..., depth=depth@entry=0) at src/settings.c:334
#8 0xaadc9a84 in hts_settings_load (pathfmt=0xaae92988 "config") at src/settings.c:357
#9 0xaadcee86 in config_boot (path=0xff9e17cf "tvh", gid=4294967295, uid=4294967295, http_user_agent=<optimized out>;) at src/config.c:1756
#10 0xaad93b74 in main (argc=<optimized out>, argv=<optimized out>;) at src/main.c:1089
(gdb) frame 0
#0 htsmsg_field_set_msg (sub=0xac5f8ce8, f=<optimized out>, f=<optimized out>;) at src/htsmsg.c:604
604 in src/htsmsg.c
(gdb) print &m
Address requested for identifier "m" which is in register $r4
(gdb) print &m->hm_data_size
$1 = (size_t *) 0xac5fbb4a

Now I am gonna try...Luis "hack" and report the outcome

#12

Updated by Shaun Schembri over 6 years ago

Sorry forgot to format

(gdb) bt
#0 htsmsg_field_set_msg (sub=0xac5f8ce8, f=<optimized out>, f=<optimized out>) at src/htsmsg.c:604
#1 0xaadc860a in json_parse_value (
s=0xac5fa510 " [\n\t\t\"login\",\n\t\t\"storage\",\n\t\t\"time\"\n\t],\n\t\"chname_num\": true,\n\t\"chname_src\": false,\n\t\"date_mask\": \"\",\n\t\"label_formatting\": false,\n\t\"language\": [\n\t],\n\t\"epg_cutwindow\": 300,\n\t\"epg_window\": 86400,\n\t\"prefe"..., parent=parent@entry=0xac5f8b18, name=name@entry=0xac5f8c48 "info_area", jd=jd@entry=0xabf4f9e0 <json_to_htsmsg>, opaque=opaque@entry=0x0,
failp=failp@entry=0xff9dee10, failmsg=failmsg@entry=0xff9dee0c) at src/misc/json.c:367
#2 0xaadc8848 in json_parse_map (s=<optimized out>,
s@entry=0xac5fa438 "{\n\t\"server_name\": \"Tvheadend\",\n\t\"version\": 24,\n\t\"full_version\": \"4.3-1468~g3f74523d2-dirty\",\n\t\"theme_ui\": \"blue\",\n\t\"ui_quicktips\": true,\n\t\"uilevel\": 0,\n\t\"uilevel_nochange\": false,\n\t\"caclient_ui\": fals"..., endp=endp@entry=0xff9dee08, jd=jd@entry=0xabf4f9e0 <json_to_htsmsg>, opaque=opaque@entry=0x0, failp=failp@entry=0xff9dee10,
failmsg=failmsg@entry=0xff9dee0c) at src/misc/json.c:194
#3 0xaadc8a06 in json_deserialize (
src=src@entry=0xac5fa438 "{\n\t\"server_name\": \"Tvheadend\",\n\t\"version\": 24,\n\t\"full_version\": \"4.3-1468~g3f74523d2-dirty\",\n\t\"theme_ui\": \"blue\",\n\t\"ui_quicktips\": true,\n\t\"uilevel\": 0,\n\t\"uilevel_nochange\": false,\n\t\"caclient_ui\": fals"..., jd=0xabf4f9e0 <json_to_htsmsg>, opaque=opaque@entry=0x0, errbuf=errbuf@entry=0x0, errlen=errlen@entry=0) at src/misc/json.c:415
#4 0xaadc6f12 in htsmsg_json_deserialize (
src=src@entry=0xac5fa438 "{\n\t\"server_name\": \"Tvheadend\",\n\t\"version\": 24,\n\t\"full_version\": \"4.3-1468~g3f74523d2-dirty\",\n\t\"theme_ui\": \"blue\",\n\t\"ui_quicktips\": true,\n\t\"uilevel\": 0,\n\t\"uilevel_nochange\": false,\n\t\"caclient_ui\": fals"...) at src/htsmsg_json.c:213
#5 0xaadc9536 in hts_settings_load_one (filename=filename@entry=0xff9dee8c "/tmp/tvh/config") at src/settings.c:257
#6 0xaadc9684 in hts_settings_load_path (fullpath=0xff9dee8c "/tmp/tvh/config", depth=0) at src/settings.c:314
#7 0xaadc96f6 in hts_settings_vload (pathfmt=0xaae92988 "config", pathfmt@entry=0xabf82668 "8\325\"\001", ap=..., ap@entry=..., depth=depth@entry=0) at src/settings.c:334
#8 0xaadc9a84 in hts_settings_load (pathfmt=0xaae92988 "config") at src/settings.c:357
#9 0xaadcee86 in config_boot (path=0xff9e17cf "tvh", gid=4294967295, uid=4294967295, http_user_agent=<optimized out>) at src/config.c:1756
#10 0xaad93b74 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:1089
(gdb) frame 0
#0 htsmsg_field_set_msg (sub=0xac5f8ce8, f=<optimized out>, f=<optimized out>) at src/htsmsg.c:604
604 in src/htsmsg.c
(gdb) print &m
Address requested for identifier "m" which is in register $r4
(gdb) print &m->hm_data_size
$1 = (size_t *) 0xac5fbb4a

#13

Updated by Jaroslav Kysela over 6 years ago

Luis Alves wrote:

Jaroslav Kysela wrote:

The '__attribute__((packed))' forces the unaligned offsets for the variables!!! The compiler should align variables properly when this attribute is not used.

Not true when casting an arbitrary data block to something else.

Yep, I see the problem now. The casting to the allocated memory with an unaligned offset makes that. It has nothing to do with the compiler.

Shaun Schembri wrote:

$1 = (size_t *) 0xac5fbb4a

Yes, this is completely unaligned address.

#14

Updated by Shaun Schembri over 6 years ago

Luis Alves wrote:

Forgot to post my "hack"...
https://pastebin.com/YisGtCuq

It basically aligns the htsmsg_t pointer to the cpu word boundary.
Wastes some padding bytes but you don't have to force the compiler to do aligned accesses with the packed attribute and performance wise should be better for all architectures)

You "hack" works and I managed to start tvheadend compiled with -O2. However I could only do limited tests with a fresh configuration to see if it crashes somewhere else. I haven't updated my conf directory since last December and some things seems to be broken (ex Kodi which I use as a client doesn't list any channels).

Any idea how its best to update my configuration without starting afresh? Now that I can build any version with -O1 maybe its a good idea I build interim versions and update my configuration till the latest commit.

Thanks again guys....you rock

#15

Updated by Luis Alves over 6 years ago

Luis Alves wrote:

ps: the above hack is not a fix (other parts of the code would need adjutments), just a proof of concept...

Actually, I don't think anything else needs to be changed...

Shaun,
Can you try build and run the with the following changes: https://pastebin.com/YisGtCuq
(it's just adding 1 line on each file: htsmsg.c and htsmsg_binary2.c)

I'm sure Jaroslav can eventually find a prettier fix...

#16

Updated by Shaun Schembri over 6 years ago

Luis Alves wrote:

Shaun,
Can you try build and run the with the following changes: https://pastebin.com/YisGtCuq
(it's just adding 1 line on each file: htsmsg.c and htsmsg_binary2.c)

Actually my results posted above was with you patch applied.

Any idea about how best to upgrade my configuration?

#17

Updated by Shaun Schembri over 6 years ago

Found my issue why my channels where all gone...I didn't build with zlib so the new binary could not read my old compressed mux files. Now with zlib all is OK....I will keep testing and report and further issues.

#18

Updated by Shaun Schembri over 6 years ago

Did a fresh build and I got another crash.....I believe I tested earlier with -O1 build before hence why I didn't get the error. Too much trail and error lately.

@Program terminated with signal SIGBUS, Bus error.
#0 htsmsg_binary_des0 (buf=0xabb03054 "\003\006", buf@entry=0xabb03048 "\001\006", len=3123, len@entry=3135, msg=<optimized out>;) at src/htsmsg_binary.c:119

#19

Updated by Luis Alves over 6 years ago

Shaun Schembri wrote:

Did a fresh build and I got another crash.....I believe I tested earlier with -O1 build before hence why I didn't get the error. Too much trail and error lately.

@Program terminated with signal SIGBUS, Bus error.
#0 htsmsg_binary_des0 (buf=0xabb03054 "\003\006", buf@entry=0xabb03048 "\001\006", len=3123, len@entry=3135, msg=<optimized out>;) at src/htsmsg_binary.c:119

Missed that one... just apply the same kind of "fix": https://pastebin.com/AaGuhfaV

#20

Updated by Jaroslav Kysela over 6 years ago

Could you retest with v4.3-1470-gfdaa48b32 ?

#22

Updated by Luis Alves over 6 years ago

Just one comment on: [[https://github.com/tvheadend/tvheadend/commit/fdaa48b32b0566fbf56a588e54fec28b4350d35a#diff-397651fb104a69a9290d09e030123030R110]]
This alignment function is hard-coded to a 64-bit word alignment that will "waste" memory on 32 bit machines.

These are the methods I know of to do word alignment:
a) Use sizeof(int) which should always be the size of the machine for which it is being compiled on: mask = "sizeof(int)-1"
b) or #include <bits/wordsize.h> and use the "__WORDSIZE" definition: mask = "(__WORDSIZE >> 3) -1"
(both are replaced on compilation time so it doesn't cause any runtime overhead)

Anyway, this is not critical - at least until 128-bit ARMs that don't deal with unaligned mem addresses show up :)

Keep up the great work Jaroslav!

#23

Updated by Shaun Schembri over 6 years ago

I have built a new binary however I will only be capable of doing a full scale test (covering all the features I use) over this coming weekend as I will be away from home.

Thanks a lot guys....I am sure other will find this patch useful as more and more people move away from 4.2 to newer versions

#24

Updated by Jaroslav Kysela over 6 years ago

Luis Alves wrote:

Just one comment on: [[https://github.com/tvheadend/tvheadend/commit/fdaa48b32b0566fbf56a588e54fec28b4350d35a#diff-397651fb104a69a9290d09e030123030R110]]
This alignment function is hard-coded to a 64-bit word alignment that will "waste" memory on 32 bit machines.

These are the methods I know of to do word alignment:
a) Use sizeof(int) which should always be the size of the machine for which it is being compiled on: mask = "sizeof(int)-1"
b) or #include <bits/wordsize.h> and use the "__WORDSIZE" definition: mask = "(__WORDSIZE >> 3) -1"
(both are replaced on compilation time so it doesn't cause any runtime overhead)

I though about this, but the null strings are eliminated now (no overhead because there is no extra offset) and usually we need more than 4 bytes for the variable names, thus almost all allocations will end with
8+ alignments. The cache alignment is mostly bigger than 4+, so we're fine.

#25

Updated by Shaun Schembri over 6 years ago

Just a small update....have been running this a few days and today I "stress" tested it quite a bit. All looks ok now so you can close this issue.

Keep up the awesome work

#26

Updated by Jaroslav Kysela over 6 years ago

  • Status changed from New to Fixed

Also available in: Atom PDF