Subject: [ruby-ffi] Re: Segfault when using threads
From: Wayne Meissner
Date: 10/26/13 4:46 PM
To: ruby-ffi@googlegroups.com


The amount of debug info you provided made it easy to track down whether other people had hit the same thing in other languages.

But I think you're boned. Try to push to get that patch included in corosync, since that's the only real solution.

You *could* do something like compiling a lib that runs the dispatch thread on a custom pthread with an increased stack, but then you run into the tricky situation of other threads also hitting that code path, and blowing up.

On Sunday, 27 October 2013 07:15:41 UTC+11, patrick...@gmail.com wrote:
I could have sworn when I tried doing everything on the thread that it worked. But I just tried it again and it segfaults.

When I look at Ruby's source code, it looks like it allocates a 512 KB stack for threads. When I go into corosync and look at the size of the buffer it's allocating on the stack, it's 1 MB, so that buffer alone is bigger than the whole thread stack.
So the reason it works on the main thread is that the main thread has an 8 MB stack.

I don't suppose there's any way to work around this behavior? If not, I'm fairly well hosed and can only run cpg_dispatch on the main thread :-(
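
For what it's worth, being limited to the main thread does not have to mean a single-threaded program. Below is a rough sketch of one way to keep every cpg_dispatch call on the main thread while other threads still ask for it. `MainThreadRunner` is a hypothetical helper made up for this sketch, not part of ruby-corosync or ffi, and the block passed to `call` stands in for `cpg.dispatch(0)`.

```ruby
require 'thread' # Queue lives here on Ruby 1.9; harmless on newer Rubies

# Hypothetical helper: worker threads hand blocks to the main thread,
# which runs them on its own (large) stack and sends the result back.
class MainThreadRunner
  Job = Struct.new(:block, :reply)

  def initialize
    @jobs = Queue.new
  end

  # Call from any thread. Blocks until the main thread has run the block,
  # then returns its result (or re-raises its exception).
  def call(&block)
    reply = Queue.new
    @jobs << Job.new(block, reply)
    status, value = reply.pop
    raise value if status == :error
    value
  end

  # Call on the main thread. Runs queued jobs until #stop is called.
  def run
    loop do
      job = @jobs.pop
      break if job == :stop
      begin
        job.reply << [:ok, job.block.call]
      rescue StandardError => e
        job.reply << [:error, e]
      end
    end
  end

  def stop
    @jobs << :stop
  end
end

runner = MainThreadRunner.new
main   = Thread.current

worker = Thread.new do
  # In the real program this block would be `cpg.dispatch(0)`.
  ran_on = runner.call { Thread.current }
  runner.stop
  ran_on
end

runner.run                               # main thread services the request
ran_on_main = worker.value.equal?(main)  # the block ran on the main thread
```

The cost is that the main thread is tied up in `run`, so anything else it needs to do has to be scheduled through the same queue.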

Thank you for that. I don't know that I would have found that otherwise.

-Patrick


On Saturday, October 26, 2013 8:06:14 AM UTC-4, Wayne Meissner wrote:

Try moving all the initialization into the new thread as well, to isolate any cross-threading issues (some libraries have thread-local data, so if you init an object on one thread, but access from another, it could be only partially initialized).

If that *still* segfaults, then it could be the library - see http://lists.corosync.org/pipermail/discuss/2013-April/002514.html
 - non-main threads get a smaller and fixed size stack, whereas the main thread gets a growable stack.
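
To illustrate the thread-local point above, here is a minimal pure-Ruby sketch. `FakeHandle` is a hypothetical stand-in, not anything from corosync or ffi; it only mimics a library that stashes part of its state in thread-local storage during init.

```ruby
# Hypothetical stand-in for a native handle whose init writes thread-local state.
class FakeHandle
  def initialize
    # Thread.current[:key] is visible only from the thread
    # (strictly, the fiber) that set it.
    Thread.current[:fake_handle_ctx] = { ready: true }
  end

  def ready?
    ctx = Thread.current[:fake_handle_ctx]
    !ctx.nil? && ctx[:ready]
  end
end

# Init on the main thread, use on the main thread: state is there.
handle      = FakeHandle.new
same_thread = handle.ready?                           # => true

# Init on the main thread, use from a new thread: state is missing.
other_thread = Thread.new { handle.ready? }.value     # => false

# Init and use inside the new thread: state is there again.
inside = Thread.new { FakeHandle.new.ready? }.value   # => true
```

If the real library behaved like this, creating the CPG object inside the thread would fix it. If it still segfaults with everything on one thread, thread-local state is ruled out.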




On Saturday, 26 October 2013 16:15:25 UTC+11, patrick...@gmail.com wrote:
Ok, the subject probably sounds like I've got some variable not being locked or whatnot, but I highly doubt that's the case :-)

So, what I've got going on is that I'm developing a gem for interfacing with corosync. In single-threaded mode, it works perfectly. But as soon as I call the corosync library from within a thread, it blows up (SIGSEGV). The interesting part is that in my test case, I'm not even doing anything in the main thread other than waiting for the new thread to complete. No modifying of shared variables or anything.

This is the code:
require 'corosync/cpg'
cpg = Corosync::CPG.new('mygroup')
#cpg.dispatch(0) # this line runs perfectly
Thread.new { cpg.dispatch(0) }.join # this line segfaults

Move the comment from line 3 to line 4 and it runs fine.

This is the ruby stack trace:
test.rb:4:in `block in <main>'
/home/phemmer/git/ruby-corosync/lib/corosync/cpg.rb:145:in `dispatch'
/home/phemmer/git/ruby-corosync/lib/corosync/cpg.rb:145:in `cpg_dispatch'


When I throw GDB on it, this is what the stack trace looks like:
[New Thread 0x7ffff7ff9700 (LWP 30441)]
[New Thread 0x7ffff7ecf700 (LWP 30442)]
[New Thread 0x7fffeef7b700 (LWP 30443)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffeef7b700 (LWP 30443)]
0x00007fffefae4c86 in hdb_handle_get (instance=0x7fffeee78580, handle_in=7749363892505018368, 
    handle_database=0x7fffefce7060 <cpg_handle_t_db>) at ../include/corosync/hdb.h:110
110 ../include/corosync/hdb.h: No such file or directory.
(gdb) where
#0  0x00007fffefae4c86 in hdb_handle_get (instance=0x7fffeee78580, handle_in=7749363892505018368, 
    handle_database=0x7fffefce7060 <cpg_handle_t_db>) at ../include/corosync/hdb.h:110
#1  cpg_dispatch (handle=7749363892505018368, dispatch_types=CS_DISPATCH_ONE_NONBLOCKING)
    at cpg.c:357
#2  0x00007fffefef1010 in ffi_call_unix64 () from /usr/lib64/libffi.so.6
#3  0x00007fffefef0a8a in ffi_call () from /usr/lib64/libffi.so.6
#4  0x00007ffff0101f3e in rbffi_CallFunction ()
   from /home/phemmer/.gem/ruby/1.9.1/gems/ffi-1.9.0/lib/ffi_c.so
#5  0x00007ffff0105956 in custom_trampoline ()
   from /home/phemmer/.gem/ruby/1.9.1/gems/ffi-1.9.0/lib/ffi_c.so
#6  0x00007ffff7af9898 in call_cfunc (func=0x7ffff7fee0d8, recv=9748160, len=-1, argc=2, 
    argv=0x7fffeef7c070) at vm_insnhelper.c:317
#7  0x00007ffff7afa1b6 in vm_call_cfunc (th=0x837840, reg_cfp=0x7fffef07beb0, num=2, recv=9748160, 
    blockptr=0x0, me=0x87c420) at vm_insnhelper.c:404
#8  0x00007ffff7afa893 in vm_call_method (th=0x837840, cfp=0x7fffef07beb0, num=2, blockptr=0x0, 
    flag=0, id=15088, me=0x87c420, recv=9748160) at vm_insnhelper.c:530
#9  0x00007ffff7b001ab in vm_exec_core (th=0x837840, initial=0) at insns.def:1018
#10 0x00007ffff7b0d5a5 in vm_exec (th=0x837840) at vm.c:1236
#11 0x00007ffff7b0beff in invoke_block_from_c (th=0x837840, block=0x607ad0, self=6711000, argc=0, 
    argv=0xa8e7e8, blockptr=0x0, cref=0x0) at vm.c:640
#12 0x00007ffff7b0c111 in rb_vm_invoke_proc (th=0x837840, proc=0x607ad0, self=6711000, argc=0, 
    argv=0xa8e7e8, blockptr=0x0) at vm.c:686
#13 0x00007ffff7b12eaa in thread_start_func_2 (th=0x837840, stack_start=0x7fffeef7c000)
    at thread.c:466
#14 0x00007ffff7b11b88 in thread_start_func_1 (th_ptr=0x837840) at thread_pthread.c:657
#15 0x00007ffff73ba03a in start_thread () from /lib64/libpthread.so.0
#16 0x00007ffff76b740d in clone () from /lib64/libc.so.6
(hdb_handle_get hdb.h:110 is found here)
(cpg_dispatch cpg.c:357 is found here)


The corosync gem currently lives at http://github.com/phemmer/ruby-corosync
For the libraries, I'm running libffi 3.0.11, corosync 2.3.2, and libqb 0.14.4.
I'm using MRI ruby 1.9.3p448 and ffi gem 1.9.0.

Any ideas and help would be appreciated. I've been pounding my head on the desk all day long.

-Patrick
