python - numpy calling sse2 via ctypes -


In brief, I am trying to call into a shared library from Python, more specifically from NumPy. The shared library is implemented in C using SSE2 instructions. With optimization enabled, i.e. building the library with -O2 or -O1, I get strange segfaults when calling into the shared library via ctypes. With optimization disabled (-O0), everything works as expected, as it does when the library is linked directly to a C program (optimized or not). Below is a snippet that exhibits this behavior on my system. With optimization enabled, gdb reports a segfault in __builtin_ia32_loadupd (__P) at emmintrin.h:113. The value of __P is reported as optimized out.

test.c:

  #include <emmintrin.h>
  #include <complex.h>

  void test(const int m, const double *x, double complex *y) {
      int i;
      __m128d _f, _x, _b;
      double complex f __attribute__((aligned(16)));
      double complex b __attribute__((aligned(16)));
      __m128d *_p;

      b = 1;
      _b = _mm_loadu_pd((double *) &b);
      _p = (__m128d *) y;

      for (i = 0; i ...

compiler flags: gcc -o libtest.so -shared -std=c99 -msse2 -fPIC -O2 -g -lm test.c

test.py:

  import numpy as np
  import os

  def zerovec_aligned(nr, dtype=np.float64, boundary=16):
      '''Create an aligned array of zeros.
      '''
      size = nr * np.dtype(dtype).itemsize
      # Over-allocate by `boundary` bytes so an aligned slice always fits.
      tmp = np.zeros(size + boundary, dtype=np.uint8)
      address = tmp.__array_interface__['data'][0]
      offset = boundary - address % boundary
      return tmp[offset:offset + size].view(dtype=dtype)

  lib = np.ctypeslib.load_library('libtest', '.')
  lib.test.restype = None
  lib.test.argtypes = [np.ctypeslib.ctypes.c_int,
                       np.ctypeslib.ndpointer(np.float64, flags=('C', 'A')),
                       np.ctypeslib.ndpointer(np.complex128, flags=('C', 'A', 'W'))]

  n = 13
  y = zerovec_aligned(n, dtype=np.complex128)
  x = np.ones(n, dtype=np.float64)
  # x = zerovec_aligned(n, dtype=np.float64)
  # x[...

call_from_c.c:

  #include <stdio.h>
  #include <complex.h>
  #include <stdlib.h>
  #include <emmintrin.h>

  void test(const int m, const double *x, double complex *y);

  int main() {
      int i;
      const int n = 11;
      /* y is explicitly 16-byte aligned; x comes from plain malloc. */
      double complex *y = (double complex *) _mm_malloc(n * sizeof(double complex), 16);
      double *x = (double *) malloc(n * sizeof(double));

      for (i = 0; i < n; ++i) {
          x[i] = 1;
          y[i] = 0;
      }

      test(n, x, y);
      for (i = 0; i < n; ++i)
          printf("[%f %f]\n", creal(y[i]), cimag(y[i]));

      return 1;
  }

compile and call:
gcc -std=c99 -otestc -msse2 -L. -ltest call_from_c.c
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:.
./testc
... works.

My system:

  • Ubuntu Linux i686 2.6.31-22-generic
  • compiler: GCC (Ubuntu 4.4.1-4ubuntu9)
  • Python: Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15) [GCC 4.4.1]
  • NumPy: 1.4.0

I have taken provisions (cf. the Python code) to ensure that y is aligned. The alignment of x should not matter, I think; in any case, explicitly aligning x does not solve the problem.
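As a way to double-check that alignment assumption, here is a minimal sketch of the same over-allocate-and-offset trick using only the standard-library ctypes module. The helper name aligned_doubles is hypothetical and not part of the code above; it only illustrates how one might confirm that a buffer really starts on a 16-byte boundary before handing it to the library.

```python
import ctypes

def aligned_doubles(n, boundary=16):
    """Return (array, backing buffer); the array starts on a `boundary`-byte address.

    Hypothetical helper for illustration, mirroring zerovec_aligned.
    """
    size = n * ctypes.sizeof(ctypes.c_double)
    # Over-allocate so that an aligned start address always fits in the buffer.
    raw = ctypes.create_string_buffer(size + boundary)
    address = ctypes.addressof(raw)
    # Bytes to skip to reach the next multiple of `boundary` (0 if already aligned).
    offset = (-address) % boundary
    array = (ctypes.c_double * n).from_buffer(raw, offset)
    # The backing buffer is returned too, so the caller can keep it alive explicitly.
    return array, raw

x, _backing = aligned_doubles(13)
print(ctypes.addressof(x) % 16)  # 0
```

For a NumPy array the equivalent check would be arr.ctypes.data % 16 == 0. Note that this only covers the heap buffers passed in as arguments; it says nothing about the alignment of the callee's stack.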

Note also that I use _mm_loadu_pd instead of _mm_load_pd when loading b and f. In the C-only version, _mm_load_pd works (as expected). However, when calling the function via ctypes, using _mm_load_pd always segfaults (independent of optimization).

I have spent several days trying to sort out this issue without success ... and I am on the verge of beating my monitor to death. Any input is welcome. Daniel

I ran into this as well while trying to call some SSE code from Python. The problem seems to be that GCC wants to assume that the stack is aligned on 16-byte boundaries (the size of the largest native type on the architecture, i.e. the SSE types) and calculates all offsets under that assumption. When the assumption is false, SSE instructions will trap.
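The arithmetic behind that failure can be sketched in a few lines of Python. This is a simplified model, not what GCC actually emits: the stack-pointer values and the offset are made-up illustrative numbers, and local_address and is_aligned are hypothetical helpers.

```python
ALIGN = 16  # bytes required by aligned SSE loads and stores

def local_address(stack_pointer, offset):
    """Address of a local variable placed `offset` bytes below the stack pointer."""
    return stack_pointer - offset

def is_aligned(address, boundary=ALIGN):
    return address % boundary == 0

# Hypothetical stack pointers on function entry (illustrative values only).
aligned_sp = 0xBFFFF000     # 16-byte aligned, as the compiler assumes
misaligned_sp = 0xBFFFF00C  # only 4-byte aligned, as a caller might leave it

# The compiler picks an offset that is a multiple of 16, trusting the caller.
offset = 32

print(is_aligned(local_address(aligned_sp, offset)))     # True
print(is_aligned(local_address(misaligned_sp, offset)))  # False
```

In the second case a local declared with aligned(16), such as f or b in test.c, ends up on an address that is not a multiple of 16, so an aligned load or store on it would fault even though the heap buffers passed in from Python are aligned.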

The answer seems to be to compile with

 gcc -mstackrealign 
which changes the function prologues so that the stack is always realigned to 16 bytes.
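Applied to the build from the question, that could look like the following sketch. It assembles the original gcc command line with -mstackrealign added and only invokes the compiler if gcc and test.c are actually present. The build_command helper is hypothetical, and -lm is moved after the source file here, which is the conventional link order rather than something the question or answer prescribes.

```python
import os
import shutil
import subprocess

def build_command(source="test.c", output="libtest.so", optimize="-O2"):
    """Assemble the gcc command line from the question, plus -mstackrealign."""
    return ["gcc", "-o", output, "-shared", "-std=c99", "-msse2", "-fPIC",
            optimize, "-g", "-mstackrealign", source, "-lm"]

cmd = build_command()
print(" ".join(cmd))

# Only run the compiler when it and the source file are actually available.
if shutil.which("gcc") and os.path.exists("test.c"):
    subprocess.check_call(cmd)
```

After rebuilding this way, test.py can be rerun unchanged to see whether the segfault at -O2 goes away.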

