From HandWiki
| Original author(s) | Seiya Tokui |
|---|---|
| Developer(s) | Community, Preferred Networks, Inc. |
| Initial release | September 2, 2015[1] |
| Stable release | |
| Preview release | |
| Repository | github |
| Written in | Python, Cython, CUDA |
| Operating system | Linux, Windows |
| Platform | Cross-platform |
| Type | Numerical analysis |
| License | MIT |
| Website | cupy |
CuPy is an open-source library for GPU-accelerated computing with the Python programming language, providing support for multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them.[3] CuPy shares the same API set as NumPy and SciPy, allowing it to serve as a drop-in replacement for running NumPy/SciPy code on a GPU. CuPy supports the NVIDIA CUDA GPU platform and, starting in v9.0, the AMD ROCm GPU platform.[4][5]
CuPy was initially developed as the backend of the Chainer deep learning framework and was later established as an independent project in 2017.[6]
CuPy is part of the NumPy ecosystem of array libraries[7] and is widely adopted for GPU computing with Python,[8] especially in high-performance computing environments such as Summit,[9] Perlmutter,[10] EULER,[11] and ABCI.[12]
CuPy is a NumFOCUS affiliated project.[13]
CuPy implements NumPy/SciPy-compatible APIs, as well as features for writing user-defined GPU kernels and accessing low-level APIs.[14][15]
The APIs defined in the NumPy package (numpy.*) are available under the cupy.* package, including:
- Multi-dimensional array (cupy.ndarray) for boolean, integer, float, and complex data types
The APIs defined in the SciPy package (scipy.*) are available under the cupyx.scipy.* package, including:
- Sparse matrices (cupyx.scipy.sparse.*_matrix) in CSR, COO, CSC, and DIA formats
Distributed computing and interoperability features include:
- Distributed communication package (cupyx.distributed), providing collective and peer-to-peer primitives
- NEP 13 (__array_ufunc__)[19]
- NEP 18 (__array_function__)[20][21]
>>> import cupy as cp
>>> x = cp.array([1, 2, 3])
>>> x
array([1, 2, 3])
>>> y = cp.arange(10)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
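Arrays created as above live in GPU memory, so data that starts in a NumPy array has to be copied to the device and, when needed, copied back. The following is a minimal sketch of that round trip using cupy.asarray and cupy.asnumpy. The helper name double_on_gpu is hypothetical, and the lazy import is only an assumption made so that the module can be loaded on a machine without CuPy.

```python
import numpy as np


def double_on_gpu(host_array):
    """Copy a NumPy array to the GPU, double it there, and copy it back.

    Hypothetical helper for illustration; needs CuPy and a supported GPU.
    """
    import cupy as cp  # imported lazily so this module loads without CuPy

    device_array = cp.asarray(host_array)  # host -> device copy
    doubled = device_array * 2             # evaluated on the GPU
    return cp.asnumpy(doubled)             # device -> host copy, a numpy.ndarray
```

On a machine with a working CuPy installation, calling double_on_gpu(np.array([1, 2, 3])) should return an ordinary NumPy array. Keeping such transfers to a minimum is generally advisable, since each one moves data between host and device memory.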
>>> import cupy as cp
>>> x = cp.arange(12).reshape(3, 4).astype(cp.float32)
>>> x
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]], dtype=float32)
>>> x.sum(axis=1)
array([ 6., 22., 38.], dtype=float32)
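Because the calls in the session above match their NumPy counterparts, the same function can often be written once and run against either library. The sketch below illustrates that idea under stated assumptions: the module alias xp and the function row_norms are hypothetical names, and the fallback to NumPy when CuPy or a GPU is unavailable is a convenience for this example, not something CuPy does by itself.

```python
import numpy as np

try:
    import cupy as cp
    cp.arange(1)  # touches the device, so it fails if no usable GPU is present
    xp = cp
except Exception:
    # Assumption for portability: run the same code on the CPU with NumPy.
    xp = np


def row_norms(a):
    """Euclidean norm of each row, using only calls shared by NumPy and CuPy."""
    return xp.sqrt((a * a).sum(axis=1))


x = xp.arange(12, dtype=xp.float32).reshape(3, 4)
norms = row_norms(x)
print(norms.tolist())  # tolist() copies the values into a plain Python list
```

The result is a cupy.ndarray when xp is CuPy and a numpy.ndarray otherwise; only the import at the top decides where the computation runs.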
>>> import cupy as cp
>>> kern = cp.RawKernel(r'''
... extern "C" __global__
... void multiply_elemwise(const float* in1, const float* in2, float* out) {
...     int tid = blockDim.x * blockIdx.x + threadIdx.x;
...     out[tid] = in1[tid] * in2[tid];
... }
... ''', 'multiply_elemwise')
>>> in1 = cp.arange(16, dtype=cp.float32).reshape(4, 4)
>>> in2 = cp.arange(16, dtype=cp.float32).reshape(4, 4)
>>> out = cp.zeros((4, 4), dtype=cp.float32)
>>> kern((4,), (4,), (in1, in2, out)) # grid, block and arguments
>>> out
array([[  0.,   1.,   4.,   9.],
       [ 16.,  25.,  36.,  49.],
       [ 64.,  81., 100., 121.],
       [144., 169., 196., 225.]], dtype=float32)
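The kernel above has no bounds check, so it is only safe because the launch uses exactly 4 blocks of 4 threads for 16 elements. For array sizes that are not a multiple of the block size, a common approach is to pass the element count to the kernel and round the grid size up. The following is a sketch of that variant, not part of the original example: the extra parameter n and the helpers grid_size and multiply_on_gpu are hypothetical names, and the default block size of 256 is an arbitrary choice.

```python
KERNEL_SRC = r'''
extern "C" __global__
void multiply_elemwise(const float* in1, const float* in2, float* out, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n) {  // threads past the end of the arrays do nothing
        out[tid] = in1[tid] * in2[tid];
    }
}
'''


def grid_size(n, block):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + block - 1) // block


def multiply_on_gpu(n=1000, block=256):
    """Launch the guarded kernel; needs CuPy and a supported GPU."""
    import cupy as cp  # imported lazily so grid_size works without CuPy

    kern = cp.RawKernel(KERNEL_SRC, 'multiply_elemwise')
    in1 = cp.arange(n, dtype=cp.float32)
    in2 = cp.arange(n, dtype=cp.float32)
    out = cp.zeros(n, dtype=cp.float32)
    # The scalar is passed as a 32-bit integer to match "int n" in the kernel.
    kern((grid_size(n, block),), (block,), (in1, in2, out, cp.int32(n)))
    return cp.asnumpy(out)


print(grid_size(1000, 256))  # → 4
```

With 1000 elements and 256 threads per block, four blocks provide 1024 threads, and the if (tid < n) guard keeps the surplus 24 threads from writing outside the arrays.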