CuPy

Original author(s)	Seiya Tokui
Developer(s)	Community, Preferred Networks, Inc.
Initial release	September 2, 2015
Stable release	v13.3.0 / August 22, 2024
Repository	github.com/cupy/cupy
Written in	Python, Cython, CUDA
Operating system	Linux, Windows
Platform	Cross-platform
Type	Numerical analysis
License	MIT
Website	cupy.dev

CuPy is an open-source library for GPU-accelerated computing with the Python programming language. It provides multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them.^[3] CuPy shares the same API set as NumPy and SciPy, which allows it to serve as a drop-in replacement for running NumPy/SciPy code on a GPU. CuPy supports the Nvidia CUDA GPU platform and, starting with v9.0, the AMD ROCm GPU platform.^[4]^[5]
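The drop-in claim can be illustrated with a small sketch. It assumes the function uses only calls that both libraries provide; the helper name `standardize` and the fall-back to NumPy when CuPy or a GPU is unavailable are choices made for this illustration, not something CuPy itself prescribes.

```python
import numpy as np

try:
    import cupy as xp  # GPU arrays when CuPy and a device are available
    xp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is present
except Exception:
    xp = np  # otherwise run the identical code on the CPU


def standardize(a):
    """Scale each column to zero mean and unit variance (hypothetical helper)."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


data = xp.arange(12, dtype=xp.float64).reshape(4, 3)
result = standardize(data)

# Copy back to host memory if the result lives on the GPU.
result_host = result.get() if hasattr(result, "get") else result
print(result_host.mean(axis=0))
```

The body of `standardize` never names a library, so the same source runs on either backend; only the import line decides where the arrays live.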

CuPy was initially developed as a backend of the Chainer deep learning framework and was established as an independent project in 2017.^[6]

CuPy is one of the array libraries in the NumPy ecosystem^[7] and is widely adopted for GPU computing with Python,^[8] especially in high-performance computing environments such as Summit,^[9] Perlmutter,^[10] EULER,^[11] and ABCI.^[12]

CuPy is a NumFOCUS-sponsored project.^[13]

Features

CuPy implements NumPy/SciPy-compatible APIs, as well as features to write user-defined GPU kernels or access low-level APIs.^[14]^[15]

NumPy-compatible APIs

The same set of APIs defined in the NumPy package (numpy.*) is available under the cupy.* package.

Multi-dimensional array (cupy.ndarray) for boolean, integer, float, and complex data types
Module-level functions
Linear algebra functions
Fast Fourier transform
Random number generator
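A brief sketch of how the random, linear algebra, and FFT submodules listed above might be used together. The CPU fall-back and the choice of a diagonally dominant test matrix are assumptions made so that the example runs anywhere and always has a solution.

```python
import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is present
except Exception:
    xp = np  # CPU fall-back, assumed only for this illustration

# Random number generation: build a diagonally dominant (invertible) system.
xp.random.seed(0)
a = xp.random.rand(4, 4) + 4.0 * xp.eye(4)
b = xp.random.rand(4)

# Linear algebra: solve a @ x = b and measure the largest residual.
x = xp.linalg.solve(a, b)
residual = float(xp.abs(a @ x - b).max())

# Fast Fourier transform: forward then inverse should recover the signal.
signal = xp.sin(xp.linspace(0.0, 2.0 * np.pi, 64))
roundtrip = xp.fft.ifft(xp.fft.fft(signal)).real
fft_error = float(xp.abs(roundtrip - signal).max())

print(residual < 1e-10, fft_error < 1e-10)
```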

SciPy-compatible APIs

The same set of APIs defined in the SciPy package (scipy.*) is available under the cupyx.scipy.* package.

Sparse matrices (cupyx.scipy.sparse.*_matrix) in CSR, COO, CSC, and DIA formats
Discrete Fourier transform
Advanced linear algebra
Multidimensional image processing
Sparse linear algebra
Special functions
Signal processing
Statistical functions
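As an illustration of the sparse-matrix entry above, the following sketch builds a CSR matrix and multiplies it by a vector. The fall-back to `scipy.sparse` on machines without a GPU is an assumption of this example; it relies on the two packages exposing the same `csr_matrix` constructor, `nnz` attribute, and `dot` method.

```python
import numpy as np

try:
    import cupy as xp
    from cupyx.scipy import sparse
    xp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is present
except Exception:
    import scipy.sparse as sparse  # CPU fall-back, assumed for illustration
    xp = np

# A mostly-zero matrix, first stored densely...
dense = xp.zeros((4, 4), dtype=xp.float64)
dense[0, 1] = 2.0
dense[2, 3] = 5.0
dense[3, 0] = -1.0

# ...then converted to compressed sparse row (CSR) storage.
csr = sparse.csr_matrix(dense)
vec = xp.arange(1, 5, dtype=xp.float64)  # [1, 2, 3, 4]

product = csr.dot(vec)  # sparse matrix-vector product
print(csr.nnz)          # number of stored non-zero entries
print(product)
```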

User-defined GPU kernels

Kernel templates for element-wise and reduction operations
Raw kernel (CUDA C/C++)
Just-in-time transpiler (JIT)
Kernel fusion
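The kernel templates and kernel fusion listed above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the kernel names and the wrapper `run_custom_kernels` are made up for the example, and the guard at the top only exists so that the file still loads on a machine without CuPy or a GPU.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy not installed or no usable device
    cp, HAVE_GPU = None, False


def run_custom_kernels():
    """Build and run three small user-defined kernels (illustrative names)."""
    # Element-wise template: the body is a CUDA C snippet applied per element.
    squared_diff = cp.ElementwiseKernel(
        'float32 x, float32 y',   # inputs
        'float32 z',              # output
        'z = (x - y) * (x - y)',  # per-element operation
        'squared_diff')

    # Reduction template: map each element, combine pairs, post-process.
    l2norm = cp.ReductionKernel(
        'T x',          # input
        'T y',          # output
        'x * x',        # map
        'a + b',        # reduce
        'y = sqrt(a)',  # post-reduction map
        '0',            # identity
        'l2norm')

    # Kernel fusion: the decorated expression is compiled into one kernel.
    @cp.fuse()
    def fused_axpy(a, x, y):
        return a * x + y

    x = cp.arange(4, dtype=cp.float32)     # [0, 1, 2, 3]
    y = cp.full(4, 2.0, dtype=cp.float32)  # [2, 2, 2, 2]
    return (cp.asnumpy(squared_diff(x, y)),
            float(l2norm(cp.array([3.0, 4.0], dtype=cp.float32))),
            cp.asnumpy(fused_axpy(cp.float32(2.0), x, y)))


if HAVE_GPU:
    diff, norm, axpy = run_custom_kernels()
    print(diff, norm, axpy)
```

The templates generate the indexing and launch code, so only the per-element or per-pair expression has to be written by hand; a raw kernel is needed only when that structure does not fit.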

Distributed computing

Distributed communication package (cupyx.distributed), providing collective and peer-to-peer primitives

Low-level CUDA features

Stream and event
Memory pool
Profiler
Host API binding
CUDA Python support^[16]
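The stream, event, and memory pool features above might be combined as in the sketch below. The function name `timed_sum` and the guard for machines without a GPU are assumptions of this example; the timing it reports depends on the hardware.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy not installed or no usable device
    cp, HAVE_GPU = None, False


def timed_sum(n):
    """Sum 0..n-1 on a private stream; return (total, milliseconds, pool bytes)."""
    pool = cp.get_default_memory_pool()
    stream = cp.cuda.Stream(non_blocking=True)
    start, stop = cp.cuda.Event(), cp.cuda.Event()

    with stream:  # work issued inside this block is queued on `stream`
        start.record(stream)
        x = cp.arange(n, dtype=cp.float32)
        total = x.sum()
        stop.record(stream)
    stop.synchronize()  # wait until the recorded work has finished

    elapsed_ms = cp.cuda.get_elapsed_time(start, stop)
    used = pool.used_bytes()  # bytes currently handed out by the pool
    result = float(total)
    del x, total
    pool.free_all_blocks()    # release cached, unused blocks
    return result, elapsed_ms, used


if HAVE_GPU:
    total, elapsed_ms, used = timed_sum(1024)
    print(total, elapsed_ms, used)
```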

Interoperability

DLPack^[17]
CUDA Array Interface^[18]
NEP 13 (__array_ufunc__)^[19]
NEP 18 (__array_function__)^[20]^[21]
Array API Standard^[22]^[23]
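A short sketch of what several of these interoperability protocols look like in practice. The function name `interop_demo` and the returned dictionary keys are invented for this illustration, and the DLPack step simply has CuPy consume its own export; exchanging with another library would follow the same pattern.

```python
import numpy as np

try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy not installed or no usable device
    cp, HAVE_GPU = None, False


def interop_demo():
    """Exercise NEP 13/18 dispatch, DLPack, and the CUDA Array Interface."""
    x = cp.arange(6, dtype=cp.float32).reshape(2, 3)

    # NEP 13 / NEP 18: NumPy calls dispatch to CuPy and stay on the GPU.
    root = np.sqrt(x)   # ufunc, routed through __array_ufunc__
    total = np.sum(x)   # function, routed through __array_function__

    # DLPack: zero-copy exchange of the underlying device buffer.
    shared = cp.from_dlpack(x)

    # CUDA Array Interface: a dict describing the device buffer.
    cai = x.__cuda_array_interface__

    return {
        "ufunc_on_gpu": isinstance(root, cp.ndarray),
        "function_on_gpu": isinstance(total, cp.ndarray),
        "dlpack_shares_memory": shared.data.ptr == x.data.ptr,
        "cai_shape": cai["shape"],
        "host_copy": cp.asnumpy(x),  # explicit device-to-host transfer
    }


if HAVE_GPU:
    report = interop_demo()
    print(report)
```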

Examples

Array creation

import cupy as cp
from cupy.typing import NDArray

x: NDArray[int] = cp.array([1, 2, 3])
print(x)
# prints [1 2 3]
y: NDArray[int] = cp.arange(10)
print(y)
# prints [0 1 2 3 4 5 6 7 8 9]

Basic operations

import cupy as cp
from cupy.typing import NDArray

x: NDArray[cp.float32] = cp.arange(12).reshape(3, 4).astype(cp.float32)
print(x)
# prints:
# [[ 0.  1.  2.  3.]
#  [ 4.  5.  6.  7.]
#  [ 8.  9. 10. 11.]]
print(x.sum(axis=1))  # sum along each row
# prints [ 6. 22. 38.]

Raw CUDA C/C++ kernel

import cupy as cp

# Compile a CUDA C kernel; the second argument names the entry point.
kern: cp.RawKernel = cp.RawKernel(r'''
extern "C" __global__
void multiply_elemwise(const float* in1, const float* in2, float* out) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    out[tid] = in1[tid] * in2[tid];
}
''', 'multiply_elemwise')
in1: cp.ndarray = cp.arange(16, dtype=cp.float32).reshape(4, 4)
in2: cp.ndarray = cp.arange(16, dtype=cp.float32).reshape(4, 4)
out: cp.ndarray = cp.zeros((4, 4), dtype=cp.float32)
# 4 blocks of 4 threads: one thread per element of the 16-element arrays
kern((4,), (4,), (in1, in2, out))  # grid, block and arguments
print(out)
# prints:
# [[  0.   1.   4.   9.]
#  [ 16.  25.  36.  49.]
#  [ 64.  81. 100. 121.]
#  [144. 169. 196. 225.]]
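The kernel above is safe only because its launch creates exactly as many threads as there are elements (4 blocks of 4 threads for 16 values). A variant for arbitrary sizes might pass the element count and guard the write, as in this sketch. The wrapper `multiply`, the block size of 128, and the assumption that both inputs are contiguous float32 arrays of equal shape belong to this illustration only.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy not installed or no usable device
    cp, HAVE_GPU = None, False

SOURCE = r'''
extern "C" __global__
void multiply_elemwise(const float* in1, const float* in2, float* out, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n) {  // surplus threads in the last block do nothing
        out[tid] = in1[tid] * in2[tid];
    }
}
'''


def multiply(in1, in2, threads_per_block=128):
    """Element-wise product of two contiguous float32 arrays of equal shape."""
    kern = cp.RawKernel(SOURCE, 'multiply_elemwise')
    out = cp.empty_like(in1)
    n = in1.size
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    # The kernel declares `int n`, so the count is passed as a 32-bit integer.
    kern((blocks,), (threads_per_block,), (in1, in2, out, cp.int32(n)))
    return out


if HAVE_GPU:
    a = cp.arange(1000, dtype=cp.float32)
    squares = cp.asnumpy(multiply(a, a))
    print(squares[:5])
```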

Applications

References

[1] "Release v1.3.0 – chainer/chainer". https://github.com/chainer/chainer/releases/v1.3.0.
[2] "Releases – cupy/cupy". https://github.com/cupy/cupy/releases.
[3] Okuta, Ryosuke; Unno, Yuya; Nishino, Daisuke; Hido, Shohei; Loomis, Crissman (2017). "CuPy: A NumPy-Compatible Library for NVIDIA GPU Calculations". Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS). http://learningsys.org/nips17/assets/papers/paper_16.pdf.
[4] "CuPy 9.0 Brings AMD GPU Support To This Numpy-Compatible Library - Phoronix". 29 April 2021. https://www.phoronix.com/scan.php?page=news_item&px=CuPy-9.0-Released.
[5] "AMD Leads High Performance Computing Towards Exascale and Beyond". 28 June 2021. https://ir.amd.com/news-events/press-releases/detail/1012/amd-leads-high-performance-computing-towards-exascale-and. "Most recently, CuPy, an open-source array library with Python, has expanded its traditional GPU support with the introduction of version 9.0 that now offers support for the ROCm stack for GPU-accelerated computing."
[6] "Preferred Networks released Version 2 of Chainer, an Open Source framework for Deep Learning - Preferred Networks, Inc.". 2 June 2017. https://www.preferred.jp/en/news/pr20170602/.
[7] "NumPy". numpy.org. https://numpy.org/.
[8] Gorelick, Micha; Ozsvald, Ian (April 2020). High Performance Python: Practical Performant Programming for Humans (2nd ed.). O'Reilly Media, Inc. p. 190. ISBN 9781492055020.
[9] Oak Ridge Leadership Computing Facility. "Installing CuPy". OLCF User Documentation. https://docs.olcf.ornl.gov/software/python/cupy.html.
[10] National Energy Research Scientific Computing Center. "Using Python on Perlmutter". NERSC Documentation. https://docs.nersc.gov/development/languages/python/using-python-perlmutter/#cupy.
[11] ETH Zurich. "CuPy". ScientificComputing. https://scicomp.ethz.ch/wiki/CuPy.
[12] National Institute of Advanced Industrial Science and Technology. "Chainer". ABCI 2.0 User Guide. https://docs.abci.ai/en/apps/chainer/.
[13] "Sponsored Projects - NumFOCUS". https://numfocus.org/sponsored-projects.
[14] "Overview". CuPy documentation. https://docs.cupy.dev/en/latest/overview.html.
[15] "Comparison Table". CuPy documentation. https://docs.cupy.dev/en/latest/reference/comparison.html.
[16] "CUDA Python | NVIDIA Developer". https://developer.nvidia.com/cuda-python.
[17] "Welcome to DLPack's documentation!". DLPack 0.6.0 documentation. https://dmlc.github.io/dlpack/latest/.
[18] "CUDA Array Interface (Version 3)". Numba documentation. https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html.
[19] "NEP 13 — A mechanism for overriding Ufuncs — NumPy Enhancement Proposals". numpy.org. https://numpy.org/neps/nep-0013-ufunc-overrides.html.
[20] "NEP 18 — A dispatch mechanism for NumPy's high level array functions — NumPy Enhancement Proposals". numpy.org. https://numpy.org/neps/nep-0018-array-function-protocol.html.
[21] Wikidata Q99413970.
[22] "2021 report - Python Data APIs Consortium". https://data-apis.org/files/2021_annual_report_DataAPIs_Consortium.pdf.
[23] "Purpose and scope". Python array API standard 2021.12 documentation. https://data-apis.org/array-api/latest/purpose_and_scope.html.
[24] "Install spaCy". spaCy Usage Documentation. https://spacy.io/usage#gpu.
[25] Patel, Ankur A.; Arasanipalai, Ajay Uppili (May 2021). Applied Natural Language Processing in the Enterprise (1st ed.). O'Reilly Media, Inc. p. 68. ISBN 9781492062578.
[26] "Python Package Introduction". xgboost 1.6.1 documentation. https://xgboost.readthedocs.io/en/stable/python/python_intro.html#data-interface.
[27] "UCBerkeleySETI/turbo_seti: turboSETI -- python based SETI search algorithm.". https://github.com/UCBerkeleySETI/turbo_seti#turbo_seti.
[28] "Open GPU Data Science | RAPIDS". https://rapids.ai/.
[29] "API Docs". RAPIDS Docs. https://docs.rapids.ai/api.
[30] "Efficient Data Sharing between CuPy and RAPIDS". https://medium.com/rapids-ai/using-rapids-memory-manager-with-cupy-8d08fe8f58fa.
[31] "10 Minutes to cuDF and CuPy". https://medium.com/rapids-ai/10-minutes-to-cudf-and-cupy-e131cac0439b.
[32] Rogozhnikov, Alex (2022). "Einops: Clear and Reliable Tensor Manipulations with Einstein-like Notation". International Conference on Learning Representations. https://openreview.net/forum?id=oapKSVM2bcj.
[33] "arogozhnikov/einops: Deep learning operations reinvented (for pytorch, tensorflow, jax and others)". https://github.com/arogozhnikov/einops.
[34] "Array API support (experimental) — scikit-learn documentation". https://scikit-learn.org/stable/modules/array_api.html.
[35] Tokui, Seiya; Okuta, Ryosuke; Akiba, Takuya; Niitani, Yusuke; Ogawa, Toru; Saito, Shunta; Suzuki, Shuji; Uenishi, Kota et al. (2019). "Chainer: A Deep Learning Framework for Accelerating the Research Cycle". Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. doi:10.1145/3292500.3330756. https://dl.acm.org/doi/10.1145/3292500.3330756.

External links

Original source: https://en.wikipedia.org/wiki/CuPy.
