# Benchmarks

asnumpy-docs project repository: https://gitcode.com/cann/asnumpy-docs

> Note: Parts of this article were generated with AI assistance (AIGC) and are provided for reference only.

This document contains the full performance benchmark comparing AsNumpy (NPU) against NumPy (CPU) on the `multiply()` operation.

## Test Environment

| Item | AsNumpy (NPU) | NumPy (CPU) |
| --- | --- | --- |
| Processor | Ascend 910B NPU | Server CPU (AArch64) on the same machine |
| NPU runtime | CANN 8.2.RC1.alpha003 | N/A |
| Python | Python 3.9 | Python 3.9 |
| Library version | AsNumpy 0.2.0 | NumPy 1.26 |
| Data type | float32 | float32 |
| Operation | `multiply()` (element-wise multiplication) | `multiply()` (element-wise multiplication) |
| Timer | `time.perf_counter()` (high-resolution) | `time.perf_counter()` (high-resolution) |

## Controlled Variables

- Both sides use identical input data: arrays are generated by NumPy and transferred to the NPU via `from_numpy()` before timing starts.
- Data transfer time is excluded: only the `multiply()` computation is timed.
- Results are single-run wall-clock times (no warmup, no averaging).

## Results

Speedup is the NumPy (CPU) time divided by the AsNumpy (NPU) time, so values below 1× mean the CPU was faster.

| Shape | AsNumpy (NPU) | NumPy (CPU) | Speedup |
| --- | --- | --- | --- |
| (500, 500) | 1.9355 s | 0.1708 s | 0.09× |
| (1000, 1000) | 0.0692 s | 0.7029 s | 10.16× |
| (2000, 2000) | 0.1033 s | 3.8387 s | 37.17× |
| (3000, 3000) | 0.1115 s | 14.3567 s | 128.70× |

**Key observation:** For small tensors (500×500), NPU launch overhead dominates and the CPU is faster. As tensor size grows, the NPU's massive parallelism takes over, reaching a **128.70× speedup** at 3000×3000.

## Reproducing the Results

Run the benchmark script from the project root:

```bash
python examples/03_multiply.py
```

The script tests all four shapes with 50 iterations each, reports average and minimum times, and verifies numerical correctness against NumPy (relative difference within 1e-4).

The benchmark script is available at `examples/03_multiply.py`.
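The table reports single-run times with no warmup, whereas the script averages 50 iterations. A minimal timing harness in the spirit of that methodology might look like the sketch below. It uses only NumPy and the standard library; the helper names (`time_op`, `speedup`, `relative_diff`), the warmup count, and the norm-based definition of relative difference are illustrative assumptions, not the contents of `examples/03_multiply.py`. To time AsNumpy, its `multiply()` call would be wrapped in a zero-argument callable; if the device executes asynchronously, that callable would also need to wait for the result, which the source does not address.

```python
import time
from typing import Callable, Dict

import numpy as np


def time_op(fn: Callable[[], object], warmup: int = 3, iters: int = 50) -> Dict[str, float]:
    """Time a zero-argument callable with time.perf_counter().

    Warmup calls run first and are not recorded, so one-time start-up
    costs are kept out of the reported statistics.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()  # for an asynchronous device, this call must block until the result is ready
        samples.append(time.perf_counter() - start)
    return {"avg": sum(samples) / len(samples), "min": min(samples)}


def speedup(cpu_seconds: float, npu_seconds: float) -> float:
    """Speedup as defined for the results table: CPU time / NPU time."""
    return cpu_seconds / npu_seconds


def relative_diff(result: np.ndarray, reference: np.ndarray) -> float:
    """Norm-based relative difference (assumed; the script's exact formula is not stated)."""
    denom = max(float(np.linalg.norm(reference)), float(np.finfo(np.float32).tiny))
    return float(np.linalg.norm(result - reference)) / denom


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (500, 1000):
        a = rng.random((n, n), dtype=np.float32)
        b = rng.random((n, n), dtype=np.float32)
        # CPU baseline; a device-side multiply would be wrapped and timed the same way
        stats = time_op(lambda: np.multiply(a, b), warmup=1, iters=5)
        print(f"({n}, {n}): avg={stats['avg']:.4f} s, min={stats['min']:.4f} s")
        assert relative_diff(np.multiply(a, b), a * b) < 1e-4
    # Speedup for the (1000, 1000) row of the results table
    print(f"{speedup(0.7029, 0.0692):.2f}x")
```

Taking the minimum over repeated runs filters out scheduling noise, while the warmup loop keeps one-time start-up costs out of the statistics. That start-up cost is one plausible reason the single-run 500×500 NPU time is much larger than the 1000×1000 one.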