# Operator Invocation

ops-cv is CANN's operator library for image processing and object detection, used to accelerate network computation on NPU. Project home: https://gitcode.com/cann/ops-cv

## Usage Notes

- **Prerequisites**: before invoking any operator, complete environment preparation, source download, and source build as described in this project's README; those steps are not repeated here.
- **Supported operators**: operators outside the `experimental` directory (built-in operators; see the operator list) and operators inside the `experimental` directory (contributed operators).

## Running Operator Examples

This chapter shows the simplest way to invoke an operator in this project: running the operator example through the `build.sh` command.

> Note: for Ascend 950PR products, operator examples can also be executed with the Simulator tool; see the simulation guide.

### Running an example from a custom operator package

After the package is installed, run:

```shell
bash build.sh --run_example ${op} ${mode} ${pkg_mode} [--vendor_name=${vendor_name}] [--soc=${soc_version}] [--experimental]
# Example: run the grid_sample operator example
# bash build.sh --run_example grid_sample eager cust --vendor_name=custom
# Example: run the grid_sample operator from the experimental directory
# bash build.sh --experimental --run_example grid_sample eager cust --vendor_name=custom
```

- `${op}`: the operator to run, in lowercase snake_case, e.g. `grid_sample`.
- `${mode}`: invocation mode; currently `eager` (aclnn invocation) and `graph` (graph-mode invocation) are supported.
- `${pkg_mode}`: package mode; currently only `cust` (custom operator package) is supported.
- `${vendor_name}`: optional; must match the vendor name set when the custom operator package was built; defaults to `custom`.
- `${soc_version}`: optional; the NPU model.
- `--experimental`: optional; run an operator saved in the `experimental` contribution directory.

> Note: when `${mode}` is `graph`, do not specify `${pkg_mode}` or `${vendor_name}`.

### Running an example from the ops-cv package

After installation, run:

```shell
bash build.sh --run_example ${op} ${mode} [--soc=${soc_version}]
# Example: run the grid_sample operator example
# bash build.sh --run_example grid_sample eager
```

- `${op}`: the operator to run, in lowercase snake_case, e.g. `grid_sample`.
- `${mode}`: invocation mode; currently `eager` (aclnn invocation) and `graph` (graph-mode invocation) are supported.
- `${soc_version}`: optional; the NPU model.

### Running an example against the ops-cv static library

**Prerequisites.** The ops-cv static library depends on the ops-legacy and ops-math static libraries. Prepare and extract these libraries, then move all of their `lib64` and `include` directories into a single directory `${static_lib_path}`.

> Note: the ops-legacy static library `cann-${soc_name}-ops-legacy-static_${cann_version}_linux-${arch}.tar.gz` can be obtained through the download link; software packages for the ops-cv and ops-math static libraries are not yet provided — build them locally from source.

**Create run.sh.** Create a `run.sh` file in the directory containing `examples/test_aclnn_${op_name}.cpp`. Taking the grid_sample operator (`test_aclnn_grid_sample2_d.cpp`, `test_aclnn_grid_sample3_d.cpp`) as an example:

```shell
# Path where the static libraries are placed
static_lib_path=""   # set to the actual ${static_lib_path}

# Make the environment variables take effect
if [ -n "$ASCEND_INSTALL_PATH" ]; then
    _ASCEND_INSTALL_PATH=$ASCEND_INSTALL_PATH
elif [ -n "$ASCEND_HOME_PATH" ]; then
    _ASCEND_INSTALL_PATH=$ASCEND_HOME_PATH
else
    _ASCEND_INSTALL_PATH=/usr/local/Ascend/cann
fi
source ${_ASCEND_INSTALL_PATH}/bin/setenv.bash

# Compile the executables
g++ test_aclnn_grid_sample2_d.cpp \
    -I ${static_lib_path}/include \
    -L ${static_lib_path}/lib64 \
    -I ${_ASCEND_INSTALL_PATH}/include \
    -I ${_ASCEND_INSTALL_PATH}/include/aclnnop \
    -L ${_ASCEND_INSTALL_PATH}/lib64 \
    -Wl,--allow-multiple-definition \
    -Wl,--start-group -lcann_cv_static -lcann_math_static -lcann_legacy_static -Wl,--end-group -lgraph -lgraph_base \
    -lpthread -lmmpa -lmetadef -lascendalog -lregister -lopp_registry -lops_base -lascendcl -ltiling_api -lplatform \
    -ldl -lc_sec -lnnopbase -lruntime -lerror_manager -lunified_dlog \
    -o test_aclnn_grid_sample2_d   # replace with the actual executable name

g++ test_aclnn_grid_sample3_d.cpp \
    -I ${static_lib_path}/include \
    -L ${static_lib_path}/lib64 \
    -I ${_ASCEND_INSTALL_PATH}/include \
    -I ${_ASCEND_INSTALL_PATH}/include/aclnnop \
    -L ${_ASCEND_INSTALL_PATH}/lib64 \
    -Wl,--allow-multiple-definition \
    -Wl,--start-group -lcann_cv_static -lcann_math_static -lcann_legacy_static -Wl,--end-group -lgraph -lgraph_base \
    -lpthread -lmmpa -lmetadef -lascendalog -lregister -lopp_registry -lops_base -lascendcl -ltiling_api -lplatform \
    -ldl -lc_sec -lnnopbase -lruntime -lerror_manager -lunified_dlog \
    -o test_aclnn_grid_sample3_d   # replace with the actual executable name

# Run the programs
./test_aclnn_grid_sample2_d
./test_aclnn_grid_sample3_d
```
Where:

- `${static_lib_path}` is the directory where the static libraries were gathered.
- `${ASCEND_INSTALL_PATH}` is the CANN toolkit installation path, configured via environment variable.
- Replace the final executable names with the actual operator executable names.
- `-lcann_cv_static` and `-lcann_legacy_static` (together with `-lcann_math_static`) are the operator static libraries taken from the unified directory `${static_lib_path}`.
- `-lgraph`, `-lmetadef`, and the other low-level libraries come from the CANN toolkit package.

**Run run.sh.**

```shell
bash run.sh
```

Whichever method is used, the example prints its results after execution. Taking grid_sample as an example:

```text
This environment does not have the ASAN library, no need enable ASAN
CMAKE_ARGS: -DENABLE_UT_EXEC=TRUE
----------------------------------------------------------------
Start to run examples,name:grid_sample mode:eager
Start compile and run examples file: ../image/grid_sample/examples/test_aclnn_grid_sample2_d.cpp pkg_mode:cust vendor_name:custom
resultData[0] is: 0.250000
resultData[1] is: 2.250000
resultData[2] is: 2.000000
resultData[3] is: 8.500000
resultData[4] is: 20.500000
resultData[5] is: 12.000000
resultData[6] is: 8.250000
resultData[7] is: 18.250000
resultData[8] is: 10.000000
Start compile and run examples file: ../image/grid_sample/examples/test_aclnn_grid_sample3_d.cpp pkg_mode:cust vendor_name:custom
resultData[0] is: 0.250000
resultData[1] is: 0.875000
resultData[2] is: 2.000000
resultData[3] is: 4.000000
```
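The `resultData` lines in the console output above can also be checked mechanically, which is handy when wiring an example run into a script. A minimal sketch (the sample values are copied from the grid_sample 2D output above; the log file name `example.log` is hypothetical):

```shell
#!/bin/sh
# Sketch: extract resultData values from a saved example log and sanity-check them.
# The log content mirrors the start of the grid_sample 2D output shown above;
# "example.log" is a hypothetical file name.
cat > example.log <<'EOF'
Start to run examples,name:grid_sample mode:eager
resultData[0] is: 0.250000
resultData[1] is: 2.250000
resultData[2] is: 2.000000
EOF

# Pull out just the numeric values, one per line (third whitespace field).
values=$(awk '/^resultData\[/ {print $3}' example.log)

count=$(printf '%s\n' "$values" | wc -l)
first=$(printf '%s\n' "$values" | head -n 1)
echo "count=$count first=$first"
```

The same `awk` filter can be pointed at the real log produced by `bash run.sh > example.log` to compare against expected values.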
## Invocation Modes

When an operator is executed through `build.sh`, the underlying invocation uses one of the following mechanisms:

- **PyTorch invocation (under construction)**: launches the operator kernel to invoke NPU operators from PyTorch with a minimal flow.
- **aclnn invocation (recommended)**: the host side provides a C API for each operator (prefixed `aclnn`); no IR definition is required; operators are invoked through the aclnn API.
- **Graph-mode invocation**: an IR (Intermediate Representation) definition is provided; operators are invoked by constructing a graph.

### PyTorch invocation (under construction)

This mode provides a way to invoke NPU operators from the Ascend Extension for PyTorch (torch_npu) framework. For the underlying principle and procedure, see `examples/fast_kernel_launch_example`. This content is still being built and refined; questions and suggestions are welcome.

### aclnn invocation

#### Invocation flow

This mode, also called "single-operator API invocation", invokes operators through a set of C APIs prefixed with `aclnn`, without requiring an operator IR (Intermediate Representation) definition.

#### aclnn API example

Taking the AddExample operator as an example, the code below is for reference only; the full code is in `test_aclnn_add_example.cpp`. Before invoking, set the environment variables as prompted during environment installation.

> Note: to call an operator already implemented in this project, see `test_aclnn_${op_name}.cpp` under the target operator's `examples` directory (`${op_name}` is the operator name).

```cpp
int main() {
    // 1. Initialize the device/stream via acl
    int32_t deviceId = 0;
    aclrtStream stream;
    auto ret = Init(deviceId, &stream);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("Init acl failed. ERROR: %d\n", ret); return ret);

    // 2. Construct the inputs and outputs; adapt to the actual API signature
    aclTensor* selfX = nullptr;
    void* selfXDeviceAddr = nullptr;
    std::vector<int64_t> selfXShape = {32, 4, 4, 4};
    std::vector<float> selfXHostData(2048, 1);
    ret = CreateAclTensor(selfXHostData, selfXShape, &selfXDeviceAddr, aclDataType::ACL_FLOAT, &selfX);
    CHECK_RET(ret == ACL_SUCCESS, return ret);

    aclTensor* selfY = nullptr;
    void* selfYDeviceAddr = nullptr;
    std::vector<int64_t> selfYShape = {32, 4, 4, 4};
    std::vector<float> selfYHostData(2048, 1);
    ret = CreateAclTensor(selfYHostData, selfYShape, &selfYDeviceAddr, aclDataType::ACL_FLOAT, &selfY);
    CHECK_RET(ret == ACL_SUCCESS, return ret);

    aclTensor* out = nullptr;
    void* outDeviceAddr = nullptr;
    std::vector<int64_t> outShape = {32, 4, 4, 4};
    std::vector<float> outHostData(2048, 1);
    ret = CreateAclTensor(outHostData, outShape, &outDeviceAddr, aclDataType::ACL_FLOAT, &out);
    CHECK_RET(ret == ACL_SUCCESS, return ret);

    // 3. Call the CANN operator library API; replace with the concrete API name
    uint64_t workspaceSize = 0;
    aclOpExecutor* executor;
    // 4. First-phase call: aclnnAddExampleGetWorkspaceSize
    ret = aclnnAddExampleGetWorkspaceSize(selfX, selfY, out, &workspaceSize, &executor);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclnnAddExampleGetWorkspaceSize failed. ERROR: %d\n", ret); return ret);
    // Allocate device memory for the workspace size computed by the first-phase call
    void* workspaceAddr = nullptr;
    if (workspaceSize > static_cast<uint64_t>(0)) {
        ret = aclrtMalloc(&workspaceAddr, workspaceSize, ACL_MEM_MALLOC_HUGE_FIRST);
        CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("allocate workspace failed. ERROR: %d\n", ret); return ret);
    }
    // 5. Second-phase call: aclnnAddExample
    ret = aclnnAddExample(workspaceAddr, workspaceSize, executor, stream);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclnnAddExample failed. ERROR: %d\n", ret); return ret);
    // 6. Fixed pattern: synchronize and wait for the task to finish
    ret = aclrtSynchronizeStream(stream);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtSynchronizeStream failed. ERROR: %d\n", ret); return ret);
    // 7. Fetch the output: copy the result from device memory to the host; adapt to the actual API definition
    PrintOutResult(outShape, &outDeviceAddr);
    // 8. Destroy the aclTensors; adapt to the actual API definition
    aclDestroyTensor(selfX);
    aclDestroyTensor(selfY);
    aclDestroyTensor(out);
    // 9. Release device resources
    aclrtFree(selfXDeviceAddr);
    aclrtFree(selfYDeviceAddr);
    aclrtFree(outDeviceAddr);
    if (workspaceSize > static_cast<uint64_t>(0)) {
        aclrtFree(workspaceAddr);
    }
    aclrtDestroyStream(stream);
    aclrtResetDevice(deviceId);
    // 10. Finalize acl
    aclFinalize();
    return 0;
}
```

#### Build and run

> Note: operators already implemented in this project (i.e. non-custom operators) can be run directly through `build.sh` in the repository root; see Running Operator Examples.

1. **Prerequisites.** Complete the build and deployment of the target operator as described in Source Build.

2. **Create a CMakeLists.txt file** in the directory containing `test_aclnn_${op_name}.cpp`. Taking the AddExample operator as an example (adjust as needed):

```cmake
cmake_minimum_required(VERSION 3.14)
# Project name
project(ACLNN_EXAMPLE)
# C++ standard
add_compile_options(-std=c++11)
# Place build outputs in ./bin
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ./bin)
# Compiler flags for Debug and Release modes
set(CMAKE_CXX_FLAGS_DEBUG "-fPIC -O0 -g -Wall")
set(CMAKE_CXX_FLAGS_RELEASE "-fPIC -O2 -Wall")
# Executable target; replace with the *.cpp file of the actual operator
add_executable(test_aclnn_add_example test_aclnn_add_example.cpp)
# ASCEND_PATH: CANN package directory; adjust to the actual path
if(NOT "$ENV{ASCEND_HOME_PATH}" STREQUAL "")
    set(ASCEND_PATH $ENV{ASCEND_HOME_PATH})
else()
    set(ASCEND_PATH "/usr/local/Ascend/cann")
endif()
# Locate the custom operator package; if several exist, only one is used
set(VENDORS_DIR "${ASCEND_PATH}/opp/vendors")
file(GLOB CUSTOM_DIRS "${VENDORS_DIR}/*")
foreach(CUSTOM_DIR ${CUSTOM_DIRS})
    if(IS_DIRECTORY ${CUSTOM_DIR})
        set(TARGET_SUBDIR ${CUSTOM_DIR})
    endif()
endforeach()
if(NOT DEFINED TARGET_SUBDIR)
    message(FATAL_ERROR "No custom operator package found under ${ASCEND_PATH}")
endif()
# Include paths
set(INCLUDE_BASE_DIR "${ASCEND_PATH}/include")
include_directories(
    ${INCLUDE_BASE_DIR}
    ${TARGET_SUBDIR}/op_api/include    # custom operators only
    # ${INCLUDE_BASE_DIR}/aclnn        # built-in operators only
)
# Link the required shared libraries; replace with the actual executable name
target_link_libraries(test_aclnn_add_example PRIVATE
    ${ASCEND_PATH}/lib64/libascendcl.so
    ${ASCEND_PATH}/lib64/libnnopbase.so
    ${TARGET_SUBDIR}/op_api/lib/libcust_opapi.so   # custom operators only
    # ${ASCEND_PATH}/lib64/libopapi_cv.so          # built-in operators only
)
target_link_options(test_aclnn_add_example PRIVATE
    -Wl,-rpath,${TARGET_SUBDIR}/op_api/lib         # custom operators only
)
# Install the target into the bin directory
install(TARGETS test_aclnn_add_example DESTINATION ${CMAKE_RUNTIME_OUTPUT_DIRECTORY})
```

3. **Create a run.sh file** in the directory containing `test_aclnn_${op_name}.cpp`. Taking the AddExample operator as an example (adjust as needed):

```shell
if [ -n "$ASCEND_INSTALL_PATH" ]; then        # actual CANN package installation path
    _ASCEND_INSTALL_PATH=$ASCEND_INSTALL_PATH
elif [ -n "$ASCEND_HOME_PATH" ]; then
    _ASCEND_INSTALL_PATH=$ASCEND_HOME_PATH
else
    _ASCEND_INSTALL_PATH=/usr/local/Ascend/cann
fi
source ${_ASCEND_INSTALL_PATH}/bin/setenv.bash
rm -rf build
mkdir -p build
cd build
cmake ../ -DCMAKE_CXX_COMPILER=g++ -DCMAKE_SKIP_RPATH=TRUE
# Build
make
cd bin
./test_aclnn_add_example   # replace with the actual executable name
```

4. **Run run.sh.** In the directory containing `run.sh`, execute:

```shell
bash run.sh
```

By default the executable `test_aclnn_add_example` is generated under `<current path>/build/bin`. The output looks like:

```text
mean result[2046] is 2.000000
mean result[2047] is 2.000000
```
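The CMakeLists above discovers the custom operator package by globbing `${ASCEND_PATH}/opp/vendors/*` and keeping the last directory it finds. The same discovery logic can be sketched in shell; the vendors tree here is simulated under a temporary directory, so the sketch runs without a CANN install:

```shell
#!/bin/sh
# Sketch: locate a custom operator package directory the way the CMakeLists above does.
# A fake opp/vendors tree is created so the sketch runs without a CANN install.
ASCEND_PATH=$(mktemp -d)
mkdir -p "$ASCEND_PATH/opp/vendors/custom/op_api/lib"

TARGET_SUBDIR=""
for d in "$ASCEND_PATH"/opp/vendors/*; do
    # Keep only directories, mirroring IS_DIRECTORY in the CMake loop.
    [ -d "$d" ] && TARGET_SUBDIR=$d
done

if [ -z "$TARGET_SUBDIR" ]; then
    echo "No custom operator package found under $ASCEND_PATH" >&2
    exit 1
fi
echo "custom package: $TARGET_SUBDIR"
```

Note that, exactly like the CMake loop, the last matching directory wins when several vendor packages are installed side by side.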
### Graph-mode invocation

#### Invocation flow

This mode invokes operators by constructing a graph from the operator's IR (Intermediate Representation) definition.

#### Example

Taking the AddExample operator as an example, the code below is for reference only; the full code is in `test_geir_add_example.cpp`. Before invoking, set the environment variables as prompted during environment installation.

> Note: to call an operator already implemented in this project, see `test_geir_${op_name}.cpp` under the target operator's `examples` directory (`${op_name}` is the operator name).

```cpp
int main() {
    // 1. Create the graph object
    Graph graph(graphName);
    // 2. Initialize the global graph compilation options
    Status ret = ge::GEInitialize(globalOptions);
    // 3. Create an AddExample operator instance
    auto add1 = op::AddExample("add1");
    // 4. Define the graph input/output vectors
    std::vector<Operator> inputs{};
    std::vector<Operator> outputs{};
    // 5. Prepare the input data
    std::vector<int64_t> xShape = {32, 4, 4, 4};
    // Variable assignment is handled by macro expansion
    ADD_INPUT(1, x1, inDtype, xShape);
    ADD_INPUT(2, x2, inDtype, xShape);
    ADD_OUTPUT(1, y, inDtype, xShape);
    outputs.push_back(add1);
    // 6. Set the graph's input and output operators
    graph.SetInputs(inputs).SetOutputs(outputs);
    // 7. Create the session object
    ge::Session* session = new Session(buildOptions);
    // 8. Add the graph to the session
    ret = session->AddGraph(graphId, graph, graphOptions);
    // 9. Run the graph
    ret = session->RunGraph(graphId, input, output);
    // 10. Release resources
    GEFinalize();
    return 0;
}
```

#### Build and run

> Note: operators already implemented in this project (i.e. non-custom operators) can be run directly through `build.sh` in the repository root; see Running Operator Examples.

1. **Prerequisites.** Complete the build and deployment of the target operator as described in Source Build.

2. **Create a CMakeLists.txt file** in the directory containing `test_geir_${op_name}.cpp`. Taking the AddExample operator as an example (adjust as needed):

```cmake
cmake_minimum_required(VERSION 3.14)
# Project name
project(GE_IR_EXAMPLE)
if(NOT "$ENV{ASCEND_OPP_PATH}" STREQUAL "")
    get_filename_component(ASCEND_PATH $ENV{ASCEND_OPP_PATH} DIRECTORY)
elseif(NOT "$ENV{ASCEND_HOME_PATH}" STREQUAL "")
    set(ASCEND_PATH $ENV{ASCEND_HOME_PATH})
else()
    set(ASCEND_PATH "/usr/local/Ascend/cann")
endif()
set(FWK_INCLUDE_DIR ${ASCEND_PATH}/compiler/include)
message(STATUS "ASCEND_PATH: ${ASCEND_PATH}")
file(GLOB files CONFIGURE_DEPENDS
    test_geir_add_example.cpp
)
# Executable target; replace with the actual operator executable
add_executable(test_geir_add_example ${files})
find_library(GRAPH_LIBRARY_DIR libgraph.so ${ASCEND_PATH}/compiler/lib64/stub)
find_library(GE_RUNNER_LIBRARY_DIR libge_runner.so ${ASCEND_PATH}/compiler/lib64/stub)
find_library(GRAPH_BASE_LIBRARY_DIR libgraph_base.so ${ASCEND_PATH}/compiler/lib64)
# Link the required shared libraries
target_link_libraries(test_geir_add_example PRIVATE
    ${GRAPH_LIBRARY_DIR}
    ${GE_RUNNER_LIBRARY_DIR}
    ${GRAPH_BASE_LIBRARY_DIR}
)
# Include paths
target_include_directories(test_geir_add_example PRIVATE
    ${FWK_INCLUDE_DIR}/graph/
    ${FWK_INCLUDE_DIR}/ge/
    ${ASCEND_PATH}/opp/built-in/op_proto/inc/
    ${CMAKE_CURRENT_SOURCE_DIR}
    ${ASCEND_PATH}/compiler/include
)
```

3. **Create a run.sh script** in the directory containing `test_geir_${op_name}.cpp`. Taking the AddExample operator as an example (adjust as needed):

```shell
if [ -n "$ASCEND_INSTALL_PATH" ]; then        # actual CANN package installation path
    _ASCEND_INSTALL_PATH=$ASCEND_INSTALL_PATH
elif [ -n "$ASCEND_HOME_PATH" ]; then
    _ASCEND_INSTALL_PATH=$ASCEND_HOME_PATH
else
    _ASCEND_INSTALL_PATH=/usr/local/Ascend/cann
fi
source ${_ASCEND_INSTALL_PATH}/bin/setenv.bash
rm -rf build
mkdir -p build
cd build
cmake ../ -DCMAKE_CXX_COMPILER=g++ -DCMAKE_SKIP_RPATH=TRUE
# Build
make
./test_geir_add_example   # replace with the actual executable name
```

4. **Run run.sh.** In the directory containing `run.sh`, execute:

```shell
bash run.sh
```

By default the executable `test_geir_add_example` is generated under `<current path>/build/bin`. The output looks like:

```text
INFO - [XIR]: Finalize ir graph session success
```
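The run.sh scripts above all resolve the CANN install path with the same three-way fallback: `ASCEND_INSTALL_PATH`, then `ASCEND_HOME_PATH`, then the default location. Factored into a reusable function, the logic can be sketched as follows (the default path matches the scripts above):

```shell
#!/bin/sh
# Sketch: the CANN install-path fallback used by the run.sh scripts above.
resolve_ascend_path() {
    if [ -n "$ASCEND_INSTALL_PATH" ]; then
        echo "$ASCEND_INSTALL_PATH"
    elif [ -n "$ASCEND_HOME_PATH" ]; then
        echo "$ASCEND_HOME_PATH"
    else
        echo "/usr/local/Ascend/cann"
    fi
}

# Example: with neither variable set, the default is returned.
unset ASCEND_INSTALL_PATH ASCEND_HOME_PATH
resolve_ascend_path   # prints /usr/local/Ascend/cann
```

A script would then `source "$(resolve_ascend_path)/bin/setenv.bash"` instead of repeating the if/elif/else chain.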
Disclaimer: parts of this document were generated with AI assistance (AIGC) and are for reference only.