# RpSort16

asc-devkit: this project is the operator programming language that CANN provides for Ascend AI processors. It natively supports the C and C++ standards, consists mainly of a class library and a language extension layer, and offers multi-level APIs for operator development across a range of scenarios. Project address: https://gitcode.com/cann/asc-devkit

## Product Support

| Product | Supported |
| --- | --- |
| Ascend 950PR / Ascend 950DT | x |
| Atlas A3 training series products / Atlas A3 inference series products | x |
| Atlas A2 training series products / Atlas A2 inference series products | x |

## Function Description

Sorts Region Proposals by their score field, with higher scores placed first. Each iteration sorts 16 Region Proposals.

## Function Prototype

```cpp
template <typename T>
__aicore__ inline void RpSort16(const LocalTensor<T>& dst, const LocalTensor<T>& src, const int32_t repeatTime)
```

## Parameters

Table 1: Template parameters

| Parameter | Description |
| --- | --- |
| T | Data type of the operands. |

Table 2: Parameters

| Parameter | Input/Output | Description |
| --- | --- | --- |
| dst | Output | Destination operand; stores the sorted Region Proposals. The type is LocalTensor, and the supported TPosition values are VECIN/VECCALC/VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| src | Input | Source operand; stores the unsorted Region Proposals. The type is LocalTensor, and the supported TPosition values are VECIN/VECCALC/VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| repeatTime | Input | Number of repeat iterations, of type int32_t. Each iteration sorts 16 Region Proposals. Value range: repeatTime ∈ [0, 255]. |

## Constraints

- The caller must ensure that src and dst hold at least as many Region Proposals as are actually processed; otherwise a tensor out-of-bounds error occurs.
- When proposal[i] and proposal[j] have the same score and i > j, proposal[j] is selected first and placed ahead of proposal[i].
- For operand address alignment requirements, see the general address alignment constraints.

## Examples

Interface usage sample:

```cpp
// repeatTime = 2: sorts 2 groups of 16 Region Proposals (32 in total)
AscendC::RpSort16(dstLocal, dstLocal, 2);
```

Complete sample:

```cpp
#include "kernel_operator.h"

class KernelVecProposal {
public:
    __aicore__ inline KernelVecProposal() {}
    __aicore__ inline void Init(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
    {
        srcGlobal.SetGlobalBuffer((__gm__ half*)src);
        dstGlobal.SetGlobalBuffer((__gm__ half*)dstGm);
        pipe.InitBuffer(inQueueSrc, 1, srcDataSize * sizeof(half));
        pipe.InitBuffer(outQueueDst, 1, dstDataSize * sizeof(half));
    }
    __aicore__ inline void Process()
    {
        CopyIn();
        PreProcess();
        Compute();
        CopyOut();
    }

private:
    __aicore__ inline void CopyIn()
    {
        AscendC::LocalTensor<half> srcLocal = inQueueSrc.AllocTensor<half>();
        AscendC::DataCopy(srcLocal, srcGlobal, srcDataSize);
        inQueueSrc.EnQue(srcLocal);
    }
    __aicore__ inline void PreProcess()
    {
        AscendC::LocalTensor<half> srcLocal = inQueueSrc.DeQue<half>();
        AscendC::LocalTensor<half> dstLocal = outQueueDst.AllocTensor<half>();
        // The sort is based on score, so first build proposals that carry score data.
        // Note that the non-score fields may contain random values.
        AscendC::ProposalConcat(dstLocal, srcLocal, repeat, mode);
        outQueueDst.EnQue<half>(dstLocal);
        inQueueSrc.FreeTensor(srcLocal);
    }
    __aicore__ inline void Compute()
    {
        AscendC::LocalTensor<half> dstLocal = outQueueDst.DeQue<half>();
        AscendC::RpSort16(dstLocal, dstLocal, repeat);
        outQueueDst.EnQue<half>(dstLocal);
    }
    __aicore__ inline void CopyOut()
    {
        AscendC::LocalTensor<half> dstLocal = outQueueDst.DeQue<half>();
        AscendC::DataCopy(dstGlobal, dstLocal, dstDataSize);
        outQueueDst.FreeTensor(dstLocal);
    }

private:
    AscendC::TPipe pipe;
    AscendC::TQue<AscendC::TPosition::VECIN, 1> inQueueSrc;
    AscendC::TQue<AscendC::TPosition::VECOUT, 1> outQueueDst;
    AscendC::GlobalTensor<half> srcGlobal, dstGlobal;
    int srcDataSize = 32;
    int dstDataSize = 256;
    int repeat = srcDataSize / 16;
    int mode = 4;
};

extern "C" __global__ __aicore__ void vec_proposal_kernel(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
{
    KernelVecProposal op;
    op.Init(src, dstGm);
    op.Process();
}
```

Sample result:

```
Input data (src_gm):
[ -1.624  -42.3   -54.12   91.25  -99.4    36.72   67.44  -66.3
 -52.53    3.377  -62.47  -15.85  -31.47    3.143  58.47  -83.75
  21.58   63.47    7.234   35.16  -39.72   37.8    73.06  -98.7
  44.1   -77.2    67.2     19.62  -87.9   -14.875  15.86  -77.75 ]

Output data (dst_gm), one Region Proposal per row:
[ 0. 0. 0. 0.  91.25   0. 0. 0.
  0. 0. 0. 0.  67.44   0. 0. 0.
  0. 0. 0. 0.  58.47   0. 0. 0.
  0. 0. 0. 0.  36.72   0. 0. 0.
  0. 0. 0. 0.   3.377  0. 0. 0.
  0. 0. 0. 0.   3.143  0. 0. 0.
  0. 0. 0. 0.  -1.624  0. 0. 0.
  0. 0. 0. 0. -15.85   0. 0. 0.
  0. 0. 0. 0. -31.47   0. 0. 0.
  0. 0. 0. 0. -42.3    0. 0. 0.
  0. 0. 0. 0. -52.53   0. 0. 0.
  0. 0. 0. 0. -54.12   0. 0. 0.
  0. 0. 0. 0. -62.47   0. 0. 0.
  0. 0. 0. 0. -66.3    0. 0. 0.
  0. 0. 0. 0. -83.75   0. 0. 0.
  0. 0. 0. 0. -99.4    0. 0. 0.
  0. 0. 0. 0.  73.06   0. 0. 0.
  0. 0. 0. 0.  67.2    0. 0. 0.
  0. 0. 0. 0.  63.47   0. 0. 0.
  0. 0. 0. 0.  44.1    0. 0. 0.
  0. 0. 0. 0.  37.8    0. 0. 0.
  0. 0. 0. 0.  35.16   0. 0. 0.
  0. 0. 0. 0.  21.58   0. 0. 0.
  0. 0. 0. 0.  19.62   0. 0. 0.
  0. 0. 0. 0.  15.86   0. 0. 0.
  0. 0. 0. 0.   7.234  0. 0. 0.
  0. 0. 0. 0. -14.875  0. 0. 0.
  0. 0. 0. 0. -39.72   0. 0. 0.
  0. 0. 0. 0. -77.2    0. 0. 0.
  0. 0. 0. 0. -77.75   0. 0. 0.
  0. 0. 0. 0. -87.9    0. 0. 0.
  0. 0. 0. 0. -98.7    0. 0. 0. ]
```
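To make the sorting semantics and the buffer-size constraint easier to check off-device, the following is a minimal host-side sketch in standard C++. It is not the device implementation and does not use the Ascend C API. It assumes, based on the sample result above, that each Region Proposal occupies 8 consecutive elements with the score at index 4, and it uses `float` in place of `half` because standard C++ has no half-precision type. The names `RequiredElems`, `MakeProposals`, and `RpSort16Reference` are hypothetical helpers introduced only for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout (inferred from the sample result): 8 elements per proposal, score at index 4.
constexpr std::size_t kElemsPerProposal = 8;
constexpr std::size_t kScoreIndex = 4;
constexpr std::size_t kProposalsPerRepeat = 16;

using Proposal = std::array<float, kElemsPerProposal>;

// Hypothetical helper: number of elements that src and dst must each hold
// for the given repeatTime, so that no out-of-bounds access occurs.
std::size_t RequiredElems(std::int32_t repeatTime)
{
    assert(repeatTime >= 0 && repeatTime <= 255);  // documented value range
    return static_cast<std::size_t>(repeatTime) * kProposalsPerRepeat * kElemsPerProposal;
}

// Hypothetical helper: builds proposals that carry only a score (all other fields zero).
std::vector<float> MakeProposals(const std::vector<float>& scores)
{
    std::vector<float> out(scores.size() * kElemsPerProposal, 0.0f);
    for (std::size_t i = 0; i < scores.size(); ++i) {
        out[i * kElemsPerProposal + kScoreIndex] = scores[i];
    }
    return out;
}

// Hypothetical CPU reference: sorts each group of 16 proposals independently,
// in descending score order. Groups are never merged with each other.
std::vector<float> RpSort16Reference(const std::vector<float>& src, std::int32_t repeatTime)
{
    const std::size_t total = RequiredElems(repeatTime);
    assert(src.size() >= total);

    std::vector<float> dst(src);
    const std::size_t groupElems = kProposalsPerRepeat * kElemsPerProposal;
    for (std::size_t base = 0; base < total; base += groupElems) {
        std::array<Proposal, kProposalsPerRepeat> group;
        for (std::size_t p = 0; p < kProposalsPerRepeat; ++p) {
            std::copy_n(src.data() + base + p * kElemsPerProposal, kElemsPerProposal, group[p].begin());
        }
        // stable_sort keeps equal-score proposals in their original order,
        // so the proposal with the smaller index comes first on ties.
        std::stable_sort(group.begin(), group.end(),
                         [](const Proposal& a, const Proposal& b) { return a[kScoreIndex] > b[kScoreIndex]; });
        for (std::size_t p = 0; p < kProposalsPerRepeat; ++p) {
            std::copy(group[p].begin(), group[p].end(), dst.data() + base + p * kElemsPerProposal);
        }
    }
    return dst;
}
```

With the 32 input scores from the sample and `repeatTime = 2`, `RequiredElems(2)` evaluates to 256, which matches `dstDataSize` in the complete sample. The reference produces two independently sorted groups of 16, which is why the second half of the sample output starts again from a high score (73.06) rather than continuing a single global order.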