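# TileRowBroadcastMulTla

catlass is the operator template library for CANN. It provides high-performance matrix multiplication and related fused operator template samples for NPUs. Project address: https://gitcode.com/cann/catlass

Author's statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.

## Function description

`TileRowBroadcastMulTla` implements TLA-style broadcast multiplication for the epilogue stage. It broadcasts a row vector of shape (1, n) held in UB (Unified Buffer) to an (m, n) matrix and multiplies it element-wise with the input. The offset is computed with `ubOut.layout()(ubOut.coord())`, after which `AscendC::Mul` is called.

## Scope of application

- Architectures: all (no architecture-specific specialization)
- Style: TLA

## Template prototype

    template <
        class ArchTag_,         // Architecture tag
        class ElementCompute_,  // Compute element type (passed directly, not a GemmType)
        class TileShape_        // Tile shape
    >
    struct TileRowBroadcastMulTla;

## Template parameters

| Parameter | Description |
| --- | --- |
| `ArchTag_` | Architecture tag |
| `ElementCompute_` | Compute element type, e.g. `half` |
| `TileShape_` | Tile shape, `Shape<ROW, COLUMN>` |

## Call interface

    template <class TensorUbOut, class TensorUbIn0, class TensorUbIn1>
    void operator()(TensorUbOut const &ubOut, TensorUbIn0 const &ubIn0, TensorUbIn1 const &ubIn1)

The call computes the offset with `ubOut.layout()(ubOut.coord())` and then invokes `AscendC::Mul`.

## Call example

    #include "catlass/epilogue/tile/tile_broadcast_mul.hpp"

    using namespace Catlass::Epilogue::Tile;

    // Tile extents: M rows, N columns
    constexpr uint32_t M = 128, N = 256;

    // Row-major layout shared by the three UB tensors
    auto layout = tla::MakeLayout<half, layout::RowMajor>(M, N);

    // UB buffers (allocation omitted)
    AscendC::LocalTensor<half> ubOutData, ubIn0Data, ubIn1Data;

    // Wrap the buffers as TLA tensors located in UB
    auto ubOut = tla::MakeTensor(ubOutData, layout, Arch::PositionUB{});
    auto ubIn0 = tla::MakeTensor(ubIn0Data, layout, Arch::PositionUB{});
    auto ubIn1 = tla::MakeTensor(ubIn1Data, layout, Arch::PositionUB{});

    // Instantiate the tile operator and run the broadcast multiplication
    TileRowBroadcastMulTla<Arch::AtlasA2, half, Shape<M, N>> op;
    op(ubOut, ubIn0, ubIn1);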
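The row-broadcast multiplication described in the function description can be illustrated with a small host-side reference. This is only a sketch in plain C++ and is not part of catlass: the function name `RowBroadcastMulRef`, the use of `std::vector`, and the choice of which operand carries the (1, n) row are assumptions made for illustration. It shows the arithmetic the tile operator is described as performing, `out[i][j] = in[i][j] * row[j]`, with a row-major offset standing in for the value that `layout(coord)` would yield on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not a catlass API).
// Multiplies every row of the row-major (m, n) matrix `in` element-wise
// by the (1, n) vector `row` and writes the result to `out`.
template <class T>
void RowBroadcastMulRef(std::vector<T> &out,
                        const std::vector<T> &in,
                        const std::vector<T> &row,
                        std::size_t m, std::size_t n)
{
    // Basic shape checks for the flat buffers
    assert(in.size() == m * n);
    assert(out.size() == m * n);
    assert(row.size() == n);

    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Row-major linear offset of element (i, j)
            std::size_t offset = i * n + j;
            // The same row element is reused for every i (the broadcast)
            out[offset] = in[offset] * row[j];
        }
    }
}
```

A reference like this may be useful for checking device results on small tiles, for example by comparing a 2 x 3 input against the values it produces on the host.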