```tex
%\documentclass{article}
\documentclass{dianzixuebao}
\newcommand{\MDIyear}{xxxx}%年
\newcommand{\MDImonth}{xx}%月
\newcommand{\MDIissuevolume}{xx}%卷
\newcommand{\MDIissuenumber}{xx}%期
\newcommand{\MDIshorttitle}{Fers}
\usepackage{newfloat,caption}
\usepackage{subcaption}
\usepackage{graphicx}
\usepackage[svgnames]{xcolor}
\usepackage{multicol}
\usepackage{multirow}
\usepackage{booktabs} % formal table rules
\usepackage{tabularx}
\usepackage{array}
\usepackage{amssymb}
\usepackage{textcomp}
\usepackage[misc,geometry]{ifsym} % small envelope superscript after a name to mark the corresponding author
\begin{document}
\begin{multicols}{2}
To compare the image throughput of data parallel training and inference of the DNN models on the experimental cluster with that obtained on the CUDA-enabled GPU workstation, CUDA-accelerated and cuDNN-accelerated implementations of the same models are also evaluated.
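For reference, the two GPU baselines correspond to the standard build switches of the open-source Darknet framework; a minimal sketch of the assumed build configuration is shown below (compiler and library paths are installation-specific):
\begin{verbatim}
# Darknet Makefile switches assumed for the two GPU baselines (sketch)
GPU=1      # enable the CUDA code path
CUDNN=1    # additionally route convolutions through cuDNN
\end{verbatim}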
For comparison, Figure \ref{fig:clusterthr} shows the image throughput of data parallel training and inference of YOLOv3, ResNet-152 and DenseNet-201 on the experimental ARMv8 CPU cluster and on the GPU workstation. \texttt{Train\_FTCL} and \texttt{Inference\_FTCL} denote the image throughput achieved with the proposed FTCL-Darknet framework on the experimental many-core CPU cluster for the training and inference of the DNN models, respectively. \texttt{Train\_CUDA\_1080Ti} and \texttt{Inference\_CUDA\_1080Ti} denote the image throughput obtained with the CUDA-accelerated Darknet on the GPU workstation without cuDNN, while \texttt{Train\_CUDNN\_1080Ti} and \texttt{Inference\_CUDNN\_1080Ti} denote the image throughput achieved with the cuDNN-accelerated Darknet on the GPU workstation.
The data parallel training throughput of YOLOv3, ResNet-152 and DenseNet-201 on the experimental ARMv8 many-core CPU cluster reaches 1.3, 2.5 and 2.8 images/s, respectively. On average, this amounts to about 16.1\% of the training throughput obtained with the CUDA-accelerated Darknet on the GPU workstation, and to approximately 3.8\%, 7.9\% and 7.4\% of the training throughput achieved with the cuDNN-accelerated Darknet, respectively.
Correspondingly, the data parallel inference throughput reaches 7.1, 6.2 and 5.9 images/s, respectively. On average, this amounts to about 17.6\% of the inference throughput obtained with the CUDA-accelerated Darknet, and to approximately 14.3\%, 16.1\% and 15.3\% of the inference throughput achieved with the cuDNN-accelerated Darknet, respectively.
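For clarity, the averaged relative figures quoted above correspond to the arithmetic mean of the per-model throughput ratios; a minimal sketch of the computation, with the symbols $T$ and $\bar{R}$ introduced here purely for illustration, is
\[
\bar{R} = \frac{1}{3}\sum_{m}\frac{T_{m}^{\mathrm{FTCL}}}{T_{m}^{\mathrm{GPU}}},
\]
where $T_{m}^{\mathrm{FTCL}}$ and $T_{m}^{\mathrm{GPU}}$ denote the throughput of model $m$ (one of the three DNNs) on the cluster and on the CUDA-enabled GPU workstation, respectively; this yields $\bar{R}\approx 16.1\%$ for training and $\bar{R}\approx 17.6\%$ for inference.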
\end{multicols}
\astable{
\astabletitle{\bf Table 1.\ DNN models and datasets}
\astableobj{
\setlength{\tabcolsep}{3mm}%
\begin{tabular}{l c c c l}
\toprule
DNN model &Input size &Batch size &Conv.\ layers/total layers &Dataset \\
\midrule
YOLOv3 \cite{Redmon2018p} &$416\times416$ &64 &75/107 &MS COCO2014\\
ResNet-152 \cite{He20162ICoCVaPRC770} &$256\times256$ &256 &152/206 &ImageNet2012\\
DenseNet-201 \cite{Huang2017ICoCVaPRC2261} &$256\times256$ &256 &201/305 &ImageNet2012\\
\bottomrule
\end{tabular}%
}
}
\end{document}
```