STRNet: Triple-stream Spatiotemporal Relation Network for Action Recognition

Citation: Z. W. Xu, X. J. Wu, J. Kittler. STRNet: Triple-stream spatiotemporal relation network for action recognition. International Journal of Automation and Computing. doi: 10.1007/s11633-021-1289-9 (http://doi.org/10.1007/s11633-021-1289-9)


###### Author Bio

Zhi-Wei Xu received the B. Eng. degree in computer science and technology from Harbin Institute of Technology, China, in 2017. He is a postgraduate student at the School of Artificial Intelligence and Computer Science, Jiangnan University, China. His research interests include computer vision, video understanding and action recognition. E-mail: zhiwei_xu@stu.jiangnan.edu.cn ORCID iD: 0000-0003-1472-431X

Xiao-Jun Wu received the B. Sc. degree in mathematics from Nanjing Normal University, China, in 1991. He received the M. Sc. and Ph. D. degrees in pattern recognition and intelligent systems from Nanjing University of Science and Technology, China, in 1996 and 2002, respectively. He is currently a professor in artificial intelligence and pattern recognition at Jiangnan University, China. His research interests include pattern recognition, computer vision, fuzzy systems, neural networks and intelligent systems. E-mail: wu_xiaojun@jiangnan.edu.cn (Corresponding author) ORCID iD: 0000-0002-0310-5778

Josef Kittler received the B. A. degree in electrical science tripos, the Ph. D. degree in pattern recognition, and the D. Sc. degree from the University of Cambridge, UK, in 1971, 1974, and 1991, respectively. He is a Distinguished Professor of machine intelligence at the Centre for Vision, Speech and Signal Processing, University of Surrey, UK. He conducts research in biometrics, video and image database retrieval, medical image analysis, and cognitive vision. He has published the textbook Pattern Recognition: A Statistical Approach and over 700 scientific papers, which have been cited more than 66000 times (Google Scholar). He is a series editor of Springer Lecture Notes in Computer Science. He currently serves on the Editorial Boards of Pattern Recognition Letters, Pattern Recognition and Artificial Intelligence, and Pattern Analysis and Applications. He also served as a member of the Editorial Board of IEEE Transactions on Pattern Analysis and Machine Intelligence from 1982 to 1985. He served on the Governing Board of the International Association for Pattern Recognition (IAPR) as one of the two British representatives from 1982 to 2005, and as President of the IAPR from 1994 to 1996. His research interests include robotics, feedback control systems, and control theory. E-mail: j.kittler@surrey.ac.uk ORCID iD: 0000-0002-8110-9205
Figure 1. Architecture overview of STRNet. STRNet consists of three individual branches that learn appearance, motion and temporal relation information, respectively. To represent the information of the whole video comprehensively, we apply two-stage fusion and separable (2+1)D convolution to reinforce feature learning. Finally, we apply a decision-level weight assignment to adjust the final classification scores.
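The Figure 1 caption mentions a decision-level weight assignment across the three streams but does not give its form. A common way to realize decision-level (late) fusion is a weighted average of each stream's class probabilities; the sketch below assumes exactly that. The stream names, weight values and helper names (`weighted_decision_fusion`, `predict`) are hypothetical and are not the authors' weighting scheme.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one stream's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_decision_fusion(stream_logits: Dict[str, List[float]],
                             weights: Dict[str, float]) -> List[float]:
    """Assumed late fusion: weighted average of per-stream softmax probabilities.

    Stream names and weight values are hypothetical; the weights are
    normalized so that the fused scores still sum to 1.
    """
    total_w = sum(weights[name] for name in stream_logits)
    num_classes = len(next(iter(stream_logits.values())))
    fused = [0.0] * num_classes
    for name, logits in stream_logits.items():
        probs = softmax(logits)
        w = weights[name] / total_w
        for c in range(num_classes):
            fused[c] += w * probs[c]
    return fused


def predict(fused: List[float]) -> int:
    """Index of the highest fused score (the predicted action class)."""
    return max(range(len(fused)), key=fused.__getitem__)


# Usage with made-up logits for a 3-class problem (illustrative values only).
logits = {
    "appearance": [2.0, 0.5, 0.1],
    "motion": [0.2, 1.8, 0.3],
    "relation": [1.0, 1.2, 0.1],
}
weights = {"appearance": 1.0, "motion": 1.0, "relation": 0.5}
fused = weighted_decision_fusion(logits, weights)
label = predict(fused)
```

Normalizing the weights keeps the fused output a probability distribution, so the weights only shift the relative influence of the appearance, motion and relation streams on the final decision.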

Figure 2. Feature visualization of STRNet. The first column shows the input frames, the second the feature maps of the Stem, the third the fused feature maps of stage 3, and the last the spatiotemporal feature maps with relation information output by stage 5. All feature maps are rescaled to the original frame size for comparison.

Figure 3. Schema of building the relation unit, where $X$ denotes the original input sequence of feature maps and $\tilde{X}$ denotes the computed relation maps. The function $F_{sm}(\cdot)$ computes the similarity measurement, $g$ denotes the similarity weight vector, and $Y$ denotes the final relation response maps.
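The Figure 3 caption names the parts of the relation unit ($X$, $F_{sm}$, $g$, $\tilde{X}$, $Y$) but not their exact formulas. Below is a minimal pure-Python sketch under stated assumptions: each frame's feature map is already pooled to a vector, $F_{sm}$ is cosine similarity, $g$ is a softmax over the similarities of one frame to all frames, $\tilde{X}$ is the $g$-weighted sum of frames, and $Y = X + \tilde{X}$ as a residual. These choices, in the spirit of non-local/self-attention blocks, and the helper names `cosine_similarity` and `relation_unit` are hypothetical and not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Assumed form of F_sm: cosine similarity between two pooled frame features."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)  # epsilon guards against all-zero features


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax turning similarities into weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_unit(X: List[Vector]) -> List[Vector]:
    """Hypothetical relation unit over T pooled frame features X[0..T-1].

    For each frame t:
      g       = softmax(F_sm(X[t], X[j]) for all j)   # similarity weight vector
      X~[t]   = sum_j g[j] * X[j]                      # relation map
      Y[t]    = X[t] + X~[t]                           # residual response (assumed)
    """
    T, dim = len(X), len(X[0])
    Y = []
    for t in range(T):
        g = softmax([cosine_similarity(X[t], X[j]) for j in range(T)])
        x_tilde = [sum(g[j] * X[j][c] for j in range(T)) for c in range(dim)]
        Y.append([X[t][c] + x_tilde[c] for c in range(dim)])
    return Y


# Usage: three 2-D frame features (illustrative values only).
Y = relation_unit([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The residual addition keeps each frame's own features in $Y$ even when that frame is dissimilar to the rest of the clip; without it, $Y$ would reduce to a pure similarity-weighted average of the sequence.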


##### Publication History
• Received: 2020-10-30
• Accepted: 2021-02-05
• Published online: 2021-03-23
