Review
A Review and Outlook on Predictive Cruise Control of Vehicles and Typical Applications Under Cloud Control System
Bolin Gao, Keke Wan, Qien Chen, Zhou Wang, Rui Li, Yu Jiang, Run Mei, Yinghui Luo, Keqiang Li
doi: 10.1007/s11633-022-1395-3
Abstract:
With the application of mobile communication technology in the automotive industry, intelligent connected vehicles equipped with communication and sensing devices have been rapidly promoted. The road and traffic information perceived by intelligent vehicles has important potential application value, especially for improving the energy efficiency and driving safety of vehicles as well as the efficient operation of traffic. Therefore, a vehicle control technology called predictive cruise control (PCC) has become a hot research topic. PCC fully exploits perceived or predicted environmental information to control the vehicle predictively, improving the comprehensive performance of the vehicle-road system. Most existing reviews focus on the economical driving of vehicles, but few scholars have conducted a comprehensive survey of PCC from theory to the current state of practice. In this paper, the methods and advances of PCC technologies are reviewed comprehensively by surveying the global literature, and typical applications under a cloud control system (CCS) are proposed. Firstly, the methodology of PCC is introduced in general. Then, the PCC-related research is surveyed in depth according to typical scenarios, including freeway and urban traffic scenarios involving traditional vehicles, new energy vehicles, intelligent vehicles, and multi-vehicle platoons. Finally, the general architecture and three typical applications of the CCS for PCC are briefly introduced, and the prospects and future trends of PCC are discussed.
Machine Learning in Lung Cancer Radiomics
Jiaqi Li, Zhuofeng Li, Lei Wei, Xuegong Zhang
doi: 10.1007/s11633-022-1364-x
Abstract:
Lung cancer is the leading cause of cancer-related deaths worldwide. Medical imaging technologies such as computed tomography (CT) and positron emission tomography (PET) are routinely used for non-invasive lung cancer diagnosis. In clinical practice, physicians investigate the characteristics of tumors, such as the size, shape, and location, from CT and PET images to make decisions. Recently, scientists have proposed various computational image features that can capture more information than is directly perceivable by human eyes, which has promoted the rise of radiomics. Radiomics is a research field on the conversion of medical images into high-dimensional features with data-driven methods to support subsequent data mining for better clinical decision support. Radiomic analysis has four major steps: image preprocessing, tumor segmentation, feature extraction, and clinical prediction. Machine learning, including the high-profile deep learning, facilitates the development and application of radiomic methods. Various radiomic methods have been proposed recently, such as the construction of radiomic signatures, tumor habitat analysis, cluster pattern characterization, and end-to-end prediction of tumor properties. These methods have been applied in many studies on lung cancer diagnosis, treatment, and monitoring, shedding light on future non-invasive evaluations of nodule malignancy, histological subtypes, genomic properties, and treatment responses. In this review, we summarize and categorize studies on the general workflow, methods for clinical prediction, and clinical applications of machine learning in lung cancer radiomics, introduce some commonly used software tools, and discuss the limitations of current methods and possible future directions.
Research Article
MVContrast: Unsupervised Pretraining for Multi-view 3D Object Recognition
Luequan Wang, Hongbin Xu, Wenxiong Kang
doi: 10.1007/s11633-023-1430-z
Abstract:
3D shape recognition has drawn much attention in recent years, and view-based approaches perform best of all. However, current multi-view methods are almost all fully supervised, and their pretraining models are almost always based on ImageNet. Although ImageNet pretraining yields impressive results, there is still a significant discrepancy between multi-view datasets and ImageNet: multi-view datasets naturally retain rich 3D information. In addition, large-scale datasets such as ImageNet require considerable cleaning and annotation work, so it is difficult to build a second such dataset. In contrast, unsupervised learning methods can learn general feature representations without any extra annotation. To this end, we propose a three-stage unsupervised joint pretraining model. Specifically, we decouple the final representations into three fine-grained representations: data augmentation is utilized to obtain pixel-level representations within each view, spatially invariant features are strengthened at the view level, and global information is exploited at the shape level through a novel extract-and-swap module. Experimental results demonstrate that the proposed method gains significantly in 3D object classification and retrieval tasks and generalizes well to cross-dataset tasks.
Cross-modal Contrastive Learning for Generalizable and Efficient Image-text Retrieval
Haoyu Lu, Yuqi Huo, Mingyu Ding, Nanyi Fei, Zhiwu Lu
doi: 10.1007/s11633-022-1386-4
Abstract:
Cross-modal image-text retrieval is a fundamental task in bridging vision and language. It faces two main challenges that are typically not well addressed in previous works. 1) Generalizability: Existing methods often assume a strong semantic correlation between each text-image pair, which makes them difficult to generalize to real-world scenarios where weak correlation dominates. 2) Efficiency: Many recent works adopt a single-tower architecture with heavy detectors, which is inefficient at inference because the costly computation must be repeated for each text-image pair. To overcome these two challenges, we propose a two-tower cross-modal contrastive learning (CMCL) framework. Specifically, we first devise a two-tower architecture, which enables a unified feature space in which text and image features can be compared directly, alleviating the heavy computation during inference. We further introduce a simple yet effective module named multi-grid split (MGS) to learn fine-grained image features without using detectors. Last but not least, we deploy a cross-modal contrastive loss on the global image/text features to learn their weak correlation and thus achieve high generalizability. To validate that our CMCL readily generalizes to real-world scenarios, we construct a large multi-source image-text dataset called the weak semantic correlation dataset (WSCD). Extensive experiments show that our CMCL outperforms state-of-the-art methods while being much more efficient.
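The cross-modal contrastive loss on global features described above can be sketched as a symmetric InfoNCE objective over the two towers' embeddings. The NumPy version below is an illustrative reading, not the authors' implementation; the temperature `tau` and the exact symmetric form are assumptions.

```python
import numpy as np

def cross_modal_contrastive_loss(img_emb, txt_emb, tau=0.07):
    """Symmetric InfoNCE over paired embeddings: row i of img_emb matches row i of txt_emb."""
    # L2-normalize so dot products are cosine similarities in a unified feature space.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / tau  # (N, N): similarity of every image to every text

    def ce_diag(l):
        # Cross-entropy where the correct "class" for row i is column i.
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        idx = np.arange(len(l))
        return -np.log(p[idx, idx]).mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (ce_diag(logits) + ce_diag(logits.T))
```

Because each tower embeds its modality independently, embeddings can be precomputed and compared by a single dot product at retrieval time, which is what makes the two-tower design efficient relative to single-tower detectors.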
YuNet: A Tiny Millisecond-level Face Detector
Wei Wu, Hanyang Peng, Shiqi Yu
doi: 10.1007/s11633-023-1423-y
Abstract:
Great progress has been made toward accurate face detection in recent years. However, heavy models and expensive computation costs make it difficult to deploy many detectors on mobile and embedded devices, where model size and latency are highly constrained. In this paper, we present a millisecond-level anchor-free face detector, YuNet, which is specifically designed for edge devices. There are several key contributions to improving the efficiency-accuracy trade-off. First, we analyse influential state-of-the-art face detectors from recent years and summarize rules for reducing model size. Then, a lightweight face detector, YuNet, is introduced, containing a tiny and efficient feature extraction backbone and a simplified pyramid feature fusion neck. To the best of our knowledge, YuNet has the best trade-off between accuracy and speed: it has only 75 856 parameters, less than one fifth of other small-size detectors. In addition, a training strategy is presented for the tiny face detector, which can effectively train models on the same distribution as the training set. The proposed YuNet achieves 81.1% mAP (single-scale) on the WIDER FACE validation hard track with high inference efficiency (Intel i7-12700K: 1.6 ms per frame at 320×320). Because of its unique advantages, the repository for YuNet and its predecessors has been popular on GitHub, gaining more than 11 K stars at https://github.com/ShiqiYu/libfacedetection
Multimodal Biometric Fusion Algorithm Based on Ranking Partition Collision Theory
Zhuorong Li, Yunqi Tang
doi: 10.1007/s11633-022-1403-7
Abstract:
Score-based multimodal biometric fusion has been shown to be successful in addressing unimodal techniques' vulnerability to attack and poor performance on low-quality data. However, difficulties remain in unifying the meaning of heterogeneous scores effectively. Aside from the matching scores themselves, the importance of the ranking information they carry has been undervalued in previous studies. This study concentrates on matching scores and their ranking information and proposes the ranking partition collision (RPC) theory from the standpoint of the value of scores. To meet both forensic and judicial needs, this paper proposes a method that employs a neural network to fuse biometrics at the score level. In addition, this paper constructs a virtual homologous dataset and conducts experiments on it. Experimental results demonstrate that the proposed method achieves 100% accuracy in both mAP and Rank-1. To show the efficiency of the proposed method in practical applications, this work carries out further experiments using real-world data. The results show that the proposed approach maintains a Rank-1 accuracy of 99.2% on a million-scale database. It offers a novel approach to fusion at the score level.
Biological Eagle-eye Inspired Target Detection for Unmanned Aerial Vehicles Equipped with a Manipulator
Yi-Min Deng, Si-Yuan Wang
doi: 10.1007/s11633-022-1342-3
Abstract:
Inspired by eagle-eye mechanisms, the structure and information processing characteristics of the eagle's visual system are used for the target capture task of an unmanned aerial vehicle (UAV) with a mechanical arm. In this paper, a novel eagle-eye inspired multi-camera sensor and a saliency detection method are proposed. A combined camera system is built by simulating the double-fovea structure of the eagle retina. A saliency target detection method based on the eagle midbrain inhibition mechanism is proposed by measuring static saliency information and dynamic features. Thus, salient targets can be accurately detected through collaborative work between the different cameras of the proposed multi-camera sensor. Experimental results show that the eagle-eye inspired visual system is able to continuously detect targets in outdoor scenes and that the proposed algorithm has a strong inhibitory effect on moving background interference.
A New Diagnosis Method with Few-shot Learning Based on a Class-rebalance Strategy for Scarce Faults in Industrial Processes
Xinyao Xu, De Xu, Fangbo Qin
doi: 10.1007/s11633-022-1363-y
Abstract:
For industrial processes, new scarce faults are usually judged by experts. The lack of instances of these faults causes a severe data imbalance problem for a diagnosis model and leads to low performance. In this article, a new diagnosis method with few-shot learning based on a class-rebalance strategy is proposed to handle this problem. The proposed method transforms instances of the different faults into a feature embedding space, where the fault features form separate feature clusters. Each fault's representation is calculated as the center of its feature cluster, so the representations of new faults can also be effectively calculated from a few support instances. Fault diagnosis is then achieved by estimating the feature similarity between instances and faults. A cluster loss function is designed to enhance the feature clustering performance. Also, a class-rebalance strategy with data augmentation is designed to imitate potential faults with different causes and degrees of severity, improving the model's generalizability and hence its diagnosis performance. Simulations of fault diagnosis with the proposed method were performed on the Tennessee-Eastman benchmark. The proposed method achieved average diagnosis accuracies ranging from 81.8% to 94.7% for the eight selected faults, with the number of support instances ranging from 3 to 50. The simulation results verify the effectiveness of the proposed method.
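The core mechanism in the abstract above, representing each fault as the center of its feature cluster and diagnosing by feature similarity, follows the familiar prototype pattern. A minimal sketch (the function names and the Euclidean metric are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def fault_prototypes(support_feats, support_labels):
    """Each fault's representation is the mean (center) of its support features."""
    classes = np.unique(support_labels)
    protos = np.stack([support_feats[support_labels == c].mean(axis=0)
                       for c in classes])
    return classes, protos

def diagnose(query_feats, classes, protos):
    """Assign each query instance to the nearest fault prototype."""
    # Squared Euclidean distance from every query to every prototype.
    d = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]
```

Because a new fault's prototype is just a mean over its few support instances, the same routine handles scarce classes without retraining, which is the few-shot property the method relies on.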
Region-adaptive Concept Aggregation for Few-shot Visual Recognition
Mengya Han, Yibing Zhan, Baosheng Yu, Yong Luo, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao
doi: 10.1007/s11633-022-1358-8
Abstract:
Few-shot learning (FSL) aims to learn novel concepts from very limited examples. However, most FSL methods lack robustness in concept learning. Specifically, existing FSL methods usually ignore the diversity of region contents: regions may contain concept-irrelevant information such as the background, which introduces bias/noise and degrades the performance of conceptual representation learning. To address this issue, we propose a novel metric-based FSL method termed the region-adaptive concept aggregation network, or RCA-Net. Specifically, we devise a region-adaptive concept aggregator (RCA) to model the relationships among different regions and capture the conceptual information in each region, which is then integrated in a weighted-average manner to obtain the conceptual representation. Consequently, robust concept learning can be achieved by focusing more on concept-relevant information and less on concept-irrelevant information. We perform extensive experiments on three popular visual recognition benchmarks to demonstrate the superiority of RCA-Net for robust few-shot learning. In particular, on the Caltech-UCSD Birds-200-2011 (CUB200) dataset, the proposed RCA-Net significantly improves 1-shot accuracy from 74.76% to 78.03% and 5-shot accuracy from 86.84% to 89.83% compared with the most competitive counterpart.
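The weighted-average aggregation step described above can be sketched as softmax-weighted pooling over region features, where regions more relevant to the concept receive larger weights. The dot-product relevance score below is an illustrative assumption; the paper's RCA module is more elaborate.

```python
import numpy as np

def aggregate_regions(region_feats, concept_query):
    """Weighted-average pooling of region features, (R, D) -> (D,).

    Regions whose features align with the concept query dominate the
    aggregate, so concept-irrelevant regions (e.g., background) are
    down-weighted rather than averaged in uniformly.
    """
    scores = region_feats @ concept_query      # (R,) relevance of each region
    w = np.exp(scores - scores.max())          # stable softmax weights
    w /= w.sum()
    return w @ region_feats                    # weighted average representation
```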
Dual-domain and Multiscale Fusion Deep Neural Network for PPG Biometric Recognition
Chun-Ying Liu, Gong-Ping Yang, Yu-Wen Huang, Fu-Xian Huang
doi: 10.1007/s11633-022-1366-8
Abstract:
Photoplethysmography (PPG) biometrics have received considerable attention. Although deep learning has achieved good performance for PPG biometrics, several challenges remain open: 1) how to effectively extract a fused feature representation from time-domain and frequency-domain PPG signals; 2) how to effectively capture transition information across a series of PPG signals; and 3) how to extract time-varying information from one-dimensional time-frequency sequential data. To address these challenges, we propose a dual-domain and multiscale fusion deep neural network (DMFDNN) for PPG biometric recognition. The DMFDNN is mainly composed of a two-branch deep learning framework that learns time-varying and multiscale discriminative features from the time and frequency domains. Meanwhile, we design a multiscale extraction module, consisting of multiple convolution layers with different receptive fields, to capture multiscale transition information. In addition, a dual-domain attention module is proposed to emphasize whichever of the time-domain and frequency-domain data contributes more to PPG biometrics. Experiments on four datasets demonstrate that DMFDNN outperforms state-of-the-art methods for PPG biometrics.
FedFV: A Personalized Federated Learning Framework for Finger Vein Authentication
Feng-Zhao Lian, Jun-Duan Huang, Ji-Xin Liu, Guang Chen, Jun-Hong Zhao, Wen-Xiong Kang
doi: 10.1007/s11633-022-1341-4
Abstract:
Most finger vein authentication systems suffer from the problem of small sample size. Data augmentation can alleviate this problem to a certain extent, but it does not fundamentally solve the lack of category diversity. Researchers have therefore resorted to pre-training or multi-source joint training, but these methods can lead to leakage of user privacy. In view of these issues, this paper proposes a federated learning-based finger vein authentication framework (FedFV) to solve the problems of small sample size and category diversity while protecting user privacy. Through training under FedFV, each client can share the knowledge learned from its users' finger vein data with the other federated clients without causing template leaks. In addition, we propose an efficient personalized federated aggregation algorithm, named federated weighted proportion reduction (FedWPR), to tackle the problem of non-independent and identically distributed (non-IID) data caused by client diversity, thus achieving the best performance for each client. To thoroughly evaluate the effectiveness of FedFV, comprehensive experiments are conducted on nine publicly available finger vein datasets. Experimental results show that FedFV can improve the performance of a finger vein authentication system without directly using other clients' data. To the best of our knowledge, FedFV is the first personalized federated finger vein authentication framework, and it has reference value for subsequent research on biometric privacy protection.
ECG Biometrics via Enhanced Correlation and Semantic-rich Embedding
Kui-Kui Wang, Gong-Ping Yang, Lu Yang, Yu-Wen Huang, Yi-Long Yin
doi: 10.1007/s11633-022-1345-0
Abstract:
Electrocardiogram (ECG) biometric recognition has gained considerable attention, and various methods have been proposed to facilitate its development. However, one limitation is that the diversity of ECG signals affects the recognition performance. To address this issue, in this paper, we propose a novel ECG biometrics framework based on enhanced correlation and semantic-rich embedding. Firstly, we construct an enhanced correlation between the base feature and latent representation by using only one projection. Secondly, to fully exploit the semantic information, we take both the label and pairwise similarity into consideration to reduce the influence of ECG sample diversity. Furthermore, to solve the objective function, we propose an effective and efficient algorithm for optimization. Finally, extensive experiments are conducted on two benchmark datasets, and the experimental results show the effectiveness of our framework.
Effective and Robust Detection of Adversarial Examples via Benford-Fourier Coefficients
Cheng-Cheng Ma, Bao-Yuan Wu, Yan-Bo Fan, Yong Zhang, Zhi-Feng Li
doi: 10.1007/s11633-022-1328-1
Abstract:
Adversarial examples are well known as a serious threat to deep neural networks (DNNs). In this work, we study the detection of adversarial examples based on the assumption that the output and internal responses of a DNN model for both adversarial and benign examples follow the generalized Gaussian distribution (GGD), but with different parameters (i.e., shape factor, mean, and variance). GGD is a general distribution family that covers many popular distributions (e.g., Laplacian, Gaussian, and uniform), so it is more likely to approximate the intrinsic distributions of internal responses than any specific distribution. Moreover, since the shape factor is more robust across databases than the other two parameters, we propose to construct discriminative features from the shape factor for adversarial detection, employing the magnitude of Benford-Fourier (MBF) coefficients, which can be easily estimated from the responses. Finally, a support vector machine is trained as an adversarial detector on the MBF features. Extensive experiments on image classification demonstrate that the proposed detector is much more effective and robust in detecting adversarial examples from different crafting methods and sources than state-of-the-art adversarial detection methods.
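To make the shape-factor idea concrete: the GGD shape factor of a set of responses can be estimated by moment matching (Mallat's ratio method). The grid-search inversion below is an illustrative simplification and is not the paper's MBF-based procedure.

```python
import numpy as np
from math import gamma, sqrt

def ggd_shape_factor(x):
    """Estimate the GGD shape factor beta from samples x by matching the ratio
    E|x| / sqrt(E[x^2]) = Gamma(2/b) / sqrt(Gamma(1/b) * Gamma(3/b))."""
    rho = np.mean(np.abs(x)) / np.sqrt(np.mean(x ** 2))
    betas = np.linspace(0.2, 5.0, 2000)
    vals = np.array([gamma(2 / b) / sqrt(gamma(1 / b) * gamma(3 / b))
                     for b in betas])
    # The ratio is monotone in beta, so the nearest grid point inverts it.
    return betas[np.argmin(np.abs(vals - rho))]
```

Gaussian responses give beta close to 2 and Laplacian responses beta close to 1; the detection premise above is that this factor is stable for benign inputs but shifts for adversarial ones.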
Review
AI in Human-computer Gaming: Techniques, Challenges and Opportunities
Qi-Yue Yin, Jun Yang, Kai-Qi Huang, Mei-Jing Zhao, Wan-Cheng Ni, Bin Liang, Yan Huang, Shu Wu, Liang Wang
2023,  vol. 20,  no. 3, pp. 299-317,  doi: 10.1007/s11633-022-1384-6
Abstract:
With the breakthrough of AlphaGo, human-computer gaming AI has ushered in a period of explosive growth, attracting more and more researchers all over the world. As a recognized standard for testing artificial intelligence, various human-computer gaming AI systems (AIs) have been developed, such as Libratus, OpenAI Five, and AlphaStar, which have beaten professional human players. The rapid development of human-computer gaming AIs marks a big step for decision-making intelligence, and it seems that current techniques can handle very complex human-computer games. So, one natural question arises: What are the possible challenges of current techniques in human-computer gaming, and what are the future trends? To answer this question, we survey recent successful game AIs, covering board game AIs, card game AIs, first-person shooting game AIs, and real-time strategy game AIs. Through this survey, we 1) compare the main difficulties among different kinds of games and the corresponding techniques utilized for achieving professional human-level AIs; 2) summarize the mainstream frameworks and techniques that can be properly relied on for developing AIs for complex human-computer games; 3) raise the challenges or drawbacks of current techniques in the successful AIs; and 4) try to point out future trends in human-computer gaming AIs. Finally, we hope that this brief review can provide an introduction for beginners and inspire insight for researchers in the field of AI in human-computer gaming.
A Survey on Recent Advances and Challenges in Reinforcement Learning Methods for Task-oriented Dialogue Policy Learning
Wai-Chung Kwan, Hong-Ru Wang, Hui-Min Wang, Kam-Fai Wong
2023,  vol. 20,  no. 3, pp. 318-334,  doi: 10.1007/s11633-022-1347-y
Abstract:
Dialogue policy learning (DPL) is a key component in a task-oriented dialogue (TOD) system. Its goal is to decide the next action of the dialogue system, given the dialogue state at each turn based on a learned dialogue policy. Reinforcement learning (RL) is widely used to optimize this dialogue policy. In the learning process, the user is regarded as the environment and the system as the agent. In this paper, we present an overview of the recent advances and challenges in dialogue policy from the perspective of RL. More specifically, we identify the problems and summarize corresponding solutions for RL-based dialogue policy learning. In addition, we provide a comprehensive survey of applying RL to DPL by categorizing recent methods into five basic elements in RL. We believe this survey can shed light on future research in DPL.
Deep Learning-based Moving Object Segmentation: Recent Progress and Research Prospects
Rui Jiang, Ruixiang Zhu, Hu Su, Yinlin Li, Yuan Xie, Wei Zou
2023,  vol. 20,  no. 3, pp. 335-369,  doi: 10.1007/s11633-022-1378-4
Abstract:
Moving object segmentation (MOS), which aims at segmenting moving objects from video frames, is an important and challenging task in computer vision with various applications. With the development of deep learning (DL), MOS has entered the era of deep models and spatiotemporal feature learning. This paper provides an up-to-date review of DL-based MOS methods proposed during the past three years. Specifically, we present a categorization based on model characteristics, then compare and discuss each category from the perspectives of feature learning (FL) and of model training and evaluation. For FL, the reviewed methods are divided into three types, spatial FL, temporal FL, and spatiotemporal FL, and analyzed from the aspects of input and model architecture; three input types and four typical preprocessing subnetworks are summarized. In terms of training, we discuss ideas for enhancing model transferability. In terms of evaluation, building on a previous categorization into scene-dependent and scene-independent evaluation, combined with whether the videos are recorded with static or moving cameras, we further distinguish four evaluation setups and analyze those used by the reviewed methods. We also compare the performance of some reviewed MOS methods and analyze their technical advantages and disadvantages. Finally, based on the above comparisons and discussions, we present research prospects and future directions.
A Survey on Collaborative DNN Inference for Edge Intelligence
Wei-Qing Ren, Yu-Ben Qu, Chao Dong, Yu-Qian Jing, Hao Sun, Qi-Hui Wu, Song Guo
2023,  vol. 20,  no. 3, pp. 370-395,  doi: 10.1007/s11633-022-1391-7
Abstract:
With the vigorous development of artificial intelligence (AI), intelligent applications based on deep neural networks (DNNs) have changed people's lifestyles and production efficiency. However, the large amount of computation and data generated at the network edge has become a major bottleneck, and the traditional cloud-based computing mode can no longer meet the requirements of real-time processing tasks. To solve these problems, by embedding AI model training and inference capabilities into the network edge, edge intelligence (EI) has become a cutting-edge direction in the field of AI. Furthermore, collaborative DNN inference among the cloud, edge, and end devices provides a promising way to boost EI. Nevertheless, EI-oriented collaborative DNN inference is still in its early stage, lacking a systematic classification and discussion of existing research efforts. Motivated by this, we comprehensively investigate recent studies on EI-oriented collaborative DNN inference. In this paper, we first review the background and motivation of EI. Then, we classify four typical collaborative DNN inference paradigms for EI and analyse their characteristics and key technologies. Finally, we summarize the current challenges of collaborative DNN inference, discuss future development trends, and provide future research directions.
Research Article
Dynamic Movement Primitives Based Robot Skills Learning
Ling-Huan Kong, Wei He, Wen-Shi Chen, Hui Zhang, Yao-Nan Wang
2023,  vol. 20,  no. 3, pp. 396-407,  doi: 10.1007/s11633-022-1346-z
Abstract:
In this article, a robot skills learning framework is developed that considers both motion modeling and execution. To enable the robot to learn skills from demonstrations, a learning method called dynamic movement primitives (DMPs) is introduced to model motion. A staged teaching strategy is integrated into the DMP framework to enhance generality, such that complicated tasks can also be performed by multi-joint manipulators. The DMP connection method is used to make accurate and smooth transitions in position and velocity space to connect complex motion sequences. In addition, motions are categorized into different goals and durations. It is worth mentioning that an adaptive neural network (NN) control method is proposed to achieve highly accurate trajectory tracking and ensure the performance of action execution, which improves the reliability of the skills learning system. Experiments on the Baxter robot verify the effectiveness of the proposed method.
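For readers unfamiliar with DMPs: each degree of freedom is modeled as a damped spring pulled toward the goal, plus a learned forcing term that shapes the demonstrated motion. The sketch below keeps only the spring-damper part (the gains are illustrative and the forcing term is omitted), which already shows the convergence-to-goal property that makes DMPs attractive for connecting motion sequences.

```python
import numpy as np

def dmp_rollout(x0, g, K=25.0, D=10.0, dt=0.01, T=2.0):
    """Integrate a minimal DMP transformation system x'' = K*(g - x) - D*x'.

    With the forcing term set to zero, the trajectory converges to the goal g
    from any start x0; a learned forcing term would add the demonstrated shape
    on top without breaking this guarantee. D = 2*sqrt(K) gives critical damping.
    """
    x, v = float(x0), 0.0
    traj = []
    for _ in range(int(T / dt)):
        a = K * (g - x) - D * v   # spring-damper acceleration toward the goal
        v += a * dt               # explicit Euler integration
        x += v * dt
        traj.append(x)
    return np.array(traj)
```

Changing `g` rescales the whole motion toward a new goal, which is how DMPs generalize a single demonstration to different targets and durations.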
Robust Local Light Field Synthesis via Occlusion-aware Sampling and Deep Visual Feature Fusion
Wenpeng Xing, Jie Chen, Yike Guo
2023,  vol. 20,  no. 3, pp. 408-420,  doi: 10.1007/s11633-022-1381-9
Abstract:
Novel view synthesis has recently attracted tremendous research attention for its applications in virtual reality and immersive telepresence. Rendering a locally immersive light field (LF) based on arbitrary large-baseline RGB references is a challenging problem that lacks efficient solutions with existing novel view synthesis techniques. In this work, we aim at truthfully rendering local immersive novel views/LF images based on large-baseline LF captures and a single RGB image in the target view. To fully exploit the precious information in the source LF captures, we propose a novel occlusion-aware source sampler (OSS) module, which efficiently transfers the pixels of source views to the target view's frustum in an occlusion-aware manner. An attention-based deep visual fusion module is proposed to fuse the revealed occluded background content with a preliminary LF into a final refined LF. The proposed source sampling and fusion mechanism not only provides information for occluded regions from varying observation angles but also proves able to effectively enhance visual rendering quality. Experimental results show that our proposed method renders high-quality LF images/novel views with sparse RGB references and outperforms state-of-the-art LF rendering and novel view synthesis methods.
Masked Vision-language Transformer in Fashion
Ge-Peng Ji, Mingchen Zhuge, Dehong Gao, Deng-Ping Fan, Christos Sakaridis, Luc Van Gool
2023,  vol. 20,  no. 3, pp. 421-434,  doi: 10.1007/s11633-022-1394-4
Abstract:
We present a masked vision-language transformer (MVLT) for fashion-specific multi-modal representation. Technically, we simply utilize the vision transformer architecture to replace the bidirectional encoder representations from Transformers (BERT) in the pre-training model, making MVLT the first end-to-end framework for the fashion domain. Besides, we design masked image reconstruction (MIR) for a fine-grained understanding of fashion. MVLT is an extensible and convenient architecture that admits raw multi-modal inputs without extra pre-processing models (e.g., ResNet), implicitly modeling the vision-language alignments. More importantly, MVLT easily generalizes to various matching and generative tasks. Experimental results show clear improvements in retrieval (rank@5: 17%) and recognition (accuracy: 3%) tasks over the Fashion-Gen 2018 winner, Kaleido-BERT. The code is available at https://github.com/GewelsJI/MVLT
Symmetric-threshold ReLU for Fast and Nearly Lossless ANN-SNN Conversion
Jianing Han, Ziming Wang, Jiangrong Shen, Huajin Tang
2023,  vol. 20,  no. 3, pp. 435-446,  doi: 10.1007/s11633-022-1388-2
Abstract:
Artificial neural network to spiking neural network (ANN-SNN) conversion, as an efficient algorithm for training deep SNNs, boosts the performance of shallow SNNs and expands their application to various tasks. However, existing conversion methods still suffer from large conversion error within low conversion time steps. In this paper, a heuristic symmetric-threshold rectified linear unit (stReLU) activation function for ANNs is proposed, based on the intrinsically different responses of the integrate-and-fire (IF) neurons in SNNs and the activation functions in ANNs. The negative threshold in stReLU guarantees the conversion of negative activations, and the symmetric thresholds enable positive error to offset negative error between the activation value and the spike firing rate, thus reducing the conversion error from ANNs to SNNs. The lossless conversion from ANNs with stReLU to SNNs is demonstrated by theoretical formulation. By contrasting stReLU with asymmetric-threshold LeakyReLU and threshold ReLU, the effectiveness of the symmetric thresholds is further explored. The results show that ANNs with stReLU can decrease the conversion error and achieve nearly lossless conversion on the MNIST, Fashion-MNIST, and CIFAR10 datasets, with a 6× to 250× speedup compared with other methods. Moreover, a comparison of energy consumption between ANNs and SNNs indicates that this novel conversion algorithm can also significantly reduce energy consumption.
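One plausible reading of the symmetric-threshold activation described above (the exact parameterization is an assumption, not taken from the paper) is a linear unit clipped symmetrically at ±θ: the negative branch lets negative activations map onto spike rates, and the symmetric saturation lets positive and negative clipping errors offset.

```python
import numpy as np

def st_relu(x, theta=1.0):
    """Illustrative symmetric-threshold activation: identity on [-theta, theta],
    saturating at -theta and +theta (unlike ReLU, negative values survive)."""
    return np.clip(x, -theta, theta)

def if_rate(x, theta=1.0, steps=32):
    """Toy spike-rate approximation of st_relu over a finite number of time
    steps: the rate is quantized to multiples of theta/steps, which is the
    source of the conversion error the paper analyzes."""
    return np.round(np.clip(x, -theta, theta) / theta * steps) / steps * theta
```

Comparing `st_relu(x)` with `if_rate(x, steps=T)` for small `T` shows the error shrinking as the number of time steps grows, mirroring the conversion-error-versus-latency trade-off the abstract describes.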
Feature Selection and Feature Learning for High-dimensional Batch Reinforcement Learning: A Survey
De-Rong Liu, Hong-Liang Li, Ding Wang
2015,  vol. 12,  no. 3, pp. 229-242,  doi: 10.1007/s11633-015-0893-y
Abstract PDF SpringerLink
Second-order Sliding Mode Approaches for the Control of a Class of Underactuated Systems
Sonia Mahjoub, Faiçal Mnif, Nabil Derbel
2015,  vol. 12,  no. 2, pp. 134-141,  doi: 10.1007/s11633-015-0880-3
Abstract PDF SpringerLink
Quantization Based Watermarking Methods Against Valumetric Distortions
Zai-Ran Wang, Jing Dong, Wei Wang
2017,  vol. 14,  no. 6, pp. 672-685,  doi: 10.1007/s11633-016-1010-6
Abstract PDF SpringerLink
Innovative Developments in HCI and Future Trends
Mohammad S. Hasan, Hongnian Yu
2017,  vol. 14,  no. 1, pp. 10-20,  doi: 10.1007/s11633-016-1039-6
Abstract PDF SpringerLink
Genetic Algorithm with Variable Length Chromosomes for Network Intrusion Detection
Sunil Nilkanth Pawar, Rajankumar Sadashivrao Bichkar
2015,  vol. 12,  no. 3, pp. 337-342,  doi: 10.1007/s11633-014-0870-x
Abstract PDF SpringerLink
Cooperative Formation Control of Autonomous Underwater Vehicles: An Overview
Bikramaditya Das, Bidyadhar Subudhi, Bibhuti Bhusan Pati
2016,  vol. 13,  no. 3, pp. 199-225,  doi: 10.1007/s11633-016-1004-4
Abstract PDF SpringerLink
Recent Progress in Networked Control Systems: A Survey
Yuan-Qing Xia, Yu-Long Gao, Li-Ping Yan, Meng-Yin Fu
2015,  vol. 12,  no. 4, pp. 343-367,  doi: 10.1007/s11633-015-0894-x
Abstract PDF SpringerLink
Grey Qualitative Modeling and Control Method for Subjective Uncertain Systems
Peng Wang, Shu-Jie Li, Yan Lv, Zong-Hai Chen
2015,  vol. 12,  no. 1, pp. 70-76,  doi: 10.1007/s11633-014-0820-7
Abstract PDF SpringerLink
A Wavelet Neural Network Based Non-linear Model Predictive Controller for a Multi-variable Coupled Tank System
Kayode Owa, Sanjay Sharma, Robert Sutton
2015,  vol. 12,  no. 2, pp. 156-170,  doi: 10.1007/s11633-014-0825-2
Abstract PDF SpringerLink
Advances in Vehicular Ad-hoc Networks (VANETs): Challenges and Road-map for Future Development
Elias C. Eze, Si-Jing Zhang, En-Jie Liu, Joy C. Eze
2016,  vol. 13,  no. 1, pp. 1-18,  doi: 10.1007/s11633-015-0913-y
Abstract PDF SpringerLink
An Unsupervised Feature Selection Algorithm with Feature Ranking for Maximizing Performance of the Classifiers
Danasingh Asir Antony Gnana Singh, Subramanian Appavu Alias Balamurugan, Epiphany Jebamalar Leavline
2015,  vol. 12,  no. 5, pp. 511-517,  doi: 10.1007/s11633-014-0859-5
Abstract PDF SpringerLink
Sliding Mode and PI Controllers for Uncertain Flexible Joint Manipulator
Lilia Zouari, Hafedh Abid, Mohamed Abid
2015,  vol. 12,  no. 2, pp. 117-124,  doi: 10.1007/s11633-015-0878-x
Abstract PDF SpringerLink
Bounded Real Lemmas for Fractional Order Systems
Shu Liang, Yi-Heng Wei, Jin-Wen Pan, Qing Gao, Yong Wang
2015,  vol. 12,  no. 2, pp. 192-198,  doi: 10.1007/s11633-014-0868-4
Abstract PDF SpringerLink
Robust Face Recognition via Low-rank Sparse Representation-based Classification
Hai-Shun Du, Qing-Pu Hu, Dian-Feng Qiao, Ioannis Pitas
2015,  vol. 12,  no. 6, pp. 579-587,  doi: 10.1007/s11633-015-0901-2
Abstract PDF SpringerLink
Extracting Parameters of OFET Before and After Threshold Voltage Using Genetic Algorithms
Imad Benacer, Zohir Dibi
2016,  vol. 13,  no. 4, pp. 382-391,  doi: 10.1007/s11633-015-0918-6
Abstract PDF SpringerLink
Analysis of Fractional-order Linear Systems with Saturation Using Lyapunov's Second Method and Convex Optimization
Esmat Sadat Alaviyan Shahri, Saeed Balochian
2015,  vol. 12,  no. 4, pp. 440-447,  doi: 10.1007/s11633-014-0856-8
Abstract PDF SpringerLink
Distributed Control of Chemical Process Networks
Michael J. Tippett, Jie Bao
2015,  vol. 12,  no. 4, pp. 368-381,  doi: 10.1007/s11633-015-0895-9
Abstract PDF SpringerLink
Backstepping Control of Speed Sensorless Permanent Magnet Synchronous Motor Based on Sliding Mode Observer
Cai-Xue Chen, Yun-Xiang Xie, Yong-Hong Lan
2015,  vol. 12,  no. 2, pp. 149-155,  doi: 10.1007/s11633-015-0881-2
Abstract PDF SpringerLink
Appropriate Sub-band Selection in Wavelet Packet Decomposition for Automated Glaucoma Diagnoses
Chandrasekaran Raja, Narayanan Gangatharan
2015,  vol. 12,  no. 4, pp. 393-401,  doi: 10.1007/s11633-014-0858-6
Abstract PDF SpringerLink
Generalized Norm Optimal Iterative Learning Control with Intermediate Point and Sub-interval Tracking
David H. Owens, Chris T. Freeman, Bing Chu
2015,  vol. 12,  no. 3, pp. 243-253,  doi: 10.1007/s11633-015-0888-8
Abstract PDF SpringerLink
Flexible Strip Supercapacitors for Future Energy Storage
Rui-Rong Zhang, Yan-Meng Xu, David Harrison, John Fyson, Fu-Lian Qiu, Darren Southee
2015,  vol. 12,  no. 1, pp. 43-49,  doi: 10.1007/s11633-014-0866-6
Abstract PDF SpringerLink