Image Semantic Description Based on Deep Learning with Multi-attention Mechanisms
Abstract
In the era of big data, cross-media and multi-modal data are expanding rapidly, and existing data processing methods fail to meet the corresponding functional requirements. To address the large expression gap between multi-modal data, this paper proposes a multi-modal data fusion method based on deep learning, which combines the strengths of deep learning in image detection and text sequence prediction with a multi-attention mechanism. The BLEU algorithm is used to compute the similarity, at four n-gram levels, between the description sentences output by the model and the reference descriptions of an image. Training and testing were conducted on the Flickr8K dataset. Compared with traditional single-modal image description methods, experiments show that the multi-AM model achieves better results under the BLEU metric.
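As context for the evaluation metric mentioned above, the following is a minimal, standard-library sketch of sentence-level BLEU with the four n-gram levels (modified n-gram precision plus a brevity penalty, following Papineni et al., 2002). It is an illustrative implementation, not the paper's evaluation code; the tokenized example sentences are invented for demonstration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-max_n with uniform weights over 1..max_n-grams.

    candidate: list of tokens produced by the model.
    references: list of token lists (the human-written descriptions).
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each candidate n-gram count by its maximum count
        # in any single reference (modified precision).
        max_ref = Counter()
        for ref in references:
            for gram, cnt in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0.0:
        return 0.0  # geometric mean is zero if any level has no match
    # Brevity penalty: penalize candidates shorter than the closest reference.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Hypothetical example: one generated caption scored against two references.
cand = "a dog runs across the grass".split()
refs = ["a dog is running across the grass".split(),
        "the dog runs on green grass".split()]
scores = [bleu(cand, refs, max_n=n) for n in (1, 2, 3, 4)]
```

A perfect match against a reference yields 1.0 at every level, and scores drop as longer n-grams fail to match, which is why BLEU-1 through BLEU-4 are typically reported together for image captioning.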