2020

WP2 Automatic multimodal content analysis

Aku Rouhe, Tuomas Kaseva, and Mikko Kurimo. Speaker-aware training of attention-based end-to-end speech recognition using neural speaker embeddings. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.

Phuong Anh Nguyen, Jiaxin Wu, Chong-Wah Ngo, Danny Francis and Benoit Huet. VIREO @ Video browser showdown 2020. In 26th International Conference on MultiMedia Modeling (MMM 2020), 5-8 January 2020, Daejeon, Korea.

 

WP4 Multimodal machine translation

Stig-Arne Grönroos, Sami Virpioja and Mikko Kurimo. Morfessor EM+Prune: Improved subword segmentation with expectation maximization and pruning. In Proceedings of the 12th Language Resources and Evaluation Conference, ELRA, Marseille, France, 2020 (to appear).

Mikko Aulamo, Umut Sulubacak, Sami Virpioja and Jörg Tiedemann. OpusTools and Parallel Corpus Diagnostics. In Proceedings of the 12th Language Resources and Evaluation Conference, ELRA, Marseille, France, 2020 (to appear).

Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann. Multimodal Machine Translation through Visuals and Speech. In Machine Translation, Springer, 2020 (to appear).

Mary Nurminen and Maarit Koponen. 2020. Machine Translation and Fair Access to Information. Translation Spaces 9(1), 150-169. (link)

Maarit Koponen, Umut Sulubacak, Kaisa Vitikainen, Jörg Tiedemann. MT for subtitling: User evaluation of post-editing productivity. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT 2020), pages 115-124. Online Conference 3-5 November, 2020. (PDF)

Maarit Koponen, Umut Sulubacak, Kaisa Vitikainen, Jörg Tiedemann. MT for Subtitling: Investigating professional translators’ user experience and feedback. In Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (6-9 October 2020), 1st Workshop on Post-Editing in Modern-Day Translation, pages 79-92. Online, 6 October 2020. (PDF)

WP5 Human processing in multimodal content description and translation

Starr, K., Braun, S. and Delfani, J. ‘Taking a Cue from the Human: Linguistic and Visual Prompts for the Automatic Sequencing of Multimodal Narrative’. Journal of Audiovisual Translation, 2020. (forthcoming)

Braun, S. and Starr, K. (2020, forthcoming) Innovation in Audio Description Research. London: Routledge. For further information: https://www.routledge.com/Innovation-in-Audio-Description-Research/Braun-Starr/p/book/9781138356672.

Braun, S. and Starr, K. (2020, forthcoming) ‘Comparing human and automated approaches to visual storytelling’, in Braun, S. and Starr, K. (eds) Innovation in Audio Description Research. London: Routledge.

Sabine Braun and Kim Starr. Byte-Sized Storytelling: Training the Machine to See the Bigger Picture (forthcoming). 13th Languages and the Media, Berlin, 14-16 December.

 

2019

WP2 Automatic multimodal content analysis

Rao Muhammad Anwer, Fahad Shahbaz Khan, Jorma Laaksonen and Nazar Zaki. Multi-stream convolutional networks for indoor scene recognition. In Proceedings of the 18th International Conference on Computer Analysis of Images and Patterns (CAIP2019), pages 196–208, Salerno, Italy, September 2019. (PDF)

Tzu-Jui Julius Wang, Hamed Rezazadegan Tavakoli, Mats Sjöberg and Jorma Laaksonen. Geometry-aware relational exemplar attention for dense captioning. In Proceedings of the 1st International Workshop on Multimodal Understanding and Learning for Embodied Applications (MULEA ’19) in ACM Multimedia Conference, pages 3–11, Nice, France, October 2019. (PDF)

Tiancai Wang, Rao Muhammad Anwer, Muhammad Haris Khan, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao and Jorma Laaksonen. Deep contextual attention for human-object interaction detection. In Proceedings of the International Conference on Computer Vision (ICCV2019), pages 5694–5702, Seoul, Korea, October 2019. (PDF)

Héctor Laria Mantecón, Jorma Laaksonen, Danny Francis and Benoit Huet. PicSOM and EURECOM Experiments in TRECVID 2019. In Proceedings of the TRECVID 2019 Workshop. NIST, Gaithersburg, MD, USA, November 2019. (PDF)

Jean Carrive. Using Artificial Intelligence to Preserve Audiovisual Archives: New Horizons, More Questions. In Proceedings of the 27th ACM International Conference on Multimedia, pages 1–2, Nice, France, October 2019. Invited keynote. (PDF)

David Doukhan, Géraldine Poels, Zohra Rezgui and Jean Carrive. Describing Gender Equality in French Audiovisual Streams with a Deep Learning Approach. VIEW Journal of European Television History and Culture, 7(14), p. 103, 2019. (PDF)

Stefanos Vrochidis, Benoit Huet, Edward Y. Chang and Ioannis Kompatsiaris. Big data analytics for large-scale multimedia search. 2019. Wiley, ISBN: 978-1119376972

Danny Francis, Phuong Anh Nguyen, Benoit Huet and Chong-Wah Ngo. Fusion of multimodal embeddings for ad-hoc video search. In 1st International Workshop on Video Retrieval Methods and Their Limits (ViRaL 2019), co-located with ICCV 2019, Seoul, Korea, October 2019.

Danny Francis, Benoit Huet. L-STAP: Learned Spatio-Temporal Adaptive Pooling for video captioning. In 1st International Workshop on AI for smart TV content production, access and delivery (AI4TV 2019), co-located with the 27th ACM International Conference on Multimedia, October 2019, Nice, France (EURECOM)

Danny Francis, Phuong Anh Nguyen, Benoit Huet and Chong-Wah Ngo. EURECOM at TRECVid AVS 2019. In Proceedings of the TRECVID 2019 Workshop. NIST, Gaithersburg, MD, USA, November 2019. (EURECOM)

Phuong Anh Nguyen, Jiaxin Wu, Chong-Wah Ngo, Danny Francis and Benoit Huet. VIREO-EURECOM @ TRECVID 2019: Ad-hoc Video Search (AVS). In Proceedings of the TRECVID 2019 Workshop. NIST, Gaithersburg, MD, USA, November 2019. (EURECOM)

Tuomas Kaseva, Aku Rouhe, and Mikko Kurimo. Spherediar – an efficient speaker diarization system for meeting data. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019. (PDF)

Arturs Polis. Paragraph-length image captioning using hierarchical recurrent neural networks. Master’s Thesis, University of Helsinki, 2019. (PDF)

Héctor Laria Mantecón. Deep Reinforcement Sequence Learning for Visual Captioning. Master’s Thesis, Aalto University, 2019. (PDF)

Aditya Surikuchi. Visual Storytelling: Captioning of Image Sequences. Master’s Thesis, Aalto University, 2019. (PDF)

Tuomas Kaseva. Spherediar – an efficient speaker diarization system for meeting data. Master’s Thesis, Aalto University, 2019. (PDF)

Zohra Rezgui. Détection et classification de visages pour la description de l’égalité femme-homme dans les archives télévisuelles, INA, 2019. (PDF)

Danny Francis. Semantic representations of images and videos. PhD Thesis, Sorbonne University/EURECOM. December 2019.

 

WP3 Media enrichment and hyperlinking

Mohammad Reza Kavoosifar, Daniele Apiletti, Elena Baralis, Paolo Garza and Benoit Huet. Effective video hyperlinking by means of enriched feature sets and monomodal query combinations. International Journal of Multimedia Information Retrieval, 2019

Lorenzo Canale, Pasquale Lisena and Raphaël Troncy. Une nouvelle méthode ensembliste pour la reconnaissance et la désambiguïsation d’entités nommées en utilisant des réseaux de neurones. In Journées francophones d’Ingénierie des Connaissances (IC 2019), July 2019, Toulouse, France

Pasquale Lisena, Albert Meroño-Peñuela, Tobias Kuhn and Raphaël Troncy. Easy web API development with SPARQL transformer. In 18th International Semantic Web Conference (ISWC 2019), In-use Track, October 2019, Auckland, New Zealand

 

WP4 Multimodal machine translation

Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen and Jörg Tiedemann. The University of Helsinki submissions to the WMT19 news translation task. In Proceedings of the Fourth Conference on Machine Translation (WMT): Shared Task Papers. 2019. (PDF)

Raúl Vázquez, Umut Sulubacak and Jörg Tiedemann. The University of Helsinki submission to the WMT19 parallel corpus filtering task. In Proceedings of the Fourth Conference on Machine Translation (WMT): Shared Task Papers. 2019. (PDF)

 

WP5 Human processing in multimodal content description and translation

Braun, S., & Starr, K. (2019). Finding the Right Words: Investigating Machine-Generated Video Description Quality using a Human-Derived Corpus-based Approach. Journal of Audiovisual Translation, 2(2).

Sabine Braun and Kim Starr. Comparing human and automated approaches to video description. Media for All 8, Stockholm, 19 June 2019.

 

2018

WP2 Automatic multimodal content analysis

Danny Francis, Benoit Huet and Bernard Merialdo. EURECOM participation in TrecVid VTT 2018. In 22nd International Workshop on Video Retrieval Evaluation (TRECVID 2018), Gaithersburg, USA, November 13-15, 2018. (PDF)

Mats Sjöberg, Hamed R. Tavakoli, Zhicun Xu, Héctor Laria Mantecón and Jorma Laaksonen. PicSOM experiments in TRECVID 2018. In 22nd International Workshop on Video Retrieval Evaluation (TRECVID 2018), Gaithersburg, USA, November 13-15, 2018. (PDF)

David Doukhan, Eliott Lechapt, Marc Evrard and Jean Carrive. INA’s MIREX 2018 music and speech detection system. In 14th Music Information Retrieval Evaluation eXchange (MIREX), September 2018, Paris, France.

Zhicun Xu, Peter Smit, and Mikko Kurimo. The Aalto system based on fine-tuned audioset features for DCASE2018 task2 – general purpose audio tagging. In Detection and Classification of Acoustic Scenes and Events Workshop (DCASE 2018), Surrey, UK, November 2018. (PDF)

Zhicun Xu. Audio Event Classification Using Deep Learning Methods. Master’s Thesis, Aalto University, 2018. (Permanent link)

Olfa Ben-Ahmed and Benoit Huet. Deep Multimodal Features for Movie Genre and Interestingness Prediction. In International Conference on Content-Based Multimedia Indexing (CBMI 2018), La Rochelle, France, 4-6 September 2018. (PDF)

 

WP3 Media enrichment and hyperlinking

Julien Plu, Giuseppe Rizzo and Raphaël Troncy. ADEL: ADaptable Entity Linking. A Hybrid Approach to Link Entities with Linked Data for Information Extraction. In Semantic Web Journal, IOS Press, 2019 (to appear). (PDF)

Lorenzo Canale, Pasquale Lisena and Raphaël Troncy. A Novel Ensemble Method for Named Entity Recognition and Disambiguation based on Neural Network. In International Semantic Web Conference (ISWC 2018), Monterey, CA, USA, 8-12 October 2018. (PDF)

 

WP4 Multimodal machine translation

Umut Sulubacak, Aku Rouhe, Jörg Tiedemann, Stig-Arne Grönroos and Mikko Kurimo. The MeMAD Submission to the IWSLT 2018 Speech Translation Task. In 15th International Workshop on Spoken Language Translation (IWSLT 2018); Bruges, Belgium. pp. 89–94. (PDF)

Stig-Arne Grönroos, Benoit Huet, Mikko Kurimo, Jorma Laaksonen, Bernard Merialdo, Phu Pham, Mats Sjöberg, Umut Sulubacak, Jörg Tiedemann, Raphaël Troncy and Raúl Vázquez. 2018. The MeMAD Submission to the WMT18 Multimodal Translation Task. In 3rd Conference on Machine Translation (WMT 2018); Brussels, Belgium. Association for Computational Linguistics, pp. 609–617. arXiv:1808.10802. (PDF)

Stig-Arne Grönroos, Sami Virpioja and Mikko Kurimo. Cognate-aware morphological segmentation for multilingual neural translation. In 3rd Conference on Machine Translation (WMT 2018); Brussels, Belgium. Association for Computational Linguistics, pp. 390–397. (PDF)

Franck Burlot, Yves Scherrer, Vinit Ravishankar, Ondřej Bojar, Stig-Arne Grönroos, Maarit Koponen, Tommi Nieminen and François Yvon. The WMT18 Morpheval test suites for English–Czech, English–German, English–Finnish and Turkish–English. In 3rd Conference on Machine Translation (WMT18); Brussels, Belgium. Association for Computational Linguistics, pp. 550–564. (PDF)

 

WP5 Human processing in multimodal content description and translation

Conference abstracts

Sabine Braun and Kim Starr (2018) ‘From Slicing Bananas to Pluto the Dog: Human and Automatic Approaches to Visual Storytelling’, Languages & the Media, Berlin, 10/2018. (PDF)

Sabine Braun and Kim Starr (2019) ‘Mind the Gap: An Investigation of Omissions in Audio Description’, ARSAD, Barcelona, 03/2019, accepted.

Sabine Braun and Kim Starr (2019) ‘Comparing Human and Automated Approaches to Video Description’, Media for All 8, Stockholm, 06/2019, accepted.