2020

WP2 Automatic multimodal content analysis

Aku Rouhe, Tuomas Kaseva, and Mikko Kurimo. Speaker-aware training of attention-based end-to-end speech recognition using neural speaker embeddings. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.

 

WP4 Multimodal machine translation

Stig-Arne Grönroos, Sami Virpioja and Mikko Kurimo. Morfessor EM+Prune: Improved subword segmentation with expectation maximization and pruning. In Proceedings of the 12th Language Resources and Evaluation Conference, ELRA, Marseilles, France, 2020, to appear.

 

2019

WP2 Automatic multimodal content analysis

Rao Muhammad Anwer, Fahad Shahbaz Khan, Jorma Laaksonen and Nazar Zaki. Multi-stream convolutional networks for indoor scene recognition. In Proceedings of the 18th International Conference on Computer Analysis of Images and Patterns (CAIP2019), pages 196–208, Salerno, Italy, September 2019. (PDF tbd)

 

Tzu-Jui Julius Wang, Hamed Rezazadegan Tavakoli, Mats Sjöberg and Jorma Laaksonen. Geometry-aware relational exemplar attention for dense captioning. In Proceedings of the 1st International Workshop on Multimodal Understanding and Learning for Embodied Applications (MULEA ’19) in ACM Multimedia Conference, pages 3–11, Nice, France, October 2019. (PDF tbd)

 

Tiancai Wang, Rao Muhammad Anwer, Muhammad Haris Khan, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao and Jorma Laaksonen. Deep contextual attention for human-object interaction detection. In Proceedings of the International Conference on Computer Vision (ICCV2019), pages 5694–5702, Seoul, Korea, October 2019. (PDF tbd)

 

Héctor Laria Mantecón, Jorma Laaksonen, Danny Francis and Benoit Huet. PicSOM and EURECOM Experiments in TRECVID 2019. In Proceedings of the TRECVID 2019 Workshop. NIST, Gaithersburg, MD, USA, November 2019. (PDF tbd)

 

Tuomas Kaseva, Aku Rouhe, and Mikko Kurimo. Spherediar – an efficient speaker diarization system for meeting data. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019.

 

Tuomas Kaseva. Spherediar – an efficient speaker diarization system for meeting data. Master’s thesis, Aalto University, 2019. (https://aaltodoc.aalto.fi/handle/123456789/39063)

 

WP4 Multimodal machine translation

Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen and Jörg Tiedemann. The University of Helsinki submissions to the WMT19 news translation task. In Proceedings of the Fourth Conference on Machine Translation (WMT): Shared Task Papers. 2019. (PDF)


Raúl Vázquez, Umut Sulubacak and Jörg Tiedemann. The University of Helsinki submission to the WMT19 parallel corpus filtering task. In Proceedings of the Fourth Conference on Machine Translation (WMT): Shared Task Papers. 2019. (PDF)

 

WP5 Combining automatic efficiency with human accuracy

Braun, S., & Starr, K. (2019). Finding the Right Words: Investigating Machine-Generated Video Description Quality using a Human-Derived Corpus-based Approach. Journal of Audiovisual Translation, 2 (2).

 

2018


WP2 Automatic multimodal content analysis

Danny Francis, Benoit Huet and Bernard Merialdo. EURECOM participation in TrecVid VTT 2018. In 22nd International Workshop on Video Retrieval Evaluation (TRECVID 2018), Gaithersburg, USA, November 13-15, 2018. (PDF)


Mats Sjöberg, Hamed R. Tavakoli, Zhicun Xu, Hector Laria Mantecon and Jorma Laaksonen.
PicSOM experiments in TRECVID 2018. In 22nd International Workshop on Video Retrieval Evaluation (TRECVID 2018), Gaithersburg, USA, November 13-15, 2018. (PDF)


David Doukhan, Eliott Lechapt, Marc Evrard and Jean Carrive.
INA’s MIREX 2018 music and speech detection system. In 14th Music Information Retrieval Evaluation eXchange (MIREX), September 2018, Paris, France.


Zhicun Xu, Peter Smit, and Mikko Kurimo.
The Aalto system based on fine-tuned audioset features for DCASE2018 task2 – general purpose audio tagging. In Detection and Classification of Acoustic Scenes and Events Workshop (DCASE 2018), Surrey, UK, November 2018. (PDF)


Zhicun Xu.
Audio Event Classification Using Deep Learning Methods. Master’s thesis. Aalto University, 2018. (Permanent link)


Olfa Ben-Ahmed and Benoit Huet.
Deep Multimodal Features for Movie Genre and Interestingness Prediction. In International Conference on Content-Based Multimedia Indexing (CBMI 2018), La Rochelle, France, 4-6 September 2018. (PDF)

 

WP3 Media enrichment and hyperlinking

Julien Plu, Giuseppe Rizzo and Raphaël Troncy. ADEL: ADaptable Entity Linking. A Hybrid Approach to Link Entities with Linked Data for Information Extraction. In Semantic Web Journal, IOS Press, (to appear) 2019. (PDF)


Lorenzo Canale, Pasquale Lisena and Raphaël Troncy.
A Novel Ensemble Method for Named Entity Recognition and Disambiguation based on Neural Network. In International Semantic Web Conference (ISWC 2018), Monterey, CA, USA, 8-12 October 2018. (PDF)

 

WP4 Multimodal machine translation

Umut Sulubacak, Aku Rouhe, Jörg Tiedemann, Stig-Arne Grönroos and Mikko Kurimo. The MeMAD Submission to the IWSLT 2018 Speech Translation Task. In 15th International Workshop on Spoken Language Translation (IWSLT 2018); Bruges, Belgium. pp. 89–94. (PDF)


Stig-Arne Grönroos, Benoit Huet, Mikko Kurimo, Jorma Laaksonen, Bernard Merialdo, Phu Pham, Mats Sjöberg, Umut Sulubacak, Jörg Tiedemann, Raphaël Troncy and Raúl Vázquez.
The MeMAD Submission to the WMT18 Multimodal Translation Task. In 3rd Conference on Machine Translation (WMT 2018); Brussels, Belgium. Association for Computational Linguistics, pp. 609–617. arXiv:1808.10802 [cs]. (PDF)


Stig-Arne Grönroos, Sami Virpioja and Mikko Kurimo.
Cognate-aware morphological segmentation for multilingual neural translation. In 3rd Conference on Machine Translation (WMT 2018); Brussels, Belgium. Association for Computational Linguistics, pp. 390–397. (PDF)


Franck Burlot, Yves Scherrer, Vinit Ravishankar, Ondřej Bojar, Stig-Arne Grönroos, Maarit Koponen, Tommi Nieminen and François Yvon.
The WMT18 Morpheval test suites for English–Czech, English–German, English–Finnish and Turkish–English. In 3rd Conference on Machine Translation (WMT18); Brussels, Belgium. Association for Computational Linguistics, pp. 550–564. (PDF)

 

WP5 Combining automatic efficiency with human accuracy

Conference abstracts

Sabine Braun and Kim Starr (2018) ‘From Slicing Bananas to Pluto the Dog: Human and Automatic Approaches to Visual Storytelling’, Languages & the Media, Berlin, 10/2018. (PDF)

Sabine Braun and Kim Starr (2019) ‘Mind the Gap: An Investigation of Omissions in Audio Description’, ARSAD, Barcelona, 03/2019, accepted.

Sabine Braun and Kim Starr. (2019) ‘Comparing Human and Automated Approaches to Video Description’, Media for All4, Stockholm, 06/2019, accepted.

 

Publications in progress

Braun, S. & Starr, K. (eds.) (2019) Innovations in Audio Description Research. London/New York: Routledge, in preparation.

Braun, S., Starr, K., Hirvonen, M., Tiittula, L. & Laaksonen, J. (2019) Comparing human and automated approaches to visual storytelling. In S. Braun & K. Starr (eds.), in preparation.

Starr, K. & Braun, S. (2019) Re-purposing audiovisual accessibility to assist emotion recognition in autistic children. In S. Braun & K. Starr (eds.), in preparation.

 

Blog post

Starr, K. (2018) From Slicing Bananas to Pluto the Dog: Computer Vision with a Human Touch, Blogpost, MeMAD website. October 19th.