MeMAD project
The MeMAD project aims to design cutting-edge methods for indexing audiovisual media. This work addresses the need for automatic procedures to manage large audiovisual collections and to make media accessible to visually and hearing-impaired audiences.
INA contributed to the MeMAD platform a software module dedicated to audio analysis: music detection and male/female voice classification, which are preliminary tasks required by speech analysis algorithms. These low-level metadata extraction methods have the advantages of being robust and fast to compute. The opportunities offered by this information were investigated through a large-scale analysis of gender distribution in French TV and radio programs, presented below.
Describing gender representation differences in media
Gender representation in media has been described in a substantial number of quantitative studies, such as the Global Media Monitoring Project or the annual report of the French Higher Council of Audiovisual (Conseil supérieur de l’audiovisuel, CSA). Presence rate, defined as the percentage of men and women found in audiovisual programs, is one of the most commonly used metrics for describing gender imbalance in media. This descriptor is often presented together with the characters’ roles (presenter, expert, political guest) and the topics covered (health, economy).
Expression rate, defined as the percentage of speaking time attributed to women and to men, can be used in conjunction with other metrics such as presence rate. It has been used in comparatively few studies. Reiser & Gresy used this descriptor to analyze a corpus of programs broadcast on May 15, 2008 on 6 TV channels and 6 radio stations. The amount of recordings collected per channel or station ranges from 6 minutes to 3 hours.
Expression rates were also presented in an experimental study conducted by the Belgian CSA (Higher Council of Audiovisual). The study was based on the analysis of 36 hours of programs broadcast during one week, and speech time was reported for several age categories of men and women.
Manual estimation of speaking time in TV and radio programs is expensive and time-consuming. Studies describing expression rate and women speaking time percentage are therefore limited to the analysis of relatively small amounts of data.
Automatic Estimation of Women Speaking Time Percentage
Building on recent advances in artificial intelligence and machine learning, INA designed inaSpeechSegmenter: an open-source software component able to detect speech zones in sound signals and predict speaker gender. This software takes advantage of modern GPU architectures and can analyze a one-hour document in about one minute. It is based on convolutional neural networks trained using INA’s speaker dictionary, and it obtained the best performance in the speech detection task of the MIREX 2018 challenge. It was used to perform a large-scale, data-driven study based on the analysis of 700,000 hours of audiovisual documents broadcast on French TV and radio from 2001 to 2018.
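To make the metric concrete, here is a minimal Python sketch of how a women speaking time percentage could be derived from a gender-labelled segmentation. It is an illustration under stated assumptions, not the code used in the study: the function name is hypothetical, and the input is assumed to be a list of (label, start, stop) tuples in seconds, with "female" and "male" as speech labels and every other label (music, noise, silence) ignored.

```python
from typing import Iterable, Tuple

# Assumed segment shape: (label, start_seconds, stop_seconds).
Segment = Tuple[str, float, float]


def women_speaking_time_percentage(segments: Iterable[Segment]) -> float:
    """Return 100 * female speech time / (female + male speech time)."""
    female = 0.0
    male = 0.0
    for label, start, stop in segments:
        duration = stop - start
        if label == "female":
            female += duration
        elif label == "male":
            male += duration
        # Any other label (music, noise, silence) is not speech and is skipped.
    total = female + male
    if total == 0:
        raise ValueError("no speech segments found")
    return 100.0 * female / total


# Made-up segmentation for illustration only (not real data).
example = [
    ("music", 0.0, 10.0),
    ("male", 10.0, 40.0),
    ("female", 40.0, 55.0),
    ("noEnergy", 55.0, 57.0),
    ("male", 57.0, 72.0),
]
wstp = women_speaking_time_percentage(example)
print(f"WSTP = {wstp:.1f}%")
```

In this made-up example, women speak for 15 seconds out of 60 seconds of speech. Music and silence are excluded from the denominator, so the percentage reflects speaking time only.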
Global Analysis of Audiovisual Streams
We found that between 2010 and 2018, men spoke twice as much as women on French TV and radio.
Average results per TV channel are presented below. Women Speaking Time Percentage (WSTP) varies between 7% and 47%, meaning that all French channels allocate more speaking time to men than to women. WSTP is lowest for sports channels (Eurosport, l’Equipe) and highest for channels aimed at a female audience (Teva, Cherie 25). We also found that women’s speaking time is lower on channels specialized in cultural or educational programs (Arte, Histoire, France 5) than on generalist channels.
Yearly evolution of Women Speaking Time Percentage
WSTP was estimated on radio from 2001 to 2018. We found that this estimate rose from 25.3% in 2001 to 34.4% in 2018. In other words, the French radio landscape changed from a configuration where men’s speaking time was about three times as long as women’s to one where it is about twice as long. While these proportions are still highly unbalanced, this shows a substantial change in the French radio landscape.
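The "three times" and "twice" figures follow directly from the two percentages, since men's speaking time is the complement of WSTP. The short Python sketch below reproduces that arithmetic. The helper name is hypothetical and is introduced only for this illustration.

```python
def men_to_women_ratio(wstp_percent: float) -> float:
    """Ratio of men's to women's speaking time for a given WSTP (in percent)."""
    if not 0 < wstp_percent < 100:
        raise ValueError("WSTP must be strictly between 0 and 100")
    # Men's share of speaking time is the complement of women's share.
    return (100.0 - wstp_percent) / wstp_percent


# Radio WSTP values quoted in the text.
ratio_2001 = men_to_women_ratio(25.3)
ratio_2018 = men_to_women_ratio(34.4)
print(f"2001: men spoke {ratio_2001:.2f} times as long as women")
print(f"2018: men spoke {ratio_2018:.2f} times as long as women")
```

The ratios come out at roughly 2.95 for 2001 and 1.91 for 2018, which round to the "three times" and "twice" used in the text.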
To go further
In this post, we presented software integrated into the MeMAD platform that detects speaker gender using modern neural network architectures. This information contributes to a better description of audiovisual documents. We showed how end-users can use it to extract valuable knowledge from large media collections.
Detailed analyses are presented in a study published in VIEW Journal of European Television History and Culture. Among the key results of this study, we found that WSTP increased significantly over the years on public channels, while no significant change was observed on private channels. We also found that WSTP is lower during high-audience time slots on private TV and radio channels.
The detailed results of these analyses are freely accessible through data.gouv.fr, the open platform for French public data. We hope this data will support further research in digital humanities and contribute to a better understanding of gender equality issues in media.
References
Derinoz, S., Hanot, M., Levant, B. How gender representations matter with generation in television? II International Conference Gender and Communication, 2014.
Doukhan, D., Carrive, J., Vallet, F., Larcher, A., Meignier, S. An open-source speaker gender detection framework for monitoring gender equality. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5214-5218, 2018.
Doukhan, D., Poels, G., Rezgui, Z., Carrive, J. Describing Gender Equality in French Audiovisual Streams with a Deep Learning Approach. VIEW Journal of European Television History and Culture, 7(14), pp. 103-122, 2018.
Reiser, M., Gresy, B. L’image des femmes dans les médias. Secrétariat d’Etat à la solidarité, 2008.