Recently, other members of the LEvInSoN group and I hosted a Summer School at the Max Planck Institute for Psycholinguistics as part of our Minds, Mechanisms, and Interaction in the Evolution of Language Workshop. The goal of this summer school was ambitious – a synthetic view of the process of designing, creating, running, and analysing the results of an experiment – all in two days. We wanted to show the students the process of scientific collaboration, warts and all. The contents of the summer school, including the data collected for our experiment, can be found in this GitHub repository.

We were, overall, very happy with the results of the summer school – both the learning outcomes for the students and the actual experimental results. Below I outline the basics of the study conducted, the topics covered, and the results of our experiment.

The Organisers

The September Tutorial in Empiricism was a massive undertaking with many moving parts and many contributors. I’d especially like to thank my fellow organisers, without whom nothing would have been possible – while I stressed about the details of the experiment and wrangling the instructors, they did a great job of ticking all of the bureaucratic boxes and keeping things running smoothly.

Alan Nielsen
Ashley Micklos
Hannah Little

The Instructors

The Students


Twenty-six students participated in the Summer School, from a wide range of research backgrounds and levels of experience. Of our 26 students the majority (19) were PhD students, but we also had 2 postdoctoral researchers, 3 Master’s students, and even 2 undergraduates.

The following students would like to be recognized for their attendance:

Federica Bartolozzi – PhD Candidate – Max Planck Institute for Psycholinguistics
Miguel Borges – PhD Candidate – Max Planck Institute for Psycholinguistics
Giusy Cirillo – MA Student – University of Tübingen
Lara Clauss* – PhD Candidate – Max Planck Institute for Psycholinguistics
Varun deCastro-Arrazola – PhD Candidate – Leiden University and Meertens Instituut
Ian Joo – MA Student – National Chiao Tung University
Greta Kaufeld – PhD Candidate – Max Planck Institute for Psycholinguistics
Fiona Kirton – PhD Student – Center for Language Evolution, University of Edinburgh
Ezequiel Koile – Visiting Postdoctoral Researcher – Max Planck Institute for the Science of Human History
Elly Koutamanis – PhD Student – Center for Language Studies, Radboud University
Hannah Lutzenberger – PhD Student – Center for Language Studies, Radboud University
Katie Mudd – PhD Student – Vrije Universiteit Brussel
Limor Raviv – PhD Student – Max Planck Institute for Psycholinguistics
Constanze Schon* – Visiting Intern – Max Planck Institute for Psycholinguistics
Kazuki Sekine – Postdoctoral Researcher – Radboud University/Max Planck Institute for Psycholinguistics
Chen Shen* – PhD Student – Center for Language Studies, Radboud University
Anita Slonimska – PhD Candidate – Center for Language Studies, Radboud University; ITSC Fellow
Katja Stark – PhD Student – Max Planck Institute for Psycholinguistics
Katarina Stekic – BSc Student – Laboratory for Neurocognition and Applied Cognition, University of Belgrade
Jeroen van Paridon – PhD Candidate – Max Planck Institute for Psycholinguistics
Marieke Woensdregt – PhD Student – Center for Language Evolution, University of Edinburgh
Nezihe Zeybek* – PhD Student – University of Burgundy
Eirini Zormpa – PhD Student – Max Planck Institute for Psycholinguistics

*Not Pictured

As mentioned, our students had a broad range of experience with the various parts of designing, hosting, and analysing large-scale online experiments – in advance of the summer school we solicited self-reported competence ratings for a number of topics, which can be seen below:

This presented us with a unique challenge. Generally, students were fairly familiar with experimental design and the most basic statistics we would be using for data analysis (LMER in R), but very few were familiar with more advanced statistical techniques like Factor Analysis or K-Means clustering, almost none had any experience with JavaScript and jsPsych, and very few had experience with Bayesian statistics or modelling.

Thus, we had students work through the summer school in groups, which we attempted to make as balanced as possible – you can see the average competence score for each group below. This approach proved especially valuable because it allowed the more experienced students to pass on their knowledge to younger or otherwise less-experienced students in a hands-on fashion.

This also served as a great first example of the ability of R (and ggplot specifically) to easily output informative graphs from pretty minimal code.

The Project

Human beings are not unbiased perceivers of the world, taking in information from the environment and processing it in a vacuum. One way that humans are biased is in the types of associations that they make between sensory modalities – starting in at least the early part of the 20th century it was recognized that humans are biased to associate certain types of sounds with certain types of meanings.

In the examples above, it’s likely that when tasked with choosing appropriate labels for the given images you’ll have chosen that the jagged star-like image should be called takete (rather than maluma), that the large table fits better with the name mal than mil, and that the correct word to describe the pictured dog is fuwafuwa (a Japanese ideophone meaning fluffy) rather than korokoro (a Japanese ideophone meaning ‘a small object rolling repeatedly’).

These types of iconic biases between sounds and meanings are often referred to as sound-symbolism – a topic that has become a growth area in psycholinguistics research:

Associations between otherwise seemingly unrelated perceptual modalities are not, however, limited to those that can be explored linguistically – experimental participants have been found to hold dozens if not hundreds of these associations, suggesting, for example, that small objects are happy, bright, fast moving, and high pitched. As researchers have tested for more and more of these crossmodal biases, they have found increasing evidence that humans make associations both within and between sensory domains, which raises a number of questions.

From our perspective, the most important question is How are crossmodal biases related to each other?

Unfortunately, the current state of knowledge in the crossmodal literature makes this question vexingly difficult to answer – even the fundamental questions we need answered before we can approach it are challenging. There are a number of reasons for this. First, researchers do not typically share stimuli, so finding, for example, that participants in one study associate high-pitched sounds with fast objects while in another study they associate high-pitched sounds with small objects can be minimally informative – often the pitch, size, or speed differences will be entirely idiosyncratic to the individual study in question. Relatedly, the findings of various studies are often isolated – even ambitious projects typically look at associations between fewer than a handful of domains. These studies are important, generally well designed and implemented, and informative, but they make it difficult to enumerate the types of crossmodal associations that human participants make. If one could make a general summary of crossmodal research, it would be that where we look for crossmodal biases, we find them.

The focus of the present study then is to exhaustively test associations between a number of domains, cataloging their relative strengths and creating a network of associations.

The domains tested were:
Amplitude (Loudness) – Loud vs. Quiet
Pitch – High vs. Low
Noise – Noisy vs. Tonal
Size – Large vs. Small
Shape – Jagged vs. Curvy
Speed – Fast vs. Slow
Brightness – Bright vs. Dull
Color – Yellow vs. Blue, Red vs. Yellow, Red vs. Green, Red vs. Blue
Affect – Stressed vs. Calm, Pleased vs. Disgusted, Happy vs. Sad, Excited vs. Bored

Stimuli can be found in the GitHub repository here.

The Experiment

In the experiment conducted for the Summer School, participants were recruited via Amazon Mechanical Turk and tasked with making associations between perceptual domains – you can see the experimental interface here; it was created in jsPsych by Justin Sulik.

Participants were tasked with responding to trials like the following below:


Participants were each shown 96 of these trials, and we solicited responses from a total of 210 participants via Mechanical Turk (60 pilot, 150 main experiment). Collectively, these participants were tested on all possible comparisons between our 9 chosen perceptual domains – thus they were asked whether Small things were Loud or Quiet, Noisy or Tonal, High or Low pitched, Jagged or Curvy, Fast or Slow, Bright or Dull, etc.
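For concreteness, the space of pairwise comparisons implied by this kind of design can be sketched in a few lines of Python. This is an illustration, not the actual trial-generation code, and it collapses Color and Affect to a single contrast each (the real design used several contrasts for those domains, as listed above):

```python
from itertools import combinations

# The nine perceptual domains tested, each with its two poles
# (following the list in "The Project" section; Color and Affect
# are simplified here to one contrast each).
domains = {
    "Amplitude": ("Loud", "Quiet"),
    "Pitch": ("High", "Low"),
    "Noise": ("Noisy", "Tonal"),
    "Size": ("Large", "Small"),
    "Shape": ("Jagged", "Curvy"),
    "Speed": ("Fast", "Slow"),
    "Brightness": ("Bright", "Dull"),
    "Color": ("Yellow", "Blue"),
    "Affect": ("Happy", "Sad"),
}

# Every unordered pair of distinct domains yields one comparison,
# e.g. "Is a Large thing Loud or Quiet?"
comparisons = list(combinations(domains, 2))
print(len(comparisons))  # 9 choose 2 = 36 pairwise comparisons
```

Testing every one of these pairs is what makes the design exhaustive relative to most of the prior literature, which typically covers only a few of them.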


We found many interesting results, and we continue to analyse the data from the Summer School in preparation for publication, but our general prediction – that many or most associations are responded to in a biased fashion – was upheld. Below you can see a heatmap of effect sizes for comparisons from our main experiment:

The heatmap may seem difficult to read, but it’s not – the value shown in each cell is the effect size of the association (calculated from a nonparametric Wilcoxon signed-ranks test), so higher values reflect stronger associations (more consistency between participants). Gray cells show non-significant associations – all other associations are significant at p < 0.05. The difference in colors (blue vs. red) tells you the direction of the association made by participants – the bright blue square (effect size = 0.87) in the top left corner of the heatmap tells you that, generally, participants suggested that fast-moving images were Excited (rather than Bored). In the Noisy/Tonal row you can see an example of a strong association in the opposite direction (effect size = 0.77) – participants responded that Noisy sounds were Disgusted (and thus that Tonal sounds were Pleased).
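As a sketch of how such an effect size can be computed: the post doesn’t state which Wilcoxon-based effect-size formula the analysis used, but the matched-pairs rank-biserial correlation is one standard choice. Assuming that measure, it can be computed from each participant’s per-comparison response bias (here, hypothetical proportions of “left-pole” choices minus 0.5):

```python
def rank_biserial(diffs):
    """Matched-pairs rank-biserial correlation, an effect size for
    the Wilcoxon signed-ranks test: (sum of positive ranks - sum of
    negative ranks) / total rank sum. Ranges from -1 to +1; a heatmap
    like the one above shows the magnitude, with colour coding the
    sign (direction). Zero differences are dropped, as in the
    standard test."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    # Rank the absolute differences, averaging ranks across ties.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n and abs(nonzero[order[j]]) == abs(nonzero[order[i]]):
            j += 1
        avg_rank = (i + j + 1) / 2.0  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = avg_rank
        i = j
    w_pos = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return (w_pos - w_neg) / (n * (n + 1) / 2.0)

# Hypothetical data: each value is one participant's proportion of
# "Excited" (vs. "Bored") choices for fast-moving images, minus 0.5.
biases = [0.4, 0.3, 0.35, -0.05, 0.25, 0.45, 0.1, -0.1]
print(round(rank_biserial(biases), 2))  # prints 0.81
```

A mostly positive set of biases like this yields a value near +1, matching the intuition that a bright cell in the heatmap means participants overwhelmingly agreed on one direction of association.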

Any cell can be read in this fashion – take a Domain on a row, e.g. Bright/Dark, and a Domain on a column, e.g. Happy/Sad. The cell will be blue if participants associated the token on the left of the row domain (e.g. Bright) with the token on the left of the column domain (e.g. Happy; in this case most participants suggested that Bright colors are Happy). If a cell is red, on the other hand, the token on the left of the row domain was regularly paired with the token on the right of the column domain, and vice versa – e.g. participants generally agreed that Jagged images were Sad (and Curvy images Happy).

Hypothesis Challenge

As part of the Summer School, we encouraged students to make predictions about the results of our main experiment, having provided them with some insight into how participants responded during the experimental pilot.

The rules of our hypothesis challenge were fairly simple – each student would make a set of predictions about how participants would respond, and we would compare those predictions to the actual results of the experiment. You can see the deviation of each set of predictions from the actual observed results (brighter red = larger difference) below:
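The deviation score itself is easy to sketch: for each student, average the absolute difference between their predicted effect sizes and the observed ones, over the comparisons they actually predicted. (In the real challenge, an imputation procedure filled in unmade predictions; that step is omitted here, and all numbers below are hypothetical.)

```python
def mean_deviation(predictions, observed):
    """Mean absolute difference between predicted and observed
    effect sizes, scored over the comparisons actually predicted."""
    shared = [pair for pair in predictions if pair in observed]
    return sum(abs(predictions[p] - observed[p]) for p in shared) / len(shared)

# Hypothetical illustration (not the real challenge data):
observed = {("Speed", "Affect"): 0.87, ("Noise", "Affect"): 0.77}
guesses = {("Speed", "Affect"): 0.60, ("Noise", "Affect"): 0.70}
print(round(mean_deviation(guesses, observed), 3))  # prints 0.17
```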

Our top 3 sets of predictions were made by Jeroen van Paridon (average deviation = 0.134), Ezequiel Koile (average deviation = 0.178), and Ian Joo (average deviation = 0.234). From these three students a winner was chosen based on who had made the fewest overall predictions (relying on the imputation procedure covered by Bill Thompson during the summer school to fill in unmade predictions). This gave us a clear winner – in addition to producing the best overall predictions, Jeroen van Paridon also made fewer individual predictions than either of his competitors.

This provided an additional teaching opportunity for the other students about the power of computation. Whereas Ian Joo studies sound symbolism and made his predictions based on his own knowledge of the literature, Jeroen is a computationalist and took a brute-force mathematical/computational approach to the problem – generating a sampling procedure to best explain the pilot data with a minimal number of predictions. So congratulations to Jeroen for his interesting approach to the problem (which actually mirrors some of the more advanced ways we are continuing to look at the data). For his troubles, he won a copy of Statistical Rethinking.

Closing Thoughts and Future Directions

We all, as young academics, found the process of organising and running this summer school immensely rewarding – as a collaborative effort it highlighted good scientific practice not only to the students, but to all of us as well, and showed how productive it can be to bring together researchers with different types of expertise all working on a single project. There were a few hiccups along the way – but those were expected and desirable, given our framing of the Summer School as being a synthetic and honest view of science as it is done, rather than science as it is written.

I’d really like to thank the students who participated, especially those who came to the endeavour with a positive attitude. With the Summer School taking place over only two days, participants were presented with an impossible learning task – so rather than focusing on having participants leave with the ability to put together a similar experiment immediately, we aimed to provide them with some knowledge of what types of practices are possible for modern empiricists, along with plenty of additional reading and supplementary materials that they can work through later at their own pace, turning the tools they were provided with towards their own projects. As a nice bit of validation, some of our students provided us with lovely testimonials about their experience.

To that end, the GitHub repository for the summer school will remain public, and in the coming months we will assemble a public-facing website with all of the worked materials available in handbook format for anyone who is interested. We hope to be able to format the summer school materials such that any of us who participated in the teaching of this inaugural version would be able to present our own version of the school at other locations and later dates – so keep your eyes open for future possibilities.

On a personal level, I was immensely lucky that we chose to pursue one of my projects for the purposes of the Summer School, and I’m very happy with the results, so look for a more complete description in the near future. The upside of this project is not only in cataloging a set of crossmodal biases in our typical WEIRD population of experimental participants, but also in the possibility of extending further refinements of this procedure to other cultures and languages, allowing us to compare the degree to which crossmodal biases are universal vs. language specific. I recognize that the stimuli and the design of this preliminary experiment are imperfect (in fact, a major part of our analysis looks at the influence of task demands on these types of experiments), so I am looking forward to feedback from other researchers.