IN5060 - Autumn 2019

Home exam 2: User studies

In this home exam, you explore how performance can be assessed using a quantitative user study.

This method allows you to assess the performance of a complete system with respect to the perception of a human user.

The same method can be applied to the automatic evaluation of complex systems using objective methods; it is not limited to studying the perception of people. User studies have been chosen to demonstrate how flexible this method is.

For this home exam, you will analyze the results of a data collection that you performed together. The collection yielded numerical data, and the method of choice for evaluating its statistical relevance is the ANOVA test.

Background

Quantitative data

Many measurement studies in computer science yield their results in the form of numbers, which can be sorted, ordered and measured in spaces where zero, addition and multiplication have the traditional meaning. Sometimes different axiomatic systems are equally capable.

When the evaluation is performed by people, you must carefully evaluate whether you are in this situation. People have quite different perceptions and preferences, and two people may rank the same two observations in opposite orders. A categorical study (Friedman's test) allows you to test whether a statistically relevant number of people make the same ranking decision, or whether a rank between two observations cannot be established. A quantitative study (ANOVA) allows you to assign values to options and assess whether they show statistically relevant differences from other options.
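As a rough illustration of the categorical approach, the following sketch computes Friedman's chi-square statistic by hand. It assumes one row per participant and one column per option, uses made-up ratings, and applies no tie correction; the function names are hypothetical. A statistics library (for example `scipy.stats.friedmanchisquare`, if SciPy is available) is the more practical choice for real data.

```python
def average_ranks(values):
    """Rank values from 1 (lowest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square for rows of ratings (no tie correction)."""
    n = len(rows)      # number of participants
    k = len(rows[0])   # number of options being compared
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Made-up example: three participants who all rank the options the same way.
example_rows = [
    [1, 3, 5],
    [2, 3, 4],
    [2, 4, 5],
]
q_stat = friedman_statistic(example_rows)
print(f"Friedman statistic: {q_stat:.3f}")
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom; with few participants, exact tables are usually preferred over that approximation.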

With Friedman's test, you can establish a partial order between observations. With ANOVA, you can establish average values with a standard deviation and determine the statistical relevance of the distance between options.
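To make the ANOVA side concrete, here is a minimal sketch of the one-way ANOVA F statistic in plain Python. It treats the ratings of each option as an independent group, which is a simplifying assumption: when the same participants rate every option, a repeated-measures variant may be more appropriate. The function name and the ratings are made up for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of rating lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        raise ValueError("no variation within groups; F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up ratings on a 5-point scale, one list per option.
example_groups = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F means the differences between the group means are large relative to the spread inside the groups. The p-value follows from the F distribution with the two degrees of freedom; if SciPy is available, `scipy.stats.f_oneway` computes both in one call.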

System-under-test

You have together conducted a user study in which participants compared the quality of some videos. The results of the user study can be found here: all.csv

These videos emulate the uneven visual quality that is typical for delivering a visual region of a 360-degree panorama video to a mobile phone or VR headset using adaptive video streaming over HTTP. You have collected several ratings on a 5-point Likert scale for each of the example videos, and are now prepared to determine whether the individual videos can be ranked according to a global order, or whether the videos must first be grouped so that the groups form a total order.

Task

First, arrange the video ratings from each participant in the study in a table according to the procedure for ANOVA. Test whether you can achieve a high confidence in distinguishing all example videos from each other. If yes, document and illustrate your results.
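One possible way to arrange the ratings is sketched below. The column names `participant`, `video` and `rating` and the sample rows are assumptions made for illustration; the actual layout of all.csv may differ, so the parsing has to be adapted to the real file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical long-format data; the real all.csv may be laid out differently.
SAMPLE_CSV = """participant,video,rating
p1,A,1
p1,B,3
p1,C,5
p2,A,2
p2,B,3
p2,C,4
p3,A,2
p3,B,4
p3,C,5
"""


def ratings_by_video(csv_file):
    """Return {video: {participant: rating}} from a long-format CSV file."""
    table = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        table[row["video"]][row["participant"]] = int(row["rating"])
    return table


table = ratings_by_video(io.StringIO(SAMPLE_CSV))
videos = sorted(table)
participants = sorted({p for column in table.values() for p in column})

# One row per participant, one column per video.
print("participant", *videos)
for p in participants:
    print(p, *(table[v].get(p, "-") for v in videos))

# Per-video mean and sample standard deviation, the inputs to the comparison.
summary = {v: (mean(list(table[v].values())), stdev(list(table[v].values())))
           for v in videos}
for v in videos:
    print(f"{v}: mean={summary[v][0]:.2f} sd={summary[v][1]:.2f}")
```

To read the real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file object, for example `open("all.csv", newline="")`.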

If no, define groups of videos. It is typical at this point to group videos by properties of the test input. In the given test data, you could group videos by the percentage of low vs high quality pixels in the video, or group them by the intensity of the introduced blur, or by content. However, you could also group the videos by similarity of values that you have discovered in the first step.

The grouping method is your choice. You should choose a single grouping method that allows you to establish a ranking, and then explain your grouping method as well as the results.
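As one illustration of grouping by similarity of values, the sketch below sorts videos by their mean rating and starts a new group whenever the gap to the previous video exceeds a threshold. The video names, the mean ratings and the threshold of 0.5 are made up, and the function name is hypothetical; grouping by a property of the test input would work differently.

```python
def group_by_mean_gap(mean_ratings, max_gap):
    """Group videos whose sorted mean ratings lie within max_gap of their neighbor."""
    ordered = sorted(mean_ratings.items(), key=lambda item: item[1])
    groups = []
    for video, value in ordered:
        # Compare with the last video added to the most recent group.
        if groups and value - groups[-1][-1][1] <= max_gap:
            groups[-1].append((video, value))
        else:
            groups.append([(video, value)])
    return [[video for video, _ in group] for group in groups]


# Made-up mean ratings per video.
example_means = {"v1": 1.8, "v2": 2.0, "v3": 3.4, "v4": 3.5, "v5": 4.6}
video_groups = group_by_mean_gap(example_means, max_gap=0.5)
print(video_groups)
```

Because each video is only compared with its immediate neighbor, a long chain of small gaps can end up in one group even if its ends are far apart. The resulting groups still need to be tested, for instance by pooling the ratings in each group and running the ANOVA on the groups.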

Published Oct. 15, 2019 09:05 - Last modified Oct. 15, 2019 13:52