SOS9025/SOS9025B – Sequence analysis

Course content

Sequence analysis in social research refers to the application of computer science algorithms to longitudinal social data such as life course histories. It stands as a complementary alternative to conventional statistical approaches such as event history analysis. It focuses on defining similarity measures between pairs of sequences, which can be used in a number of ways, typically to develop data-driven classifications of sequences, but also to explore issues about, for instance, convergence or divergence of patterns over time or across categories.

 

This course will begin with a consideration of the use of sequence analysis in the sociological literature, and proceed to a practically oriented examination of the method, with lectures alternating with laboratory exercises. The course will primarily use Stata with the SADI package, but will also consider the TraMineR package in R, and will use real life-history data.

 

It will address:

 - Descriptive approaches to sequence data, such as analysis of transition rates, cumulated durations or number of spells, the use of regular expressions to manipulate sequences
 - Graphical summaries such as transition rate time-series, Kaplan-Meier survival curves, chronograms, indexplots
 - How to define sequence similarity: from Hamming distance to Optimal Matching, and the strengths and weaknesses of the Optimal Matching algorithm
 - Cluster analysis as a strategy: creating empirical typologies
 - Multidimensional scaling as an alternative to clustering
 - Discrepancy analysis as an alternative to clustering
 - Sequence complexity: entropy, turbulence
 - How to parameterise OM: the issue of costs, how to think carefully about state spaces
 - What to do with your results: multinomial regression of cluster solutions, etc.
 - Alternative distance measures, including Dynamic Hamming distance, Elzinga's combinatorial approaches, Time-warp edit distance
 - Multiple domains: analysis of sequences in more than one state space
 - Missing data: Multiple imputation and lighter-weight approaches
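
As a rough illustration of the descriptive end of this list, the sketch below computes spells, cumulated durations and a within-sequence Shannon entropy for one sequence. It is written in plain Python rather than Stata or R; the function names and the state codes (E, U, F) are assumptions made for the example and are not taken from SADI or TraMineR.

```python
from collections import Counter
from math import log


def spells(seq):
    """Collapse a sequence into a list of (state, duration) spells."""
    out = []
    for state in seq:
        if out and out[-1][0] == state:
            out[-1][1] += 1          # same state continues: extend the spell
        else:
            out.append([state, 1])   # state changes: start a new spell
    return [(state, length) for state, length in out]


def cumulated_durations(seq):
    """Total time spent in each state, ignoring order."""
    return Counter(seq)


def entropy(seq):
    """Shannon entropy (natural log) of the state distribution in one sequence."""
    n = len(seq)
    return -sum((c / n) * log(c / n) for c in Counter(seq).values())


# Hypothetical monthly history: E = employed, U = unemployed, F = family care
seq = "EEEUUEEEEFFF"
print(spells(seq))
print(cumulated_durations(seq))
print(round(entropy(seq), 3))
```

Entropy here only reflects how time is spread over states; measures such as turbulence also take the number and ordering of spells into account.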

Admission

This course has two variants: SOS9025 and SOS9025B. If you are planning to submit a paper, register for SOS9025 (5 ECTS). If you only want to participate in the course and complete the course tasks without submitting a paper after the course, register for SOS9025B (2 ECTS).

PhD students at the Department of Sociology and Human Geography register for the course in StudentWeb.

Interested participants outside the Department of Sociology and Human Geography shall fill out this application form.

The application deadline is 16th May 2015.

Teaching

Course instructor: Dr. Brendan Halpin, Head, Department of Sociology, University of Limerick, Ireland

Place: Eilert Sundts hus, Block B (B-blokka), PC lab (PC-stue) 351

 

16th June 9.00-12.00 - Session 1: Introduction

What sequence analysis is, and why it is useful to treat sequence data (such as life course histories) holistically. A review of non-holistic approaches to sequence data: Transitions, durations, other summaries. Using holistic approaches to generate typologies, both theoretical and data driven. Relating state-space distances to inter-trajectory sequences; the Needleman-Wunsch algorithm.
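
To make the link between state-level costs and sequence-level distances concrete, here is a minimal sketch of an optimal-matching style distance computed with the dynamic programme usually associated with Needleman-Wunsch, in its cost-minimising form. It is plain Python, not the SADI or TraMineR implementation; the function names, the flat substitution cost of 2 and the indel cost of 1 are assumptions chosen for illustration.

```python
def om_distance(a, b, sub_cost, indel=1.0):
    """Cheapest total cost of turning sequence a into sequence b
    using substitutions (sub_cost) and insertions/deletions (indel)."""
    n, m = len(a), len(b)
    # dp[i][j] holds the cheapest cost of turning a[:i] into b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel             # delete everything in a[:i]
    for j in range(1, m + 1):
        dp[0][j] = j * indel             # insert everything in b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + indel,                              # deletion
                dp[i][j - 1] + indel,                              # insertion
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),   # substitution or match
            )
    return dp[n][m]


def flat_sub(x, y, cost=2.0):
    """Assumed substitution scheme: every pair of distinct states costs the same."""
    return 0.0 if x == y else cost


print(om_distance("EEUU", "EUUU", flat_sub))   # one substitution
print(om_distance("EUUE", "UUEE", flat_sub))   # cheaper via one deletion plus one insertion
```

Replacing `flat_sub` with a function that looks costs up in a state-by-state table is where substantive judgements about the state space enter the distance.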

Lab

Installing SADI (sequence analysis tools for Stata) and running some simple analyses with Stata.

Reading

  • Abbott and Hrycak, 1990, Measuring Resemblance in Sequence Data: An Optimal Matching Analysis of Musicians' Careers, American Journal of Sociology, 96(1)

 

16th June 13.00-16.00 - Session 2

Analysis of real data using Hamming distance and Optimal Matching Analysis. Generation of data-driven typologies using clustering. Plotting sequences using chronograms and indexplots. Summarising sequences in terms of cumulated duration, volatility, etc.
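
The following sketch shows, under simplifying assumptions, how a pairwise Hamming distance matrix can feed a data-driven typology. The average-linkage routine is a deliberately naive illustration in plain Python, not what Stata or R would run, and the five toy sequences and all function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs, dist=hamming):
    """Full symmetric matrix of pairwise distances."""
    n = len(seqs)
    return [[dist(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


def average_linkage(d, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(d))]

    def link(a, b):
        # Average distance between all cross-cluster pairs
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        p, q = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: link(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] += clusters.pop(q)   # q > p, so index p is unaffected by the pop
    return clusters


# Hypothetical sequences: E = employed, U = unemployed
seqs = ["EEEEEE", "EEEEEU", "UUUUUU", "UUUUEU", "EEEEUU"]
d = distance_matrix(seqs)
groups = average_linkage(d, 2)
print(groups)   # lists of sequence indices, one list per cluster
```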

Lab

Carry out optimal matching analysis on real data, summarising sequences graphically and otherwise, carrying out cluster analysis of pairwise distance matrices.

Reading

  • Halpin and Chan, 1998, Class Careers as Sequences: An Optimal Matching Analysis of Work-life Histories, European Sociological Review, 14(2)
  • Scherer, 2001, Early Career Patterns: A Comparison of Great Britain and West Germany, European Sociological Review, 17(2)

 

17th June 9.00-12.00 - Session 3

Exploring the sequence space using multi-dimensional scaling. Towards an understanding of substitution and insertion/deletion costs.

Using distance data for further analysis, using cluster solutions or MDS dimensions as dependent or independent variables, discrepancy analysis of the distance matrix.
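
As one possible reading of discrepancy analysis, the sketch below computes a pseudo-R² directly from a distance matrix: the share of total pairwise discrepancy accounted for by a grouping variable. The sum-of-squares formulation used (sum of pairwise distances divided by group size) is an assumption of this example, as are the toy matrix, the labels and the function name.

```python
def pseudo_r2(d, labels):
    """1 - SS_within / SS_total, with SS built from pairwise distances."""

    def ss(members):
        # Sum of distances over all unordered pairs in `members`, divided by group size
        total = sum(d[i][j] for a, i in enumerate(members) for j in members[a + 1:])
        return total / len(members)

    ss_total = ss(list(range(len(d))))
    groups = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)
    ss_within = sum(ss(members) for members in groups.values())
    return 1 - ss_within / ss_total


# Hypothetical 5 x 5 distance matrix and a two-category covariate
d = [
    [0, 1, 6, 5, 2],
    [1, 0, 5, 4, 1],
    [6, 5, 0, 1, 4],
    [5, 4, 1, 0, 5],
    [2, 1, 4, 5, 0],
]
labels = ["a", "a", "b", "b", "a"]
print(round(pseudo_r2(d, labels), 3))
```

In practice the labels would be a covariate such as sex or cohort, and significance would be assessed by permutation, which this sketch omits.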

Using theoretically or empirically defined ideal types as an alternative to clustering.
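
A minimal sketch of the ideal-type idea: each observed sequence is assigned to whichever predefined reference sequence it is closest to. Hamming distance is used only to keep the example short; any distance could be substituted. The ideal types, their labels and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def nearest_ideal_type(seq, ideal_types, dist=hamming):
    """Return the label of the ideal type closest to seq (first label wins ties)."""
    return min(ideal_types, key=lambda name: dist(seq, ideal_types[name]))


# Hypothetical theoretically defined trajectories: E = employed, U = unemployed
ideal_types = {
    "stable employment": "EEEEEE",
    "late exit": "EEEEUU",
    "never employed": "UUUUUU",
}

for observed in ["EEEUUU", "UUUUUE", "EEEEEE"]:
    print(observed, "->", nearest_ideal_type(observed, ideal_types))
```

The distances to each ideal type can also be kept as continuous variables rather than reduced to a single assignment.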

Lab

MDS on pairwise distance data, use of clusters and other measures in further analysis.

Reading

  • Wiggins, Erzberger, Hyde, Higgs and Blane, 2007, Optimal Matching Analysis Using Ideal Types to Describe the Lifecourse: An Illustration of How Histories of Work, Partnerships and Housing Relate to Quality of Life in Early Old Age, International Journal of Social Research Methodology, 10(4)
  • Raffaella Piccarreta, 2012, Graphical and Smoothing Techniques for Sequence Analysis, Sociological Methods and Research, 41(2)

 

17th June 13.00-16.00 - Session 4

Alternatives to the Needleman-Wunsch distance used by optimal matching, such as Elzinga's subsequence methods, and time-warping. Comparing distance measures via cluster agreement and correlation. Alternatives to sequence analysis such as latent class analysis.
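
One simple way to compare two distance measures is to correlate the pairwise distances they produce on the same set of sequences. The sketch below does this for Hamming distance and a distance based on the longest common subsequence (LCS). The LCS distance is used here only as a convenient stand-in for a subsequence-oriented measure; it is not Elzinga's metric. The sequences and function names are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def lcs_distance(a, b):
    """len(a) + len(b) - 2 * (length of the longest common subsequence)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return len(a) + len(b) - 2 * prev[-1]


def pearson(u, v):
    """Pearson correlation between two equal-length lists of numbers."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    return cov / sqrt(sum((x - mu) ** 2 for x in u) * sum((y - mv) ** 2 for y in v))


# Hypothetical sequences: E = employed, U = unemployed
seqs = ["EEEEUU", "EEEUUU", "UEEEEU", "UUUUUU"]
pairs = list(combinations(seqs, 2))
r = pearson([hamming(a, b) for a, b in pairs],
            [lcs_distance(a, b) for a, b in pairs])
print(round(r, 3))
```

A high correlation suggests the two measures order pairs similarly on this data; cluster agreement would additionally require comparing the partitions each measure yields.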

Lab

Fitting and comparing different distance measures

Reading

  • Halpin, 2014, Three narratives of sequence analysis, in Bühlmann et al (eds), Advances in Sequence Analysis, Springer
  • Nicola Barban and Francesco Billari, 2012, Classifying life course trajectories: A comparison of latent class and sequence analysis, Journal of the Royal Statistical Society Series C, 61(5)
  • Elzinga and Studer, 2015, Spell Sequences, State Proximities and Distance Metrics, Sociological Methods & Research, 44(1)

 

18th June 9.00-12.00 - Session 5

Multi-channel sequence analysis: analysing sequences in multiple domains. Dyadic sequence analysis: mother–daughter, spouse-pair sequences. Dealing with missing data in sequences.
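
A hedged sketch of two ideas from this session: a multi-channel distance obtained by summing position-wise mismatches across domains, and dyadic distances computed within linked pairs instead of across the whole sample. Hamming-style comparison is assumed for brevity; the domain names, state codes and function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def multichannel_hamming(a, b):
    """a and b map domain name -> sequence; mismatches are summed over domains."""
    return sum(hamming(a[domain], b[domain]) for domain in a)


def dyadic_distances(dyads, dist=hamming):
    """One distance per linked pair, e.g. (mother, daughter)."""
    return [dist(first, second) for first, second in dyads]


# Hypothetical codes: work E/U = employed/unemployed, family S/M = single/married
person_a = {"work": "EEEUUE", "family": "SSMMMM"}
person_b = {"work": "EEEEUE", "family": "SSSMMM"}
print(multichannel_hamming(person_a, person_b))

# Hypothetical mother-daughter pairs of employment histories
dyads = [("EEEUUU", "EEEEUU"), ("UUUUUU", "EEEEEE")]
print(dyadic_distances(dyads))
```

Dyadic distances are typically judged against the distribution of distances between unrelated pairs, which this sketch leaves out.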

Lab

Implementing MCSA, extracting dyadic distances.

Reading

  • Gary Pollock, 2007, Holistic Trajectories: A Study of Combined Employment, Housing and Family Careers by Using Multiple-Sequence Analysis, Journal of the Royal Statistical Society: Series A, 170(1)
  • Raab, Fasang, Karhula and Erola, 2014, Sibling Similarity in Family Formation, Demography
  • Brendan Halpin, 2012, Multiple Imputation for Life-Course Sequence Data, Dept of Sociology working paper WP2012-01, University of Limerick
  • Brendan Halpin, 2013, Imputing Sequence Data: extensions to initial and terminal gaps, Dept of Sociology working paper WP2013-01, University of Limerick

 

 

Info on software:

Brendan Halpin (2014), SADI: Sequence Analysis Tools for Stata, Working paper WP2014-03, Department of Sociology, University of Limerick http://www3.ul.ie/sociology/pubs/wp2014-03.pdf

http://hdl.handle.net/10344/3783

Examination

SOS9025B: 2 ECTS method-credits are obtained by active participation in the course and completing the course tasks.

SOS9025: Participants obtain 5 ECTS by active participation, completing the course tasks and submitting a paper/assessment.

Assessment: There will be a choice between a set exercise, which involves carrying out specified analyses with an existing data set, and an open exercise, which involves using sequence analysis to address a relevant research question using data chosen by the student.

The deadline for submitting the paper (15 pages) after the course is 1st September 2015. Send the paper to katalin.godbgerg@sosgeo.uio.no.

Grading scale

Grades are awarded on a pass/fail scale. Read more about the grading system.

Facts about this course

Credits: 5
Level: PhD
Teaching: Spring 2015 (16th-18th June 2015)
Examination: Spring 2015
Teaching language: English