Embracing Legacy Research Code

Should we? If so, how?

Joscha T. Schmiedt & Andreas K. Kreiter

deRSE23 - Conference for Research Software Engineering in Germany

20-22 Feb 2023 Paderborn (Germany)

What is legacy code?

  • old
  • bad?
  • written by other people?
  • untested?

It’s dirty, it’s crowded, it’s ugly, it’s messy, it’s… old. But (...) it’s there for a reason, we just need to appreciate that reason.

Steven Wade, 2021

Embracing Legacy Research Code | deRSE 2023


What is legacy code?

  • established
  • bad?
  • written by other people?
  • untested?

Key to working successfully with legacy code isn’t skill or code knowledge; it’s attitude. As a developer, you must be empathetic to the developers who came before you.


What is legacy RESEARCH code?

  • unprofessional?
  • developed by small teams?
  • using scripting languages?
  • complicated?
  • written for research!


Should we EMBRACE LEGACY RESEARCH CODE?

Who wrote/will write the software?
Who uses it?
When & Where was it developed?
Why was the software written?
Why does it need to change?
What does the code do and how?

Who wrote the software?
Who will write the software?
Who uses it?

OK
+ cooperating original authors
+ licensed code
+ few, understanding users
+ experienced RSE will take over

Uh-oh!
- unavailable authors
- no license
- many demanding users
- inexperienced new developer

Where was it written?

OK
+ version control
+ single source

Uh-oh!
- no version control
- scattered versions

When was the software written?

[Figure: timeline with axis marks 1980, 2000 and 2020, running from "Uh-oh!" (older) to "OK" (newer), showing example technology stacks:]

  • Python 3.10 / Flask
  • C++20 / Qt 5
  • MATLAB 2021a / App Designer
  • Swift / iOS 14
  • C++98 / GTK 2
  • Python 2.7 / Tk
  • Fortran 2003
  • Java / Android 3.0
  • MATLAB 6.5 / GUIDE
  • Objective-C / Carbon
  • C / WinAPI
  • C++98 / MFC
  • IDL
  • Motorola 68k Assembly
  • FORTRAN 77
  • COBOL

Why was the software written?

OK
+ to control an experiment
+ to run complex simulations
+ to manage research data
+ ...

Uh-oh!
- because of ignorance
- for political reasons
- as a workaround
- ...

Why does it need to change?

  1. Getting it to run on modern systems
  2. Adding a feature
  3. Fixing a bug
  4. Improving the design or performance


What does the code do and how?

OK
+ clean code
+ clean architecture
+ few, well-isolated dependencies
+ stable platform (frameworks, OS)
+ has tests
+ runs on today’s hardware

Uh-oh!
- inconsistent, bad naming
- monster methods/classes
- sprawling dependencies
- deprecated frameworks
- untestable style
- relies on deprecated hardware
- no type safety
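To make "untestable style" versus "has tests" more concrete, here is a minimal C++ sketch of one common remedy, introducing a seam: the stimulus logic receives its output device through a small interface, so a test can substitute a fake that records calls instead of drawing. All names (`Display`, `presentStimulus`, `RecordingDisplay`) are hypothetical and not taken from any real codebase or graphics API; the code is kept C++98-compatible to match a legacy setting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical seam: an abstract interface for whatever draws to the screen.
// In untestable legacy code, the stimulus loop would call the graphics API
// directly, so it could only run with the real hardware attached.
class Display {
public:
    virtual ~Display() {}
    virtual void drawFrame(int frameIndex) = 0;
};

// Stimulus logic depends only on the interface, not on the hardware.
void presentStimulus(Display& display, int nFrames) {
    for (int i = 0; i < nFrames; ++i) {
        display.drawFrame(i);
    }
}

// Fake display for tests: records frame indices instead of drawing them.
class RecordingDisplay : public Display {
public:
    std::vector<int> frames;
    virtual void drawFrame(int frameIndex) { frames.push_back(frameIndex); }
};

// A test that runs without any graphics hardware.
void testPresentStimulusDrawsEveryFrame() {
    RecordingDisplay fake;
    presentStimulus(fake, 3);
    assert(fake.frames.size() == 3);
    for (std::size_t i = 0; i < fake.frames.size(); ++i) {
        assert(fake.frames[i] == static_cast<int>(i));
    }
}
```

In production, a second subclass of `Display` would wrap the existing graphics calls, so the legacy code changes only at the point where the hardware dependency enters.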

Example: VStim

Who wrote the software?
One university professor with several years of C++ experience

Who uses it?
One working group only, approx. 10–15 users

Where was it written?
Local development PCs with network backup; no formal version control, not published

When was the software written?
1997 to 2022

Why was the software written?
To control behavioral experiments involving accurate visual stimulation with morphing shapes

👍 OK, seems doable! Now, how?

What does the code do and how?

  • Few, stable dependencies: DirectX, Microsoft Foundation Classes
  • >60,000 lines of C++98: raw pointers, no use of the standard library
  • Architecture grown over time but relatively modular, with cohesive classes
  • Documentation as Word files
  • Manual tests only

Why does it need to change?

  • Accuracy is lost on modern Windows versions due to excessive multitasking
  • New features are needed for new scientific questions

  1. Clarify authorship & licensing; set up version control
  2. Understand structure, function, and dependencies
  3. Identify change points
  4. Identify test points
  5. Write tests
  6. Change
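For the "Write tests" step, one widely used technique is the characterization test: before changing anything, run the existing code on a few inputs and assert exactly what it returns today, even where that behavior looks surprising. The sketch below assumes a hypothetical legacy function `morphShape` (illustrative only, not VStim code) written in C++98 style with raw arrays; the helpers `approxEqual` and `characterizeMorphShape` are likewise made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical legacy function, C++98 style (raw arrays, C-style cast):
// linearly interpolates nValues coordinates between two shapes.
void morphShape(const double* from, const double* to, double* out,
                int nValues, int frame, int nFrames) {
    double t = (double)frame / (nFrames - 1);
    for (int i = 0; i < nValues; ++i)
        out[i] = from[i] + t * (to[i] - from[i]);
}

// Tolerance-based comparison for floating-point results.
static bool approxEqual(double a, double b) {
    return std::fabs(a - b) < 1e-12;
}

// Characterization test: pins down what the code does today.
void characterizeMorphShape() {
    const double from[2] = {0.0, 10.0};
    const double to[2] = {4.0, 30.0};
    double out[2];

    morphShape(from, to, out, 2, 0, 5);  // first frame: start shape
    assert(approxEqual(out[0], 0.0) && approxEqual(out[1], 10.0));

    morphShape(from, to, out, 2, 2, 5);  // middle frame: halfway
    assert(approxEqual(out[0], 2.0) && approxEqual(out[1], 20.0));

    morphShape(from, to, out, 2, 4, 5);  // last frame: target shape
    assert(approxEqual(out[0], 4.0) && approxEqual(out[1], 30.0));

    // Surprising but current behavior: frames past the end extrapolate
    // beyond the target instead of clamping. Recorded here, not "fixed".
    morphShape(from, to, out, 2, 6, 5);
    assert(approxEqual(out[0], 6.0) && approxEqual(out[1], 40.0));
}
```

Once these assertions pass against the unchanged code, they act as a safety net for the "Change" step: a refactoring that alters any recorded output fails immediately, and an intentional behavior change (for example, clamping frames past the end) then means updating the recorded expectation on purpose.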


Should we embrace legacy research code?

Who wrote/will write the software?
Who uses it?

Where was it written?

When was the software written?

Why was the software written?
Why does it need to change?

What does the code do and how?

IF SO, HOW?

  1. Clarify authorship, licensing, version control
  2. Understand the code
  3. Identify change and test points
  4. Write tests
  5. Change

Thank you!
