Source Code Mining

Source code is an important asset, yet it is striking how little we know about it. In general we know how many systems are operational within an organization. In many cases we probably also know which programming languages, database systems, etc. were used to develop them. However, more detailed knowledge about size, complexity and structure is in many cases not available. The original documentation is quite often outdated or completely absent. In order to perform maintenance or migration it is crucial to know more about the source code.

This knowledge is needed to answer questions such as:

  • What is the quality and architecture of the system?
  • Should we continue to support the system, migrate it or buy a new one?
  • Why is the maintenance of this system so expensive?
  • Why does calculating this take so long?

These questions can be generalized into the following topics:

  • Quality enhancement of systems
  • Performance enhancement
  • Cost control in maintenance and migration
  • Independent assessment of quality

The overall goal is to increase the dependability of software. The goal of source code mining is to assess the overall quality, performance, maintainability and reliability of code bases.

LaQuSo performed a number of projects in the area of maintainability of the code base. Maintenance means fixing errors in a system, adding new features to it, or adapting it to a new environment. Maintainability matters because more maintainable systems cost less time and less money to adapt and fix. The main ingredients to establish the maintainability of a system are the number of internal and external dependencies and source code metrics.

MatrixView

Dependencies:

  • Red: calls from and to modules inside the system of interest
  • Green: calls from and to modules outside the system of interest
  • Highlighted (light blue): calls breaking the architectural layering rules
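The three categories in the legend can be illustrated with a minimal sketch. It assumes a hypothetical system with three layers (the module names, the LAYER table and the rule that a module may only call its own layer or a lower one are all assumptions made for illustration, not taken from any LaQuSo tool). In the sketch a layering violation is reported instead of "internal", whereas the view highlights it on top of the colour.

```python
from typing import Dict, NamedTuple


class Call(NamedTuple):
    caller: str
    callee: str


# Hypothetical layering of the system of interest: higher number = higher layer.
# Assumed rule: a module may call its own layer or a lower one, never a higher one.
LAYER: Dict[str, int] = {"ui": 2, "logic": 1, "data": 0}


def classify(call: Call, layer_of: Dict[str, int]) -> str:
    """Return 'external', 'violation' or 'internal' for a single call."""
    if call.caller not in layer_of or call.callee not in layer_of:
        return "external"   # one end lies outside the system of interest
    if layer_of[call.callee] > layer_of[call.caller]:
        return "violation"  # a lower layer calls upwards
    return "internal"


if __name__ == "__main__":
    calls = [Call("ui", "logic"), Call("logic", "data"),
             Call("data", "ui"), Call("logic", "libc")]
    for c in calls:
        print(f"{c.caller} -> {c.callee}: {classify(c, LAYER)}")
```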

Over the years a broad range of source code related metrics has been developed, such as McCabe cyclomatic complexity, Halstead complexity, fan-in and fan-out. These metrics can be programming language specific.
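As an illustration of one such metric, the following sketch approximates McCabe cyclomatic complexity for a piece of Python source using the standard ast module. It is a simplification under stated assumptions: the set of node types counted as decision points is a choice made here, and it does not reflect how any particular tool computes the metric.

```python
import ast

# Node types counted as one decision point each (an assumed, simplified set).
_DECISIONS = (ast.If, ast.For, ast.While, ast.ExceptHandler,
              ast.IfExp, ast.comprehension)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISIONS):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds two extra branches
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
'''

if __name__ == "__main__":
    print(cyclomatic_complexity(SAMPLE))
```

For SAMPLE the count is 1 for the function plus two if statements, one for loop and one boolean operator, giving 5.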

LaQuSo offers generic technology to extract source code related metrics for a broad collection of programming languages, among others Cobol, Java, PL/SQL, C, C++ and C#. These metrics are presented using state-of-the-art visualization techniques. Furthermore, dependencies between programs, functions, classes, methods, etc. can also be extracted and visualized. LaQuSo developed a toolset, SQuAVisiT, to perform the analysis of source code and to facilitate the visualization of both metrics and dependencies. SQuAVisiT has been applied in numerous projects to establish the maintainability index or to extract the architecture of complex software systems.
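The text does not state which definition of the maintainability index is used. As an assumption for illustration, the sketch below implements the widely cited classic three-metric formula, MI = 171 - 5.2 ln(V) - 0.23 G - 16.2 ln(LOC), where V is the Halstead volume, G the cyclomatic complexity and LOC the number of lines of code. The function name and parameters are hypothetical.

```python
import math


def maintainability_index(halstead_volume: float,
                          cyclomatic_complexity: float,
                          lines_of_code: int) -> float:
    """Classic three-metric maintainability index (higher is better).

    Assumes halstead_volume > 0 and lines_of_code > 0.
    """
    return (171.0
            - 5.2 * math.log(halstead_volume)
            - 0.23 * cyclomatic_complexity
            - 16.2 * math.log(lines_of_code))


if __name__ == "__main__":
    # Two hypothetical modules: the larger, more complex one scores lower.
    print(round(maintainability_index(250.0, 4, 40), 1))
    print(round(maintainability_index(9000.0, 35, 900), 1))
```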

SQuAVisiT is focused on extracting metrics and dependencies of large software systems. In many cases, however, low-level maintenance or migration from one version to another requires knowledge of the internal structure of a single program. This internal structure has to be extracted from the source code and captured in a model. LaQuSo has developed a tool, Cpp2XMI, to analyse C/C++ code and to extract the following models: class diagrams, sequence diagrams, activity diagrams and, for well-structured code, even state machine diagrams. These diagrams are represented in XMI and can be processed by UML modeling environments.

These extracted models can be used for further analysis (see Model Analysis), for example to check source code for:

  • Uninitialized variables
  • Null pointer dereferencing
  • Out of bounds referencing of arrays
  • User-defined properties, for instance:
      • Lock – Unlock
      • B may only occur after A
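The two user-defined properties can be sketched as checks over a single sequence of events, for example one path through an extracted model. This is a minimal illustration under assumptions: the event names, the function names and the interpretation of "Lock – Unlock" as strictly alternating and balanced are choices made here, and a real model checker would explore all paths rather than one trace.

```python
from typing import Sequence


def lock_unlock_balanced(trace: Sequence[str],
                         lock: str = "lock",
                         unlock: str = "unlock") -> bool:
    """True if locks and unlocks strictly alternate and nothing stays locked."""
    held = False
    for event in trace:
        if event == lock:
            if held:
                return False  # locked twice without an unlock in between
            held = True
        elif event == unlock:
            if not held:
                return False  # unlock without a preceding lock
            held = False
    return not held           # the lock must be released at the end


def occurs_only_after(trace: Sequence[str], a: str, b: str) -> bool:
    """True if every occurrence of b is preceded by at least one a."""
    seen_a = False
    for event in trace:
        if event == a:
            seen_a = True
        elif event == b and not seen_a:
            return False
    return True


if __name__ == "__main__":
    print(lock_unlock_balanced(["lock", "write", "unlock"]))
    print(occurs_only_after(["read", "open"], a="open", b="read"))
```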

Future developments in the area of source code mining relate to:

  • software evolution, for instance the analysis of software repositories and the extraction of evolution related metrics
  • model extraction and the analysis of extracted models via model checking technology
  • model extraction based on the analysis of log information obtained when running a program
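To give an idea of what model extraction from log information can look like, the sketch below derives a simple transition relation ("which event is directly followed by which") from a set of recorded event sequences. It is only one simple possibility, assumed here for illustration; the log format and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set


def transitions_from_logs(traces: Iterable[Sequence[str]]) -> Dict[str, Set[str]]:
    """Map each event to the set of events observed directly after it."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for trace in traces:
        # Pair every event with its immediate successor in the same trace.
        for current, following in zip(trace, trace[1:]):
            graph[current].add(following)
    return dict(graph)


if __name__ == "__main__":
    logs = [["open", "read", "close"],
            ["open", "write", "close"]]
    for event, successors in sorted(transitions_from_logs(logs).items()):
        print(event, "->", sorted(successors))
```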