Correlation diagram

  • Purpose of the tool

The correlation diagram is used to graphically represent the relationship between two variable quantities.
It answers the question of whether, in which direction, and how strongly two variables are related to each other.

Each point in the diagram represents a pair of values consisting of an influencing variable (x) and a target variable (y).
The distribution of the points shows whether there is a positive, negative, or no correlation between the variables under consideration.
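
The visual impression of direction and strength can be backed up with a number. The following is a minimal sketch, assuming the Pearson correlation coefficient as the measure (the source does not state which measure, if any, the tool computes). The function names, the threshold of 0.3, and the value pairs are illustrative assumptions, not measured data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)

def direction(r, threshold=0.3):
    """Classify the direction; the threshold 0.3 is an arbitrary assumption."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "none apparent"

# Made-up illustrative value pairs (not measured data)
cooking_time_min = [10, 20, 30, 40, 50]
viscosity = [1.1, 1.9, 3.2, 3.8, 5.1]
r = pearson_r(cooking_time_min, viscosity)
print(round(r, 3), direction(r))
```

The coefficient only captures linear relationships, so it complements the diagram rather than replacing it.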

The correlation diagram is particularly suitable for:

  • the investigation of influencing factors on a target variable
  • continuous measured variables (e.g., time, temperature, quantity, viscosity)
  • initial visual inspection of possible correlations without model assumptions

  • Example: tomato sauce

The development department for new tomato sauces is investigating how cooking time affects the viscosity of the product.
To this end, tomato sauces with different cooking times are produced in several test series.

For each test series, the cooking time and the resulting viscosity are measured and recorded as a pair of values.
A correlation diagram is used to examine whether and how the relationship between cooking time and viscosity is reflected in the measurement data.

The basic relationship is well known in the industry:
Prolonged cooking causes water to evaporate, which generally results in the tomato sauce having a higher viscosity.

Correlation diagram

Explanations of the graph:

The scatter plot shows the viscosity of the tomato sauce as a function of cooking time.
The measured viscosity increases with increasing cooking time.

The points lie approximately on a straight line, indicating a clear positive, almost linear relationship between cooking time and viscosity.
For the same cooking times, a slight dispersion of the measured values can be seen.

  • Procedure

Preliminary work

  1. Define the target variable (y) (e.g., viscosity of tomato sauce)
  2. Define the influencing variable (x) (e.g., cooking time)
  3. Ensure that both variables are quantitative measures
  4. Collect data

Use in AlphadiTab

  1. In the Measure phase, select the Correlation Diagram tool
  2. For data X, select the “Cooking time” column
  3. For data Y, select the “Viscosity” column
  4. Generate the diagram using the “Create new” button.

Interpretation

  1. Check whether a correlation between the x and y axes is apparent
  2. Assess whether the correlation is positive, negative, or non-existent
  3. Assess whether the correlation is approximately linear
  4. Compare the point clouds between series or groups, if available.

  • Interpretation guide

General observation

  • Are the points arranged in order or randomly distributed?
  • Is there a discernible pattern to the points?
  • Can the pattern be described by a straight line?
  • If yes: Is the correlation positive or negative?
  • If not: Does the pattern indicate a non-linear correlation?
  • Is the dispersion of the points small or large?
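
The questions about a straight-line pattern and the size of the dispersion can also be checked numerically. The sketch below assumes an ordinary least-squares fit of the linear model y = a0 + a1·x and uses the standard deviation of the residuals as a dispersion measure. The function name `fit_line` and the value pairs are hypothetical and only serve as an illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares fit of y = a0 + a1*x; returns (a0, a1, residual_sd)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 value pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx              # slope: sign gives the direction
    a0 = mean_y - a1 * mean_x   # intercept
    # Vertical distances of the points from the fitted line
    residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]
    # n - 2 degrees of freedom because two coefficients were estimated
    residual_sd = sqrt(sum(e * e for e in residuals) / (n - 2))
    return a0, a1, residual_sd

# Made-up illustrative value pairs (not measured data)
a0, a1, sd = fit_line([10, 20, 30, 40, 50], [1.1, 1.9, 3.2, 3.8, 5.1])
print(f"slope={a1:.3f}, intercept={a0:.3f}, residual sd={sd:.3f}")
```

A positive slope corresponds to a positive correlation and a negative slope to a negative one. Residuals that are systematically positive at both ends and negative in the middle (or the reverse) hint at a non-linear pattern.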

For known specifications

  • Are the measured values within the defined specification limits?
  • Are there areas of the influencing variable in which the specification is not met?
  • Does the relationship change near the specification limits?
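
Where specification limits are known, the value pairs that violate them can be listed together with the corresponding values of the influencing variable. This is a small sketch under assumed names: `out_of_spec`, the limits 1.5 and 4.5, and the value pairs are hypothetical.

```python
def out_of_spec(pairs, lower, upper):
    """Return the (x, y) pairs whose y value lies outside [lower, upper]."""
    return [(x, y) for x, y in pairs if not (lower <= y <= upper)]

# Made-up illustrative value pairs (x = influencing variable, y = target variable)
pairs = [(10, 1.1), (20, 1.9), (30, 3.2), (40, 3.8), (50, 5.1)]
violations = out_of_spec(pairs, lower=1.5, upper=4.5)

# The x values of the violations show in which ranges the specification is not met
for x, y in violations:
    print(f"x={x}: y={y} is outside the specification")
```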

Note: A discernible correlation does not necessarily mean that one variable is the cause of the other.

  • Forms of representation

Various display formats are available for correlation diagrams. The display in the diagram changes depending on whether one or more data series and additional groups or series are selected. Data can thus be grouped or broken down by series for visualization and targeted comparison. All of the following display formats are based on the same file, but differ in the selection of columns used. The respective procedure is described in the individual tiles.

Procedure:

Step 1: Select column A for data Y.

Step 2: Select column B for data X.

Procedure:

Step 1: Select column A for data Y.

Step 2: Select column B for data X.

Step 3: Select column D (process status) for group.

Procedure:

Step 1: Select column A for data Y.

Step 2: Select column B for data X.

Step 3: Select column D (process status) for group.

Step 4: Select column E (product) for series.
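
Conceptually, selecting a group column splits the value pairs into separate point clouds. The following sketch illustrates that idea outside the tool; it is not a description of how AlphadiTab works internally. The column keys "A", "B", and "D" follow the steps above, while the function name, the status values, and the numbers are made up.

```python
from collections import defaultdict

def split_by_group(rows, x_key, y_key, group_key):
    """Collect (x, y) pairs per group value, preserving row order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append((row[x_key], row[y_key]))
    return dict(groups)

# Made-up rows: column A = data Y, column B = data X, column D = process status
rows = [
    {"A": 3.4, "B": 12, "D": "stable"},
    {"A": 5.0, "B": 18, "D": "unstable"},
    {"A": 3.9, "B": 15, "D": "stable"},
    {"A": 5.6, "B": 21, "D": "unstable"},
]
groups = split_by_group(rows, x_key="B", y_key="A", group_key="D")
for status, pairs in groups.items():
    print(status, pairs)
```

Each resulting list can then be plotted or evaluated on its own, which makes differences between groups easier to compare.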

  • Requirements

  • Two quantitative variables

  • Tools

    (When are others more suitable?)

  • If only a single variable is considered and no correlation is to be analyzed → histogram or box plot.
  • If categories are to be compared with each other → bar chart or box plot.
  • If the focus is on developments over time → time series chart.
  • If a correlation is to be demonstrated.

  • Examples

Development

Cooling time and material hardness

In the development of a new component, research is being conducted into how the cooling time after heat treatment affects the hardness of the material.
It is assumed that a longer cooling time results in lower material hardness.

The correlation diagram shows a negative correlation between cooling time and material hardness.
The measured hardness decreases as the cooling time increases.

Quality assurance

Comparison of production and QA measurements

In production, a quality characteristic is measured inline, e.g., the filling quantity.
In quality assurance, the same characteristic is checked again using a separate measuring device.

To investigate whether the measured values from production and quality assurance are consistent with each other, both measured values are recorded as a value pair and displayed in a correlation diagram.

The scatter diagram shows the relationship between the QA measurement value and the production measurement value for the filling quantity.

In the lower measurement range, the production and QA measurement values correspond very well and are approximately on a straight line.
However, at higher measurement values, there is an increasing deviation, with the QA measurement values being higher than the production measurement values.

Production

Impact of maintenance frequency on downtime

In production, the effect of maintenance frequency on unplanned machine downtime is examined.
It is assumed that regular maintenance reduces unplanned downtime, but that this effect diminishes after a certain maintenance frequency.

For several machines, the number of maintenance operations per month and the unplanned downtime during the same period are recorded and documented as a pair of values.
A correlation diagram is used to check whether there is a correlation and whether a saturation effect can be identified.

The correlation diagram shows that as the maintenance frequency increases, unplanned downtime initially decreases significantly.
From a maintenance frequency of around 4–5 maintenance operations per month, the effect flattens out, with further maintenance operations leading to only minor additional improvements.

IT support

Processing time for tickets by location

The IT service desk is investigating whether the age of a ticket has an impact on the processing time.
It is assumed that older tickets are often more complex or have been escalated multiple times, resulting in longer processing times.

For several tickets, the ticket age at the time of processing and the actual processing time are recorded and documented as a pair of values.
A correlation diagram is used to check whether there is a correlation between ticket age and processing time.

The scatter plot shows a wide variation in processing time across all ticket ages.
There is no clear linear correlation between ticket age and processing time.

Sales

Sales rate vs. offer duration

Sales offers are created for customers in the sales department.
The aim is to investigate whether the duration of the offer process has an impact on the sales rate.

For several offers, the offer duration (time from offer creation to decision) and the resulting sales rate are recorded and documented as a pair of values.
A correlation diagram is used to check whether there is a discernible connection between offer duration and sales rate.

The scatter plot shows the relationship between offer duration and sales rate, separated by products A and B.

For product A, no clear linear relationship can be seen, as the sales rate varies greatly over the offer duration.
Product B, on the other hand, shows higher sales rates for longer offer durations.

The diagram illustrates that the relationship between offer duration and sales rate differs between products.

Logistics

Delivery time to logistics center

In logistics, customer orders are processed across multiple logistics centers.
The aim is to investigate whether the delivery quantity has an impact on the delivery time.

For several orders, the delivery quantity and the actual delivery time are recorded and documented as a pair of values.
A correlation diagram is used to check whether there is a discernible relationship between delivery quantity and delivery time.

The scatter plot shows the relationship between delivery quantity and delivery time.

The points show a clear dispersion.
No relationship between delivery quantity and delivery time can be identified.

Purchasing

On-time deliveries vs. order time

The purchasing department is investigating whether the length of the order lead time has an impact on on-time delivery.
It is assumed that longer lead times improve planning and thus increase the on-time delivery rate.

For several orders, the lead time (time between order and planned delivery date) and the actual on-time delivery rate are recorded and documented as a pair of values.
A correlation diagram is used to check whether there is a discernible relationship between lead time and on-time delivery rate.

The scatter plot shows a positive correlation between lead time and on-time delivery rate.
As lead time increases, the on-time delivery rate rises.

The points show an upward trend, indicating an approximately linear correlation.

Planning

Forecast deviation

In production planning, forecasts are adjusted using a correction factor to compensate for systematic over- or under-forecasting.
The aim is to investigate how the size of the correction factor applied affects the remaining forecast deviation.

For several forecasts, the correction factor used and the actual forecast deviation are recorded and documented as a pair of values.
A correlation diagram is used to check whether there is a discernible relationship between the correction factor and the forecast deviation.

The scatter plot shows the relationship between the correction factor and the forecast deviation.

With low correction factors, the forecast deviation is predominantly positive, indicating an overestimation of demand.
As the correction factor increases, the forecast deviation decreases and is close to 0% in the range around a correction factor of 1.00.

For higher correction factors, the forecast deviation becomes increasingly negative, indicating an underestimation of demand.

  • Terms

Correlation diagram: Diagram for the graphical representation of the relationship between two numerical variables.

Scatter diagram: Alternative name for the correlation diagram, in which individual value points are displayed.

Influencing variable (x): Variable whose influence on another variable is being investigated.

Target variable (y): Variable whose dependence on the influencing variable is being investigated.

Value pair: Related measured values from the influencing variable and the target variable.

Linear relationship: Relationship in which the values are approximately arranged along a straight line.

Non-linear relationship: Relationship in which the progression cannot be described by a straight line.

Dispersion: Measure of the distribution of points around a recognizable curve.

Positive correlation: As the x-value increases, the y-value increases.

Negative correlation: As the x-value increases, the y-value decreases.

Correlation: Measure of the strength and direction of a relationship between two variables.

Regression: Method for describing a relationship using a mathematical function.

Causality: Cause-and-effect relationship between two variables; it cannot be derived from a correlation diagram alone.

  • Formulas

Linear model

\( \mathrm{y}=\mathrm{a}_0+\mathrm{a}_1\,\mathrm{x} \)

Quadratic model

\( \mathrm{y}=\mathrm{a}_0+\mathrm{a}_1\,\mathrm{x}+\mathrm{a}_2\,\mathrm{x}^2 \)

Cubic model

\( \mathrm{y}=\mathrm{a}_0+\mathrm{a}_1\,\mathrm{x}+\mathrm{a}_2\,\mathrm{x}^2+\mathrm{a}_3\,\mathrm{x}^3 \)

4th degree polynomial

\( \mathrm{y}=\mathrm{a}_0+\mathrm{a}_1\,\mathrm{x}+\mathrm{a}_2\,\mathrm{x}^2+\mathrm{a}_3\,\mathrm{x}^3+\mathrm{a}_4\,\mathrm{x}^4 \)

5th degree polynomial

\( \mathrm{y}=\mathrm{a}_0+\mathrm{a}_1\,\mathrm{x}+\mathrm{a}_2\,\mathrm{x}^2+\mathrm{a}_3\,\mathrm{x}^3+\mathrm{a}_4\,\mathrm{x}^4+\mathrm{a}_5\,\mathrm{x}^5 \)

6th degree polynomial

\( \mathrm{y}=\mathrm{a}_0+\mathrm{a}_1\,\mathrm{x}+\mathrm{a}_2\,\mathrm{x}^2+\mathrm{a}_3\,\mathrm{x}^3+\mathrm{a}_4\,\mathrm{x}^4+\mathrm{a}_5\,\mathrm{x}^5+\mathrm{a}_6\,\mathrm{x}^6 \)
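
All models above are polynomials in x and differ only in their degree, so a single routine can evaluate any of them. The sketch below assumes the coefficients are given in the order a0, a1, ..., an and uses Horner's method; the function name `poly_eval` and the coefficient values are hypothetical.

```python
def poly_eval(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x**n for coeffs = [a0, a1, ..., an]."""
    result = 0.0
    # Horner's method: start with the highest coefficient and work down
    for a in reversed(coeffs):
        result = result * x + a
    return result

# Made-up coefficients for illustration
linear = [1.0, 2.0]            # y = 1 + 2x
quadratic = [1.0, 2.0, 3.0]    # y = 1 + 2x + 3x^2

print(poly_eval(linear, 2.0))
print(poly_eval(quadratic, 2.0))
```

A higher degree follows the points more closely but can also follow random dispersion, so the lowest degree that describes the pattern is usually the safer choice.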