In recent years, the volume of data created has grown at an incredible pace and is forecast to continue its rapid ascent. The volume of data created, captured, copied and consumed globally has increased from two zettabytes in 2010 to 64.2 zettabytes in 2020, and is predicted to reach 181 zettabytes in 2025.1 This growth has largely been fuelled by new technologies and software in various industries including the construction industry, which has introduced digital solutions such as building information modelling (BIM), virtual reality, electronic document management, artificial intelligence, robotics, 3D printing, drones and other data collection and monitoring technology.
The advent of big data poses new challenges for companies seeking to process and make use of the data produced. It also presents an issue in the disputes context, as the resources required to collect, process and review large amounts of data are often at odds with a litigant's desire (and with the objective of various legal codes and arbitral rules) to deal with cases efficiently and at proportionate cost. Parties that can make the best use of this data will place themselves at a significant advantage in any dispute resolution process.
Against this background, there has been growing acceptance amongst lawyers, arbitral tribunals and the courts of the use of sampling as a way to resolve this tension. Sampling enables claims to be prosecuted, and data to be used as evidence, in circumstances where doing so might otherwise be disproportionately costly.
What is sampling?
Sampling is a means of finding out about the characteristics of a large population by asking questions of, or investigating, a subset of that population. The results of the investigation of the sample are then extrapolated to the whole population.
The English courts have approved sampling as a way to resolve a variety of construction disputes.2 While traditionally sampling has been associated with claims relating to defects on construction projects where the value of each individual defect is too low to justify evaluating each defect individually,3 the Court of Appeal has recently also approved its use in professional negligence claims.4
However, sampling has even wider application and should be considered whenever the quantity of data available or required is too substantial to justify a full review or analysis. For example, our team has worked on multi-billion US dollar claims relating to delay and disruption in the engineering process of mega energy infrastructure projects, which have necessitated the use of sampling. These projects often involve the use of design databases and 3D modelling software, and the production, review and approval of hundreds of thousands of design documents. The review and approval of the 3D model and design documents can involve the exchange of hundreds of thousands of further comments and communications between the owner, the main contractor, subcontractors and, in some cases, relevant authorities. In our experience, delay and disruption in the engineering process is one of the most common causes of major delays and cost overruns on large construction projects, but also one of the hardest to analyse properly and to allocate responsibility for.
Types of sampling
A key requirement of effective sampling is that the sample investigated must be representative of the relevant population as a whole. Broadly speaking, sampling can be carried out in two ways – by selecting samples at random (as in statistical/probability sampling) or by selecting samples which are intended/expected to be representative of the whole (non-statistical sampling).
While both methods are acceptable in principle,5 a difficulty with non-statistical sampling is that, because it requires subjective judgement in selecting the sample, it is difficult to demonstrate that the sample is truly representative of the population. Such studies are therefore particularly vulnerable to criticisms of bias, and it can be difficult to obtain buy-in on the use of the selected sample from the other side and from the tribunal or court.
We therefore consider statistical sampling to be the preferred method of running a large sampling exercise. The selection process is random, and the results can be supported by reference to statistical concepts such as confidence intervals and margins of error. These quantify the uncertainty which is present in any sampling exercise and so demonstrate how accurate the results of your sampling study are likely to be.
How is statistical sampling carried out?
The statistical sampling process is broadly carried out as follows:
- Define the population and the sampling frame
The population is the group you wish to draw conclusions about, and the sampling frame is the group that you will draw your sample from. In an ideal world, these would be the same. However, it may be that for practical reasons, your sampling frame will not include the entire population.
- Draw a randomised sample from the sampling frame
Each item in the sampling frame should have an equal chance of being selected. This is often done using software, which selects samples at random.
- Analyse the sample
Examine the sample itself, preferably by reference to objective industry standards.
- Extrapolate the results
Extrapolate the results of your analysis to the population as a whole. The key benefit of probability sampling is that the results can be stated with a margin of error and a confidence level. These show how precise, or imprecise, the results of your study are and whether they can be safely relied on.
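By way of illustration only, the steps above can be sketched in a few lines of Python. This is a minimal sketch rather than a recommended methodology: the sampling frame of 10,000 numbered documents, the sample size of 400, the seed, the rule used to flag a document as "defective" and the function names are all hypothetical, and the interval uses the common normal approximation with a finite population correction. A statistical expert may well prefer a different method for a particular case.

```python
import math
import random


def draw_sample(frame, n, seed):
    """Draw a simple random sample without replacement.

    Every item in the frame has an equal chance of selection. Recording
    the seed means the other side can reproduce the same draw.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)


def estimate_proportion(sample_flags, population_size, z=1.96):
    """Estimate a population proportion from sample results.

    sample_flags: one True/False result per sampled item.
    z: 1.96 corresponds to a 95% confidence level (an assumption).
    Returns (estimate, margin_of_error, (low, high)).
    """
    n = len(sample_flags)
    p_hat = sum(sample_flags) / n
    # Finite population correction: sampling without replacement from a
    # finite frame slightly reduces the uncertainty.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n) * fpc
    return p_hat, margin, (max(0.0, p_hat - margin), min(1.0, p_hat + margin))


# Hypothetical sampling frame: 10,000 numbered design documents.
frame = list(range(1, 10001))
sample = draw_sample(frame, n=400, seed=42)

# Hypothetical review outcome: in practice each flag would come from an
# expert's review of the sampled document against an objective standard.
flags = [doc_id % 10 == 0 for doc_id in sample]

p_hat, margin, (low, high) = estimate_proportion(flags, len(frame))
print(f"Sample defect rate: {p_hat:.1%} (margin of error {margin:.1%})")
print(f"Extrapolated defective documents: "
      f"{low * len(frame):.0f} to {high * len(frame):.0f}")
```

The extrapolated range, rather than a single figure, is what would be put before the court or tribunal: it states both the estimate and how far the true figure might reasonably differ from it.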
Getting it right
As set out above, the English cases have established that it is, in theory, possible to prove a claim based on sampling. However, the failure rate is high: in both the Amey and ICI cases, the claims failed due to sampling errors.6 Some of these errors were egregious and, as noted by the courts, may have indicated an intention to obtain a beneficial, rather than a representative, sample. While advocates naturally wish to obtain the best results for their client, it is important to avoid the temptation to obtain what may be perceived as the most beneficial sample, as doing so risks completely invalidating the results of the sampling exercise.
Given the challenges involved with pursuing a case based on sampling, it is important to consider whether sampling is, in fact, appropriate for your case. It may be more appropriate, depending on the value of the claim and the volume of data, to review all the data and bring a case on a traditional basis. Your access to the population, and the characteristics of the population, may also affect whether your sampling exercise is likely to produce a reliable result. For example, if you are unable to include a substantial amount of your population in your sampling frame, or if your population is very varied in nature, it may not be desirable to use sampling or it may be necessary to introduce additional sampling techniques (such as stratification) in order to obtain precise results.
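To illustrate what stratification involves, the sketch below divides a varied population into groups (strata) and draws a random sample from each in proportion to its size, so that no group is left out or over-represented by chance. It is a simplified example under stated assumptions: the discipline names, the population sizes, the total sample size, the seed and the function name are all hypothetical, and proportional allocation is only one of several ways to allocate a sample across strata.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, total_n, seed):
    """Draw a proportionally allocated stratified random sample.

    items: the sampling frame.
    stratum_of: function returning the stratum label for an item.
    total_n: target overall sample size (rounding may shift it slightly).
    Returns a dict mapping each stratum label to its sampled items.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)

    sample = {}
    for label, members in strata.items():
        # Allocate in proportion to stratum size, with at least one
        # item per stratum and never more than the stratum contains.
        k = max(1, round(total_n * len(members) / len(items)))
        k = min(k, len(members))
        sample[label] = rng.sample(members, k)
    return sample


# Hypothetical frame: design documents tagged by engineering discipline.
documents = (
    [("piping", i) for i in range(6000)]
    + [("electrical", i) for i in range(3000)]
    + [("civil", i) for i in range(1000)]
)

result = stratified_sample(
    documents, stratum_of=lambda doc: doc[0], total_n=200, seed=7
)
for label, picked in result.items():
    print(label, len(picked))
```

With these hypothetical figures, the 200 samples are split 120, 60 and 20 across the three disciplines, mirroring their 60%, 30% and 10% shares of the frame.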
If sampling is the correct, or only, way to bring the claim, the case law and our experience of running cases based on sampling indicate that the following key points should be borne in mind:
1. Beware of Bias
Put simply, bias is a systematic difference between the sample results and the true characteristics of the population the sample is meant to represent. If there is a significant amount of bias, this indicates that the results are likely to have been skewed, often inadvertently, by the sample selection or measurement process. Most commonly, bias arises when particular members of the population are improperly given a greater chance of selection, which means that the sample cannot be considered representative of the population. The presence of bias is often the reason why sampling claims fail.
A variety of issues can cause bias in the sampling process, but common sources of bias include:
(a) Non-random methods of drawing the sample – When subjective judgement is used to select samples, it is more likely that bias will be present. Even if the selection was carried out with the best intentions, it can be difficult to avoid bias occurring inadvertently and/or to convince the other side, who will naturally regard the samples with suspicion, that the sample is not biased.
(b) Over/under inclusion of potential samples from the sampling frame – While exclusion may be necessary at times, it is important to minimise the exclusion of items and ensure that the sampling frame is as close to the population as possible. Excessive or illegitimate exclusion of possible items may result in the results of the study being invalidated. For example, in the ICI case, there was a pre-screening process, which meant the items included in the sampling frame were more likely to be defective. In the Amey case, the samples were limited to works which had been carried out later and which were easier to access. Conversely, it is important to ensure that the sampling frame does not include items which are not part of the population and which may skew the results of your study.
(c) Insufficient sample size – If your sample is too small, it can be difficult to obtain representative results. The margin of error tends to narrow as the sample size increases, because a larger random sample is more likely to resemble the actual population. A larger sample will not, however, cure a selection process that is itself skewed.
(d) Measurement errors – Ensure that the methods of analysing the samples are consistently applied and have a rational basis. Where industry standards are available, these, or other objective forms of analysing the sample, should be used. Technical experts will often be involved in the analysis of a sample, and it may be helpful to hold a joint expert meeting to allow the experts to attempt to reach agreement on both the approach to analysing the samples and the results of that analysis.
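On the question of sample size at point (c) above, the sketch below shows one widely used textbook calculation (often attributed to Cochran) for the number of items needed to estimate a proportion to a chosen margin of error, adjusted for a finite population. It is illustrative only: the 95% confidence level, the 5% margin of error, the conservative assumption that the true proportion is 50% and the function name are all assumptions, and the formula addresses precision only. It says nothing about bias arising from how the sample is selected or measured.

```python
import math


def required_sample_size(population_size, margin=0.05, z=1.96, p=0.5):
    """Sample size needed to estimate a proportion.

    margin: desired margin of error (0.05 means plus or minus 5 points).
    z: 1.96 corresponds to a 95% confidence level.
    p: assumed true proportion; 0.5 is the most conservative choice
       because it yields the largest required sample.
    """
    # Sample size for an effectively unlimited population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction: smaller populations need fewer items.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


# Hypothetical populations of design documents.
print(required_sample_size(100_000))  # 383
print(required_sample_size(1_000))    # 278
```

One point the calculation makes clear is that the required sample size grows very slowly with the size of the population: on these assumptions, a population one hundred times larger needs only around 100 more samples. This is a large part of why sampling can be proportionate even for very large data sets.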
2. Agree on the use of sampling, and the methodology, where possible
It is preferable to agree with the other side on the sampling methodology and approach at an early stage. This will reduce the risk of methodology-based objections to the results, and allow issues to be rectified before significant costs are incurred. A court or tribunal will also be better able to compare and evaluate the parties' results and conclusions if the parties have agreed on an approach and methodology. If the approaches taken by the parties are too dissimilar, a court or tribunal's decision is likely to be more unpredictable and the chances of settling based on the results of the sample will be reduced.
3. Use an expert
Statistical sampling is complex, and is not an area that most legal professionals are expert or experienced in. It can therefore be helpful to appoint a statistical expert at an early stage to advise on the appropriateness of sampling and supervise the sampling exercise. If faced with a sampling claim, an expert would also be able to evaluate the sampling exercise undertaken by the other party.
Aside from the importance of getting the process right, educating a court/tribunal on sampling is key, as many judges/tribunal members may never have come across a claim based on sampling in their career. An expert can be helpful when it comes to explaining the science of statistical sampling and addressing the uncertainty in the results that will always be present to an extent.
Construction projects and the disputes that arise from them are well known to be data-heavy in nature. The increasing size and complexity of construction projects, coupled with the use of new technology, is only likely to contribute to an increasing need for techniques such as sampling to enable disputes to be resolved in a proportionate manner. While the use of statistical sampling is not yet commonplace, the recent approval by the courts and the increasing understanding in the legal field of the science of statistical sampling indicate that the use of sampling will only grow in the years to come. We may well be, as Lady Rose, Justice of the Supreme Court, has suggested, at the start of a "statistical revolution".7
1 Based on IDC and Statista estimates.
2 Building Design Partnership Limited v Standard Life Assurance Limited  EWCA; Amey LG Ltd v Cumbria County Council  EWHC 2856 (TCC).
3 See for example, Amey LG v Cumbria County Council  EWHC 2856 (TCC) and Imperial Chemical Industries Ltd v Merit Merrell Technology Limited (No.2)  EWHC 1763 (TCC).
4 Building Design Partnership Limited v Standard Life Assurance Limited  EWCA.
5 See Amey LG v Cumbria County Council  EWHC 2856 (TCC) paragraph 25.103.
6 See also In re Hardieplank Fiber Cement Siding Litig., No. 12-md-2359, 2018 WL 262826 at *26 (D. Minn. Jan. 2, 2018), and In re Chevron U.S.A., Inc., 109 F.3d 1016, 1019-20 (5th Cir. 1997), two U.S. cases which highlight the importance of a proper sampling methodology.
7 Lady Rose, A Numbers Game? Statistics in Public Law cases, ALBA Annual Lecture, 5 July 2021.
© 2023 White & Case LLP