The title is stolen from Dijkstra.
Papers submitted to medical journals below the top rank are most often reviewed by medical reviewers who have used the Wilcoxon test as a catch-all in their active days. To smuggle a paper through their phalanx, garnish it generously with:
Other modernistic stuff will delay the review process: “The authors should supplement their Bayesian analysis by non-random methods that provide correct p-values”. There is nothing wrong with Wilcoxon tests and Bland-Altman plots, and the mentioned AUC is on the border of semi-professionalism, so it is worth a look. Anything left I did not mention?
The AUC (area under the curve) or the AOB (signed area over baseline) is popular because it is easy to calculate: for equidistant points, it is the average over all values multiplied by the record duration. Or, equivalently, the area of the equivalent rectangle shown as shaded area below, which has a height equal to the mean volume and a width equal to the record duration. There are better approximations for the AUC (Gabrielsson (2006), p. 161), but that does not change the validity of the arguments in the following.
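The calculation above fits in a few lines. The following is a minimal sketch, not a reference implementation: the function names and the sample record are made up for illustration, and the trapezoidal rule stands in for the "weighted mean" needed when points are not equidistant.

```python
def auc_mean_times_duration(times, values):
    """Naive AUC: plain mean of all values times the record duration."""
    duration = times[-1] - times[0]
    return sum(values) / len(values) * duration


def auc_trapezoid(times, values):
    """Trapezoidal AUC; each interval is weighted by its own length."""
    area = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        area += dt * (values[i] + values[i + 1]) / 2.0
    return area


# Hypothetical record: minutes and volume above baseline (invented numbers).
times = [0, 5, 10, 20, 30, 60, 90, 120]
volumes = [0, 180, 260, 240, 200, 110, 50, 20]

full_record = auc_trapezoid(times, volumes)
short_record = auc_trapezoid(times[:6], volumes[:6])  # stopped at 60 minutes
print("AUC over 120 min:", full_record)
print("AUC over  60 min:", short_record)
print("plain mean x duration:", auc_mean_times_duration(times, volumes))
```

The same curve yields a different area depending on where the record happens to stop, and the plain mean disagrees with the trapezoidal value as soon as the sampling is uneven.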
The AUC is highly biased when used naively; that is, when it is computed over an arbitrary record duration.
The AUC is very useful when computed up to very large times, when the volume has gone back to baseline and no further contribution to the area is to be expected. This is illustrated in the curves below, which are similar to the PDR curves recorded in 13C breath tests. For all time series shown, the total area under the curve up to infinity has been chosen to be the same. The AUC is a measure of bioavailability (Gabrielsson 2006), which is the chief reason why it is the prime parameter of interest in non-compartmental pharmacokinetics. The AUC computed from the recorded data range is invalid in the left panel, but a good approximation in the right panel.
It is possible to compute the AUC up to infinity even when only a limited data set is available; or, even worse, when the duration of the records is not the same between test units. First, you have to fit a model curve to your data that goes back to the reference line at infinity, and has a finite improper integral; the former does not imply the latter. Remember high school calculus: the integral of 1/x from 1 to infinity does not exist - it is infinite - but that of 1/x^2 does; even 1/x^1.0001 has a finite one.
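To see the high school calculus at work, here is a small sketch that evaluates the closed-form integral of x^(-p) from 1 to a growing upper limit. The function name `integral_power` is made up for this illustration.

```python
import math


def integral_power(p, upper):
    """Closed-form integral of x**(-p) from 1 to `upper` (upper >= 1)."""
    log_upper = math.log(upper)
    if p == 1.0:
        # The integral of 1/x is ln(upper), which grows without bound.
        return log_upper
    # (1 - upper**(1 - p)) / (p - 1), written with expm1 to stay
    # accurate when p is very close to 1.
    return -math.expm1((1.0 - p) * log_upper) / (p - 1.0)


for upper in (1e2, 1e4, 1e8):
    print(
        upper,
        integral_power(1.0, upper),     # keeps growing
        integral_power(1.0001, upper),  # converges, but very slowly
        integral_power(2.0, upper),     # settles quickly near 1
    )
```

For p > 1 the limit is 1/(p - 1), so the exponent 1.0001 converges to about 10000. Within any realistic upper limit it is hardly distinguishable from the logarithm, which is why a finite record alone cannot tell you whether the area is finite.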
In analysis of 13C breath test curves (see Bluck (2009) for a review), the Gamma function from earlier publications does not have an improper integral, while the exponential beta function has one; some examples are given here. In analysis of gastric emptying, the power exponential generally does not have an improper integral, but the linexp function has one, so the AUC can be calculated.
Once you have fitted a parametric curve to your data, the (the only, real, valid, nice!) AUC can be computed by consulting an integral table or feeding Wolfram Alpha. Note that a spline function cannot be used in this case, because the spline is a spline is a spline and thus flexible enough to bend to the data’s capriciousness, but it does not know where volume will wander in the infinite future.
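As an illustration for the linexp function mentioned above, the sketch below assumes the parametrization v(t) = v0 (1 + kappa t / tempt) exp(-t / tempt); this form, the parameter names and the numbers are assumptions made for the example, and the fitting step itself is not shown. Under that assumption, integrating by parts gives an AUC up to infinity of v0 tempt (1 + kappa).

```python
import math


def linexp(t, v0, tempt, kappa):
    """Assumed linexp form: v0 * (1 + kappa*t/tempt) * exp(-t/tempt)."""
    return v0 * (1.0 + kappa * t / tempt) * math.exp(-t / tempt)


def auc_to_infinity(v0, tempt, kappa):
    """Integral of the assumed linexp curve from 0 to infinity."""
    return v0 * tempt * (1.0 + kappa)


def auc_up_to(t_end, v0, tempt, kappa):
    """Integral of the assumed linexp curve from 0 to t_end."""
    u = t_end / tempt
    return v0 * tempt * (
        (1.0 + kappa) - math.exp(-u) * (1.0 + kappa * (1.0 + u))
    )


# Hypothetical fitted parameters (volume units, minutes, dimensionless).
v0, tempt, kappa = 400.0, 60.0, 0.8

total = auc_to_infinity(v0, tempt, kappa)
for t_end in (60.0, 120.0, 240.0):
    fraction = auc_up_to(t_end, v0, tempt, kappa) / total
    print(f"record of {t_end:.0f} min covers {fraction:.0%} of the total AUC")
```

The fraction printed for each record length shows how much of the real area a truncated record misses, and that the shortfall depends on the fitted parameters of each test unit.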
Too much to remember for the next meeting with your boss (the “we did all with Wilcox” guy)? Read the executive summary:
1: Use a weighted mean for non-equidistant points
One feature of the AUC is its robustness: the form of the curve does not play a role, and a curve with two peaks can have the same AUC as one dropping monotonically, even though the two curves tell different stories. Robustness can be a virtue, but it can also hide interesting effects; or, to tell your boss: you may lose power by reducing curve details to an area.
When you find a good model function for your interesting alternatives, you are back in power.