
Understanding False Discovery Rate

Timothy Gardner

For the “why should I bother?” behind FDR, also see this post:

False Discovery Rate—The Most Important Calculation You Were Never Taught

FDR is a very simple concept. It is the number of false discoveries in an experiment divided by the total number of discoveries in that experiment. A “discovery” is a test that passes your acceptance threshold (i.e., you believe the result is real). But there is a problem: when you accept a result, you never know whether it is actually real or false. After all, that is the whole point of doing the experiment. So how do you estimate FDR from your data?
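If you did know which results were real, as you do in a simulated experiment, the FDR for that one experiment (strictly, its false discovery proportion) would be a simple count. The minimal Python sketch below illustrates the definition only; the function name and its two parallel lists of booleans are hypothetical inputs chosen for illustration.

```python
from typing import Sequence


def false_discovery_proportion(accepted: Sequence[bool], truly_null: Sequence[bool]) -> float:
    """Fraction of accepted results ("discoveries") that are actually false.

    Only computable when the ground truth is known, e.g. in a simulation.
    Returns 0.0 when nothing was accepted (no discoveries, so no false ones).
    """
    if len(accepted) != len(truly_null):
        raise ValueError("accepted and truly_null must have the same length")
    discoveries = sum(1 for a in accepted if a)
    false_discoveries = sum(1 for a, null in zip(accepted, truly_null) if a and null)
    return false_discoveries / discoveries if discoveries else 0.0


# Hypothetical example: 4 discoveries, 1 of which has no real effect
accepted = [True, True, True, True, False, False]
truly_null = [False, False, False, True, True, True]
print(false_discovery_proportion(accepted, truly_null))  # 0.25
```

Returning 0 when nothing is accepted follows the convention Benjamini and Hochberg use in their definition.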

Yoav Benjamini and Yosef Hochberg answered that question in 1995.[1] Their method is marvelously powerful and surprisingly simple. With it, you calculate a “Q-value”, an estimate of the FDR, from the P-values in your experiment. The formula for a Q-value is:

$$q_i = \frac{p_i N}{i}$$

where $p_i$ is the $i$th smallest P-value out of $N$ total P-values for the experiment. (You calculate one P-value for each sample or test in your experiment.)

What does this equation mean?

  • The numerator ($p_i N$) is the expected number of false results if you accept all results with P-values of $p_i$ or smaller. Why? Because $p_i$ is the probability that a result with no real effect passes that threshold by chance, and $N$ is the total number of results in your experiment. So $p_i$ times $N$ is the expected number of false results (strictly, an upper bound, because it treats every test as if it had no real effect).
  • The denominator ($i$) is the number of results you actually accept at the $i$th P-value threshold. If you accept more results at a particular P-value than would be expected by chance, then some of those should be true positives, and the rest should be false positives.

Thus the Q-value equation is literally the expected number of false positives at a given P-value threshold, divided by the total number of positives actually accepted at that same threshold.
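A worked example with hypothetical numbers (not data from any real screen): suppose you ran $N = 1000$ tests and the 20th smallest P-value is $p_{20} = 0.004$. Accepting every result at or below that threshold gives

$$q_{20} = \frac{p_{20} N}{20} = \frac{0.004 \times 1000}{20} = \frac{4}{20} = 0.2$$

About 4 of the 20 accepted results are expected to be false by chance, so roughly 20% of those discoveries are expected to be false.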

You can use the Q-value much like a P-value. For example, you might choose to accept all results with a Q-value of 0.25 or less. That means you expect that 25% or less of your accepted results will be false.
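Accepting every result whose Q-value is at or below a cutoff α (0.25 in the example above) selects the same results as the step-up rule in Benjamini and Hochberg's paper[1]: find the largest rank $i$ with $p_i \le i\alpha/N$, then accept every result up to and including that rank. A minimal Python sketch, where `bh_accept` and its arguments are hypothetical names and the P-values are made up for illustration:

```python
from typing import List, Sequence


def bh_accept(pvals: Sequence[float], alpha: float = 0.25) -> List[int]:
    """Indices (into pvals) of results accepted by the Benjamini-Hochberg step-up rule.

    Equivalent to accepting every result whose monotone-adjusted Q-value is <= alpha.
    """
    n = len(pvals)
    # Indices ordered from smallest to largest P-value (rank 1 .. n)
    order = sorted(range(n), key=lambda k: pvals[k])
    # Largest rank i with p_(i) <= i * alpha / n; 0 means nothing is accepted
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / n:
            cutoff_rank = rank
    # Accept every result at or below the cutoff rank, reported in input order
    return sorted(order[:cutoff_rank])


pvals = [0.11, 0.01, 0.60, 0.09, 0.45, 0.80]  # hypothetical P-values from 6 tests
print(bh_accept(pvals))  # [0, 1, 3]
```

Here 0.09 (rank 2) misses its own threshold of 2 × 0.25 / 6 ≈ 0.083, yet it is still accepted because 0.11 at rank 3 passes its threshold of 0.125. That step-up behavior is what step 3 of the Q-value calculation captures.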

Here's how to calculate a Q-value:

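  1. Rank the P-values from all of the hypothesis tests in your experiment, from smallest (rank 1) to largest (rank N).
  2. Calculate $q_i = p_i N / i$ for each rank $i$.
  3. Replace each $q_i$ with the smallest Q-value at its own rank or any higher rank (that is, among all results with equal or larger P-values).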

This last step corrects for the fact that $q_i$ is not a monotonic function of the P-value: moving to a smaller P-value can actually produce a larger Q-value, which doesn't make sense. Step 3 is a standard adjustment that guarantees Q-values never increase as P-values decrease. It's explained nicely here.[2]
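Steps 1 to 3 can be sketched in Python as below. This is a minimal illustration, not Riffyn's code, and `bh_qvalues` is a hypothetical name. Walking from the largest P-value down while keeping a running minimum applies step 3 in a single pass. In the example, the raw ratio for the second-smallest P-value (0.04 × 4 / 2 = 0.08) is larger than the one for the third-smallest (0.045 × 4 / 3 = 0.06), which is exactly the non-monotonicity that step 3 removes.

```python
from typing import List, Sequence


def bh_qvalues(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg Q-values, returned in the same order as pvals."""
    n = len(pvals)
    # Step 1: indices ordered from smallest to largest P-value (rank 1 .. n).
    order = sorted(range(n), key=lambda k: pvals[k])
    qvals = [0.0] * n
    running_min = float("inf")
    # Steps 2 and 3: walk from the largest P-value (rank n) down to the smallest,
    # computing q_i = p_i * n / i and keeping the smallest value seen so far,
    # so each Q-value is the minimum over its own rank and all higher ranks.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        raw_q = pvals[idx] * n / rank
        running_min = min(running_min, raw_q)
        qvals[idx] = running_min
    return qvals


pvals = [0.2, 0.01, 0.045, 0.04]  # hypothetical P-values, in original test order
print([round(q, 4) for q in bh_qvalues(pvals)])  # [0.2, 0.04, 0.06, 0.06]
```

Because the walk starts at the largest P-value, whose ratio is simply that P-value ($p_N N / N = p_N$), no Q-value can exceed 1. Tied P-values also end up with the same Q-value.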

Many statistical packages will calculate FDR for you, and if you are an Excel user, we have provided here a few lines of VBA code that you can paste into your spreadsheets as a user-defined function. Or you can open the example spreadsheet we provided with this blog post, which includes the FDR function in the embedded VBA code. This example spreadsheet also provides a simulation of titer data collected for a hypothetical cell line screening experiment which was described in our related blog post.

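Function FDR(Pval As Double, PvalDist As Range, Optional Q As Boolean = True, _
    Optional FDRType As Integer = 1) As Variant

' ©2017 Riffyn Inc
' License: MIT (see below)
' IF THIS HELPS YOUR WORK, PLEASE DROP US A THANK YOU ON OUR BLOG AT:
' https://riffyn.com/riffyn-blog...
'
' Calculates the false discovery rate for a P-value using a set of P-values
' calculated from the same null hypothesis.
'
' ARGUMENTS
' Pval: the P-value for which the FDR will be calculated
' PvalDist: Range of cells containing the set of all P-values calculated for the
'   experiment
' Q: Optional. If TRUE, then return the q-value (adjusted FDR to ensure
'   monotonicity). If FALSE return the unadjusted FDR.
' FDRType: Optional. Selects the method used for calculating the FDR
'   (1 = Benjamini-Hochberg, currently the only method)

    Const BH As Integer = 1 ' Benjamini-Hochberg FDR method

    Dim PvalCount As Long   ' N: number of numeric P-values in PvalDist
    Dim FDRtemp As Double
    Dim cell As Range

    PvalCount = WorksheetFunction.Count(PvalDist)

    Select Case FDRType
    Case BH
        ' Expected false positives (Pval * N) divided by the number of results
        ' accepted at this threshold (all P-values <= Pval, so ties are counted)
        FDR = PvalCount * Pval / CountAtOrBelow(Pval, PvalDist)

        If Q Then
            ' Monotonicity adjustment: keep the smallest ratio found at any
            ' larger P-value
            For Each cell In PvalDist.Cells
                If VarType(cell.Value) = vbDouble Then
                    If cell.Value > Pval Then
                        FDRtemp = PvalCount * cell.Value / _
                            CountAtOrBelow(cell.Value, PvalDist)
                        If FDRtemp < FDR Then FDR = FDRtemp
                    End If
                End If
            Next cell
        End If

    Case Else
        FDR = "Unrecognized FDR Type"
    End Select

End Function

' Counts the numeric cells in PvalDist whose value is <= Threshold.
Private Function CountAtOrBelow(ByVal Threshold As Double, PvalDist As Range) As Long
    Dim cell As Range
    For Each cell In PvalDist.Cells
        If VarType(cell.Value) = vbDouble Then
            If cell.Value <= Threshold Then
                CountAtOrBelow = CountAtOrBelow + 1
            End If
        End If
    Next cell
End Function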

'License: MIT
'Permission is hereby granted, free of charge, to any person obtaining a copy of
'This software and associated documentation files (the "Software"), to deal in the
'Software without restriction, including without limitation the rights to use, copy,
'modify, merge, publish, distribute, sublicense, and/or sell copies of the Software,
'and to permit persons to whom the Software is furnished to do so, subject to the
'following conditions:
'
'The above copyright notice and this permission notice shall be included in all
'copies or substantial portions of the Software.
'
'THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
'INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
'PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
'HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
'OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
'SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
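For readers working outside Excel, here is a minimal Python sketch of the same per-value calculation: one P-value in, one FDR or Q-value out. It is our own translation, not Riffyn's code; `fdr`, `pval_dist` and `q` are hypothetical names mirroring the spreadsheet arguments, and only the Benjamini-Hochberg method is implemented. The denominator counts every P-value at or below the threshold, matching the description of $i$ above, so tied P-values share one result.

```python
from typing import Sequence


def fdr(pval: float, pval_dist: Sequence[float], q: bool = True) -> float:
    """Benjamini-Hochberg FDR (q=False) or Q-value (q=True) for a single P-value.

    pval should be one of the P-values in pval_dist, which holds every P-value
    from the experiment. Only the Benjamini-Hochberg method is implemented.
    """
    n = len(pval_dist)

    def ratio(threshold: float) -> float:
        # Expected false positives (threshold * n) divided by the number of
        # results accepted at this threshold; tied P-values are all counted.
        accepted = sum(1 for p in pval_dist if p <= threshold)
        if accepted == 0:
            raise ValueError("pval is smaller than every value in pval_dist")
        return threshold * n / accepted

    result = ratio(pval)
    if q:
        # Monotonicity adjustment: keep the smallest ratio at any larger P-value.
        for p in pval_dist:
            if p > pval:
                result = min(result, ratio(p))
    return result


pvals = [0.2, 0.01, 0.045, 0.04]  # hypothetical P-values
print([round(fdr(p, pvals), 4) for p in pvals])           # [0.2, 0.04, 0.06, 0.06]
print([round(fdr(p, pvals, q=False), 4) for p in pvals])  # [0.2, 0.04, 0.06, 0.08]
```

Like a spreadsheet formula filled down a column, this recomputes the counts for every cell, so a full column costs on the order of N² work per cell. For large experiments, sorting once and taking a running minimum from the largest P-value down is much faster.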


Notes

† Although there are even more powerful approaches, they are harder to calculate, and BH gets you far enough for practical purposes in your daily work.

Citations

[1] Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1): 289-300.

[2] Winkler, A.M. (2011, Sep 5). False Discovery Rate: Corrected & Adjusted P-value. Retrieved from https://brainder.org/2011/09/05/fdr-corrected-fdr-adjusted-p-values/


Timothy Gardner

Tim Gardner is the Founder and the CEO of Riffyn. He was previously Vice President of Research & Development at Amyris, where he led the engineering of yeast strain and process technology for large-scale bio-manufacturing of renewable chemicals. Tim has been recognized for his pioneering work in Synthetic Biology by Scientific American, the New Scientist, Nature, Technology Review, and the New York Times. He also served as an advisor to the European Union Scientific Committees and the Boston University Engineering Alumni Advisory Board. Tim enjoys hockey, running, mountain biking, and being beaten by his sons in almost everything.