Understanding False Discovery Rate
The most important calculation you never do
FDR is a very simple concept. It is the number of false discoveries in an experiment divided by the total number of discoveries in that experiment. A “discovery” is a test that passes your acceptance threshold (i.e., you believe the result is real). But there is a problem: you never know how many of your discoveries are actually real or false when you accept them. After all, finding that out is the whole point of doing the experiment. So how do you estimate FDR from your data?
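To make the definition concrete, here is a minimal Python sketch for the artificial case where the truth is known, as in a simulation. The function name and the toy labels are hypothetical and only illustrate the ratio described above.

```python
def false_discovery_rate(is_discovery, is_truly_real):
    """FDR = false discoveries / total discoveries (0.0 if nothing was accepted)."""
    # Keep the ground-truth label of every test that was accepted as a discovery.
    discoveries = [real for accepted, real in zip(is_discovery, is_truly_real) if accepted]
    if not discoveries:
        return 0.0
    false_discoveries = sum(1 for real in discoveries if not real)
    return false_discoveries / len(discoveries)


# Toy labels (made up): 5 tests were accepted, and 1 of those 5 is actually false.
accepted = [True, True, True, True, True, False, False]
truly_real = [True, True, False, True, True, False, True]

fdr = false_discovery_rate(accepted, truly_real)
print(fdr)  # 0.2
```

In a real experiment the `truly_real` labels are exactly what you do not have, which is why FDR has to be estimated.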
Yoav Benjamini and Yosef Hochberg gave us the answer to that in 1995.[1] Their method is marvelously powerful† and surprisingly simple. With it, you calculate a “Q-value”, which is an estimate of FDR, from the P-values in your experiment. The formula for a Q-value is:
q_{i} = p_{i} N / i
where p_{i} is the ith smallest P-value and N is the total number of results in your experiment.
What does this equation mean?
- The numerator ( p_{i} N ) is the expected number of false results if you accept all results that have P-values of p_{i} or smaller. Why? Because p_{i} is the probability of accepting a false result by chance at that threshold, and N is the total number of results in your experiment. So p_{i} times N is the expected number of false results.
- The denominator ( i ) is the number of results you actually accept at the ith P-value threshold. If you accept more results at a particular P-value than would be expected by chance, then some of those should be true positives, and the rest should be false positives.
Thus the Q-value equation is literally the expected false positives based on the P-value, divided by the total number of positives actually accepted at that same P-value.
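As a small worked example of that ratio, the Python sketch below plugs in made-up numbers: 100 tests, with the 5th smallest P-value equal to 0.01. The function name `raw_q` and the numbers are assumptions for illustration only.

```python
def raw_q(p_value, rank, n_tests):
    """Unadjusted Q-value: expected false positives divided by accepted positives."""
    expected_false = p_value * n_tests  # numerator: p_i * N
    accepted = rank                     # denominator: i results pass at this threshold
    return expected_false / accepted


# Hypothetical experiment: N = 100 tests, and the 5th smallest P-value is 0.01.
n_tests = 100
expected_false = 0.01 * n_tests  # about 1 false result expected by chance
q = raw_q(0.01, 5, n_tests)      # about 1 expected false out of 5 accepted

print(round(expected_false, 6), round(q, 6))
```

Here five results pass a threshold at which only about one would be expected to pass by chance, so roughly one in five accepted results (Q of about 0.2) is expected to be false.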
You can use the Q-value much like a P-value. For example, you might choose to accept all results with a Q-value of 0.25 or less. That means you expect that 25% or less of your accepted results will be false.
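A sketch of applying such a cutoff in Python is shown here. The test names and Q-values are invented for illustration and do not come from any real experiment.

```python
Q_THRESHOLD = 0.25  # accept results with a Q-value of 0.25 or less

# Hypothetical Q-values, keyed by made-up test names.
q_by_test = {
    "test_1": 0.03,
    "test_2": 0.12,
    "test_3": 0.24,
    "test_4": 0.25,
    "test_5": 0.40,
    "test_6": 0.81,
}

accepted = [name for name, q in q_by_test.items() if q <= Q_THRESHOLD]

# With a 0.25 cutoff, at most about a quarter of the accepted results are expected to be false.
expected_false_upper = Q_THRESHOLD * len(accepted)

print(accepted)              # ['test_1', 'test_2', 'test_3', 'test_4']
print(expected_false_upper)  # 1.0
```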
Here's how to calculate a Q-value:
1. Rank order the P-values from all of the hypothesis tests in your experiment, from smallest (rank 1) to largest (rank N).
2. Calculate q_{i} = p_{i} N / i for each rank i.
3. Replace q_{i} with the lowest value among q_{i} itself and all the Q-values you calculated for larger P-values (ranks above i).
This last step corrects for the fact that q_{i} is not a monotonic function of the P-value: moving to a lower P-value might actually result in a higher Q-value, which doesn't make any sense. Step 3 is a statistically validated adjustment that ensures Q-values never increase as P-values decrease. It's explained nicely here.[2]
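For readers working outside Excel, the three steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `bh_q_values` and the example P-values are assumptions, and, like the formula above, it does not cap Q-values at 1.

```python
def bh_q_values(p_values):
    """Return Benjamini-Hochberg Q-values in the same order as the input P-values."""
    n = len(p_values)
    # Step 1: rank the P-values; order[k] is the index of the (k + 1)-th smallest.
    order = sorted(range(n), key=lambda idx: p_values[idx])
    q_values = [0.0] * n
    running_min = float("inf")
    # Walk from the largest P-value down to the smallest, so that running_min
    # always holds the lowest raw Q-value seen at this rank or any higher rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        raw_q = p_values[idx] * n / rank       # Step 2: q_i = p_i * N / i
        running_min = min(running_min, raw_q)  # Step 3: enforce monotonicity
        q_values[idx] = running_min
    return q_values


# Made-up P-values chosen so the raw Q-values are not monotonic.
p_example = [0.01, 0.04, 0.03, 0.20]
q_example = bh_q_values(p_example)
print([round(q, 4) for q in q_example])  # [0.04, 0.0533, 0.0533, 0.2]
```

In this example the raw Q-value for P = 0.03 (rank 2) is 0.06, which is higher than the 0.0533 obtained for the larger P = 0.04 (rank 3), so step 3 pulls it down to 0.0533.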
Many statistical packages will calculate FDR for you. If you are an Excel user, we have provided a few lines of VBA code below that you can paste into your spreadsheets as a user-defined function. Or you can open the example spreadsheet we provided with this blog post, which includes the FDR function in the embedded VBA code. This example spreadsheet also provides a simulation of titer data collected for a hypothetical cell line screening experiment, which was described in our related blog post.
Function FDR(Pval As Double, PvalDist, Optional Q As Boolean = True, Optional FDRType As Integer = 1)
'
' ©2017 Riffyn Inc
'
' License: MIT (see below)
'
' IF THIS HELPS YOUR WORK, PLEASE DROP US A THANK YOU ON OUR BLOG AT:
' https://riffyn.com/riffyn-blog/2017/10/29/false-discovery-rate
'
' Calculates the false discovery rate for a P-value using a set of P-values
' calculated from the same null hypothesis.
'
' ARGUMENTS
' Pval: the P-value for which the FDR will be calculated
' PvalDist: Range of cells containing the set of all P-values calculated for the
'           experiment
' Q: Optional. If TRUE, then return the q-value (adjusted FDR to ensure
'    monotonicity). If FALSE return the unadjusted FDR.
' FDRType: Optional. Selects the method used for calculating the FDR

    Const BH As Integer = 1 'Benjamini Hochberg FDR method'

    Dim PvalCount As Long
    Dim PvalRank As Double
    Dim FDRtemp As Double
    Dim FDRDist() As Double
    Dim i As Long

    ' N: number of numeric P-values; rank 1 is the smallest P-value
    PvalCount = WorksheetFunction.Count(PvalDist)
    PvalRank = WorksheetFunction.Rank(Pval, PvalDist, 1)

    Select Case FDRType
        Case BH
            ' Unadjusted FDR: p * N / rank
            FDR = PvalCount * Pval / PvalRank

            If Q Then
                ' Take the lowest FDR found among all larger P-values
                For i = 1 To PvalCount
                    If PvalDist(i) > Pval Then
                        FDRtemp = PvalCount * PvalDist(i) / _
                                  WorksheetFunction.Rank(PvalDist(i), PvalDist, 1)
                        If FDRtemp < FDR Then
                            FDR = FDRtemp
                        End If
                    End If
                Next i
            End If

        Case Else
            FDR = "Unrecognized FDR Type"
            Exit Function

    End Select
End Function

'License: MIT
'Permission is hereby granted, free of charge, to any person obtaining a copy of
'This software and associated documentation files (the "Software"), to deal in the
'Software without restriction, including without limitation the rights to use, copy,
'modify, merge, publish, distribute, sublicense, and/or sell copies of the Software,
'and to permit persons to whom the Software is furnished to do so, subject to the
'following conditions:
'
'The above copyright notice and this permission notice shall be included in all
'copies or substantial portions of the Software.
'
'THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
'INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
'PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
'HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
'OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
'SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Notes
† Although there are even more powerful approaches, they are harder to calculate, and BH gets you far enough for practical purposes in your daily work.
Citations
[1] Benjamini, Y. & Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological). 57(1): 289-300.
[2] Winkler, A.M. (2011 Sep) False Discovery Rate: Corrected & Adjusted P-value. Retrieved from: https://brainder.org/2011/09/05/fdr-corrected-fdr-adjusted-p-values/