BEAST v2 Tutorial
your run will differ, but should not be that far apart (less than 2 SDs, or about 22 log points, 95% of
the time).
To get more accurate estimates, the number of particles can be increased. The expected SD is sqrt(H/N),
where N is the number of particles and H the information. The information H is conveniently estimated
in the nested sampling run as well. To aim for an SD of, say, 2, we need to run again with N particles such
that 2 = sqrt(125/N), which means 4 = 125/N, so N = 125/4 ≈ 31.25, and N = 32 will do. Note that the
computation time of nested sampling is linear in the number of particles, so the run will take about 32 times
longer if we change the particleCount from 1 to 32 in the XML.
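The arithmetic above generalises to any target SD. As a small sketch (plain Python, not part of BEAST; function names are illustrative), the required particle count follows from solving target = sqrt(H/N) for N and rounding up:

```python
import math

def expected_sd(H, N):
    """Expected SD of the marginal likelihood estimate: sqrt(H/N)."""
    return math.sqrt(H / N)

def particles_for_sd(H, target_sd):
    """Smallest particle count N such that sqrt(H/N) <= target_sd,
    i.e. N = H / target_sd**2, rounded up to a whole particle."""
    return math.ceil(H / target_sd**2)

print(particles_for_sd(125, 2))   # H = 125, target SD = 2 -> 32 particles
print(expected_sd(125, 32))       # SD actually expected with 32 particles
```

With H = 125 and a target SD of 2 this reproduces the N = 32 used below.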
A pre-cooked run with 32 particles can be found here: https://github.com/rbouckaert/NS-tutorial/
tree/master/precooked_runs. Download the files HBV-Strict-NS32.log and HBVUCLN-NS32.log and
run the NSLogAnalyser application to analyse the results. To start the NSLogAnalyser from the command line,
use
applauncher NSLogAnalyser -noposterior -N 32 -log /path/to/HBVStrict-NS32.log
applauncher NSLogAnalyser -noposterior -N 32 -log /path/to/HBVUCLN-NS32.log
or from BEAUti, click menu File/Launch apps, select NSLogAnalyser and fill in the form in the GUI. The output
for the strict clock analysis should be something like this:
Loading HBVStrict-NS32.log, burnin 0%, skipping 0 log lines
|---------|---------|---------|---------|---------|---------|---------|---------|
*********************************************************************************
Marginal likelihood: -12426.207750474812 sqrt(H/N)=(1.8913059067381148)=?=SD=(1.8374367294317693)
Information: 114.46521705159945
Max ESS: 400.41214209052896
Calculating statistics

|---------|---------|---------|---------|---------|---------|---------|---------|
*********************************************************************************
#Particles = 32
item               mean        stddev
posterior          -12512.7    2.988107
likelihood         -12311.7    2.91633
prior              -201.009    1.580207
treeLikelihood     -12311.7    2.91633
TreeHeight         3443.955    134.1921
kappa              2.679169    0.151243
popSize            2337.454    290.0407
CoalescentConstant -163.191    3.090671
freqParameter.1    0.240033    0.005884
freqParameter.2    0.267856    0.006814
freqParameter.3    0.217084    0.006638
freqParameter.4    0.275027    0.006716
Done!
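As a quick sanity check, the sqrt(H/N) value reported by NSLogAnalyser can be reproduced from the Information and particle count in the output above (plain Python; the numbers are copied from the log analyser output):

```python
import math

H = 114.46521705159945   # Information reported by NSLogAnalyser
N = 32                   # number of particles in this run

# Expected SD of the marginal likelihood estimate, sqrt(H/N),
# which the analyser compares against the SD observed across particles.
print(math.sqrt(H / N))  # ~1.8913, matching the reported sqrt(H/N)
```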
So, that gives us an ML estimate of -12426.2 with an SD of 1.89, slightly better than the 2 we aimed for,
but the information is also a bit lower than we assumed (114 vs 125). Furthermore, there are posterior
estimates of all the entries in the trace log. Nested sampling not only estimates MLs and SDs, but
can also provide a sample from the posterior, which can be useful in cases where MCMC has trouble
converging. But let's not digress too much, and get back to model selection.