Fast algorithm to calculate Pi in parallel -
i starting learn cuda , think calculating long digits of pi nice, introductory project.
i have implemented simple monte carlo method parallelize-able. have each thread randomly generate points on unit square, figure out how many lie within unit circle, , tally results using reduction operation.
but not fastest algorithm calculating constant. before, when did exercise on single threaded cpu, used machin-like formulae calculation far faster convergence. interested, involves expressing pi sum of arctangents , using taylor series evaluate expression.
an example of such formula:

unfortunately, found parallelizing technique thousands of gpu threads not easy. problem majority of operations doing high precision math opposed doing floating point operations on long vectors of data.
so i'm wondering, what efficient way calculate arbitrarily long digits of pi on gpu?
you should use bailey–borwein–plouffe formula
why? first of all, need algorithm can broken down. so, first thing came mind having representation of pi infinite sum. then, each processor computes 1 term, , sum them in end.
then, preferable each processor manipulates small-precision values, opposed high precision ones. example, if want 1 billion decimals, , use of expressions used here, chudnovsky algorithm, each of processor need manipulate billion long number. that's not appropriate method gpu.
so, in all, bbp formula allow compute digits of pi separately (the algorithm cool), , "low precision" processors! read "bbp digit-extraction algorithm π"
advantages of bbp algorithm computing π algorithm computes π without requiring custom data types having thousands or millions of digits. method calculates nth digit without calculating first n − 1 digits, , can use small, efficient data types. algorithm fastest way compute nth digit (or few digits in neighborhood of nth), π-computing algorithms using large data types remain faster when goal compute digits 1 n.
Comments
Post a Comment