S. Sengupta, M. Harris, Y. Zhang, J. D. Owens, Scan primitives for GPU computing, in: Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, GH '07, Eurographics Association, Aire-la-Ville, Switzerland, 2007, pp. 97-106. URL http://dl.acm.org/citation.cfm?id=1280094.1280110.
S. Collange, M. Daumas, D. Defour, Graphic processors to speed-up simulations for the design of high performance solar receptors, in: Application-specific Systems, Architectures and Processors, 2007. ASAP. IEEE International Conf. on, IEEE, 2007, pp. 377-382.
Z. Wei, J. JaJa, Optimization of linked list prefix computations on multithreaded GPUs using CUDA, in: Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on, 2010, pp. 1-8. doi:10.1109/IPDPS.2010.5470455.
K. Hawick, A. Leist, D. Playne, Parallel graph component labelling with GPUs and CUDA, Parallel Computing 36 (12) (2010) 655-678. doi:10.1016/j.parco.2010.07.002. URL http://www.sciencedirect.com/science/article/pii/S0167819110001055.
C. Leiserson, B. M. Maggs, Communication-efficient parallel algorithms for distributed random-access machines, Algorithmica 3 (1988) 53-77.
D. Shirmohammadi, H. Hong, A. Semlyen, G. Luo, A compensation-based power flow method for weakly meshed distribution and transmission networks, Power Systems, IEEE Transactions on 3 (2) (1988) 753-762. doi:10.1109/59.192932.
W. M. Fitch, Toward defining the course of evolution: Minimum change for a specific tree topology, Syst Biol 20 (1971) 406-416.
D. Sankoff, Minimal mutation trees of sequences, SIAM Journal on Applied Mathematics 28 (1975) 35-42.
D. Merrill, M. Garland, A. Grimshaw, Scalable GPU graph traversal, SIGPLAN Not. 47 (8) (2012) 117-128. doi:10.1145/2370036.2145832. URL http://doi.acm.org/10.1145/2370036.2145832.
L. Luo, M. Wong, W.-m. Hwu, An effective GPU implementation of breadth-first search, in: Proceedings of the 47th Design Automation Conference, DAC '10, ACM, New York, NY, USA, 2010, pp. 52-55. doi:10.1145/1837274.1837289. URL http://doi.acm.org/10.1145/1837274.1837289.
M. Hussein, A. Varshney, L. S. Davis, On implementing graph cuts on CUDA, First Workshop on General Purpose Processing on Graphics Processing Units.
P. Harish, P. J. Narayanan, Accelerating large graph algorithms on the GPU using CUDA, in: Proceedings of the 14th international conference on High performance computing, HiPC'07, Springer-Verlag, Berlin, Heidelberg, 2007, pp. 197-208. URL http://dl.acm.org/citation.cfm?id=1782174.1782200.
Y. S. Deng, B. D. Wang, S. Mu, Taming irregular EDA applications on GPUs, in: Proceedings of the 2009 International Conference on Computer-Aided Design, ICCAD '09, ACM, New York, NY, USA, 2009, pp. 539-546. doi:10.1145/1687399.1687501. URL http://doi.acm.org/10.1145/1687399.1687501.
M. M. T. Chakravarty, R. Leshchinskiy, S. Peyton Jones, G. Keller, S. Marlow, Data Parallel Haskell: A status report, in: Proceedings of the 2007 workshop on Declarative aspects of multicore programming, DAMP '07, ACM, New York, NY, USA, 2007, pp. 10-18. doi:10.1145/1248648.1248652. URL http://doi.acm.org/10.1145/1248648.1248652.
Scandal project home page, http://www.cs.cmu.edu/scandal/ (2012).
The Manticore project, http://manticore.cs.uchicago.edu/ (2012).
G. L. Miller, J. H. Reif, Parallel tree contraction and its application, in: 26th Symposium on Foundations of Computer Science, IEEE, Portland, Oregon, 1985, pp. 478-489.
G. E. Blelloch, Prefix sums and their applications, Tech. rep., Synthesis of Parallel Algorithms (1990).
R. E. Tarjan, U. Vishkin, Finding biconnected components and computing tree functions in logarithmic parallel time, in: Proceedings of the 25th Annual Symposium on Foundations of Computer Science, 1984, SFCS '84, IEEE Computer Society, Washington, DC, USA, 1984, pp. 12-20. doi:10.1109/SFCS.1984.715896. URL http://dx.doi.org/10.1109/SFCS.1984.715896.
M. Atallah, U. Vishkin, Finding Euler tours in parallel, J. Comput. Syst. Sci. 29 (3) (1984) 330-337. doi:10.1016/0022-0000(84)90003-5. URL http://dx.doi.org/10.1016/0022-0000(84)90003-5.
M. Harris, S. Sengupta, J. D. Owens, Parallel prefix sum (scan) with CUDA, in: H. Nguyen (Ed.), GPU Gems 3, Addison Wesley, 2007, Ch. 39, pp. 851-876.
M. Harris, M. Garland, GPU Computing Gems Jade Edition, 1st Edition, no. 3, MKP, 2011, Ch. Optimizing Parallel Prefix Operations for the Fermi Architecture.
The University of Florida sparse matrix collection, http://www.cise.ufl.edu/research/sparse/matrices/ (2012).