BOINC - enhancing research workloads for the benefit of mankind & humanity - Computer Optimisation - CPU, GPU & RAM - PC, Mac & ARM development
HPC - High Performance Computation for beneficial goals and obvious worth.
(Guide, experimentation, developer kits and manuals)
Observing the workloads of many beneficial projects, we find that the workload data set is commonly small.
Besides the memory set being smaller or larger than a machine can compute optimally, we find that feature sets such as FMA and AVX have commonly not been implemented.
Some projects, such as Asteroids@home and SETI@home, do use enhanced instruction sets such as AVX, and memory loads that benefit from the 4GB or more of RAM available on decent gaming machines and home laptops.
Not all modern machines have plenty of RAM; however, research and university establishments use machines powerful enough to shine on the BOINC record with a 256MB to 768MB workload.
In addition, these machines are commonly Opteron or Xeon based, and servers may have SPARC or PowerPC specific hardware and instruction sets.
The examples below show workloads that include small data arrays, in the 40MB to 79MB range.
Servers and gaming rigs typically have around 1GB of RAM per core; of course not every problem requires a larger array in the workload, and some machines have only 256MB per core!
However much RAM you allocate to the projected workload, small memory loads can and will be sufficient for tasks that swap or page their data (like DNA replicators).
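As a rough illustration of the RAM-per-core reasoning above, here is a minimal Python sketch. The function names are hypothetical and the sizes are the ones quoted in the text; it assumes memory is shared out evenly and ignores what the operating system itself needs.

```python
def ram_per_core_mb(total_ram_mb: int, cores: int) -> float:
    """RAM available to each core if memory is shared out evenly."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_ram_mb / cores


def tasks_that_fit(total_ram_mb: int, cores: int, workload_mb: int) -> int:
    """How many copies of a workload can run at once, capped at one per core."""
    if workload_mb <= 0:
        raise ValueError("workload_mb must be positive")
    return min(cores, total_ram_mb // workload_mb)


if __name__ == "__main__":
    # A 4-core machine with 4GB of RAM has 1GB per core.
    print(ram_per_core_mb(4096, 4))
    # A 79MB task fits on every core of that machine.
    print(tasks_that_fit(4096, 4, 79))
    # With only 256MB per core (1GB total), a 768MB task runs one at a time.
    print(tasks_that_fit(1024, 4, 768))
```

A project could use a check of this kind to decide whether to send a small or a large work unit to a given host.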
Some tasks benefit significantly from larger thread and data models; to my mind DNA and mapping data are fine examples of specific workloads where memory counts.
In addition, thread count can be 4 or another number, and I suggest that a single task can use more than one core and instruction set (NEON for example, or simultaneous multithreading of the FPU, SMT).
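The idea of one task using several workers can be sketched as follows. This is only an illustration of the partitioning: `sum_of_squares` is a hypothetical stand-in for a real science kernel, and in CPython threads share one interpreter lock, so a CPU-bound kernel would need a process pool or native code to actually occupy several cores.

```python
from concurrent.futures import ThreadPoolExecutor


def split_chunks(data, parts):
    """Split data into `parts` nearly equal contiguous slices."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Hypothetical stand-in for the real per-chunk kernel."""
    return sum(x * x for x in chunk)


def run_task(data, threads=4):
    """Process one task's data on several workers and combine the results."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(sum_of_squares, split_chunks(data, threads)))
```

The thread count is a parameter, so the same task can be run with 4 workers or any other number the host supports.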
Workload-specific optimisation, or rather generic optimisation with SSE, AVX, FPU threading and precision tuning, would be very welcome in the application that runs the workload.
In particular, the multi-core Ryzen is a new and exciting product,
so take care to read the guides in the lower half of the document; AVX2, RDSEED, ADX and additional encryption features are some of the most exciting changes in the AMD Ryzen architecture.
Further thought: efficiency
Add a MHz / Dhrystone / MIPS performance-per-watt figure to each system;
projects can then further optimise workloads to improve energy & environmental efficiency versus work carried out.
(Work hours x MHz) / (efficiency per watt)
-------
Hours / (% of projects finished with work completed)
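Read literally, the fraction above could be computed as in the sketch below. All function and parameter names are hypothetical, and the wattage in the example is made up. The text does not say whether a higher or lower score is better, so the code only reproduces the arithmetic.

```python
def perf_per_watt(mips: float, watts: float) -> float:
    """Performance per watt for one system, e.g. MIPS divided by power draw."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return mips / watts


def efficiency_score(work_hours: float, mhz: float, per_watt: float,
                     hours: float, percent_completed: float) -> float:
    """(work hours x MHz / efficiency per watt) / (hours / % completed)."""
    if per_watt <= 0 or hours <= 0 or percent_completed <= 0:
        raise ValueError("per_watt, hours and percent_completed must be positive")
    numerator = work_hours * mhz / per_watt
    denominator = hours / percent_completed
    return numerator / denominator


if __name__ == "__main__":
    # 11000 MIPS per core at an assumed 25 W per core.
    print(perf_per_watt(11000, 25))
    # 10 work hours at 3000 MHz, 50 units per watt, 20 hours, 80% completed.
    print(efficiency_score(10, 3000, 50, 20, 80))
```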
Also bear in mind that GPUs need watt efficiency and task management to optimise power used versus work done.
Worker priority should always be:
efficiency + merit of the work
--------
time / % necessity
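A scheduler could apply the priority fraction above along these lines. This is a minimal sketch: the `Task` fields, their units and the sample values are assumptions made for illustration, not part of BOINC.

```python
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    efficiency: float         # hypothetical efficiency rating
    merit: float              # hypothetical merit rating of the work
    hours: float              # expected run time
    percent_necessity: float  # 0-100, how necessary the result is


def priority(task: Task) -> float:
    """(efficiency + merit) / (time / % necessity), as in the text."""
    if task.hours <= 0 or task.percent_necessity <= 0:
        raise ValueError("hours and percent_necessity must be positive")
    return (task.efficiency + task.merit) / (task.hours / task.percent_necessity)


def order_by_priority(tasks):
    """Highest priority first."""
    return sorted(tasks, key=priority, reverse=True)
```

With this reading, a short and highly necessary task outranks a long, less necessary one of similar merit.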
Please examine the issue further.
Rupert S
https://www.worldcommunitygrid.org
https://boinc.berkeley.edu/
http://esa-space.blogspot.com/
HPC computing workload photos: http://bit.ly/HPCImpact
http://bit.ly/HPC-Dev
http://bit.ly/tRNG-Dev
http://esa-space.blogspot.ru/2017/04/rng-and-random-web.html - we need chaos seeds: random seeds for our work
HPC Best Practices
http://www.intertwine-project.eu/best-practice-guides
AMD Platform Optimisation - recommended reading for all developers
https://community.amd.com/thread/213045 - particular instruction differences for microcode optimisation
http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/03/GDC2017-Optimizing-For-AMD-Ryzen.pdf - code optimisation: a few very important lessons that may seem simple to some but should not be taken for granted.
http://support.amd.com/TechDocs/24593.pdf - AMD64 Architecture Programmer’s Manual Volume 2: System Programming
CPU Optimisation - utility and function
http://www.agner.org/optimize/ - code optimisation for all programmers on x86, x86-64 and some others.
http://www.agner.org
For example, processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx sse4a osvw xop wdt fma4 topx page1gb rdtscp bmi1
11000 MIPS & 2700 FPU MIPS - per core
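A feature line like the one above can drive the choice of code path. The sketch below parses the flags and picks the best matching kernel variant; the variant names and their required flags are hypothetical, and only the feature string comes from the text.

```python
# Feature line as reported for the example processor above.
FEATURES = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes "
    "f16c syscall nx lm avx sse4a osvw xop wdt fma4 topx page1gb rdtscp bmi1"
)

# Hypothetical kernel variants, best first, with the flags each one needs.
VARIANTS = [
    ("avx2_fma", {"avx2", "fma"}),
    ("avx", {"avx"}),
    ("sse2", {"sse2"}),
    ("generic", set()),
]


def parse_features(line: str) -> set:
    """Turn a whitespace-separated feature line into a set of flags."""
    return set(line.lower().split())


def pick_variant(features: set) -> str:
    """Return the first variant whose required flags are all present."""
    for name, needed in VARIANTS:
        if needed <= features:
            return name
    return "generic"


if __name__ == "__main__":
    # This processor reports avx and fma but not avx2.
    print(pick_variant(parse_features(FEATURES)))
```

Ordering the variants best first means a host without AVX2 falls through to the AVX or SSE2 build rather than failing.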
**
Compilers and Make
https://cmake.org/
http://llvm.org/
http://llvm.org/docs/FAQ.html
https://gcc.gnu.org/
**
An article that took some deep learning itself; anyway, very interesting.
HIP C++ will, we think, be simpler than OpenCL as a higher level code port,
with CUDA code machine-converted to 99.6%:
http://www.anandtech.com/show/10831/amd-sc16-rocm-13-released-boltzmann-realized
**
PC/Mac/Windows/Linux/Android
https://www.khronos.org/news/events/2016-isc-high-performance
https://www.khronos.org/assets/uploads/developers/library/2008_siggraph_bof_opengl/OpenCL%20and%20OpenGL%20SIGGRAPH%20BOF%20Aug08.pdf - HPC Report
https://www.microsoft.com/en-us/download/details.aspx?id=54507 - Microsoft HPC Pack 2016, including Linux
https://technet.microsoft.com/en-us/library/cc514029(v=ws.11).aspx - all HPC Packs, 2016 and 2012 to 2008, info and download
https://msdn.microsoft.com/en-us/library/ff976568.aspx - Microsoft High Performance Computing for Developers, info and downloads
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/hpcpack-cluster-active-directory - information and virtualisation
**
OpenVX for High Performance Computing: multi-platform spec
"OpenVX for HPC Neural Nets and processing: a new way to deliver on research, gaming & processing of data and images"
https://www.khronos.org/news/tags/tag/OpenVX
https://www.khronos.org/news/press/openvx-1.2-specification-cross-platform-acceleration-power-efficient-vision
**
OpenCL "GPU Development" links
https://www.khronos.org/blog/iwocl-where-you-learn-the-latest-on-opencl
https://www.khronos.org/opencl/
https://www.khronos.org/opencl/resources for SDK, learning & optimisation resources.
http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/
https://github.com/RadeonOpenCompute - ROCm: Platform for GPU Enabled HPC and UltraScale Computing
http://gpuopen.com/professional-compute/
http://gpuopen.com/compute-product/hcrng/
https://bitbucket.org/multicoreware/hcrng
http://gpuopen.com/compute-product/clrng/
Installing the AMD SDK can improve compute performance; optimise your code!
https://streamhpc.com/blog/2017-05-21/amd-open-sourced-rocms-opencl-driver-stack/
https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/blob/amd-master/README.md
http://developer.amd.com/tools-and-sdks/opencl-zone/
http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/
http://gpuopen.com/games-cgi/
http://developer.amd.com/tools-and-sdks/graphics-development/
http://hgpu.org - information, interesting learning material & source
http://dspace.princeton.edu/jspui/bitstream/88435/dsp01wm117r22g/1/Jia_princeton_0181D_11168.pdf - optimisation for parallel computing information.
https://arxiv.org/pdf/1705.05249 - CLBlast: A Tuned OpenCL BLAS Library demonstration.
HIP - HSA - the CUDA Compatible C++ for Heterogeneous Computing
http://developer.amd.com/wordpress/media/2012/09/7637-HIP-Datasheet-V1_4-US-Letter.pdf
http://developer.amd.com/wordpress/media/2012/10/hsa10.pdf - a full guide
http://www.hsafoundation.com/
http://www.hsafoundation.com/hsa-developer-tools/
https://github.com/HSAFoundation/HSA-docs-AMD/wiki#initial-implementation
https://github.com/HSAFoundation/HSAIL-Tools
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver - Driver for kernel
http://www.amd.com/Documents/SDN-Whitepaper.pdf - Smart Software Defined Networks
http://support.amd.com/TechDocs/55766_SEV-KM%20API_Spec.pdf - Secure Encrypted Virtualization Key Management
http://support.amd.com/TechDocs/Protecting%20VM%20Register%20State%20with%20SEV-ES.pdf - PROTECTING VM REGISTER STATE WITH SEV-ES
http://support.amd.com/TechDocs/50742_15h_Models_60h-6Fh_BKDG.pdf - bios and kernel drivers
**
ARM Development software/SDKs & tools - HPC
https://developer.arm.com/products/software-development-tools
https://developer.arm.com/products/software-development-tools/hpc - for high performance computing (ideal for BOINC)
https://developer.arm.com/products/software-development-tools/compilers - for both HPC and app development.
https://developer.arm.com/products/system-design/fixed-virtual-platforms
https://www.synopsys.com/verification/virtual-prototyping/vdk/vdk-for-arm.html
https://www.synopsys.com/designware-ip/technical-bulletin/designware-hybrid-ip.html
**
IoT links (Internet of Things)
https://www.infoq.com/articles/thread-protocol-for-home-automation
http://wso2.com/wso2_resources/wso2_whitepaper_a-reference-architecture-for-the-internet-of-things.pdf
**
Linux arch reference material
https://www.ibm.com/developerworks/library/l-linuxuniversal/
**
Agency GPL
https://code.nasa.gov/
**
Workers :
https://www.upwork.com/hire/driver-development-freelancers/
http://www.wcgsig.com/342585.gif
Update 2:
For a comparison of GFLOPS/MIPS throughput of various BOINC tasks:
this shows the relevance of the code or function used. AVX, for example, is data-parallel (SIMD), processing several values per instruction, and the FPU pipeline of the AMD FX & Ryzen processors is shared between threads.
http://bit.ly/HPCImpact (original non-edited photos)
and set 2 (newer) http://bit.ly/2HPCImpact
See the work throughput in GFLOPS compared to code efficiency per task!
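One common way to put a number on throughput versus code efficiency is to compare achieved GFLOPS with the theoretical peak of the host. The sketch below assumes the usual peak estimate of cores x clock x floating point operations per cycle. The example figures, including 8 operations per cycle, are illustrative assumptions and not measurements from the photos.

```python
def peak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: cores x clock (GHz) x floating point ops per cycle."""
    return cores * ghz * flops_per_cycle


def achieved_gflops(flop_count: float, seconds: float) -> float:
    """Measured throughput for a task that performed flop_count operations."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return flop_count / seconds / 1e9


def code_efficiency(achieved: float, peak: float) -> float:
    """Fraction of the theoretical peak that the task actually reached."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return achieved / peak


if __name__ == "__main__":
    peak = peak_gflops(cores=4, ghz=3.0, flops_per_cycle=8)
    got = achieved_gflops(flop_count=2.4e11, seconds=10.0)
    print(peak, got, code_efficiency(got, peak))
```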
Sometimes entropy is needed to fulfil the task, one would imagine (for example on Android): http://bit.ly/tRNG-Dev
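Where a task needs a random seed, one simple approach is to draw it from the operating system's entropy source and record it so the run can be repeated. A minimal sketch using only the Python standard library follows; the function names are hypothetical. Note that `random.Random` is a simulation-grade generator and not suitable for cryptography.

```python
import os
import random


def fresh_seed(num_bytes: int = 16) -> int:
    """Draw a seed from the operating system's entropy source."""
    return int.from_bytes(os.urandom(num_bytes), "big")


def seeded_generator(seed=None):
    """Return (generator, seed); keep the seed so the run can be reproduced."""
    if seed is None:
        seed = fresh_seed()
    return random.Random(seed), seed


if __name__ == "__main__":
    rng, seed = seeded_generator()
    print("seed used:", seed)
    print("first draw:", rng.random())
```

Storing the seed with the result lets a second host rerun the same work unit and check the output.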
The improvement of the BOINC and World Community Grid projects has been observed and noted, and one feels it can be improved upon further.
Further improvement should be implemented as soon as possible, to improve work versus output efficiency.
Thank you kindly, programmers, workers & scientists, for your perseverance & effort.
RS
http://bit.ly/BoincStudies - Result Studies
https://browser.geekbench.com/v4/compute/743093 GPU Function
https://browser.geekbench.com/v4/cpu/2831836 CPU Function