While there’s something perversely beautiful about a press release that’s aimed way over the heads of the reporters who are likely to get it, please remember that the generally accepted protocol is to at least hint at what you’re talking about in plain English. That way the clueless journo who receives it can work out whether he or she knows anyone with the knowledge to decipher it, and forward it along. Opaque releases get dumped.
Happy Holidays! Thought I’d update you on LexisNexis Big Data as we roll out new use cases in the upcoming year!
HPCC and Hadoop are both open source projects released under an Apache 2.0 license, are free to use, with both leveraging commodity hardware and local storage interconnected through IP networks. Both allow for parallel data processing and/or querying across architecture. While this doesn’t necessarily mean that certain HPCC operations don’t use a scatter and gather model (equivalent to Map and Reduce), but HPCC was designed under a different paradigm and provides a comprehensive and consistent high-level and concise declarative dataflow oriented programming model.
One limitation of the strict MapReduce model is that internode communication is left to the Shuffle phase. This makes iterative algorithms that require frequent internode data exchange hard to code and slow to execute (as they need to go through multiple phases of Map, Shuffle and Reduce, each representing a barrier operation that forces serialization of the long tails of execution). HPCC provides for direct inter-node communication at all times and is leveraged by many of the high level ECL primitives.
Another disadvantage for Hadoop is the use of Java for the entire platform, including the HDFS distributed filesystem — adding overhead from the JVM; in contrast, HPCC and ECL are compiled into C++, which executes natively on top of the OS. This leads to more predictable latencies and overall faster execution — we have seen anywhere between 3 & 10 X faster execution on HPCC when compared to Hadoop on the exact same hardware.
Would love to explain more — any chance to set up a meeting or call on this?
When I was a tech magazine editor, my general rule was to pitch 10% of the stories in each issue over the heads of most of the audience. I wanted to give readers something to shoot for, and to show them what was beyond the horizons of their knowledge.
But I do not think this is a good guideline for press releases.
Hat tip: Pat Houston.