New Wrangler Supercomputer Tackles Big Data for Scientists

Handling enormous volumes of data is a massive task even for supercomputers, which is why researchers are constantly developing machines that can process more data at higher speeds. Now, scientists at the Texas Advanced Computing Center (TACC) have brought online a new kind of supercomputer that could be a huge boon when it comes to wrangling data.

"When you're in the world of data, there are rocks and bumps in the way, and a lot of things that you have to take care of," former Hubble Space Telescope scientist Niall Gaffney said.

The new supercomputer is called Wrangler. It specializes in big data problems, such as analyses in which thousands of files must be opened, examined, and cross-related quickly. It was also designed to work closely with TACC's Stampede supercomputer, which is the 10th most powerful supercomputer in the world.

"We kept a lot of what was good with systems like Stampede, but added new things to it like a very large flash storage system, a very large distributed spinning disc storage system, and high speed network access. This allows people who have data problems that weren't being fulfilled by systems like Stampede and Lonestar to be able to do those in ways they never could before," Gaffney said.

Wrangler itself includes 600 terabytes of flash memory, shared via a PCIe interconnect across Wrangler's 3,000 Haswell compute cores. Every part of the system can access the same storage and work in parallel on data held in the high-speed storage system.
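That shared-storage design is aimed at exactly the workload described above: many cores scanning one filesystem at once. As a rough illustration only (the directory path, worker count, and per-file metric below are hypothetical, not part of Wrangler's actual software stack), a data-parallel file scan of that kind might be sketched in Python like this:

```python
"""Minimal sketch of a data-parallel file-analysis job of the kind
Wrangler targets: thousands of files opened, examined, and
cross-related at once. The path and the per-file "analysis" are
stand-ins; only the pattern (many workers sharing one fast
filesystem) reflects the article."""
from multiprocessing import Pool
from pathlib import Path

DATA_DIR = Path("/flash/shared/dataset")  # hypothetical shared flash mount

def examine(path):
    """Open one file and extract a simple per-file summary."""
    with open(path, "rb") as f:
        data = f.read()
    return path.name, len(data)  # stand-in for a real analysis step

if __name__ == "__main__":
    files = sorted(DATA_DIR.glob("*.dat"))
    # Every worker sees the same storage, so the file list can be
    # partitioned freely across cores.
    with Pool(processes=48) as pool:
        results = pool.map(examine, files)
    # Cross-relate the per-file results in a single reduction step.
    total = sum(size for _, size in results)
    print(f"examined {len(results)} files, {total} bytes in total")
```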

Scientists are already hoping to take advantage of Wrangler to tame big data. Computer scientist Joshua New, principal investigator of the Autotune project, wants to use Wrangler in his studies.

"Wrangler has enough horsepower that we can run some very large studies and get meaningful results in a single run," New said. "I think Wrangler fills a specific niche for us in that we're turning our analysis into an end-to-end workflow, where we define what parameters we want to vary. It creates the sampling matrix. It creates the input files. It does the computationally challenging task of running all the simulations in parallel. It creates the output. Then we run our artificial intelligence and statistic techniques to analyze that big data on the back end. Doing that from beginning to end as a solid workflow on Wrangler is something that we're very excited about."

The new supercomputer could be huge for future studies and is already becoming part of the workflow in several different projects.
