10 articles in this selection
| 2009/07/28 Aster Data Systems
Aster Data Systems offers a database system for data warehousing - the first DBMS to tightly integrate SQL with MapReduce - providing insights on data analyzed on clusters of low-cost commodity hardware.
| 2009/07/27 Distributed Data - Hadoop, Hbase and Hive
Google released whitepapers on its distributed filesystem, called GFS, and a parallel computing architecture called MapReduce. Since then, a number of open-source projects have started that implement these ideas. Of particular note is the Apache project Hadoop, which implements a distributed filesystem and the MapReduce framework in Java. So I decided to build a cluster of 10 nodes and try it out....
| 2009/07/27 Wikipedia on MapReduce
MapReduce is a software framework introduced by Google to support distributed computing on large data sets on clusters of computers. The framework is inspired by map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework is not the same as their original forms. MapReduce libraries have been written in C++, C#, Java, Python, F# and other programming languages....
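To make the map/reduce idea above concrete, here is a minimal single-process sketch of a word count in Python. It is an illustration only, not Google's or Hadoop's implementation: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key (a framework does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Fold all values for one key into a single total."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of strings."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = word_count(["the quick brown fox", "the lazy dog", "The end"])
```

Because each document is mapped independently and each key is reduced independently, both steps could in principle be spread over a cluster; only the grouping step needs to move data between nodes.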
| 2009/07/27 Wikipedia on Hadoop
Apache Hadoop is a free Java software framework that supports data-intensive distributed applications. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers....
| 2009/07/27 Why MapReduce matters to SQL data warehousing
Greenplum and Aster Data have both just announced the integration of MapReduce into their SQL MPP data warehouse products. So why do I think this could be a big deal? The short answer is "Because MapReduce offers dramatic performance gains in analytic application areas that still need great performance speed-up." Read on for the long answer....
| 2009/07/27 Decline of the Enterprise Data Warehouse Due to Hadoop, HBase, and Hive
With the rise of social media and the decreasing cost of storage, even very small companies need to process massive quantities of data. Furthermore, it's easier than ever to write software to generate, output, and process data, thanks to languages like Ruby, frameworks like Spring, and scalability best practices. A few days' work and a handful of engineers can net you a bare-bones Twitter clone, or a crawler to get the link graph of the entire Internet. This data growth can be far from linear. You simply can't analyze this much data in an RDBMS - but these small startups can't spend millions on DWs, either. Since the analysis is core to the business, hacked-together internal tools built from SQL boxes often emerge. What results is a temporary solution, not a platform. With Hadoop, HBase, and Hive, there’s now a free, scalable Data Warehousing platform that makes it possible to migrate a considerable portion of DW analytics....
| 2009/07/22 A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks
There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs....
| 2009/07/22 Researchers: Databases still beat Google's MapReduce
The paper, titled "A Comparison of Approaches to Large-Scale Data Analysis," is sure to stoke heated discussion among data junkies over the technical merits of MapReduce versus traditional databases. The conclusion? Databases "were significantly faster and required less code to implement each task, but took longer to tune and load the data," the researchers write. Database clusters were between 3.1 and 6.5 times faster on a "variety of analytic tasks."...
| 2009/07/22 HadoopDB Project
An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
| 2009/07/01 Aster Data
Aster Data Systems is a proven leader in high-performance analytic database systems for data warehousing - the first DBMS to tightly integrate SQL with MapReduce - providing deep insights on data analyzed on clusters of low-cost commodity hardware.