6 Terrible Big Data Practices to Avoid


5. Treating HDFS as just a file system

The Hadoop Distributed File System (HDFS) is a distributed file system designed to hold very large amounts of data. Files are stored redundantly across multiple machines, which keeps them highly available to parallel applications.

Because replication and placement are handled for you, it is tempting to treat HDFS like any ordinary file system. That misses the point: HDFS is built for large, sequential, write-once workloads, and it exposes distribution details, such as block size and replication factor, that a plain file system deliberately hides.
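To see the difference in practice, here is a minimal sketch using the standard Hadoop FileSystem Java API; the namenode address, path, and payload are hypothetical. Note that replication factor and block size are first-class parameters of a write, something no plain file system offers:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical namenode address; normally picked up from core-site.xml.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

            Path path = new Path("/data/events/part-00000");
            // Distribution knobs travel with the write itself:
            // overwrite flag, buffer size, replication factor (3), block size (128 MB).
            try (FSDataOutputStream out =
                     fs.create(path, true, 4096, (short) 3, 128L * 1024 * 1024)) {
                out.writeBytes("event-payload\n");
            }

            // Placement details stay visible after the write.
            FileStatus status = fs.getFileStatus(path);
            System.out.println("replication=" + status.getReplication()
                + " blockSize=" + status.getBlockSize());
        }
    }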

6. RAID/LVM/SAN/VMing your data nodes

Hadoop already stripes blocks of data across multiple nodes. What happens if you let RAID stripe them again across the disks underneath? A noisy, low-performing, needlessly redundant mess, that's all. HDFS replication already provides the fault tolerance RAID promises, and a striped array runs only as fast as its slowest disk.
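The usual alternative is JBOD: hand each physical disk to the datanode directly and let HDFS handle striping and redundancy itself. Here is a minimal sketch of the two relevant properties, with hypothetical mount points; in a real deployment these values live in hdfs-site.xml rather than code:

    import org.apache.hadoop.conf.Configuration;

    public class JbodDatanodeSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // JBOD: list every physical disk's mount point; the datanode
            // round-robins block writes across them with no RAID layer.
            conf.set("dfs.datanode.data.dir",
                     "/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data");
            // Keep the datanode alive if a single volume dies; HDFS
            // replication, not RAID, re-creates the lost blocks elsewhere.
            conf.setInt("dfs.datanode.failed.volumes.tolerated", 1);
            System.out.println(conf.get("dfs.datanode.data.dir"));
        }
    }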

LVM certainly has its place on internal file systems, but nobody should decide on a whim that all hundred data nodes need bigger volumes when the simpler move is to add a few more data nodes.

You just need to think outside the box!