6 Terrible Big Data Practices to Avoid


5. Treating HDFS as just a file system

The Hadoop Distributed File System (HDFS) is a distributed file system designed to hold very large amounts of data. Files are stored redundantly across multiple machines, which keeps them highly available to parallel applications.

Because replication and placement are handled for you, it is tempting to treat HDFS like any ordinary file system. That misses the point: HDFS is built for large, sequential, write-once workloads, and it exposes distribution details, such as block size and replication factor, that a plain file system deliberately hides.
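To see the difference in practice, here is a minimal sketch using the standard Hadoop FileSystem Java API; the namenode address, path, and payload are hypothetical. Note that replication factor and block size are first-class parameters of a write, something no plain file system offers:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical namenode address; normally picked up from core-site.xml.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

            Path path = new Path("/data/events/part-00000");
            // Distribution knobs travel with the write itself:
            // overwrite flag, buffer size, replication factor (3), block size (128 MB).
            try (FSDataOutputStream out =
                     fs.create(path, true, 4096, (short) 3, 128L * 1024 * 1024)) {
                out.writeBytes("event-payload\n");
            }

            // Placement details stay visible after the write.
            FileStatus status = fs.getFileStatus(path);
            System.out.println("replication=" + status.getReplication()
                + " blockSize=" + status.getBlockSize());
        }
    }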

6. RAID/LVM/SAN/VMing your data nodes

Hadoop already stripes blocks of data across multiple nodes. What happens if you let RAID stripe them again across the disks underneath? A noisy, low-performing, needlessly redundant mess, that's all. HDFS replication already provides the fault tolerance RAID promises, and a striped array runs only as fast as its slowest disk.
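The usual alternative is JBOD: hand each physical disk to the datanode directly and let HDFS handle striping and redundancy itself. Here is a minimal sketch of the two relevant properties, with hypothetical mount points; in a real deployment these values live in hdfs-site.xml rather than code:

    import org.apache.hadoop.conf.Configuration;

    public class JbodDatanodeSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // JBOD: list every physical disk's mount point; the datanode
            // round-robins block writes across them with no RAID layer.
            conf.set("dfs.datanode.data.dir",
                     "/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data");
            // Keep the datanode alive if a single volume dies; HDFS
            // replication, not RAID, re-creates the lost blocks elsewhere.
            conf.setInt("dfs.datanode.failed.volumes.tolerated", 1);
            System.out.println(conf.get("dfs.datanode.data.dir"));
        }
    }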

LVM certainly has its place on internal file systems, but nobody should decide on a whim that all hundred data nodes need bigger volumes when the simpler move is to add a few more data nodes.

You just need to think outside the box!