Hi community,
I'm aware that we can use Apache Spark with/without Hadoop.
But I believe the majority of people use Apache Spark together with Hadoop, and I read an article stating that Apache Spark without Hadoop is not suitable for production deployment and is only usable in a development environment.
Is that true?
I'd greatly appreciate it if anyone could elaborate on this.
Thanks.
I don't think using Apache Spark without Hadoop has any major drawbacks or issues. I have used Apache Spark quite successfully with AWS S3 on many batch-based projects. That said, for very high-performance systems, HDFS is the better option.
The main problem with running Apache Spark against object storage like S3 has been the weaker consistency guarantees of these object stores compared to HDFS. The post below explains the issue and how to avoid it. Hope this helps.
https://arnon.me/2015/08/spark...
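To make this concrete: for batch jobs reading and writing S3, Spark just needs to be pointed at the S3A connector. Below is a minimal `spark-defaults.conf` sketch; the credential values are placeholders, and the committer settings assume Hadoop 3.x plus Spark's optional `spark-hadoop-cloud` module (which avoids the rename-based output commit that is unsafe on object stores):

```properties
# Use the S3A filesystem implementation for s3a:// paths
spark.hadoop.fs.s3a.impl                 org.apache.hadoop.fs.s3a.S3AFileSystem

# Placeholder credentials; prefer IAM roles or a credentials provider in production
spark.hadoop.fs.s3a.access.key           YOUR_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key           YOUR_SECRET_KEY

# S3A committers (requires the spark-hadoop-cloud module on the classpath)
spark.hadoop.fs.s3a.committer.name       directory
spark.sql.sources.commitProtocolClass    org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
```

With this in place you can read/write paths like `s3a://your-bucket/data/` directly, no HDFS cluster involved.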
I mean, we can configure Spark without a Hadoop cluster as well, for example by using winutils.exe on Windows. Is that recommended for deployment? I'd also like to understand the difference between running Spark on a Hadoop environment versus running Spark without Hadoop.
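For context, what I mean by the winutils.exe setup is something like the following on a Windows development machine (the `C:\hadoop` path is just an example location):

```shell
rem Assumption: winutils.exe (matching your Hadoop build) has been placed in C:\hadoop\bin
set HADOOP_HOME=C:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
rem After this, spark-shell / spark-submit run locally with no Hadoop cluster
```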
Can you elaborate on the information you've been told about how using Apache Spark without Hadoop isn't good for deployment?
This insight would help many of our users.