Useful Resources


 NoSQL

CAP Theorem

Good Article Explaining Eventual Consistency


 HBase

HBase Storage Architecture 

HBase vs BigTable 

HBase MapReduce Integration

HBase and HDFS Locality

HBase Coprocessors

HBase Real World Use Cases

Cassandra vs HBase


 Java

Java Interviews and Basic Concepts

Spring and Web Frameworks

 Spark

Spark Gotchas

Spark Best Practices and Tuning Guide

****How to Implement MapReduce-Style Jobs in Spark****

http://technology.finra.org/code/using-spark-transformations-for-mpreduce-jobs.html

****Implementing Secondary Sort in Spark/MapReduce****

https://www.safaribooksonline.com/library/view/data-algorithms/9781491906170/ch01.html

http://codingjunkie.net/spark-secondary-sort/

http://blog.ditullio.fr/2015/12/28/hadoop-basics-secondary-sort-in-mapreduce/

****Best Practices****

https://stackoverflow.com/questions/30520428/what-is-the-difference-between-memory-only-and-memory-and-disk-caching-level-in

https://robertovitillo.com/2015/06/30/spark-best-practices/

https://www.slideshare.net/databricks/strata-sj-everyday-im-shuffling-tips-for-writing-better-spark-programs

https://www.slideshare.net/cloudera/top-5-mistakes-to-avoid-when-writing-apache-spark-applications

https://www.slideshare.net/jcmia1/a-beginners-guide-on-troubleshooting-spark-applications

http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/

http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

http://fdahms.com/2015/10/04/writing-efficient-spark-jobs/

http://blog.bimarian.com/configure-spark-in-tune-with-your-application/

http://blog.cloudera.com/blog/2015/05/working-with-apache-spark-or-how-i-learned-to-stop-worrying-and-love-the-shuffle/

https://www.datastax.com/dev/blog/common-spark-troubleshooting

****Memory Management****

https://0x0fff.com/spark-memory-management/

****Optimizing Joins****

https://www.safaribooksonline.com/library/view/high-performance-spark/9781491943199/ch04.html

https://intelligentinsight.wordpress.com/2016/07/05/optimizing-spark-sql-join-statements-for-high-performance/

****Issues****

https://github.com/JerryLead/MyNotes/blob/master/Grind/OOM-Cases-in-Spark-Users.md

http://stackoverflow.com/questions/21138751/spark-java-lang-outofmemoryerror-java-heap-space

https://www.altiscale.com/blog/tips-and-tricks-for-running-spark-on-hadoop-part-3-rdd-persistence/

****Garbage Collection (GC)****

https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html

