Title: 5 Emerging Ideas in Hadoop Technology That Are Trending
Copyright 2008 PresentationFx.com
Redistribution Prohibited Image woodsy/sxc.hu
This text section may be deleted for
presentation.
1. WEB NOTEBOOKS
- Web notebooks are a way to write code within the web browser and have it run against a cluster of servers.
- Generally, web notebooks can support languages such as Scala and Python, as well as more basic languages such as HTML and Markdown, which allow the creation of a notebook that can be presented more easily.
- Integration of SQL into web notebooks has also become a more popular feature, although the capabilities of web notebooks vary greatly.
- The only current limitation of these notebooks lies within the realm of security.
- Currently there is no real security model in these web notebooks, but by putting a web server in front of them, some level of security can be achieved.
2. ALGORITHMS FOR MACHINE LEARNING
- The application of machine-learning algorithms is a hot topic, and there are a number of important reasons for this.
- The first is that most people can see the potential of leveraging machine-learning algorithms to gain more insights into the data they have.
- Whether the goal is creating a recommendation engine, personalizing a website, identifying anomalies, or detecting fraud, interest in this area is strong.
- The books "Practical Machine Learning: A New Look at Anomaly Detection" and "Practical Machine Learning: Innovations in Recommendation" can each be read within a few hours.
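To make "identifying anomalies" a little more concrete, here is a minimal sketch of one of the simplest approaches: flagging values that sit far from the mean, measured in standard deviations (a z-score). It is not taken from the books mentioned above. The function name `find_anomalies`, the threshold of 2.5, and the sample transaction amounts are all assumptions made for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.5):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    Hypothetical helper: real anomaly detection usually needs a more
    robust model than a single global mean and standard deviation.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs >= 2 values
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Made-up transaction amounts with one obviously unusual value.
amounts = [20.0, 22.5, 19.8, 21.1, 20.4, 23.0, 18.9, 950.0, 21.7, 20.2]
anomalies = find_anomalies(amounts, threshold=2.5)
print(anomalies)  # [(7, 950.0)]
```

The threshold is deliberately below the often-quoted value of 3: with only ten samples, a single outlier cannot reach a z-score of 3, because the outlier itself inflates the standard deviation.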
3. SQL ON HADOOP
- Apache Hive is the SQL-on-Hadoop technology that has been around the longest, and is probably the most widely used.
- The Hive Metastore can be leveraged by other technologies such as Apache Drill.
- The benefit in this case is that Drill can read the metadata from Hive and then run the queries itself, instead of depending upon the Hive MapReduce runtime. This approach is significantly faster and is one of the preferred ways of using Hive.
- Now that you understand the background of SQL on Hadoop, let's take a look at two technologies that are gaining the most traction in this space.
4. STREAM PROCESSING TECHNOLOGIES
- It seems these days that everyone wants their stream processing framework to be the framework used.
- There are so many projects (free and paid) in this space that it can make your head spin: Apache Flink, Spark Streaming, Apache Apex (incubating), Apache Samza, Apache Storm, and Akka Streams, as well as StreamSets.
- Apache Storm was once considered the leader in this technology area. While it is true that the use of Apache Storm is declining, the Storm API will likely live a long time. It has now been adopted by private code bases such as Twitter's Heron, and it is also supported by Apache Flink.
- Apache Beam is a rising star when it comes to frameworks for both batch and streaming data-parallel processing pipelines. It runs on both Flink and Spark and is worth keeping an eye on.
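As a rough illustration of what these frameworks do, the sketch below counts events in fixed-size ("tumbling") time windows using plain Python. It does not use the API of Storm, Flink, Beam, or any other project named above. The function `tumbling_window_counts` and the sample events are hypothetical, and the sketch assumes events arrive in timestamp order, which real engines cannot take for granted.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, Counter) for each non-empty fixed-size window.

    `events` is any iterable of (timestamp_seconds, key) pairs, assumed
    to be sorted by timestamp. Windows with no events are not emitted.
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # The event belongs to a later window: emit the finished one.
            yield current_start, counts
            current_start, counts = window_start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last open window


# Made-up click-stream events: (seconds since start, event type).
events = [(0, "click"), (3, "view"), (9, "click"),
          (10, "click"), (14, "view"), (25, "click")]
results = list(tumbling_window_counts(events, window_seconds=10))
for start, counts in results:
    print(start, dict(counts))
```

Because `events` is just an iterable, the same function accepts a finite list (a batch) or a generator that keeps producing events (a stream). That is a very loose analogy for the combined batch and streaming model that Beam aims for, not a description of how Beam works internally.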
5. MESSAGING PLATFORMS
- While stream processing engines are hot, messaging platforms are probably hotter. They can be used to create scalable architectures and are taking off like crazy across many organizations.
- The top reason that the messaging platform model is so important is that it can support huge volumes of events.
- Less than 10 years ago, people would get excited about being able to handle 50,000 to 100,000 message events per second on a server.
- The cost to scale this platform is very low, which means a properly built application can scale without re-architecting the entire platform.
- The same model helps when performing data movement or enabling development and quality assurance teams to test with production payloads. The value is tremendous.
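One way to picture why a messaging platform helps with testing against production payloads is the append-only log model, where each consumer group keeps its own read position. The sketch below is a toy in-memory version of that idea, not the API of any real messaging product. The class `TopicLog`, the group names, and the sample messages are all assumptions.

```python
class TopicLog:
    """Toy append-only message log with per-group read offsets."""

    def __init__(self):
        self._messages = []  # every published message, in order
        self._offsets = {}   # consumer group name -> next index to read

    def publish(self, message):
        """Append a message and return its position in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, group, max_messages=10):
        """Return the next batch for `group` and advance only its offset."""
        start = self._offsets.get(group, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for i in range(5):
    log.publish({"order_id": i})

# The production application and a QA job read the same payloads
# independently; neither one consumes messages away from the other.
production_batch = log.poll("production-app", max_messages=3)
qa_batch = log.poll("qa-test", max_messages=5)
production_rest = log.poll("production-app")

print([m["order_id"] for m in production_batch])  # [0, 1, 2]
print([m["order_id"] for m in qa_batch])          # [0, 1, 2, 3, 4]
print([m["order_id"] for m in production_rest])   # [3, 4]
```

Since reading does not remove messages, adding another consumer group does not require changes to the producer or to existing consumers, which is one reason this style of platform can grow without re-architecting the application.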
CONVERGED ARCHITECTURAL APPROACH
- As you can see, there are a lot of technology areas to keep an eye on. Be thoughtful about how you leverage these new technologies.
- They bring with them the ability to think differently by simplifying business processes, which can enable a business to directly integrate analytics into core business functions.
- Many of the technologies in the Hadoop ecosystem are considered big data technologies. We provide training for Hadoop technology. Don't hesitate to contact us: 805627677
www.datawaretools.in/chennai/