Customers can now access production Spark clusters on Microsoft's cloud
Microsoft has announced general availability of Apache Spark support for the Azure HDInsight cloud-hosted service for big data analytics, bringing the advantages of the Spark platform for in-memory processing of data to better support real-time analysis.
Apache Spark for Azure HDInsight was announced as a public preview almost a year ago, and Microsoft claimed that it has proved so popular that it already accounts for 50 per cent of HDInsight clusters deployed on the Azure platform.
Spark originated at UC Berkeley's AMPLab and is now developed as an Apache Software Foundation project. It was designed to address shortcomings in the MapReduce framework often used in Hadoop deployments. It runs data analytics processes in-memory, and its developers claim it can run some workloads up to 100 times faster than MapReduce.
Now, with general availability of Spark for Azure HDInsight, Microsoft is offering Apache Spark v1.6.1 along with improvements to the service to make it more enterprise-ready and easier for users to access, the firm said.
Azure HDInsight has been developed in partnership with Hadoop specialist Hortonworks and is based on the Hortonworks Data Platform. Microsoft has worked with Hortonworks to add high availability features to the platform via updates to the YARN resource manager.
Meanwhile, Microsoft has integrated Spark with the Azure Data Lake Store to enable Spark workloads to operate at greater scale. This cloud-based repository allows Spark to store and process data of almost any size, without forcing changes to a customer's application as data scales, the firm said.
Microsoft said it is also making it easier to use Spark for analytics workloads by integrating Apache Spark for Azure HDInsight with existing tools for data professionals.
"Our goal with big data is to make it accessible for everybody. With Spark for HDInsight, we have designed new productivity experiences for the different audiences that use Spark, including the data engineers working on ETL jobs, the data scientists performing experimentation and the business analysts creating dashboards," said Oliver Chiu, Microsoft's senior product marketing manager for big data and data warehousing, on the Azure blog.
This includes integration with the IntelliJ IDE, which gives data engineers and developers native authoring support for Scala and Java; out-of-the-box integration with Jupyter (IPython) notebooks for data scientists; and integration with Microsoft's Power BI to help business analysts build interactive visualisations of data, Microsoft said.