Agile Software Development And Big Data
Abstract
Handling Big Data projects has always posed a challenge for any infrastructure that generates large amounts of data on a daily basis, and this leads the data science community to ask how such projects can be managed. Recent surveys suggest that 90% of the world's data was generated in the last two years alone. The question of how to handle this data so that it yields meaningful patterns that benefit business organizations has therefore come to the fore. This has created a debate over which approach should be adopted: standard project management, or a newer approach that departs from traditional techniques and can keep pace with rapid change on the World Wide Web (WWW). Over the last 15 years, various agile software methodologies have been studied extensively. The iterative nature of agile enables quick implementation of processes of all sizes. In this term paper, we present various agile practices and show how they can be applied to improve big data analytics and related decision-making processes. We also take the perspective of project managers and summarize our practical views on project management issues in the area of big data.
Keywords: big data, agile software, project management, iterative, world wide web
Introduction
Information is a key asset of any business community. These days, vast amounts of information are available for organizations to analyze. Data is considered the raw material of the 21st century, and its abundance is assured with today's 15 billion devices already connected to the Internet. Agile is fundamentally an iterative, lightweight and lean design and development philosophy, conceived in the late 1990s, that is highly compatible with the rapid development of any product.
In software terms, agility means how quickly software can be developed while providing low cost, shorter delivery times and the capacity to adapt to new changes. To establish the relation between agile and big data, we analyzed several questionnaires administered to employees working on big data projects in different companies. The following questions were used to identify our targets: What approach is appropriate for the administration of Big Data projects? Is it possible to use an agile approach to manage Big Data projects? What are the practical implications for the implementation of Big Data projects? Big Data management involves a massive variety of data, spanning structured and unstructured forms, which makes it very difficult to manage, store and analyze in terms of files. It is interesting to see that most companies regard the three basic pillars of the Agile Manifesto (customer, communication and functional software) as the most important ones, rather than the other factors (documentation, tools and resistance to change).
Issues
Big Data has four principal qualities: Volume, Velocity, Variety, and Value, usually referred to as the "4V", denoting the enormous volume of data, fast processing speed, diverse data types, and low value density. Big Data touches many parts of current technology, but the guiding principle is that we, as managers, ought to be concerned with return on investment. And what determines the rate of return? It materializes only if you know the right questions to ask and can define the right business problems. If the suitability of agile and plan-driven approaches is determined by their fundamental distinguishing characteristics, we can infer that the choice of methodology for managing Big Data projects is shaped not only by the size and criticality of the undertaking, but also by the dynamics of the environment, the abilities of the people responsible for the work, and the organizational culture.
Big data applications should be able to scale to accommodate growing data volumes while maintaining reliability. Strictly defined normalized data models, strong data consistency and the SQL standard have been replaced by schemaless and intentionally denormalized data models, weak consistency, and proprietary APIs that expose the underlying data management mechanisms to the software engineer. These NoSQL products typically scale horizontally across clusters of low-cost, moderate-performance servers. They achieve high performance, flexible storage capacity, and availability by replicating and partitioning datasets over the cluster. Distributed databases face fundamental quality constraints, characterized by Eric Brewer's CAP theorem.
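To make the partitioning and replication described above concrete, the following minimal Python sketch shows one common scheme, hash-ring placement, in which each record is assigned to several nodes of the cluster. The node names and replication factor are illustrative assumptions rather than the configuration of any particular NoSQL product; the constraints such replication creates are exactly what the CAP theorem, discussed next, formalizes.

import hashlib
from bisect import bisect_right

# Hypothetical cluster: four nodes, each record kept on three of them.
NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICATION_FACTOR = 3

def ring_position(key):
    # Map a key (or node name) to a fixed position on the hash ring.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

# Place each node on the ring, sorted so we can binary-search for successors.
RING = sorted((ring_position(name), name) for name in NODES)
POSITIONS = [pos for pos, _ in RING]

def replicas_for(key):
    # Walk clockwise from the key's position and take the next few distinct nodes.
    start = bisect_right(POSITIONS, ring_position(key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(REPLICATION_FACTOR)]

print(replicas_for("order:1001"))
print(replicas_for("user:42"))

Because placement depends only on the key's hash, any node can compute where a record lives without a central coordinator, which is one reason such schemes scale horizontally across low-cost servers.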
A system must trade consistency (C: all readers see the same data) against availability (A: every request receives a success or failure response) when a network partition (P: arbitrary message loss between nodes in the cluster) occurs. Another challenge in managing big data projects is the choice of architecture-centric tools, which should address all of the following aspects. Write-heavy workloads: when we prioritize consistency, each write must be propagated to all replicas before it completes, which can hurt availability; conversely, partitioning and distributing data across nodes keeps writes available but weakens consistency. Variable request workloads, driven by ads, promotions and similar events: to deal with these, the system should be elastic, letting applications add resources whenever needed and release them when the load drops.
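The elasticity requirement can be illustrated with a small, hypothetical scaling rule. The thresholds and node limits below are assumptions chosen for the example, not values taken from any specific platform, but they show how a cluster might add resources under a promotion-driven spike and release them again when load drops.

# Illustrative elasticity rule: grow the cluster under heavy load, shrink it when idle.
MIN_NODES = 2
MAX_NODES = 32
SCALE_UP_UTILISATION = 0.75    # add capacity above 75% average utilisation (assumed)
SCALE_DOWN_UTILISATION = 0.30  # release capacity below 30% (assumed)

def desired_node_count(current_nodes, avg_utilisation):
    # Return the node count the cluster should converge to.
    if avg_utilisation > SCALE_UP_UTILISATION and current_nodes < MAX_NODES:
        return min(current_nodes * 2, MAX_NODES)   # double under heavy load
    if avg_utilisation < SCALE_DOWN_UTILISATION and current_nodes > MIN_NODES:
        return max(current_nodes // 2, MIN_NODES)  # halve when mostly idle
    return current_nodes

print(desired_node_count(4, 0.90))   # 8: scale out for a promotion spike
print(desired_node_count(8, 0.20))   # 4: release resources as load drops
print(desired_node_count(4, 0.50))   # 4: steady state, no change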
Complex analytic computations over huge data volumes require the system to support diverse query workloads, mixing requests that require rapid responses with longer-running analytical queries. High availability must be maintained as the number of nodes keeps rising. The resulting distributed software and data architectures must therefore be designed to be resilient; a minimal sketch of one common resilience pattern is given at the end of this section. Taking into account factors such as size, risk, a dynamic environment, the development team and communication, we can note the suitability of agile principles for small big data projects: customer feedback is taken into consideration at the start, during the build-up and after successful completion, giving a higher return on investment; changes in trends, and therefore changes in requirements, are welcomed, which allows new issues emerging in a dynamic environment to be addressed; frequent prototypes are delivered at shorter intervals; and communication between analysts and developers during the project allows analysts to learn the developers' requirements and the sources of data that will be used.
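As referenced above, one common way to make client access to such architectures resilient is to retry transient failures with exponential backoff, so that a brief node outage does not become a client-visible failure. The sketch below is illustrative only: fetch_record, TransientError and the delay parameters are hypothetical stand-ins for whatever client library and tuning a real project would use.

import random
import time

class TransientError(Exception):
    # Raised by the (hypothetical) client when a replica is temporarily unreachable.
    pass

def fetch_record(key):
    # Placeholder for a real remote read; fails randomly to simulate a flaky node.
    if random.random() < 0.3:
        raise TransientError(f"replica for {key} not responding")
    return {"key": key, "value": "example-value"}

def resilient_fetch(key, max_attempts=5, base_delay=0.1):
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_record(key)
        except TransientError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter: roughly 0.1s, 0.2s, 0.4s, ... plus noise.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.05))

print(resilient_fetch("order:1001"))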