Distributed Systems (Part 2)

Pronay Ghosh
Published in Accredian
5 min read · Apr 29, 2022


by Pronay Ghosh and Hiren Rupchandani

  • In the previous article, we had a high-level overview of distributed systems.
  • We saw why data is so important.
  • We learned that big data is a large amount of diverse information that arrives in ever-increasing volumes and at ever-increasing speeds.
  • Big data can be structured (typically numerical, readily formatted, and saved) or unstructured (more free-form and less quantifiable: often non-numerical and difficult to format and store).
  • Big data analysis may benefit nearly every function in a company, but dealing with the clutter and noise can be difficult.
  • Big data can be gathered from information people share willingly through personal devices and apps, questionnaires, product purchases, and electronic check-ins, as well as from publicly posted comments on social networks and websites.
  • Big data is frequently kept in computer databases and examined with software intended to deal with huge, complicated data sets.
  • Lastly, we learned about the challenges with Big data.
  • In this article, we will learn how big data works, what it is used for, and its advantages and disadvantages.

What Makes Big Data Work?

  • Big data comes in two types: structured and unstructured.
  • Structured data is information the company already manages in databases and spreadsheets; it is typically numeric.
  • Unstructured data is unorganized data that does not fit into a predetermined model or format.
  • It includes information gathered from social media sources, which helps organizations understand customer needs.
  • Big data also includes publicly posted comments on social networks and websites.
  • The inclusion of sensors and other inputs in smart devices enables data to be collected in a wide range of settings and conditions.
  • Big data is frequently kept in computer databases and examined with software intended to deal with huge, complicated data sets.
  • Many software-as-a-service (SaaS) firms specialize in handling this kind of complicated data.

Big Data Applications

  • Data analysts examine the relationship between different types of data, such as demographic data and purchasing history, to determine whether a correlation exists.
  • Such evaluations can be done in-house or by a third party that specializes in turning big data into understandable representations.
  • Businesses frequently rely on such professionals to analyze big data and turn it into usable information.
  • Data analysis findings can be used by nearly every department in a company, from human resources and technology to marketing and sales.
  • The goal of big data is to shorten the time it takes products to reach the market, reduce the time and resources needed to gain market adoption and reach target audiences, and keep customers satisfied.
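
As a rough illustration of the correlation check described above, here is a minimal sketch that computes the Pearson correlation coefficient between a demographic attribute and purchasing history. The sample values (`ages`, `yearly_spend`) and the helper name `pearson` are made up for illustration; a real analysis would use far more data and would likely rely on a statistics library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical sample: customer age (demographic data) and yearly spend
# (purchasing history). These numbers are invented for the example.
ages = [22, 35, 47, 51, 63]
yearly_spend = [180.0, 260.0, 310.0, 400.0, 420.0]

r = pearson(ages, yearly_spend)
print(f"correlation between age and yearly spend: {r:.2f}")
```

A value close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests none. Correlation alone does not show that one variable causes the other.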

The Benefits and Drawbacks of Big Data

  • The growing amount of data available creates both benefits and challenges.
  • In principle, having more data about consumers (and future customers) should allow businesses to better personalize products and marketing efforts to ensure customer happiness and repeat business.
  • Companies that collect a large amount of data are able to undertake deeper and richer analyses for the benefit of all stakeholders.
  • While improved analysis is a good thing, big data can also lead to overload and noise, which reduces its usefulness.
  • Companies must handle larger volumes of data and determine which data represents signal rather than noise.
  • A critical issue is determining what makes the data relevant.
  • Furthermore, the data’s type and format may necessitate particular treatment before it can be used.
  • Structured data, which is made up of numeric values, is simple to store and sort.
  • Unstructured data, such as emails, videos, and text documents, may require more advanced techniques before it becomes useful.
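
To make the contrast concrete, here is a small sketch under invented assumptions: the `orders` records, the `email_body` text, and the `word_counts` helper are all hypothetical. Structured records can be sorted directly on a field, while free text needs a processing step (here, a very simple tokenizer) before anything can be counted or compared.

```python
import re
from collections import Counter

# Structured data: every record has the same fields, so sorting is direct.
orders = [
    {"customer_id": 3, "amount": 42.5},
    {"customer_id": 1, "amount": 99.0},
    {"customer_id": 2, "amount": 12.0},
]
orders_by_amount = sorted(orders, key=lambda row: row["amount"])

# Unstructured data: free text has no fields, so it must be processed first.
email_body = "The delivery was late. Late again! Support was helpful, though."


def word_counts(text):
    """Lowercase the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


counts = word_counts(email_body)

print([row["customer_id"] for row in orders_by_amount])
print(counts["late"], counts["was"])
```

Real unstructured data usually calls for much heavier tooling than a regular expression, but the extra step before analysis is the point.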

The Possible Solutions: Scaling Up vs. Scaling Out

  • Modern applications are always changing, evolving to meet new objectives, and they operate in an environment with shifting resource demands.

Scaling an application allows it to be suitably sized to resource demands, resulting in satisfied consumers and lower infrastructure costs.

  • You’re not just doing your application a disservice if you don’t know how to scale effectively; you’re also putting unneeded strain on your operations team.
  • Deciding by hand when to scale up or down is difficult.
  • If you buy more infrastructure to handle high traffic, you may end up overspending when demand is low.
  • If you provision for your average load, traffic spikes will degrade your application’s performance, and resources will still go unused when traffic drops.
  • When your cloud workload changes, you may need to add infrastructure to handle the increased load, or it may make sense to reduce infrastructure when demand is low.
  • The “up or out” portion is perhaps a little less obvious.
  • Scaling out means adding more functionally equivalent components in parallel to spread out a load.
  • For example, this could mean going from two load-balanced web server instances to three.
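
The sizing decision behind scaling out can be sketched as a simple calculation. This is a minimal illustration, not a production autoscaler: the function name `instances_needed`, the per-instance capacity of 100 requests per second, and the traffic figures are all assumptions made for the example.

```python
import math


def instances_needed(requests_per_second, capacity_per_instance, min_instances=1):
    """Return how many identical instances are needed to serve the given load."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    if requests_per_second < 0:
        raise ValueError("requests_per_second cannot be negative")
    # Round up so the pool is never under-provisioned, and never drop below
    # the configured minimum even when traffic is very low.
    required = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, required)


# Hypothetical figures: each web server handles about 100 requests per second.
normal_load = instances_needed(180, 100)   # typical traffic
peak_load = instances_needed(250, 100)     # traffic spike
quiet_load = instances_needed(40, 100)     # low demand

print(normal_load, peak_load, quiet_load)
```

With these assumed numbers, a traffic spike moves the pool from two load-balanced instances to three, and low demand lets it shrink again, which is the pattern that is hard to manage by hand.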

Scaling up is increasing the size or speed of a component in order to handle a higher load.

  • For example, this could mean switching from a virtual machine (VM) with two CPUs to one with three.

Scaling down means reducing your system’s resources, regardless of whether you used the up or out strategy.
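
The three operations can be compared side by side with a small model. This is only a sketch: the `Pool` class and the `scale_out`, `scale_up`, and `scale_down` helpers are hypothetical names, and real capacity depends on much more than CPU count.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pool:
    """A group of identical servers behind a load balancer (illustrative)."""
    instances: int
    cpus_per_instance: int

    @property
    def total_cpus(self):
        return self.instances * self.cpus_per_instance


def scale_out(pool, extra_instances=1):
    """Add more servers of the same size."""
    return replace(pool, instances=pool.instances + extra_instances)


def scale_up(pool, extra_cpus=1):
    """Make each existing server bigger."""
    return replace(pool, cpus_per_instance=pool.cpus_per_instance + extra_cpus)


def scale_down(pool, fewer_instances=0, fewer_cpus=0):
    """Reduce resources, whichever strategy was used to add them."""
    return replace(
        pool,
        instances=max(1, pool.instances - fewer_instances),
        cpus_per_instance=max(1, pool.cpus_per_instance - fewer_cpus),
    )


pool = Pool(instances=2, cpus_per_instance=2)
wider = scale_out(pool)    # three servers with two CPUs each
bigger = scale_up(pool)    # two servers with three CPUs each

print(wider, wider.total_cpus)
print(bigger, bigger.total_cpus)
print(scale_down(wider, fewer_instances=1) == pool)
```

In this toy model both strategies reach the same total CPU count, but scaling out also adds redundancy, while scaling up keeps the number of machines to manage unchanged.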

Conclusion:

  • In this article, we covered how big data works, what it is used for, and its advantages and disadvantages.
  • In the next article, we will learn in-depth solutions for Data Explosion using Hadoop.

Final Thoughts and Closing Comments

Many people miss some vital points while pursuing their Data Science or AI journey. If you are one of them and are looking for a way to close those gaps, check out the certification programs INSAID offers on its website. If you liked this story, I recommend the Global Certificate in Data Science & AI, which covers the foundations, machine learning algorithms, and deep neural networks (basic to advanced).

Follow us for more upcoming articles on Data Science, Machine Learning, and Artificial Intelligence.

Also, do give us a clap 👏 if you found this article useful; your encouragement inspires us and helps us create more content like this.


Pronay Ghosh
Accredian

Data Scientist at Aidetic | Former Data Science researcher at The International School of AI and Data Science