Over the last few years, cloud computing has become a rapidly growing field in the IT industry. Numerous cloud providers offer competing computation, storage, networking, and application hosting services, with coverage across several continents and the promise of the best on-demand prices and performance.
According to Gartner, worldwide end-user spending on public cloud services is forecast to grow 20.4% in 2022 to a total of $494.7 billion, up from $410.9 billion in 2021. In 2023, end-user spending is expected to reach nearly $600 billion.
Today, Google Cloud Platform (GCP) is one of the most important and fastest-growing platforms in the cloud market. It provides developers with products for building a range of programs, from simple websites to complex, globally distributed applications. Its reliability has led prominent organizations such as Airbus, Coca-Cola, HTC, and Spotify to adopt GCP.
Cloud computing is the on-demand delivery of computing power, storage, applications, and other IT resources over the internet through a cloud services platform, with pay-as-you-go pricing. The user does not actively manage the underlying hardware; instead, data is stored, managed, and processed on remote servers in large data centers rather than on a local server or personal computer.
Cloud service providers offer solutions under different service models. The three standard models defined by NIST (the National Institute of Standards and Technology) are:
• Infrastructure as a Service (IaaS)
• Platform as a Service (PaaS)
• Software as a Service (SaaS)
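To make the difference between the three models concrete, here is a simplified Python sketch of who manages which layer of the stack. The layer names and cut-off points follow the commonly used shared-responsibility picture and are a simplification (SaaS customers, for example, still control their own data and access settings); the names `LAYERS`, `customer_managed`, and `provider_managed` are illustrative. On GCP, Compute Engine is typically classed as IaaS and App Engine as PaaS.

```python
# Layers of an IT stack, from hardware up to the application (simplified model).
LAYERS = [
    "networking", "storage", "servers", "virtualization",
    "operating_system", "middleware", "runtime", "data", "applications",
]

# Index of the first layer the customer manages under each model.
# Every layer below that index is managed by the cloud provider.
CUSTOMER_STARTS_AT = {
    "IaaS": LAYERS.index("operating_system"),  # provider runs hardware and virtualization
    "PaaS": LAYERS.index("data"),              # provider also runs OS, middleware, runtime
    "SaaS": len(LAYERS),                       # provider runs the whole stack
}


def customer_managed(model: str) -> list[str]:
    """Return the layers the customer is responsible for under a service model."""
    return LAYERS[CUSTOMER_STARTS_AT[model]:]


def provider_managed(model: str) -> list[str]:
    """Return the layers the provider is responsible for under a service model."""
    return LAYERS[:CUSTOMER_STARTS_AT[model]]


if __name__ == "__main__":
    for model in ("IaaS", "PaaS", "SaaS"):
        print(f"{model}: customer manages {customer_managed(model) or 'nothing'}")
```

Moving from IaaS to SaaS simply shifts the cut-off upward, handing more of the stack to the provider.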
Google Cloud Platform is a suite of cloud computing services, including storage, hosting, computing, networking, machine learning, and management services. These services run on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, Google Photos, and YouTube.
GCP offers a wide range of solutions for each stage of the data engineering pipeline. The figure below shows the steps involved.
Figure 1. Data Engineering pipeline
Ingestion means gathering data from multiple sources. The data can be of any type: structured, semi-structured, or unstructured.
Data can be ingested into GCP in two ways: in batches or as a stream. For batch data, one can use Google Cloud Storage together with the data transfer services available in GCP, such as Storage Transfer Service, BigQuery Data Transfer Service, and Transfer Appliance.
For streaming data, Google Cloud offers Pub/Sub.
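The contrast between the two ingestion styles can be sketched in plain Python. In batch ingestion the whole dataset is available up front; in streaming, each message is pushed to subscribers as soon as it is published. `ToyPubSub` below is a hypothetical in-memory broker that only loosely mirrors Pub/Sub's topic/subscriber model. It is not the `google-cloud-pubsub` client and has no persistence, acknowledgements, or retries.

```python
from collections import defaultdict
from typing import Callable


def batch_ingest(lines: list[str]) -> list[str]:
    """Batch: the full dataset is available up front and loaded in one pass."""
    return [line.strip() for line in lines if line.strip()]


class ToyPubSub:
    """Hypothetical in-memory broker: a topic fans each message out to its subscribers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        """Register a callback that receives every message later published to `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, data: bytes) -> int:
        """Streaming: deliver `data` to each subscriber immediately; return the delivery count."""
        for callback in self._subscribers[topic]:
            callback(data)
        return len(self._subscribers[topic])


if __name__ == "__main__":
    broker = ToyPubSub()
    received: list[bytes] = []
    broker.subscribe("clicks", received.append)
    broker.publish("clicks", b"user=42 page=/home")
    print(received)
```

The key design difference is timing: the batch function cannot start until all lines exist, while the streaming callback runs once per event as it arrives.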
For storing large amounts of data, GCP offers various cost-effective solutions that are secure, durable, scalable, highly available, and optimized, with different storage services suited to different types of data.
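As a rough illustration of matching data types to storage services, here is a hypothetical lookup helper in Python. The mapping reflects common guidance for the services listed later in this article, but it is a simplification: a real choice also depends on latency, consistency, scale, and cost requirements. `STORAGE_GUIDE` and `suggest_storage` are illustrative names, not part of any GCP API.

```python
# Hypothetical, simplified mapping from kind of data to a candidate GCP service.
STORAGE_GUIDE = {
    "unstructured": "Cloud Storage",       # files, images, backups, data-lake objects
    "relational": "Cloud SQL",             # managed MySQL, PostgreSQL, or SQL Server
    "relational_global": "Cloud Spanner",  # horizontally scalable, globally distributed SQL
    "wide_column": "Cloud Bigtable",       # high-throughput NoSQL, e.g. time series
    "analytical": "BigQuery",              # serverless data warehouse for SQL analytics
    "block": "Persistent Disk",            # block storage attached to Compute Engine VMs
}


def suggest_storage(kind: str) -> str:
    """Return a candidate storage service for a kind of data, or raise ValueError."""
    try:
        return STORAGE_GUIDE[kind]
    except KeyError:
        raise ValueError(
            f"unknown data kind {kind!r}; expected one of {sorted(STORAGE_GUIDE)}"
        ) from None
```

For example, `suggest_storage("analytical")` returns `"BigQuery"`.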
GCP offers tools for processing and analyzing data, ranging from complex queries to simple one-line functions. Questions that this processing and analysis step can help answer include:
• What outcome is wanted?
• What type of analysis should be performed?
• How can the data be turned into meaningful insight?
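A typical analysis step is an aggregate SQL query. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a local stand-in for a BigQuery table; the `orders` table, its columns, and the sample rows are hypothetical. The `GROUP BY` / `ORDER BY` query itself is standard SQL of the kind one would also write in BigQuery.

```python
import sqlite3


def revenue_by_region(rows: list[tuple[str, str, float]]) -> list[tuple[str, float]]:
    """Load (order_id, region, amount) rows and return total revenue per region, highest first."""
    conn = sqlite3.connect(":memory:")  # local stand-in for a BigQuery dataset
    try:
        conn.execute("CREATE TABLE orders (order_id TEXT, region TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        # Aggregate query: total amount per region, largest first.
        query = """
            SELECT region, ROUND(SUM(amount), 2) AS revenue
            FROM orders
            GROUP BY region
            ORDER BY revenue DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("o1", "EU", 120.0), ("o2", "US", 80.5), ("o3", "EU", 30.0)]
    print(revenue_by_region(sample))
```

With the sample rows, EU totals 150.0 and US totals 80.5, so EU is listed first.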
To visualize the results of the processing and analysis steps of the data engineering pipeline, GCP offers solutions for creating dashboards, charts, reports, and more.
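The four stages (ingest, store, process and analyze, visualize) can be chained in a short, self-contained Python sketch. Every function here is hypothetical and runs locally: a hard-coded list stands in for a data source, an in-memory list for storage, and a text bar chart for a dashboard. No GCP service is called; the point is only the shape of the pipeline.

```python
from statistics import mean


def ingest() -> list[dict]:
    """Ingest: gather raw records from a source (hard-coded sample data here)."""
    return [
        {"city": "Paris", "temp_c": "21.5"},
        {"city": "Paris", "temp_c": "19.0"},
        {"city": "Oslo", "temp_c": "12.0"},
        {"city": "Oslo", "temp_c": "bad"},  # malformed record
    ]


def store(records: list[dict], sink: list[dict]) -> list[dict]:
    """Store: persist raw records unchanged (an in-memory list stands in for storage)."""
    sink.extend(records)
    return sink


def process(stored: list[dict]) -> dict[str, float]:
    """Process/analyze: drop malformed rows and average the temperature per city."""
    by_city: dict[str, list[float]] = {}
    for row in stored:
        try:
            value = float(row["temp_c"])
        except ValueError:
            continue  # skip rows that cannot be parsed
        by_city.setdefault(row["city"], []).append(value)
    return {city: mean(values) for city, values in by_city.items()}


def visualize(results: dict[str, float]) -> str:
    """Visualize: render a tiny text bar chart in place of a dashboard."""
    return "\n".join(
        f"{city:<6} {'#' * round(avg)} {avg:.1f}" for city, avg in sorted(results.items())
    )


if __name__ == "__main__":
    print(visualize(process(store(ingest(), []))))
```

Keeping raw records in the store stage and cleaning them only during processing mirrors the common practice of retaining the original data so it can be reprocessed later.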
The figure below shows the tools available in GCP for the four phases of the data engineering pipeline.
Figure 2. Tools available in GCP
Google offers a wide range of cloud platform services. The services available in each domain are listed below.
The storage domain includes services related to data storage:
• Google Cloud Storage
• Cloud SQL
• Cloud Bigtable
• Cloud Spanner
• BigQuery
• Persistent Disk
GCP offers a scalable range of computing options that can be customized to match your needs. It provides highly customizable virtual machines as well as managed engines for deploying your code directly or in containers:
• Google Compute Engine
• Google App Engine
• Google Kubernetes Engine
• Google Cloud Container Registry
• Cloud Functions
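As a small example of the serverless end of this range, the sketch below shows an HTTP handler in the style of a Python Cloud Function. In Cloud Functions the handler receives a Flask request object; this sketch relies only on `request.args.get`, so the hypothetical `FakeRequest` class lets it run and be tested locally without Flask. The function name `hello_http` and the `name` query parameter are illustrative assumptions, and the deployment step is not shown.

```python
def hello_http(request):
    """HTTP handler in the style of a Python Cloud Function.

    In Cloud Functions, `request` is a flask.Request; only `request.args.get`
    is used here, so any object exposing a mapping-like `args` works locally.
    """
    name = request.args.get("name", "World")  # read an optional ?name= query parameter
    return f"Hello, {name}!"


class FakeRequest:
    """Hypothetical local stand-in exposing only the `args` mapping."""

    def __init__(self, args: dict[str, str]) -> None:
        self.args = args


if __name__ == "__main__":
    print(hello_http(FakeRequest({"name": "GCP"})))
```

Keeping the handler free of direct Flask imports makes it easy to unit-test before deploying.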
GCP offers a broad range of networking services that use automation, advanced AI, and programmability to help enterprises connect, scale, secure, modernize, and optimize their infrastructure:
• Google Virtual Private Cloud (VPC)
• Google Cloud Load Balancing
• Content Delivery Network
• Google Cloud Interconnect
• Google Cloud DNS
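A common first task when setting up a VPC is planning subnet IP ranges. GCP VPC networks are global while their subnets are regional, so one pattern is to carve one non-overlapping range per region out of a larger address block. The sketch below does this with Python's standard `ipaddress` module; the `10.0.0.0/16` block, the `/20` subnet size, and the region list are illustrative assumptions, and `plan_subnets` is a hypothetical helper rather than a GCP API.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, regions: list[str], new_prefix: int = 20) -> dict[str, str]:
    """Assign one non-overlapping subnet of size /new_prefix to each region."""
    network = ipaddress.ip_network(vpc_cidr)
    candidates = network.subnets(new_prefix=new_prefix)  # generator of consecutive blocks
    plan: dict[str, str] = {}
    for region in regions:
        try:
            plan[region] = str(next(candidates))
        except StopIteration:
            raise ValueError(
                f"{vpc_cidr} has too few /{new_prefix} blocks for {len(regions)} regions"
            ) from None
    return plan


if __name__ == "__main__":
    print(plan_subnets("10.0.0.0/16", ["us-central1", "europe-west1"]))
```

A /20 holds 4,096 addresses, so consecutive /20 blocks advance the third octet by 16 (10.0.0.0/20, then 10.0.16.0/20).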
Various data processing services offered by GCP include the following.
• Dataproc
• Dataflow
• Dataprep
• Data Catalog
• BigQuery
• Data Fusion
• Pub/Sub
• Cloud Composer
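Dataflow runs Apache Beam pipelines and Dataproc runs Spark and Hadoop clusters; both build on the same map, group-by-key, and combine pattern. The plain-Python word count below is a stand-in that shows that pattern step by step. It does not use Beam or Spark, and `word_count` is a hypothetical name.

```python
from collections import defaultdict


def word_count(lines: list[str]) -> dict[str, int]:
    """Count words using explicit map, group-by-key, and combine steps."""
    # Map: emit a (word, 1) pair for every word, like a FlatMap over each line.
    pairs = [(word, 1) for line in lines for word in line.lower().split()]

    # Group by key: collect all values emitted for the same word.
    grouped: dict[str, list[int]] = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)

    # Combine: reduce each group to a single total.
    return {word: sum(ones) for word, ones in grouped.items()}


if __name__ == "__main__":
    print(word_count(["The cat", "the dog"]))
```

In a distributed engine, the map and combine steps run in parallel across workers, and the group-by-key step is where data is shuffled between them.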
GCP offers a wide range of AI and ML services, from pre-built models to models built from scratch, including the following:
• Vertex AI
• Pre-built APIs
• Custom models
• AutoML
Identity and Access Management (IAM) covers security and access control, and includes the following:
• Built-in roles
• Custom roles
• Service account
• Permissions
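The relationship between these pieces can be sketched as a toy model: a role is a named set of permissions (named in the `service.resource.verb` pattern), and a binding grants a role to members such as users or service accounts. The Python below is a hypothetical illustration, not the IAM API; the role ID, project, and member names are made up, and real IAM also evaluates inheritance through the resource hierarchy, IAM conditions, and deny policies, all of which this sketch ignores.

```python
from dataclasses import dataclass, field

# Hypothetical custom role: a named collection of permissions.
CUSTOM_ROLES: dict[str, set[str]] = {
    "projects/my-project/roles/reportReader": {  # hypothetical role ID
        "storage.objects.get",
        "storage.objects.list",
    },
}


@dataclass
class Binding:
    """Grants one role to a set of members (users, groups, or service accounts)."""

    role: str
    members: set[str] = field(default_factory=set)


def has_permission(member: str, permission: str, bindings: list[Binding]) -> bool:
    """Return True if any role bound to `member` contains `permission` (toy model)."""
    return any(
        member in binding.members and permission in CUSTOM_ROLES.get(binding.role, set())
        for binding in bindings
    )


if __name__ == "__main__":
    sa = "serviceAccount:report-job@my-project.iam.gserviceaccount.com"  # hypothetical
    policy = [Binding("projects/my-project/roles/reportReader", {sa})]
    print(has_permission(sa, "storage.objects.get", policy))
```

Granting permissions through roles rather than individually is what lets a custom role be reused across many members and projects.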
The developer domain includes services related to development:
• Cloud SDK
• Deployment Manager
• Cloud Source Repositories
• Cloud Test Lab
In a data-driven world, Google Cloud Platform offers a wide range of solutions that can help organizations meet their business needs and achieve their objectives. These solutions are secure, durable, scalable, highly available, and optimized.