We use BigQuery to store data in tables and query it. A table can be either a native (internal) table or an external table that points to data in Google Cloud Storage or Google Drive.
Wherever we have external storage, we can build a table that points to it and query that table. In BigQuery, we can query tables and even run DML operations, such as INSERT and DELETE.
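To make the external-table idea concrete, here is a minimal sketch that builds BigQuery DDL for a table over CSV files in a Cloud Storage bucket. The helper name external_table_ddl and the project, dataset, bucket, and path values are hypothetical placeholders. The generated statement uses BigQuery's CREATE EXTERNAL TABLE ... OPTIONS (format, uris) form and omits a column list, assuming schema inference is acceptable for your files.

```python
def external_table_ddl(table: str, uris: list[str], fmt: str = "CSV") -> str:
    """Return CREATE EXTERNAL TABLE DDL for files in Cloud Storage.

    `table` is a fully qualified project.dataset.table name and `uris` are
    gs:// paths. Every value passed in below is a hypothetical placeholder.
    No column list is emitted, so BigQuery is left to infer the schema.
    """
    # Quote each URI as a SQL string literal (no escaping: assumes trusted,
    # quote-free paths) and join them into an array literal.
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS (format = '{fmt}', uris = [{uri_list}])"
    )


# Hypothetical project, dataset, and bucket names.
ddl = external_table_ddl("my-project.sales.orders_ext", ["gs://my-bucket/orders/*.csv"])
print(ddl)
```

The resulting statement can be run like any other query, after which the external table is queried with ordinary SELECT statements.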
The main thing I like about BigQuery is storage. We did an on-premises-to-BigQuery migration involving trillions of records. On-premises, we usually have to deal with insufficient storage, but in BigQuery we don't, because the storage is in the cloud and we can store practically any number of records. That is one advantage.
The next major advantage is column length. On-premises, we have limits on column length, such as 10,000, and we have to design the schema around them. With BigQuery, we don't need to define column lengths at all; they expand or shrink based on the data that comes in.
I can give you a real-life example from our migration from on-premises to GCP. There was a dimension table with a large number of records, and querying it on-premises, in Apache Spark or Teradata, took around half an hour. In BigQuery, the same query returned in two or three minutes, which felt almost instant by comparison. That was very helpful for our engineers.
On-premises, we usually have to run a query and take a break while waiting for the results. That isn't the case with BigQuery, which returns results almost immediately. That makes our work much faster and saves a lot of time.
It also has good performance and smart tuning. For example, when we need to perform joins, BigQuery's smart tuning optimizes the query by itself and shows us the best way the query can be executed in the backend.
To be frank, performance, reliability, and everything else have improved, including downtime. On-premises servers usually have some downtime, but because BigQuery is multi-regional, our data is stored in three different locations, so downtime doesn't affect us.
For example, if the Atlantic location has downtime or its server goes down, we can use the data stored in Africa or somewhere else. We have three or four storage locations, and that's the main advantage.
It would be better if BigQuery didn't have such heavy restrictions on characters. For example, when we migrated from on-premises to GCP, data containing all kinds of characters could be handled on-premises, but BigQuery is much more restrictive. If the data contains some symbols, like a hash or other special characters, it won't accept them. Not in all cases, but it rejects a few special characters, and we get errors during the migration.
We need to use a regular expression or something similar to replace those characters with other ones. This isn't what we expect from a high-end technology like BigQuery; it should be able to handle any product name. For instance, if we have a TV showroom, the TV symbol will appear in the shop name. Teradata and Apache Spark accept this, but BigQuery doesn't. This was our primary concern.
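A minimal sketch of that regex workaround, applied to the data before loading it, is below. The allow-list of characters and the underscore replacement are assumptions for illustration, since the exact characters that caused the load errors aren't listed here; the pattern should be widened or narrowed to match what the load job actually rejects.

```python
import re

# Hypothetical allow-list: letters, digits, space, and a few punctuation marks.
# Any other character (e.g. "#" or a non-ASCII symbol) is treated as unsafe.
DISALLOWED = re.compile(r"[^A-Za-z0-9 .,&'()-]")


def sanitize(value: str, replacement: str = "_") -> str:
    """Replace every character outside the allow-list with `replacement`."""
    return DISALLOWED.sub(replacement, value)


# Example shop names: one with a hash, one starting with a TV emoji.
rows = ["Shop #42", "\U0001F4FA TV Showroom"]
print([sanitize(r) for r in rows])
```

Replacing rather than dropping characters keeps string lengths and positions stable, which makes it easier to trace a cleaned value back to its source row; passing replacement="" deletes them instead.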
In the next release, it would be good if queries on external tables were also cached. Right now, our external tables point to a GCS bucket, and only native tables have a cache. For example, if we run the same query on a native table again, it doesn't cost anything, because BigQuery fetches the records from the cached results. But when we run queries on an external table several times, the results aren't cached. That's a major drawback of BigQuery. If external tables had a caching option, it would be a significant advantage for our organization.
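Until external tables get a result cache, one hypothetical client-side stopgap is to memoize results in the application. The sketch below assumes you supply a run_query function that actually executes SQL (for example through a BigQuery client library); QueryResultCache, run_query, and the 24-hour default TTL are illustrative choices, not BigQuery features.

```python
import time
from typing import Callable, Dict, List, Tuple


class QueryResultCache:
    """In-process memo of query results, keyed on the exact SQL text.

    Hypothetical workaround: repeated identical queries are answered from
    this dict until the stored entry is older than `ttl_seconds`.
    """

    def __init__(
        self,
        run_query: Callable[[str], List[tuple]],
        ttl_seconds: float = 24 * 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query  # assumed function that really runs SQL
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[tuple]]] = {}

    def query(self, sql: str) -> List[tuple]:
        now = self._clock()
        hit = self._entries.get(sql)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # cache hit: no new query is sent
        rows = self._run_query(sql)  # miss or expired: run and store
        self._entries[sql] = (now, rows)
        return rows
```

The trade-off is staleness: if the files in the bucket change, this cache keeps returning the old rows until the entry expires, so the TTL should be short for data that is updated often. The key is the exact SQL string, so queries that differ only in whitespace are cached separately.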
I have been using BigQuery for more than three years.
BigQuery is a stable solution.
BigQuery is highly scalable. Storage is practically unlimited, and it's very fast: whether we have 20 records or scale up to 20 trillion, it stays fast.
In my organization, about two in five people use BigQuery. When I joined the company a year ago, usage was relatively moderate, but it has increased because of the on-premises-to-GCP migration. Thanks to many successful projects, more people are using BigQuery now.
We have dedicated support people who help us with the framework. If there is a technical issue in BigQuery, we just get help from the technical team. But if there are any engineering issues or some data issues, our team will handle them.
I used Teradata and then Apache Spark on-premises.
The initial setup is relatively straightforward. There are some restrictions, such as the project ID, which has to be unique. Once the project is created, we can simply open the query editor and start creating tables, loading data, querying, and everything else. That's quite simple and straightforward.
When I joined PayPal, the setup had been done in-house. At another organization, Cognizant, we had Google's help: a Google specialist helped us with the setup and everything else.
I have also tried a setup of my own using my Gmail ID. I think Google offers new users $300 in free credit, so we can register and create a project.
On a scale from one to ten, I would give BigQuery an eight.