I am building a web app with the following process.
1) User registers.
2) After the user registers, I run a queued job that scrapes 60k+ rows of customer data. The data comes from a 3rd party API, which I call with curl.
3) After scraping the data, I store it in the database.
4) The 3rd party API paginates its results, so I check each response for a nextPageUrl; if it is present, I curl again, store that page's customer data, and repeat until the response has no nextPageUrl.
//this is pseudo code
RegisterUser($user);
CallThirdPartyAPI();

function RegisterUser($user){
    insert_in_users_table($user);
}

function CallThirdPartyAPI($url = null){
    $response = get_customers_page($url); // one curl call per page
    foreach($response->customers as $cust){
        store_in_customers_table($cust);
    }
    // recurse once per page (not per customer) until there is no next page
    if($response->nextPageUrl){
        CallThirdPartyAPI($response->nextPageUrl);
    }
}
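For reference, the same pagination can be written iteratively instead of recursively, which avoids growing the call stack over many pages. This is only a sketch of the flow described above; the helper names (get_customers_page, store_in_customers_table) are placeholders carried over from the pseudocode, not real functions.

```php
<?php
// Sketch: follow nextPageUrl in a loop instead of recursing.
// get_customers_page() and store_in_customers_table() are hypothetical
// helpers standing in for the curl call and the DB insert.
function scrapeAllCustomers($startUrl = null)
{
    $url = $startUrl;
    do {
        $response = get_customers_page($url); // one curl request per page

        foreach ($response->customers as $cust) {
            store_in_customers_table($cust);  // write goes to the master DB
        }

        $url = $response->nextPageUrl ?? null; // null ends the loop
    } while ($url !== null);
}
```

A loop also makes it easier to checkpoint progress (e.g. persist the last nextPageUrl), so a job that dies mid-scrape can resume instead of restarting from page one.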
Now, as you can see, this is fine if only one user registers at a time. But with 100+ users registering, it becomes a problem: scraping takes 20-30 minutes per user, and my job queue runs only 2 jobs at a time, so those 2 jobs must finish before the other jobs can be executed.
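One lever on the concurrency limit is the supervisor config itself: numprocs controls how many worker processes run in parallel. The program name, command, and paths below are hypothetical, assuming a Laravel-style queue:work worker; adjust them to the actual setup.

```ini
; Sketch of a supervisor program block with more parallel workers.
; Paths and the worker command are assumptions, not the real setup.
[program:queue-worker]
command=php /var/www/app/artisan queue:work --sleep=3 --tries=3
process_name=%(program_name)s_%(process_num)02d
numprocs=8            ; raise from 2 to run more scrape jobs concurrently
autostart=true
autorestart=true
stopwaitsecs=3600     ; give a long-running scrape job time to finish cleanly
```

More workers only help if the bottleneck is worker count rather than the 3rd party API's rate limits or the database, so it is worth checking where the 20-30 minutes is actually spent.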
Now, I am looking for a better solution that would make the system more efficient.
Your suggestions will be greatly appreciated.
PS:
I am running job queuing through supervisor
I have a read replica set up for my database: I write to the master DB and read from the replica to reduce CPU usage on the DB.
via PinoyStackOverflower