Process Large Data from Excel File with a Breeze
1. Overview
Reliably bulk-uploading data from Excel files can be a tough nut to crack at times. Here is how to tame the beast with a simple JavaScript library.
Instead of first uploading the entire file to the server and then processing it there, this library saves you some bandwidth by processing the file on the local machine and uploading only the extracted data (which is what you actually want) to the server in batches.
Because it does the upload in batches, issues such as connection timeouts are taken care of. In addition, it reports the records that caused an exception on the server and makes them available for download as an Excel file.
2. The Problem
I needed to do a bulk upload of user records from an Excel file ranging from 3,000 to 16,000 rows. Each Excel record (a row) is used to create a user account in the system. This involves trips to more than one database table, sending emails, data validation, and one or more loops.
The first technique that comes to mind is to look for an Excel library, upload the Excel file to the server, process it there, and finally return a success or error message, all in one HTTP request!
The shortcomings of this method are:
- HTTP request timeouts, especially with programming languages that are not multithreaded
- Unreliable error reporting
and so on.
3. The Solution
What I did was extract the data with JavaScript in the client browser and push it to the server in batches.
On the server side of things, each batch is processed in a try ... catch block, and records that raise an exception are stored in an error array. Once the server is done processing the current batch, it sends an OK response to the client, along with the error data, if any.
One other technique that is key to this process is database transaction management. All database interactions for a particular record are done within the confines of a DB transaction on the server side. If an error occurs during a database interaction, the transaction is rolled back; otherwise it is committed.
In programming, there is usually more than one way to achieve a goal, each with its own shortcomings. However, this technique is reliable in terms of error reporting and HTTP connection management.
4. Using The excel_uploader.js JavaScript Library
I implemented the solution above in a simple JavaScript file called excel_uploader.js.
To use the library in a project, we need to get it from its GitHub repo and include it alongside jQuery, xlsx.min.js, FileSaver.js, and optionally Bootstrap 4 and sweetalert.js:
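A typical set of includes might look like the following sketch; the exact paths and versions are illustrative and should be adjusted to where the files live in your project:

```html
<!-- Illustrative include order; adjust paths and versions to your project -->
<link rel="stylesheet" href="css/bootstrap.min.css"> <!-- optional -->
<script src="js/jquery.min.js"></script>
<script src="js/xlsx.min.js"></script>
<script src="js/FileSaver.js"></script>
<script src="js/sweetalert.min.js"></script> <!-- optional -->
<script src="js/excel_uploader.js"></script>
```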
Note that Blob.js is required on older browsers for FileSaver.js to work.
Next, we’ll add this to the HTML page we want the excel upload to take place:
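Based on the parameters described below, the instantiation might look like this minimal sketch; the selector values and the upload URL are placeholders, not part of the library:

```javascript
// Illustrative configuration; the ids and URL are placeholders
$(document).ready(function () {
    new ExcelUploader({
        maxInAGroup: 1000,
        serverColumnNames: ["Name", "Email", "Phone Number"],
        importTypeSelector: "#importType",
        fileChooserSelector: "#fileUpload",
        outputSelector: "#output"
    });
});
```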
From the snippet above, we instantiate the ExcelUploader object with configuration parameters. Following is an explanation of each parameter and what it does.
maxInAGroup: This parameter controls how many records are sent to the server at a time. The default is 1000, so if there are 5000 records in the source Excel file, the data will be uploaded to the server in 5 batches. It can be fine-tuned to taste based on available resources.
serverColumnNames: This is an array of the columns the server is expecting. Every time excel_uploader pushes data to the server, it also sends a column_map. The column map is generated by using the lowercase form of the elements in serverColumnNames as keys and the index of the corresponding Excel column as values.
For example, say the server expects the uploaded Excel data to contain three columns: Name, Email, and Phone Number. The user will first upload an Excel file, then map which column represents which data in the Excel file. excel_uploader will then compose the server column names using the lowercase form of serverColumnNames, replacing white spaces with an underscore (_) character.
So the server at the end will receive a JSON payload like this:
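Continuing the Name/Email/Phone Number example, the column_map portion of the payload would look like this sketch:

```json
{
    "column_map": {
        "name": 0,
        "email": 2,
        "phone_number": 1
    }
}
```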
This can be interpreted as: the name column is at index zero (0) of each data row, email is at index two (2), and phone_number is at index one (1).
importTypeSelector: This is the jQuery selector for the HTML select field. The value of each select option is the actual URL to send the Excel data to when that option is selected. This way it's possible to use the same page for multiple uploads; just add the destination URL as an option in the import type selector.
fileChooserSelector: This is the jQuery selector for the HTML file input field with which the user chooses the Excel file from local storage.
outputSelector: This is the jQuery selector for the HTML div that is used for displaying progress, status, and error messages.
extraData: This is the data to be sent to the server alongside the uploaded data and column mapping, for example a CSRF token. It is an optional parameter.
Now that we have looked at the options, you may have noticed that we need to add some markup to our web page for this to work.
Hence the need for the following HTML:
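A minimal page skeleton could look like the following sketch; the element ids are illustrative and simply need to match the jQuery selectors passed to ExcelUploader:

```html
<!-- Illustrative markup; the ids must match the configured jQuery selectors -->
<select id="importType">
    <option value="">Select Import Type</option>
    <option value="/users/upload">Import Users</option>
</select>

<input type="file" id="fileUpload" />

<div id="output"></div>
```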
The complete HTML can be found on the demo.html page.
NOTE: The respective ids of the HTML elements were used to configure the ExcelUploader object in the JavaScript snippet above.
That's all we need to do on the front-end. Simple, isn't it?
5. A Little Bit of Server-Side Processing
The excel_uploader.js library is server-side agnostic! It will work with almost any server-side technology.
It only needs to know the URL to push data to, via the importTypeSelector, and that's all.
It will push data to the server in a POST request using jQuery AJAX. The POST request payload will contain two compulsory entries: data and column_map.
data: This is an array of arrays representing the data from the Excel file uploaded by the user. It is in JSON format and should be decoded into a key-value data structure such as an associative array in PHP, a Map in Java, or its equivalent in other server-side programming languages.
The outer array represents the whole dataset, while each inner array represents the data on one row of the Excel file.
Hence the length of data[0] is the number of columns mapped during upload, while the length of the data array itself is the total number of rows/records available in the uploaded file.
This is an example of the data received in a POST request:
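Sticking with the three-column example, the decoded payload could look like the following sketch; the row values are made up for illustration:

```json
{
    "data": [
        ["John Doe", "08011111111", "john.doe@example.com"],
        ["Jane Doe", "08022222222", "jane.doe@example.com"]
    ],
    "column_map": {
        "name": 0,
        "email": 2,
        "phone_number": 1
    }
}
```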
5.1. Accessing the Data
We learned from the previous section that the column_map associates column names with their corresponding indexes in the arrays contained in data. We can therefore access the data by parsing the JSON payload into a key-value data structure like an associative array (PHP) or a Map (Java).
In PHP CodeIgniter, we can do this:
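A sketch of what that could look like in a CodeIgniter 3 controller; the payload keys (data, column_map) come from the library, while everything else here is illustrative:

```php
<?php
// Illustrative CodeIgniter 3 controller excerpt
$data      = json_decode($this->input->post("data"), true);
$columnMap = json_decode($this->input->post("column_map"), true);

foreach ($data as $row) {
    // Use the column map instead of hard-coded indexes
    $name  = $row[$columnMap["name"]];
    $email = $row[$columnMap["email"]];
    $phone = $row[$columnMap["phone_number"]];
    // ... process the record ...
}
```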
Notice from the snippet above that we did not have to memorize the corresponding index of each column in the data array. We simply use the column_map, which is much easier.
The same approach is applicable if the server side uses the Java programming language in a framework like Spring Boot:
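A hypothetical Spring Boot controller could parse the same payload with Jackson; the endpoint path, parameter types, and response shape here are assumptions for illustration:

```java
// Illustrative Spring Boot controller excerpt; the path and types are assumptions
@PostMapping("/users/upload")
public ResponseEntity<Map<String, Object>> uploadUserData(
        @RequestParam("data") String dataJson,
        @RequestParam("column_map") String columnMapJson) throws IOException {

    ObjectMapper mapper = new ObjectMapper();
    List<List<String>> data =
            mapper.readValue(dataJson, new TypeReference<List<List<String>>>() {});
    Map<String, Integer> columnMap =
            mapper.readValue(columnMapJson, new TypeReference<Map<String, Integer>>() {});

    for (List<String> row : data) {
        // Use the column map instead of hard-coded indexes
        String name  = row.get(columnMap.get("name"));
        String email = row.get(columnMap.get("email"));
        String phone = row.get(columnMap.get("phone_number"));
        // ... process the record ...
    }
    return ResponseEntity.ok(Collections.singletonMap("message", "Operation completed"));
}
```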
The complete demo code for PHP and Java can be found here.
5.2. Processing the Data
One of the best ways to handle the incoming POST data is to have two methods (or functions).
In the first one, let's call it uploadUserData(), the POST payload is decoded and the data array is looped over. In this function, there will be an $errorArray variable, which itself must be an array of arrays.
Each element of the decoded data array (corresponding to a row of the Excel file) is processed by another method, doUserUpload().
The method doUserUpload() will perform the database interactions in a try...catch block wrapped in a database transaction. If any of the database interactions fails, it rolls back the database changes and returns false; otherwise it returns true.
If doUserUpload() returns false, the particular record/row under processing is added to $errorArray.
Here is an example implementation of uploadUserData() in PHP:
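A sketch of such an implementation, following the description above; doUserUpload() is the per-record method described earlier, and its exact signature is an assumption:

```php
<?php
// Illustrative implementation; assumes a doUserUpload() method as described above
public function uploadUserData()
{
    $data       = json_decode($this->input->post("data"), true);
    $columnMap  = json_decode($this->input->post("column_map"), true);
    $errorArray = [];

    foreach ($data as $row) {
        if (!$this->doUserUpload($row, $columnMap)) {
            // Keep the failed record so it can be sent back to the client
            $errorArray[] = $row;
        }
    }

    // respond with $errorArray or a success message (see Server Response below)
}
```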
The complete example server-side processing can be found in demo.php. The corresponding Java demo can also be found here.
5.3. Server Response
The key step in this flow is the server response. The server is expected to respond in JSON, and the content-type should be application/json.
If any of the uploaded records results in an error while processing and is added to the errorArray, the resulting error array should be sent back to excel_uploader.js; otherwise, a success message should be sent:
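In PHP, that response could be produced like this; the exact JSON keys excel_uploader.js expects should be confirmed against the demo code, so errors and message here are assumptions:

```php
<?php
// Illustrative response handling; the JSON keys are assumptions
header("Content-Type: application/json");

if (!empty($errorArray)) {
    echo json_encode(["errors" => $errorArray]);
} else {
    echo json_encode(["message" => "All records processed successfully"]);
}
```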
Or in Java:
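A Spring Boot equivalent might look like this; again, the JSON keys are assumptions to be confirmed against the demo code:

```java
// Illustrative Spring Boot response handling; the JSON keys are assumptions
if (!errorArray.isEmpty()) {
    // HTTP 200 with the failed records so the client can offer them for download
    return ResponseEntity.ok(Collections.singletonMap("errors", errorArray));
}
return ResponseEntity.ok(Collections.singletonMap("message", "All records processed successfully"));
```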
If we notice that invalid data is being sent and we'd like to notify the user, we can send a JSON response with an error entry and a 200 status code, because we actually handled the error ourselves:
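Such a handled-error response could look like the following sketch; the error key and message are assumptions:

```json
{
    "error": "The uploaded file contains an invalid phone number on row 5"
}
```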
Note: the JSON response should have a status code of 200 for excel_uploader.js to be able to process it.
Any other server-generated errors will be handled in the fail() callback of the AJAX method.
6. The Soup is Ready!
With just a little bit of JavaScript, a sprinkle of config and HTML, we are good to go.
The good thing is that we can use the same page to upload data to many endpoints, even with an additional payload. Just add the URL as another option in the importTypeSelector and we are good.
I want to hear about your experiences integrating this simple tool into your projects. If you find any bug, kindly report it on GitHub.
GitHub: [https://github.com/SeunMatt/excel_uploader](https://github.com/SeunMatt/excel_uploader)