Process Large Data from Excel File with a Breeze
Instead of first uploading the entire file to the server and then processing it, this library will save you some bandwidth by processing the file on the local machine and uploading the extracted data (which is what you actually want) to the server in a batch.
Because it does the upload in batches, issues such as connection timeout are taken care of. In addition to this, it reports the data that cause an exception on the server and make them available for download as an Excel file.
2. The Problem
I need to do a bulk upload of user records from an excel file ranging from 3000 to 16000 rows. Each excel record (a row) will be used to create a user account in the system. This involves trips to more than one database tables, sending email, data validation and one or more loops.
The first technique that comes to mind is to look for an Excel library, upload the excel file to the server, process the excel file and finally return success or error message — all in one HTTP Request!
The shortcomings of this method are:
- HTTP Request timeout, especially with programming languages that are not multithreaded
- Unreliable error reporting
and so on.
3. The Solution
On the server-side of things, the data received is processed in a
try ... catch block and data that raise an exception
is stored in an error array. Once, the server is done processing the current batch it sends an OK response to the client,
and alongside, the error data — if any.
One other technique that is key to this process is Database Transaction Management. All database interactions for a particular record is done in the confines of a DB transaction - on the server side. If an error occurred during database interaction, the transaction is rolled back otherwise it is committed.
In programming, there is usually more than one way to achieve a feat, each with its own shortcomings. However, this technique is reliable in terms of error reporting and HTTP connection management.
and include it in the project alongside jQuery,
FileSaver.js and optionally bootstrap 4 and
Note that Blob.js is required on older browsers for FileSaver.js to work.
Next, we’ll add this to the HTML page we want the excel upload to take place:
From the snippet above, we instantiate the
ExcelUploader object with configuration parameters.
Following is an explanation of each parameter and what it does.
maxInAGroup: This parameter controls how many records to be sent to the server per time. The default is 1000 so if there are 5000 records in the source excel file, the data will be uploaded to the server in 5 batches. It can be fine-tuned to our taste base on available resources.
serverColumnNames: This is an array of the columns the server is expecting. Every time excel_uploader pushes data to the server, it also sends a column_map.
The column map is generated by using the lowercase form of the elements in
as the key and the index of the corresponding excel column as value.
For example, if the server is expecting the uploaded excel data to contain 3 columns —
The user will first upload an excel file, then map which column represents what data in the Excel.
Then excel_uploader will compose the server column names using the lowercase of the
replacing white spaces with an underscore (_) character.
So the server at the end will receive a JSON payload like this:
Which can be interpreted as the column name is at index zero (0) of the data, email is at index two (2) and phone_number is at index 1.
importTypeSelector: This is the jQuery selector for the HTML select field. The value of the Select options is the actual URL to send the excel data to if that option is selected. This way it’s possible to use the same page for multiple uploads. Just add the destination URL as an option in the Import Type Selector.
fileChooserSelector: This is the jQuery selector for the HTML file input field. For actually choosing the excel file from local storage by the user.
outputSelector: This is the jQuery selector for the HTML
div that is used for displaying progress, status and error messages.
extraData: This is the data to be sent to the server alongside the uploaded data and column mapping. For example, CSRF token. It is an optional parameter.
Now that we have looked at the options, we must have noticed we need to add some markups to our web page for this to work.
Hence the need for the following HTML:
The complete HTML can be found on the demo.html page.
NOTE: The respective Ids of the HTML elements were used to configure the
That’s all we need to do on the front-end. Simple isn’t it?
5. A Little Bit of Server-Side Processing
excel_uploader.js library is server-side agnostic! It will work with almost any server-side technology.
It only needs to know the URL to push data via the
importTypeSelector and that’s all.
It will push data to the server in a
POST request using jQuery ajax. The
POST request payload will contain two compulsory entries:
data: This an array of an array - representing the data from the Excel file uploaded by the user. It is in JSON format and should be decoded to a key-value data structure such as an associative array in PHP or a Map in Java or its equivalent in other server-side programming languages.
The outer array represents the whole, while the inner array represents the data on each row of the Excel file.
Hence the number of the length of
data will is the number of columns mapped during upload while the length of data array itself
is the total number of rows/records available in the uploaded file.
This is an example data received in a post request:
5.1. Accessing the Data
We understood, from the previous section, that the
column_map associate column names to their corresponding index in the arrays contained in data.
We can, therefore, access the data by parsing the JSON payload to a key-value data structure like an associative array (PHP) or a Map (Java).
In PHP CodeIgniter, we can do this:
If you noticed, from the snippet above, we did not have to memorize the corresponding index of each column
data array. We simply use the
column_map which is much easier.
The same thing is applicable if the server-side is using Java programming language in a framework like Spring Boot:
5.2. Processing the Data
One of the best ways to handle the incoming
POST data is to have two methods (or functions).
One in which the
POST payload will be decoded and the data array will be looped over let’s call it
In this function, there will be an
$errorArray variable which itself must be an array of array.
The data array, decoded from the
POST payload, will be looped over and each element of that
data (corresponding to a row on the excel file) will be processed by another method
doUserUpload will perform database interactions in a
try...catch block wrapped in a database transaction.
If any of the database interaction fails, it will rollback the database changes and return
false otherwise it returns
false, the particular record/row under processing will be added to
Example implementation of
uploadUserData() in PHP:
5.3. Server Response
The key step to this flow is the server response. The server is expected to respond in JSON and the
content-type should be
If any of the uploaded data result in an error while processing, and it’s added to the
the resulting error Array should be sent to the
excel_uploader.js, otherwise, a success message should be sent:
Or in Java:
If we noticed an invalid data is being sent and we’ll like to notify the user,
we can send a JSON response with an error entry and a
200 status code - because we actually handled the error ourselves:
Note: the JSON response should have a STATUS code of
excel_uploader.js to be able to process it.
If there were any other server-generated errors, they will be handled in the
fails() callback of the ajax method.
6. The Soup is Ready!
The good thing is that we can use the same page to upload data to many endpoints even with an additional payload.
Just add the URL as another option in the
importTypeSelector and we are good.
I want to hear about your experiences integrating this simple tool into your project. If you find any bug, kindly report it to Github.
Github: [https://github.com/SeunMatt/excel_uploader](Github: https://github.com/SeunMatt/excel_uploader)