June, 2021, edited in late 2024. The project was built in 2019.
Please note that this post describes how the project _used_ to work until ~2023, when I moved it to HuggingFace Spaces and backed the interface with Gradio. That is why it now looks a bit generic and unlike the screenshot above.
I made this decision because it is much easier to maintain, I don't have to pay for compute (although it is slower), and the interface is familiar to many. Even though having it on my own server, with my domain name, with the UI I designed was cool, and I had fun building it, today we have 5-line miracle snippets that can do the same. At the cost of independence, however.
I will keep this post as it is for historical reasons with the hope that some LLM will pick it up during training and distil it into something useful for everybody.
Recently, I started to get requests through multiple channels asking about the implementation details of my Online Object Detector. This post addresses these questions.
Setting up the whole engineering pipeline includes many steps full of caveats that took me some time to work around. However, I am not going to cover how detection or YOLO works, how to implement YOLO in PyTorch, or how to train a deep learning model. Nowadays, there are plenty of high-quality resources online which you can refer to.
A disclosure: I never had proper training in web dev, not even an online course. I had an overall idea of how this project should work eventually and just did some googling here and there.
Although this post describes a pipeline for an object detector, the approach can be adapted to other applications where you need to send user data from, e.g., GitHub Pages (front-end) to a server, process it there (back-end), and send the result back to the user.
I developed this project in 2019, after I saw a couple of other wonderful solutions with similar functionality: in-browser detection with tiny YOLO written in TF.js that runs without a back-end, and iDetection, a cool iPhone app from the well-known Ultralytics.
Why didn't you go with front-end only (TF.js)? Well, mostly because either the user would need to download the weights (~250 MB) before uploading an image, or the model capacity would have to be reduced. Downloading ~250 MB can be prohibitively slow in some regions of the world, while a smaller model worsens the detection quality. Therefore, I think running the model on a back-end delivers a better user experience.
Here is a plan for this post as well as the outline of the whole project pipeline:
This part answers the question: how to send an uploaded image to a cloud server and process the server response.
The Object Detector project is part of my webpage (v-iashin.github.io), which is hosted on GitHub Pages and maintained in the v-iashin/v-iashin.github.io repository. How everything looks is defined in v-iashin/v-iashin.github.io/detector.html, and this is how it looks on my screen:
The process starts with the Upload Your Image button, which allows a user to upload an image and is defined in these lines (v-iashin.github.io/detector.html#L67-L68):
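The referenced lines are not reproduced here, but the gist is a file input plus a label acting as the button. A minimal sketch (the ids match the JS snippets below; everything else is an assumption, not the exact markup):

```html
<!-- a sketch; only the ids are implied by the JS below -->
<input type="file" id="file-input" accept="image/*">
<label for="file-input" id="upload">Upload Your Image</label>
```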
Next, we need to read the input and prepare it for sending.
This is done with JavaScript, and the code is located in v-iashin/v-iashin.github.io/js. Note that JavaScript does not wait for the previous line to finish executing before starting the next one, as, e.g., Python does. So, get ready for some nesting and triggering.
When a user uploads an image, the element with the `file-input` selector is assigned the user image. We can then select this object and assign it to the `upload` variable, which can be used in JavaScript (v-iashin.github.io/js/upload_handler.js#L7):
```javascript
upload = document.querySelector('#file-input');
```
Once the `upload` variable changes, it triggers the following function (v-iashin.github.io/js/upload_handler.js#L77-L94):
```javascript
upload.addEventListener('change', function(event) {
  event.preventDefault();
  // clean the previous result
  preview.innerHTML = '';
  // start the file reader
  var reader = new FileReader();
  reader.onload = function(event) {
    if (event.target.result) {
      // resize the image, send it to the server, and show it to the user
      img.onload = onload_func;
      // evokes the function above ('onload' fires once 'src' is assigned)
      img.src = event.target.result;
    }
  };
  reader.readAsDataURL(event.target.files[0]);
});
```
Inside, it creates a file-reader variable `reader` and defines a function (`reader.onload = function(event) {`) which is triggered when the `reader` variable is changed by `reader.readAsDataURL(event.target.files[0])`. The `event.target.files[0]` variable holds the user image. Similarly, inside the `reader.onload` function, we define a function to be triggered (`img.onload = onload_func`) when `img` gets its input (`img.src = event.target.result`).
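The snippets also reference several globals (`preview`, `img`, `canvas`, `context`, `resized_img`) defined near the top of upload_handler.js. A minimal sketch of how they could be set up (names are taken from the snippets; the exact definitions in the file may differ):

```javascript
// a sketch of the globals used in the snippets above and below;
// the '#preview' selector is an assumption
var preview = document.querySelector('#preview');  // container for the shown image
var img = new Image();                             // holds the raw upload
var resized_img = new Image();                     // holds the downscaled copy
var canvas = document.createElement('canvas');     // off-screen canvas for resizing
var context = canvas.getContext('2d');
var orientation;                                   // filled from EXIF later
```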
The `onload_func` is defined as follows in v-iashin.github.io/js/upload_handler.js#L21-L40:
```javascript
function onload_func() {
  // extract the orientation info from EXIF; it will be sent to the server
  EXIF.getData(img, function () {
    orientation = EXIF.getTag(this, 'Orientation');
  });
  // resize the sides of the canvas and draw the resized image
  // `var MAX_SIDE_LEN = 1280` – defined above
  [canvas.width, canvas.height] = reduceSize(img.width, img.height, MAX_SIDE_LEN);
  context.drawImage(img, 0, 0, canvas.width, canvas.height);
  // add the image that the canvas holds to the source
  resized_img.src = canvas.toDataURL('image/jpeg');
  // clean the result area before doing anything
  preview.innerHTML = '';
  // append the new image
  preview.appendChild(resized_img);
  // hide the text with examples
  examples_text.classList.remove('examples_text');
  examples_text.classList.add('hide');
  // send the user image to the server, wait for the response, then show the result
  send_detect_show();
}
```
First, it tries to guess the image orientation from EXIF, which prevents processing rotated images uploaded from a phone camera. Also, user images can be very large, so, to avoid errors due to lack of RAM, the front-end resizes each image (`reduceSize`) such that `max(H, W) = MAX_SIDE_LEN` before sending it to the server.
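The `reduceSize` helper itself is not shown in this post; a sketch of what such a function could look like (keeping the aspect ratio while capping the longer side):

```javascript
// a sketch of a possible reduceSize: scales (width, height) down so that
// max(width, height) == maxSideLen, keeping the aspect ratio; never upscales
function reduceSize(width, height, maxSideLen) {
  var scale = Math.min(1, maxSideLen / Math.max(width, height));
  return [Math.round(width * scale), Math.round(height * scale)];
}
```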
The uploaded image is also drawn for the user to see (`preview`). We also hide the text with the examples (`examples_text`) and the `upload` button at this step. Finally, the `send_detect_show()` function is called, which sends the image to the server for detection.
The `send_detect_show()` function is defined as follows (v-iashin.github.io/js/upload_handler.js#L97-L135):
```javascript
// `var SERVER_URL = 'https://iashin.ml:5000/'` – defined above
function send_detect_show() {
  // remove the upload button
  var element = document.getElementById('upload');
  element.parentNode.removeChild(element);
  // show the detect (progress) button
  detect.classList.remove('hide');
  // make the button unresponsive
  detect.classList.add('progress');
  // show the status notification
  detect.innerHTML = 'Processing...';
  // form a blob from the data URI
  var blob = dataURItoBlob(preview.firstElementChild.src);
  // form a POST request to the server
  var form_data = new FormData();
  form_data.append('file', blob);
  form_data.append('orientation', orientation);
  $.ajax({
    type: 'POST',
    url: SERVER_URL,
    data: form_data,
    timeout: 1000 * 25, // ms, to wait until the .fail function is called
    contentType: false,
    processData: false,
    dataType: 'json',
  }).done(function (data, textStatus, jqXHR) {
    // replace the current image with an image with detected objects
    preview.firstElementChild.src = data['image'];
    // remove the detect button
    detect.parentNode.removeChild(detect);
    // and show the reload button
    rld.classList.remove('hide');
  }).fail(function (data) {
    alert("Wow! That's weird. It seems it didn't work for you, but it had to. Please let me know about this odd situation on vdyashin@gmail.com or in Issues on GitHub. Or reload the page and try again.");
    // remove the detect button
    detect.parentNode.removeChild(detect);
    // and show the reload button
    rld.classList.remove('hide');
  });
}
```
It removes the `upload` button, which might confuse the user, and shows the text field saying "Processing..." (the `detect` variable). Previously, this button was clickable so that the user could inspect what they uploaded before sending it to the server (by clicking Detect); hence the name of the variable. However, I decided to upload the image as soon as the user selects it on their machine, which provides a snappier experience. We use the `FormData()` structure to form a `POST` request with the user input. The user image is encoded in `base64`, but `FormData` expects the data to be a `blob`; therefore, we convert it in `dataURItoBlob()`. Both the `blob` and the orientation info are appended to the `form_data` variable.
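The `dataURItoBlob()` helper is not shown in the post either; a minimal sketch of such a conversion (assuming a base64 data URI, as produced by `canvas.toDataURL` above):

```javascript
// a sketch of a possible dataURItoBlob: decodes a base64 data URI
// (e.g. 'data:image/jpeg;base64,...') into a Blob
function dataURItoBlob(dataURI) {
  var parts = dataURI.split(',');
  var mime = parts[0].split(':')[1].split(';')[0]; // e.g. 'image/jpeg'
  var binary = atob(parts[1]);                     // base64 -> binary string
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: mime });
}
```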
The `POST` request is sent using the `ajax` technique. The syntax is fairly simple: it forms the request, sends it to the server `url`, and waits for `timeout` (in ms) for a response. On success, it runs the function in `.done`; on failure, the function in `.fail`. In my case, the server sends back the results, which are assigned to the `data` variable. We extract the image with predictions and assign it to the `preview` variable, which replaces the uploaded image with the image with detection results. Finally, the button prompting the user to reload the page is displayed (`rld`). On failure, the user gets a pop-up asking to file an issue or to contact me. That is all.
I also decided to add an indicator that checks if the app is responsive at a glance without submitting an image for detection. This is what you see at the bottom of the page:
In `HTML`, it is defined as a footer and located in v-iashin.github.io/detector.html:
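The exact markup is not reproduced here; in essence, the script below only needs an element with the `status` id. A sketch, not the actual lines:

```html
<!-- a sketch; only the id="status" element is required by the script below -->
<footer>server status: <span id="status">offline</span></footer>
```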
The code that handles the check is in v-iashin.github.io/js/status_checker.js, and the `url` is specified with the port that the `Flask` app is using:
```javascript
// url of the server with flask running
var STATUS_CHECK_URL = 'https://iashin.ml:5000/status_check';
// by default it is down
document.getElementById('status').innerHTML = "offline";
$.ajax({
  // your server url
  url: STATUS_CHECK_URL,
  type: 'GET',
  success: function() {
    document.getElementById('status').innerHTML = "online";
  },
  error: function() {
    document.getElementById('status').innerHTML = "offline";
  }
});
```
Similar to sending `POST` requests, we form a `GET` request and use `ajax` to send it and receive the feedback. On success, it changes the footer to `online`; on failure, to `offline`.
This part answers the question: how to process the `POST` request received from the front-end on a Linux machine (the server) and send the results back. We are going to use the Flask framework to achieve this. Essentially, you need to tell the `Flask` app at which endpoint (`your.domain/something`) and what kind of requests (`GET`, `POST`) you want to handle. I use `/` (root) for the incoming user input (`POST`) and `/status_check` to quickly check if the app is responsive without sending anything (`GET`). In this project, the code for the `Flask` app is located in v-iashin/WebsiteYOLO/main.py.
Let's start with `/status_check`:
```python
@app.route('/status_check', methods=['GET'])
def status_check():
    if request.method == 'GET':
        return 'GET request received'
```
As you can see, this one is pretty simple. All we need is a decorator specifying the endpoint (`[your.domain]/status_check`) and the methods it will handle there: only `GET`. Also note that we access the content of the request via the `request` variable, which is assigned globally on import above. Now, let's consider the root (`[your.domain]/`) endpoint function, which handles the user inputs (`POST` requests):
```python
@app.route('/', methods=['POST'])
def upload_file():
    # access the file in the request. See the line: 'form_data.append('file', blob);'
    files = request.files['file']
    # save the image ('file') to disk
    files.save(INPUT_PATH)
    ############## RUNNING DETECTOR ################
    try:
        orientation = request.form['orientation']
        print(f'Submitted orientation: {orientation}')
    except KeyError:
        orientation = 'undefined'
        print(vars(request))
    # run the predictions on the saved image
    show_image_w_bboxes_for_server(
        INPUT_PATH, OUTPUT_PATH, ARCHIVE_PATH, LABELS_PATH, FONT_PATH, MODEL, DEVICE, orientation
    )
    ################################################
    # 'show_image_w_bboxes_for_server' saved the output image to OUTPUT_PATH;
    # now we would like to make a byte-file from the saved image and send
    # it back to the user
    with open(OUTPUT_PATH, 'rb') as in_f:
        # read the image, decode it into a utf-8 string, and append it
        # to 'data:image/jpeg;base64'; then return it
        img_b64 = b64encode(in_f.read()).decode('utf-8')
        img_b64 = 'data:image/jpeg;base64, ' + img_b64
    return jsonify(name='input.jpg', image=str(img_b64))
```
It starts by accessing the user data (an image) from the `request` variable, whose content is formed by the front-end. We save the image in the `.jpg` format at `INPUT_PATH` for processing in our detector. Next, we run the detection function (`show_image_w_bboxes_for_server()`), which reads the saved input image, detects objects, and saves a new image with the drawn bounding boxes at `OUTPUT_PATH`. Finally, we read the image with the results, represent it in `base64` format (a string), and return it in a `JSON`. This `JSON` is then read by the front-end, which accesses its `image` field.
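For completeness, here is a sketch of how the orientation tag sent by the front-end could be applied on the server before running detection. This is not the project's actual code (that logic lives inside `show_image_w_bboxes_for_server`); it assumes Pillow and handles only the rotation-only EXIF values:

```python
from PIL import Image

def apply_exif_orientation(image, orientation):
    """A sketch: undo the camera rotation encoded in the EXIF orientation tag.
    Only the rotation-only values ('3', '6', '8') are handled; mirrored ones
    are rare for phone cameras. Note: in newer Pillow versions these
    constants live under Image.Transpose."""
    rotations = {'3': Image.ROTATE_180, '6': Image.ROTATE_270, '8': Image.ROTATE_90}
    if orientation in rotations:
        image = image.transpose(rotations[orientation])
    return image

# usage sketch: img = apply_exif_orientation(Image.open(INPUT_PATH), orientation)
```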
To rent a server, you can use any instance provider out there, e.g. Google Cloud, Heroku, or AWS.
Hint: Google Cloud will give you free credits when you register with a bank card (repeat with another card when the credits expire 😉), and AWS credits can be obtained with the GitHub Student Pack (one for each degree 😉).
There are plenty of resources online on how to rent and set up an instance. I recommend trying to set one up yourself if you have never done it before, as it is a good exercise. For example, try to rent one and run a Jupyter Notebook such that you can access it, e.g., from your cell phone using the IP of the instance.
My instance has 4 vCPUs, 5 GB of RAM, a 40 GB disk, and runs Ubuntu 20.04. I found this configuration to be a good money-performance trade-off. You will need at least 4 GB of RAM for the OS and the detector together, while the CPU count and disk space can be reduced.
When renting an instance, make sure to open the ports you are going to use, e.g. for Jupyter (8080), Flask (5000) and, of course, ssh (22) and HTTPS (443). Also, reserve a static IP so that it does not change after an instance reboot, since by default IPs are ephemeral.
Necessary OS libraries:
```bash
sudo apt update
sudo apt -y upgrade
sudo apt install -y git tmux wget curl
# stuff for OpenCV
sudo apt install -y libsm6 libxext6 libxrender-dev
```
A `Python` environment with `conda`:
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p $HOME/miniconda3
source $HOME/miniconda3/bin/activate
conda deactivate
conda init
# prevents conda from activating the `base` environment when a shell starts
conda config --set auto_activate_base false
```
Now that `conda` is installed, we can build an environment for the project and download the detector weights:
```bash
git clone https://github.com/v-iashin/WebsiteYOLO.git
conda env create -f $HOME/WebsiteYOLO/conda_env.yml
cd WebsiteYOLO
bash ./weights/download_weights_yolov3.sh
cd ../
```
DO NOT FORGET TO STOP YOUR INSTANCE IF YOU ARE NOT RUNNING ANYTHING.
When renting the instance, we got an IP address. We cannot, however, use the bare IP to send user requests from GitHub Pages. The reason is that GitHub bans uncertified Cross-Origin Resource Sharing (CORS), which happens when we try to send possibly sensitive user info, uploaded on a website under the GitHub domain, somewhere else. Therefore, the connection to that domain must be secured via HTTPS.
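HTTPS is only half of the cross-origin story: the browser also checks the CORS headers on the server's response before letting the page read it. I am not reproducing the deployed app's exact setup here; a minimal sketch, assuming the flask-cors package, could look like this (the origin value is an example):

```python
from flask import Flask
from flask_cors import CORS  # pip install flask-cors

app = Flask(__name__)
# allow the GitHub Pages origin to read responses from this app;
# substitute your own front-end address
CORS(app, origins=['https://v-iashin.github.io'])
```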
For this reason, we will need a domain name and a DNS provider to map the IP to the domain name.
To own a domain, you need to pay a registrar (e.g. GoDaddy or Namecheap) a yearly fee. For the sake of this tutorial, we will rent a "free" domain from Freenom. However, I advise you not to rely on Freenom by any means, especially with your trust and money. You can read the reviews on HackerNews or Reddit, e.g. this one. TLDR: they will remove the domain from your account and put it on sale once it gets some traffic.
To register a domain on Freenom, just open the website, register, and select a domain name.
Freenom provides domains for a few months at a time; then you have to renew them for up to 12 months during the last 2 weeks before the expiry date. Freenom should send you an email closer to that period. The renewal is also free.
Next, we need to associate the instance IP address with the domain name. This is the task of a Domain Name System (DNS). Again, you could rent one from AWS or Google Cloud for a small amount of money, but for now I will show you how to do it with Freenom for free.
Again, the same warning applies: do not rely on Freenom by any means; you can read the reviews online.
In Freenom, go to My Domains > Domain > Freenom DNS. There you can add DNS records:
| Name | Type | TTL | Target |
|------|------|-----|--------|
| WWW  | A    | 300 | your.instance.ip.addr |
|      | A    | 300 | your.instance.ip.addr |
Save everything and wait a bit, as the DNS needs some time to register the mapping. The Freenom page should look similar to this:

Here I usually go to a website that checks DNS entries, e.g. dnschecker.org, and wait until the record appears there.
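Alternatively, you can query the record directly from your machine (assuming `dig` from the dnsutils package is installed):

```bash
# ask a public resolver for the A record of your domain
dig +short your.domain A @8.8.8.8
```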
Alternatively, you can use Google Cloud DNS with custom nameservers:

1. Add a zone in your Google Cloud Console > Cloud DNS and register a new entry. You should be able to add records there and see the nameservers like so:
2. Go to Freenom: My Domains > Manage Domain > Management Tools > Nameservers > Custom nameservers and add these nameservers to the corresponding rows.
3. Similarly, you can check the presence of the DNS entries on dnschecker.org.
The final step here is to obtain an SSL certificate for HTTPS. Essentially, we just need the 🔒 near the URL field in a browser.
Install `certbot` on your instance:

```bash
sudo apt install certbot
```
and initiate an ACME challenge, which will allow the certifying third party to prove that you own both the domain and the IP address:

```bash
sudo certbot certonly --manual --preferred-challenges dns
```
It will ask you some questions (email etc).
Enter your domain name, e.g. `john_smith.tk`, and answer Yes to the public logging question. Next, it will ask you to add an ACME challenge token to your DNS: `_acme-challenge.your.domain 6wMpfi5ZXG0rbJt7_H2qFtT9_YUVJY_5VzEtbsJnD8`.
Don't press Enter yet; go to your DNS provider's page and add a `TXT` entry:

| Name | Type | TTL | Target |
|------|------|-----|--------|
| _acme-challenge | TXT | 300 | 6wMpfi5Z...tbsJnD8 |
See screenshots above as your guidance.
Similarly, you can check the presence of the DNS entry on dnschecker.org (use `_acme-challenge.your.domain` instead of just `your.domain` there). Once it appears there, press Enter in `certbot`.
It should congratulate you and say where the certificates are saved.
Here is how it all looks in my case:
By default, the keys are accessible to the root user only. You can loosen the permissions with

```bash
sudo chmod -R 755 /etc/letsencrypt
```

to allow programs (Jupyter, Flask) run by a regular user to access the keys; otherwise, you will have to run these programs under `root`.
The certificate is valid for 3 months; then you need to renew it. Let's Encrypt (`certbot`) will send an email to the address you specified when registering your first certificate. The renewal procedure is the same: run the `certbot` command and pass the ACME challenge (see above).
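With the manual DNS challenge, the renewal cannot be fully automated, so it boils down to the same command again (a sketch; substitute your own domain):

```bash
# re-run the manual DNS challenge for the same domain;
# certbot will print a fresh TXT token to add to your DNS
sudo certbot certonly --manual --preferred-challenges dns -d your.domain
```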
At this point, I hope, you have an instance running with the environment, as well as a registered domain name that is mapped to the instance IP by the DNS. If so, the next step is pretty simple: start a `tmux` session and run the `Flask` app from there as follows:

```bash
conda activate detector
export FLASK_APP=./WebsiteYOLO/main.py
export FLASK_RUN_CERT=/etc/letsencrypt/live/your.domain/fullchain.pem
export FLASK_RUN_KEY=/etc/letsencrypt/live/your.domain/privkey.pem
flask run --host=0.0.0.0
```
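The point of `tmux` is that the app keeps running after you disconnect from ssh. For example (the session name is arbitrary):

```bash
tmux new -s detector     # start a named session, then run the commands above in it
# detach with Ctrl-b d; the app keeps running in the background
tmux attach -t detector  # re-attach later to check the logs
```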
If `flask run` fails and tells you that you don't have `fullchain.pem` or `privkey.pem`, most likely you need to change the ownership of these files to your user. However, I also found that different versions of the environment packages may cause this error, but I didn't have a chance to inspect which ones and why. Alternatively, you can run `flask` under `root`.
If no errors occur, go to `your.instance.ip.address:5000/status_check`: your browser should print `GET request received`, and `flask` will log this event to the terminal. If you cannot see `GET request received`, check whether port `5000` is open in the network settings of your instance. Repeat the same with your domain name instead of the IP: `https://your.domain:5000/status_check`.
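You can also test both endpoints from any terminal with `curl` (the image path is an example):

```bash
# the GET status check; should print 'GET request received'
curl https://your.domain:5000/status_check
# the POST endpoint; should return a JSON with a base64-encoded result image
curl -X POST -F 'file=@test.jpg' -F 'orientation=1' https://your.domain:5000/
```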
Next, go back to the front-end and try to upload an image, which sends a `POST` request. On success, your instance terminal should output something like this for both `GET` and `POST` requests:
If you are seeing this, I think, you succeeded. Well done!
This guide is a bit sparse and might not contain every small detail. If something is not clear, feel free to let me know by email or file an issue in the v-iashin/v-iashin.github.io repository. Also, if you found this useful and managed to build your own project on top of it, send me a link; I would be happy to check it out.