Note: At the client's request, we do not disclose the platform's real name.
Auto Marketplace aims to be the number one used car search aggregator in the US. So far, we have built an MVP that gathers car advertisements from a variety of sources, including auction portals and personal ad boards, and lets users find cars across different sites in one place with a single search query.
Currently, people looking for cars to resell have to go through more than 25 sources, which is difficult and time-consuming to do on a daily basis. The more units there are on the market, the harder it is to find specific vehicles.
So the co-founders of Auto Marketplace decided to consolidate these sources into one aggregated search and make it a mass-market product: free for most people, with the option to upgrade to a premium account for dealers and other professional users who need more advanced features. In the future, the platform will offer subscription plans and a number of premium features for individual users and businesses (notifications, saved searches, a personal ad feed, etc.), in addition to the search for specific car models on classifieds and auction sites.
The Auto Marketplace co-founder, who has extensive experience in e-commerce and digital marketing, had no development expertise to start the platform. Since the MVP had to be built within a set budget, hiring and coordinating an in-house team made little sense in terms of either cost or management burden. Therefore, the client decided to look for a development agency that could provide a professional team to start the work and, importantly, manage it independently.
While looking for a technology partner through UpWork and other sources, the client came across a podcast where John Darbyshire talked about our collaboration to create SmartSuite. That's how we met.
For this project, we built the whole infrastructure from scratch, with no designs to start from. The client only provided us with a list of sources they wanted to scrape. We chose a backend stack based on Django + Celery + Redis + Scrapy. Once we started getting the first automatically parsed data, we made a simplified UI to display the results. Then we added basic authentication to the application and received the designs from our partners.
On the client side, this is a classic React application in which almost all components come from the Ant Design system, but the more interesting solutions are at the backend and infrastructure levels. We started with a simple Scrapy setup running alongside the main Django application and pretty quickly ran into the limitations of this approach. We then relied heavily on Celery to process items from the parsers in separate worker processes. We also did a lot of optimization of scraping, of parallel processing, and of the infrastructure itself.
When we ran into scraping protection on some sources, we had to tune our parsers and apply some general techniques to get around it. However, the most effective approach turned out to be using a proxy provider.
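To illustrate the general idea of routing requests through a proxy provider, here is a minimal sketch of rotating through a pool of proxies and retrying when a source responds with a blocking status. It is not the project's actual code: the function name `fetch_with_proxies`, the injected `fetch` callable, and the set of status codes treated as "blocked" are all assumptions made for the example.

```python
from itertools import cycle
from typing import Callable, Sequence, Tuple

# Assumption: these HTTP statuses are treated as "blocked by the source".
BLOCKED_STATUSES = {403, 429}


def fetch_with_proxies(
    url: str,
    proxies: Sequence[str],
    fetch: Callable[[str, str], Tuple[int, str]],
    max_attempts: int = 3,
) -> Tuple[str, str]:
    """Try the URL through successive proxies until one is not blocked.

    `fetch(url, proxy)` is a hypothetical callable returning (status, body);
    in a real parser it would wrap the HTTP client in use.
    Returns the proxy that worked and the response body.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")

    pool = cycle(proxies)  # round-robin over the provider's proxies
    last_status = None
    for _ in range(max_attempts):
        proxy = next(pool)
        status, body = fetch(url, proxy)
        if status not in BLOCKED_STATUSES:
            return proxy, body
        last_status = status
    raise RuntimeError(f"all attempts blocked, last status {last_status}")
```

Injecting `fetch` keeps the rotation logic independent of any particular HTTP library, so the same loop can be exercised with a fake in tests.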
As we approached 1M items in the database, we had to take care of API response times and make sure the admin interface could handle that much data. So we had to optimize the entire stack, from client-side requests to database indexes and infrastructure services.
Above all, we had to keep increasing the efficiency of parsing, distribute it across several services, and build the task queue in such a way that we continue to meet the parsing schedule as the number of search requests grows.
At this stage, we have a fairly concise client UI with basic functions, as well as interesting admin portal settings and a complex parsing system.
On the Home page, visitors can search our entire database using two sets of filters. In the simple scenario, a user selects a make, model, generation, and trim to get a large number of results, while the advanced filters additionally allow searching by gearbox, year, mileage, and price range. We use infinite pagination and a set of ordering options to make browsing more convenient. If the user does not apply any filters, the ad feed is built by default from popular searches.
The Auctions page displays all active results by default. Here, the search works somewhat differently: it is based on keywords that the user enters.
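One common way to back an infinite feed with a stable ordering is cursor (keyset) pagination, where each request asks for the items that come after the last one already shown. The sketch below shows the idea on an in-memory list; whether the platform's API uses this exact scheme is an assumption, and the field names `price` and `id` and the function `page_after` are hypothetical.

```python
from typing import List, Optional, Tuple

Cursor = Tuple[int, int]  # (price, id) of the last ad already shown


def page_after(
    ads: List[dict], cursor: Optional[Cursor], page_size: int
) -> Tuple[List[dict], Optional[Cursor]]:
    """Return the next page ordered by (price, id) and the cursor for the page after it.

    The id acts as a tie-breaker so that ads with equal prices keep a
    stable position between requests.
    """
    ordered = sorted(ads, key=lambda ad: (ad["price"], ad["id"]))
    if cursor is not None:
        # Keep only ads that sort strictly after the last one seen.
        ordered = [ad for ad in ordered if (ad["price"], ad["id"]) > cursor]

    page = ordered[:page_size]
    has_more = len(ordered) > page_size
    next_cursor = (page[-1]["price"], page[-1]["id"]) if has_more else None
    return page, next_cursor
```

In a database-backed API the same comparison would typically become a `WHERE` clause on indexed columns, so the cost of loading a page does not grow with how far the user has scrolled.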
The main entity is the ads that we parse from the sources, and in the admin portal they are listed in the Cars section and sorted by the date of addition. The filter panel we see on the main site is additionally configured here.
The admin portal also has entities that are filled in by administrators, such as make, model, generation, and trim. These entities depend on one another: a model depends on its make, a generation depends on its model, and generations can be sorted by make and model. Trims are attached to generations and are arranged in a specific order, which is the order users see in the filters on the main site. All these entities contain custom fields and internal links so that they are connected correctly and displayed according to the filters in the client UI.
The system also contains fields with so-called checklists, which relate both to parsing and to the work of administrators who add data to the system. For example, when a new model has to be added for parsing, there is a checklist for verifying that the parsers process the added information correctly: checking the source and the correct parsing of trims and gearboxes.
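The dependency chain described above (make, then model, then generation, then ordered trims) can be sketched with plain dataclasses. This is only an illustration of the relationships, not the project's Django models; the class layout, the `position` field used for ordering, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trim:
    name: str
    position: int  # assumption: an explicit position defines the display order


@dataclass
class Generation:
    name: str
    trims: List[Trim] = field(default_factory=list)

    def ordered_trims(self) -> List[str]:
        """Trim names in the order they would appear in the site filters."""
        return [t.name for t in sorted(self.trims, key=lambda t: t.position)]


@dataclass
class Model:
    name: str
    generations: List[Generation] = field(default_factory=list)


@dataclass
class Make:
    name: str
    models: List[Model] = field(default_factory=list)


# Hypothetical sample data: a made-up make with one model and one generation.
make = Make(
    name="Acme",
    models=[
        Model(
            name="Roadster",
            generations=[
                Generation(
                    name="Mk2",
                    trims=[Trim("Sport", 2), Trim("Base", 1), Trim("Limited", 3)],
                )
            ],
        )
    ],
)
```

Calling `make.models[0].generations[0].ordered_trims()` on the sample data returns the trims sorted by `position`, regardless of the order in which an administrator entered them.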
In the admin portal, we have a separate Source section that lists all the sources we parse, in priority order. If the same ad appears on different sources, we detect it by the unique car identifier and show the ad from the higher-priority source.
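The deduplication rule above can be sketched as follows. The field names `car_id` and `source`, the function `pick_by_priority`, and the convention that a lower number means a higher priority are assumptions made for the example; the source names are placeholders.

```python
from typing import Dict, List


def pick_by_priority(ads: List[dict], source_priority: Dict[str, int]) -> List[dict]:
    """Keep one ad per car: the one from the highest-priority source.

    Assumption: a lower number in `source_priority` means a higher priority.
    """
    best: Dict[str, dict] = {}
    for ad in ads:
        current = best.get(ad["car_id"])
        if current is None or (
            source_priority[ad["source"]] < source_priority[current["source"]]
        ):
            best[ad["car_id"]] = ad
    return list(best.values())


# Hypothetical usage: the same car is listed on two sources.
priority = {"auction_site": 1, "classifieds_site": 2}
ads = [
    {"car_id": "CAR-001", "source": "classifieds_site", "price": 10500},
    {"car_id": "CAR-001", "source": "auction_site", "price": 10200},
    {"car_id": "CAR-002", "source": "classifieds_site", "price": 7300},
]
unique_ads = pick_by_priority(ads, priority)
```

Here `unique_ads` contains two entries: the `auction_site` ad for `CAR-001` and the only ad for `CAR-002`.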
The most interesting part is that, to make the search clearer and more accurate and, above all, to display the results correctly, we had to create more than 130 search matching rules. The same car can be named differently in ads on different sources, and this is especially common for numbers in trim names, so we wrote a whole set of rules that help identify cars correctly and show ads according to the filters the user sees on the site.
The entire infrastructure is deployed on DigitalOcean. We use PostgreSQL as the database and Redis for the cache.
The main app consists of several components. On the frontend, we have a client UI based on React, and on the backend, the Django admin panel and an API. Additionally, we use Celery to run parsing tasks and Flower to monitor them.
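As an illustration of what such a matching rule might look like, the sketch below normalizes trim names with a small list of regular-expression rules so that differently written variants compare equal. The two rules and the function name `normalize_trim` are hypothetical examples, not the project's actual rule set.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical rules: each pair is (pattern, replacement), applied in order.
RULES: List[Tuple[Pattern[str], str]] = [
    # Join a three-digit number with a single trailing letter: "320 i" -> "320i".
    (re.compile(r"\b(\d{3})\s*([a-z])\b"), r"\1\2"),
    # Collapse runs of whitespace into one space.
    (re.compile(r"\s+"), " "),
]


def normalize_trim(raw: str) -> str:
    """Return a canonical form of a trim name taken from an ad."""
    text = raw.strip().lower()
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

With these rules, `normalize_trim("320 I")` and `normalize_trim("320i")` produce the same string, so ads from both sources would match the same trim filter. A three-digit number followed by a longer word, as in `"320 sport"`, is left untouched because the first rule only joins a single letter.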
To date, we have completed the MVP development and handed over the project to the client team for further improvements.
In collaboration with visionary entrepreneurs whose only missing puzzle is tech expertise, we brought to market over 70 products that make people and teams more productive.