The dissertation that won an award, a bot running on 70 servers, and what a data science degree actually teaches you about becoming a software engineer.
In the final year at Nottingham Trent, dissertation topics arrive as a list. Lecturers propose the projects; students rank their preferences. Most of the list reads like a catalogue of sensible, well-bounded problems — things you could finish in nine months and write about with confidence. Then, halfway down the page: re-derive Shazam from first principles.
It was proposed by Archontis Giannakidis (Archie to anyone who spent time in his lectures), one of the more technically rigorous members of the department and, without a doubt, an exceptional tutor and lecturer. The project went straight to the top of the list. First choice, no hesitation.
The appeal wasn't nostalgia for the app, though Shazam had been a fixture of growing up — that moment of holding a phone to a speaker and watching a song materialise from noise. The appeal was the constraint: re-derive it, not replicate it. No reading the source code. No following a tutorial that had already done the hard thinking. Start from the 2003 paper by Avery Wang and work forwards, in Python, from scratch.
At the same time, something was shifting in terms of what felt interesting. Data analytics had been the stated destination — the degree was called Data Science, after all. But the modules that were landing hardest were the ones that felt most like engineering: designing systems, writing specifications, thinking about how things fit together and fail. The Shazam dissertation sat exactly at that intersection. It was a mathematics problem that required an engineering solution. It got accepted.
The plan was clean: implement everything from scratch. If the goal was to re-derive Shazam rather than copy it, then the implementation should be original at every layer. That meant writing custom complex number arithmetic, a Fourier transform from first principles, the peak-finding and fingerprinting logic — all of it, in Python, by hand.
It took two months. The research alone was dense — working through the mathematics of the Short-Time Fourier Transform, understanding how energy peaks behave across a spectrogram, building a mental model of the constellation map before writing a single line that would produce one. Then translating all of it into code, step by step, testing at each stage, convinced the architecture was sound.
Then came the dry run. The test library was modest by design: thirty-five songs, indexed, waiting to be matched. The first song began processing. The system ran. And ran. Two hours and forty minutes later, one song was indexed.
The first song took nearly three hours. There were thirty-four more to go. The custom implementation had to go.
On the moment the architecture changed
Two months of code went into a folder, archived but not deleted. It felt wrong to delete it entirely, and the understanding it had built was real even if the implementation wasn't usable. What replaced it was a pipeline built on Librosa, SciPy, and NumPy: the same mathematical operations, but executed by libraries that had been optimised over years for exactly this kind of work.
The switch was the right call, and it taught something more durable than the code itself: knowing when to build from scratch and when to stand on what already exists is itself an engineering decision. The dissertation wasn't diminished by using Librosa. It was about the algorithm, not the arithmetic.
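To make the build-versus-borrow trade-off concrete, here is a minimal sketch that compares a direct, pure-Python discrete Fourier transform with NumPy's FFT on a single 1024-sample frame. It is not the archived dissertation code, and the source does not say which part of the custom implementation was slow; the names `naive_dft` and `time_call` and the frame size are assumptions chosen for illustration.

```python
import cmath
import time

import numpy as np


def naive_dft(samples):
    """Direct O(n^2) discrete Fourier transform in pure Python (illustrative only)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def time_call(fn, arg):
    """Run fn(arg) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # One frame of random audio-like data; the size 1024 is an assumed value.
    frame = np.random.default_rng(0).standard_normal(1024)
    slow, slow_s = time_call(naive_dft, frame.tolist())
    fast, fast_s = time_call(np.fft.fft, frame)
    # Both routes should agree numerically; only the running time differs.
    print("results agree:", np.allclose(slow, fast))
    print(f"naive: {slow_s:.3f}s  numpy: {fast_s:.6f}s")
```

A spectrogram needs one such transform per frame, and a full song has thousands of frames, so a per-frame cost that looks tolerable in isolation can add up quickly.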
The final system worked like this: an audio clip gets converted to a spectrogram via the Short-Time Fourier Transform. The highest-energy peaks across that spectrogram are identified and plotted as a constellation map — a sparse set of points in time-frequency space. Each peak becomes an anchor point, and within a defined target zone ahead of it in time, nearby peaks get paired with it to form a hash: anchor frequency, partner frequency, time delta. That hash goes into the index.
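The steps above can be sketched roughly as follows. This is a simplified illustration rather than the dissertation's code: it uses NumPy alone instead of the Librosa and SciPy pipeline, and every function name and parameter (`n_fft`, `hop`, `neighbourhood`, `min_ratio`, `fan_out`, `max_dt`) is an assumption picked for readability, not a value taken from the project.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def spectrogram(signal, n_fft=1024, hop=512):
    """Magnitude STFT with a Hann window, shape (freq_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


def constellation(spec, neighbourhood=15, min_ratio=0.1):
    """Return (frame, freq_bin) peaks, sorted by frame.

    A point counts as a peak if it is the largest value in its square
    time-frequency neighbourhood and is at least min_ratio of the global max.
    """
    pad = neighbourhood // 2
    padded = np.pad(spec, pad, mode="constant", constant_values=-np.inf)
    windows = sliding_window_view(padded, (neighbourhood, neighbourhood))
    local_max = windows.max(axis=(2, 3))
    mask = (spec == local_max) & (spec >= min_ratio * spec.max())
    freq_bins, frames = np.nonzero(mask)
    order = np.argsort(frames, kind="stable")
    return [(int(frames[i]), int(freq_bins[i])) for i in order]


def hashes(peaks, fan_out=5, max_dt=64):
    """Pair each anchor with up to fan_out later peaks inside the target zone.

    Each entry is ((anchor_freq, partner_freq, time_delta), anchor_frame).
    """
    out = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1 :]:
            dt = t2 - t1
            if dt <= 0:
                continue  # same frame as the anchor: not ahead in time
            if dt > max_dt:
                break  # peaks are sorted by frame, so the zone has ended
            out.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return out
```

Storing the anchor's frame next to each hash is a design choice in this sketch: the hash itself says nothing about where in the song it occurred, so the position has to travel alongside it if the index is later used to line a query up against a track.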
When a query clip arrives (recorded on a phone, possibly noisy, possibly starting mid-song), the same process runs on that clip. The resulting hashes get looked up against the index. Because each hash is built from the frequencies of two peaks and the time gap between them rather than from absolute amplitudes, noise and compression don't break it. The matching runs in under eight seconds on a laptop. Against thirty-five songs, it works.
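A minimal sketch of the lookup step, under stated assumptions: the text only says that query hashes are looked up against the index, so the scoring rule here (voting for a consistent time offset between query and song, in the style of Wang's paper) is an assumption about how a match might be decided, not a description of the dissertation's code. The names `build_index` and `best_match` are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(songs):
    """Build hash -> [(song_id, anchor_time), ...] from per-song hash lists.

    songs maps song_id -> list of (hash, anchor_time) pairs.
    """
    index = defaultdict(list)
    for song_id, pairs in songs.items():
        for h, t in pairs:
            index[h].append((song_id, t))
    return index


def best_match(index, query_pairs):
    """Return (song_id, votes) for the best-aligned song, or None if no hits.

    Every matching hash votes for a (song, offset) pair, where offset is the
    anchor time in the song minus the anchor time in the query. A true match
    piles many votes onto one offset; chance collisions scatter.
    """
    votes = Counter()
    for h, t_query in query_pairs:
        for song_id, t_song in index.get(h, ()):
            votes[(song_id, t_song - t_query)] += 1
    if not votes:
        return None
    (song_id, _offset), count = votes.most_common(1)[0]
    return song_id, count
```

Voting on the offset rather than on raw hit counts is what lets a clip that starts mid-song still line up: the query's timestamps are all shifted by the same amount relative to the indexed track.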
The dissertation was the headline, but the decision to move towards software engineering rather than pure data science was made earlier — in a module that asked for a chatbot.
The brief was open enough: build a conversational agent backed by a database. The chatbot that emerged was about music. A pre-built dataset held information about albums and artists — The Weeknd, Drake, the kinds of names that populate a playlist at 1am in a student library. Ask it "Who is The Weeknd?" and it would answer. Ask it "When was Take Care released?" and it would tell you. When the database didn't have an answer, it reached out to the Spotify API, pulled the information, and stored it so it wouldn't need to ask again.
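The lookup-then-fallback behaviour can be sketched as a small read-through cache. This is an illustration under assumptions, not the module's actual code: the table layout and the names `make_store` and `release_year` are hypothetical, and `fetch_remote` is a stand-in callable for whatever code talked to the Spotify API, which is not reproduced here.

```python
import sqlite3


def make_store():
    """Create an in-memory database with a hypothetical albums table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE albums (title TEXT PRIMARY KEY, release_year INTEGER)"
    )
    return conn


def release_year(conn, title, fetch_remote):
    """Answer from the local database first; fall back to fetch_remote.

    fetch_remote(title) is an assumed callable returning a year or None.
    A successful remote answer is stored so the next question is local.
    """
    row = conn.execute(
        "SELECT release_year FROM albums WHERE title = ?", (title,)
    ).fetchone()
    if row is not None:
        return row[0]
    year = fetch_remote(title)
    if year is not None:
        conn.execute(
            "INSERT INTO albums (title, release_year) VALUES (?, ?)",
            (title, year),
        )
        conn.commit()
    return year
```

Passing the remote lookup in as a function keeps the database logic testable without network access, which fits the module's emphasis on structure over one-off queries.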
What made the module land differently from others wasn't the chatbot itself. It was the process. The brief required a design specification before a line of code. It required thinking about CRUD operations as a system rather than a set of queries. Proper object-oriented structure. File organisation that assumed other people might need to read the code. These felt like the concerns of engineering, not analysis — and they felt more alive than the statistics modules that had preceded them.
Clifton Campus sits south of Nottingham city centre, a bus ride from where most students live. The No. 4 runs often enough that you learn its rhythm — late enough to catch the 10:30 arrival for an 11am lecture, early enough to be back in the city by mid-afternoon. Most of the lectures weren't 9ams, which suited the schedule.
The library sessions were the hinge. Not because two hours is a lot of time — it isn't — but because they were consistent. The habit of turning up, sitting down, and making progress on something for a defined window is what got the dissertation to ninety pages. Not bursts of inspiration. Routine.
Alongside the degree, and mostly independent of it, two projects got built, shipped, and used by people who didn't know or care that they were made by a student.
The degree finished in 2025 with a First Class classification. The dissertation was awarded Best Mathematics Project by the Department of Mathematics — a decision made by the people who had set the topics, including Archie, which felt like the right kind of acknowledgement.
Somewhere in between, there were two summers spent doing real work: a logistics company in Aylesbury whose fulfilment tracking moved from anecdote to dashboard, and a pharmaceutical acquisition where a Python scraper against a Belgian medicine database produced an Excel file that got used in the deal room. Neither of those is the story of the degree. They're footnotes. Important ones, but footnotes.
The story of the degree is simpler: three years of building things for people, starting before anyone asked and finishing when the thing actually worked. That tendency — to find something that could be better and make it so, whether the audience is one friend or ten thousand strangers — is the thread that runs through all of it. It still does.