Ayush Saraswat — the splitty chronicles

note: splitty is available on the App Store. check it out! since it’s also COVID season, splitty won't entirely be relevant until the world opens back up for business. but i thought i'd write up some thoughts anyways.

i've long had a fascination with food. i used to LOVE cooking. over the last few months, that's transitioned into going to restaurants with pals, checking out the bars, and dining at some random spots all over the world.

as a not-yet baller, recently-graduated college student, each dining experience introduced one complexity that perpetually made things a bit awkward: paying the bill. for a while, we'd split the bill equally. but introduce dietary restrictions and substantially different meal prices, drinks, or appetizers, and we had to iterate. we'd have one person in the group pay, and then do a bunch of back-of-the-napkin math to figure out who owed what. the person paying would be left to do the brunt of the work and just hope they'd get their money back.

after a while of dealing with this nonsense, i discovered Plates by Splitwise and it was frickin game changing. no more complicated math. instead, a cute lil app that made splitting the bill after a dining experience EASY. with a lot of inputting numbers and dragging items around, each person would know exactly how much to pay back and dining was fun once again.

so why make splitty?

the weird thing with Plates was that most of the work in splitting the bill was just having to input all of the items into the app. why couldn't it just take a picture of the receipt and make things easy? i'd eat out at least twice a week and this really started to get to me.

i couldn't stop thinking about fixing that. so i decided to. it turns out, that was also the hardest aspect of the experience to improve. more on that later.

my vision was simple. hold your camera up to a receipt and voila — things just work! i wanted it to work like scanning a barcode works. i didn't want to take a picture of the receipt. i wanted it to be smart enough to know if a receipt was present and take the picture by itself. and afterwards, i wanted splitty to know everything that was on the receipt.

that was the part i enjoyed most about Plates. the moment you added an item, you would assign it to a plate by a super elegant drag. and when you shared an item, you would drag it to a communal plate and tap on other plates to assign it to the group. as i used it more, i realized i was sharing a whole lotta food. and therefore, i wanted splitty to just split everything among everyone, but also make it really easy to assign items to smaller groups of people. oh and also, while we're at it, i wanted splitty to make it easy to charge people on Venmo right there and then so it's easier to get your money back!
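to make the "split everything among everyone, but let smaller groups claim items" idea concrete, here's a rough Python sketch. it is not splitty's actual code: the function name, the data shapes, and the choice to spread tax and tip in proportion to each person's food are all assumptions made for illustration.

```python
from collections import defaultdict

def split_bill(items, everyone, tax=0.0, tip=0.0):
    """Return a dict of person -> amount owed.

    items: list of (name, price, sharers). sharers=None means the whole table.
    Assumption (not from the post): tax and tip are spread in proportion
    to each person's share of the food.
    """
    owed = defaultdict(float)
    for _name, price, sharers in items:
        group = sharers or everyone          # default: split among everyone
        for person in group:
            owed[person] += price / len(group)

    subtotal = sum(owed.values())
    markup = (tax + tip) / subtotal if subtotal else 0.0
    return {person: round(owed[person] * (1 + markup), 2) for person in everyone}


# hypothetical dinner: nachos for the table, one solo dish, two people sharing beers
table = ["ana", "ben", "cy"]
bill = [
    ("nachos", 12.00, None),
    ("quesadilla", 15.00, ["ana"]),
    ("beers", 6.00, ["ben", "cy"]),
]
print(split_bill(bill, table, tax=3.30))
# {'ana': 20.9, 'ben': 7.7, 'cy': 7.7}
```

one caveat with this sketch: rounding each share to the cent can leave the shares a penny off the printed total on some bills, so a real app would need a rule for who absorbs the difference.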

to me, this was the minimum viable product. it's what i had wanted, and when i asked around, my friends seemed to love the idea as well. a million more ideas had come to mind: why not pull in photos of your meal from the restaurant's Yelp profile? why not have splitty automatically know who you were eating with so you could charge them automatically? why not make a Venmo, but specifically for splitty, so the experience could be so much more seamless? or integrate it with expensing platforms to target businesses? or even just make a version of Splitwise i could be more down with? those ideas would have to wait though. they were untested, and first and foremost, i just wanted to ensure splitty was a better experience for me and for anyone who used it.

building even this tiny version of splitty was the most complex challenge i've taken on over the last few years. let's jump back to that question i would continuously ask myself: why couldn't it just take a picture of the receipt and make things easy? as i worked more on splitty, i realized the answer was a helluva lot more complicated than i had anticipated:

let's start with receipts

i've always found receipts to be straightforward to read. there's some information about the business on top, a list of items and their prices below, a section on the amount you owe, as well as payment information towards the bottom. some receipts even have a few promotions, all below the important stuff.

and while that's mostly consistent, did you know there's a multi-hundred page document that governs what you see on a receipt in the US? on top of that, each point-of-sale (POS) system has a different interpretation of that. the end result: receipts are kinda wack in the US. while they all contain similar information, there are a lot of permutations in the types of receipts that can be printed.

these permutations are similar enough to where they're almost a non-factor to you and me, but they make receipts a whole lot more complicated for a computer to read. and computers reading receipts is something that's happening now more than ever — people turn to invoice tools, budgeting software, and other organizational tools to keep track of their finances. and because of that, there's bound to be progress here. the EU, decades ahead of us in receipt policy, has an entire series of legislation in place for ensuring there's a digital copy of a receipt that's easy to interpret. as Square, Clover, and other more digitally-advanced POS systems are used in more establishments, an increasing number of receipts are going digital, making them easier to interpret.

longer story short: receipts are complicated. there are a lot of people working hard to understand this domain. but it's still early, and that's part of the reason why you don't really see any tools that can understand receipts in the US very well. i started this project a bit of a dummy, naively thinking it was easy. little did i realize.

how splitty tackles receipts

note: the world of machine-learning is relatively new to me, so i apologize if this all sounds basic.

when you launch splitty, it instantly goes into search mode. it's obviously just hunting for a receipt, right? not quite... it turns out training a model to search for a receipt is a whole lot of work, so it starts by searching for rectangles. it does a bit of magic to figure out which rectangle is the most probable one. as you hold still, splitty takes a high resolution photo of the whole screen and relies on this rectangle information to come up with a cropped image of just your receipt. often, the receipt will be on a flat surface (like a table) and you'll be taking the image from an angle, so splitty accounts for that as well by skewing the image until it's a rectangle instead of a skewed quadrilateral. thanks to a lot of this being built-in-ish functionality with the latest versions of iOS, starting here was relatively easy.
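the post doesn't spell out what the "bit of magic" is, so the Python sketch below is only a guess at the flavor of it. it assumes a detector hands back candidate quadrilaterals with a confidence score, throws out the shaky ones, and prefers the candidate with the largest confidence-weighted area. `quad_area`, `pick_receipt`, the 0.6 threshold, and the scoring rule are all hypothetical.

```python
def quad_area(corners):
    """Shoelace formula: area of a polygon from its corners, listed in order."""
    total = 0.0
    for i, (x1, y1) in enumerate(corners):
        x2, y2 = corners[(i + 1) % len(corners)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def pick_receipt(candidates, min_confidence=0.6):
    """candidates: list of (confidence, corners). Returns the best corners or None.

    Assumed heuristic: drop low-confidence detections, then prefer the
    candidate with the largest confidence-weighted area.
    """
    plausible = [c for c in candidates if c[0] >= min_confidence]
    if not plausible:
        return None
    return max(plausible, key=lambda c: c[0] * quad_area(c[1]))[1]


# hypothetical detections: a coaster, a receipt, and a low-confidence tabletop
coaster = (0.9, [(0, 0), (2, 0), (2, 1), (0, 1)])
receipt = (0.8, [(0, 0), (3, 0), (3, 4), (0, 4)])
tabletop = (0.3, [(0, 0), (10, 0), (10, 10), (0, 10)])
print(pick_receipt([coaster, receipt, tabletop]))
# [(0, 0), (3, 0), (3, 4), (0, 4)]
```

the four corners that come out of a step like this are what a perspective correction would then use to flatten the photo into an upright rectangle.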

okay, so now we have an image of the receipt. to break down the items on it, we need to understand what it says. for that, splitty relies on a lot of the optical character recognition (OCR) work that Apple has already done. they're pretty frickin good at this (but not perfect) and splitty just latches on because there's no chance i was going to make something better. by the way, this is the same type of stuff that enables you to search some PDFs, or lets Google Translate's live camera translation just work. as splitty does this, it identifies the full text of the receipt. mind you, this really is just a massive list of words and information about where they are positioned on the image.

splitty knows what's on the receipt, but it certainly doesn't know how everything goes together. does "$15.00" go with the carrot cake or the quesadilla? and because i couldn't come up with a better way, i decided that geometry must be the right solution. it knows where the text is positioned, so it tries its damnedest to identify which boxes are on the same line. that way, splitty's interpretation of the receipt will be closer to "Quesadilla $15.00" instead of everything being spread apart. this doesn't always work as some receipts will put this information on multiple lines. screw them. this is good enough.

how splitty understands a receipt. yellow boxes are groups of words it thinks belong together. red lines indicate groups of boxes that are on the same line.
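here's a small Python sketch of that "which boxes are on the same line" geometry. it's an illustration under assumptions, not splitty's implementation: boxes are taken to be (text, x, y, height) tuples with y measured to the top of the box, and two boxes count as the same line when they overlap vertically by at least half the height of the shorter one. the `group_into_lines` name and the 0.5 threshold are made up.

```python
def group_into_lines(boxes, min_overlap=0.5):
    """boxes: list of (text, x, y, height), with y at the top of the box.

    A box joins an existing line when it overlaps that line's first box
    vertically by at least min_overlap of the shorter box's height.
    Returns the lines as strings, top to bottom, each read left to right.
    """
    lines = []  # each line is a list of boxes
    for box in sorted(boxes, key=lambda b: b[2]):      # top to bottom
        _, _, y, h = box
        for line in lines:
            _, _, ly, lh = line[0]                     # compare with the line's first box
            overlap = min(y + h, ly + lh) - max(y, ly)
            if overlap >= min_overlap * min(h, lh):
                line.append(box)
                break
        else:
            lines.append([box])                        # nothing matched: start a new line
    return [" ".join(b[0] for b in sorted(line, key=lambda b: b[1])) for line in lines]


# hypothetical OCR output, deliberately shuffled and a pixel or two out of alignment
ocr_boxes = [
    ("$15.00", 200, 11, 10),
    ("Quesadilla", 10, 10, 10),
    ("Carrot cake", 10, 30, 10),
    ("$7.00", 200, 29, 10),
]
print(group_into_lines(ocr_boxes))
# ['Quesadilla $15.00', 'Carrot cake $7.00']
```

as in the post, this falls apart when a receipt wraps one item across several printed lines, since nothing here ties a price to text sitting above or below it.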

cool. now we have a transcribed receipt! all we have to do is figure out what the items are. easy, ya? not quite.

there's a lot of gibberish on a receipt and it's hard to make up a set of rules that makes it easy to identify what something is. for instance, this is just a small example of the variation of text that a receipt may have on the item line:

"5 Quesadilla $15.00 T"
"Quesadilla 15.00 5"
"Quesadilla 15 X"
"15.0 Quesadilla"

it's just all over the place! i started by doing what seemingly everyone on the internet recommended: subtotal, tax, tip, and total were all easy to identify because there was a limited number of words that could precede the amount that mattered:

"Amount Due: $15.00"
"Sub total: $10.00"

and anything else that had a price in it — it's likely to be an item! this approach was straightforward but boi did it get complicated and fragile quick. it turns out, there are a lot of ways to say "total" on a receipt. and because the OCR didn't always get each character correct (t0tal or tootal were common), things just never seemed to work.
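for a feel of what that rule-based approach looks like, and why it gets fragile, here's a Python sketch. the keyword lists, the price pattern, and the 0.75 fuzzy-match cutoff are assumptions picked for this example, not the rules splitty actually used. fuzzy matching via the standard library's `difflib` is one way to tolerate OCR slips like "t0tal".

```python
import difflib
import re

# assumed keyword lists; real receipts use many more variations
KEYWORDS = {
    "subtotal": ["subtotal", "sub total"],
    "tax": ["tax", "sales tax"],
    "tip": ["tip", "gratuity"],
    "total": ["total", "amount due", "balance due"],
}
PRICE = re.compile(r"\$?\d+\.\d{2}")  # assumes prices always have two decimals

def classify_by_rules(line, cutoff=0.75):
    """Label a receipt line using keyword rules with fuzzy matching."""
    match = PRICE.search(line)
    if not match:
        return "other"                      # no price: not something to split
    # keep only the text before the price, lowercased and stripped of punctuation
    label_text = re.sub(r"[^a-z0-9 ]", "", line[:match.start()].lower()).strip()
    best_label, best_score = "item", cutoff  # anything with a price defaults to item
    for label, phrases in KEYWORDS.items():
        for phrase in phrases:
            score = difflib.SequenceMatcher(None, label_text, phrase).ratio()
            if score >= best_score:
                best_label, best_score = label, score
    return best_label


for line in ["Amount Due: $15.00", "Sub total: $10.00", "t0tal $15.00",
             "Quesadilla $15.00", "15.0 Quesadilla"]:
    print(line, "->", classify_by_rules(line))
# Amount Due: $15.00 -> total
# Sub total: $10.00 -> subtotal
# t0tal $15.00 -> total
# Quesadilla $15.00 -> item
# 15.0 Quesadilla -> other
```

the last line shows the fragility: a price printed as "15.0" slips past the pattern, and every fix like that means another rule to maintain.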

this was all that mattered to me. it was the most important part of splitty and it had to work, at least most of the time. and so after countless other trials, i ended up trying to leverage more unfamiliar tools the tech world was fascinated by. i made a cute little machine learning model trained on a few thousand receipts i scraped on the interwebs (this took AGES) to help splitty out. it runs each line of the receipt through this model and the model spits out what it thinks the line is most likely to be: item, subtotal, tax, tip, total, or other. business information (name, location, phone number, email address) is also extracted and captured. the best part: it worked ~98% of the time. a whole lot more than i was getting classifying things with a bunch of rules myself.
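the post doesn't say what kind of model splitty uses, so the Python sketch below swaps in about the simplest learned classifier there is: a nearest-neighbour lookup over character bigrams. it's a toy stand-in to show the shape of "line in, label out", not the real model. the training lines are made up and every function name is hypothetical.

```python
import re

def bigrams(line):
    """Character bigrams of a line, lowercased, with every digit replaced by '#'
    so that "$15.00" and "$27.98" look the same to the model."""
    text = re.sub(r"\d", "#", line.lower())
    return {text[i:i + 2] for i in range(len(text) - 1)}

def similarity(a, b):
    """Jaccard similarity between the bigram sets of two lines."""
    ga, gb = bigrams(a), bigrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

def classify(line, examples):
    """Return the label of the most similar labeled example (1-nearest-neighbour)."""
    _, label = max(examples, key=lambda ex: similarity(line, ex[0]))
    return label


# made-up training data; the real thing used a few thousand receipts
EXAMPLES = [
    ("Quesadilla $15.00", "item"),
    ("2 Carrot Cake 7.00", "item"),
    ("Subtotal $22.00", "subtotal"),
    ("Sales Tax $1.98", "tax"),
    ("Tip $4.00", "tip"),
    ("Total $27.98", "total"),
    ("Thank you for dining with us", "other"),
]

print(classify("T0tal $31.50", EXAMPLES))               # total
print(classify("Sub-total 18.00", EXAMPLES))            # subtotal
print(classify("Chicken Quesadilla $16.50", EXAMPLES))  # item
```

with only seven examples, a toy like this will mislabel item names it has never seen anything similar to. that gap is the reason a training set of thousands of receipts matters.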

the best part of all this: it all happens in a split second. splitty finds a bunch of rectangles in your image and identifies the one most likely to be a receipt. it takes a photo and makes it look pretty. it reads through it and does 6th grade geometry to figure out what goes where. and finally, it works real hard to be like "this line's an item. this line's the total." and that's mostly how splitty works!

what's next for splitty

it's a weird time for splitty. all this COVID talk is certainly not helping me test it more thoroughly and get feedback from the friends that have played around with it. delivery and grocery store receipts only do so much. because of that, i'm going to pause work on splitty to tackle other projects (such as kicking off this blog) for the next few weeks. and as soon as the world re-opens for business, i'll push to get more feedback around making the experience a whole lot better for splitty's users.

i already have a bunch of short-term improvements in the works i'd love to add to splitty. i want to make it better at the whole receipt detection thing, but the small incremental improvements require a lot of effort. i also want to make it remember where you've been so you can access receipts from the past. and lastly, i want to make the Venmo integration it has a bit better by recognizing the basic folks that you dine out with the most. but i'll pause on that for now and weather this storm with the rest of you.

and that's all. have any comments or questions about anything? or have thoughts on ways to make splitty better? i'd love to hear from you! reach out at saraswatayu@gmail.com or via @saraswatayu on Instagram!
