<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Who Else But Me?]]></title><description><![CDATA[Who Else But Me?]]></description><link>https://blog.whoelsebut.me</link><generator>RSS for Node</generator><lastBuildDate>Tue, 07 Apr 2026 19:45:23 GMT</lastBuildDate><atom:link href="https://blog.whoelsebut.me/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Inexpensive Receipt Repository - OCR]]></title><description><![CDATA[Background
I have a LOT of old documents. In my brief time as a contractor, as part of reporting my taxes, the CRA (Canada's version of the IRS) required that I keep accurate records for any purchases that I wanted to write off as a business expense for 5 ...]]></description><link>https://blog.whoelsebut.me/inexpensive-receipt-repository-ocr</link><guid isPermaLink="true">https://blog.whoelsebut.me/inexpensive-receipt-repository-ocr</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[AWS]]></category><category><![CDATA[aws lambda]]></category><dc:creator><![CDATA[Simmo]]></dc:creator><pubDate>Mon, 21 Feb 2022 23:41:21 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-background">Background</h2>
<p>I have a LOT of old documents. In my brief time as a contractor, as part of reporting my taxes, the CRA (Canada's version of the IRS) required that I keep accurate records for any purchases that I wanted to write off as a business expense for 5 years. Knowing this, I was quite neurotic about record keeping; a habit that stuck with me long after I moved on to full-time employment. Fuel receipts, dentist appointments, groceries; you name an expense and I probably have a receipt or some kind of paper record saved for it somewhere in my stash. This led to a rather large stack of various records, both personal and business, piling up in a corner of the room I use as a home office. Keeping so many records means that while I'm generally pretty sure that I HAVE a record of something, finding that particular slip of paper is usually pretty difficult. Digitizing all these records seemed like a fun project, but I was always worried that the CRA wouldn't like it if I couldn't produce the original document. I have only recently seen that the CRA has shifted its <a target="_blank" href="https://www.canada.ca/en/revenue-agency/services/tax/businesses/topics/keeping-records/acceptable-format-imaging-paper-documents-backing-electronic-files.html#mgr">stance on digitization</a>, allowing the use of digital records so long as:</p>
<ul>
<li>It is an accurate reproduction with the intention of it taking the place of the paper document</li>
<li>It gives the same information as the paper document</li>
<li>The significant details of the image are not obscured because of limitations in resolution, tonality, or hue.</li>
</ul>
<p>So, I thought now might be as good a time as any to embark on converting all my old paper records into their digital equivalents. The goal here is to build a personal OCR / Document search engine that I can use securely across devices.</p>
<h3 id="heading-disclaimers">Disclaimers</h3>
<ol>
<li>I have not done much, if any, research into existing solutions for this because I'm approaching it as a personal project more than a business necessity. If it turns out better than expected, I may put some more work into refining the components.</li>
<li>I'm building this on a budget that can only be described as next to nothing. This is mostly to make it more of a challenge; most people could very easily spin up a service like this with a big budget.</li>
</ol>
<h3 id="heading-requirements">Requirements</h3>
<h4 id="heading-security">Security</h4>
<p>These documents can vary in sensitivity from fuel receipts to medical records. For the most part, everything should be encrypted both in transit and at rest. No unencrypted data should ever be stored online, and the only way to get data from device to device is to authorize it via a previously used device. If any documents get leaked it might expose my crippling sugar addiction.</p>
<h4 id="heading-tech-reqs">Tech Reqs</h4>
<p>Documents will be coming through this service infrequently, so it's important to me, both for price and for overall system efficiency, that the service not be running while it's not in use. I also want to be able to test it locally in case of any weird document edge cases where the text being returned is not what I expected. Finally, while I do want an accurate OCR implementation, accuracy is the lowest-priority item right now because even decent-quality OCR is good enough for plain searching through documents if you add some kind of fuzzy matching.</p>
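<p>As a rough illustration of that last point, Python's standard library <code>difflib</code> is already enough for forgiving keyword search over noisy OCR output. This is just a sketch, not part of the service; the document dictionary and function name are made up for the example:</p>
<pre><code class="lang-python">import difflib

def fuzzy_search(query, documents, cutoff=0.6):
    """Return names of documents whose OCR'd text contains a word
    close enough to the query, tolerating OCR misreads."""
    hits = []
    for name, text in documents.items():
        words = text.lower().split()
        # get_close_matches absorbs small errors like '1' read in place of 'i'
        if difflib.get_close_matches(query.lower(), words, n=1, cutoff=cutoff):
            hits.append(name)
    return hits

# Simulated OCR output with a character-level mistake
docs = {
    "fuel.png": "esso fuel rece1pt total 45.20",
    "dentist.png": "dental cleaning invoice",
}
print(fuzzy_search("receipt", docs))  # still finds fuel.png
</code></pre>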
<h3 id="heading-ocr-engine">OCR Engine</h3>
<p>I'm a Python developer by day, and part of my job actually does happen to be interacting with different OCR engines. We've used some self-run solutions like <a target="_blank" href="https://github.com/tesseract-ocr/tesseract">Tesseract</a> and some hosted solutions like <a target="_blank" href="https://aws.amazon.com/textract/">Textract</a> for products where accuracy matters more. For this project I'd love to use Textract, but there's no free tier for it as far as I can see. So, I'll be rolling my own OCR API. As part of my job I did some analysis of different self-hosted OCR solutions, and as far as I could tell Tesseract was the only one with a startup time low enough for a transient service; the other solutions generally relied on a longer startup process that loaded their models into memory. To manipulate the images before sending them into Tesseract I'll be using <code>OpenCV</code>, and for the actual OCR I'll be using the Python Tesseract binding library <code>PyTesseract</code>.</p>
<h2 id="heading-implementation">Implementation</h2>
<h3 id="heading-containers">Containers</h3>
<p>Neither the image-processing library <code>OpenCV</code> nor the OCR library <code>PyTesseract</code> is among the <a target="_blank" href="https://insidelambda.com/">libraries</a> that come in the default Lambda environment. Complicating things even more, the OCR library I'm using is actually a binding around an executable, which means that the executable will need to be available to the function's runtime as well. So, my options were to either upload a zip archive with the code and executables I wanted to run, or make a container that gets pushed to ECR and run the Lambda from there. In the end, I went with the container because it seemed a little bit cleaner and would likely give me more useful experience.</p>
<p>Something that's important to note if you want to tackle building a container that will be run on Lambda: you either need to build off of a Lambda-specific container base image, or, if you want to build off a custom base container, you'll need to install a special "Lambda Runtime Interface Client (RIC)" that lets Lambda call the code in the container. There was precious little information regarding the RIC, so I didn't dig too far into it.</p>
<p>I won't detail the full process of creating the container; there was a lot of swearing and trying to figure out what the Alpine Linux equivalents of specific Pillow / Tesseract build requirements were. In the end, though, the <code>Dockerfile</code> looked like this:</p>
<pre><code>FROM python:3.9-alpine
# Install dependencies
# This should probably be moved to a builder image
RUN apk add --no-cache build-base \
    jpeg-dev \
    zlib-dev \
    tesseract-ocr \
    cmake \
    g++ \
    make \
    unzip \
    curl-dev \
    autoconf \
    automake \
    libtool \
    libexecinfo-dev

RUN pip install awslambdaric --no-cache-dir
RUN pip install pytesseract --no-cache-dir

# Copy in the app function code
COPY app.py /

# Here's where AWS will call the fn
ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
CMD [ "app.handler" ]
</code></pre><p>All in, our container weighs a total of <code>438MB</code> on the building instance, which for me was a <code>t2.micro</code> that I got for free under the AWS free tier. Once <a target="_blank" href="https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html">pushed to ECR using docker</a> it appeared as only <code>159.5MB</code>, because ECR reports the compressed size of the image rather than its size on disk.</p>
<h3 id="heading-lambda">Lambda</h3>
<h4 id="heading-lambda-function">Lambda Function</h4>
<p>With the container ready to go, I began testing different functions contained in the <code>app.py</code> file. Including the RIC in your container allows you to test everything locally using a <a target="_blank" href="https://github.com/aws/aws-lambda-python-runtime-interface-client#local-testing"><code>Lambda Runtime Interface Emulator</code></a>. Here's how I was running the emulator during most of my testing:</p>
<pre><code>docker run -v ~/.aws-lambda-rie:/aws-lambda \
    --env AWS_LAMBDA_FUNCTION_MEMORY_SIZE=512 \
    -p 9000:8080 \
    --entrypoint /aws-lambda/aws-lambda-rie \
    &lt;container&gt;:latest \
    /usr/local/bin/python -m awslambdaric app.handler
</code></pre><p>This was super helpful for testing different memory allocation sizes, because memory would be my main factor in determining the OCR cost per page. Weirdly enough, the local URL you have to make the request to is constant, and seems to date from around the time AWS was first developing Lambda:</p>
<pre><code><span class="hljs-attribute">http</span>://localhost:<span class="hljs-number">9000</span>/<span class="hljs-number">2015</span>-<span class="hljs-number">03</span>-<span class="hljs-number">31</span>/functions/function/invocations
</code></pre><p>In the end, anything under a 512MB allocation would lead to an error when initializing whatever models Tesseract was using under the hood. Because we're using a relatively low amount of memory, AWS will also give us a proportionally lower share of the CPU's time. Still, the average of 5 seconds of processing time per document isn't the worst. You might be thinking "But what if I submit a really big image, won't that make it really slow?" Well yes, but there is a max request size for Lambda functions of around 6MB, so there is a hard upper limit on the size of image that you can extract from. Generally I've found good results for images that are at least <code>2000x2000</code>, which comes in well under the limit with any good image format.</p>
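<p>To get a feel for that limit, here's a back-of-envelope check (an illustrative sketch, not code from the service) of how big the JSON request body gets once an image is base64-encoded:</p>
<pre><code class="lang-python">import base64
import json

def request_size_mb(image_bytes):
    """Size in MB of the JSON request body once the image is
    base64-encoded. Base64 inflates the payload by roughly 4/3,
    so the effective image cap is well under 6MB."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    body = json.dumps({"image": encoded})
    return len(body.encode("utf-8")) / (1024 * 1024)

# A 3MB image turns into roughly 4MB of JSON on the wire
print(request_size_mb(bytes(3 * 1024 * 1024)))
</code></pre>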
<h4 id="heading-api-gateway">API Gateway</h4>
<p>Lambda functions only get you 90% of the way to a working API; a Lambda is just a piece of code that gets run on a 'trigger'. Part of the creation of a Lambda function is defining what that trigger actually is. In my case, I want my Lambda function to be triggered on an HTTP(S) request, so I need to create a new endpoint from which we can actually request the extraction of a document's information. The only way that I'm aware of to do this is through AWS' <code>API Gateway</code> service. To set this up I followed <a target="_blank" href="https://docs.aws.amazon.com/lambda/latest/dg/services-apigateway.html">their documentation</a> on setting up an API gateway with Lambda.</p>
<h4 id="heading-code">Code</h4>
<p>The code for the lambda function was quite simple, only made longer by a lot of error handling I had to put in when debugging things. You can find the full app code in <a target="_blank" href="https://gist.github.com/xmaayy/9fa310a2cff1ca832d3c052c148e4a28">this gist</a>, but the main handler looks like this:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">handler</span>(<span class="hljs-params">event, context</span>):</span>
    <span class="hljs-keyword">try</span>:
        body = json.loads(event[<span class="hljs-string">'body'</span>])
        image_bytes = body[<span class="hljs-string">'image'</span>].encode(<span class="hljs-string">'utf-8'</span>)
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">"error"</span>:<span class="hljs-string">"Could not get image from request"</span>,
            <span class="hljs-string">"exception"</span>:str(e),
            <span class="hljs-string">"event"</span>:event
        }
    <span class="hljs-keyword">try</span>:
        img_b64dec = base64.b64decode(image_bytes)
        img_byteIO = io.BytesIO(img_b64dec)
        image = Image.open(img_byteIO)
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">"error"</span>:<span class="hljs-string">"Error decoding and opening image"</span>,
            <span class="hljs-string">"exception"</span>:str(e),
            <span class="hljs-string">"event"</span>:event
        }
    <span class="hljs-keyword">try</span>:
        result = pytesseract.image_to_data(image)
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">"error"</span>:<span class="hljs-string">"Error in tesseract"</span>,
            <span class="hljs-string">"exception"</span>:str(e),
            <span class="hljs-string">"event"</span>:event
        }
    <span class="hljs-comment"># API Gateway's proxy integration expects a dict, not a JSON string</span>
    <span class="hljs-keyword">return</span> {
        <span class="hljs-string">'statusCode'</span>: <span class="hljs-number">200</span>,
        <span class="hljs-string">'headers'</span>: {<span class="hljs-string">'Content-Type'</span>: <span class="hljs-string">'application/json'</span>},
        <span class="hljs-string">'body'</span>: json.dumps(tsv2json(result))
    }
</code></pre>
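<p>The <code>tsv2json</code> helper lives in the gist; the real TSV from <code>image_to_data</code> has a dozen columns, but a stripped-down version of the idea (hypothetical code, simplified columns) looks like:</p>
<pre><code class="lang-python">import csv
import io

def tsv2json(tsv_text):
    """Turn Tesseract's tab-separated word data into a list of dicts,
    keeping only rows that actually contain recognized text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    words = []
    for row in reader:
        if row["text"].strip():
            words.append({
                "text": row["text"],
                "conf": float(row["conf"]),
                "left": int(row["left"]),
                "top": int(row["top"]),
            })
    return words

# Two detected words from a simplified TSV payload
sample = "level\tleft\ttop\tconf\ttext\n5\t10\t12\t96.2\tTotal\n5\t80\t12\t91.0\t45.20\n"
print(tsv2json(sample))
</code></pre>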
<p><strong> Very Important Note </strong> When you're using the API Gateway to trigger a Lambda function, any JSON that you send to the endpoint gets turned into a <code>UTF-8</code> string and put in the <code>body</code> section of the event. You might have noticed in the code above that I had to call <code>json.loads()</code>. This was the source of a long debugging session, because I was used to a Flask&lt;-&gt;gunicorn combo where the request data comes in directly as a dictionary. The event is also passed in as a dictionary when you're calling the Lambda function directly with the RIE, or when you send in a 'test' from the online Lambda console. </p>
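<p>Because of that difference, a tiny helper can make the handler testable in both modes. This is a hypothetical convenience, not code from the gist:</p>
<pre><code class="lang-python">import json

def get_body(event):
    """Normalize the Lambda event: API Gateway delivers the request
    JSON as a string under 'body', while direct invocation and the
    RIE hand you a plain dict."""
    body = event.get("body", event)
    if isinstance(body, str):
        return json.loads(body)
    return body

# API Gateway style: the payload arrives as a JSON string
print(get_body({"body": '{"image": "aGk="}'}))
# Direct / RIE style: the payload is already a dict
print(get_body({"image": "aGk="}))
</code></pre>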
<h2 id="heading-appendix">Appendix</h2>
<h3 id="heading-extra-docker-commands">Extra Docker Commands</h3>
<p><strong> Authenticating Docker With ECR </strong></p>
<pre><code class="lang-bash">aws ecr get-login-password --region ca-central-1 | docker login --username AWS --password-stdin &lt;ECR URL&gt;
</code></pre>
<p><strong> Tagging and Pushing Your Built Image </strong></p>
<pre><code class="lang-bash">docker tag &lt;IMAGE ID&gt; &lt;ECR URL&gt;
docker push &lt;ECR URL&gt;
</code></pre>
]]></content:encoded></item><item><title><![CDATA[Holy Crap, Systems Are Complicated]]></title><description><![CDATA[The problem
I've stumbled upon what I think could be a fun small business to run; essentially it's taking care of reminding people to send documents. I like the idea because it's something that I can very likely have ready to test with customers within...]]></description><link>https://blog.whoelsebut.me/holy-crap-systems-are-complicated</link><guid isPermaLink="true">https://blog.whoelsebut.me/holy-crap-systems-are-complicated</guid><category><![CDATA[AWS]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[System Architecture]]></category><category><![CDATA[server hosting]]></category><category><![CDATA[hosting]]></category><dc:creator><![CDATA[Simmo]]></dc:creator><pubDate>Thu, 06 Jan 2022 14:00:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1641475035477/1tsLocksxd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-the-problem">The problem</h1>
<p>I've stumbled upon what I think could be a fun small business to run; essentially it's taking care of reminding people to send documents. I like the idea because it's something that I can very likely have ready to test with customers within probably 1-2 weeks. With such a quick time to MVP, I can test it without being too attached to the outcome. The only problem is, I don't have much experience building a full system like this; it's been mostly CRUD web apps and backend services that don't need to deal with pesky user authentication thus far. My engineering degree is finally coming into use, because I need to know about security methodologies (RBAC, UBAC, etc.) and the very basics of systems design. </p>
<p>The very basics of what I'll need to get an MVP going are:</p>
<ul>
<li>Front End (ofc, this will take the most dev time from me)</li>
<li>User management </li>
<li>Authentication</li>
<li>Object / Blob Storage</li>
<li>A database for User &lt;=&gt; Object  mapping </li>
</ul>
<p>I drew up a little diagram for how I imagine the whole backend will work:
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1641475035477/1tsLocksxd.png" alt="Untitled.png" /></p>
<p>This all needs to be hosted in Canada, because the target market is privacy-regulation conscious.</p>
<p>Now there is a decision to be made though -- do I go with a cloud provider for everything, or do I manage everything myself? We're going to assume 100 users for the first year, each user with an average of 5GB (some will have 10GB+, some will have &lt;1GB. It's text documents, so you'd have to be massive to hit 10GB). So a total of 500GB in storage. Additionally, let's assume an average of 100GB/month of traffic.</p>
<h2 id="heading-cloud">Cloud</h2>
<p>Everything here is going to be priced as if I was buying it from AWS, because that is likely where I would be buying from if I did end up actually using a cloud provider. The front end could be hosted on a nano instance, or just on Vercel for free. It can't be statically hosted because it's a web app that will be fetching things from the BE and populating pages, and will need some API routes of its own to avoid CORS, etc. I'll price in about 3$ for that. User management and authentication would be done with AWS Cognito, and with &lt;50k users it's free (though 5 cents per user above that, so it could end up at 5$ 😱). We'd need a database as well, and just looking at the pricing for any service with RDS in its name makes me sick (the lowest MySQL instance is 300$/month), so we're going to integrate the database with the backend server. Because the back end will be performing its normal functions plus the database, I went with 2 vCPUs and 8GiB of RAM. Without a reservation (I don't want to reserve without knowing that I'll continue the project) it comes to 57$ per month. Lastly, if we do some very approximate averaging of the amount of data stored and transferred with S3, we get around 13$/month. So the total charges for AWS end up at 🥁🥁🥁🥁🥁 :</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1641476398081/G6RnIrcz4.png" alt="image.png" />
<a target="_blank" href="https://calculator.aws/#/estimate?nc2=h_ql_pr_calc&amp;id=594370afec94ce75a56bc51f11ab64a9eba60b62">calculator</a>
The largest part of this is EC2 at 57$/month. </p>
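<p>Summing my rough figures (these are the ballpark numbers from above, not an official quote):</p>
<pre><code class="lang-python"># Ballpark monthly AWS estimate, in dollars, from the figures above
costs = {
    "front end (nano instance / Vercel)": 3,
    "Cognito (under 50k users)": 0,
    "EC2 backend plus database": 57,
    "S3 storage and transfer": 13,
}
total = sum(costs.values())
print(f"~{total}$/month")  # EC2 dominates the bill
</code></pre>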
<h2 id="heading-dedicated-virtual-server-self-managed">Dedicated / Virtual Server / Self-Managed</h2>
<p>I'd end up running K3s or something and hosting the services all on one machine. So I'll upgrade to a 4vCPU machine, which ends up at 30$ CAD per month. For block storage, assuming 300GiB of in/out/storage per month (which is just about the busiest month I can imagine this product having), it'll be 8$ CAD. So in total everything would be about half of the AWS bill at 38$/month, but I have to manage it myself.</p>
<h2 id="heading-third-hidden-option-self-host">Third Hidden Option -- Self-host</h2>
<p>I have a 16-thread, 24GiB-RAM machine collecting dust at home. I used to be an avid gamer, but sold my GPU in the hot market we have now because it was worth 200$ more than I paid for it 2 years ago. There's also approximately 20TiB worth of HDD sitting around in it as well, because I also have a lot of.... Linux ISOs. So if I use that (which would probably run me in the neighbourhood of 300$/month if I rented it from someone else in Canada) I can host all of this for ~ FREEEEEEEE ~. There is of course the possibility of my power going out, but I live close enough to the city core that I could probably offer a 99.9% SLA and not worry about it.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>I'll likely host everything for free on my own hardware until the idea is validated. If I can get 1-2 paying customers before the end of February, I'll upgrade to a virtual server from OVH. By then I'll know the real processing power requirements as well! Finally, if I can get a decent customer base, I'll switch to AWS so that I can do multi-region more easily.</p>
<p>Thanks for reading</p>
]]></content:encoded></item><item><title><![CDATA[Optimizing Against You]]></title><description><![CDATA[As it stands, social media companies seem to be acting like the cigarette companies of the 21st century. Millions in advertising trying to get as many people as possible to join the app they know to be harmful. While it's a company's responsibility t...]]></description><link>https://blog.whoelsebut.me/optimizing-against-you</link><guid isPermaLink="true">https://blog.whoelsebut.me/optimizing-against-you</guid><category><![CDATA[optimization]]></category><category><![CDATA[algorithms]]></category><category><![CDATA[social media]]></category><dc:creator><![CDATA[Simmo]]></dc:creator><pubDate>Tue, 04 Jan 2022 12:45:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1641299989333/mSr-0lyQT.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As it stands, social media companies seem to be acting like the cigarette companies of the 21st century. Millions in advertising trying to get as many people as possible to join the app they know to be harmful. While it's a company's responsibility to take care of its investors, who is responsible for making sure this is not at the expense of its users? I consider myself to be pretty cognizant of the addictive algorithms in use by corporations, and still got caught up in one unknowingly.</p>
<h2 id="heading-my-experience">My Experience</h2>
<p>When Canada's lockdown started and leaving the safety of your house felt like putting your life on the line, I started to go back through all the games I'd missed since I had quit before university. I played just about every type of game there is before finally landing on a game called Apex Legends. The most important thing to note about Apex Legends in this context is that it is free to play, the only method of supporting the game's continued development being cash shop cosmetic items and a once-a-month 'battle-pass' that allows you to unlock unique cosmetics by playing more.</p>
<p>Unfortunately, I didn't often have friends to play with, so I would play with other random teammates. This was fine while I was learning, but over time the inconsistency of my teammates' skill levels started to wear on me. Even in the Ranked mode, where you're supposed to be matched with people of equal skill levels, I found myself being matched with other players that were 5-6 ranks off of my own. When I posted screenshots of this happening online, it seemed like many other players were experiencing this weird matchmaking as well and were similarly displeased. Eventually, the time cost of playing the game was too much, and I had to quit. I was curious though: the strange matchmaking wasn't a mistake; they were intentionally matching players of unequal rank against each other. Why was it done this way when it seems to cause so much anger in the community? As it turns out, it seems to be another case of a "free" product taking advantage of its users.</p>
<h3 id="heading-skill-based-matchmaking-sbmm">Skill Based Matchmaking (SBMM)</h3>
<p>When you want to play something competitively, you likely want to be matched up with people that are within a similar skill bracket. Generally, you'd want a player that is a little bit above your rank to help you improve, or a player that is a little bit below your level to help you practice. The first popular system for ranking players in this way was created by a physics professor named Arpad Elo around 1960, initially to be used for ranking chess players.</p>
<p>The Elo rating measured the relative strength of a player in chess compared to other players in the league. Your ranking is inferred from your opponents and the results of the games you've had against them. Many modern rating systems in online games will, at their core, have a similar idea behind their ranking system: there is a variable assigned to every player, let's just call it Elo, that represents how skilled the player is in reference to the population. During regular play, players should generally be matched against those who have a similar ranking. When you win, your Elo goes up by an amount relative to the ranking of your opponent; when you lose, it goes down the same way. The bigger the disparity, the larger the change. So, over time, you should see your rank stabilize somewhere around your true current skill level. For most, this feels like a fair way to match players, but is it really the 'best'?</p>
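<p>The classic update rule fits in a few lines. This is illustrative code using the usual chess constants (a 400-point scale and K=32), not any specific game's implementation:</p>
<pre><code class="lang-python">def elo_update(ra, rb, score_a, k=32):
    """One Elo update for player A: score_a is 1 for a win,
    0.5 for a draw, 0 for a loss. The rating moves further
    when the result is more surprising."""
    expected_a = 1 / (1 + 10 ** ((rb - ra) / 400))
    return ra + k * (score_a - expected_a)

# An upset: the 1400 player beats the 1600 player and gains a lot
print(round(elo_update(1400, 1600, 1)))
# The favourite wins and gains only a little
print(round(elo_update(1600, 1400, 1)))
</code></pre>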
<h3 id="heading-engagement-optimized-matchmaking-eomm">Engagement Optimized Matchmaking (EOMM)</h3>
<p>In 2016 and 2017, EA filed patents for Dynamic Difficulty Adjustment (DDA) and Engagement Optimized Match Making (EOMM). Shortly thereafter, EA worked with a professor at UCLA to write and publish a paper that aimed to show the benefits of EOMM when compared with other matchmaking methods. The EOMM system they described is designed to try and learn enough about your playing habits that it can keep you engaged with the game as long as possible. The patent for it says "The longer a user is engaged with the software, the more likely that the software will be successful". They didn't give their definition of success in the patent, though in the paper they do say that the objective of an EOMM system can be optimized for both in-game time as well as real-money spending.</p>
<p>So, how do you keep your users engaged with the game for as long as possible? EA concluded that the best way to keep players engaged is to vary the difficulty of your game on the fly. It can do this through the use of what the patent calls knobs; controllable game parameters that will affect your player's experienced difficulty. The choice of knobs is one of the more important factors in DDA, because it needs to be something that will go unnoticed by the user. The patent uses the example of a race car — if you adjust the max speed of the car based on whether the user is winning or losing, that's going to be a very jarring experience for everyone involved. This limits the scope of what we can adjust in an online game; because most of the entities you engage with are other human players, it would be unfair to change an opponent's stats to affect the outcome of a duel. So, EA concluded that the fairest way to change the difficulty of a match is to vary the skill level of the players themselves.</p>
<p>But how does the system know who should be matched up with whom? Using machine learning, the game's operator can continually monitor each player's gaming habits and, after a certain threshold of time or matches, try to match you into a group of other similar players, called a cluster. Your habits can include anything from how often you quit after a win or loss, to when you spend money, to how quickly you start another match. The cluster definitions, i.e. the approximate description of everyone in a cluster, and your assignment to that cluster won't stay the same over time. The churn risk between you and a potential opponent is calculated based on your habits as well as the cluster details of you and your opponent(s). The ideal set of matches is determined by minimizing the total churn risk across all possible matches that can be made.</p>
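<p>That last step, minimizing total churn risk across all possible matches, can be sketched as a toy optimization. Everything here is made up for illustration; a real system would score millions of players with a learned model rather than a lookup table:</p>
<pre><code class="lang-python">def best_pairing(players, churn_risk):
    """Brute-force the two-match pairing of four players that
    minimizes total predicted churn risk. churn_risk(a, b) stands
    in for a model trained on each player's habits."""
    first, rest = players[0], players[1:]
    candidates = []
    for partner in rest:
        others = [p for p in rest if p != partner]
        total = churn_risk(first, partner) + churn_risk(others[0], others[1])
        candidates.append((total, [(first, partner), (others[0], others[1])]))
    return min(candidates)[1]

# Hypothetical model output: chance a match makes either player quit
risk = {
    frozenset(["p1", "p2"]): 0.10, frozenset(["p1", "p3"]): 0.40,
    frozenset(["p1", "p4"]): 0.30, frozenset(["p2", "p3"]): 0.25,
    frozenset(["p2", "p4"]): 0.35, frozenset(["p3", "p4"]): 0.15,
}
matches = best_pairing(["p1", "p2", "p3", "p4"], lambda a, b: risk[frozenset([a, b])])
print(matches)  # the pairing with the lowest combined churn risk
</code></pre>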
<p>In its paper, EA concludes, based on a simulation, that for games with a significantly large player population, EOMM will at least match, if not outright beat, the churn-avoiding performance of all other matchmaking methods by around 1% per game, which, compounded over a whole play session, ends up increasing retention by 10-15%! This was taken as conclusive proof that their EOMM system was the best for player engagement, but does that necessarily mean that it's the best for the players themselves? How does this affect the mentality of a player when the outcome of their matches seems to follow no obvious pattern?</p>
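<p>The compounding claim is just exponent arithmetic: a 1% per-game edge over a session of 10-14 games lands right in the quoted 10-15% range.</p>
<pre><code class="lang-python"># A 1% per-game retention edge compounds over a session of n games
for games in (10, 14):
    gain = (1.01 ** games - 1) * 100
    print(f"{games} games: {gain:.1f}% better retention")
</code></pre>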
<h2 id="heading-terms-of-engagement">Terms of Engagement</h2>
<p>Engagement Optimized Matchmaking is just one of many examples of time-optimizing algorithms in use on the web today. Many of the world's most popular websites (YouTube, Facebook, TikTok) have algorithms whose sole purpose is to make sure you spend as much time on their platform as possible. Eugene Wei has done an amazing series of articles on his blog about how the TikTok algorithm sees and interacts with the users of the platform. In it, he describes how quickly the algorithm can lock on to your particular content preferences, serving you up content that even you didn't know you'd enjoy. What happens when an algorithm gets to know your habits and weak points even better than you know yourself? Anecdotally, it means that people report long sessions with the platform without really noticing how much time has passed. What makes these apps even more dangerous is that for every hour you spend consuming media, at least another hour of content that the algorithm can recommend to you has been uploaded. You can never reach the end of your infinitely scrolling timeline.</p>
<p>What I want to ask is: <strong> Who has the responsibility of taking care of the users of these platforms? </strong> The companies are concerned mainly with their responsibility to their shareholders, and rightly so. In the past, when an industry has been created around something harmful (tobacco, alcohol, etc) there was government intervention in the form of heavy regulation. It seems now, though, that the problems we face from social media and the systems that create them are so complex that trying to regulate the allowable use of these algorithms in industry will be akin to playing "Whack-a-mole".</p>
<p>I think, in the end, it will come down to consumer education; it's the responsibility of those using these platforms to understand how they're being taken advantage of. But first, that information needs to be made public. We need to study how these algorithms work and make public the information about the effects they have on their users. We should implement a warning similar to what you see on North American cigarette boxes or South American sweets, so people know ahead of time which apps to watch out for.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1641300149741/piCQMN3NH.png" alt="TikTokWarning.png" /></p>
<h2 id="heading-tinfoil-hat">Tinfoil Hat</h2>
<p>Some people talk about 'the singularity'; a point at which AI becomes smarter than humans and we are immediately enslaved by the superior brainpower of an artificial master. What if, instead of all at once, it was a gradual erosion of our free will as we find ourselves under the control of algorithms meant to optimize how we act? It seems to me that these social media algorithms are already kind of doing that: they're taking control of how we spend our time by understanding us better than we know ourselves.</p>
]]></content:encoded></item></channel></rss>