Changelog

A running log of how I spend my time at work.

Tuesday Aug 27

UIUC.chat

  1. Add cron job + background job queue runner.
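A minimal sketch of the background-job-queue-runner idea, using only Python's stdlib. The real runner and its job types are assumptions here; this just shows a worker draining a queue of callables.

```python
import queue
import threading

def run_jobs(job_queue, results):
    """Drain the queue, executing each job callable in FIFO order."""
    while True:
        try:
            job = job_queue.get_nowait()
        except queue.Empty:
            break
        results.append(job())
        job_queue.task_done()

# Enqueue a few illustrative jobs (a cron tick would do this periodically).
jobs = queue.Queue()
for i in range(3):
    jobs.put(lambda i=i: f"job-{i} done")

results = []
worker = threading.Thread(target=run_jobs, args=(jobs, results))
worker.start()
worker.join()
print(results)  # → ['job-0 done', 'job-1 done', 'job-2 done']
```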

Monday Aug 26

MMLI

  1. Fully self-hosted the MMLI backend Kubernetes stack. Debugging required applying some K8s PVCs and ConfigMaps. Now it's all fully documented and self-hosted.

Friday Aug 23

Meet with Amazon AICE research team.

Tried and failed to configure the ELK stack to use Elastic Fleet, which would make it easier to connect external servers to the central ELK instance. I hit issues with TLS self-signed certificates, even though I don't need or want TLS certs: I'm using Tailscale to connect services, and that secure tunnel with automatic HTTPS removes the need for them.

Thursday Aug 22

UIUC.chat

  1. Several hours of pair programming with Drshika, our new hire. Nearly did an end-to-end feature implementation: support for Azure OpenAI models, with user-supplied API keys. Great onboarding session to our codebase.

Wednesday, Aug 21

UIUC.chat

  1. Onboarded two new devs to UIUC.chat using our brand-new Developer Quickstart docs, and they worked flawlessly! Pretty slick.

  2. Fixed a critical production bug. Did incremental refactoring, the platonic ideal of production code.

  3. Finally set up centralized log monitoring. Still WIP; I need to add Filebeat and Metricbeat to the other physical servers in our fleet.

    1. 100% self-hosted, because log-hosting companies have horrible pricing. My storage is cheap, nearly free, roughly 10x cheaper than hosted offerings, so it's worth the extra effort.

ELK Stack (Elasticsearch, Logstash, Kibana). All 5 Docker containers on a single server, viewable in a single browser tab! Finally.

Tuesday, Aug 20

Infinite meetings on Tuesday. Updated stakeholders 😵‍💫

UIUC.chat

  1. Fix Llama 3.1 context window size.

  2. Fix the automatic LLM selection function: default to the best available model, accounting for price tradeoffs.

Monday, Aug 19

  1. Did an incredible amount of Kubernetes debugging so I can self-host the MMLI backend. Lots of PVCs and kubectl apply.

Sunday, Aug 18

Dotfiles refactor

  1. Migrated my dotfiles from Gitlab to Github.

  2. Notable mentions:

    • Powerlevel10k, the greatest thing since sliced bread

    • glances - a better htop

    • lsd - a better ls

    • bat - a better cat

    • fzf - a fuzzy finder // excellent at reverse terminal search (command history search)

    • ag (the silver searcher) - a better grep

    • some great OhMyZsh plugins

    • miniconda. Edit: actually micromamba, the faster reimplementation of conda.

Aug 15-16, 2024

UIUC.chat

  • Extensive debugging of the Postgres database; it looks like a JSONB column in one table is causing pg_dump -> pg_restore to error out. Our guess is that malformed JSON snuck into the DB.
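A quick way to hunt for the suspect rows, assuming you can pull the JSONB column out as raw text (the table/column names and sample rows below are hypothetical):

```python
import json

def find_malformed(rows):
    """Return (id, error) pairs for rows whose JSON payload fails to parse.

    `rows` is a list of (id, raw_json_text) tuples, e.g. fetched with
    SELECT id, payload::text FROM documents;  (names hypothetical).
    """
    bad = []
    for row_id, raw in rows:
        try:
            json.loads(raw)
        except (json.JSONDecodeError, TypeError) as err:
            bad.append((row_id, str(err)))
    return bad

sample = [
    (1, '{"ok": true}'),
    (2, '{"broken": '),  # truncated JSON
    (3, None),           # a NULL that snuck in
]
print(find_malformed(sample))  # rows 2 and 3 are flagged
```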

Amazon AICE

  • Synthetic data generation with Distilabel. Implementing our filtering rules via their classes.

  • More manual data cleaning: removed a further 30 bad questions from our 800. Quality is everything. Manual filtering is worth it for the last bit of ultra-high-quality post-training data.

Home servers

  • Provisioned ZFS on mirrored Optane drives, passed through to the VM so it gets the raw performance of the Optane drives (instead of an NFS share, which adds tons of latency and overhead).

Aug 14, 2024

UIUC.chat

See: UIUC.chat Vision & Medium Term Plan

  • Group planning for automated Metadata extraction and Insights on UIUC.chat.

    • Key ideas: structured outputs; creating charts and visualizations from a user's documents; and generating insights across multiple documents, specifically hierarchical summarization and contradiction identification.
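A minimal sketch of the hierarchical-summarization idea, with a stand-in summarizer where a real LLM call would go (the fan-in, documents, and summarizer below are all illustrative):

```python
def hierarchical_summary(docs, summarize, fan_in=2):
    """Repeatedly summarize groups of `fan_in` texts until one summary remains."""
    level = list(docs)
    while len(level) > 1:
        level = [
            summarize(" ".join(level[i:i + fan_in]))
            for i in range(0, len(level), fan_in)
        ]
    return level[0]

def fake_summarize(text):
    """Stand-in: keep the first 5 words. A real pipeline would call an LLM here."""
    return " ".join(text.split()[:5])

docs = ["alpha beta gamma delta", "epsilon zeta eta theta", "iota kappa"]
print(hierarchical_summary(docs, fake_summarize))
```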

Servers

  • Storage server: Installed 4x 18TB HDDs

    • It's still much cheaper to DIY than anything else. The cheapest cloud storage is $7/TB/month, and I can buy raw disks at $9/TB. Even with 1.33x overhead (9 * 1.33 = $11.97/TB), I'm still under a 2-month payback period.

  • Web server: installed 2x 1TB Optane P905 U.2 drives.

    • These are the lowest-latency drives ever made for random database access, e.g. queue-depth-1 (QD1) reads from disk. Optane U.2 drives are extremely impressive as ZFS special metadata devices, and now I'm provisioning a DB server with mirrored Optane drives. This setup (fast CPU, tons of DDR4 memory, and Optane storage) should be perfect for serving Qdrant and Postgres.
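The DIY-storage payback math above, spelled out as a quick check (numbers taken from the note):

```python
cloud_per_tb_month = 7.00   # cheapest cloud storage, $/TB/month
raw_disk_per_tb = 9.00      # raw HDD price, $/TB
overhead_factor = 1.33      # redundancy/filesystem overhead

diy_per_tb = raw_disk_per_tb * overhead_factor      # one-time cost, ≈ $11.97/TB
payback_months = diy_per_tb / cloud_per_tb_month    # ≈ 1.71 months
print(f"DIY pays for itself in {payback_months:.2f} months")
```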

Paragliding

My first time flying. Poo Poo Point, WA.

Aug 13, 2024

MMLI

  • Fixed database initialization (was missing a secrets file, so I'm glad I asked for help).

  • Met with Bingji to plan frontend-backend integration. We wrote a spec for the two endpoints we need. It was a delightfully efficient meeting: as long as we agree on the API shape, we're good.

UIUC.chat

  • Merged my PR: Improve Default Model on the /chat page

    • Respect previous choices: try the last-selected model first, then fall back to the preference list.

    • Respect enabled/disabled models when selecting the default

    • Bugfix: gpt-4o-mini can now be disabled; previously that one model bypassed the checks.

  • Redirect NCSA.ai to UIUC.chat, since NCSA.ai is now deprecated and superseded.
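The default-model logic from the PR above can be sketched as follows; the model names and preference list here are illustrative, not the production list:

```python
def pick_default_model(last_selected, enabled, preference_order):
    """Choose the default chat model.

    1. Reuse the last-selected model if it's still enabled.
    2. Otherwise fall back to the first enabled model in the preference list.
    Disabled models (including gpt-4o-mini) never bypass the check.
    """
    if last_selected in enabled:
        return last_selected
    for model in preference_order:
        if model in enabled:
            return model
    return None  # no enabled models at all

enabled = {"gpt-4o", "llama-3.1-70b"}
prefs = ["gpt-4o-mini", "gpt-4o", "llama-3.1-70b"]

print(pick_default_model("gpt-4o-mini", enabled, prefs))   # disabled → falls back to gpt-4o
print(pick_default_model("llama-3.1-70b", enabled, prefs)) # still enabled → reused
```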

Aug 12, 2024

Research & Development

Administrative

  • Handled extensive email and Slack communications

  • UIUC.chat (chat.illinois.edu) campus adoption progressing

    • Awaiting budget approval from Illinois CIO this week

  • Gies expansion advancing successfully with professors Melanie Wiscount and Vishal Sachdev

Aug 9, 2024

Amazon AICE Research

  • Manually cleaned 1,000 real user questions (90-minute process)

  • Focused on retaining highest quality STEM questions

  • Data cleaning process that reduced the dataset from 5,000 to 800 high-quality questions:

    1. De-duplication using longest common substring

    2. Near-match de-duplication via embedding cosine similarity

    3. AI filtering for STEM-related questions

    4. Final manual human review

  • Key takeaway: Quality > Quantity 🎯

Here's the full questions dataset (as .jsonl)
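The first two de-duplication steps above can be sketched in plain Python; the thresholds, questions, and 2-dimensional "embeddings" below are illustrative stand-ins (real embeddings would come from a model):

```python
def lcs_len(a, b):
    """Length of the longest common substring of a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sum(x * x for x in u) ** 0.5
    norm_v = sum(x * x for x in v) ** 0.5
    return dot / (norm_u * norm_v)

def dedup(questions, embeddings, lcs_frac=0.8, cos_thresh=0.95):
    """Drop later questions too similar to an earlier one (thresholds illustrative)."""
    keep = []
    for i, q in enumerate(questions):
        duplicate = False
        for j in keep:
            p = questions[j]
            if lcs_len(q, p) >= lcs_frac * min(len(q), len(p)):
                duplicate = True  # long shared substring → near-exact duplicate
                break
            if cosine(embeddings[i], embeddings[j]) >= cos_thresh:
                duplicate = True  # semantically near-identical
                break
        if not duplicate:
            keep.append(i)
    return [questions[i] for i in keep]

qs = ["What is entropy?", "What is entropy??", "Define osmosis."]
embs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(dedup(qs, embs))  # the near-duplicate second question is dropped
```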

Aug 8, 2024

  • Finalized Vyriad project allocations. The project should officially begin in 2 weeks; then I'll transition off of MMLI and onto it at 25% time.

  • Created this changelog so my team has greater visibility into how I spend my time.

  • Set up Posthog monitoring of our Docs website traffic via the GitBook integration. Our docs are getting really good, so I want to make sure people are finding them.

  • UIUC.chat: support Ollama models in the API. Assist Rohan with adding Tools support to our API.

    • Significant refactor to improve maintainability, especially regarding our API. Now the /chat page and our internal API use the exact same functions to invoke a chat. Previously they were separate: one "client-centric" implementation and one "server-only" implementation. We broke up the server-only part and now call it from the client, so there's no more duplicated code, and new features (like Ollama and tool calling) land in the API trivially.

    • Upgraded from Clerk v4 to v5 for new features (Google One Tap sign-in) and better-designed components.

    • Logging: added Posthog events to monitor the distribution of which LLM models are used.
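A rough Python analogue of the shared-invocation refactor described above (the production code is TypeScript, and all names here are illustrative): both entry points call one server-side function, so new providers and tool support land in both automatically.

```python
def invoke_chat(messages, model, tools=None):
    """Single server-side entry point for running a chat completion.
    Routing to providers (OpenAI, Ollama, ...) and tool handling live here,
    so both callers below share the exact same behavior."""
    return {"model": model, "tools": tools or [], "reply": f"echo: {messages[-1]}"}

def chat_page_handler(messages):
    """Used by the /chat UI (fixed default model, for illustration)."""
    return invoke_chat(messages, model="gpt-4o")

def api_handler(messages, model, tools):
    """Used by the public API (caller picks model and tools)."""
    return invoke_chat(messages, model=model, tools=tools)

print(chat_page_handler(["hi"]))
print(api_handler(["hi"], "llama-3.1", ["calculator"]))
```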

Aug 7, 2024

  • Brought a new home server online. It's designed to be a phenomenal single-purpose web server for UIUC.chat. It'll host our vector database (mostly in memory) and some of our helper Docker containers, like our new secrets manager.

Using mirrored (RAID 1) Optane drives as the boot drive and main database host. This should make queue-depth-1 database lookups extremely low latency, with extremely high IOPS; it should be about 6x faster than the best M.2 SSDs. I've had great experiences using the Optane 905P as a special metadata device for ZFS, and now I'm doing the same for my vector databases.
This server uses a consumer platform, an Intel i7-8700K that turbos to 5 GHz, which is still meaningful single-core performance even today (~6 years later). But consumer motherboards don't typically support PCIe bifurcation 😭 so I had to buy 2x of these U.2 -> PCIe adapters for the drives.
On my EPYC server I use the same adapter with 4x Optane 905P U.2 drives as ZFS special metadata devices: 2x RAID-1 mirrors, because I value redundancy these days. https://www.amazon.com/dp/B0D3XX7PSF
  • New secrets manager: Infisical, like Bitwarden for devs with .env secrets. It's delightful to self-host. It's here: env.ncsa.ai. It enables a delightful developer experience: no more shipping .env files around via Slack!

    • Light refactor of our frontend and backend repos to automatically use the secrets manager: just run npm run dev and your secrets are auto-injected (as long as you're logged in with our secrets CLI tool). It's actually great.

  • Wrote fantastic developer onboarding docs for UIUC.chat contributors. Primarily for our new HPC-GPT project, funded by NSF CSSI.

Aug 6, 2024

  • Spent a few hours getting Grobid running on Delta. I used a Gradle build instead of Docker/Apptainer.

    • Wrote a script to automatically port forward all the way from my personal server to an active Delta compute node. This enables me to run Grobid as a web server and make API requests against it.

    • Next: write a script that creates a reverse proxy over a pool of Delta compute nodes, enabling multi-node scaling of my ingest process. On a single node the job would take about a month of node-hours, so I need ~30 nodes in parallel to finish in a day.

  • Spent many hours over the weekend (August 4th) adding Posthog logging to this process; now I have a beautiful dashboard tracking my overall speeds (average and 95th-percentile latencies). Good monitoring is fundamental to my mission of increasing speed.
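The Delta port-forwarding script and the planned node pool above can be sketched like this. Hostnames and node names are placeholders; 8070 is Grobid's default HTTP port. Each compute node is forwarded to its own local port, and requests round-robin across the pool.

```python
from itertools import cycle

def forward_cmd(local_port, login_host, node, remote_port=8070):
    """Build an ssh command that forwards local_port, via the login node, to a
    compute node's Grobid server (hostnames are placeholders)."""
    return ["ssh", "-N", "-L", f"{local_port}:{node}:{remote_port}", login_host]

# Round-robin over a pool of forwarded compute nodes (node names hypothetical).
nodes = ["cn001", "cn002", "cn003"]
pool = cycle(enumerate(nodes))

def next_target(base_port=9000):
    """Pick the next node in the pool; node i is reachable at base_port + i."""
    i, node = next(pool)
    return f"localhost:{base_port + i}", node

print(forward_cmd(9000, "login.delta.example.edu", "cn001"))
targets = [next_target() for _ in range(4)]
print(targets)  # wraps back to cn001 on the 4th request
```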

Charts: Grobid processing time, and overall PDFs per second of the parallel ingest.
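The dashboard's average and 95th-percentile latencies come from raw timing samples. Posthog computes these for you; a hand-rolled version (sample values illustrative) looks like:

```python
def percentile(samples, p):
    """p-th percentile via linear interpolation between order statistics."""
    xs = sorted(samples)
    k = (len(xs) - 1) * p / 100
    lo, hi = int(k), min(int(k) + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

latencies_s = [1.2, 0.9, 3.4, 1.1, 1.0, 2.8, 1.3, 0.95, 1.15, 7.0]
avg = sum(latencies_s) / len(latencies_s)
p95 = percentile(latencies_s, 95)
print(f"avg={avg:.2f}s  p95={p95:.2f}s")
```

The p95 matters more than the average here: one slow PDF (the 7.0 s outlier) barely moves the mean but dominates the tail.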
