preface

documentation for the code behind pomudex in case something happens or someone is curious.
I'd write it on the site, but building a markdown engine seemed like a lot.


how to use

/schedule/{group}

  • use Current Settings at any point to save your changes
  • unlabeled checkbox for more settings
    • unlabeled slider 1 is the time filter. options are:
      • upcoming: live or starting within the next hour
      • today: the next 24 hours
      • three days: the next 72 hours
      • scheduled: generally, the ones marked more than a week out are for freechat or schedules
      • all: unscheduled waiting rooms and no-shows (late by over an hour)
    • unlabeled slider 2 is the thumbnail slider. on dense view, it's behind unlabeled checkbox 1. options are:
      • default 120px wide
      • mq 320px wide
      • hq 480px wide
      • sd 640px wide
      • maxres 1280px wide
      • (I don't know why youtube labels sd as higher quality than hq)
  • filter= is a url param for a toggleable positive filter based on the vtuber org structure, comma separated
    - you might find it preset on the secret shortcut urls
    - you can use multiple tags, eg filter=PRISM+Project,Petalight,Aura
    - the full list of filter keywords is available on /channels
  • hide= is a url param for a non-toggleable negative filter based on the vtuber org structure, comma separated
    - it works the same as filter=, so you can hide branches/individuals that aren't to your taste (example urls after this list)
  • after a week of inactivity, the frequency of checks goes from every 10 minutes to every 6 hours
    • can be an issue if someone goes on hiatus and doesn't give much notice for their return
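
for reference, the params slot into the url like this (keywords taken from the filter example above; the full keyword list is on /channels):

    /schedule/{group}?filter=PRISM+Project,Petalight
    /schedule/{group}?hide=Aura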

/multi

  • drag a url onto the honse or use the unlabeled text box
    • if it contains a valid youtube id or twitch url, it will open an embed
    • if you refresh, it reloads the selected active videos
  • +/- buttons will change the width of embeds
  • unlabeled checkbox 1 opens more settings
    • there's a slider for custom embed width
    • a few diagnostic measures for the width and loaded streams
    • an open all streams button
    • a few different templates for viewing the loaded streams
      • if using mado or dex view, the profile picture search is WIP and will fall back to a picture of a cat
    • a test button that outputs the current loaded streams to console
    • unlabeled checkbox 2 to toggle spinning profile pictures
    • unlabeled checkbox 3 to make your next click remove the stream
      • shortcut shift-click
    • unlabeled checkbox 4 to make your next click open chat
      • shortcut alt-click
  • id= is a url param for preloading embeds, comma separated

/channels/{group}

  • see what channels are supported under this url {group}
  • hide= also works here
  • with how fast affiliations change, accuracy not guaranteed

/channels/{channel}

  • contains unarchived data from May 2024 onwards
  • unlabeled text box allows for filtering
  • slider allows filtering by uploads or streams
  • there's no pagination, so the page might be large for older channels

tools

hardware

  • any device that can run python
    • I just let my PC run overnight, but 3am windows updates caused issues
    • raspberry pi has been low voltage and reliable after turning off wifi powersaver
      • crontab has also been way nicer than
        # main() is the scraper entry point; crontab replaces this whole loop
        import datetime, time

        while True:
            if (datetime.datetime.now().minute + 5) % 10 == 0:
                main()
            time.sleep(1 * 60)
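
        roughly, the equivalent crontab line looks something like this (script path is just an example):
        */10 * * * * python3 /home/pi/pomudex/main.py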
        

software

  • config file with youtube ids
    • youtube channel > description > scroll down to share channel > copy channel id
    • optional: I used a yaml format to maintain org>gen>member relationships
  • back-end: git, python, requests, youtube, twitch
    • youtube: default limit is 10k queries/day or about 7 queries/min
    • /search is super expensive and not recommended
    • you can install the cli to make auth/calls easier but I opted for fewer installs and use requests
  • front-end: sveltekit

back-end

server v1

  • call /channels
    • grab channel names and thumbnails from channel ids (up to 50)
    • videoCount seems useful at first, but it only shows completed uploads, ie not waiting rooms
  • call /playlistItems
    • grab video ids of uploads by channel id
      • 'playlistId': channel.replace('UC', 'UU', 1),
    • this is the bulk of your quota since you have to call this for every channel
    • will throw an error if there are no uploads, eg predebut or terminated
  • call /videos
    • grab title and time data from video ids (up to 50)
    • for filtering, waiting rooms and premieres have activeLiveChatId in liveStreamingDetails
    • optional: save the data so you can skip old ids next query
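
a minimal sketch of the v1 flow with requests; the api key and channel list are placeholders and error handling is skipped, but the endpoints and fields match the calls above:

    import requests

    API_KEY = "..."                          # youtube data api v3 key
    CHANNELS = ["UCjcb7PvPHmj3MNUysVUeJcg"]  # channel ids from the config file
    BASE = "https://www.googleapis.com/youtube/v3"

    def yt(endpoint, **params):
        params["key"] = API_KEY
        return requests.get(f"{BASE}/{endpoint}", params=params).json()

    # /channels: names and thumbnails, up to 50 ids per call
    channels = yt("channels", part="snippet", id=",".join(CHANNELS[:50]))

    # /playlistItems: recent uploads per channel (UC -> UU is the uploads playlist)
    video_ids = []
    for ch in CHANNELS:
        uploads = yt("playlistItems", part="contentDetails",
                     playlistId=ch.replace("UC", "UU", 1), maxResults=50)
        video_ids += [i["contentDetails"]["videoId"] for i in uploads.get("items", [])]

    # /videos: titles and schedule data, up to 50 ids per call
    videos = yt("videos", part="snippet,liveStreamingDetails",
                id=",".join(video_ids[:50]))

    # waiting rooms and premieres carry activeLiveChatId in liveStreamingDetails
    waiting = [v for v in videos.get("items", [])
               if "activeLiveChatId" in v.get("liveStreamingDetails", {})]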

server v2

this is a complicated mess, but the main improvement is lower API load from checking fewer youtube channel and video ids, and therefore higher capacity. this is accomplished by saving the previous results and referencing them with each minimal scrape of API data. since the final API call isn't as immediately exportable as in v1, a lot more care has to go into retrieving old data, cleaning up old waiting rooms, adding new data to your files, and generating the list of waiting rooms, all without overwriting the data. by shelling out the extra storage for the vod list, a catalog of all uploads is also possible.

  • parse a map of org > gen > member > platform into platform arrays
    BlueJump:
      Lillian:
        yt: UCjcb7PvPHmj3MNUysVUeJcg
        tv: lillianthemaid
      Nineteen:
        tv: ninete_en
    
    turns into
    yt: [ UCjcb7PvPHmj3MNUysVUeJcg ]
    tv: [ lillianthemaid, ninete_en ]
    
  • pull up previous archive of overall data
    • something that lists the last upload timestamp of each channel
    • call /channels
    • save any channel name updates
  • loop through overall channels and look at last activity
    • if none, ie channel wiped or pre-debut, skip it
    • if inactive for more than a week, also skip it
      • a few times a day, I scan everything, just in case
    • for active channels, call /playlistItems
      • just for 1 video. this lightens the load for the /videos call later
  • loop through individual channel archives
    • if there's no archive, download the channel via /playlistItems and write it somewhere
    • while the archive is open, add any older waiting rooms to the pile for /videos
      • defined as: not private, has a scheduledStartTime, no actualEndTime, and not a past premiere (sketched in code after this list)
    • if the 1 video from /playlistItems is already in the archive, we're up to date and don't need to look into it further
    • if not, we have to check if they uploaded more than that 1 video. call /playlistItems again for more
    • for any new videos, add them to the pile for /videos
  • call /videos on the pile of new uploads and old waiting rooms
    • if nothing came back, the video is private. make a list of these for tagging
    • if there was a new upload after the last noted channel activity, update the last activity timestamp
    • if it's a premiere, tag it
      • premieres are defined as duration != P0D and no actualEndTime
    • make a map of channel > videos from these confirmed waiting rooms
      • maybe not needed but it's nicely filtered at this point
    • for any confirmed waiting rooms, we can add them to the final output file
  • loop through the channels again
    • for that list of privated videos, update the individual archive so you can skip it next run
    • add any new waiting rooms to the individual archive and save it
  • grab twitch streams with /streams and user thumbnails with /users
    • write it to the waiting rooms file and overall channels file, key can be just the user_login
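
the waiting room and premiere tags above, as a rough sketch (field names follow the /videos response; the "not private" part is handled earlier, since privated videos simply never come back from /videos):

    # rough sketch of the tagging rules; assumes `video` is one item from the
    # /videos response with contentDetails and liveStreamingDetails included
    def is_premiere(video):
        # premiere: a real duration (!= P0D) but no actualEndTime yet
        live = video.get("liveStreamingDetails", {})
        return (video.get("contentDetails", {}).get("duration") != "P0D"
                and "actualEndTime" not in live)

    def is_waiting_room(video):
        # waiting room: scheduled, not ended, and not tagged as a premiere
        live = video.get("liveStreamingDetails", {})
        return ("scheduledStartTime" in live
                and "actualEndTime" not in live
                and not is_premiere(video))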

database

  • write the list of waiting rooms somewhere
    • you can save everything to a relational db and filter/sort after the fact
    • non-relational db also works well with video ids
    • I just push json files to github and have the front-end fetch them (sketched after this list)
      • pros: free, reliable, file diff, version control
      • cons: no security, no queries, dubious practice, 5 minute cache
  • future: maybe self-hosted mariaDB
    • not needed for current waiting rooms
    • maybe needed for storing vod list + community posts
  • snapshot 4/30/24
    • channels: 337 listed -> 274 active
    • waiting rooms: 0 new + 548 old -> 548 active
    • archive: 93,455 videos
      • 0 min, 2,400 max per channel
      • 190 median, 277 average per channel
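
a bare-bones sketch of the "push json files to github" route; the file name and commit message are just examples:

    # bare-bones sketch: dump the waiting room list to json and push it with git;
    # "live.json" and the repo setup are placeholders
    import json
    import subprocess

    waiting_rooms = []  # the list built by the server script

    with open("live.json", "w", encoding="utf-8") as f:
        json.dump(waiting_rooms, f, ensure_ascii=False, indent=2)

    subprocess.run(["git", "add", "live.json"], check=True)
    # commit exits non-zero when nothing changed, so no check=True here
    subprocess.run(["git", "commit", "-m", "update waiting rooms"])
    subprocess.run(["git", "push"], check=True)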

front-end

hosting

host         domain      notes
cloudflare   pages.dev   "pomu" was available
google       web.app     "Site ID is has to be at least 6 characters"
github       github.io   limit 1 page per account, "Username pomu is not available"
vercel       vercel.app  found this one after, UI is nice

filtering

for most cases, appending the channel name to the video title and searching that string for the org works. it's less useful when the org dissolves but the group stays relatively connected, eg Tsunderia or PRISM. one solution is to reference a file that includes org>gen>member mappings. in my case, sveltekit has a server component that fetches both files: the list of waiting rooms and the map of vtuber groups. from the filter= param, it generates a list of channels that match the org/gen/talent name, and the page displays videos whose channel id matches that list.
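
roughly, the lookup works like this (sketched in python for consistency with the back-end examples rather than the actual sveltekit code; the map mirrors the yaml sample from earlier):

    # illustrative only: flatten the org > member > platform map into the
    # channel ids that a filter= keyword should allow
    ORG_MAP = {
        "BlueJump": {
            "Lillian":  {"yt": "UCjcb7PvPHmj3MNUysVUeJcg", "tv": "lillianthemaid"},
            "Nineteen": {"tv": "ninete_en"},
        },
    }

    def channels_for(keyword):
        """collect channel ids whose org or talent name matches the keyword"""
        hits = []
        for org, members in ORG_MAP.items():
            for name, platforms in members.items():
                if keyword in (org, name):
                    hits.extend(platforms.values())
        return hits

    # filter=BlueJump,Nineteen -> show videos whose channel id is in `allowed`
    allowed = set()
    for kw in "BlueJump,Nineteen".split(","):
        allowed.update(channels_for(kw))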


postscript

warning: yappage ahead

this project came about when I saw a bunch of dead waiting rooms on mado. the web.app domain also looked funny, so I dug around and it turns out there's a ton of resources for hosting websites (for free!!). I could just do it myself -- how hard could it be? I've scraped webpages before.

two months later, it's basically done! it took about two weekends to get the script working for just personal linkdumping and eventually spin up the front-end for initial feedback. then a weekend and a half to optimize it for the current load and structural filtering capability. I wonder if I could have done it faster if I didn't have work and life stuff, but two months of trial and error helped to catch the bulk of issues I think. so many times I pushed a late night fix, only for it to reveal a break somewhere else, or for a totally new input to show up and obliterate my pipeline.

now I think the basic infrastructure is done. there are bells and whistles to add, like nice css and saved profiles and qol buttons and new features, but for now I'm able to deliver a page without dropping half the waiting rooms on the floor. graduations and debuts will also happen, meaning a whole job of maintaining the links, but I think I can manage it as long as I'm into vtubers. if the day ever comes and I'm not able to maintain the page any more, I leave the blueprints here for another anon to find and maybe improve. until then, my fairy's in her minecraft, all's right with the world.


FAQ

Q: will you share the code?
A: the repo will probably stay private. there are exposed keys and general opsec issues with the repo, and on top of that, swapping to a new repo would require releasing the pomu domain.

Q: my oshi isn't listed
A: submit here! general feedback also accepted