@pfaffman here’s another issue with some missing topics. Can you look into this as well, please?
Weekend Challenge Entry Posts - and Many Others - Missing / Not Migrated from Old Forum
I don’t know what’s up with this. I spent literally a week working only on getting the formatting of these topics as good as possible.
I’ll see what I can find.
My current guess is that these topics were created by a user with an invalid email address. There is code that generates a valid email address so that such users get created anyway, but some upgrade broke it. I fixed the code so that those users do get created, so they should exist now; still, that is only my best guess.
I’m downloading the database to my local machine to see what I can find now.
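The fallback described above (generating a valid address so that a user with an invalid email still gets created) might look roughly like this. This is a hypothetical sketch, not the importer's actual code: the function name `importable_email`, the loose validity check, and the `@email.invalid` placeholder format are all assumptions. The `.invalid` top-level domain is reserved, so such addresses can never receive mail.

```python
import re

# Deliberately loose check (assumption): one "@", no whitespace, a dot in the domain.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def importable_email(old_user_id: int, email: str) -> str:
    """Return the user's email if it looks valid, otherwise a placeholder.

    Hypothetical helper: the placeholder is derived from the old user ID,
    so re-running the import yields the same address for the same user.
    """
    if email and _EMAIL_RE.match(email.strip()):
        return email.strip()
    return f"imported_user_{old_user_id}@email.invalid"
```

Deriving the placeholder from the old user ID rather than from a random string is a design choice: it keeps repeated import runs stable and makes the generated address traceable back to the old account.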
But that guess turned out to be wrong.
Check with @Helge: he created both the entry posts that are missing and the voting posts that did get transferred. AFAIK he used some script to create the voting posts; maybe that has something to do with it? Just guessing, of course.
Thanks for looking into this.
One short question: would it be safe for me to just create a new entry thread now? Or would this cause problems in case you ‘reimport’ the original WC sub-forum threads?
By the way, some old entry threads seem to have survived the migration.
E.g.: Challenge #608 (13/2/15) Entries OPEN
One thing I just noticed:
The threads that did get migrated seem to be based on an older state of the database.
For example, I renamed the last voting thread to “Challenge #775 Voting CLOSED”, but right now it is called “Challenge #775 Voting OPEN”. Apart from that, at least one post seems to be missing from that thread: DM9 mentioned that the car entry was used as the splash screen of BforArtists: “The classic car also made it to the splash screen of the latest BforArtists release. :D” (including a screenshot: Attachment 523159 (https://blenderartists.org/forum/attachment.php?attachmentid=523159)).
This post doesn’t seem to be there anymore.
It would be fine. The importer sticks a custom field in that ties stuff to the old database. It will ignore your new posts.
It seems like all of 2017 and most of 2018 is what’s missing.
I’ve tried re-running the script for posts since 2017-01-01. That didn’t help. I looked for some kind of clash with post/thread IDs, and don’t see anything.
That is to be expected for posts changed since the last import ran. The importer ignores what it’s already imported. We started an import about 8 days ago and then froze the forum and ran it again with the new data on Thursday/Friday (depending on time zones).
I see the missing posts in the database, though. I still don’t have an explanation.
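The behaviour described above (the importer records which old IDs it has already brought over and skips them on later runs) would explain why edits made between the two runs, such as the renamed voting thread, did not show up. A minimal sketch, assuming a plain dictionary stands in for the custom field that ties each new topic to its old thread ID; `run_import` is a hypothetical name, not the real importer, and the thread titles are sample data.

```python
from __future__ import annotations


def run_import(old_threads: dict[int, str], imported: dict[int, str]) -> int:
    """Copy threads whose old ID has not been imported yet; return how many were added.

    old_threads: old thread ID -> current title in the old forum.
    imported:    old thread ID -> title as stored in the new forum
                 (stands in for the importer's custom field).
    """
    added = 0
    for old_id, title in old_threads.items():
        if old_id in imported:
            # Already imported: skipped, so later edits in the old forum are NOT copied.
            continue
        imported[old_id] = title
        added += 1
    return added


if __name__ == "__main__":
    old = {775: "Challenge #775 Voting OPEN"}
    new: dict[int, str] = {}
    run_import(old, new)                         # first run adds thread 775
    old[775] = "Challenge #775 Voting CLOSED"    # renamed before the second run
    old[776] = "Challenge #776 Entries OPEN"
    run_import(old, new)                         # second run adds only thread 776
    print(new[775])                              # Challenge #775 Voting OPEN
```

Skipping by ID is what makes re-running an import safe, and it is also why new threads created directly in Discourse (which carry no old ID) are left alone.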
Could the missing posts be exported on their own from the old forum and then merged from that separate instance into the new forum?
Perhaps that would help fill in some missing requisite data the current forum expects (and/or filter out irrelevant data the new forum deprecates/discards)?
I’m concerned, yet remain hopeful =)
I found the problem, though I do not understand it. Somehow, the code that was replacing links to local topics was calling a function that never returned, so any topic or post containing such a link was not imported. And if a topic doesn’t get imported, its posts can’t be imported either.
I think that’s 15,295 topics, which is 3.6%. Making matters worse, we know that some of those topics are among the most important.
I’ll work with @bartv over email to figure out how to move forward.
YeaY … - ([^"]+?) - might become my new username. ^^ … Good luck!
There’s a certain strange comfort in knowing the problem is widespread, kind of like going to downforeveryoneorjustme and hearing it’s not “just you”.
Hopefully this means a solution must and will arise, since so much stands to be lost otherwise.
We are rooting for you!
Another thought on this overall situation:
Since Archive.org apparently did not save much, if any, of the previous forum, I’m wondering whether the new Discourse format will be more Archive.org-friendly, or whether it can be made so (e.g. by not blocking its crawler via the robots exclusion protocol).
The forum is very much like an invaluable research database, and even a sort of historical record of Blender and its users; when it is not available, that is far more than an inconvenience: it is the absence of a critical service.
So, having some sort of “off-site” backup of that, be it at Archive.org or somewhere else, would be great.
It took so many years to get to where we are with this forum, with Blender, with our projects, and with our community, and I think it’s all so worth protecting and preserving, for us and for others, and for Blender.
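Whether a given robots.txt lets an archiving crawler in can be checked offline with Python's standard `urllib.robotparser`. A small sketch: the robots.txt contents below are made up for illustration (they are not the forum's real file), and `ia_archiver` is assumed here as the user-agent token the Internet Archive's crawler honours, so check archive.org's own guidance before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (assumption, not the forum's actual file):
# the archive crawler may fetch everything, other bots are kept out of /admin/.
ROBOTS_TXT = """\
User-agent: ia_archiver
Allow: /

User-agent: *
Disallow: /admin/
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory; no network access
    return parser.can_fetch(user_agent, url)
```

For example, `crawler_allowed(ROBOTS_TXT, "ia_archiver", "https://blenderartists.org/t/some-topic/1234")` is allowed by these rules, while a generic bot asking for a path under `/admin/` is refused.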
Just for fun, I tried that regular expression in Ubuntu’s gedit, and (in search/replace with regex mode enabled) it highlights only part of the Entries thread URL as is.
If the last character in the regex is changed from * to .*, then the rest of the URL gets matched in my test.
I confirmed the same partial vs. full match results in medit on Ubuntu.
The .* matches the entire remainder of the URL.
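The partial vs. full match can also be reproduced with Python's `re` module. A sketch, assuming the original pattern is the one used in the Perl test in this thread, with the forum URL escaped via `re.escape` instead of being interpolated raw; `ORIGINAL`, `MODIFIED`, and `describe` are illustrative names.

```python
from __future__ import annotations

import re

FORUM_URL = "https://blenderartists.org/forum/"
TEST_URL = FORUM_URL + "showthread.php?449464-Challenge-776-(04-05-18)-Entries-OPEN"

# Original pattern. [#post(\d+)] is a character class (any ONE of '#', 'p', 'o',
# 's', 't', '(', a digit, '+', ')'), not an optional "#post123" group.
ORIGINAL = "(" + re.escape(FORUM_URL) + r')showthread\.php\?(\d+)-([^"\[\]]+?)[#post(\d+)]*'
# Same pattern with the final "*" changed to ".*", as tested in gedit/medit.
MODIFIED = ORIGINAL[:-1] + ".*"


def describe(pattern: str, text: str) -> tuple[str, str]:
    """Return (whole matched text, thread ID captured by group 2)."""
    match = re.search(pattern, text)
    if match is None:
        return ("", "")
    return (match.group(0), match.group(2))


for name, pattern in (("original", ORIGINAL), ("modified", MODIFIED)):
    matched, thread_id = describe(pattern, TEST_URL)
    print(f"{name}: id={thread_id} whole_url={matched == TEST_URL} matched={matched!r}")
```

In both cases group 2 captures the thread ID `449464`; the difference is only how much of the URL the match consumes. The original stops after `449464-C`, which is why a substitution with it leaves `hallenge-776-…` behind, as in the Perl output in this thread.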
+1 for making the forums more friendly to archive.org. It’s important to preserve history. The new URL format should make it easier to allow for that.
Beyond the gedit and medit tests, I tried a quick (and deliberately verbose) test directly in Perl to see what was being matched and where:
```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $forumURL   = "https://blenderartists.org/forum/";
my $testString = "https://blenderartists.org/forum/showthread.php?449464-Challenge-776-(04-05-18)-Entries-OPEN";

# One copy of the test URL per substitution below.
my ($regexOriginalPartA, $regexOriginalPartB, $regexOriginalPartC, $regexOriginalPartD,
    $regexModifiedPartA, $regexModifiedPartB, $regexModifiedPartC, $regexModifiedPartD)
    = ($testString) x 8;

# Original regex: [#post(\d+)] is a character class, not a group, so there are
# only three capture groups and $4 is undefined (use warnings will report it).
$regexOriginalPartA =~ s/($forumURL)showthread\.php\?(\d+)-([^\"\[\]]+?)[#post(\d+)]*/$1/i;
$regexOriginalPartB =~ s/($forumURL)showthread\.php\?(\d+)-([^\"\[\]]+?)[#post(\d+)]*/$2/i;
$regexOriginalPartC =~ s/($forumURL)showthread\.php\?(\d+)-([^\"\[\]]+?)[#post(\d+)]*/$3/i;
$regexOriginalPartD =~ s/($forumURL)showthread\.php\?(\d+)-([^\"\[\]]+?)[#post(\d+)]*/$4/i;

# Modified regex: four explicit groups; (\-.*) captures the rest of the URL.
$regexModifiedPartA =~ s/($forumURL)(showthread\.php)\?(\d+)(\-.*)/$1/i;
$regexModifiedPartB =~ s/($forumURL)(showthread\.php)\?(\d+)(\-.*)/$2/i;
$regexModifiedPartC =~ s/($forumURL)(showthread\.php)\?(\d+)(\-.*)/$3/i;
$regexModifiedPartD =~ s/($forumURL)(showthread\.php)\?(\d+)(\-.*)/$4/i;

print <<"endOfTest";
____________________________________________________
Regex URL-matching Test for BlenderArtists.org:
Original REGEX (Part A): $regexOriginalPartA
Original REGEX (Part B): $regexOriginalPartB
Original REGEX (Part C): $regexOriginalPartC
Original REGEX (Part D): $regexOriginalPartD
Modified REGEX (Part A): $regexModifiedPartA
Modified REGEX (Part B): $regexModifiedPartB
Modified REGEX (Part C): $regexModifiedPartC
Modified REGEX (Part D): $regexModifiedPartD
endOfTest
```
I’m not sure whether your function pulls out parts of the URL and stores those segments in string variables or simply looks for a full match, but in case it helps, here is the Perl output from the original regex and a variation on it:
```
Regex URL-matching Test for BlenderArtists.org:
Original REGEX (Part A): https://blenderartists.org/forum/hallenge-776-(04-05-18)-Entries-OPEN
Original REGEX (Part B): 449464hallenge-776-(04-05-18)-Entries-OPEN
Original REGEX (Part C): Challenge-776-(04-05-18)-Entries-OPEN
Original REGEX (Part D): hallenge-776-(04-05-18)-Entries-OPEN
Modified REGEX (Part A): https://blenderartists.org/forum/
Modified REGEX (Part B): showthread.php
Modified REGEX (Part C): 449464
Modified REGEX (Part D): -Challenge-776-(04-05-18)-Entries-OPEN
```
Just looking for the Thread ID. Here it is in Rubular:
And the problem was that the function that used that ID to look up the matching Discourse ID was broken.
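For illustration only, here is a hypothetical sketch of such a remapping step, written so that it always returns: it extracts the old thread ID, looks it up in a plain dictionary standing in for the importer's ID mapping, and leaves the link untouched when no mapping exists. The function name `remap_local_links`, the dictionary, and the `https://blenderartists.org/t/<id>` target format are assumptions, not the actual importer code.

```python
import re

FORUM_URL = "https://blenderartists.org/forum/"
# Group 1 is the old thread ID; the slug and any "#post..." fragment are consumed
# up to the next quote, bracket, or whitespace so no tail of the old URL is left behind.
THREAD_LINK = re.compile(re.escape(FORUM_URL) + r'showthread\.php\?(\d+)-[^"\[\]\s]*')


def remap_local_links(text: str, id_map: dict, new_base: str = "https://blenderartists.org/t/") -> str:
    """Replace old thread links with new topic links (hypothetical sketch).

    id_map maps old thread IDs (int) to new topic IDs. Unknown IDs are left
    unchanged instead of blocking, so the import of the post can still proceed.
    """
    def replace(match: re.Match) -> str:
        new_id = id_map.get(int(match.group(1)))
        if new_id is None:
            return match.group(0)  # no mapping yet: keep the original link
        return f"{new_base}{new_id}"

    return THREAD_LINK.sub(replace, text)
```

A plain dictionary lookup cannot hang, which is the property that matters here: a missing mapping degrades to an unconverted link rather than a skipped topic.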
Rubular is nice. It, too, shows only a partial substring match.
If the ID is simply the number, the existing regex does seem to capture it; but if the ID is actually the number, the dash, and the remainder of the URL string, then the existing regex matches only part of it.
Here is a Rubular version with a dot added before the final asterisk; this seems to grab the remainder of the string:
And this second Rubular example of mine demonstrates the alternate REGEX from my test above:
One thing I noticed: if Rubular comes up blank at first, a quick F5 / browser refresh should populate the fields and generate the example.
Now, though, if it was simply a faulty Discourse or migration script function that was the problem behind the missing threads, then that should be fixable.
Hoping to hear some good news!
The function must have worked at least partly before, if it was part of the earlier thread remapping, which obviously did succeed in bringing topics/posts over to Discourse.
I’m still wondering whether the partial match of the original regex somehow created corresponding Discourse thread IDs that then kept things like the Entries threads from coming in, because it matched only up to a certain point (i.e. if the thread ID requires the number, the dash, and the full remaining string).
If that is the case, perhaps the threads which did not make it in could simply be modified (e.g. with an additional character or substring such as “archived”) so they could enter Discourse sans conflict?
I was also thinking: if the function issue is not yet resolved, maybe the function could be posted here for further inspection (if it is open source, or yours and shareable)?
Since this issue goes well beyond the WC forum, and since there are so many talented developers in our community in addition to your own expertise, it should be solvable, and it should attract high-priority review from the community given how much information remains at risk of being lost.
Beyond that, thanks again to all those who worked on the migration thus far! It’s well worth the effort!