Internet Archive

Music labels will regret coming for the Internet Archive, sound historian says

But David Seubert, who manages sound collections at the University of California, Santa Barbara library, told Ars that he frequently used the project as an archive and not just to listen to the recordings.

For Seubert, the videos that IA records of the 78 RPM albums capture more than audio of a certain era. Researchers like him want to look at the label, check out the copyright information, and note the catalogue numbers, he said.

“It has all this information there,” Seubert said. “I don’t even necessarily need to hear it,” he continued, adding, “just seeing the physicality of it, it’s like, ‘Okay, now I know more about this record.’”

Music publishers suing IA argue that all the songs included in their dispute, and likely many more since the Great 78 Project spans 400,000 recordings, “are already available for streaming or downloading from numerous services.”

“These recordings face no danger of being lost, forgotten, or destroyed,” their filing claimed.

But Nathan Georgitis, the executive director of the Association for Recorded Sound Collections (ARSC), told Ars that you just don’t see 78 RPM records out in the world anymore. Even in record stores selling used vinyl, these recordings will be hidden “in a few boxes under the table behind the tablecloth,” Georgitis suggested. And in “many” cases, “the problem for libraries and archives is that those recordings aren’t necessarily commercially available for re-release.”

That “means that those recordings, those artists, the repertoire, the recorded sound history in itself—meaning the labels, the producers, the printings—all of that history kind of gets obscured from view,” Georgitis said.

Currently, libraries trying to preserve this history must control access to audio collections, Georgitis said. He sees IA’s work with the Great 78 Project as a legitimate archive in that, unlike a streaming service, where content may be inconsistently available, IA’s “mission is to preserve and provide access to content over time.”

Internet Archive played crucial role in tracking shady CDC data removals


Internet Archive makes it easier to track changes in CDC data online.

When thousands of pages started disappearing from the Centers for Disease Control and Prevention (CDC) website late last week, public health researchers quickly moved to archive deleted public health data.

Soon, researchers discovered that the Internet Archive (IA) offers one of the most effective ways to both preserve online data and track changes on government websites. For decades, IA crawlers have collected snapshots of the public Internet, making it easier to compare current versions of websites to historic versions. And IA also allows users to upload digital materials to further expand the web archive. Both aspects of the archive immediately proved useful to researchers assessing how much data the public risked losing during a rapid purge following a pair of President Trump’s executive orders.
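The kind of snapshot comparison described above can be illustrated with a small sketch. It assumes the text of two versions of a page has already been retrieved (for instance, an archived snapshot and the live page). The sample strings and the function name `diff_snapshots` are hypothetical, and nothing here calls the Internet Archive's services or reflects how any researcher actually did the comparison.

```python
import difflib


def diff_snapshots(archived_text: str, current_text: str) -> list[str]:
    """Return unified-diff lines describing changes between two page versions."""
    return list(
        difflib.unified_diff(
            archived_text.splitlines(),
            current_text.splitlines(),
            fromfile="archived",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical page text, invented purely for illustration.
archived = "Guidance overview\nWho is most at risk\nHow to get vaccinated"
current = "Guidance overview\nHow to get vaccinated"

changes = diff_snapshots(archived, current)

# Skip the two file-header lines, then keep lines the newer version dropped.
removed = [line[1:] for line in changes[2:] if line.startswith("-")]
print(removed)
```

A line-based diff like this only flags wording that was added or dropped; deciding whether a change matters still takes a human reader.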

Part of a small group of researchers who managed to download the entire CDC website within days, virologist Angela Rasmussen helped create a public resource that combines CDC website information with deleted CDC datasets. Those datasets, many of which were previously in the public domain for years, were uploaded to IA by an anonymous user, “SheWhoExists,” on January 31. Moving forward, Rasmussen told Ars that IA will likely remain a go-to tool for researchers attempting to closely monitor for any unexpected changes in access to public data.

IA “continually updates their archives,” Rasmussen said, which makes IA “a good mechanism for tracking modifications to these websites that haven’t been made yet.”

The CDC website is being overhauled to comply with two executive orders from January 20, the CDC told Ars. The first order, Defending Women from Gender Ideology Extremism and Restoring Biological Truth to the Federal Government, requires government agencies to remove LGBTQ+ language that Trump claimed denies “the biological reality of sex.” That order is likely driving most of the CDC changes to public health resources. The other executive order the CDC cited, Ending Radical And Wasteful Government DEI Programs And Preferencing, would seemingly affect mostly CDC employment practices.

Additionally, “the Office of Personnel Management has provided initial guidance on both Executive Orders and HHS and divisions are acting accordingly to execute,” the CDC told Ars.

Rasmussen told Ars that the deletion of CDC datasets is “extremely alarming” and “not normal.” While some deleted pages have since been restored in altered versions, removing gender ideology from CDC guidance could put Americans at heightened risk. That’s another emerging problem that IA’s snapshots could help researchers and health professionals resolve.

“I think the average person probably doesn’t think that much about the CDC’s website, but it’s not just a matter of like, ‘Oh, we’re going to change some wording’ or ‘we’re going to remove these data,’” Rasmussen said. “We are actually going to retool all the information that’s there to remove critical information about public health that could actually put people in danger.”

For example, altered Mpox transmission data removed “all references to men who have sex with men,” Rasmussen said. “And in the US those are the people who are not the only people at risk, but they’re the people who are most at risk of being exposed to Mpox. So, by removing that DEI language, you’re actually depriving people who are at risk of information they could use to protect themselves, and that eventually will get people hurt or even killed.”

Likely the biggest frustration for researchers scrambling to preserve data is dealing with broken links. On social media, Rasmussen has repeatedly called for help flagging broken links to ensure her team’s archive is as useful as possible.

Rasmussen’s group isn’t the only effort to preserve the CDC data. Some are creating niche archives focused on particular topics, like journalist Jessica Valenti, who created an archive of CDC guidelines on reproductive rights issues, sexual health, intimate partner violence, and other data the CDC removed online.

Niche archives could make it easier for some researchers to quickly survey missing data in their field, but Rasmussen’s group is hoping to take next steps to make all the missing CDC data more easily discoverable in their archive.

“I think the next step,” Rasmussen said, “would be to try to fix anything in there that’s broken, but also look into ways that we could maybe make it more browsable and user-friendly for people who may not know what they’re looking for or may not be able to find what they’re looking for.”

CDC advisers demand answers

The CDC has been largely quiet about the deleted data, only pointing to Trump’s executive orders to justify removals. That could change by February 7. That’s the deadline that a congressionally mandated advisory committee to the CDC’s acting director, Susan Monarez, set in an open letter asking for answers to a list of questions about the data removals.

“It has been reported through anonymous sources that the website changes are related to new executive orders that ban the use of specific words and phrases,” their letter said. “But as far as we are aware, these unprecedented actions have yet to be explained by CDC; news stories indicate that the agency is declining to comment.”

At the top of the committee’s list of questions is likely the one frustrating researchers most: “What was the rationale for making these datasets and websites inaccessible to the public?” But the committee also importantly asked what analysis was done “of the consequences of removing access to these datasets and website” prior to the removals. They also asked how deleted data would be safeguarded and when data would be restored.

It’s unclear if the CDC will be motivated to respond by the deadline. Ars reached out to one of the committee members, Joshua Sharfstein—a physician and vice dean for Public Health Practice and Community Engagement at Johns Hopkins University—who confirmed that as of this writing, the CDC has not yet responded. And the CDC did not respond to Ars’ request to comment on the letter.

Rasmussen told Ars that even temporary removals of CDC guidance can disrupt important processes keeping Americans healthy. Among the potentially most consequential pages briefly removed were recommendations from the congressionally mandated Advisory Committee on Immunization Practices (ACIP).

Those recommendations are used by insurance companies to decide who gets reimbursed for vaccines and by physicians to determine vaccine eligibility, and Rasmussen said they “are incredibly important for the entire population to have access to any kind of vaccination.” The Mpox vaccine recommendations, for example, were eventually restored unaltered, and Rasmussen told Ars she suspects that “one of the reasons” ACIP has so far avoided interference is that it’s mandated by Congress.

Seemingly ACIP could be weakened by the new administration, Rasmussen suggested. She warned that Trump’s pick for CDC director, Dave Weldon, “is an anti-vaxxer” (with a long history of falsely linking vaccines to autism) who may decide to replace ACIP committee members with anti-vaccine advocates or move to dissolve ACIP. And any changes in recommendations could mean “insurance companies aren’t going to cover vaccinations [and that] physicians will not recommend vaccination.” And that could mean “vaccination will go down and we’ll start having outbreaks of some of these vaccine-preventable diseases.”

“If there’s a big polio outbreak, that is going to result in permanently disabled children, dead children—it’s really, really serious,” Rasmussen said. “So I think that people need to understand that this isn’t just like, ‘Oh, maybe wear a mask when you’re at the movie theater’ kind of CDC guidance. This is guidance that’s really fundamental to our most basic public health practices, and it’s going to cause widespread suffering and death if this is allowed to continue.”

Seeding deleted data and doing science to fight back

On Bluesky, Rasmussen led one of many charges to compile archived links and download CDC data so that researchers can reference every available government study when advancing public health knowledge.

“These data are public and they are ours,” Rasmussen posted. “Deletion disobedience is one way to fight back.”

As Rasmussen sees it, deleting CDC data is “theft” from the public domain and archiving CDC data is simply taking “back what is ours.” But at the same time, her team is also taking steps to be sure the data they collected can be lawfully preserved. Because the CDC website has not been copied and hosted on a server, they expect their archive should be deemed lawful and remain online.

“I don’t put it past this administration to try to shut this stuff down by any means possible,” Rasmussen told Ars. “And we wanted to make sure there weren’t any sort of legal loopholes that would jeopardize anybody in the group, but also that would potentially jeopardize the data.”

It’s not clear if some data has already been lost. Seemingly the same user who uploaded the deleted datasets to IA posted on Reddit, clarifying that while the “full” archive “should contain all public datasets that were available” before “anything was scrubbed,” it likely only includes “most” of the “metadata and attachments.” So, researchers who download the data may still struggle to fill in some blanks.

To help researchers quickly access the missing data, anyone can help the IA seed the datasets, the Reddit user said in another post providing seeding and mirroring instructions. Currently dozens are seeding it for a couple hundred peers.

“Thank you to everyone who requested this important data, and particularly to those who have offered to mirror it,” the Reddit user wrote.

As Rasmussen works with her group to make their archive more user-friendly, her plan is to help as many researchers as possible fight back against data deletion by continuing to reference deleted data in their research. She suggested that effort—doing science that ignores Trump’s executive orders—is perhaps a more powerful way to resist and defend public health data than joining in loud protests, which many researchers based in the US (and perhaps relying on federal funding) may not be able to afford to do.

“Just by doing things and standing up for science with your actions, rather than your words, you can really make, I think, a big difference,” Rasmussen said.

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

The Internet Archive and its 916 billion saved web pages are back online

Last week, hackers defaced the Internet Archive website with a message that said, “Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!”

HIBP is a reference to Have I Been Pwned, which was created by security researcher Troy Hunt and provides information and notifications on data breaches. The hacked Internet Archive data was sent to Have I Been Pwned and “contains authentication information for registered members, including their email addresses, screen names, password change timestamps, Bcrypt-hashed passwords, and other internal data,” BleepingComputer wrote.

Internet Archive founder Brewster Kahle said on October 9 that the Internet Archive fended off a DDoS attack and was working on upgrading security in light of the data breach and website defacement. The next day, he reported that the “DDoS folks are back” and had knocked the site offline. The Internet Archive “is being cautious and prioritizing keeping data safe at the expense of service availability,” he added.

“Services are offline as we examine and strengthen them… Estimated Timeline: days, not weeks,” he wrote on October 11. “Thank you for the offers of pizza (we are set).”

Internet Archive’s e-book lending is not fair use, appeals court rules

The Internet Archive has lost its appeal after book publishers successfully sued to block the Open Libraries Project from lending digital scans of books for free online.

Judges for the Second Circuit Court of Appeals on Wednesday rejected the Internet Archive (IA) argument that its controlled digital lending—which allows only one person to borrow each scanned e-book at a time—was a transformative fair use that worked like a traditional library and did not violate copyright law.
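The one-borrower-at-a-time rule at the heart of controlled digital lending can be sketched as a simple checkout check. This is a toy model under stated assumptions, not IA's actual lending system: the class `LendingDesk`, its methods, and the sample title are all hypothetical names made up for illustration.

```python
from typing import Optional


class LendingDesk:
    """Toy model: each scanned book may be on loan to at most one patron."""

    def __init__(self, titles: list[str]):
        # Map each title to its current borrower, or None when it is available.
        self.current_borrower: dict[str, Optional[str]] = {t: None for t in titles}

    def borrow(self, title: str, patron: str) -> bool:
        """Lend the title only if it exists and nobody else has it checked out."""
        if title not in self.current_borrower:
            return False
        if self.current_borrower[title] is not None:
            return False
        self.current_borrower[title] = patron
        return True

    def give_back(self, title: str, patron: str) -> bool:
        """Return the title; only the patron holding the loan can do so."""
        if self.current_borrower.get(title) != patron:
            return False
        self.current_borrower[title] = None
        return True


desk = LendingDesk(["Example Book"])  # hypothetical title
first = desk.borrow("Example Book", "patron_a")   # succeeds
second = desk.borrow("Example Book", "patron_b")  # refused while on loan
desk.give_back("Example Book", "patron_a")
third = desk.borrow("Example Book", "patron_b")   # succeeds after the return
print(first, second, third)
```

The sketch only shows the access constraint itself; it says nothing about the legal question the court decided.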

As Judge Beth Robinson wrote in the decision, because the IA’s digital copies of books did not “provide criticism, commentary, or information about the originals” or alter the original books to add “something new,” the court concluded that the IA’s use of publishers’ books was not transformative, hobbling the organization’s fair use defense.

“IA’s digital books serve the same exact purpose as the originals: making authors’ works available to read,” Robinson said, emphasizing that although in copyright law, “[n]ot every instance will be clear cut,” “this one is.”

The appeals court ruling affirmed the lower court’s ruling, which permanently barred the IA from distributing not just the works in the suit, but all books “available for electronic licensing,” Robinson said.

“To construe IA’s use of the Works as transformative would significantly narrow―if not entirely eviscerate―copyright owners’ exclusive right to prepare (or not prepare) derivative works,” Robinson wrote.

Maria Pallante, president and CEO of the Association of American Publishers, the trade organization behind the lawsuit, celebrated the ruling. She said the court upheld “the rights of authors and publishers to license and be compensated for their books and other creative works and reminds us in no uncertain terms that infringement is both costly and antithetical to the public interest.”

“If there was any doubt, the Court makes clear that under fair use jurisprudence there is nothing transformative about converting entire works into new formats without permission or appropriating the value of derivative works that are a key part of the author’s copyright bundle,” Pallante said.

The Internet Archive’s director of library services, Chris Freeland, issued a statement on the loss, which comes after four years of fighting to maintain its Open Libraries Project.

“We are disappointed in today’s opinion about the Internet Archive’s digital lending of books that are available electronically elsewhere,” Freeland said. “We are reviewing the court’s opinion and will continue to defend the rights of libraries to own, lend, and preserve books.”

IA’s lending harmed publishers, judge says

The court’s fair use analysis didn’t solely hinge on whether IA’s digital lending of e-books was “transformative.” Judges also had to consider book publishers’ claims that IA was profiting off e-book lending, in addition to factoring in whether each work was original, what amount of each work was being copied, and whether the IA’s e-books substituted original works, depriving authors of revenue in relevant markets.

Ultimately, for each factor, judges ruled in favor of publishers, which argued that IA was threatening to “‘destroy the value of [their] exclusive right to prepare derivative works,’ including the right to publish their authors’ works as e-books.”

While the IA tried to argue that book publishers’ surging profits suggested that its digital lending caused no market harms, Robinson disagreed with the IA’s experts’ “ill-supported” market analysis and took issue with IA advertising “its digital books as a free alternative to Publishers’ print and e-books.”

“IA offers effectively the same product as Publishers―full copies of the Works―but at no cost to consumers or libraries,” Robinson wrote. “At least in this context, it is difficult to compete with free.”

Robinson wrote that although book publishers showed no proof of market harms, that lack of evidence did not support IA’s case, and she ruled that IA did not satisfy its burden to prove it had not harmed publishers. She further wrote that it’s common sense to agree with publishers’ characterization of harms because “IA’s digital books compete directly with Publishers’ e-books” and would deprive authors of revenue if left unchecked.

“We agree with Publishers’ assessment of market harm” and “are likewise convinced” that “unrestricted and widespread conduct of the sort engaged in by [IA] would result in a substantially adverse impact on the potential market” for publishers’ e-books, Robinson wrote. “Though Publishers have not provided empirical data to support this observation, we routinely rely on such logical inferences where appropriate” when determining fair use.

Judges did, however, side with IA on the matter of whether the nonprofit was profiting off loaning e-books for free, contradicting the lower court. The appeals court disagreed with book publishers’ claims that IA profited off e-books by soliciting donations or earning a small percentage from used books sold through referral links on its site.

“Of course, IA must solicit some funds to keep the lights on,” Robinson wrote. But “IA does not profit directly from its Free Digital Library,” and it would be “misleading” to characterize it that way.

“To hold otherwise would greatly restrain the ability of nonprofits to seek donations while making fair use of copyrighted works,” Robinson wrote.

Appeals court seems lost on how Internet Archive harms publishers

Deciding “the future of books” —

Appeals court decision potentially reversing publishers’ suit may come this fall.

The Internet Archive (IA) went before a three-judge panel Friday to defend its open library’s controlled digital lending (CDL) practices after book publishers last year won a lawsuit claiming that the archive’s lending violated copyright law.

In the weeks ahead of IA’s efforts to appeal that ruling, IA was forced to remove 500,000 books from its collection, shocking users. In an open letter to publishers, more than 30,000 readers, researchers, and authors begged for access to the books to be restored in the open library, claiming the takedowns dealt “a serious blow to lower-income families, people with disabilities, rural communities, and LGBTQ+ people, among many others,” who may not have access to a local library or feel “safe accessing the information they need in public.”

During a press briefing following arguments in court Friday, IA founder Brewster Kahle said that “those voices weren’t being heard.” Judges appeared primarily focused on understanding how IA’s digital lending potentially hurts publishers’ profits in the ebook licensing market, rather than on how publishers’ costly ebook licensing potentially harms readers.

However, lawyers representing IA—Joseph C. Gratz, from the law firm Morrison Foerster, and Corynne McSherry, from the nonprofit Electronic Frontier Foundation—confirmed that judges were highly engaged by IA’s defense. Arguments that were initially scheduled to last only 20 minutes stretched on instead for an hour and a half. Ultimately, judges decided not to rule from the bench, with a decision expected in the coming months or potentially next year. McSherry said the judges’ engagement showed that the judges “get it” and won’t make the decision without careful consideration of both sides.

“They understand this is an important decision,” McSherry said. “They understand that there are real consequences here for real people. And they are taking their job very, very seriously. And I think that’s the best that we can hope for, really.”

On the other side, the Association of American Publishers (AAP), the trade organization behind the lawsuit, provided little insight into how the day went. When reached for comment, AAP simply said, “We thought it was a strong day in court, and we look forward to the opinion.”

Decision could come early fall

According to Gratz, most of the questions for IA focused on “how to think about the situation where a particular book is available” from the open library and also available as an ebook that a library can license. Judges said they did not know how to think about “a situation where the publishers just haven’t come forward with any data showing that this has an impact,” Gratz said.

One audience member at the press briefing noted that instead judges were floating hypotheticals, like “if every single person in the world made a copy of a hypothetical thing, could hypothetically this affect the publishers’ revenue.”

McSherry said this was a common tactic when judges must weigh the facts while knowing that their decision will set an important precedent. However, IA has shown evidence, Gratz said, that even if IA provided limitless loans of digitized physical copies, “CDL doesn’t cause any economic harm to publishers, or authors,” and “there was absolutely no evidence of any harm of that kind that the publishers were able to bring forward.”

McSherry said that IA pushed back on claims that IA behaves like “pirates” when digitally lending books, with critics sometimes comparing the open library to illegal file-sharing networks. Instead, McSherry said that CDL provides a path to “meet readers where they are,” allowing IA to loan books that it owns to one user at a time no matter where in the world they are located.

“It’s not unlawful for a library to lend a book it owns to one patron at a time,” Gratz said IA told the court. “And the advent of digital technology doesn’t change that result. That’s lawful. And that’s what librarians do.”

In the open letter, IA fans pointed out that many IA readers were “in underserved communities where access is limited” to quality library resources. Being suddenly cut off from accessing nearly half a million books has “far-reaching implications,” they argued, removing access to otherwise inaccessible “research materials and literature that support their learning and academic growth.”

IA has argued that because copyright law is intended to provide equal access to knowledge, copyright law is better served by allowing IA’s lending than by preventing it. They’re hoping the judges will decide that CDL is fair use, reversing the lower court’s decision and restoring access to books recently removed from the open library. But Gratz said there’s no telling yet when that decision will come.

“There is no deadline for them to make a decision,” Gratz said, but it “probably won’t happen until early fall” at the earliest. After that, whichever side loses will have an opportunity to appeal the case, which has already stretched on for four years, to the Supreme Court. Since neither side seems prepared to back down, the Supreme Court eventually weighing in seems inevitable.

McSherry seemed optimistic that the judges at least understood the stakes for IA readers, noting that fair use is “designed to ensure that copyright actually serves the public interest,” not publishers’. Should the court decide otherwise, McSherry warned, the court risks allowing “a few powerful publishers” to “hijack the future of books.”

When IA first appealed, Kahle put out a statement saying IA couldn’t walk away from “a fight to keep library books available for those seeking truth in the digital age.”

After 32 years, one of the ’Net’s oldest software archives is shutting down

Ancient server dept. —

Hobbes OS/2 Archive: “As of April 15th, 2024, this site will no longer exist.”

Box art for IBM OS/2 Warp version 3, an OS released in 1995 that competed with Windows.

IBM

In a move that marks the end of an era, New Mexico State University (NMSU) recently announced the impending closure of its Hobbes OS/2 Archive on April 15, 2024. For over three decades, the archive has been a key resource for users of the IBM OS/2 operating system and its successors, which once competed fiercely with Microsoft Windows.

In a statement made to The Register, a representative of NMSU wrote, “We have made the difficult decision to no longer host these files on hobbes.nmsu.edu. Although I am unable to go into specifics, we had to evaluate our priorities and had to make the difficult decision to discontinue the service.”

Hobbes is hosted by the Department of Information & Communication Technologies at New Mexico State University in Las Cruces, New Mexico. In the official announcement, the site reads, “After many years of service, hobbes.nmsu.edu will be decommissioned and will no longer be available. As of April 15th, 2024, this site will no longer exist.”

OS/2 version 1.2, released in late 1989.

os2museum.com

We reached out to New Mexico State University to inquire about the history of the Hobbes archive but did not receive a response. The earliest record we’ve found of the Hobbes archive online is a 1992 Walnut Creek CD-ROM collection that gathered up the contents of the archive for offline distribution. At around 32 years old, minimum, that makes Hobbes one of the oldest software archives on the Internet, akin to the University of Michigan’s archives and ibiblio at UNC.

Archivists such as Jason Scott of the Internet Archive have stepped up to say that the files hosted on Hobbes are safe and already mirrored elsewhere. “Nobody should worry about Hobbes, I’ve got Hobbes handled,” wrote Scott on Mastodon in early January. OS/2 World.com also published a statement about making a mirror. But it’s still notable whenever such an old and important piece of Internet history bites the dust.

Like many archives, Hobbes started as an FTP site. “The primary distribution of files on the Internet were via FTP servers,” Scott tells Ars Technica. “And as FTP servers went down, they would also be mirrored as subdirectories in other FTP servers. Companies like CDROM.COM / Walnut Creek became ways to just get a CD-ROM of the items, but they would often make the data available at http://ftp.cdrom.com to download.”

The Hobbes site is a priceless digital time capsule. You can still find the Top 50 Downloads page, which includes sound and image editors, and OS/2 builds of the Thunderbird email client. The archive contains thousands of OS/2 games, applications, utilities, software development tools, documentation, and server software dating back to the launch of OS/2 in 1987. There’s a certain charm in running across OS/2 wallpapers from 1990, and even the archive’s Update Policy is a historical gem—last updated on March 12, 1999.

The legacy of OS/2

The final major IBM release of OS/2, Warp version 4.0, as seen running in an emulator.

OS/2 began as a joint venture between IBM and Microsoft, undertaken as a planned replacement for IBM PC DOS (also called “MS-DOS” in the form sold by Microsoft for PC clones). Despite advanced capabilities like 32-bit processing and multitasking, OS/2 later competed with and struggled to gain traction against Windows. The partnership between IBM and Microsoft dissolved after the success of Windows 3.0, leading to divergent paths in OS strategies for the two companies.

Through iterations like the Warp series, OS/2 established a key presence in niche markets that required high stability, such as ATMs and the New York subway system. Today, its legacy continues in specialized applications and in newer versions (like eComStation) maintained by third-party vendors—despite being overshadowed in the broader market by Linux and Windows.

A footprint like that is worth preserving, and a loss of one of OS/2’s primary archives, even if mirrored elsewhere, is a cultural blow. Hobbes has reportedly almost disappeared before but received a stay of execution. In the comments section for an article on The Register, someone named “TrevorH” wrote, “This is not the first time that Hobbes has announced it’s going away. Last time it was rescued after a lot of complaints and a number of students or faculty came forward to continue to maintain it.”

As the final shutdown approaches in April, the legacy of Hobbes is a reminder of the importance of preserving the digital heritage of software for future generations—so that decades from now, historians can look back and see how things got to where they are today.
