…the result data structure, which in our case shouldn't be too large.

There are a number of Python libraries that support this style of JSON parsing; in the following example, I used the ijson library.

```python
import ijson

user_to_repos = {}

with open("large-file.json", "rb") as f:
    # ijson.items() yields one fully built Python object per item of the
    # top-level list, reading the file incrementally as we iterate.
    for record in ijson.items(f, "item"):
        # Assumed record layout (GitHub-events style); adjust keys to your data.
        user = record["actor"]["login"]
        repo = record["repo"]["name"]
        if user not in user_to_repos:
            user_to_repos[user] = set()
        user_to_repos[user].add(repo)
```

In the previous version, using the standard library, once the data is loaded we no longer need to keep the file open. With this API the file has to stay open, because the JSON parser reads from the file on demand as we iterate over the records.

The `items()` API takes a query string that tells it which part of the document to return. In this case, `"item"` just means "each item in the top-level list we're iterating over"; see the ijson documentation for more details.

Here's what memory usage looks like with this approach:

*(Figure: memory usage with the ijson streaming approach.)*

When it comes to memory usage, problem solved!

And as far as runtime performance goes, the streaming/chunked solution with ijson actually runs slightly faster, though this won't necessarily be the case for other datasets or algorithms.
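To make the "read on demand" idea concrete without installing ijson, here is a minimal sketch of the same streaming pattern using only the standard library. It is a swapped-in technique, not how ijson works internally (ijson is built on an iterative, event-based parser): it reads a top-level JSON array in fixed-size chunks and uses `json.JSONDecoder.raw_decode()` to pull out one element at a time. `iter_array_items` and its `chunk_size` parameter are hypothetical names, it only handles a top-level array (the equivalent of the `"item"` prefix), and the `actor`/`repo` keys in the usage part are the same assumed record layout as above.

```python
import io
import json
from typing import Any, Iterator, TextIO

_JSON_WS = " \t\n\r"  # the four whitespace characters JSON allows


def iter_array_items(f: TextIO, chunk_size: int = 65536) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array one at a time.

    Hypothetical helper: only the current element plus unread lookahead is
    kept in memory, never the whole decoded list.
    """
    decoder = json.JSONDecoder()
    buf = ""
    eof = False

    def read_more() -> bool:
        # Append the next chunk to the buffer; False once the file is exhausted.
        nonlocal buf, eof
        if eof:
            return False
        chunk = f.read(chunk_size)
        if not chunk:
            eof = True
            return False
        buf += chunk
        return True

    def next_char(i: int) -> int:
        # Index of the first non-whitespace character at or after i, or -1 at EOF.
        while True:
            while i < len(buf) and buf[i] in _JSON_WS:
                i += 1
            if i < len(buf):
                return i
            if not read_more():
                return -1

    i = next_char(0)
    if i < 0 or buf[i] != "[":
        raise ValueError("expected a top-level JSON array")
    i = next_char(i + 1)
    if i >= 0 and buf[i] == "]":
        return  # empty array

    while i >= 0:
        try:
            obj, end = decoder.raw_decode(buf, i)
        except json.JSONDecodeError:
            # The element may be cut off at a chunk boundary: read more, retry.
            if read_more():
                continue
            raise
        if end == len(buf) and read_more():
            continue  # e.g. "12" may really be "123", split across chunks
        j = next_char(end)
        if j < 0 or buf[j] not in ",]":
            if read_more():
                continue  # e.g. "1." may really be "1.5", split across chunks
            raise ValueError("expected ',' or ']' after array element")
        yield obj
        if buf[j] == "]":
            return
        buf = buf[j + 1:]  # discard consumed text so the buffer stays small
        i = next_char(0)
    raise ValueError("unterminated JSON array")


# Usage, mirroring the loop above (record keys are assumed, as above):
sample = io.StringIO(
    '[{"actor": {"login": "alice"}, "repo": {"name": "alice/x"}},'
    ' {"actor": {"login": "alice"}, "repo": {"name": "alice/y"}}]'
)
user_to_repos = {}
for record in iter_array_items(sample, chunk_size=16):
    user_to_repos.setdefault(record["actor"]["login"], set()).add(record["repo"]["name"])
print(user_to_repos)
```

Unlike the ijson example, this sketch needs a text-mode file (for example `open("large-file.json", encoding="utf-8")`), since `raw_decode()` works on `str`. Peak memory is roughly the largest single element plus one chunk; because a split element is re-decoded from its start after each extra read, `chunk_size` should be much larger than a typical record.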